AI Showdown: Comparing GPT-5, Gemini, Claude, and Copilot on Complex Explanations

To assess the capabilities of OpenAI’s newly released GPT-5 alongside competing artificial intelligence models, researchers devised a series of challenges ranging from standardized test questions to mathematical puzzles. The goal was to identify the strengths and weaknesses of each platform, including Google’s Gemini 2.5 Flash, Anthropic’s Claude Sonnet 4, and Microsoft Copilot.
Initial tests involving complex SAT sentence-completion exercises and questions about unsolved math theorems proved too straightforward for the AI chatbots, with every system quickly producing correct responses. A classic brain teaser was likewise solved instantly by each platform: “19 people get off a train at the first stop. 17 people get on the train. Now there are 63 people on the train. How many people were on the train to begin with?” All four arrived at the correct answer of 65.
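The arithmetic works backward from the final count: with x passengers aboard originally, x − 19 + 17 = 63, which gives x = 65 (equivalently, 63 + 19 − 17 = 65).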
The next challenge asked each chatbot to explain cold fusion in terms a child could understand. ChatGPT (powered by GPT-5), Gemini, Claude, and Copilot all responded swiftly with accurate information, though their approaches differed significantly. ChatGPT’s response included an image that lacked context and introduced potentially unfamiliar terminology such as “atoms.” Gemini’s explanation was excellent, particularly for its oral delivery style and a charming illustration of atoms sharing a hug. Microsoft’s Copilot, believed to be running on GPT-4, surprisingly demonstrated greater clarity in explaining the core concepts and offered an effective analogy (“It’s like trying to bake cookies without turning on the oven”), though it could not generate inline illustrations.
Anthropic’s Claude AI emerged as the standout performer. While its text explanation was comparable in quality to Gemini’s and Copilot’s, Claude also produced what Anthropic calls an “Artifact”: an instantly shareable application built specifically to explain “Cold Fusion for Kids.” It generated this without being asked, a capability not observed in any of the competing platforms.