Which AI model should I use?

On Friday I had dinner with a friend who asked me which AI model he should use in his new job. My answer was very much ‘whatever works for you’, because my opinion is that the foundational models are all very similar.

But opinions should be based on fact; otherwise they’re too subjective to carry much weight. So I went away and did some actual research using benchmark data, and here’s my more informed answer to that question, Ken!

Starting from the hypothesis that all the models are fundamentally similar, it’s the jobs they’re used for that might produce a different result or experience.

Looking at the benchmark data in Microsoft Foundry for evidence that might challenge this hypothesis, three areas stand out:

Question answering, reasoning and quality.

Strong question answering and reasoning are essential for the quality, trustworthiness and efficiency of AI in business settings. Without them, AI can’t deliver meaningful, actionable insights, and companies risk making bad decisions or having to revert to manual work.

Across all three graphs, you’ll see how little difference there is between the models.

The one major difference appears when you bring in the graph comparing quality to cost. The differences in quality between these models are marginal, but the difference in cost is huge.

This immediately raises the question of the extra costs businesses could face when using these models, and whether those costs can be justified from a value perspective. Is your business willing to pay double the price for roughly the same outcome? Do CFOs know that Claude costs twice as much as OpenAI’s models, for example?

This is where user experience, design and service delivery come into the equation, and where more context is needed. What purpose are you using it for? How easy is it to get to the desired result from start to finish?

It’s like going to buy a car and saying, ‘I need the best car’, and the dealer says, ‘Well, take this sports car, it’s the best!’ But you need the best car to get you home, and you live on top of a mountain where the only access is a really bumpy dirt road. The best car for your journey isn’t necessarily the best car for someone else.

If we were going to use AI for coding (which we do), we’d perhaps use Claude Code. Why? Honestly, when it comes to the job of coding, the UI is better for a developer, it’s integrated into the IDE, and so on. For that specific job, the end-to-end product (not just the model) fits the use, even though, according to the benchmark report, the GPT-5.2 model is marginally better at coding.

And this is also why the ‘SaaS is dead’ comments are such bollocks. Purpose-built, productised solutions that fit the job in hand will be used and adopted more than generic capability.

The other thing to think about is where you’re hearing about what’s good and what’s bad. Maybe from posts like this one (although I hope this one is a bit more well-rounded), or from individual contributors who perhaps don’t understand the fundamentals of the tech, or who could be drinking the Kool-Aid because they’ve been given free access. Remember, we’re all being sold to all of the time, and I think some of the big AI companies are doing a better job of that than others.

I don’t think the right debate is simply model vs model. It’s about the use case: what would work best for me, and what would become so much a part of my workflow and my rituals that I’d question why I would ever need to switch, regardless of what I use it for.

I don’t think anyone is at a disadvantage using one model over another. The UI/UX design, integrations and workflows of these ‘products’ are where the advantages could lie.

So I guess my answer to the original question is still: it depends on what you’re using it for, and on what’s available to you.

Hope that helps, Ken!