I’ve been reading rave reviews about OpenAI’s latest model o3-mini.
Any plans to add it to Cody soon?
I’ve been reading rave reviews about OpenAI’s latest model o3-mini.
Any plans to add it to Cody soon?
Hey @Halnex
We always consider new models after benchmarking and evaluating them with our own metrics.
It is easy to follow the hype, but we need to ensure to stay on enterprise-level grade implementations and reliability.
Fair enough. I tested it last night using direct API calls on a project I’m working on using PHP and it couldn’t even get the syntax right. I’ll stick with Claude Sonnet 3.5 for now.
I used the model o3-mini
but it’s not clear whether this is o3-mini-low
or o3-mini-high
.
According to this benchmark, o3-mini-high
obliterates every other models in “Coding Average”.
As far as I know, the reasoning mode is set to medium. So o3-mini should now in-between in the benchmarks.
@Halnex any feedback so far?
@jdorfman I’ve only used it a few times when Claude 3.5 Sonnet got stuck on a feature or bug. I found that o1 and o3 models talk too much and they’re unusable for prolonged coding tasks, at least for me.
I wouldn’t replace Claude especially now that we have 3.7!
This is not to downplay o1 or o3’s reasoning abilities but I think Claude 3.7 beats them by miles now.