The GPT-4 barrier has finally been broken

Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Almost everyone investing serious …

Click to view the original at simonwillison.net

Hasnain says:

Excited to try this out. Some of the new fuzzing results people posted today are mind-blowing.

“Claude 3 Opus, March 4th. This is just a few days old and wow: the vibes on this one are really strong. People I know who evaluate LLMs closely are rating it as the first clear GPT-4 beater. I’ve switched to it as my default model for a bunch of things, most conclusively for code—I’ve had several experiences recently where a complex GPT-4 prompt that produced broken JavaScript gave me a perfect working answer when run through Opus instead (recent example). I also enjoyed Anthropic research engineer Amanda Askell’s detailed breakdown of their system prompt”

Posted on 2024-03-09T07:39:55+0000