Anthropic says Claude Sonnet 4.5 now scores 77% on a key software engineering benchmark and can work autonomously for more than 30 hours on complex tasks. We gave it a first hands-on test. https://decrypt.co/342055/anthropic-claims-best-coding-model-world-claude-sonnet-4-5-tested