SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

URL: surgehq.ai
1 comment

Is this because GPT-5 hallucinates less in general?