SWE-Bench Failures: When Coding Agents Spiral into 693 Lines of Hallucinations

By landonxi, 2 hours ago

URL: surgehq.ai

1 comment

By egillie, 2 hours ago

Is this because GPT-5 hallucinates less in general?