URL:
notdian.github.io
1 comment
Vibecoded this after seeing models do amazing things but still drift on simple recursive steps. It tracks three metrics: exact match, answer accuracy, and prefix correctness. Feedback welcome.
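For anyone wondering what those three metrics could look like in practice, here is a minimal sketch. It is only one plausible reading and not the project's actual code: it assumes a model's output is a list of intermediate steps ending in a final answer, and the function names `exact_match`, `answer_correct`, and `prefix_correctness` are hypothetical.

```python
from typing import Sequence


def exact_match(predicted: Sequence, expected: Sequence) -> bool:
    """True only if every step, including the final answer, matches."""
    return list(predicted) == list(expected)


def answer_correct(predicted: Sequence, expected: Sequence) -> bool:
    """True if the last element (assumed to be the final answer) matches."""
    if not predicted or not expected:
        return False
    return predicted[-1] == expected[-1]


def prefix_correctness(predicted: Sequence, expected: Sequence) -> float:
    """Fraction of expected steps matched before the first divergence."""
    if not expected:
        return 1.0 if not predicted else 0.0
    matched = 0
    for p, e in zip(predicted, expected):
        if p != e:
            break  # stop at the first step where the model drifts
        matched += 1
    return matched / len(expected)


if __name__ == "__main__":
    # Hypothetical task: repeated doubling starting from 1.
    expected = [1, 2, 4, 8, 16]
    drifted = [1, 2, 4, 9, 18]    # drifts at step 4 and never recovers
    recovered = [1, 2, 4, 9, 16]  # drifts at step 4 but lands on the answer

    for name, trace in [("drifted", drifted), ("recovered", recovered)]:
        print(
            name,
            exact_match(trace, expected),
            answer_correct(trace, expected),
            prefix_correctness(trace, expected),
        )
```

Under this reading, the two traces show why the metrics are worth tracking separately: both fail exact match and share the same prefix score, but only one gets the final answer right.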