News | drihu.com

By PranoyP, 9 hours ago

13 comments

By jlukecarlson, 3 hours ago

I appreciate the details shared in this paper, but it'd be great if they open-sourced their implementation!

By mlop99, 8 hours ago

Curious whether the behaviour-driven testing could be done by another LLM agent (or a group of agents), with one LLM agent testing another. Could that lead to a self-improving loop?

6 hours ago

[deleted]

6 hours ago

[deleted]

By shailendra145, 8 hours ago

A powerful move beyond benchmarks — this paper redefines LLM evaluation through realistic, behavior-driven testing.

By papz2k, 8 hours ago

Very interesting work.

By raj_maddipati, 5 hours ago

Excellent work

By harshv_03, 6 hours ago

Interesting

By ankush9812, 8 hours ago

Nice Work

By ashyash518, 8 hours ago

Nice work

By saurabh_xen, 8 hours ago

Great work

By quanta9, 8 hours ago

interesting

By cs_exps, 6 hours ago

[dead]

Testing LLM Agents Like Software – Behaviour Driven Evals of AI Systems