Hacker News

When you share them, please also share the setup so people can easily rerun them. Nearly every eval I've seen shares the LLM session transcript but not the actual harness setup or the other details needed to reproduce it.