Hacker News

There is a minimum bar (and it keeps dropping over time), but IME the config you're describing is still below it.

Qwen/Gemma in the 27B/35B range at fp8 are better than gemini-2.5 but worse than gemini-3.1. You can run DS4-flash at fp8 on two DGX Sparks, and things keep getting better. DiffusionGemma came out recently with 4x token generation speed.

tl;dr - the models you appear to be trying are too small or too heavily quantized.
