I think gemma-4-26b-a4b and Qwen3.6-35B-A3B show that there's something very interesting about a local model that uses mixture-of-experts (which helps a lot with performance) and has on the order of 30 billion parameters.
These models are very capable, and use around 20-30GB of RAM while they are running.
Provided you have 64GB of RAM, that leaves space for running other applications at the same time.
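For a rough sense of where a 20-30GB figure could come from, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and the 4-bit and 8-bit quantization levels are illustrative assumptions, not something stated above. It ignores the KV cache and runtime overhead, which add to the total. Note that a mixture-of-experts model still has to keep all of its experts in memory, even though only a few billion parameters are active per token.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Excludes KV cache, activations, and runtime overhead.
    """
    return params_billion * bits_per_weight / 8


# Parameter counts taken from the model names; quantization levels are assumed.
for label, params_billion in [("26B model", 26.0), ("35B model", 35.0)]:
    for bits in (4, 8):
        size = weight_memory_gb(params_billion, bits)
        print(f"{label} at {bits}-bit: ~{size:.1f} GB of weights")
```

Under these assumptions the weights alone land between roughly 13GB and 35GB depending on quantization, which is in the same ballpark as the usage described above once context and overhead are added.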
I second this notion. After picking up an OEM Spark and running qwen36moe/dense, I was thoroughly impressed with what such small models can do and the (reasonable) speeds you can get. I'm back to using open-weight models via an API (I wanted more capability for the time being), but will be getting more hardware soon (re: ds4-flash and the fable shot heard round the world).