Vienna startup Ora Computing raised €3.5M and proved a 70-billion-parameter large language model can be compressed for under ...
LCLMs compress LLM context before decoding, achieving an 8.8x speedup at 16x compression and beating every KV-cache method tested. Open-sourced by NYU and Columbia.
Ora Computing is developing software that makes AI models smaller, faster, and more efficient. Its technology helps reduce ...
Multiverse Computing SL, a startup with technology that reduces the hardware footprint of artificial intelligence models, is reportedly raising new capital. Sources told Bloomberg today the Spanish ...
The new open reasoning model delivers 30B-class intelligence in a 16B-parameter footprint, with 3.1B active parameters, validated independently on NVIDIA accelerated computing infrastructure.
Even if you don’t know much about the inner workings of generative AI models, you probably know they need a lot of memory. Hence, it is currently almost impossible to buy a measly stick of RAM without ...