Quantization - Search News

1mon

Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+

Using special tags embedded in the output, the model directly links every factual claim it makes to the specific source document or database row it pulled the information from.

Network World

Tether is shipping TurboQuant KV-cache quantization with Vulkan support into its QVAC SDK

Tether successfully integrated Google’s TurboQuant into the inference engine of its local AI framework, QVAC. It is the ...

Nature

A dichotomy color quantization algorithm for the HSI color space

Color quantization is used to obtain an image with the same number of pixels as the original but represented using fewer colors. Most existing color quantization algorithms are based on the Red Green ...
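As a rough illustration of the idea in this snippet (same pixel count, fewer colors), here is a minimal sketch of uniform per-channel quantization in RGB. This is not the dichotomy algorithm or the HSI color space from the article; it is a simpler, generic technique chosen only to show the concept. The function names `quantize_channel` and `quantize_image` and the sample pixel values are hypothetical.

```python
def quantize_channel(value, levels):
    """Snap an 8-bit channel value (0-255) to the nearest of `levels` evenly spaced values."""
    if not 2 <= levels <= 256:
        raise ValueError("levels must be between 2 and 256")
    step = 255 / (levels - 1)  # spacing between allowed values
    return int(round(round(value / step) * step))


def quantize_image(pixels, levels=4):
    """Quantize rows of (r, g, b) tuples; the number of pixels is unchanged."""
    return [
        [tuple(quantize_channel(c, levels) for c in px) for px in row]
        for row in pixels
    ]


# Hypothetical 2x2 image used only for illustration.
image = [
    [(12, 200, 130), (250, 3, 90)],
    [(128, 128, 128), (60, 180, 255)],
]

# With 4 levels per channel, each channel can only be 0, 85, 170, or 255,
# so at most 4 ** 3 = 64 distinct colors remain.
quantized = quantize_image(image, levels=4)
print(quantized)
```

Uniform quantization ignores which colors the image actually uses, which is why practical algorithms tend to pick the palette adaptively.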

18d

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

You can now download Gemma 4 models trained with quantization-aware training, which reduces the memory they require on mobile devices to 1GB.

12d

Nota AI Has Two MoE Quantization Papers Accepted at ICML 2026 Workshop, Demonstrating Global Competitiveness in Large-Scale AI Optimization

Nota AI, a company specializing in AI model compression and optimization, announced that two of its papers on MoE-specific ...

Nature

Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

With edge computing, real-time inference of deep neural networks (DNNs) on custom hardware has become increasingly relevant. Smartphone companies are incorporating artificial intelligence (AI) chips ...

Forbes

How Mixed-Precision Quantization Could Break AI’s Power Addiction

What is model quantization? Smaller, faster LLMs

Elastic Introduces Better Binary Quantization Technique in Elasticsearch

Xiaomi MiMo Is Now 15x Faster Than ChatGPT: Here's What That Actually Means