NVIDIA Diffusion LLM Hits 2.42x Throughput Without Retraining: Nemotron TwoTower Released

NVIDIA's diffusion language model, Nemotron TwoTower, achieves 2.42x LLM inference throughput without a full retraining run, ...

DeepSeek open sources DSpark, a new framework to speed up LLM inference by up to 85%

DSpark can make decoding faster, but acceptance quality still determines how much speed the system actually realizes.

What is GLM-5.2: China’s AI model challenging Anthropic’s Claude Fable 5 in coding and long-context reasoning

In recent days, a new large language model from China has started circulating through technical circles with an unusual mix ...

Developer Tech

NVIDIA: DFlash block diffusion accelerates autoregressive LLMs

Deploying DFlash block diffusion on NVIDIA hardware accelerates autoregressive LLMs during latency-sensitive inference.

Developer Tech

What is GLM-5.2? Z.ai targets coding agents

Z.ai’s GLM-5.2 is an open-source model aimed at long-context coding-agent workflows, with support for a one million-token ...

techtimes

Speculative Decoding Bottleneck Broken: DFlash Hits 15x on Blackwell GPUs

Large language models have a speed problem that goes beyond raw hardware. Even on the fastest GPUs available, the standard autoregressive loop — generate one token, wait, generate the next — leaves ...

15d

Z.ai pitches GLM-5.2 for long-running software engineering tasks

The open-source model combines a one-million-token context window with architectural updates aimed at lowering the cost of repository-scale AI coding.

Explained: How China is narrowing the AI gap with the US one model at a time

Just when the AI industry’s attention seemed fixed on OpenAI, Google and Anthropic, a new Chinese model has stolen the ...
