A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C
A minimal CPU-only inference engine for Qwen 3 (≤4B) has been released as a pure C implementation with minimal dependencies, targeting local LLM enthusiasts.
TL;DR: The (very messy) code and writeups can be found at https://github.com/jakint0sh/qwen3-engine
Read the README for instructions on how to get started.
And for those who just want a bulleted list:
- Inference engine for Qwen 3 sizes 4B and below
- Written from scratch in pure C
- No dependencies except libc, libm, and cJSON (and OpenMP if compiled with parallelization)
- Loads directly from
Source: A barebones CPU-only inference engine for Qwen 3, written from scratch in pure C. Read the full piece at the source.
Summary and analysis generated by AI (mistral). Always verify against the original sources.

