μQ MicroQuant
Last-Mile TinyML Compiler Engineering · Co-Engineering

Your model.
Our compiler.
Their microcontroller.

The specialist team you call when generic edge-AI tools miss your flash, SRAM, latency, or firmware-integration budget. Our production quantization backend today is SPQ4 — scale-protected INT4 — with a zero-heap runtime and an explicit, capability-checked kernel architecture. The playground below runs the real SPQ4 pipeline in your browser: a live sample of the engineering, not a self-serve production compiler.

80%+ measured flash reduction vs FP32
0.00 KB dynamic heap used
<1% fixed-point L1 error (host harness)
bit-exact RTOS async parity
Live Showcase

The SPQ4 Compiler Playground

Upload a weights file — or synthesize one — and watch the real pipeline run: scale-protected INT4 quantization, SRAM tile solving, dead-block sparsity analysis, and production C++ codegen. Every number below is computed live from your actual data.

1 · Model Ingestion

Upload raw float32 weights, or configure a synthetic layer.

OR SYNTHESIZE
One fixed-point scale (m0, nb) per block
Drives the 2D weight-tile solver
30%
Zero blocks compile to m0 = 0 and are skipped for free

2 · Quantization Map

Legend: Live block · Dead block (skipped) · Clipped outlier

Hover any weight to inspect its INT4 code, block scale, and packed byte.

Nibble Packing (2 × INT4 → 1 byte)

w0 (····) + w1 (····) → Packed Byte (········)
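The packing step can be sketched in a few lines. This is a minimal illustration, not the generated code: the function names are hypothetical, and the nibble order (w0 in the low nibble, w1 in the high nibble) is an assumption the page does not state.

```python
def pack_nibbles(w0: int, w1: int) -> int:
    """Pack two signed INT4 values (-8..7) into one byte.

    Assumption: w0 goes in the low nibble, w1 in the high nibble.
    """
    for w in (w0, w1):
        if not -8 <= w <= 7:
            raise ValueError("INT4 value out of range: %d" % w)
    return ((w1 & 0x0F) << 4) | (w0 & 0x0F)


def unpack_nibbles(byte: int) -> tuple:
    """Recover the two signed INT4 values from a packed byte."""
    def sign_extend(nibble: int) -> int:
        # Bit 3 is the sign bit of a two's-complement 4-bit value.
        return nibble - 16 if nibble & 0x08 else nibble
    return sign_extend(byte & 0x0F), sign_extend((byte >> 4) & 0x0F)
```

Under that assumed order, `pack_nibbles(3, -2)` returns `0xE3`, and `unpack_nibbles(0xE3)` returns `(3, -2)`.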
Live metrics: Compiled flash · vs FP32 · Dead blocks skipped · SRAM tile · MACs / inference · Reconstruction RMSE

3 · Generated C++ Assets

model_assets.h — flash-aligned static arrays. An evaluation artifact from this browser demo: not calibrated against your dataset and not validated on your silicon. Production headers come from the full toolchain under an engagement's parity/HIL gates.


// Run the SPQ4 compiler to generate C++ assets...
                        

This is a browser demo of the SPQ4 pipeline, not a production compiler. Raw float32 weights are quantized live; ONNX/TFLite uploads synthesize a representative dense layer client-side (real graph parsing runs in the server-side toolchain, on a supported operator subset — see the support matrix). In an engagement we run the full compiler against your actual model, calibrate against your dataset, and produce validated target evidence.

Scope a paid audit →
What You're Hiring

Compiler Engineering, Not a Boxed Tool

SPQ4 — Scale-Protected Quantization, 4-bit — is one method engineered end-to-end for microcontrollers. We adapt it to your hardware.

🎯

Scale-Protected INT4

Per-block MSE-optimal clip search. Outliers saturate at the INT4 rail instead of destroying scale resolution for every neighboring weight. Two weights per byte — 80%+ smaller than FP32 end-to-end, measured.
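A per-block clip search can be sketched as a small grid search. This is an illustration under stated assumptions, not the SPQ4 implementation: the symmetric code range, the `clip / 7` scale, the 32-step grid, and all function names are hypothetical.

```python
import math


def quantize_block(block, clip):
    """Symmetric INT4 quantization of one block for a given clip threshold."""
    scale = clip / 7.0
    # Round to nearest, then saturate at the INT4 rails.
    codes = [max(-8, min(7, math.floor(w / scale + 0.5))) for w in block]
    return codes, scale


def block_mse(block, codes, scale):
    """Mean squared reconstruction error of a quantized block."""
    return sum((w - c * scale) ** 2 for w, c in zip(block, codes)) / len(block)


def search_clip(block, steps=32):
    """Try `steps` clip thresholds up to the block peak; keep the lowest MSE."""
    peak = max(abs(w) for w in block)
    if peak == 0.0:
        return [0] * len(block), 0.0  # all-zero block: zero codes, zero scale
    best = None
    for i in range(1, steps + 1):
        clip = peak * i / steps
        codes, scale = quantize_block(block, clip)
        err = block_mse(block, codes, scale)
        if best is None or err < best[0]:
            best = (err, codes, scale)
    return best[1], best[2]
```

Because the full-range threshold is always one of the candidates, the search can never do worse than plain max-abs scaling; it only trades outlier saturation for finer resolution when that lowers the block error.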

🧮

FPU-less, Division-free

Fixed-point requantization (acc·m0) >> nb with round-to-nearest. The hot loop is integer MACs, bitwise masks, and shifts — nothing else. Designed for Cortex-M0-class cores (no FPU, no DSP extension required); the portable path is the always-safe fallback.
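The requantization expression can be sketched as follows. The rounding term `1 << (nb - 1)`, the tie direction (toward positive infinity), the default `nb = 15`, and the helper names are assumptions for illustration; the page only states `(acc·m0) >> nb` with round-to-nearest.

```python
def to_fixed_point(scale: float, nb: int = 15):
    """Approximate a real scale as m0 / 2**nb with an integer multiplier m0."""
    m0 = int(round(scale * (1 << nb)))
    return m0, nb


def requantize(acc: int, m0: int, nb: int) -> int:
    """Round (acc * m0) / 2**nb to nearest using multiply, add, and shift only.

    Adding half of the divisor before an arithmetic right shift rounds ties
    toward positive infinity. Requires nb >= 1.
    """
    return (acc * m0 + (1 << (nb - 1))) >> nb
```

For example, a scale of 0.0123 becomes `m0 = 403` at `nb = 15`, and `requantize(1000, 403, 15)` returns 12, matching 1000 × 0.0123 = 12.3 rounded to nearest. A block with `m0 = 0` always requantizes to 0.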

🧊

Zero-Allocation Runtime

Header-only C++11, standard library only. Every buffer statically allocated: 0.00 KB heap, no arena, no fragmentation, deterministic latency. Weights execute in place from flash.

⏱️

RTOS Cooperative Executor

Tile-by-tile stepping with ping-pong SRAM buffers and DMA-style prefetch. Designed for cooperative bounded-step integration so realtime threads yield cleanly — and stays bit-identical to blocking execution (verified on the host harness).
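The parity claim can be illustrated with a toy model: a blocking loop and a stepped loop over the same tiles must produce identical integer accumulators. This sketch leaves out ping-pong buffers and DMA, and `between_steps` is a hypothetical stand-in for yielding to an RTOS scheduler.

```python
def run_blocking(tiles, x):
    """Process every tile in one call; returns one integer accumulator per tile."""
    return [sum(w * v for w, v in zip(tile, x)) for tile in tiles]


def step_tiles(tiles, x):
    """Generator: compute exactly one tile per step."""
    for tile in tiles:
        yield sum(w * v for w, v in zip(tile, x))


def run_cooperative(tiles, x, between_steps=None):
    """Drive the stepper, handing control back after each bounded step."""
    out = []
    for acc in step_tiles(tiles, x):
        out.append(acc)
        if between_steps is not None:
            between_steps()  # other work runs here in a cooperative system
    return out
```

Since both paths run the same integer arithmetic in the same order, comparing the two result lists for equality is a bit-exact check in this model.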

🕳️

Free Structural Sparsity

All-zero weight blocks compile to m0 == 0 and are skipped with zero mask storage. The skip pattern is a compile-time constant — timing stays input-invariant for WCET analysis.
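A minimal sketch of the skip logic, assuming one multiplier per block and a flat list of weight codes per output row. The function name, the data layout, and the choice to scale each block accumulator by `m0` before summing are illustrative assumptions, not the runtime's actual layout.

```python
def block_dot(codes, m0s, x, block_size):
    """Accumulate one row's dot product block by block, skipping dead blocks.

    codes: INT4 weight codes for one output row
    m0s:   one fixed-point multiplier per block (0 marks an all-zero block)
    x:     integer input activations
    Returns (scaled_sum, blocks_executed).
    """
    total, executed = 0, 0
    for b, m0 in enumerate(m0s):
        if m0 == 0:
            continue  # dead block: no MACs and no separate mask to consult
        start = b * block_size
        acc = sum(c * v for c, v in zip(codes[start:start + block_size],
                                        x[start:start + block_size]))
        total += acc * m0  # the final right shift by nb is omitted here
        executed += 1
    return total, executed
```

The set of skipped blocks depends only on `m0s`, which is fixed at compile time, so the number of executed blocks is the same for every input vector.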

📐

SRAM-Perfect Tiling

The compiler solves 2D tile geometry against your SRAM/TCM budget and serializes weights in streaming order, so every tile DMA-transfers whole into a tightly-coupled buffer.
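One way to picture the tile solve is a brute-force search over tile shapes. The sketch below is an assumption-heavy illustration rather than the compiler's solver: it assumes two weights per byte, two equal ping-pong buffers, tiles that divide the layer evenly, and a preference for larger, then wider, tiles.

```python
def solve_tile(rows, cols, sram_budget_bytes, buffers=2):
    """Pick the (tile_rows, tile_cols) holding the most weights that fits.

    Assumptions: INT4 weights (2 per byte), `buffers` equal-size tile
    buffers (2 = ping-pong), even tile_cols, tiles divide the layer evenly.
    """
    per_buffer = sram_budget_bytes // buffers
    best = None
    for tr in range(1, rows + 1):
        if rows % tr:
            continue
        for tc in range(2, cols + 1, 2):
            if cols % tc:
                continue
            tile_bytes = tr * tc // 2
            if tile_bytes <= per_buffer:
                key = (tile_bytes, tc)  # prefer bigger tiles, then wider ones
                if best is None or key > best[0]:
                    best = (key, (tr, tc))
    if best is None:
        raise ValueError("no tile fits the SRAM budget")
    return best[1]
```

With a 2048-byte budget, `solve_tile(64, 128, 2048)` returns `(16, 128)`: each ping-pong buffer holds 1024 bytes, which is 2048 INT4 weights, and the widest shape with that area wins the tie.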

🏎️

Explicit Kernel Selection

Operator kernels, hardware backends, and quantization schemes are three separate capability-checked registries. The ISA inner MAC is isolated behind one boundary: portable SWAR everywhere, an in-tree Cortex-M4/M7 __SMUAD backend (compiled + parity-tested), with Helium/MVE, RISC-V P/V, and Xtensa modeled as roadmap backends delivered per engagement.
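Capability-checked selection can be sketched as a lookup over required ISA features. The backend and capability names below appear in the support matrix; the registry structure and `select_kernel` are hypothetical.

```python
# Hypothetical registry: (backend name, ISA capabilities it requires),
# ordered from most to least specialized.
KERNEL_BACKENDS = [
    ("arm-dsp-smuad", {"arm-dsp"}),
    ("generic-swar", set()),  # portable fallback, requires nothing
]


def select_kernel(target_caps):
    """Return the first backend whose required capabilities the target offers."""
    caps = set(target_caps)
    for name, required in KERNEL_BACKENDS:
        if required <= caps:
            return name
    raise LookupError("no kernel backend matches the target")
```

A target advertising `arm-dsp` resolves to `arm-dsp-smuad`; a target with only capabilities that have no in-tree backend, such as `riscv-p`, falls back to `generic-swar`.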

🧾

Verification Evidence

Every delivery ships with a parity harness: bit-exact async-vs-blocking checks, fixed-vs-float error bounds, footprint and alignment audits you can cite in a safety file.
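As a rough picture of a fixed-vs-float error bound, a relative L1 error could be computed as below. The exact definition used by the harness is not given on this page, so this formula (sum of absolute differences over sum of absolute reference values) and the function name are assumptions.

```python
def relative_l1(fixed, reference):
    """Sum of absolute errors divided by the sum of absolute reference values."""
    num = sum(abs(a - b) for a, b in zip(fixed, reference))
    den = sum(abs(b) for b in reference)
    if den == 0:
        raise ValueError("reference output is all zeros")
    return num / den
```

A result of 0.0028 under this definition would correspond to the 0.28% style of figure reported by the harness.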

Production Truth

Support Matrix

Exactly what the toolchain compiles, runs, and can prove today — and what it does not. This mirrors docs/support_matrix.md in the repository; nothing on this site exceeds it. Gaps are additive registry extension points, not a different product.

Legend:
Supported: production path, tested
Partial: MVP / placeholder / host-only
Roadmap: modeled extension point, not in-tree
Not yet: explicitly rejected

Operators

Dense / Gemm / MatMul (constant weights, bias, transB) | Supported
ReLU / Clip(clamp) fused as dense/conv activations | Supported
Identity / Flatten / Reshape (shape-safe) | Supported
Conv2D: 1×1 pointwise & 3×3, stride 1/2, same/valid | CNN MVP
Depthwise Conv2D 3×3 (channel multiplier 1) | CNN MVP
MaxPool / AvgPool 2×2 s2 · GlobalAveragePool | CNN MVP
Softmax (argmax / top-k on int32 logits) | Placeholder
Attention / transformer, sigmoid / tanh / GELU | Rejected
CNN MVP is correctness-first with host-sim ↔ C++ bit-exact parity (incl. padding/stride variants); NHWC only, static shapes.

Quantization Backends & Import Formats

SPQ4: INT4, scale-protected, fixed-point m0/nb, dead-block skip | Current backend
Additional quantization schemes (registry is scheme-blind) | Additive
Raw float32 weights .bin / .raw | Supported
ONNX: Gemm/MatMul/Relu/Clip/Identity/Flatten/Reshape (strict subset) | Supported
cnn-json explicit NHWC spec | Supported
ONNX Conv → NHWC conversion | Roadmap
TFLite ingestion | Not yet
PyTorch / Keras (via export to ONNX) | Indirect
SPQ4 is the current registered production backend, not a permanent architectural assumption. Unsupported ops fail with a structured error before codegen — no synthetic fallback in production mode.
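Failing before codegen can be sketched as a validation pass over the graph. The operator set mirrors the ONNX row above; the exception class, its fields, and `check_graph` are hypothetical names, not the toolchain's API.

```python
# ONNX operator subset listed in the support matrix.
SUPPORTED_OPS = {"Gemm", "MatMul", "Relu", "Clip", "Identity", "Flatten", "Reshape"}


class UnsupportedOperatorError(Exception):
    """Structured error carrying the offending node and operator type."""

    def __init__(self, node_name, op_type):
        super().__init__("unsupported operator %s at node %s" % (op_type, node_name))
        self.node_name = node_name
        self.op_type = op_type


def check_graph(nodes):
    """nodes: iterable of (node_name, op_type). Raise on the first unsupported op."""
    for name, op in nodes:
        if op not in SUPPORTED_OPS:
            raise UnsupportedOperatorError(name, op)
```

Running this check ahead of code generation means a graph containing, say, a `Sigmoid` node stops with a machine-readable error instead of producing partial output.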

MCU Families & Boards · ISA Features

generic-cortex-m7 (ARMv7E-M · arm-dsp, simd32, FPU) | Compiles
generic-cortex-m55 (ARMv8.1-M · arm-dsp + Helium/MVE) | Compiles
esp32-s3 (Xtensa LX7 · board esp32-s3-devkitc) | Profile
generic-rv32imac (RISC-V RV32IMAC) | Profile
Board nucleo-h743zi (concrete BSP template) | Unverified
ISA caps: arm-dsp · arm-mve · riscv-p · riscv-v · xtensa-simd · custom:* | Modeled
Targets are pure-data profiles (ISA, FPU, memory banks, DMA, linker sections, toolchain flags). A new MCU is profile data, not a code change. "Compiles" = host cross-compile verified in this repo; on-silicon runs are per engagement.

Kernel Backends · Runtime Ports · Evidence

Kernel generic-swar (portable INT4 MAC, always-safe fallback) | In-tree
Kernel arm-dsp-smuad (Cortex-M4/M7 __SMUAD) | In-tree
Kernels Helium/MVE · RISC-V P/V · Xtensa SIMD | Roadmap
Ports: host · baremetal-generic · cortex-m scaffold · custom MQ_PORT_HEADER | Supported
Evidence: host benchmark & bit-exact parity | In repo
Evidence: QEMU semihosting (labeled qemu) | If toolchain
Evidence: physical HIL cycle counts on real silicon | None in repo
The HIL pipeline (runners, probe adapters: openocd/pyocd/jlink/stm32cubeprog/esptool, board profiles) is implemented and labels evidence physical_hil / qemu / host_simulation — but no board has been run from this repo, so physical evidence honestly reports unavailable. Intrinsics stay isolated behind ISA-gated backend files; generic builds pull in no vendor headers.
The Engagement

Paid Architecture Audit

A fixed-scope first step. The deliverable is a firmware-grade feasibility answer for your model on your silicon — valuable even if the conclusion is "keep this layer at int8" or "MicroQuant is not your fit."

What you get

Typical timeline: 1–2 weeks from receiving your model + target profile. Deliverable is a written report and a review call — not production firmware.

What you provide · What we guarantee

You bring

We guarantee

We do not guarantee

Measured, Not Marketed

Verified Harness Results

Numbers from the repository's own benchmark harness on the reference 2-layer demo model (64×128 → 32×64). Reproduce them yourself — the evaluation sandbox license covers it.

Benchmark Harness Output

Metric | FP32 Reference | MicroQuant SPQ4 | Delta
Weight flash footprint | 40.00 KB | 7.81 KB | −80.5%
Dynamic heap allocation | allocator-dependent | 0.00 KB | static memory model
Inference latency (host) | 3.07 µs | 1.74 µs | 1.77× faster
Fixed-point accuracy | baseline | 0.28% relative L1 | round-to-nearest requant
Async vs blocking parity | | bit-identical | PASS
Host measurements only (x86/Apple Silicon, -O3); latency is host wall-clock, noise-sensitive, and not a target metric. The Cortex-M4/M7 __SMUAD backend is compiled and bit-exact-parity-tested against the portable reference, but no on-silicon cycle counts are published here — cycle-accurate numbers are produced on your board, per engagement, via the HIL path.
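The flash figures can be cross-checked with simple arithmetic from the reference model shapes. The variable names are illustrative. The gap between the raw packed codes and the reported 7.81 KB is presumably per-block scales and other metadata, but the breakdown is not given here.

```python
FP32_BYTES = 4
layers = [(64, 128), (32, 64)]              # reference demo model shapes
weights = sum(r * c for r, c in layers)     # total weight count
fp32_kb = weights * FP32_BYTES / 1024       # FP32 footprint in KB
spq4_kb = 7.81                              # figure reported by the harness
reduction = 100 * (1 - spq4_kb / fp32_kb)   # percent saved vs FP32
packed_kb = weights / 2 / 1024              # raw INT4 codes only, 2 per byte
```

This gives 10,240 weights, 40.00 KB at FP32, 5.00 KB of raw packed codes, and a reduction of about 80.5% at the reported SPQ4 footprint.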

Reproduce It Yourself

Three commands, no dependencies beyond Python 3 and a C++ compiler:

terminal
$ python3 compiler/test_compiler.py
Ran 12 tests ... OK
$ python3 compiler/main.py --synthetic-demo --out benchmark/model_assets.h
SUCCESS: bare-metal assets written
$ g++ -std=c++11 -O3 -Iruntime/include -Ibenchmark \
    benchmark/benchmark.cpp -o benchmark_run && ./benchmark_run
Async vs. Blocking exact bit parity : PASS (bit-identical)
Relative L1 error : 0.2785 %
Flash size reduction : 80.47% vs FP32
Dynamic heap allocation : 0.00 KB
Legal Clarity

One License. No Copyleft. No Surprises.

MicroQuant is commercial-only proprietary software under a single clean EULA — nothing in this stack can contaminate your codebase.

💎 Commercial Production License

For Shipping Products

Read the Commercial EULA
🧪 Evaluation & Recruitment Sandbox

For Reviewers & Hiring Teams

Read the Sandbox Waiver (§4)
Qualify Yourself First

When MicroQuant Fits — and When It Doesn't

We'd rather lose a bad-fit lead than oversell. If you're a strong fit, the audit will pay for itself. If you're not, we'll tell you on the first call.

✅ Strong fit
🚫 Not the right tool (yet)
Start Here

Request a Paid Architecture Audit

Tell us about your model and your silicon. A compiler engineer — not a salesperson — reviews every request and replies with a scope, or an honest "not a fit." The more technical detail you give, the faster we can triage.

What the First Call Covers

🔍

Feasibility, Honestly

We look at your model class, accuracy budget, and memory map and tell you what INT4 will and won't do for it — before any contract.

📉

Bottleneck Diagnosis

Flash pressure, SRAM contention, cycle budgets, watchdog constraints — we map where your current inference path actually hurts.

🗺️

A Concrete Integration Plan

You leave with a scoped proposal: target kernels, expected footprint and latency envelopes, verification deliverables, and timeline.

Audit Intake

Model
Target hardware
Budgets & timeline
A compiler engineer reviews every submission. Under NDA on request. No marketing list.
✔️

Audit Request Received

A TinyML compiler engineer will review your model + target details and reply to the email address you provided with a scope and next steps, or an honest no-fit, typically within two business days.