Your model.
Our compiler.
Their microcontroller.
The specialist team you call when generic edge-AI tools miss your flash, SRAM, latency, or firmware-integration budget. Our production quantization backend today is SPQ4 — scale-protected INT4 — with a zero-heap runtime and an explicit, capability-checked kernel architecture. The playground below runs the real SPQ4 pipeline in your browser: a live sample of the engineering, not a self-serve production compiler.
The SPQ4 Compiler Playground
Upload a weights file — or synthesize one — and watch the real pipeline run: scale-protected INT4 quantization, SRAM tile solving, dead-block sparsity analysis, and production C++ codegen. Every number below is computed live from your actual data.
1 · Model Ingestion
Upload raw float32 weights, or configure a synthetic layer.
2 · Quantization Map
Hover any weight to inspect its INT4 code, block scale, and packed byte.
Nibble Packing (2 × INT4 → 1 byte)
3 · Generated C++ Assets
model_assets.h — flash-aligned static arrays. An evaluation artifact from this browser demo: not calibrated against your dataset and not validated on your silicon. Production headers come from the full toolchain under an engagement's parity/HIL gates.
// Run the SPQ4 compiler to generate C++ assets...
This is a browser demo of the SPQ4 pipeline, not a production compiler. Raw float32 weights are quantized live; ONNX/TFLite uploads synthesize a representative dense layer client-side (real graph parsing runs in the server-side toolchain, on a supported operator subset — see the support matrix). In an engagement we run the full compiler against your actual model, calibrate against your dataset, and produce validated target evidence.
Scope a paid audit →
Compiler Engineering, Not a Boxed Tool
SPQ4 — Scale-Protected Quantization, 4-bit — is one method engineered end-to-end for microcontrollers. We adapt it to your hardware.
Scale-Protected INT4
Per-block MSE-optimal clip search. Outliers saturate at the INT4 rail instead of destroying scale resolution for every neighboring weight. Two weights per byte: measured end-to-end on the reference demo model, the weight footprint drops from 40.00 KB (FP32) to 7.81 KB, an 80.5% reduction.
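As a rough illustration of the idea, the sketch below searches a handful of candidate clip values for one block, keeps the scale with the lowest squared error, and packs two INT4 codes per byte. It is a minimal sketch under stated assumptions, not the SPQ4 implementation: the function names, the linear candidate grid, the `steps` parameter, and the low-nibble-first packing order are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Quantize one weight to a signed INT4 code in [-8, 7], saturating at the rails.
inline int quantize_int4(float w, float scale) {
    assert(scale > 0.0f);
    long q = std::lround(w / scale);
    if (q < -8) q = -8;
    if (q > 7) q = 7;
    return static_cast<int>(q);
}

// Hypothetical clip search: try `steps` candidate clip values up to max|w| and
// keep the scale with the lowest squared reconstruction error.
// Returns 0 for an all-zero block.
inline float search_block_scale(const float* w, std::size_t n, int steps = 32) {
    float max_abs = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float a = std::fabs(w[i]);
        if (a > max_abs) max_abs = a;
    }
    if (max_abs == 0.0f) return 0.0f;

    float best_scale = max_abs / 7.0f;
    double best_err = -1.0;  // negative means "no candidate scored yet"
    for (int k = 1; k <= steps; ++k) {
        const float clip = max_abs * static_cast<float>(k) / static_cast<float>(steps);
        const float scale = clip / 7.0f;
        double err = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double d = w[i] - scale * static_cast<float>(quantize_int4(w[i], scale));
            err += d * d;
        }
        if (best_err < 0.0 || err < best_err) {
            best_err = err;
            best_scale = scale;
        }
    }
    return best_scale;
}

// Pack two INT4 codes into one byte (assumed order: first code in the low nibble).
inline std::uint8_t pack_nibbles(int lo, int hi) {
    return static_cast<std::uint8_t>((lo & 0x0F) | ((hi & 0x0F) << 4));
}

// Unpack with sign extension from 4 bits.
inline int unpack_low(std::uint8_t b) {
    const int v = b & 0x0F;
    return v >= 8 ? v - 16 : v;
}
inline int unpack_high(std::uint8_t b) {
    const int v = (b >> 4) & 0x0F;
    return v >= 8 ? v - 16 : v;
}
```

Whether a given outlier ends up clipped depends on the block: the search only clips when saturating the outlier costs less squared error than the resolution it would otherwise take from its neighbors.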
FPU-less, Division-free
Fixed-point requantization (acc·m0) >> nb with round-to-nearest. The hot loop is integer MACs, bitwise masks, and shifts — nothing else. Designed for Cortex-M0-class cores (no FPU, no DSP extension required); the portable path is the always-safe fallback.
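The requantization step can be sketched in a few lines. This is an illustrative version only: the function name `requantize` and the 64-bit intermediate product are assumptions made for clarity, and a production kernel may bound the operands so that a narrower product suffices.

```cpp
#include <cassert>
#include <cstdint>

// Round-to-nearest fixed-point requantization: approximates acc * m0 / 2^nb
// using only a multiply, an add, and a shift (no division, no floating point).
// Ties round toward +infinity. Right-shifting a negative value is assumed to be
// an arithmetic shift, which holds on mainstream compilers and is guaranteed
// from C++20 onward.
inline std::int32_t requantize(std::int32_t acc, std::int32_t m0, int nb) {
    assert(nb >= 1 && nb <= 31);
    const std::int64_t prod = static_cast<std::int64_t>(acc) * m0;
    const std::int64_t rounding = static_cast<std::int64_t>(1) << (nb - 1);
    return static_cast<std::int32_t>((prod + rounding) >> nb);
}
```

For example, with `m0 = 1 << 14` and `nb = 15` the multiplier represents 0.5, so an accumulator of 100 requantizes to 50. A multiplier of `m0 == 0` always yields 0.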
Zero-Allocation Runtime
Header-only C++11, standard library only. Every buffer statically allocated: 0.00 KB heap, no arena, no fragmentation, deterministic latency. Weights execute in place from flash.
RTOS Cooperative Executor
Tile-by-tile stepping with ping-pong SRAM buffers and DMA-style prefetch. Designed for cooperative bounded-step integration so realtime threads yield cleanly — and stays bit-identical to blocking execution (verified on the host harness).
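A minimal sketch of bounded-step execution with ping-pong buffers follows. Everything here is hypothetical scaffolding rather than the shipped executor: the `TileExecutor` type, the tiny `TILE` size, plain int8 weights instead of packed INT4, and `memcpy` standing in for a DMA transfer. The point it illustrates is that the stepped path and the blocking path run the same arithmetic in the same order, which is why their results can match bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t TILE = 4;  // hypothetical tile size, in int8 weights

struct TileExecutor {
    const std::int8_t* weights;  // stands in for flash-resident weights
    const std::int8_t* input;
    std::size_t n_tiles;
    std::size_t next_tile;       // index of the next tile to compute
    std::int32_t acc;
    std::int8_t buf[2][TILE];    // ping-pong buffers, statically sized

    // memcpy stands in for a DMA transfer into the buffer selected by tile parity.
    void prefetch(std::size_t tile) {
        std::memcpy(buf[tile & 1], weights + tile * TILE, TILE);
    }

    void start(const std::int8_t* w, const std::int8_t* x, std::size_t tiles) {
        weights = w;
        input = x;
        n_tiles = tiles;
        next_tile = 0;
        acc = 0;
        if (n_tiles > 0) prefetch(0);
    }

    // One bounded step: prefetch the following tile into the idle buffer, then
    // compute the current tile. Returns true while tiles remain.
    bool step() {
        if (next_tile >= n_tiles) return false;
        const std::size_t t = next_tile;
        if (t + 1 < n_tiles) prefetch(t + 1);
        for (std::size_t i = 0; i < TILE; ++i) {
            acc += static_cast<std::int32_t>(buf[t & 1][i]) * input[t * TILE + i];
        }
        ++next_tile;
        return next_tile < n_tiles;
    }
};

// Blocking execution is the same step function run to completion.
inline std::int32_t run_blocking(const std::int8_t* w, const std::int8_t* x,
                                 std::size_t tiles) {
    TileExecutor e;
    e.start(w, x, tiles);
    while (e.step()) {
    }
    return e.acc;
}
```

In a cooperative setting, the caller would invoke `step()` once per scheduling slot and yield in between; a parity check then compares the final accumulator against `run_blocking` on the same inputs.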
Free Structural Sparsity
All-zero weight blocks compile to m0 == 0 and are skipped with zero mask storage. The skip pattern is a compile-time constant — timing stays input-invariant for WCET analysis.
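The skip can be pictured with a small sketch. The layout below is an assumption for illustration (one `m0` multiplier per block, `BLOCK` weights per block, low nibble first); the function name and the `blocks_visited` counter are hypothetical and exist only to make the behavior observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t BLOCK = 8;  // hypothetical block size, in weights

// Extract one sign-extended INT4 code from a packed byte.
inline int nibble(std::uint8_t byte, bool high) {
    const int v = high ? (byte >> 4) : (byte & 0x0F);
    return v >= 8 ? v - 16 : v;
}

// Sum over blocks of m0[b] * dot(weights of block b, x), skipping dead blocks.
// The skip test reads only m0, which is fixed at compile time, so the sequence
// of executed blocks never depends on the input x.
inline std::int64_t dot_skip_dead(const std::uint8_t* packed, const std::int32_t* m0,
                                  const std::int8_t* x, std::size_t n_blocks,
                                  std::size_t* blocks_visited) {
    std::int64_t total = 0;
    *blocks_visited = 0;
    for (std::size_t b = 0; b < n_blocks; ++b) {
        if (m0[b] == 0) continue;  // dead block: no mask needed, m0 is the marker
        ++*blocks_visited;
        std::int32_t acc = 0;
        for (std::size_t i = 0; i < BLOCK; ++i) {
            const std::size_t idx = b * BLOCK + i;
            acc += nibble(packed[idx / 2], (idx & 1) != 0) * x[idx];
        }
        total += static_cast<std::int64_t>(acc) * m0[b];
    }
    return total;
}
```

Because a zero multiplier would contribute nothing after requantization anyway, skipping the block changes the work done but not the result.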
SRAM-Perfect Tiling
The compiler solves 2D tile geometry against your SRAM/TCM budget and serializes weights in streaming order, so every tile DMA-transfers whole into a tightly-coupled buffer.
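To make the tiling constraint concrete, here is a brute-force sketch of a tile search. The cost model is an assumption chosen for illustration (two ping-pong buffers of packed INT4 weights, one int32 accumulator per tile row, one int8 input per tile column), and `solve_tile` and `tile_bytes` are hypothetical names; the real solver's cost model and search strategy are not described in this document.

```cpp
#include <cassert>
#include <cstddef>

struct Tile {
    std::size_t rows;
    std::size_t cols;
    std::size_t bytes;  // SRAM needed under the assumed cost model
};

// Assumed cost model: 2 ping-pong buffers of packed INT4 weights (rows*cols/2
// bytes each), plus an int32 accumulator per row and an int8 input per column.
inline std::size_t tile_bytes(std::size_t rows, std::size_t cols) {
    assert(cols % 2 == 0);  // two weights per byte
    return 2 * (rows * cols / 2) + rows * 4 + cols;
}

// Pick the largest-area tile that divides the layer evenly and fits the budget.
// Returns a tile with rows == 0 when nothing fits.
inline Tile solve_tile(std::size_t out_dim, std::size_t in_dim, std::size_t budget) {
    Tile best = {0, 0, 0};
    for (std::size_t rows = 1; rows <= out_dim; ++rows) {
        if (out_dim % rows != 0) continue;  // keep every tile whole
        for (std::size_t cols = 2; cols <= in_dim; cols += 2) {
            if (in_dim % cols != 0) continue;
            const std::size_t bytes = tile_bytes(rows, cols);
            if (bytes > budget) continue;
            if (rows * cols > best.rows * best.cols) {
                best.rows = rows;
                best.cols = cols;
                best.bytes = bytes;
            }
        }
    }
    return best;
}
```

Under this model, a 64×128 layer with a 1024-byte budget resolves to a 4×128 tile using 656 bytes. Requiring tiles to divide the layer evenly is what lets each one be transferred whole, with no ragged edge cases in the inner loop.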
Explicit Kernel Selection
Operator kernels, hardware backends, and quantization schemes are three separate capability-checked registries. The ISA inner MAC is isolated behind one boundary: portable SWAR everywhere, an in-tree Cortex-M4/M7 __SMUAD backend (compiled + parity-tested), with Helium/MVE, RISC-V P/V, and Xtensa modeled as roadmap backends delivered per engagement.
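A capability-checked selection of this kind might look like the sketch below. The backend names `arm-dsp-smuad` and `generic-swar` and the capability names come from the support matrix; the bitmask encoding, the `KernelBackend` struct, and `select_backend` are hypothetical and show only one of the three registries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bitmask encoding of ISA capabilities.
enum IsaCap : unsigned {
    CAP_NONE = 0,
    CAP_ARM_DSP = 1u << 0,
    CAP_ARM_MVE = 1u << 1
};

struct KernelBackend {
    const char* name;
    unsigned required_caps;  // every bit must be present on the target
};

// Ordered most specialized first; the last entry requires nothing and so
// always matches, which makes it the portable fallback.
static const KernelBackend kBackends[] = {
    {"arm-dsp-smuad", CAP_ARM_DSP},
    {"generic-swar", CAP_NONE},
};

// Return the first backend whose requirements are a subset of the target's
// capabilities.
inline const KernelBackend* select_backend(unsigned target_caps) {
    const std::size_t n = sizeof(kBackends) / sizeof(kBackends[0]);
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned need = kBackends[i].required_caps;
        if ((need & target_caps) == need) return &kBackends[i];
    }
    return nullptr;  // unreachable while a zero-requirement fallback is registered
}
```

With this table, a target advertising only `CAP_ARM_MVE` still lands on `generic-swar`, because no MVE kernel is registered; adding one would be a new table entry rather than a change to the selection logic.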
Verification Evidence
Every delivery ships with a parity harness: bit-exact async-vs-blocking checks, fixed-vs-float error bounds, footprint and alignment audits you can cite in a safety file.
Support Matrix
Exactly what the toolchain compiles, runs, and can prove today — and what it does not.
This mirrors docs/support_matrix.md in the repository; nothing on this site exceeds it.
Gaps are additive registry extension points, not a different product.
Operators
| Dense / Gemm / MatMul (constant weights, bias, transB) | Supported |
| ReLU / Clip(clamp) fused as dense/conv activations | Supported |
| Identity / Flatten / Reshape (shape-safe) | Supported |
| Conv2D — 1×1 pointwise & 3×3, stride 1/2, same/valid | CNN MVP |
| Depthwise Conv2D 3×3 (channel multiplier 1) | CNN MVP |
| MaxPool / AvgPool 2×2 s2 · GlobalAveragePool | CNN MVP |
| Softmax (argmax / top-k on int32 logits) | Placeholder |
| Attention / transformer, sigmoid / tanh / GELU | Rejected |
Quantization Backends & Import Formats
| SPQ4 — INT4, scale-protected, fixed-point m0/nb, dead-block skip | Current backend |
| Additional quantization schemes (registry is scheme-blind) | Additive |
| Raw float32 weights .bin / .raw | Supported |
| ONNX — Gemm/MatMul/Relu/Clip/Identity/Flatten/Reshape (strict subset) | Supported |
| cnn-json explicit NHWC spec | Supported |
| ONNX Conv → NHWC conversion | Roadmap |
| TFLite ingestion | Not yet |
| PyTorch / Keras (via export to ONNX) | Indirect |
MCU Families & Boards · ISA Features
| generic-cortex-m7 (ARMv7E-M · arm-dsp, simd32, FPU) | Compiles |
| generic-cortex-m55 (ARMv8.1-M · arm-dsp + Helium/MVE) | Compiles |
| esp32-s3 (Xtensa LX7 · board esp32-s3-devkitc) | Profile |
| generic-rv32imac (RISC-V RV32IMAC) | Profile |
| Board nucleo-h743zi (concrete BSP template) | Unverified |
| ISA caps: arm-dsp · arm-mve · riscv-p · riscv-v · xtensa-simd · custom:* | Modeled |
Kernel Backends · Runtime Ports · Evidence
| Kernel generic-swar (portable INT4 MAC, always-safe fallback) | In-tree |
| Kernel arm-dsp-smuad (Cortex-M4/M7 __SMUAD) | In-tree |
| Kernels Helium/MVE · RISC-V P/V · Xtensa SIMD | Roadmap |
| Ports: host · baremetal-generic · cortex-m scaffold · custom MQ_PORT_HEADER | Supported |
| Evidence: host benchmark & bit-exact parity | In repo |
| Evidence: QEMU semihosting (labeled qemu) | If toolchain |
| Evidence: physical HIL cycle counts on real silicon | None in repo |
Evidence is labeled physical_hil / qemu / host_simulation, but no board has been run from this repo, so physical evidence honestly reports unavailable. Intrinsics stay isolated behind ISA-gated backend files; generic builds pull in no vendor headers.
Paid Architecture Audit
A fixed-scope first step. The deliverable is a firmware-grade feasibility answer for your model on your silicon — valuable even if the conclusion is "keep this layer at int8" or "MicroQuant is not your fit."
What you get
- Model & operator audit against the support matrix: what compiles today, what needs co-engineering, what should stay int8/FP.
- Memory & latency risk register: flash / SRAM / TCM pressure, accumulator-overflow bounds, cooperative-scheduling and watchdog risks.
- SPQ4 compression study on your weights with per-layer error and saturation evidence from the host harness.
- Target feasibility matrix: kernel backend and ISA-feature fit for your MCU family, with the compiler flags and port work required.
- Scoped pilot SOW: acceptance gates, verification plan (host → QEMU → physical HIL), and a footprint/latency envelope to validate.
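As an example of the kind of bound that goes into the risk register, accumulator overflow can be estimated with worst-case arithmetic. The sketch assumes INT4 weight codes in [-8, 7] and int8 activations in [-128, 127]; the activation width and the helper name `max_safe_terms` are assumptions, not details taken from the toolchain.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case bound: the largest number of weight*activation products that can
// be summed without exceeding acc_max, given the largest operand magnitudes.
inline std::int64_t max_safe_terms(std::int64_t acc_max, std::int64_t w_abs_max,
                                   std::int64_t x_abs_max) {
    assert(w_abs_max > 0 && x_abs_max > 0);
    return acc_max / (w_abs_max * x_abs_max);
}
```

With the assumed ranges, the largest product magnitude is 8 × 128 = 1024, so an int32 accumulator is safe for up to 2,097,151 terms, while an int16 accumulator would be safe for only 31. The reference demo model's longest dimension is 128, which is well inside the int32 bound.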
What you provide · What we guarantee
You bring
- Model artifact (ONNX / raw weights / architecture description) under NDA if needed.
- Target MCU / board, memory map, RTOS or bare-metal environment, and toolchain.
- Flash / SRAM / latency (and power, if relevant) budgets.
- A representative calibration/test set if accuracy must be validated.
We guarantee
- Reproducible measurement method with explicit pass/fail gates and hashed artifacts.
- Honest operator-support boundaries and a written "keep at int8 / not a fit" answer when that's the truth.
- No hidden heap in any delivered runtime path.
We do not guarantee
- A specific INT4 accuracy before the audit measures your model.
- On-silicon cycle numbers without a physical HIL run on your board.
- Support for operators or frameworks outside the matrix in the audit window.
Verified Harness Results
Numbers from the repository's own benchmark harness on the reference 2-layer demo model (64×128 → 32×64). Reproduce them yourself — the evaluation sandbox license covers it.
Benchmark Harness Output
| Metric | FP32 Reference | MicroQuant SPQ4 | Delta |
|---|---|---|---|
| Weight flash footprint | 40.00 KB | 7.81 KB | −80.5% |
| Dynamic heap allocation | allocator-dependent | 0.00 KB | static memory model |
| Inference latency (host) | 3.07 µs | 1.74 µs | 1.77× faster |
| Fixed-point accuracy | baseline | 0.28% relative L1 | round-to-nearest requant |
| Async vs blocking parity | — | bit-identical | PASS |
Measured on the host (-O3); latency is host wall-clock, noise-sensitive, and not a target metric. The Cortex-M4/M7 __SMUAD backend is compiled and bit-exact-parity-tested against the portable reference, but no on-silicon cycle counts are published here; cycle-accurate numbers are produced on your board, per engagement, via the HIL path.
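The footprint row can be sanity-checked from the layer shapes alone. The sketch below is plain arithmetic on the published figures; the constant and function names are hypothetical, and the contents of the bytes beyond the packed INT4 payload are not broken down in the table (block scales and alignment padding are a plausible guess, not a stated fact).

```cpp
#include <cassert>
#include <cstddef>

// Reference demo model: two dense layers, 64x128 and 32x64.
const std::size_t kWeightCount = 64 * 128 + 32 * 64;    // 10240 weights
const std::size_t kFp32Bytes = kWeightCount * 4;        // 40960 B = 40.00 KB
const std::size_t kInt4PayloadBytes = kWeightCount / 2; // 5120 B = 5.00 KB

// Percentage reduction between two footprints given in the same unit.
inline double reduction_percent(double before_kb, double after_kb) {
    assert(before_kb > 0.0);
    return (1.0 - after_kb / before_kb) * 100.0;
}
```

The FP32 figure matches the table exactly (40960 B is 40.00 KB). The packed INT4 payload alone is 5.00 KB, so roughly 2.81 KB of the reported 7.81 KB is overhead on top of the nibbles, and `reduction_percent(40.00, 7.81)` gives about 80.5%, matching the Delta column.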
Reproduce It Yourself
Three commands, no dependencies beyond Python 3 and a C++ compiler:
How an engagement runs
- Audit — model, activation ranges, memory map, timing budget.
- Design — SPQ4 tuned to your accuracy target and silicon.
- Integrate — kernels, RTOS scheduling, build-system wiring.
- Verify — parity, WCET, and footprint evidence delivered.
One License. No Copyleft. No Surprises.
MicroQuant is commercial-only proprietary software under a single clean EULA — nothing in this stack can contaminate your codebase.
For Shipping Products
- Object-form distribution of the runtime and generated assets inside your licensed product lines — fully closed-source, no disclosure obligations, ever.
- No open-source strings attached: there is no AGPL/GPL track anywhere in this stack; a single proprietary EULA governs everything.
- Explicit zero-liability terms: quantization precision, runtime behavior, timing, and execution outcomes are validated by you for your application — the EULA is unambiguous about where responsibility sits.
- Engagement-scoped deliverables: custom kernels and integration work are licensed per Statement of Work with clear IP boundaries.
For Reviewers & Hiring Teams
- Free technical review: hiring managers, assessors, and prospective clients may clone, read, build, and benchmark this repository locally at no charge.
- Zero contamination guarantee: sandbox review creates no license obligations for your employer's proprietary code — no copyleft, no reciprocity, no disclosure.
- Zero liability, both directions: evaluation runs "as-is" at your own risk, and we assert no claims for good-faith evaluation activity.
- Clean scope: anything production, commercial, or customer-facing requires the commercial license — the boundary is explicit.
When MicroQuant Fits — and When It Doesn't
We'd rather lose a bad-fit lead than oversell. If the first list below describes you, the audit will pay for itself. If the second does, we'll tell you on the first call.
Good fit
- A dense/MLP, small 1D/2D CNN, KWS, or anomaly model that barely misses a flash / SRAM / latency / power target.
- Firmware that cannot tolerate heap or needs deterministic, cooperative execution under an RTOS.
- A Cortex-M (or RISC-V / Xtensa) target where generic vendor tools left performance or memory on the table.
- A team that controls the firmware build and can share a model + representative data under NDA.
- You want evidence (parity, footprint, and per-engagement cycle reports), not a black box.
Not a fit
- You want a self-serve "upload any ONNX → production C++" download. That does not exist here, by design.
- You need full computer-vision, transformer/LLM, or general TFLite deployment today (outside the support matrix).
- You need safety-certified (automotive/medical) evidence without funding a safety-grade scope.
- You can't share model details or representative data, so accuracy can't be validated.
- Your constraint is comfortably met by CMSIS-NN, TFLM, Edge Impulse, or ST tooling — use those; we're the last-mile specialist for when they aren't enough.
Request a Paid Architecture Audit
Tell us about your model and your silicon. A compiler engineer — not a salesperson — reviews every request and replies with a scope, or an honest "not a fit." The more technical detail you give, the faster we can triage.
What the First Call Covers
Feasibility, Honestly
We look at your model class, accuracy budget, and memory map and tell you what INT4 will and won't do for it — before any contract.
Bottleneck Diagnosis
Flash pressure, SRAM contention, cycle budgets, watchdog constraints — we map where your current inference path actually hurts.
A Concrete Integration Plan
You leave with a scoped proposal: target kernels, expected footprint and latency envelopes, verification deliverables, and timeline.