Your model.
Our compiler.
Their microcontroller.
We design and integrate custom quantization frameworks directly into proprietary RTOS and bare-metal firmware. The playground below is our actual SPQ4 compiler running in your browser — a live sample of the engineering you're hiring.
The SPQ4 Compiler Playground
Upload a weights file — or synthesize one — and watch the real pipeline run: scale-protected INT4 quantization, SRAM tile solving, dead-block sparsity analysis, and production C++ codegen. Every number below is computed live from your actual data.
1 · Model Ingestion
Upload raw float32 weights, or configure a synthetic layer.
2 · Quantization Map
Hover any weight to inspect its INT4 code, block scale, and packed byte.
Nibble Packing (2 × INT4 → 1 byte)
3 · Generated C++ Assets
model_assets.h — flash-aligned static arrays, drops straight into an arm-none-eabi-gcc build.
// Run the SPQ4 compiler to generate C++ assets...
This is a browser demo of the real pipeline. In an engagement we run the full compiler against your actual model, calibrate against your dataset, and hand-tune the kernels for your silicon.
Get this done for your product →Compiler Engineering, Not a Boxed Tool
SPQ4 — Scale-Protected Quantization, 4-bit — is one method engineered end-to-end for microcontrollers. We adapt it to your hardware.
Scale-Protected INT4
Per-block MSE-optimal clip search. Outliers saturate at the INT4 rail instead of destroying scale resolution for every neighboring weight. Two weights per byte, ~87% smaller than FP32.
FPU-less, Division-free
Fixed-point requantization (acc·m0) >> nb with round-to-nearest. The hot loop is integer MACs, bitwise masks, and shifts — nothing else. Runs on Cortex-M0.
Zero-Allocation Runtime
Header-only C++11, standard library only. Every buffer statically allocated: 0.00 KB heap, no arena, no fragmentation, deterministic latency. Weights execute in place from flash.
RTOS Cooperative Executor
Tile-by-tile stepping with ping-pong SRAM buffers and DMA prefetch. Never blocks a realtime thread, never trips a watchdog — and stays bit-identical to blocking execution.
Free Structural Sparsity
All-zero weight blocks compile to m0 == 0 and are skipped with zero mask storage. The skip pattern is a compile-time constant — timing stays input-invariant for WCET analysis.
SRAM-Perfect Tiling
The compiler solves 2D tile geometry against your SRAM/TCM budget and serializes weights in streaming order, so every tile DMA-transfers whole into a tightly-coupled buffer.
Hardware-Specific Kernels
SWAR nibble unpacking everywhere; __SMUAD dual-MAC paths on Cortex-M4/M7; Helium, RISC-V RVV, and Tensilica DSP ports built to order in engagements.
Verification Evidence
Every delivery ships with a parity harness: bit-exact async-vs-blocking checks, fixed-vs-float error bounds, footprint and alignment audits you can cite in a safety file.
Verified Harness Results
Numbers from the repository's own benchmark harness on the reference 2-layer demo model (64×128 → 32×64). Reproduce them yourself — the evaluation sandbox license covers it.
Benchmark Harness Output
| Metric | FP32 Reference | MicroQuant SPQ4 | Delta |
|---|---|---|---|
| Weight flash footprint | 40.00 KB | 7.81 KB | −80.5% |
| Dynamic heap allocation | allocator-dependent | 0.00 KB | static memory model |
| Inference latency (host) | 3.07 µs | 1.74 µs | 1.77× faster |
| Fixed-point accuracy | baseline | 0.28% relative L1 | round-to-nearest requant |
| Async vs blocking parity | — | bit-identical | PASS |
-O3). On Cortex-M4/M7 the
__SMUAD dual-MAC path and dead-block skipping widen the gap further —
we provide cycle-accurate target numbers during an engagement.
Reproduce It Yourself
Three commands, no dependencies beyond Python 3 and a C++ compiler:
How an engagement runs
- Audit — model, activation ranges, memory map, timing budget.
- Design — SPQ4 tuned to your accuracy target and silicon.
- Integrate — kernels, RTOS scheduling, build-system wiring.
- Verify — parity, WCET, and footprint evidence delivered.
One License. No Copyleft. No Surprises.
MicroQuant is commercial-only proprietary software under a single clean EULA — nothing in this stack can contaminate your codebase.
For Shipping Products
- Object-form distribution of the runtime and generated assets inside your licensed product lines — fully closed-source, no disclosure obligations, ever.
- No open-source strings attached: there is no AGPL/GPL track anywhere in this stack; a single proprietary EULA governs everything.
- Explicit zero-liability terms: quantization precision, runtime behavior, timing, and execution outcomes are validated by you for your application — the EULA is unambiguous about where responsibility sits.
- Engagement-scoped deliverables: custom kernels and integration work are licensed per Statement of Work with clear IP boundaries.
For Reviewers & Hiring Teams
- Free technical review: hiring managers, assessors, and prospective clients may clone, read, build, and benchmark this repository locally at no charge.
- Zero contamination guarantee: sandbox review creates no license obligations for your employer's proprietary code — no copyleft, no reciprocity, no disclosure.
- Zero liability, both directions: evaluation runs "as-is" at your own risk, and we assert no claims for good-faith evaluation activity.
- Clean scope: anything production, commercial, or customer-facing requires the commercial license — the boundary is explicit.
Request a Technical Architecture Evaluation
Tell us about your model and your silicon. A compiler engineer — not a salesperson — reviews every request.
What the First Call Covers
Feasibility, Honestly
We look at your model class, accuracy budget, and memory map and tell you what INT4 will and won't do for it — before any contract.
Bottleneck Diagnosis
Flash pressure, SRAM contention, cycle budgets, watchdog constraints — we map where your current inference path actually hurts.
A Concrete Integration Plan
You leave with a scoped proposal: target kernels, expected footprint and latency envelopes, verification deliverables, and timeline.