One model. Every language. Zero friction.

The Universal Deep Learning Bridge

Shrew is a runtime and DSL that decouples deep learning from any single technology. Define your model once in .sw, then train and deploy it from Python, Rust, JavaScript, C++, or any language in your stack.

Python · Rust · C/C++ · JavaScript · WASM · Java
Get Started
Features

Why choose Shrew

Built from the ground up in Rust for performance, flexibility, and ease of use.

Modular by Design

Use only what you need. Pick individual crates like shrew-core for tensors, shrew-cuda for GPU, or shrew-nn for layers — each works on its own.

Write Once, Run Anywhere

Define your model in a .sw file — a clean, human-readable format that separates architecture from code. No boilerplate, no transpilation headaches.
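Since this page doesn't show a `.sw` file, here is a purely hypothetical sketch of what a declarative model file in this spirit could look like. The syntax below is invented for illustration only and is not actual `.sw` syntax; see the documentation for the real grammar.

```
# Hypothetical sketch only (not real .sw syntax): a small MLP
# with its training config declared alongside the architecture.
model mlp {
    input: [batch, 784]
    layers {
        linear(784, 256) -> relu
        linear(256, 10)
    }
}

train {
    optimizer: adam(lr = 1e-3)
    loss: cross_entropy
    epochs: 10
}
```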

Rust Safety, Familiar Feel

Eager-mode autograd with an intuitive API, plus Rust's type safety and fearless concurrency. Catch shape bugs at compile time, not at 3am.

Python Native, Rust Powered

Research in Python, deploy in Rust — seamlessly. Zero-copy NumPy interop and a familiar API make the transition painless.

Deploy Anywhere

Small binaries, minimal dependencies, and native quantization (INT8/INT4). From cloud servers to edge devices, Shrew goes where you need it.
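Native quantization maps floating-point weights to small integers so models shrink and run on modest hardware. As a sketch of the general technique (symmetric per-tensor INT8), assuming nothing about Shrew's actual quantization API:

```rust
// Symmetric per-tensor INT8 quantization sketch (illustrative only;
// not Shrew's API). The scale maps the largest magnitude to [-127, 127].

fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.9f32, -0.45, 0.1, 0.0];
    let (q, scale) = quantize_i8(&weights);
    let back = dequantize_i8(&q, scale);
    // Round-trip error is bounded by half a quantization step per element.
    for (w, b) in weights.iter().zip(&back) {
        assert!((w - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}, scale = {:.5}", q, scale);
}
```

INT4 works the same way with a [-7, 7] range and two values packed per byte, trading more precision for another 2× size reduction.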

Built-in CLI Tools

Inspect, validate, and benchmark your models from the command line. Run shrew dump, validate, bench, or info — no extra setup required.

Code Examples

See it in action

Real examples showing how Shrew works — from basic tensor math to training a GPT model.

main.rs
use shrew::prelude::*;

fn main() -> shrew::Result<()> {
    let dev = CpuDevice;

    // Broadcasting: [3,1] + [1,2] → [3,2]
    let a = CpuTensor::from_f64_slice(
        &[1.0, 2.0, 3.0], (3, 1), DType::F64, &dev
    )?;
    let b = CpuTensor::from_f64_slice(
        &[10.0, 20.0], (1, 2), DType::F64, &dev
    )?;
    let c = a.add(&b)?;

    // Reverse-mode autograd
    let w = CpuTensor::randn((3, 3), DType::F64, &dev)?
        .set_variable();
    let x = CpuTensor::randn((2, 3), DType::F64, &dev)?;
    let loss = x.matmul(&w)?.sum_all()?;
    let grads = loss.backward()?;
    let dw = grads.get(&w).unwrap();  // ∂loss/∂w

    Ok(())
}
Performance

Blazing Fast by Default

Shrew's CPU backend uses hardware-level optimizations to squeeze every ounce of performance from your machine. Pair it with the CUDA GPU backend for even more speed.

DTypes supported: 7
Tensor ops: 60+
Crates: 10

CPU Benchmark (release mode, AMD/Intel):

matmul 256×256: ~370 µs
matmul 512×512: ~3.5 ms
add (1M elements): ~1.3 ms
linear fwd [64,512]×[512,512]+bias: ~3.9 ms
Architecture

Modular by design

Every component is its own independent package — use what you need, swap what you don't. The same code runs on CPU and GPU without changes.
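"Same code on CPU and GPU" is typically achieved in Rust by writing model code generic over a backend trait. A minimal illustrative sketch, assuming a simplified `Backend` trait (not Shrew's actual trait or device types):

```rust
// Backend-generic code sketch (illustrative; not Shrew's real trait).
// Callers are written once and never change when the device does.

trait Backend {
    fn name(&self) -> &'static str;
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32>;
}

struct Cpu;
impl Backend for Cpu {
    fn name(&self) -> &'static str { "cpu" }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

// A second backend (a plain CPU impl standing in for CUDA here)
// satisfies the same trait, so callers are unaffected by the swap.
struct FakeGpu;
impl Backend for FakeGpu {
    fn name(&self) -> &'static str { "gpu" }
    fn add(&self, a: &[f32], b: &[f32]) -> Vec<f32> {
        a.iter().zip(b).map(|(x, y)| x + y).collect()
    }
}

// Model code is written once, generic over the backend.
fn tensor_add<B: Backend>(dev: &B, a: &[f32], b: &[f32]) -> Vec<f32> {
    dev.add(a, b)
}

fn main() {
    let (a, b) = ([1.0f32, 2.0], [10.0f32, 20.0]);
    assert_eq!(tensor_add(&Cpu, &a, &b), vec![11.0, 22.0]);
    assert_eq!(tensor_add(&FakeGpu, &a, &b), vec![11.0, 22.0]);
    println!("both backends agree");
}
```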

.sw Model File: your model definition, with architecture, config, and training in one place
        ↓
shrew-ir: parse → optimize → prepare for execution
        ↓
JIT Compiler: compile into fast, ready-to-run instructions
        ↓
Backends: CPU (SIMD + rayon) · CUDA (cuBLAS + PTX) · Python (PyO3 + NumPy)

shrew-core

The foundation: tensors, shapes, data types, and automatic differentiation.

shrew-ir

Parses .sw model files into optimized execution graphs.

shrew-cpu

High-performance CPU computations with SIMD acceleration.

shrew-cuda

NVIDIA GPU acceleration with optimized memory management.

shrew-nn

Ready-to-use neural network layers: Linear, Conv, Transformer, and more.

shrew-optim

Optimizers (Adam, SGD, etc.) and learning rate schedulers.

shrew-data

Data loading utilities and built-in datasets like MNIST.

shrew

The main package: JIT compiler, trainer, quantization, ONNX export, and more.

shrew-python

Python bindings with seamless NumPy integration.
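For a flavor of what the optimizer crate computes, the plain SGD update rule can be sketched in a few lines (illustrative only; this is not shrew-optim's API):

```rust
// Vanilla SGD update sketch: theta <- theta - lr * grad
// (illustrative; shrew-optim's actual optimizer API is not shown here).

fn sgd_step(params: &mut [f32], grads: &[f32], lr: f32) {
    for (p, g) in params.iter_mut().zip(grads) {
        *p -= lr * g;
    }
}

fn main() {
    let mut w = [1.0f32, -2.0];
    sgd_step(&mut w, &[0.5, -0.5], 0.1);
    // Each weight moves against its gradient, scaled by the learning rate.
    assert!((w[0] - 0.95).abs() < 1e-6);
    assert!((w[1] + 1.95).abs() < 1e-6);
    println!("updated weights: {:?}", w);
}
```

Adam follows the same loop but additionally tracks per-parameter first and second moment estimates of the gradient.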

Serialization Formats

.shrew: native binary checkpoints
Safetensors: HuggingFace compatible
ONNX: opset 17, zero dependencies

Documentation

Guides for the .sw DSL syntax, language bindings, API references, and tutorials. Everything you need to get started with Shrew.

Start building with Shrew

Add Shrew to your Rust project, use the Python bindings, or define models in .sw files. One runtime, every language.

MNIST with MLP

cargo run -p mnist-example

MNIST with CNN

cargo run -p mnist-cnn-example

Char-level GPT

cargo run --release -p char-gpt-example

Python bindings

pip install shrew-python