The Universal Deep Learning Bridge
Shrew is a runtime and DSL that decouples deep learning from any single language or framework. Define your model once in .sw, then train and deploy it from Python, Rust, JavaScript, C++, or any language in your stack.
Why choose Shrew
Built from the ground up in Rust for performance, flexibility, and ease of use.
Modular by Design
Use only what you need. Pick individual crates like shrew-core for tensors, shrew-cuda for GPU, or shrew-nn for layers — each works on its own.
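A project that only needs CPU tensors, for instance, might pull in a single crate. A hypothetical Cargo.toml fragment (crate names from above; version numbers are placeholders, check crates.io for the real ones):

```toml
[dependencies]
shrew-core = "*"    # tensors, shapes, dtypes, autograd
# shrew-nn = "*"    # add layers only when you need them
# shrew-cuda = "*"  # opt in to GPU support
```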
Write Once, Run Anywhere
Define your model in a .sw file — a clean, human-readable format that separates architecture from code. No boilerplate, no transpilation headaches.
Rust Safety, Familiar Feel
Eager-mode autograd with an intuitive API, plus Rust's type safety and fearless concurrency. Catch shape bugs at compile time, not at 3am.
Python Native, Rust Powered
Research in Python, deploy in Rust — seamlessly. Zero-copy NumPy interop and a familiar API make the transition painless.
Deploy Anywhere
Small binaries, minimal dependencies, and native quantization (INT8/INT4). From cloud servers to edge devices, Shrew goes where you need it.
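To make the INT8 part concrete: quantization maps f32 values onto 8-bit integers via a scale factor, trading a bounded rounding error for 4x smaller weights. A minimal, self-contained sketch of the standard symmetric per-tensor scheme (illustrating the idea, not Shrew's actual implementation):

```rust
// Symmetric per-tensor INT8 quantization: x ~= scale * q, with q in [-127, 127].
fn quantize_i8(xs: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = xs.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = xs
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

fn dequantize_i8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.02f32, -1.5, 0.75, 3.0, -3.0];
    let (q, scale) = quantize_i8(&weights);
    let back = dequantize_i8(&q, scale);
    // Round-trip error is bounded by scale / 2 per element.
    for (w, b) in weights.iter().zip(&back) {
        assert!((w - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("scale = {scale}, q = {q:?}");
}
```

INT4 works the same way with a [-7, 7] range and a correspondingly larger error bound.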
Built-in CLI Tools
Inspect, validate, and benchmark your models from the command line. Run shrew dump, validate, bench, or info — no extra setup required.
See it in action
Real examples showing how Shrew works — from basic tensor math to training a GPT model.
use shrew::prelude::*;
fn main() -> shrew::Result<()> {
let dev = CpuDevice;
// Broadcasting: [3,1] + [1,2] → [3,2]
let a = CpuTensor::from_f64_slice(
&[1.0, 2.0, 3.0], (3, 1), DType::F64, &dev
)?;
let b = CpuTensor::from_f64_slice(
&[10.0, 20.0], (1, 2), DType::F64, &dev
)?;
let c = a.add(&b)?;
// Reverse-mode autograd
let w = CpuTensor::randn((3, 3), DType::F64, &dev)?
.set_variable();
let x = CpuTensor::randn((2, 3), DType::F64, &dev)?;
let loss = x.matmul(&w)?.sum_all()?;
let grads = loss.backward()?;
let dw = grads.get(&w).unwrap(); // ∂loss/∂w
Ok(())
}

Blazing Fast by Default
Shrew's CPU backend uses hardware-level optimizations to squeeze every ounce of performance from your machine. Pair it with the CUDA GPU backend for even more speed.
7 DTypes supported · 60+ tensor ops · 10 crates
CPU Benchmark — release mode (AMD/Intel)
Modular by design
Every component is its own independent package — use what you need, swap what you don't. The same code runs on CPU and GPU without changes.
.sw Model File
Your model definition — architecture, config, and training in one place
shrew-ir
Parse → Optimize → Prepare for execution
JIT Compiler
Compile into fast, ready-to-run instructions
CPU
SIMD + rayon
CUDA
cuBLAS + PTX
Python
PyO3 + NumPy
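The CPU backend's strategy — vectorized inner loops, with rayon fanning work out across cores — can be pictured with a dependency-free sketch that uses std::thread in place of rayon (an illustration of the chunk-per-core pattern, not Shrew's kernels):

```rust
use std::thread;

// Data-parallel reduction: split the slice into chunks and sum each on its
// own thread, then combine the partial sums.
fn parallel_sum(xs: &[f64], n_threads: usize) -> f64 {
    let chunk = ((xs.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = xs
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<f64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let xs: Vec<f64> = (1..=1000).map(|i| i as f64).collect();
    // 1 + 2 + ... + 1000 = 500500, regardless of how the work is split.
    assert_eq!(parallel_sum(&xs, 4), 500500.0);
    println!("sum = {}", parallel_sum(&xs, 4));
}
```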
The foundation: tensors, shapes, data types, and automatic differentiation.
Parses .sw model files into optimized execution graphs.
High-performance CPU computations with SIMD acceleration.
NVIDIA GPU acceleration with optimized memory management.
Ready-to-use neural network layers: Linear, Conv, Transformer, and more.
Optimizers (Adam, SGD, etc.) and learning rate schedulers.
Data loading utilities and built-in datasets like MNIST.
The main package: JIT compiler, trainer, quantization, ONNX export, and more.
Python bindings with seamless NumPy integration.
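As a reference point for what the optimizer crate provides: plain SGD nudges each parameter against its gradient, w ← w − lr·g. A minimal sketch of that update rule (illustrating the algorithm itself, not Shrew's optimizer API):

```rust
// Vanilla SGD step, applied element-wise: w <- w - lr * g.
fn sgd_step(weights: &mut [f64], grads: &[f64], lr: f64) {
    for (w, g) in weights.iter_mut().zip(grads) {
        *w -= lr * g;
    }
}

fn main() {
    // Minimize f(w) = w^2, whose gradient is 2w.
    let mut w = [5.0f64];
    for _ in 0..100 {
        let g = [2.0 * w[0]];
        sgd_step(&mut w, &g, 0.1);
    }
    // Each step scales w by 0.8, so it converges toward 0.
    assert!(w[0].abs() < 1e-6);
    println!("w converged to {}", w[0]);
}
```

Adam and the learning-rate schedulers layer adaptive per-parameter step sizes and decay policies on top of this same basic loop.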
Serialization Formats
.shrew
Native binary checkpoints
Safetensors
HuggingFace compatible
ONNX
Opset 17, zero deps
Documentation
Guides for the .sw DSL syntax, language bindings, API references, and tutorials. Everything you need to get started with Shrew.
Start building with Shrew
Add Shrew to your Rust project, use the Python bindings, or define models in .sw files. One runtime, every language.
MNIST with MLP
cargo run -p mnist-example
MNIST with CNN
cargo run -p mnist-cnn-example
Char-level GPT
cargo run --release -p char-gpt-example
Python bindings
pip install shrew-python