v0.9.1 — Silicon Photonic Runtime

Compile to
Light.

Route neural network inference through silicon photonic chips. Matrix multiplications at the speed of a laser pulse, not a clock cycle. 0.4ns latency on ResNet-50. Real hardware. No simulation.

Throughput: 0.8 PFLOPS
Inference Latency: 2.1ns
TDP: 18W
λ₀ · MATRIX_MUL: 0.4ns / op · WAVELENGTH: 1550nm ± 0.1 · PHOTON-SPC-7 // 28nm CMOS
Performance Benchmarks

Photon vs GPU vs TPU.
No asterisks.

All benchmarks run on identical workloads: PyTorch models compiled via photon.compile(), measured end-to-end including I/O. Hardware: Photon SPC-7 card, A100 80GB, TPUv4.

Photon SPC-7 (Silicon Photonic)
NVIDIA A100 (CUDA 12.3)
Google TPUv4 (JAX 0.4)
Model        Metric               Photon        CUDA A100      TPUv4          Improvement vs A100
ResNet-50    Inference Latency    0.4ns         340ns          82ns           100%
BERT-Large   Inference Latency    1.2ns         890ns          210ns          100%
ViT-H/14     Inference Latency    2.1ns         1240ns         380ns          100%
ResNet-50    Energy / Inference   0.8µJ         48µJ           12µJ           98%
BERT-Large   Throughput           1.2 PFLOPS    0.18 PFLOPS    0.52 PFLOPS    567%
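
The percentage figures in the comparison appear to be the relative improvement over the A100: a reduction for latency and energy, an increase for throughput. That reading is an assumption inferred from the numbers, not something the page states. A minimal Python sketch that recomputes the figures under that assumption (the helper names are illustrative):

```python
def reduction_pct(photon: float, baseline: float) -> int:
    """Percent reduction relative to the baseline (lower is better)."""
    return round((baseline - photon) / baseline * 100)


def increase_pct(photon: float, baseline: float) -> int:
    """Percent increase relative to the baseline (higher is better)."""
    return round((photon - baseline) / baseline * 100)


# (Photon, A100) latency pairs in nanoseconds, copied from the benchmark table.
latency_ns = {
    "ResNet-50": (0.4, 340),
    "BERT-Large": (1.2, 890),
    "ViT-H/14": (2.1, 1240),
}

latency_gain = {model: reduction_pct(p, a) for model, (p, a) in latency_ns.items()}
energy_gain = reduction_pct(0.8, 48)       # ResNet-50, µJ per inference
throughput_gain = increase_pct(1.2, 0.18)  # BERT-Large, PFLOPS
```

Under this reading the three latency rows round to 100%, energy to 98%, and throughput to 567%, which matches the published column.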

All measurements: median of 10,000 runs · batch_size=1 · fp16 precision · Feb 2026
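
To reproduce a median-of-N measurement on your own workload, a minimal Python sketch of the procedure follows. It is not the harness used for the numbers above; `median_latency_ns` and `timer_resolution_ns` are illustrative names. `time.perf_counter_ns()` returns whole nanoseconds, so it cannot resolve a single sub-nanosecond call; check the clock resolution before trusting small readings.

```python
import statistics
import time
from typing import Callable


def timer_resolution_ns() -> float:
    """Reported resolution of the perf_counter clock, in nanoseconds."""
    return time.get_clock_info("perf_counter").resolution * 1e9


def median_latency_ns(fn: Callable[[], object],
                      runs: int = 10_000,
                      warmup: int = 100) -> float:
    """Median wall-clock latency of fn() over `runs` calls, in nanoseconds."""
    for _ in range(warmup):  # discard cold-start effects such as first-call compilation
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)
```

Usage: `median_latency_ns(lambda: model_forward(input_tensor))`, with `model_forward` being whatever callable you want to time.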

Feature Matrix

Every layer of the stack.
Fully specified.


Compiler Toolchain

PyTorch → photonic IR → silicon

Photon's frontend accepts any PyTorch nn.Module or JAX function. A single decorator triggers compilation to photonic IR without modifying model architecture.

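import photon
import torch
from torchvision.models import resnet50  # assumption: torchvision's ResNet-50

model = resnet50().eval()

@photon.compile(target="spc-7", precision="fp16")
def model_forward(x):
    return model(x)

# Compiles on first call; the result is cached to .photon/cache/
output = model_forward(torch.randn(1, 3, 224, 224))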
PyTorch versions: 2.1, 2.2, 2.3
JAX versions: 0.4.x
Compile time (ResNet-50): ~4.2s cold, 0ms warm
IR format: Photonic MLIR dialect
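
The compile-on-first-call behaviour described above can be pictured with a small pure-Python decorator. This is a sketch of the general pattern only, not Photon's implementation: `lazy_compile`, `_fake_backend_compile`, and the in-memory cache are all hypothetical stand-ins (the real cache is stated to live on disk in .photon/cache/).

```python
import functools
from typing import Any, Callable, Dict, Tuple

_CACHE: Dict[Tuple[str, str, str], Callable[..., Any]] = {}
STATS = {"compiles": 0}


def _fake_backend_compile(fn: Callable[..., Any], target: str, precision: str):
    """Hypothetical stand-in for a real compiler backend; returns fn unchanged."""
    STATS["compiles"] += 1
    return fn


def lazy_compile(target: str, precision: str):
    """Decorator that compiles on the first call and reuses the result afterwards."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        key = (fn.__qualname__, target, precision)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if key not in _CACHE:  # cold path: compile once
                _CACHE[key] = _fake_backend_compile(fn, target, precision)
            return _CACHE[key](*args, **kwargs)  # warm path: cached callable

        return wrapper
    return decorator


@lazy_compile(target="spc-7", precision="fp16")
def double(x):
    return 2 * x
```

Nothing is compiled at decoration time; the first call to `double` pays the compile cost and later calls reuse the cached result.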

The optimizer maps tensor operations to physical MZI arrays, minimizes optical path length, and resolves phase conflicts across wavelength channels.

Optimization passes: 14 (configurable)
MZI utilization: 94% average
Wavelength channels: 64 (WDM)
Phase resolution: 0.001 rad
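
For readers new to MZI arrays: a single Mach-Zehnder interferometer acts as a programmable 2×2 unitary set by two phases. The sketch below uses the textbook model (two ideal 50:50 couplers, an internal phase `theta`, an input phase `phi`) and snaps phases to a 0.001 rad grid. Treating "phase resolution" as a set-point quantization step is an assumption, and none of this is Photon's optimizer.

```python
import cmath
import math


def matmul2(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def beam_splitter():
    """Ideal 50:50 directional coupler."""
    s = 1 / math.sqrt(2)
    return [[s, 1j * s], [1j * s, s]]


def phase_shifter(phase):
    """Phase shift applied to the upper arm only."""
    return [[cmath.exp(1j * phase), 0], [0, 1]]


def quantize_phase(phase, step=0.001):
    """Snap a phase to the nearest multiple of `step` radians (assumed meaning)."""
    return round(phase / step) * step


def mzi(theta, phi, step=0.001):
    """Transfer matrix B · P(theta) · B · P(phi) with quantized phases."""
    theta, phi = quantize_phase(theta, step), quantize_phase(phi, step)
    inner = matmul2(beam_splitter(), phase_shifter(phi))
    return matmul2(beam_splitter(), matmul2(phase_shifter(theta), inner))


def is_unitary(u, tol=1e-9):
    """Check that U · U† equals the identity within tolerance."""
    dagger = [[u[j][i].conjugate() for j in range(2)] for i in range(2)]
    prod = matmul2(u, dagger)
    return all(abs(prod[i][j] - (1 if i == j else 0)) < tol
               for i in range(2) for j in range(2))
```

In this model the cross-port power is cos²(theta/2): theta = 0 sends all light to the cross port, and theta = π/2 gives a roughly 50:50 split after quantization.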

Photonic computation's energy cost scales with precision. The quantization engine finds the optimal precision floor per layer using calibration data.

import photon

photon.quantize(
    model,
    calibration_data=loader,
    target_precision="opt4",  # optical 4-bit
    per_layer=True,
)
Supported precisions: fp32, fp16, bf16, opt8, opt4
Accuracy drop (opt4): <0.3% on ImageNet
Energy reduction: 8× vs fp32
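
One way to picture a per-layer "precision floor" search: for each layer, pick the lowest bit width whose quantization error on calibration values stays under a tolerance. The sketch below does that with plain uniform symmetric quantization. It is an illustration under stated assumptions, not the engine behind photon.quantize; the function names, candidate bit widths, and tolerance are all hypothetical.

```python
from typing import Dict, Sequence


def fake_quantize(values: Sequence[float], bits: int):
    """Uniform symmetric quantization to a signed `bits`-bit grid, then dequantize."""
    max_abs = max(abs(v) for v in values) or 1.0
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def relative_error(values: Sequence[float], bits: int) -> float:
    """Squared quantization error divided by the signal energy."""
    quantized = fake_quantize(values, bits)
    num = sum((v - q) ** 2 for v, q in zip(values, quantized))
    den = sum(v * v for v in values) or 1.0
    return num / den


def precision_floor(calibration: Dict[str, Sequence[float]],
                    candidates=(4, 8, 16),
                    tol: float = 1e-3) -> Dict[str, int]:
    """Lowest candidate bit width per layer whose relative error is within tol."""
    floors = {}
    for layer, values in calibration.items():
        floors[layer] = next(
            (b for b in sorted(candidates) if relative_error(values, b) <= tol),
            max(candidates),  # fall back to the widest candidate
        )
    return floors
```

A layer whose calibration values already sit on a coarse grid can drop to 4 bits, while a layer with finely spread values needs more.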
Get Started

One command.
Inference at light speed.

No email form. No waitlist. Paste the command, run your model, measure the nanoseconds yourself.

# Requires Python 3.10+ · CUDA 12.3+ or Photon SPC-7 driver
$ pip install "photon-compiler[spc7]"
~4.2s compile time (cold) · Requires Photon SPC-7 or compatible
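
After installing, a quick check of the software prerequisites can save a confusing first run. A minimal sketch, assuming the package exposes a top-level module named `photon` (as in the import used in the compiler example); `check_environment` is an illustrative helper, not part of the package, and it does not probe drivers or hardware.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 10), module="photon"):
    """Report whether the Python version and the named module are available."""
    return {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "module_found": importlib.util.find_spec(module) is not None,
    }
```

Usage: `print(check_environment())` should show both values as `True` before you compile a model.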
Try in Browser Sandbox

Browser Sandbox requires GitHub OAuth — no email, no credit card. Your models never leave your browser session.

Field Reports

Engineers who measured it
themselves.

0.4ns · Verified latency (ResNet-50)
"I've benchmarked every accelerator on the market. Photon is the first time I've had to rewrite my measurement code because my timer resolution wasn't fine enough. 0.4ns is not a marketing number."
Priya Subramaniam
Principal ML Infrastructure Engineer
Cohere
3 days · Investor due diligence closed
"Our Series B investors asked us to prove the photonic accelerator claim. I ran photon bench, exported the signed PDF, and sent it over. Due diligence closed in three days. That report is worth more than the hardware."
Marcus Delacroix
CTO & Co-founder
Luminary AI
1 afternoon · HAL integration time
"The compiler correctly maps our custom attention variant to MZI arrays without manual waveguide routing. I expected to spend three weeks on the HAL integration. It was a Thursday afternoon."
Dr. Yuki Tanaka
Research Scientist, Neuromorphic Photonics
MIT Research Lab
Ready to compile at light speed?