Ornith AI: Self-Improving Models for Agentic Coding
A practical guide to the Ornith 1.0 model family from DeepReinforce AI.
Explore self-scaffolding coding models, benchmark highlights, hardware choices, and local deployment paths.
Compare 9B, 31B, 35B MoE, and 397B MoE variants before choosing a local or production setup.
Key Ornith AI Signals
The essentials from the current Ornith 1.0 guide.
Release
Jun 25
2026
Model sizes
9B-397B
Dense + MoE
Context window
262K
tokens
SWE-Bench
82.4
Verified
What is Ornith AI?
Ornith AI centers on Ornith 1.0, a family of open-source large language models designed for repository-scale agentic coding. The models learn not only to write code, but also to build the scaffold around the work: planning, tool use, retries, and verification.
- Self-scaffolding agentsOrnith learns task plans, tool calls, error recovery, and code patches as part of the same reinforcement-learning loop.
- Open model familyChoose from 9B Dense, 31B Dense, 35B MoE, and 397B MoE variants under an MIT license.
- Built for coding workflowsUse it for terminal-native agents, multi-file refactors, bug localization, test-driven patches, and offline coding assistants.
How Ornith AI Works
The site structure mirrors the Ornith guide: understand the training idea, compare benchmarks, then pick a model that fits your hardware.



Model data
Ornith 1.0 Model Specs
Model size, architecture, base model, VRAM, and use-case data adapted from the Ornith model guide.
Ornith-1.0-9B
9B Dense on Qwen 3.5 for low-VRAM devices and fast coding triage.
- Architecture
- Dense
- All parameters active at inference time
- VRAM
- ~19GB bf16 / ~6GB Q4
- Q4 fits entry-level local setups
- Context
- 262K tokens
- Large enough for broad repository context
- Best for
- Edge / Offline
- Private coding, triage, lightweight agents
Ornith-1.0-31B
31B Dense on Gemma 4 for teams that prefer dense-model stability.
- Architecture
- Dense
- Stable dense behavior with higher resource needs
- VRAM
- ~62GB bf16 / ~20GB Q4
- 80GB-class GPU or quantized deployment
- Context
- 262K tokens
- Long-context coding tasks
- Best for
- Balanced
- Quality and speed without MoE routing
Ornith-1.0-35B MoE
35B MoE with about 3B active parameters per token, recommended for most local developers.
- Architecture
- MoE
- More total knowledge with fewer active computations
- VRAM
- ~25GB Q5_K_M
- Practical for a single 24GB+ GPU
- Speed
- Faster than 9B dense
- MoE reduces per-token compute
- Best for
- Best Value
- Local agents, refactors, daily coding
Ornith-1.0-397B MoE
397B MoE for maximum accuracy in production-grade agent pipelines.
- Architecture
- MoE
- Based on Qwen 3.5 397B
- VRAM
- ~200GB FP8 / ~400GB bf16
- Typically 8x 80GB GPUs
- Top score
- 82.4 SWE-Bench
- Verified benchmark
- Best for
- Production
- High-accuracy autonomous coding systems
The 35B MoE model is the recommended sweet spot for most local developers; 397B targets production agent pipelines.
Benchmark data
Ornith 1.0 Benchmark Data
Comparison data for Terminal-Bench, SWE-Bench, NL2Repo, and ClawEval, covering both the flagship 397B model and smaller local models.
397B vs Frontier Models
Ornith-1.0-397B compared with Qwen, GLM, DeepSeek, and Claude Opus scores.
| Benchmark | Ornith 397B | Qwen 3.5 | Qwen 3.7 | GLM 5.2 | DeepSeek V4 | Opus 4.7 | Opus 4.8 |
|---|---|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 77.5 | 53.5 | 73.5 | 81.0 | 64 | 70.3 | 85 |
| SWE-Bench Verified | 82.4 | 76.4 | 80.4 | - | 80.6 | 80.8 | 87.6 |
| SWE-Bench Pro | 62.2 | 51.6 | 60.6 | 62.1 | 55.4 | 64.3 | 69.2 |
| SWE-Bench Multilingual | 78.9 | 69.3 | 78.3 | - | 76.2 | - | - |
| NL2Repo | 48.2 | 36.8 | 47.2 | 48.9 | - | - | 69.7 |
| ClawEval Avg | 77.1 | 70.7 | 65.2 | - | 75.8 | 78.2 | - |
Small Model Comparison
9B and 35B MoE results against similarly sized Qwen and Gemma baselines.
| Benchmark | Ornith 9B | Ornith 35B | Qwen 3.5 9B | Qwen 3.5 35B | Gemma 12B | Gemma 31B |
|---|---|---|---|---|---|---|
| Terminal-Bench 2.1 | 43.1 | 64.2 | 21.3 | 41.4 | 21 | 42.1 |
| SWE-Bench Verified | 69.4 | 75.6 | 53.2 | 70 | 44.2 | 52 |
| SWE-Bench Pro | 42.9 | 44.6 | 31.3 | 44.6 | 27.6 | 35.7 |
| SWE-Bench Multilingual | 52 | 60.3 | 39.7 | 60.3 | 32.5 | 51.7 |
| NL2Repo | 27.2 | 20.5 | 16.2 | 20.5 | 10.3 | 15.5 |
| ClawEval Avg | 63.1 | 65.4 | 53.2 | 65.4 | 32.5 | 48.5 |
Note: these scores are from DeepReinforce official evaluation; re-test on your own repository tasks before production rollout.
Run locally
Runtime and Deployment Data
Serving and integration notes for vLLM, Ollama, LM Studio, SGLang, llama.cpp, and OpenAI-compatible coding agents.
vLLM
OpenAI-compatible serving for production deployments with prefix caching, tool parsing, and reasoning parsing.
- Port
- 8000
- OpenAI /v1 endpoint
- Context
- 262144
- --max-model-len
- Tool calls
- qwen3_xml
- --enable-auto-tool-choice
- Reasoning
- qwen3
- reasoning_content field
Ollama / LM Studio
Best for local trials and GUI workflows; use GGUF Q4_K_M or Q5_K_M quantization.
- Ollama
- hf.co/...-GGUF
- Pull and run in one command
- LM Studio
- Search Ornith-1.0
- Download and load quantized weights
- 9B Q4
- ~6GB VRAM
- Low-VRAM entry point
- 35B Q5
- ~25GB VRAM
- Recommended local quality
SGLang / llama.cpp
SGLang is useful for MoE scheduling; llama.cpp is a lightweight C++ serving path.
- SGLang parser
- qwen3_coder
- Different from vLLM parser
- llama.cpp
- llama-server
- -c 262144
- Agents
- Claude Code / OpenHands
- Point to local OPENAI_BASE_URL
- API key
- EMPTY
- Placeholder for local services
Evaluation note
Benchmark data is from DeepReinforce official evaluation; treat it as a selection signal and re-test before production rollout.
- Terminal-Bench
- 5-run average
- 4h timeout, 32 CPU, 48GB RAM
- SWE-Bench
- OpenHands
- 256K context
- NL2Repo
- 400K context
- 48K output
- ClawEval
- real-user tasks
- 256K context
Ornith AI Use Cases and Model Choices
A compact map of where Ornith AI fits in real developer workflows.
Repository refactoring
Plan and apply coordinated edits across many files while checking intermediate results.
Bug localization
Search a codebase, identify likely root causes, and produce focused patches with tests.
Terminal agents
Power terminal-native coding agents that need structured tool calls and recovery loops.
Private local coding
Run smaller variants locally for offline assistance and code privacy.
35B MoE sweet spot
Use the 35B MoE variant when you want a practical balance of speed, quality, and hardware cost.
397B production scale
Use the 397B MoE variant for maximum accuracy in production-grade agent pipelines.
Ornith AI FAQ
Fast answers for model selection, setup, and positioning.
What is Ornith AI?
Ornith AI is this site theme and guide around Ornith 1.0, an open-source family of agentic coding models from DeepReinforce AI.
What makes Ornith different?
Its key idea is self-scaffolding: the model learns how to plan, use tools, recover from errors, and solve coding tasks together.
Which Ornith model should I choose?
For many users, the 35B MoE variant is the practical middle ground. The 9B model is better for constrained local hardware, while 397B targets high-accuracy production agents.
Can Ornith AI run locally?
Yes. The guide focuses on local and self-hosted paths such as vLLM, Ollama, LM Studio, quantized weights, and GPU memory tradeoffs.
Build with Ornith AI
Start from the model family, compare the benchmark signals, then choose the deployment path that matches your hardware and coding workflow.