Ornith 1.0MIT-licensed coding model family

Ornith AI: Self-Improving Models for Agentic Coding

A practical guide to the Ornith 1.0 model family from DeepReinforce AI.
Explore self-scaffolding coding models, benchmark highlights, hardware choices, and local deployment paths.

Compare 9B, 31B, 35B MoE, and 397B MoE variants before choosing a local or production setup.

Ornith 1.0

Key Ornith AI Signals

The essentials from the current Ornith 1.0 guide.

Release

Jun 25

2026

Model sizes

9B-397B

Dense + MoE

Context window

262K

tokens

SWE-Bench

82.4

Verified

What is Ornith AI?

Ornith AI centers on Ornith 1.0, a family of open-source large language models designed for repository-scale agentic coding. The models learn not only to write code, but also to build the scaffold around the work: planning, tool use, retries, and verification.

  • Self-scaffolding agents
    Ornith learns task plans, tool calls, error recovery, and code patches as part of the same reinforcement-learning loop.
  • Open model family
    Choose from 9B Dense, 31B Dense, 35B MoE, and 397B MoE variants under an MIT license.
  • Built for coding workflows
    Use it for terminal-native agents, multi-file refactors, bug localization, test-driven patches, and offline coding assistants.
Training loop

How Ornith AI Works

The site structure mirrors the Ornith guide: understand the training idea, compare benchmarks, then pick a model that fits your hardware.

The model optimizes the orchestration strategy and final code output together instead of depending on a fixed human-written harness.

Joint scaffold and solution learning
Reasoning with tool calls
Guarded benchmark training

Model data

Ornith 1.0 Model Specs

Model size, architecture, base model, VRAM, and use-case data adapted from the Ornith model guide.

Ornith-1.0-9B

9B Dense on Qwen 3.5 for low-VRAM devices and fast coding triage.

Local entry point
Architecture
Dense
All parameters active at inference time
VRAM
~19GB bf16 / ~6GB Q4
Q4 fits entry-level local setups
Context
262K tokens
Large enough for broad repository context
Best for
Edge / Offline
Private coding, triage, lightweight agents

Ornith-1.0-31B

31B Dense on Gemma 4 for teams that prefer dense-model stability.

Balanced dense
Architecture
Dense
Stable dense behavior with higher resource needs
VRAM
~62GB bf16 / ~20GB Q4
80GB-class GPU or quantized deployment
Context
262K tokens
Long-context coding tasks
Best for
Balanced
Quality and speed without MoE routing

Ornith-1.0-35B MoE

35B MoE with about 3B active parameters per token, recommended for most local developers.

Recommended
Architecture
MoE
More total knowledge with fewer active computations
VRAM
~25GB Q5_K_M
Practical for a single 24GB+ GPU
Speed
Faster than 9B dense
MoE reduces per-token compute
Best for
Best Value
Local agents, refactors, daily coding

Ornith-1.0-397B MoE

397B MoE for maximum accuracy in production-grade agent pipelines.

Flagship
Architecture
MoE
Based on Qwen 3.5 397B
VRAM
~200GB FP8 / ~400GB bf16
Typically 8x 80GB GPUs
Top score
82.4 SWE-Bench
Verified benchmark
Best for
Production
High-accuracy autonomous coding systems

The 35B MoE model is the recommended sweet spot for most local developers; 397B targets production agent pipelines.

Benchmark data

Ornith 1.0 Benchmark Data

Comparison data for Terminal-Bench, SWE-Bench, NL2Repo, and ClawEval, covering both the flagship 397B model and smaller local models.

397B vs Frontier Models

Ornith-1.0-397B compared with Qwen, GLM, DeepSeek, and Claude Opus scores.

BenchmarkOrnith 397BQwen 3.5Qwen 3.7GLM 5.2DeepSeek V4Opus 4.7Opus 4.8
Terminal-Bench 2.177.553.573.581.06470.385
SWE-Bench Verified82.476.480.4-80.680.887.6
SWE-Bench Pro62.251.660.662.155.464.369.2
SWE-Bench Multilingual78.969.378.3-76.2--
NL2Repo48.236.847.248.9--69.7
ClawEval Avg77.170.765.2-75.878.2-

Small Model Comparison

9B and 35B MoE results against similarly sized Qwen and Gemma baselines.

BenchmarkOrnith 9BOrnith 35BQwen 3.5 9BQwen 3.5 35BGemma 12BGemma 31B
Terminal-Bench 2.143.164.221.341.42142.1
SWE-Bench Verified69.475.653.27044.252
SWE-Bench Pro42.944.631.344.627.635.7
SWE-Bench Multilingual5260.339.760.332.551.7
NL2Repo27.220.516.220.510.315.5
ClawEval Avg63.165.453.265.432.548.5

Note: these scores are from DeepReinforce official evaluation; re-test on your own repository tasks before production rollout.

Run locally

Runtime and Deployment Data

Serving and integration notes for vLLM, Ollama, LM Studio, SGLang, llama.cpp, and OpenAI-compatible coding agents.

vLLM

OpenAI-compatible serving for production deployments with prefix caching, tool parsing, and reasoning parsing.

Production throughput
Port
8000
OpenAI /v1 endpoint
Context
262144
--max-model-len
Tool calls
qwen3_xml
--enable-auto-tool-choice
Reasoning
qwen3
reasoning_content field

Ollama / LM Studio

Best for local trials and GUI workflows; use GGUF Q4_K_M or Q5_K_M quantization.

Fastest setup
Ollama
hf.co/...-GGUF
Pull and run in one command
LM Studio
Search Ornith-1.0
Download and load quantized weights
9B Q4
~6GB VRAM
Low-VRAM entry point
35B Q5
~25GB VRAM
Recommended local quality

SGLang / llama.cpp

SGLang is useful for MoE scheduling; llama.cpp is a lightweight C++ serving path.

Self-hosting options
SGLang parser
qwen3_coder
Different from vLLM parser
llama.cpp
llama-server
-c 262144
Agents
Claude Code / OpenHands
Point to local OPENAI_BASE_URL
API key
EMPTY
Placeholder for local services

Evaluation note

Benchmark data is from DeepReinforce official evaluation; treat it as a selection signal and re-test before production rollout.

Self-reported
Terminal-Bench
5-run average
4h timeout, 32 CPU, 48GB RAM
SWE-Bench
OpenHands
256K context
NL2Repo
400K context
48K output
ClawEval
real-user tasks
256K context

Ornith AI Use Cases and Model Choices

A compact map of where Ornith AI fits in real developer workflows.

Repository refactoring

Plan and apply coordinated edits across many files while checking intermediate results.

Bug localization

Search a codebase, identify likely root causes, and produce focused patches with tests.

Terminal agents

Power terminal-native coding agents that need structured tool calls and recovery loops.

Private local coding

Run smaller variants locally for offline assistance and code privacy.

35B MoE sweet spot

Use the 35B MoE variant when you want a practical balance of speed, quality, and hardware cost.

397B production scale

Use the 397B MoE variant for maximum accuracy in production-grade agent pipelines.

FAQ

Ornith AI FAQ

Fast answers for model selection, setup, and positioning.

1

What is Ornith AI?

Ornith AI is this site theme and guide around Ornith 1.0, an open-source family of agentic coding models from DeepReinforce AI.

2

What makes Ornith different?

Its key idea is self-scaffolding: the model learns how to plan, use tools, recover from errors, and solve coding tasks together.

3

Which Ornith model should I choose?

For many users, the 35B MoE variant is the practical middle ground. The 9B model is better for constrained local hardware, while 397B targets high-accuracy production agents.

4

Can Ornith AI run locally?

Yes. The guide focuses on local and self-hosted paths such as vLLM, Ollama, LM Studio, quantized weights, and GPU memory tradeoffs.

Build with Ornith AI

Start from the model family, compare the benchmark signals, then choose the deployment path that matches your hardware and coding workflow.