Ornith 1.0MIT-licensed coding model family

Ornith AI: Self-Improving Models for Agentic Coding

A practical guide to the Ornith 1.0 model family from DeepReinforce AI.
Explore self-scaffolding coding models, benchmark highlights, hardware choices, and local deployment paths.

Compare 9B, 31B, 35B MoE, and 397B MoE variants before choosing a local or production setup.

Ornith 1.0

Key Ornith AI Signals

The essentials from the current Ornith 1.0 guide.

Release

Jun 25

2026

Model sizes

9B-397B

Dense + MoE

Context window

262K

tokens

SWE-Bench

82.4

Verified

What is Ornith AI?

Ornith AI centers on Ornith 1.0, a family of open-source large language models designed for repository-scale agentic coding. The models learn not only to write code, but also to build the scaffold around the work: planning, tool use, retries, and verification.

Self-scaffolding agents
Ornith learns task plans, tool calls, error recovery, and code patches as part of the same reinforcement-learning loop.
Open model family
Choose from 9B Dense, 31B Dense, 35B MoE, and 397B MoE variants under an MIT license.
Built for coding workflows
Use it for terminal-native agents, multi-file refactors, bug localization, test-driven patches, and offline coding assistants.

Training loop

How Ornith AI Works

The site structure mirrors the Ornith guide: understand the training idea, compare benchmarks, then pick a model that fits your hardware.

The model optimizes the orchestration strategy and final code output together instead of depending on a fixed human-written harness.

Model data

Ornith 1.0 Model Specs

Model size, architecture, base model, VRAM, and use-case data adapted from the Ornith model guide.

Ornith-1.0-9B

9B Dense on Qwen 3.5 for low-VRAM devices and fast coding triage.

Local entry point

Architecture: Dense; All parameters active at inference time
VRAM: ~19GB bf16 / ~6GB Q4; Q4 fits entry-level local setups
Context: 262K tokens; Large enough for broad repository context
Best for: Edge / Offline; Private coding, triage, lightweight agents

Ornith-1.0-31B

31B Dense on Gemma 4 for teams that prefer dense-model stability.

Balanced dense

Architecture: Dense; Stable dense behavior with higher resource needs
VRAM: ~62GB bf16 / ~20GB Q4; 80GB-class GPU or quantized deployment
Context: 262K tokens; Long-context coding tasks
Best for: Balanced; Quality and speed without MoE routing

Ornith-1.0-35B MoE

35B MoE with about 3B active parameters per token, recommended for most local developers.

Recommended

Architecture: MoE; More total knowledge with fewer active computations
VRAM: ~25GB Q5_K_M; Practical for a single 24GB+ GPU
Speed: Faster than 9B dense; MoE reduces per-token compute
Best for: Best Value; Local agents, refactors, daily coding

Ornith-1.0-397B MoE

397B MoE for maximum accuracy in production-grade agent pipelines.

Flagship

Architecture: MoE; Based on Qwen 3.5 397B
VRAM: ~200GB FP8 / ~400GB bf16; Typically 8x 80GB GPUs
Top score: 82.4 SWE-Bench; Verified benchmark
Best for: Production; High-accuracy autonomous coding systems

The 35B MoE model is the recommended sweet spot for most local developers; 397B targets production agent pipelines.

Benchmark data

Ornith 1.0 Benchmark Data

Comparison data for Terminal-Bench, SWE-Bench, NL2Repo, and ClawEval, covering both the flagship 397B model and smaller local models.

397B vs Frontier Models

Ornith-1.0-397B compared with Qwen, GLM, DeepSeek, and Claude Opus scores.

Benchmark	Ornith 397B	Qwen 3.5	Qwen 3.7	GLM 5.2	DeepSeek V4	Opus 4.7	Opus 4.8
Terminal-Bench 2.1	77.5	53.5	73.5	81.0	64	70.3	85
SWE-Bench Verified	82.4	76.4	80.4	-	80.6	80.8	87.6
SWE-Bench Pro	62.2	51.6	60.6	62.1	55.4	64.3	69.2
SWE-Bench Multilingual	78.9	69.3	78.3	-	76.2	-	-
NL2Repo	48.2	36.8	47.2	48.9	-	-	69.7
ClawEval Avg	77.1	70.7	65.2	-	75.8	78.2	-

Small Model Comparison

9B and 35B MoE results against similarly sized Qwen and Gemma baselines.

Benchmark	Ornith 9B	Ornith 35B	Qwen 3.5 9B	Qwen 3.5 35B	Gemma 12B	Gemma 31B
Terminal-Bench 2.1	43.1	64.2	21.3	41.4	21	42.1
SWE-Bench Verified	69.4	75.6	53.2	70	44.2	52
SWE-Bench Pro	42.9	44.6	31.3	44.6	27.6	35.7
SWE-Bench Multilingual	52	60.3	39.7	60.3	32.5	51.7
NL2Repo	27.2	20.5	16.2	20.5	10.3	15.5
ClawEval Avg	63.1	65.4	53.2	65.4	32.5	48.5

Note: these scores are from DeepReinforce official evaluation; re-test on your own repository tasks before production rollout.

Run locally

Runtime and Deployment Data

Serving and integration notes for vLLM, Ollama, LM Studio, SGLang, llama.cpp, and OpenAI-compatible coding agents.

vLLM

OpenAI-compatible serving for production deployments with prefix caching, tool parsing, and reasoning parsing.

Production throughput

Port: 8000; OpenAI /v1 endpoint
Context: 262144; --max-model-len
Tool calls: qwen3_xml; --enable-auto-tool-choice
Reasoning: qwen3; reasoning_content field

Ollama / LM Studio

Best for local trials and GUI workflows; use GGUF Q4_K_M or Q5_K_M quantization.

Fastest setup

Ollama: hf.co/...-GGUF; Pull and run in one command
LM Studio: Search Ornith-1.0; Download and load quantized weights
9B Q4: ~6GB VRAM; Low-VRAM entry point
35B Q5: ~25GB VRAM; Recommended local quality

SGLang / llama.cpp

SGLang is useful for MoE scheduling; llama.cpp is a lightweight C++ serving path.

Self-hosting options

SGLang parser: qwen3_coder; Different from vLLM parser
llama.cpp: llama-server; -c 262144
Agents: Claude Code / OpenHands; Point to local OPENAI_BASE_URL
API key: EMPTY; Placeholder for local services

Evaluation note

Benchmark data is from DeepReinforce official evaluation; treat it as a selection signal and re-test before production rollout.

Self-reported

Terminal-Bench: 5-run average; 4h timeout, 32 CPU, 48GB RAM
SWE-Bench: OpenHands; 256K context
NL2Repo: 400K context; 48K output
ClawEval: real-user tasks; 256K context

Ornith AI Use Cases and Model Choices

A compact map of where Ornith AI fits in real developer workflows.

Repository refactoring

Plan and apply coordinated edits across many files while checking intermediate results.

Bug localization

Search a codebase, identify likely root causes, and produce focused patches with tests.

Terminal agents

Power terminal-native coding agents that need structured tool calls and recovery loops.

Private local coding

Run smaller variants locally for offline assistance and code privacy.

35B MoE sweet spot

Use the 35B MoE variant when you want a practical balance of speed, quality, and hardware cost.

397B production scale

Use the 397B MoE variant for maximum accuracy in production-grade agent pipelines.

FAQ

Ornith AI FAQ

Fast answers for model selection, setup, and positioning.

What is Ornith AI?

Ornith AI is this site theme and guide around Ornith 1.0, an open-source family of agentic coding models from DeepReinforce AI.

What makes Ornith different?

Its key idea is self-scaffolding: the model learns how to plan, use tools, recover from errors, and solve coding tasks together.

Which Ornith model should I choose?

For many users, the 35B MoE variant is the practical middle ground. The 9B model is better for constrained local hardware, while 397B targets high-accuracy production agents.

Can Ornith AI run locally?

Yes. The guide focuses on local and self-hosted paths such as vLLM, Ollama, LM Studio, quantized weights, and GPU memory tradeoffs.

Build with Ornith AI

Start from the model family, compare the benchmark signals, then choose the deployment path that matches your hardware and coding workflow.

Ornith AI: Self-Improving Models for Agentic Coding

Key Ornith AI Signals

What is Ornith AI?

How Ornith AI Works

Joint scaffold and solution learning

Reasoning with tool calls

Guarded benchmark training

Ornith 1.0 Model Specs

Ornith-1.0-9B

Ornith-1.0-31B

Ornith-1.0-35B MoE

Ornith-1.0-397B MoE

Ornith 1.0 Benchmark Data

397B vs Frontier Models

Small Model Comparison

Runtime and Deployment Data

vLLM

Ollama / LM Studio

SGLang / llama.cpp

Evaluation note

Ornith AI Use Cases and Model Choices

Repository refactoring

Bug localization

Terminal agents

Private local coding

35B MoE sweet spot

397B production scale

Ornith AI FAQ

What is Ornith AI?

What makes Ornith different?

Which Ornith model should I choose?

Can Ornith AI run locally?

Build with Ornith AI