If you're looking for a device that can run 7B-14B LLMs locally but can't stretch to the E1's five-figure price point, the KAIHE D1 might be your only serious answer.
The D1 packs an NVIDIA Jetson Orin NX 16GB module — 100 TOPS AI compute, 16GB unified memory, 256GB NVMe storage — at ¥4,999. It occupies a critical position in the KAIHE lineup: everything below it (A1/B1/C1) can only run cloud APIs or extremely small 1B-7B models; everything above it (E1/F1/G1) enters the x86 + large VRAM world.
In other words, the D1 is the minimum ticket to the "local LLM" world. After a week of real-world testing, here's my verdict on whether that ticket is worth it.

1. Hardware Overview: Low Power ≠ Low Capability
| Spec | Detail |
|---|---|
| Module | NVIDIA Jetson Orin NX 16GB |
| AI Compute | 100 TOPS (INT8) / 50 TFLOPS (FP16) |
| CPU | 8-core ARM Cortex-A78AE |
| GPU | 1024-core Ampere + 32 Tensor Cores |
| Memory | 16GB LPDDR5 (unified memory architecture) |
| Storage | 256GB NVMe SSD |
| Networking | Gigabit Ethernet + Wi-Fi 6 |
| Ports | HDMI 2.0, USB 3.2 ×4, GPIO 40-pin |
| Power | 10W-25W (passive cooling, zero noise) |
| Dimensions | 103×79×37mm (smaller than a smartphone) |
A few noteworthy details:
Unified memory architecture. The 16GB is shared between CPU and GPU, with no boundary between VRAM and system RAM. For LLM inference, this means a model can use any memory the OS and other processes leave free, rather than being limited to a separate, smaller pool of dedicated VRAM as on a typical PC.
Passive cooling. Like the A1, the D1 is entirely fanless. At 25W full load, the chassis is barely warm. Zero audible noise — a critical advantage over traditional desktops for bedroom or studio use.
Edge-AI-optimized Ampere GPU. Not a cut-down desktop GPU, but an architecture purpose-built for inference workloads. The 1024 CUDA Cores + 32 Tensor Cores deliver dramatically better performance-per-watt on Transformer models than comparably priced x86 solutions.
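
To make the unified-memory point concrete, here is a rough back-of-envelope check of whether a quantized model fits in 16GB. The bits-per-weight values, the 2GB OS reserve, and the 1GB runtime overhead are illustrative assumptions of mine, not measurements from the D1 or figures published by NVIDIA or Ollama.

```python
# Rough estimate of whether a quantized model fits in unified memory.
# Every constant here is an illustrative assumption, not a measurement.

APPROX_BITS_PER_WEIGHT = {
    "Q2_K": 3.0,    # assumed average, including quantization metadata
    "Q4_K_M": 4.8,  # assumed average
    "Q8_0": 8.5,    # assumed average
}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # (params_billion * 1e9 weights) * (bits / 8 bytes) / 1e9 bytes per GB
    return params_billion * bits / 8


def fits(
    params_billion: float,
    quant: str,
    total_gb: float = 16.0,
    os_reserve_gb: float = 2.0,        # assumed memory kept by the OS
    runtime_overhead_gb: float = 1.0,  # assumed KV cache and buffers
) -> bool:
    """True if weights plus overhead fit in what the OS leaves free."""
    needed = weight_gb(params_billion, quant) + runtime_overhead_gb
    return needed <= total_gb - os_reserve_gb


for size, quant in [(7, "Q4_K_M"), (14, "Q4_K_M"), (32, "Q2_K"), (72, "Q2_K")]:
    print(f"{size}B {quant}: ~{weight_gb(size, quant):.1f} GB weights, fits={fits(size, quant)}")
```

Under these assumptions the estimate lands in the same ballpark as a 16GB budget suggests: 7B and 14B fit comfortably at 4-bit, 32B only squeezes in at 2-bit, and 72B does not fit at all.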
2. Model Benchmarks: What Can the D1 Actually Run?
Ollama Deployment Tests
The D1 ships with JetPack 6.0 (Ubuntu 22.04), supporting one-click Docker and Ollama installation. I tested the following models via Ollama:
| Model | Params | Quantization | VRAM Usage | Inference Speed | Verdict |
|---|---|---|---|---|---|
| Qwen2.5-7B | 7B | Q4_K_M | ~5.5GB | 15-18 tok/s | ✅ Smooth |
| Qwen2.5-14B | 14B | Q4_K_M | ~9.5GB | 7-9 tok/s | ✅ Usable, slightly slow |
| Llama 3.1-8B | 8B | Q4_K_M | ~6GB | 16-20 tok/s | ✅ Smooth |
| DeepSeek-V2-Lite | 16B (2.4B active) | Q4_K_M | ~8GB | 12-15 tok/s | ✅ Smooth (MoE advantage) |
| Gemma 2-9B | 9B | Q4_K_M | ~6.5GB | 14-17 tok/s | ✅ Smooth |
| Qwen2.5-32B | 32B | Q2_K | ~14GB | 2-4 tok/s | ⚠️ Barely usable |
| Qwen2.5-72B | 72B | IQ1_S | OOM | — | ❌ Cannot run |
Core takeaway: The D1's sweet spot is 7B-14B (Q4 quantization), the most mature and well-supported open-source model scale. 32B is marginal; 72B is impossible — entirely expected at 1/3 the price of the E1.
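
If you want to reproduce tok/s numbers like the ones in the table, one option is Ollama's local REST API, whose non-streaming `/api/generate` response reports `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The sketch below assumes Ollama is listening on its default port; the `benchmark` helper and the model tag in the usage note are my own illustrative choices, not necessarily what was used for the table.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes an unmodified install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(result: dict) -> float:
    """Decode speed: eval_count tokens over eval_duration nanoseconds."""
    return result["eval_count"] / result["eval_duration"] * 1e9


def benchmark(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming generation request and return tok/s."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return tokens_per_second(json.load(response))
```

With a model already pulled, a call such as `benchmark("qwen2.5:7b", "Explain unified memory in two sentences.")` returns the decode speed for that single request. Averaging several runs with prompts of different lengths gives a steadier figure than a single call.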
Multi-Agent Concurrency Test
This is where the D1 differentiates itself from ordinary mini PCs. I ran 3 OpenClaw Agents simultaneously:
- Agent 1: News summarization (Qwen2.5-7B)
- Agent 2: FAQ customer support (another Qwen2.5-7B instance)
- Agent 3: Code review (DeepSeek Coder 6.7B)

Three Agents loaded in memory, total model footprint ~17GB. Under Orin NX's unified memory architecture, memory management was automatic, with no OOM. Each Agent maintained >80% of its single-Agent inference speed. No severe performance degradation under multi-tasking.
The D1's real value isn't running one "bigger model" — it's running multiple mid-scale Agents simultaneously, in collaborative workflows.
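
A concurrency check like the one described above can be sketched as follows: run every agent's workload at the same time, then compare each agent's speed against its solo baseline. This is a generic harness, not OpenClaw's API. The `measure` callable, the agent names, and the model labels are hypothetical placeholders; in practice `measure` would be whatever function sends a request to the model and returns tok/s.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_concurrently(
    agents: Dict[str, str], measure: Callable[[str], float]
) -> Dict[str, float]:
    """Call measure(model) for all agents at once; return tok/s per agent."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(measure, model) for name, model in agents.items()}
        return {name: future.result() for name, future in futures.items()}


def retention(solo: Dict[str, float], concurrent: Dict[str, float]) -> Dict[str, float]:
    """Fraction of single-agent speed each agent keeps under concurrent load."""
    return {name: concurrent[name] / solo[name] for name in solo}


# Hypothetical agent-to-model mapping mirroring the three roles in the test.
AGENTS = {
    "news_summary": "model-a",
    "faq_support": "model-a",
    "code_review": "model-b",
}
```

A `retention` value of 0.8 or higher for every agent corresponds to the ">80% of single-Agent speed" result reported here.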
3. Who Is the D1 For? And Not For?
✅ Best Fit
Independent developers and small teams. You need local AI capability without a noisy desktop tower. The D1 runs 24/7, with electricity costs under $2/month.
IoT/AIoT scenarios. 40-pin GPIO + 100 TOPS AI compute + OpenClaw automation — ideal for smart home hubs, edge quality inspection, and security analytics.
AI learners. JetPack SDK + full CUDA ecosystem + OpenClaw's visual Agent orchestration make this a superb environment for learning AI deployment and Agent development.
Budget-conscious enterprise users. In-store AI customer service, warehouse security analytics, remote device monitoring — deployment costs under 1/5 of traditional servers, with zero noise and minimal footprint.
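
The "under $2/month" electricity figure for 24/7 operation is easy to sanity-check from the 10W-25W power range. The sketch below assumes a 30-day month and an electricity price of $0.10 per kWh; that rate is my assumption, so substitute your local tariff.

```python
def monthly_energy_kwh(watts: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Energy used in one month of continuous operation, in kWh."""
    return watts * hours_per_day * days / 1000


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Monthly electricity cost in the currency of price_per_kwh."""
    return monthly_energy_kwh(watts) * price_per_kwh


ASSUMED_PRICE_PER_KWH = 0.10  # USD per kWh, an assumed rate

for watts in (10, 25):
    kwh = monthly_energy_kwh(watts)
    cost = monthly_cost(watts, ASSUMED_PRICE_PER_KWH)
    print(f"{watts} W: {kwh:.1f} kWh/month, ${cost:.2f}/month")
```

At a constant 25W the device draws 18 kWh per month, or $1.80 at the assumed rate. The figure stays under $2 as long as electricity costs no more than about $0.11 per kWh at sustained full load.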
❌ Not For
Those chasing the largest models. You need the E1 (32GB, 14B-30B) or even the G1 (128GB, 70B-405B). The D1 has a clear ceiling on model size — unrealistic expectations will lead to disappointment.
x86-dependent workflows. Some enterprise software is x86-only, and certain peripheral drivers lack ARM support. If your tech stack heavily depends on x86, the E1 is the better choice.
4. Competitive Comparison
| Dimension | KAIHE D1 | Jetson Orin Nano Dev Kit | Mac Mini M4 |
|---|---|---|---|
| Price | ¥4,999 | ¥4,999 (board only, no SSD/case) | ¥4,799 (16GB) |
| AI Compute | 100 TOPS | 67 TOPS | 38 TOPS (ANE) |
| Memory | 16GB unified | 8GB | 16GB unified |
| Software Ecosystem | CUDA + OpenClaw pre-installed | JetPack SDK (bare metal) | Core ML (non-CUDA) |
| Power | 10-25W | 7-15W | 15-40W |
| LLM Deployment | Ollama out-of-box | Manual setup required | mlx-llm (non-standard) |
| Multi-Agent | ✅ Native OpenClaw | ❌ No built-in solution | ❌ No built-in solution |
The Mac Mini M4 has stronger CPU performance (better for compilation, video editing, etc.), but in LLM inference and Agent orchestration, the D1's CUDA ecosystem + OpenClaw pre-installation is an overwhelming advantage.
5. Verdict
The D1 isn't the most powerful KAIHE product — that's clearly the G1. But it is the smartest product in the lineup: precisely positioned at the entry price point for local LLMs, delivering essential AI capability without excess compute.
For ¥4,999, you get:
- A device that runs 7B-14B models within the CUDA ecosystem
- OpenClaw Agent OS pre-installed
- A 24/7 AI node that runs cool and silent

If you're torn between the 7 KAIHE models, here's my simple recommendation:
- Cloud API only → A1 (¥999)
- Local LLM on a budget → D1 (¥4,999)
- Serious local deployment → E1 (¥10,000+)
- This is your production tool → G1 (¥34,999)

The D1 is the best stepping stone into the world of local AI.
tags: KAIHE D1, NVIDIA Jetson Orin NX, local LLM, product review, AI edge computing, Ollama, OpenClaw