Behind Zero-Code Multi-Model Gateways: KAIHE's Architecture Decisions

The "unified AI gateway" concept has exploded in 2026. From API aggregation platforms to model relay services, the market is flooded. KAIHE's model aggregation gateway made five deliberate architecture decisions worth examining.

Decision 1: Zero-Code vs. Flexibility — The 80/20 Rule

Zero-code and maximum flexibility are in genuine tension. KAIHE chose 80/20 layering: 80% of use cases (text generation, code completion, translation, document analysis) go through a zero-code config panel; 20% advanced cases (A/B testing, custom prompt chains, streaming customization) use the programmable API. The hard rule: zero-code convenience must never constrain the 20%.

Decision 2: Protocol Normalization with Provider Pass-Through

While OpenAI's Chat Completions format is the de-facto standard, "compatible" means different things across providers. KAIHE applies core protocol normalization with a provider_params field for model-specific features — ensuring frictionless common use while preserving advanced capabilities.

Decision 3: Cost-Optimal vs. Quality-Optimal Routing

"Best model for the task" is a dual optimization problem. KAIHE lets users define priority per task type: quality-first, cost-first, or balanced. The gateway dynamically routes based on priority and real-time availability.

Decision 4: Synchronous Gateway + Asynchronous Queue

Two channels: synchronous for real-time (chat), asynchronous with webhook callbacks for batch processing (document analysis). The async channel also enables post-processing nodes: content safety review, format conversion, audit logging.

Decision 5: Three-Tier Data Isolation

Level 1 (non-sensitive): unrestricted routing. Level 2 (sensitive): route only to providers meeting data privacy commitments. Level 3 (classified): local private deployment only. The gateway automatically classifies and routes — every decision is auditable.

These five decisions define KAIHE not as a simple proxy, but as an AI infrastructure layer that makes model capability as accessible as electricity — infrastructure, not a technical asset requiring dedicated teams.