Unified Generation Layer

The Unified Generation Layer is a backend architecture improvement introduced by Luker. It consolidates the generation logic previously scattered across individual API endpoints into one shared module, so that multiple backends are handled through a single wrapper.

Problem Background

In SillyTavern, different AI backends (OpenAI, Anthropic, Google, Kobold, etc.) each have independent code paths. Each backend endpoint file independently handles request construction, streaming response parsing, error handling, and other logic. This leads to the following issues:

Duplicated code: similar streaming and error-retry logic is repeated across multiple files
Inconsistent behavior: token statistics and error response formats differ between backends
Hard to extend: adding a new backend means implementing the complete request/response processing chain from scratch
No unified metering: there is no way to track token usage across backends

Solution

Luker introduces a Unified Generation Layer endpoint (/api/backends/luker-generation) as the primary path for the frontend to initiate AI generation requests. The endpoint receives a generation request from the frontend and forwards it to the corresponding upstream API based on the current Chat Completion Source. Streaming response handling, token metering, generation acknowledgment, and persistence all happen within this single processing pipeline.

Multi-Backend Unified Wrapping

The Unified Generation Layer supports unified processing for the following backends:

OpenAI and its compatible APIs
Anthropic (Claude)
Google (Gemini / Vertex AI)
Kobold / TabbyAPI
Other Chat Completion compatible backends

The frontend sends generation requests to the Unified Generation Layer instead of calling each backend's endpoint directly, and the layer routes each request to the correct upstream service. The individual backend endpoints (chat-completions.js, kobold.js, etc.) still exist and integrate the Request Inspector on their own, but the Unified Generation Layer provides a more complete processing pipeline.
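The routing step can be pictured as a lookup from the Chat Completion Source to an upstream handler. The following is a minimal sketch under stated assumptions, not Luker's actual code: the `chat_completion_source` field name, the source keys, and the `routeGeneration` function are all illustrative.

```javascript
// Hypothetical routing table. The keys stand in for Chat Completion Source
// values; each handler only tags the request so the routing decision is visible.
const upstreamHandlers = new Map([
    ['openai', (request) => ({ upstream: 'openai', request })],
    ['claude', (request) => ({ upstream: 'anthropic', request })],
    ['makersuite', (request) => ({ upstream: 'google', request })],
]);

// Pick the upstream handler for a request, or fail loudly for unknown sources.
function routeGeneration(request) {
    const source = request.chat_completion_source; // assumed field name
    const handler = upstreamHandlers.get(source);
    if (!handler) {
        throw new Error(`Unsupported chat completion source: ${source}`);
    }
    return handler(request);
}
```

With a table like this, supporting another backend would mean adding one entry rather than a new end-to-end code path.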

Shared Token Metering

When a generation request ends, the Unified Generation Layer automatically reports the outcome to the Request Inspector:

After streaming completes, calls completeInspectionFromStream to record token usage from stream events
On generation failure, calls failInspection to record error information
On generation abort, calls abortInspection to record the interruption

Note: startInspection is called by the backend endpoint (e.g. chat-completions.js), not by the generation layer itself.

This ensures that regardless of which backend is used, token consumption is accurately tracked.
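The three outcomes above map naturally onto a single dispatch step. The sketch below is only an illustration: the method names come from this page, but their argument lists, the `outcome` object, and the recording stand-in for the Request Inspector are assumptions.

```javascript
// Stand-in for the Request Inspector that records calls instead of persisting
// them. Only the three method names are taken from the documentation; the
// parameters are hypothetical.
function createRecordingInspector() {
    const calls = [];
    return {
        calls,
        completeInspectionFromStream: (id, events) => calls.push(['complete', id, events.length]),
        failInspection: (id, error) => calls.push(['fail', id, error.message]),
        abortInspection: (id) => calls.push(['abort', id]),
    };
}

// Report how a generation ended. `outcome.status` is a hypothetical field.
function settleInspection(inspector, inspectionId, outcome) {
    switch (outcome.status) {
        case 'completed':
            inspector.completeInspectionFromStream(inspectionId, outcome.events);
            break;
        case 'failed':
            inspector.failInspection(inspectionId, outcome.error);
            break;
        case 'aborted':
            inspector.abortInspection(inspectionId);
            break;
        default:
            throw new Error(`Unknown outcome status: ${outcome.status}`);
    }
}
```

Funnelling every exit path through one function like this is what makes it hard for a backend to finish a request without its usage or failure being recorded.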

Unified Streaming Processing

For SSE streaming responses, the Unified Generation Layer provides shared stream parsing logic:

Parses the SSE event formats used by different backends
Extracts token usage information from streaming events
Handles stream interruption and recovery in one place
Works with the WebSocket Proxy to support stream offset recovery
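As a rough illustration of the first two points, the sketch below parses a fully buffered SSE body and pulls out the last usage object it sees. It assumes OpenAI-style events (JSON payloads on `data:` lines, a `[DONE]` sentinel, and an optional `usage` field); the function names are hypothetical, and a real implementation would also need to handle events split across network chunks and the other backends' event shapes.

```javascript
// Parse a complete SSE body into an array of JSON payloads.
// Events are separated by blank lines; only `data:` fields are considered.
function parseSseEvents(text) {
    const events = [];
    for (const block of text.split(/\r?\n\r?\n/)) {
        const dataLines = block
            .split(/\r?\n/)
            .filter((line) => line.startsWith('data:'))
            .map((line) => line.slice(5).replace(/^ /, ''));
        if (dataLines.length === 0) continue;
        const data = dataLines.join('\n');
        if (data === '[DONE]') continue; // end-of-stream sentinel, not JSON
        try {
            events.push(JSON.parse(data));
        } catch {
            // Ignore payloads that are not valid JSON in this sketch.
        }
    }
    return events;
}

// Return the last `usage` object found in the events, or null if none.
function extractUsage(events) {
    let usage = null;
    for (const event of events) {
        if (event && typeof event.usage === 'object' && event.usage !== null) {
            usage = event.usage;
        }
    }
    return usage;
}
```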

Unified Error Handling

Different backends return errors in various formats. The Unified Generation Layer normalizes them into a consistent error response structure, simplifying the frontend's error handling logic.
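A normalizer along these lines could cover several common upstream shapes. This is a sketch under assumptions: the output structure (`error`, `status`, `type`, `message`) and the function name are invented for illustration, and the input shapes are typical examples of nested `error` or `detail` objects rather than an exhaustive list of what each backend returns.

```javascript
// Collapse differently shaped upstream error payloads into one structure.
// The returned shape is hypothetical, not the layer's documented format.
function normalizeUpstreamError(status, payload) {
    // Many APIs nest details under `error` or `detail`; fall back to the payload itself.
    const inner = payload && typeof payload === 'object'
        ? (payload.error ?? payload.detail ?? payload)
        : null;

    let message = 'Unknown upstream error';
    if (typeof payload === 'string') {
        message = payload;
    } else if (typeof inner === 'string') {
        message = inner;
    } else if (inner && typeof inner.message === 'string') {
        message = inner.message;
    } else if (inner && typeof inner.msg === 'string') {
        message = inner.msg;
    }

    const type = inner && typeof inner === 'object'
        ? (inner.type ?? inner.status ?? 'upstream_error')
        : 'upstream_error';

    return { error: true, status, type: String(type), message };
}
```

The frontend then only has to read `message` and `type`, whichever backend produced the failure.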

Architecture Relationship

NOTE

The Unified Generation Layer is an internal module. Users do not need to be aware of its existence; they simply get the consistent behavior it provides across backends.

Relationship with Other Modules

Request Inspector: the Unified Generation Layer is the primary caller of the Request Inspector
Auth & Quota: the storage quota middleware runs before the Unified Generation Layer and intercepts over-quota requests
chats.js: after generation completes, the layer triggers the acknowledge-generation flow to associate generation results with chat records
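The ordering between the quota middleware and the generation layer can be sketched as a short-circuiting pipeline. Everything here is hypothetical (the function names, the request fields, and the status code used for an over-quota rejection); it only illustrates that a rejected request never reaches the generation step.

```javascript
// Run steps in order; the first step that returns a response ends the pipeline.
function runPipeline(steps, request) {
    for (const step of steps) {
        const response = step(request);
        if (response) return response;
    }
    return { status: 404, body: 'No handler produced a response' };
}

// Hypothetical quota check: reject when stored bytes exceed the allowance.
const quotaGuard = (request) =>
    request.usedBytes > request.quotaBytes
        ? { status: 413, body: 'Storage quota exceeded' } // status code is an assumption
        : null;

// Placeholder for the generation layer itself.
const generationLayer = (request) => ({ status: 200, body: `generated for ${request.user}` });
```

Usage: `runPipeline([quotaGuard, generationLayer], { user: 'alice', usedBytes: 10, quotaBytes: 100 })` reaches the generation step, while a request with `usedBytes` above `quotaBytes` is answered by the quota check alone.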
