Generation
APIs for sending LLM requests, registering tools into the global tool registry, and resolving connection configuration.
Sending LLM Requests
The recommended API is context.generateTask — a one-stop function that handles profile resolution, world-info activation, prompt assembly, dispatch, and response normalization in a single call. Built-in extensions (search-tools, completion-preset-assistant, character-editor-assistant, memory-graph, orchestrator) all route through it. Third-party extensions should use it too instead of stitching together sendOpenAIRequest + buildPresetAwarePromptMessages + connectionProfiles.resolve themselves.
Why one API
Manual stitching means every extension reimplements profile resolution, world-info activation, family dispatch (openai vs kobold/novel/textgen), and response parsing. generateTask consolidates all of that and returns a normalized result shape regardless of the underlying API family.
Quick Start
For a plain text request that respects the active prompt preset, character card, and chat world info:
```js
const context = Luker.getContext();

const result = await context.generateTask({
  taskMessages: [
    { role: 'system', content: 'You are a translation assistant.' },
    { role: 'user', content: 'Translate this text into French: hello world.' },
  ],
  worldInfoSource: 'chat', // activate WI from current chat history
  abortSignal: controller.signal,
});
console.log(result.assistantText);
```
Option Reference
```ts
context.generateTask({
  taskMessages: Array<{role, content, ...}>,        // required: system / user / assistant / tool turns
  includeCharacterCard?: boolean = true,            // include character card in the envelope
  worldInfoSource?: 'none' | 'task' | 'chat' | 'custom' = 'none',
  customWorldInfoMessages?: Array | null = null,    // required when worldInfoSource is 'custom'
  runtimeWorldInfo?: object | null = null,          // pre-resolved snapshot; short-circuits resolution
  forceWorldInfoResimulate?: boolean = false,
  worldInfoType?: string = 'quiet',
  apiPresetName?: string = '',                      // connection profile name (e.g. 'claude')
  llmPresetName?: string = '',                      // chat completion preset name (e.g. 'low-temp')
  tools?: Array | null = null,                      // OpenAI-style tool definitions
  toolChoice?: 'auto' | 'none' | object = 'auto',
  jsonSchema?: object | null = null,                // for structured-output mode (mutually exclusive with tools)
  functionCallMode?: 'auto' | 'native' | 'prompt_xml' | 'prompt_json' = 'auto',
  functionCallOptions?: object | null = null,       // e.g. { protocolStyle, requiredFunctionName }
  abortSignal?: AbortSignal | null = null,
  substituteMacros?: boolean = true,                // resolve {{...}} in taskMessages.content; opt out for authoring flows
}): Promise<{
  assistantText: string,
  toolCalls: Array<{ name, args, raw }>,
  jsonData: any,            // populated when jsonSchema mode succeeds
  reasoning: string | null,
  finishReason: string | null,
  usage: object | null,
  raw: any,                 // sender-specific raw response (for advanced inspection)
}>
```
worldInfoSource modes
| Value | Meaning |
|---|---|
| 'none' (default) | Skip world-info activation. Use this when you've already pre-resolved runtimeWorldInfo, or your task doesn't need WI at all. |
| 'task' | Activate WI based on taskMessages. Use when the task itself drives WI matching. |
| 'chat' | Activate WI based on the current chat history (uses fallbackToCurrentChat: true internally). |
| 'custom' | Activate WI based on customWorldInfoMessages you supply explicitly. |
If you already have a resolved WI snapshot (e.g., cached across retries), pass runtimeWorldInfo directly with worldInfoSource: 'none' to skip re-resolution.
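When retries are in play, resolving world info once and reusing the snapshot avoids repeated activation. A minimal sketch under assumptions: `resolveWI` and `generate` are hypothetical stand-ins for `context.resolveWorldInfoForMessages` and `context.generateTask`, wired in by your extension.

```javascript
// Sketch: resolve the WI snapshot once, then reuse it across retries.
// `resolveWI` / `generate` are hypothetical injection points, not real API names.
async function generateWithCachedWI(resolveWI, generate, taskMessages, maxAttempts = 3) {
  const runtimeWorldInfo = await resolveWI(taskMessages); // resolved exactly once
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await generate({
        taskMessages,
        worldInfoSource: 'none', // skip re-resolution on every attempt
        runtimeWorldInfo,        // reuse the cached snapshot
      });
    } catch (error) {
      lastError = error; // retry with the same snapshot
    }
  }
  throw lastError;
}
```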
Macro Substitution
When substituteMacros is true (the default), generateTask runs substituteParams over each task message's string content before assembly. This lets plugin requests resolve the same {{...}} macros the main chat path resolves — Luker built-ins ({{user}}, {{char}}, {{persona}}, {{datetime}}, {{random:a,b}}, ...) and any extension-registered macros that flow through the same engine (e.g. MagVarUpdate's {{getvar::}} family).
Side-effect macros ({{setvar::}}, {{addvar::}}, {{incvar::}}, {{decvar::}}, {{deletevar::}}) are stripped via skipSideEffects: true. Without this, every plugin request would re-fire those mutations and corrupt chat_metadata.variables on each dispatch.
When to opt out
Set substituteMacros: false for authoring flows where the AI's job is to read or edit text containing literal {{...}} placeholders that must remain unrendered. If {{user}} is replaced before the model sees it, the model can't reason about, diff, or edit the source template.
Concrete examples already in this codebase:
- Character card editor — the AI is editing card fields that include `{{user}}`/`{{char}}` placeholders.
- Lorebook diff analysis — the diff payload contains lorebook entries whose `{{...}}` placeholders are part of the comparison.
- Preset editor — the AI is editing prompt-preset bodies that ship `{{...}}` macros for end-user rendering.
- CardApp Studio AI — conversations may quote source-text fragments that the assistant is asked to modify.

Rule of thumb:
- The AI is producing content that will be shown to the end user → leave `substituteMacros: true`.
- The AI is reading or editing source text that contains `{{...}}` placeholders → set `substituteMacros: false`.
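To illustrate why `skipSideEffects` matters, here is a toy regex sketch (not the real substitution engine): side-effect macros are removed rather than executed, while read-only macros pass through untouched.

```javascript
// Illustrative only; the real substituteParams engine is more involved.
const SIDE_EFFECT_MACROS = ['setvar', 'addvar', 'incvar', 'decvar', 'deletevar'];

function stripSideEffectMacros(text) {
  // Matches {{setvar::name::value}}, {{incvar::name}}, and so on.
  const pattern = new RegExp(
    `\\{\\{(?:${SIDE_EFFECT_MACROS.join('|')})::[^}]*\\}\\}`,
    'g',
  );
  return text.replace(pattern, '');
}
```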
Tool Calls
```js
const result = await context.generateTask({
  taskMessages: [
    { role: 'system', content: 'Use tool calls only.' },
    { role: 'user', content: 'Search the web for: claude opus 4 release notes.' },
  ],
  worldInfoSource: 'none',
  tools: [{
    type: 'function',
    function: {
      name: 'search_web',
      description: 'Search the web for information.',
      parameters: {
        type: 'object',
        properties: { query: { type: 'string' } },
        required: ['query'],
      },
    },
  }],
  toolChoice: 'auto',
  functionCallMode: 'auto',
});

for (const call of result.toolCalls) {
  console.log(call.name, call.args); // args is already parsed (object)
}
```
`result.toolCalls` is always an array of `{ name, args, raw }`. `args` is the parsed arguments object — you don't need to `JSON.parse` it. `raw` is the original tool-call object from the sender (useful when you need the original id).
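A common follow-up is dispatching the parsed calls to local handler functions. A minimal sketch; the `dispatchToolCalls` name and `handlers` map are illustrative, not part of the API, but the `{ name, args, raw }` shape matches what generateTask returns.

```javascript
// Sketch: route each parsed tool call to a local async handler.
// Unknown tool names are skipped rather than treated as errors.
async function dispatchToolCalls(toolCalls, handlers) {
  const outputs = [];
  for (const call of toolCalls) {
    const handler = handlers[call.name];
    if (!handler) continue; // ignore tools we didn't offer
    outputs.push({ name: call.name, output: await handler(call.args) });
  }
  return outputs;
}
```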
Forced single function
When you want the model to invoke exactly one specific function:
```js
toolChoice: { type: 'function', function: { name: 'my_fn' } },
functionCallOptions: { requiredFunctionName: 'my_fn' },
```
Function call mode
| Mode | When to pick it |
|---|---|
| 'auto' (default) | Let the runtime decide based on the active connection profile. |
| 'native' | Force native tool-calling (e.g., OpenAI tools / Anthropic tool_use). |
| 'prompt_xml' | Embed tool definitions in the system prompt as XML — useful for models without native tool calling. |
| 'prompt_json' | Embed tool definitions as JSON in the prompt. |
Structured Output (JSON Schema)
For non-tool structured output:
```js
const result = await context.generateTask({
  taskMessages: [
    { role: 'system', content: 'Return user demographics as JSON.' },
    { role: 'user', content: 'Alice, 32, software engineer.' },
  ],
  worldInfoSource: 'none',
  jsonSchema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      age: { type: 'integer' },
      occupation: { type: 'string' },
    },
    required: ['name', 'age', 'occupation'],
    additionalProperties: false,
  },
});
console.log(result.jsonData); // { name: 'Alice', age: 32, occupation: 'software engineer' }
```
`tools` and `jsonSchema` are mutually exclusive — pass one or the other, never both.
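Although the runtime rejects schema violations with `json_schema_violation`, a cheap guard on `result.jsonData` can still be useful, for example when you relax `additionalProperties`. A hypothetical helper (not part of the API):

```javascript
// Sketch: verify that a parsed jsonData object carries all required keys.
// Returns false for null/non-object inputs and for any missing key.
function hasRequiredKeys(data, required) {
  return data !== null
    && typeof data === 'object'
    && required.every(key => data[key] !== undefined);
}
```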
Errors
All failures throw GenerateTaskError, exposed at context.GenerateTaskError:
```js
try {
  await context.generateTask({ ... });
} catch (error) {
  if (error instanceof context.GenerateTaskError) {
    console.warn('generateTask failed:', error.code, error.message);
    if (error.code === 'rate_limit') {
      // back off and retry
    }
  }
  throw error;
}
```
| code | Meaning |
|---|---|
| aborted | Request was aborted via abortSignal. |
| network | Network-level failure (DNS, ECONNREFUSED, etc.). |
| auth_missing | Authentication error (401, missing API key). |
| rate_limit | Rate limited (429). |
| invalid_input | The options object is malformed (e.g., tools and jsonSchema both set, worldInfoSource: 'custom' without customWorldInfoMessages). |
| unsupported_api | The resolved request API isn't supported by the runtime. |
| tool_call_parse | Model returned a tool call whose arguments failed JSON.parse. |
| json_schema_violation | jsonSchema mode failed validation. |
| no_response | Sender returned no usable content. |
| unknown | Catch-all for unclassified failures. |
error.cause carries the original underlying error when available; error.details carries diagnostic context (e.g., the rejected rawArgs for tool_call_parse).
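A typical consumer pattern is exponential backoff on `rate_limit` with an immediate rethrow for everything else. The sketch below only assumes errors carry a `code` field, so it doesn't depend on the real `GenerateTaskError` class; `generate` is a hypothetical stand-in for `context.generateTask`.

```javascript
// Sketch: retry only rate_limit errors, doubling the delay each attempt.
async function generateWithBackoff(generate, options, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await generate(options);
    } catch (error) {
      const retryable = error && error.code === 'rate_limit';
      if (!retryable || attempt >= maxRetries) throw error; // give up
      await new Promise(r => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```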
End-to-End Example
A search agent that respects the user-selected connection profile, runs tool calls until the model finalizes, and aborts cleanly:
```js
const context = Luker.getContext();
const settings = extension_settings.my_search_agent;

const result = await context.generateTask({
  taskMessages: [
    { role: 'system', content: 'You are a search agent. Use search_web only.' },
    { role: 'user', content: userQuery },
  ],
  worldInfoSource: 'chat',                        // activate WI from current chat
  apiPresetName: settings.connectionProfileName,  // user-selected, e.g. 'claude'
  llmPresetName: settings.presetName,             // user-selected, e.g. 'low-temp'
  tools: [searchWebTool],
  toolChoice: 'auto',
  functionCallMode: 'auto',
  abortSignal: controller.signal,
});

if (result.toolCalls.length === 0) {
  return { text: result.assistantText, calls: [] };
}
return {
  text: result.assistantText,
  calls: result.toolCalls.map(c => ({ name: c.name, args: c.args })),
};
```
Migration Cookbook
If your extension was previously calling sendOpenAIRequest directly, here's how to translate.
Mapping
| Old | New |
|---|---|
| import { sendOpenAIRequest } from '../../openai.js' | use context.generateTask (no import needed) |
| import { resolveChatCompletionRequestProfile } from '../connection-manager/profile-resolver.js' | drop — pass apiPresetName to generateTask |
| import { extractAllFunctionCalls, getResponseMessageContent } from '../function-call-runtime.js' | drop — read result.toolCalls and result.assistantText |
| context.buildPresetAwarePromptMessages({ messages, envelopeOptions, runtimeWorldInfo }) | drop — generateTask assembles internally |
| responseData = await sendOpenAIRequest('quiet', msgs, signal, { llmPresetName, apiSettingsOverride, tools, toolChoice, requestScope: 'extension_internal', functionCallOptions }) | result = await context.generateTask({ taskMessages, llmPresetName, apiPresetName, tools, toolChoice, functionCallOptions, abortSignal }) |
| calls = extractAllFunctionCalls(responseData, allowedNames) | calls = result.toolCalls.filter(c => allowedNames.has(c.name)) |
| assistantText = getResponseMessageContent(responseData) | assistantText = result.assistantText |
Before / After
Before (manual stitching):
```js
import { sendOpenAIRequest } from '../../openai.js';
import { resolveChatCompletionRequestProfile } from '../connection-manager/profile-resolver.js';
import { extractAllFunctionCalls } from '../function-call-runtime.js';

const { apiSettingsOverride, requestApi } = resolveChatCompletionRequestProfile({
  profileName: settings.connectionProfileName,
  defaultApi: context.mainApi || 'openai',
  defaultSource: context.chatCompletionSettings?.chat_completion_source || '',
});
const worldInfo = await context.resolveWorldInfoForMessages(messages, {
  type: 'quiet',
  fallbackToCurrentChat: true,
});
const promptMessages = context.buildPresetAwarePromptMessages({
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userPrompt },
  ],
  envelopeOptions: {
    includeCharacterCard: true,
    api: settings.presetName ? 'openai' : requestApi,
    promptPresetName: settings.presetName,
  },
  promptPresetName: settings.presetName,
  runtimeWorldInfo: worldInfo,
});
const responseData = await sendOpenAIRequest('quiet', promptMessages, abortSignal, {
  tools,
  toolChoice: 'auto',
  replaceTools: true,
  llmPresetName: settings.presetName,
  apiSettingsOverride,
  requestScope: 'extension_internal',
  functionCallOptions: { protocolStyle: TOOL_PROTOCOL_STYLE.JSON_SCHEMA },
});
const calls = extractAllFunctionCalls(responseData, allowedNames);
```
After (generateTask):
```js
const result = await context.generateTask({
  taskMessages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: userPrompt },
  ],
  includeCharacterCard: true,
  worldInfoSource: 'chat',
  apiPresetName: settings.connectionProfileName,
  llmPresetName: settings.presetName,
  tools,
  toolChoice: 'auto',
  functionCallMode: 'auto',
  functionCallOptions: { protocolStyle: TOOL_PROTOCOL_STYLE.JSON_SCHEMA },
  abortSignal,
});
const calls = result.toolCalls.filter(c => allowedNames.has(c.name));
```
Caveats
- `generateTask` always sets `requestScope: 'extension_internal'` internally — you no longer need to pass it.
- `replaceTools: true` is implied when you pass `tools`. There's no separate flag.
- The old `apiSettingsOverride` path is gone from the recommended API. If you still need a raw override (advanced), use `connectionProfiles.resolve` and the low-level dispatcher (see Low-Level Reference).
Tool Registration
Plugins can register tools into the global tool registry via getContext(). Registered tools appear in the main chat's tool calling flow — the model can invoke them during normal conversation.
```js
const context = Luker.getContext();

context.registerFunctionTool({
  name: 'my_plugin_tool',
  displayName: 'My Tool',
  description: 'Does something useful',
  parameters: {
    type: 'object',
    properties: {
      input: { type: 'string', description: 'Input text' },
    },
    required: ['input'],
  },
  action: async (args) => {
    return `Result for: ${args.input}`;
  },
  formatMessage: (args) => {
    return `Used my tool with input: ${args.input}`;
  },
  shouldRegister: async () => {
    return true;
  },
  stealth: false,
});
```
To remove a registered tool:
```js
context.unregisterFunctionTool('my_plugin_tool');
```
Utility methods:
| Method | Description |
|---|---|
| context.registerFunctionTool(tool) | Register a tool to the global registry |
| context.unregisterFunctionTool(name) | Remove a tool from the global registry |
| context.isToolCallingSupported() | Check if the current API/model supports tool calling |
| context.canPerformToolCalls(type) | Check if tool calls can be performed for a given request type |
Global vs Per-Request Tools
registerFunctionTool adds tools to the global registry — they are available in the main chat for the model to call. The tools parameter in generateTask provides tools for that specific request only and does not affect the global registry.
Low-Level Reference
These primitives back generateTask. Use them directly only when generateTask doesn't fit — for example, when you need streaming responses, custom retry logic that mutates the request between attempts, or integration with a non-standard pipeline.
Connection Profile Resolution
A connection profile is a bundle of connection configuration (API kind, model, secret, proxy, etc.) managed by Luker's Connection Manager. It's a separate concept from chat completion presets — profiles describe "where to connect", presets describe "how to generate". The two compose freely.
When a plugin needs to let the user pick a connection profile (e.g., a "which API config to use" dropdown), use context.connectionProfiles.list() to populate the UI:
```ts
context.connectionProfiles.list(): ConnectionProfile[]
```
connectionProfiles.resolve is deprecated for direct extension use
With generateTask, you pass the profile name (apiPresetName) and resolution happens internally. The resolve(...) method is kept for backwards compatibility but is no longer recommended for new code.
sendOpenAIRequest
Direct LLM dispatcher. generateTask calls this internally for OpenAI-family requests after handling envelope assembly, world-info activation, and profile resolution.
```js
import { sendOpenAIRequest } from '../../../openai.js';

const result = await sendOpenAIRequest('quiet', messages, signal, {
  tools,
  toolChoice: 'auto',
  replaceTools: true,
  llmPresetName,
  apiSettingsOverride,
  requestScope: 'extension_internal',
  functionCallOptions: { protocolStyle: 'json_schema' },
});
```
The first argument `'quiet'` means this is a background request that won't appear in the chat UI.
| Parameter | Purpose |
|---|---|
| llmPresetName | Load a chat completion preset to override generation parameters (temperature, top_p, frequency_penalty, max_tokens, etc.). Does not affect connection fields. |
| apiPresetName | Connection profile name. Resolved internally. If both this and apiSettingsOverride are provided, the explicit override wins. |
| apiSettingsOverride | Directly override connection settings with an object (typically from connectionProfiles.resolve). Takes precedence over apiPresetName. |
| requestScope | Set to 'extension_internal' to skip main chat CHAT_COMPLETION hooks. |
buildPresetAwarePromptMessages
Envelope assembly only — no dispatch. Useful when you need to inspect the assembled prompt without sending it (e.g., a "show me what would be sent" preview tool).
```js
const messages = context.buildPresetAwarePromptMessages({
  messages: [
    { role: 'system', content: taskSystemPrompt },
    { role: 'user', content: taskUserPrompt },
  ],
  envelopeOptions: {
    includeCharacterCard: true,
    api: 'openai',
    promptPresetName: llmPresetName,
  },
  promptPresetName: llmPresetName,
  runtimeWorldInfo: preResolvedWorldInfo,
});
```
It arranges your messages according to the active prompt preset's prompt_order, optionally injecting the character card and world info entries. See Presets & Prompts for assembly details.
generateRaw
```ts
generateRaw(params: {
  prompt?: string,
  api?: string | null,
  instructOverride?: boolean,
  quietToLoud?: boolean,
  systemPrompt?: string,
  responseLength?: number | null,
  trimNames?: boolean,
  prefill?: string,
  jsonSchema?: object | null,
  llmPresetName?: string,
  apiPresetName?: string,
  apiSettingsOverride?: object | null,
}): Promise<string>
```
Sends a literal prompt to the active backend without involving chat history, world info, character card, or extension prompts. Returns the post-processed response text. Use this for utility calls — titling, classification, rewrites — where you want full control over the input.
generateRawData
Same parameters as generateRaw but returns the raw API response object instead of a post-processed string. Use when you need fields like reasoning that generateRaw collapses away.
generateQuietPrompt
```ts
generateQuietPrompt(params: {
  quietPrompt?: string,
  quietToLoud?: boolean,
  skipWIAN?: boolean,
  quietImage?: string | null,
  quietName?: string | null,
  responseLength?: number | null,
  forceChId?: number | null,
  jsonSchema?: object | null,
  removeReasoning?: boolean,
  trimToSentence?: boolean,
}): Promise<string>
```
Runs a "quiet" generation that reuses the live chat context (history, persona, world info, etc.) but injects quietPrompt as a final user instruction. The result is returned silently unless quietToLoud is set.
| Parameter | Description |
|---|---|
| quietPrompt | The user-side instruction injected as the final user message |
| quietToLoud | If true, the result is shown in the chat instead of returned silently |
| skipWIAN | Skip world info / author's note injection |
| quietImage | Image data URL for multimodal calls |
| quietName | Sender name (defaults to 'System:') |
| responseLength | Override max tokens for this call only |
| forceChId | In group chats, bind to a particular member's persona |
| jsonSchema | Structured-output schema; result becomes serialized JSON |
| removeReasoning | Strip reasoning block per current template |
| trimToSentence | Clip to last full sentence |
generateRaw vs generateQuietPrompt
| | generateRaw | generateQuietPrompt |
|---|---|---|
| Chat context | Bypassed | Reused |
| World info | Off | Applied (unless skipWIAN) |
| Character card | Off | Applied |
| Use for | Independent utility (title, classify) | Sidebar / silent in-character generation |
sendStreamingRequest
```ts
sendStreamingRequest(type: string, data: object, options?: object): AsyncGenerator
```
Streaming counterpart to sendOpenAIRequest. Throws if the abort signal is already aborted. Fires event_types.GENERATION_BEFORE_API_REQUEST with stream: true so plugins can mutate the request before send. Returns an async generator.
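Consuming the generator usually looks like the sketch below. Two assumptions to verify against your runtime: the chunk shape is `{ text }`, and each yield carries the full text so far rather than a delta.

```javascript
// Sketch: drain a streaming generator, invoking onChunk per yield and
// returning the final accumulated text (assumed cumulative per chunk).
async function collectStream(stream, onChunk = () => {}) {
  let text = '';
  for await (const chunk of stream) {
    text = chunk.text; // assumed: each yield holds the full text so far
    onChunk(chunk);
  }
  return text;
}
```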
sendGenerationRequest
```ts
sendGenerationRequest(type: string, data: object, options?: object): Promise<object>
```
Non-streaming low-level dispatcher. Routes by mainApi to the appropriate sender (sendOpenAIRequest, generateHorde, or text-completion backends). Most plugins should not need this — generateTask covers the common cases.
stopGeneration
```ts
stopGeneration(): boolean
```
Cancels in-flight generations. Aborts the active controller, dismisses progress notifications, and returns whether anything was actually stopped.
streamingProcessor
```ts
context.streamingProcessor: StreamingProcessor | null
```
Live handle to the in-progress streaming generation. null when no stream is active. Read it to detect or inspect ongoing streams; do not mutate it directly.
Service Classes
Three class-as-namespace helpers expose request lifecycles without going through Generate. Use them when you need direct control over a chat-completion or text-completion backend (e.g., custom retry logic, custom token accounting).
ChatCompletionService
```ts
ChatCompletionService.createRequestData(custom: object): object
ChatCompletionService.sendRequest(data: object, extractData?: boolean, signal?: AbortSignal): Promise<{ content, reasoning } | object | AsyncGenerator>
ChatCompletionService.processRequest(requestData: object, options: object, extractData?: boolean, signal?: AbortSignal): Promise<...>
```
Wraps the /api/backends/chat-completions/generate endpoint. processRequest adds named-preset application via getPresetManager('openai').
Non-stream returns { content, reasoning } (or raw JSON when extractData: false). Stream returns an async-generator factory yielding { text, swipes, state }.
TextCompletionService
```ts
TextCompletionService.createRequestData({ stream, prompt, ... }): object
TextCompletionService.sendRequest(data, extractData?, signal?): Promise<...>
TextCompletionService.constructPrompt(prompt: ChatMessage[], instructPreset, instructSettings): string
TextCompletionService.processRequest(requestData, options, extractData?, signal?): Promise<...>
```
Mirror of ChatCompletionService for text-completion backends. constructPrompt formats a chat-message array into a single instruct-formatted string.
ConnectionManagerRequestService
```ts
ConnectionManagerRequestService.sendRequest(
  profileId: string,
  prompt: string | ChatMessage[],
  maxTokens: number,
  custom?: { stream?, signal?, extractData?, includePreset?, includeInstruct?, instructSettings? },
  overridePayload?: object,
): Promise<{ content, reasoning } | AsyncGenerator>
ConnectionManagerRequestService.constructPrompt(prompt, profileId, instructSettings?): string
ConnectionManagerRequestService.getSupportedProfiles(): Profile[]
ConnectionManagerRequestService.getProfile(profileId): Profile | null
ConnectionManagerRequestService.getProfileIcon(profileId?): string
ConnectionManagerRequestService.getAllowedTypes(): { openai, textgenerationwebui }
```
Sends a generation through a Connection Manager profile by id, regardless of which profile is currently active in the UI. Throws 'Connection Manager is not available' when the Connection Manager extension is disabled.
```js
const ctx = Luker.getContext();
const result = await ctx.ConnectionManagerRequestService.sendRequest(
  settings.profileId,
  [
    { role: 'system', content: 'You are a translator.' },
    { role: 'user', content: 'Hello, world!' },
  ],
  256,
  { signal: controller.signal },
);
console.log(result.content);
```
generateTask vs Service classes
generateTask covers profile resolution + envelope assembly + WI activation + family dispatch in a single call. Use a Service class only when you need explicit control over message construction (e.g., raw text-completion strings) or you want to bypass envelope/WI entirely.
Response Helpers
extractMessageFromData
```ts
extractMessageFromData(data: object | string): string
```
Extracts the assistant text from a backend response object. Handles the multiple shapes returned by different backends (OpenAI, Anthropic, Cohere, KoboldAI, NovelAI, etc.). Returns the input unchanged when given a string.
Use this when working with sendGenerationRequest or service-class outputs that are not pre-extracted.
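To show the kind of normalization involved, here is an illustrative toy (not the real implementation) that handles just two common response shapes plus the string passthrough:

```javascript
// Sketch: normalize assistant text from a couple of backend response shapes.
// Real extractMessageFromData covers many more backends than shown here.
function extractTextSketch(data) {
  if (typeof data === 'string') return data; // already extracted
  if (data?.choices?.[0]?.message?.content !== undefined) {
    return data.choices[0].message.content;  // OpenAI-style
  }
  if (Array.isArray(data?.content)) {
    const block = data.content.find(b => b.type === 'text'); // Anthropic-style
    return block ? block.text : '';
  }
  return '';
}
```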
getChatCompletionModel
```ts
getChatCompletionModel(): string
```
Returns the model identifier currently selected for chat completion (e.g., 'claude-opus-4-7', 'gpt-4o'). Reads from the active connection profile's settings.