loadModel()
Loads a machine learning model from a local path, remote URL, or Hyperdrive key.
// Load new model
function loadModel(options: LoadModelOptions, rpcOptions?: RPCOptions): Promise<string>;
// Hot-reload config on an already-loaded model
function loadModel(options: ReloadConfigOptions, rpcOptions?: RPCOptions): Promise<string>;
Supports multiple model types: LLM, Whisper (speech recognition), Parakeet (NVIDIA NeMo transcription), embeddings, NMT (translation), TTS, and OCR. Handles local file paths, HTTP/HTTPS URLs, Hyperdrive URLs (pear://), and registry URLs.
When onProgress is provided, streaming is used for real-time download progress. Otherwise, a simple request-response pattern is used.
Parameters
| Name | Type | Required? | Description |
|---|---|---|---|
| options | LoadModelOptions | ReloadConfigOptions | ✓ | Configuration for loading or hot-reloading a model |
| rpcOptions | RPCOptions | ✗ | Optional RPC transport options |
LoadModelOptions
Common fields present in all variants:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| modelSrc | string | ModelDescriptor | ✓ | — | Model source — local path, HTTP(S) URL, Hyperdrive pear:// URL, registry URL, or a model constant object |
| modelType | string | ✓ | — | The type of model — see model type variants |
| modelConfig | object | ✗ | {} | Model-specific configuration (varies by modelType) |
| seed | boolean | ✗ | false | Whether to seed the model on Hyperdrive after download |
| delegate | Delegate | ✗ | — | Delegation configuration for remote inference |
| onProgress | (progress: ModelProgressUpdate) => void | ✗ | — | Callback for real-time download progress |
| logger | Logger | ✗ | — | Logger instance — model operation logs are forwarded to this logger |
Delegate
Optional delegation configuration for remote (P2P) inference:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| topic | string | ✓ | — | P2P topic for delegation |
| providerPublicKey | string | ✓ | — | Provider's public key |
| timeout | number | ✗ | — | Timeout in milliseconds (min 100) |
| fallbackToLocal | boolean | ✗ | false | Whether to fall back to local inference if delegation fails |
| forceNewConnection | boolean | ✗ | false | Force a new connection to the provider |
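The delegate fields above can be combined with a normal load like so. This is a minimal sketch: the topic string, provider key, and model URL are placeholders, not real values.

```typescript
// Sketch: a LoadModelOptions object with P2P delegation enabled.
// topic, providerPublicKey, and modelSrc below are placeholders.
const delegatedLoad = {
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 },
  delegate: {
    topic: "inference-topic",            // placeholder P2P topic
    providerPublicKey: "<provider-key>", // placeholder public key
    timeout: 5000,                       // must be at least 100 ms
    fallbackToLocal: true,               // run locally if delegation fails
  },
};

// Hypothetical usage (requires the SDK):
// const modelId = await loadModel(delegatedLoad);
```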
ReloadConfigOptions
Hot-reload configuration on an already-loaded model without reloading the model weights. Currently supported for Whisper models only.
| Field | Type | Required? | Description |
|---|---|---|---|
| modelId | string | ✓ | The ID of an existing loaded model (16-char hex) |
| modelType | string | ✓ | The type of model (must match the loaded model) |
| modelConfig | object | ✓ | New configuration to apply |
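As a sketch of the hot-reload variant, the options object below targets an already-loaded Whisper model. The model ID is a placeholder matching the documented 16-char hex format.

```typescript
// Sketch: hot-reloading the config of an already-loaded Whisper model.
// "a1b2c3d4e5f60718" is a placeholder 16-character hex model ID.
const reloadOptions = {
  modelId: "a1b2c3d4e5f60718",
  modelType: "whisper",
  modelConfig: { language: "de", translate: true },
};

// The model ID must be 16 hex characters:
const isValidId = /^[0-9a-f]{16}$/.test(reloadOptions.modelId);

// Hypothetical usage (requires the SDK):
// await loadModel(reloadOptions); // weights stay loaded, only config changes
```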
Model type variants
The modelType field determines which variant of modelConfig is accepted.
"llm"
All LLM-specific fields live inside modelConfig. See LLM modelConfig for the full reference.
"whisper"
All Whisper-specific fields live inside modelConfig. See Whisper modelConfig for the full reference.
"parakeet"
NVIDIA NeMo Parakeet models for speech recognition. modelConfig is required.
See Parakeet modelConfig for the full reference.
"embeddings"
All embeddings fields live inside modelConfig. See Embeddings modelConfig for the full reference.
"nmt"
modelConfig is required and is a discriminated union on engine. See NMT modelConfig for the full reference.
"tts"
modelConfig is required and is a discriminated union on ttsEngine:
Chatterbox engine (ttsEngine: "chatterbox"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "chatterbox" | ✓ | Engine discriminator |
| language | "en" | "es" | "de" | "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ModelDescriptor | ✓ | Tokenizer model source |
| ttsSpeechEncoderSrc | string | ModelDescriptor | ✓ | Speech encoder model source |
| ttsEmbedTokensSrc | string | ModelDescriptor | ✓ | Embed tokens model source |
| ttsConditionalDecoderSrc | string | ModelDescriptor | ✓ | Conditional decoder model source |
| ttsLanguageModelSrc | string | ModelDescriptor | ✓ | Language model source |
| referenceAudioSrc | string | ModelDescriptor | ✓ | Reference WAV file for voice cloning |
Supertonic engine (ttsEngine: "supertonic"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "supertonic" | ✓ | Engine discriminator |
| language | "en" | "es" | "de" | "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ModelDescriptor | ✓ | Tokenizer model source |
| ttsTextEncoderSrc | string | ModelDescriptor | ✓ | Text encoder model source |
| ttsLatentDenoiserSrc | string | ModelDescriptor | ✓ | Latent denoiser model source |
| ttsVoiceDecoderSrc | string | ModelDescriptor | ✓ | Voice decoder model source |
| ttsVoiceSrc | string | ModelDescriptor | ✓ | Voice .bin file source |
| ttsSpeed | number | ✗ | Speech speed multiplier |
| ttsNumInferenceSteps | number | ✗ | Number of inference steps |
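Because modelConfig is a discriminated union on ttsEngine, all of an engine's source fields must be supplied together. A minimal Supertonic sketch, with placeholder URLs:

```typescript
// Sketch: loading a Supertonic TTS model. All URLs are placeholders;
// every *Src field shown is required for this engine.
const ttsLoad = {
  modelSrc: "https://example.com/supertonic/model.onnx", // placeholder
  modelType: "tts",
  modelConfig: {
    ttsEngine: "supertonic" as const, // discriminator selects this variant
    language: "en" as const,
    ttsTokenizerSrc: "https://example.com/supertonic/tokenizer.onnx",
    ttsTextEncoderSrc: "https://example.com/supertonic/text-encoder.onnx",
    ttsLatentDenoiserSrc: "https://example.com/supertonic/denoiser.onnx",
    ttsVoiceDecoderSrc: "https://example.com/supertonic/decoder.onnx",
    ttsVoiceSrc: "https://example.com/supertonic/voice.bin",
    ttsSpeed: 1.0,
  },
};

// Hypothetical usage (requires the SDK):
// const modelId = await loadModel(ttsLoad);
```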
"ocr"
All OCR-specific fields live inside modelConfig. See OCR modelConfig for the full reference.
Custom plugin
Any modelType string that is not a built-in type. modelConfig accepts Record<string, unknown>.
modelConfig reference
LLM modelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| ctx_size | number | 1024 | Context window size |
| device | string | "gpu" | Device to use |
| gpu_layers | number | 99 | Number of layers offloaded to GPU |
| system_prompt | string | "You are a helpful assistant." | System prompt |
| temp | number | — | Temperature (0–2) |
| top_p | number | — | Top-p sampling (0–1) |
| top_k | number | — | Top-k sampling (0–128) |
| seed | number | — | Random seed |
| predict | number | — | Max tokens to predict. -1 = until stop token, -2 = until context filled |
| lora | string | — | LoRA adapter path |
| no_mmap | boolean | — | Disable memory-mapped I/O |
| verbosity | 0 | 1 | 2 | 3 | — | Engine verbosity — use exported VERBOSITY constant |
| presence_penalty | number | — | Presence penalty |
| frequency_penalty | number | — | Frequency penalty |
| repeat_penalty | number | — | Repeat penalty |
| stop_sequences | string[] | — | Custom stop sequences |
| n_discarded | number | — | Number of discarded tokens |
| tools | boolean | — | Enable tool calling support |
| projectionModelSrc | string | ModelDescriptor | — | Projection model source for multimodal models |
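The sampling fields above compose freely. A sketch combining generation limits, sampling parameters, and stop sequences, with illustrative values:

```typescript
// Sketch: an LLM modelConfig mixing sampling and stop-sequence fields
// from the table above. All values are illustrative, not recommended defaults.
const llmConfig = {
  ctx_size: 4096,
  temp: 0.7,          // temperature, valid range 0–2
  top_p: 0.9,         // top-p sampling, valid range 0–1
  top_k: 40,          // top-k sampling, valid range 0–128
  predict: -1,        // generate until a stop token is produced
  repeat_penalty: 1.1,
  stop_sequences: ["</s>", "User:"],
};
```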
Whisper modelConfig
Common fields:
| Field | Type | Description |
|---|---|---|
| language | string | Language code (e.g., "en") |
| translate | boolean | Whether to translate to English |
| strategy | "greedy" | "beam_search" | Sampling strategy |
| temperature | number | Temperature |
| initial_prompt | string | Initial prompt for the decoder |
| detect_language | boolean | Auto-detect language |
| vad_params | object | VAD parameters — { threshold?, min_speech_duration_ms?, min_silence_duration_ms?, max_speech_duration_s?, speech_pad_ms?, samples_overlap? } |
| audio_format | "f32le" | "s16le" | Audio format |
| contextParams | object | Context parameters — { model?, use_gpu?, flash_attn?, gpu_device? } |
| miscConfig | object | Miscellaneous config — { caption_enabled? } |
| vadModelSrc | string | ModelDescriptor | VAD model source for voice activity detection |
Additional fields: n_threads, n_max_text_ctx, offset_ms, duration_ms, audio_ctx, no_context, no_timestamps, single_segment, print_special, print_progress, print_realtime, print_timestamps, token_timestamps, thold_pt, thold_ptsum, max_len, split_on_word, max_tokens, debug_mode, tdrz_enable, suppress_regex, suppress_blank, suppress_nst, length_penalty, temperature_inc, entropy_thold, logprob_thold, greedy_best_of, beam_search_beam_size. All optional. See whisperConfigSchema in the source for details.
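The nested object fields above (vad_params, contextParams) take the shapes shown in the table. A sketch with illustrative thresholds:

```typescript
// Sketch: a Whisper modelConfig with nested vad_params and contextParams,
// matching the object shapes in the table above. Threshold values are illustrative.
const whisperConfig = {
  language: "en",
  strategy: "greedy" as const,
  vad_params: {
    threshold: 0.5,
    min_speech_duration_ms: 250,
    min_silence_duration_ms: 100,
    speech_pad_ms: 30,
  },
  contextParams: { use_gpu: true, flash_attn: true },
};
```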
Parakeet modelConfig
modelConfig is required. Parakeet supports three variants, selected by the modelType field inside modelConfig (distinct from the top-level modelType, which is "parakeet"): "tdt" (default), "ctc", and "sortformer".
Runtime config:
| Field | Type | Default | Description |
|---|---|---|---|
| modelType | "tdt" | "ctc" | "sortformer" | "tdt" | Parakeet model variant |
| maxThreads | number | — | Maximum inference threads |
| useGPU | boolean | — | Use GPU acceleration |
| sampleRate | number | — | Audio sample rate |
| channels | number | — | Audio channels |
| captionEnabled | boolean | — | Enable caption mode |
| timestampsEnabled | boolean | — | Enable timestamps in output |
Model sources (all string | ModelDescriptor, all optional):
| Field | Description |
|---|---|
| parakeetEncoderSrc | TDT encoder model source |
| parakeetEncoderDataSrc | TDT encoder data source |
| parakeetDecoderSrc | TDT decoder model source |
| parakeetVocabSrc | TDT vocabulary source |
| parakeetPreprocessorSrc | TDT preprocessor source |
| parakeetCtcModelSrc | CTC model source |
| parakeetCtcModelDataSrc | CTC model data source |
| parakeetTokenizerSrc | CTC tokenizer source |
| parakeetSortformerSrc | Sortformer model source |
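The table above groups the source fields by variant. A small helper sketch (not part of the SDK) makes the grouping explicit:

```typescript
// Sketch: which optional source fields belong to each Parakeet variant,
// per the table above. sourcesFor is a hypothetical helper, not an SDK function.
type ParakeetVariant = "tdt" | "ctc" | "sortformer";

function sourcesFor(variant: ParakeetVariant): string[] {
  switch (variant) {
    case "tdt":
      return [
        "parakeetEncoderSrc", "parakeetEncoderDataSrc",
        "parakeetDecoderSrc", "parakeetVocabSrc", "parakeetPreprocessorSrc",
      ];
    case "ctc":
      return ["parakeetCtcModelSrc", "parakeetCtcModelDataSrc", "parakeetTokenizerSrc"];
    case "sortformer":
      return ["parakeetSortformerSrc"];
  }
}
```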
Embeddings modelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| gpuLayers | number | 99 | Number of layers offloaded to GPU |
| device | "gpu" | "cpu" | "gpu" | Device to use |
| batchSize | number | 1024 | Embedding batch size |
| pooling | "none" | "mean" | "cls" | "last" | "rank" | — | Pooling strategy |
| attention | "causal" | "non-causal" | — | Attention type |
| embdNormalize | number | — | Embedding normalization (integer) |
| flashAttention | "on" | "off" | "auto" | — | Flash attention toggle |
| mainGpu | number | "integrated" | "dedicated" | — | GPU device selection |
| verbosity | 0 | 1 | 2 | 3 | — | Engine verbosity — use exported VERBOSITY constant |
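A minimal embeddings config sketch using the enum-valued fields above, with illustrative choices:

```typescript
// Sketch: an embeddings modelConfig running mean pooling on CPU.
// Values are illustrative, not recommended settings.
const embeddingsConfig = {
  device: "cpu" as const,
  batchSize: 512,
  pooling: "mean" as const,
  attention: "non-causal" as const,
  flashAttention: "auto" as const,
};
```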
NMT modelConfig
Discriminated union on engine. Common generation parameters (all optional):
| Field | Type | Default | Description |
|---|---|---|---|
| mode | "full" | "full" | Translation mode |
| beamsize | number | 4 | Beam size |
| lengthpenalty | number | 1.0 | Length penalty |
| maxlength | number | 512 | Max output length |
| repetitionpenalty | number | 1.0 | Repetition penalty |
| norepeatngramsize | number | 0 | No-repeat n-gram size |
| temperature | number | 0.3 | Temperature |
| topk | number | 0 | Top-k sampling |
| topp | number | 1.0 | Top-p sampling |
Engine-specific:
- Opus: from/to accept "en" | "de" | "es" | "it" | "ru" | "ja"
- Bergamot: from/to accept 24 languages (en, ar, bg, ca, cs, de, es, et, fi, fr, hu, is, it, ja, ko, lt, lv, nl, pl, pt, ru, sk, sl, uk, zh). Additional fields: srcVocabSrc, dstVocabSrc, normalize, pivotModel
- IndicTrans: from/to accept 26 Indic language codes (e.g., "eng_Latn", "hin_Deva")
Bergamot pivotModel (optional) — for translation via an intermediate language:
| Field | Type | Required? | Description |
|---|---|---|---|
| modelSrc | string | ModelDescriptor | ✓ | Pivot model source |
| srcVocabSrc | string | ModelDescriptor | ✗ | Source vocabulary file |
| dstVocabSrc | string | ModelDescriptor | ✗ | Destination vocabulary file |
| normalize | number | ✗ | Normalization factor |
Plus all common generation parameters above.
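A Bergamot sketch that pivots through an intermediate model, using the documented field names. The language pair, vocab URLs, and pivot model URL are placeholders:

```typescript
// Sketch: a Bergamot NMT modelConfig with a pivot model, using the field
// names documented above. All URLs and the language pair are placeholders.
const nmtConfig = {
  engine: "bergamot",   // discriminator for the union
  from: "de",
  to: "fr",
  srcVocabSrc: "https://example.com/bergamot/src.vocab.spm",
  dstVocabSrc: "https://example.com/bergamot/dst.vocab.spm",
  pivotModel: {
    modelSrc: "https://example.com/bergamot/pivot.model.bin",
    normalize: 1.0,
  },
  // common generation parameters also apply:
  beamsize: 4,
  maxlength: 512,
};
```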
OCR modelConfig
| Field | Type | Description |
|---|---|---|
| langList | string[] | Languages to detect |
| useGPU | boolean | Use GPU acceleration |
| timeout | number | Timeout in milliseconds |
| pipelineMode | "easyocr" | "doctr" | OCR pipeline mode |
| magRatio | number | Magnification ratio for detection |
| defaultRotationAngles | number[] | Rotation angles to try |
| contrastRetry | boolean | Retry with contrast adjustment |
| lowConfidenceThreshold | number | Threshold for low-confidence filtering |
| recognizerBatchSize | number | Batch size for recognizer |
| decodingMethod | "ctc" | "attention" | Decoding method |
| straightenPages | boolean | Straighten pages before recognition |
| detectorModelSrc | string | ModelDescriptor | Detector model source |
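An OCR config sketch for the easyocr pipeline; values are illustrative and the detector URL is a placeholder:

```typescript
// Sketch: an OCR modelConfig for the easyocr pipeline. Values are
// illustrative; the detector model URL is a placeholder.
const ocrConfig = {
  langList: ["en", "de"],
  useGPU: true,
  pipelineMode: "easyocr" as const,
  magRatio: 1.5,
  defaultRotationAngles: [0, 90, 180, 270],
  contrastRetry: true,
  recognizerBatchSize: 8,
  decodingMethod: "ctc" as const,
  detectorModelSrc: "https://example.com/ocr/detector.onnx",
};
```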
ModelProgressUpdate
| Field | Type | Description |
|---|---|---|
| type | "modelProgress" | Event type |
| downloaded | number | Bytes downloaded so far |
| total | number | Total bytes expected |
| percentage | number | Download percentage |
| downloadKey | string | Unique download key (use with cancel()) |
| shardInfo | object | Shard progress (optional, for sharded models) |
| shardInfo.currentShard | number | Current shard index |
| shardInfo.totalShards | number | Total number of shards |
| shardInfo.shardName | string | Current shard file name |
| shardInfo.overallDownloaded | number | Total bytes downloaded across all shards |
| shardInfo.overallTotal | number | Total bytes across all shards |
| shardInfo.overallPercentage | number | Overall percentage across all shards |
| onnxInfo | object | ONNX multi-file progress (optional, for ONNX models) |
| onnxInfo.currentFile | string | Current file being downloaded |
| onnxInfo.fileIndex | number | Current file index |
| onnxInfo.totalFiles | number | Total number of files |
| onnxInfo.overallDownloaded | number | Total bytes downloaded across all files |
| onnxInfo.overallTotal | number | Total bytes across all files |
| onnxInfo.overallPercentage | number | Overall percentage across all files |
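An onProgress handler will typically prefer the overall shard percentage when shardInfo is present, since the top-level percentage covers only the current file. A sketch (formatProgress is a hypothetical helper, not part of the SDK):

```typescript
// Sketch: formatting a ModelProgressUpdate, preferring shard-level
// progress when present. formatProgress is a hypothetical helper.
interface ProgressLike {
  percentage: number;
  shardInfo?: {
    currentShard: number;
    totalShards: number;
    overallPercentage: number;
  };
}

function formatProgress(p: ProgressLike): string {
  if (p.shardInfo) {
    const { currentShard, totalShards, overallPercentage } = p.shardInfo;
    return `shard ${currentShard}/${totalShards}: ${overallPercentage}% overall`;
  }
  return `${p.percentage}% downloaded`;
}

// Hypothetical usage (requires the SDK):
// await loadModel({ modelSrc, modelType: "llm",
//   onProgress: (p) => console.log(formatProgress(p)) });
```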
Returns
Promise<string> — Resolves to the model ID (used to reference the model in subsequent API calls).
Throws
| Error | When |
|---|---|
| MODEL_LOAD_FAILED | Model loading fails |
| STREAM_ENDED_WITHOUT_RESPONSE | Streaming ends without a final response (when using onProgress) |
| INVALID_RESPONSE_TYPE | Response type does not match the expected "loadModel" |
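A sketch of distinguishing a load failure from other errors. It assumes the thrown error carries a string code matching the names in the table; adjust to the actual error shape your SDK version throws.

```typescript
// Sketch: classifying a loadModel error. Assumes the error exposes a
// string `code` matching the table above; this shape is an assumption.
function isModelLoadFailure(err: unknown): boolean {
  return err instanceof Error &&
    (err as Error & { code?: string }).code === "MODEL_LOAD_FAILED";
}

// Hypothetical usage (requires the SDK):
// try {
//   const modelId = await loadModel(options);
// } catch (err) {
//   if (isModelLoadFailure(err)) console.error("model failed to load");
//   else throw err; // unexpected errors propagate
// }
```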
Examples
// Local file path
const modelId = await loadModel({
modelSrc: "/home/user/models/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Remote URL with progress tracking
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../model.gguf",
modelType: "llm",
onProgress: (progress) => {
console.log(`Downloaded: ${progress.percentage}%`);
}
});
// Hyperdrive URL
const modelId = await loadModel({
modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Multimodal model with projection
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../main-model.gguf",
modelType: "llm",
modelConfig: {
ctx_size: 512,
projectionModelSrc: "https://huggingface.co/.../projection-model.gguf"
}
});
// Whisper with VAD model
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../whisper-model.gguf",
modelType: "whisper",
modelConfig: {
language: "en",
strategy: "greedy",
vadModelSrc: "https://huggingface.co/.../vad-model.bin"
}
});
// With logger forwarding
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");
const modelId = await loadModel({
modelSrc: "/path/to/model.gguf",
modelType: "llm",
logger
});