loadModel()
Loads a machine learning model from a local path, remote URL, or Hyperdrive key.
// Load new model
function loadModel(options: LoadModelOptions, rpcOptions?: RPCOptions): Promise<string>;
// Hot-reload config on an already-loaded model
function loadModel(options: ReloadConfigOptions, rpcOptions?: RPCOptions): Promise<string>;
Supports multiple model types: LLM, Whisper (speech recognition), Parakeet (NVIDIA NeMo transcription), embeddings, NMT (translation), TTS, and OCR. Handles local file paths, HTTP/HTTPS URLs, Hyperdrive URLs (pear://), and registry URLs.
When onProgress is provided, streaming is used for real-time download progress. Otherwise, a simple request-response pattern is used.
Parameters
| Name | Type | Required? | Description |
|---|---|---|---|
| options | LoadModelOptions | ReloadConfigOptions | ✓ | Configuration for loading or hot-reloading a model |
| rpcOptions | RPCOptions | ✗ | Optional RPC transport options |
LoadModelOptions
Common fields present in all variants:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| modelSrc | string | ModelDescriptor | ✓ | — | Model source — local path, HTTP(S) URL, Hyperdrive pear:// URL, registry URL, or a model constant object |
| modelType | string | ✓ | — | The type of model — see model type variants |
| modelConfig | object | ✗ | {} | Model-specific configuration (varies by modelType) |
| seed | boolean | ✗ | false | Whether to seed the model on Hyperdrive after download |
| delegate | Delegate | ✗ | — | Delegation configuration for remote inference |
| onProgress | (progress: ModelProgressUpdate) => void | ✗ | — | Callback for real-time download progress |
| logger | Logger | ✗ | — | Logger instance — model operation logs are forwarded to this logger |
Delegate
Optional delegation configuration for remote (P2P) inference:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| topic | string | ✓ | — | P2P topic for delegation |
| providerPublicKey | string | ✓ | — | Provider's public key |
| timeout | number | ✗ | — | Timeout in milliseconds (min 100) |
| fallbackToLocal | boolean | ✗ | false | Whether to fall back to local inference if delegation fails |
| forceNewConnection | boolean | ✗ | false | Force a new connection to the provider |
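The delegate fields above can be combined with a normal load like so. This is a minimal sketch: the topic string, provider key, and model URL are placeholders, not real values.

```typescript
// Sketch: a LoadModelOptions object with P2P delegation enabled.
// topic, providerPublicKey, and modelSrc below are placeholders.
const delegatedLoad = {
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 },
  delegate: {
    topic: "inference-topic",            // placeholder P2P topic
    providerPublicKey: "<provider-key>", // placeholder public key
    timeout: 5000,                       // must be at least 100 ms
    fallbackToLocal: true,               // run locally if delegation fails
  },
};

// Hypothetical usage (requires the SDK):
// const modelId = await loadModel(delegatedLoad);
```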
ReloadConfigOptions
Hot-reload configuration on an already-loaded model without reloading the model weights. Currently supported for Whisper models only.
| Field | Type | Required? | Description |
|---|---|---|---|
| modelId | string | ✓ | The ID of an existing loaded model (16-char hex) |
| modelType | string | ✓ | The type of model (must match the loaded model) |
| modelConfig | object | ✓ | New configuration to apply |
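As a sketch of the hot-reload variant, the options object below targets an already-loaded Whisper model. The model ID is a placeholder matching the documented 16-char hex format.

```typescript
// Sketch: hot-reloading the config of an already-loaded Whisper model.
// "a1b2c3d4e5f60718" is a placeholder 16-character hex model ID.
const reloadOptions = {
  modelId: "a1b2c3d4e5f60718",
  modelType: "whisper",
  modelConfig: { language: "de", translate: true },
};

// The model ID must be 16 hex characters:
const isValidId = /^[0-9a-f]{16}$/.test(reloadOptions.modelId);

// Hypothetical usage (requires the SDK):
// await loadModel(reloadOptions); // weights stay loaded, only config changes
```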
Model type variants
The modelType field determines which variant of modelConfig is accepted.
"llm"
All LLM-specific fields live inside modelConfig. See LLM modelConfig for the full reference.
"whisper"
All Whisper-specific fields live inside modelConfig. See Whisper modelConfig for the full reference.
"parakeet"
NVIDIA NeMo Parakeet models for speech recognition. modelConfig is required.
See Parakeet modelConfig for the full reference.
"embeddings"
All embeddings fields live inside modelConfig. See Embeddings modelConfig for the full reference.
"nmt"
modelConfig is required and is a discriminated union on engine. See NMT modelConfig for the full reference.
"tts"
modelConfig is required and is a discriminated union on ttsEngine:
Chatterbox engine (ttsEngine: "chatterbox"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "chatterbox" | ✓ | Engine discriminator |
| language | "en" | "es" | "de" | "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ModelDescriptor | ✓ | Tokenizer model source |
| ttsSpeechEncoderSrc | string | ModelDescriptor | ✓ | Speech encoder model source |
| ttsEmbedTokensSrc | string | ModelDescriptor | ✓ | Embed tokens model source |
| ttsConditionalDecoderSrc | string | ModelDescriptor | ✓ | Conditional decoder model source |
| ttsLanguageModelSrc | string | ModelDescriptor | ✓ | Language model source |
| referenceAudioSrc | string | ModelDescriptor | ✓ | Reference WAV file for voice cloning |
Supertonic engine (ttsEngine: "supertonic"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "supertonic" | ✓ | Engine discriminator |
| language | "en" | "es" | "de" | "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ModelDescriptor | ✓ | Tokenizer model source |
| ttsTextEncoderSrc | string | ModelDescriptor | ✓ | Text encoder model source |
| ttsLatentDenoiserSrc | string | ModelDescriptor | ✓ | Latent denoiser model source |
| ttsVoiceDecoderSrc | string | ModelDescriptor | ✓ | Voice decoder model source |
| ttsVoiceSrc | string | ModelDescriptor | ✓ | Voice .bin file source |
| ttsSpeed | number | ✗ | Speech speed multiplier |
| ttsNumInferenceSteps | number | ✗ | Number of inference steps |
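Because modelConfig is a discriminated union on ttsEngine, all of an engine's source fields must be supplied together. A minimal Supertonic sketch, with placeholder URLs:

```typescript
// Sketch: loading a Supertonic TTS model. All URLs are placeholders;
// every *Src field shown is required for this engine.
const ttsLoad = {
  modelSrc: "https://example.com/supertonic/model.onnx", // placeholder
  modelType: "tts",
  modelConfig: {
    ttsEngine: "supertonic" as const, // discriminator selects this variant
    language: "en" as const,
    ttsTokenizerSrc: "https://example.com/supertonic/tokenizer.onnx",
    ttsTextEncoderSrc: "https://example.com/supertonic/text-encoder.onnx",
    ttsLatentDenoiserSrc: "https://example.com/supertonic/denoiser.onnx",
    ttsVoiceDecoderSrc: "https://example.com/supertonic/decoder.onnx",
    ttsVoiceSrc: "https://example.com/supertonic/voice.bin",
    ttsSpeed: 1.0,
  },
};

// Hypothetical usage (requires the SDK):
// const modelId = await loadModel(ttsLoad);
```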
"ocr"
All OCR-specific fields live inside modelConfig. See OCR modelConfig for the full reference.
Custom plugin
Any modelType string that is not a built-in type. modelConfig accepts Record<string, unknown>.
modelConfig reference
LLM modelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| ctx_size | number | 1024 | Context window size |
| device | string | "gpu" | Device to use |
| gpu_layers | number | 99 | Number of layers offloaded to GPU |
| system_prompt | string | "You are a helpful assistant." | System prompt |
| temp | number | — | Temperature (0–2) |
| top_p | number | — | Top-p sampling (0–1) |
| top_k | number | — | Top-k sampling (0–128) |
| seed | number | — | Random seed |
| predict | number | — | Max tokens to predict. -1 = until stop token, -2 = until context filled |
| lora | string | — | LoRA adapter path |
| no_mmap | boolean | — | Disable memory-mapped I/O |
| verbosity | 0 | 1 | 2 | 3 | — | Engine verbosity — use exported VERBOSITY constant |
| presence_penalty | number | — | Presence penalty |
| frequency_penalty | number | — | Frequency penalty |
| repeat_penalty | number | — | Repeat penalty |
| stop_sequences | string[] | — | Custom stop sequences |
| n_discarded | number | — | Number of discarded tokens |
| tools | boolean | — | Enable tool calling support |
| projectionModelSrc | string | ModelDescriptor | — | Projection model source for multimodal models |
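The sampling fields above compose freely. A sketch combining generation limits, sampling parameters, and stop sequences, with illustrative values:

```typescript
// Sketch: an LLM modelConfig mixing sampling and stop-sequence fields
// from the table above. All values are illustrative, not recommended defaults.
const llmConfig = {
  ctx_size: 4096,
  temp: 0.7,          // temperature, valid range 0–2
  top_p: 0.9,         // top-p sampling, valid range 0–1
  top_k: 40,          // top-k sampling, valid range 0–128
  predict: -1,        // generate until a stop token is produced
  repeat_penalty: 1.1,
  stop_sequences: ["</s>", "User:"],
};
```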
Whisper modelConfig
Common fields:
| Field | Type | Description |
|---|---|---|
| language | string | Language code (e.g., "en") |
| translate | boolean | Whether to translate to English |
| strategy | "greedy" | "beam_search" | Sampling strategy |
| temperature | number | Temperature |
| initial_prompt | string | Initial prompt for the decoder |
| detect_language | boolean | Auto-detect language |
| vad_params | object | VAD parameters — { threshold?, min_speech_duration_ms?, min_silence_duration_ms?, max_speech_duration_s?, speech_pad_ms?, samples_overlap? } |
| audio_format | "f32le" | "s16le" | Audio format |
| contextParams | object | Context parameters — { model?, use_gpu?, flash_attn?, gpu_device? } |
| miscConfig | object | Miscellaneous config — { caption_enabled? } |
| vadModelSrc | string | ModelDescriptor | VAD model source for voice activity detection |
Additional fields: n_threads, n_max_text_ctx, offset_ms, duration_ms, audio_ctx, no_context, no_timestamps, single_segment, print_special, print_progress, print_realtime, print_timestamps, token_timestamps, thold_pt, thold_ptsum, max_len, split_on_word, max_tokens, debug_mode, tdrz_enable, suppress_regex, suppress_blank, suppress_nst, length_penalty, temperature_inc, entropy_thold, logprob_thold, greedy_best_of, beam_search_beam_size. All optional. See whisperConfigSchema in the source for details.
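The nested object fields above (vad_params, contextParams) take the shapes shown in the table. A sketch with illustrative thresholds:

```typescript
// Sketch: a Whisper modelConfig with nested vad_params and contextParams,
// matching the object shapes in the table above. Threshold values are illustrative.
const whisperConfig = {
  language: "en",
  strategy: "greedy" as const,
  vad_params: {
    threshold: 0.5,
    min_speech_duration_ms: 250,
    min_silence_duration_ms: 100,
    speech_pad_ms: 30,
  },
  contextParams: { use_gpu: true, flash_attn: true },
};
```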
Parakeet modelConfig
modelConfig is required. Parakeet supports three variants, selected by the modelType field inside modelConfig (distinct from the top-level modelType, which is "parakeet"): "tdt" (default), "ctc", and "sortformer".
Runtime config:
| Field | Type | Default | Description |
|---|---|---|---|
| modelType | "tdt" | "ctc" | "sortformer" | "tdt" | Parakeet model variant |
| maxThreads | number | — | Maximum inference threads |
| useGPU | boolean | — | Use GPU acceleration |
| sampleRate | number | — | Audio sample rate |
| channels | number | — | Audio channels |
| captionEnabled | boolean | — | Enable caption mode |
| timestampsEnabled | boolean | — | Enable timestamps in output |
Model sources (all string | ModelDescriptor, all optional):
| Field | Description |
|---|---|
| parakeetEncoderSrc | TDT encoder model source |
| parakeetEncoderDataSrc | TDT encoder data source |
| parakeetDecoderSrc | TDT decoder model source |
| parakeetVocabSrc | TDT vocabulary source |
| parakeetPreprocessorSrc | TDT preprocessor source |
| parakeetCtcModelSrc | CTC model source |
| parakeetCtcModelDataSrc | CTC model data source |
| parakeetTokenizerSrc | CTC tokenizer source |
| parakeetSortformerSrc | Sortformer model source |
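The table above groups the source fields by variant. A small helper sketch (not part of the SDK) makes the grouping explicit:

```typescript
// Sketch: which optional source fields belong to each Parakeet variant,
// per the table above. sourcesFor is a hypothetical helper, not an SDK function.
type ParakeetVariant = "tdt" | "ctc" | "sortformer";

function sourcesFor(variant: ParakeetVariant): string[] {
  switch (variant) {
    case "tdt":
      return [
        "parakeetEncoderSrc", "parakeetEncoderDataSrc",
        "parakeetDecoderSrc", "parakeetVocabSrc", "parakeetPreprocessorSrc",
      ];
    case "ctc":
      return ["parakeetCtcModelSrc", "parakeetCtcModelDataSrc", "parakeetTokenizerSrc"];
    case "sortformer":
      return ["parakeetSortformerSrc"];
  }
}
```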
Embeddings modelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| gpuLayers | number | 99 | Number of layers offloaded to GPU |
| device | "gpu" | "cpu" | "gpu" | Device to use |
| batchSize | number | 1024 | Embedding batch size |
| pooling | "none" | "mean" | "cls" | "last" | "rank" | — | Pooling strategy |
| attention | "causal" | "non-causal" | — | Attention type |
| embdNormalize | number | — | Embedding normalization (integer) |
| flashAttention | "on" | "off" | "auto" | — | Flash attention toggle |
| mainGpu | number | "integrated" | "dedicated" | — | GPU device selection |
| verbosity | 0 | 1 | 2 | 3 | — | Engine verbosity — use exported VERBOSITY constant |
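A minimal embeddings config sketch using the enum-valued fields above, with illustrative choices:

```typescript
// Sketch: an embeddings modelConfig running mean pooling on CPU.
// Values are illustrative, not recommended settings.
const embeddingsConfig = {
  device: "cpu" as const,
  batchSize: 512,
  pooling: "mean" as const,
  attention: "non-causal" as const,
  flashAttention: "auto" as const,
};
```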
NMT modelConfig
Discriminated union on engine. Common generation parameters (all optional):
| Field | Type | Default | Description |
|---|---|---|---|
| mode | "full" | "full" | Translation mode |
| beamsize | number | 4 | Beam size |
| lengthpenalty | number | 1.0 | Length penalty |
| maxlength | number | 512 | Max output length |
| repetitionpenalty | number | 1.0 | Repetition penalty |
| norepeatngramsize | number | 0 | No-repeat n-gram size |
| temperature | number | 0.3 | Temperature |
| topk | number | 0 | Top-k sampling |
| topp | number | 1.0 | Top-p sampling |
Engine-specific:
- Opus: from/to accept "en" | "de" | "es" | "it" | "ru" | "ja"
- Bergamot: from/to accept 24 languages (en, ar, bg, ca, cs, de, es, et, fi, fr, hu, is, it, ja, ko, lt, lv, nl, pl, pt, ru, sk, sl, uk, zh). Additional fields: srcVocabSrc, dstVocabSrc, normalize, pivotModel
- IndicTrans: from/to accept 26 Indic language codes (e.g., "eng_Latn", "hin_Deva")
Bergamot pivotModel (optional) — for translation via an intermediate language:
| Field | Type | Required? | Description |
|---|---|---|---|
| modelSrc | string | ModelDescriptor | ✓ | Pivot model source |
| srcVocabSrc | string | ModelDescriptor | ✗ | Source vocabulary file |
| dstVocabSrc | string | ModelDescriptor | ✗ | Destination vocabulary file |
| normalize | number | ✗ | Normalization factor |
Plus all common generation parameters above.
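A Bergamot sketch that pivots through an intermediate model, using the documented field names. The language pair, vocab URLs, and pivot model URL are placeholders:

```typescript
// Sketch: a Bergamot NMT modelConfig with a pivot model, using the field
// names documented above. All URLs and the language pair are placeholders.
const nmtConfig = {
  engine: "bergamot",   // discriminator for the union
  from: "de",
  to: "fr",
  srcVocabSrc: "https://example.com/bergamot/src.vocab.spm",
  dstVocabSrc: "https://example.com/bergamot/dst.vocab.spm",
  pivotModel: {
    modelSrc: "https://example.com/bergamot/pivot.model.bin",
    normalize: 1.0,
  },
  // common generation parameters also apply:
  beamsize: 4,
  maxlength: 512,
};
```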
OCR modelConfig
| Field | Type | Description |
|---|---|---|
| langList | string[] | Languages to detect |
| useGPU | boolean | Use GPU acceleration |
| timeout | number | Timeout in milliseconds |
| pipelineMode | "easyocr" | "doctr" | OCR pipeline mode |
| magRatio | number | Magnification ratio for detection |
| defaultRotationAngles | number[] | Rotation angles to try |
| contrastRetry | boolean | Retry with contrast adjustment |
| lowConfidenceThreshold | number | Threshold for low-confidence filtering |
| recognizerBatchSize | number | Batch size for recognizer |
| decodingMethod | "ctc" | "attention" | Decoding method |
| straightenPages | boolean | Straighten pages before recognition |
| detectorModelSrc | string | ModelDescriptor | Detector model source |
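An OCR config sketch for the easyocr pipeline; values are illustrative and the detector URL is a placeholder:

```typescript
// Sketch: an OCR modelConfig for the easyocr pipeline. Values are
// illustrative; the detector model URL is a placeholder.
const ocrConfig = {
  langList: ["en", "de"],
  useGPU: true,
  pipelineMode: "easyocr" as const,
  magRatio: 1.5,
  defaultRotationAngles: [0, 90, 180, 270],
  contrastRetry: true,
  recognizerBatchSize: 8,
  decodingMethod: "ctc" as const,
  detectorModelSrc: "https://example.com/ocr/detector.onnx",
};
```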
ModelProgressUpdate
| Field | Type | Description |
|---|---|---|
| type | "modelProgress" | Event type |
| downloaded | number | Bytes downloaded so far |
| total | number | Total bytes expected |
| percentage | number | Download percentage |
| downloadKey | string | Unique download key (use with cancel()) |
| shardInfo | object | Shard progress (optional, for sharded models) |
| shardInfo.currentShard | number | Current shard index |
| shardInfo.totalShards | number | Total number of shards |
| shardInfo.shardName | string | Current shard file name |
| shardInfo.overallDownloaded | number | Total bytes downloaded across all shards |
| shardInfo.overallTotal | number | Total bytes across all shards |
| shardInfo.overallPercentage | number | Overall percentage across all shards |
| onnxInfo | object | ONNX multi-file progress (optional, for ONNX models) |
| onnxInfo.currentFile | string | Current file being downloaded |
| onnxInfo.fileIndex | number | Current file index |
| onnxInfo.totalFiles | number | Total number of files |
| onnxInfo.overallDownloaded | number | Total bytes downloaded across all files |
| onnxInfo.overallTotal | number | Total bytes across all files |
| onnxInfo.overallPercentage | number | Overall percentage across all files |
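An onProgress handler will typically prefer the overall shard percentage when shardInfo is present, since the top-level percentage covers only the current file. A sketch (formatProgress is a hypothetical helper, not part of the SDK):

```typescript
// Sketch: formatting a ModelProgressUpdate, preferring shard-level
// progress when present. formatProgress is a hypothetical helper.
interface ProgressLike {
  percentage: number;
  shardInfo?: {
    currentShard: number;
    totalShards: number;
    overallPercentage: number;
  };
}

function formatProgress(p: ProgressLike): string {
  if (p.shardInfo) {
    const { currentShard, totalShards, overallPercentage } = p.shardInfo;
    return `shard ${currentShard}/${totalShards}: ${overallPercentage}% overall`;
  }
  return `${p.percentage}% downloaded`;
}

// Hypothetical usage (requires the SDK):
// await loadModel({ modelSrc, modelType: "llm",
//   onProgress: (p) => console.log(formatProgress(p)) });
```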
Returns
Promise<string> — Resolves to the model ID (used to reference the model in subsequent API calls).
Throws
| Error | When |
|---|---|
| MODEL_LOAD_FAILED | Model loading fails |
| STREAM_ENDED_WITHOUT_RESPONSE | Streaming ends without a final response (when using onProgress) |
| INVALID_RESPONSE_TYPE | Response type does not match the expected "loadModel" |
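A sketch of distinguishing a load failure from other errors. It assumes the thrown error carries a string code matching the names in the table; adjust to the actual error shape your SDK version throws.

```typescript
// Sketch: classifying a loadModel error. Assumes the error exposes a
// string `code` matching the table above; this shape is an assumption.
function isModelLoadFailure(err: unknown): boolean {
  return err instanceof Error &&
    (err as Error & { code?: string }).code === "MODEL_LOAD_FAILED";
}

// Hypothetical usage (requires the SDK):
// try {
//   const modelId = await loadModel(options);
// } catch (err) {
//   if (isModelLoadFailure(err)) console.error("model failed to load");
//   else throw err; // unexpected errors propagate
// }
```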
Examples
// Local file path
const modelId = await loadModel({
modelSrc: "/home/user/models/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Remote URL with progress tracking
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../model.gguf",
modelType: "llm",
onProgress: (progress) => {
console.log(`Downloaded: ${progress.percentage}%`);
}
});
// Hyperdrive URL
const modelId = await loadModel({
modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Multimodal model with projection
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../main-model.gguf",
modelType: "llm",
modelConfig: {
ctx_size: 512,
projectionModelSrc: "https://huggingface.co/.../projection-model.gguf"
}
});
// Whisper with VAD model
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../whisper-model.gguf",
modelType: "whisper",
modelConfig: {
language: "en",
strategy: "greedy",
vadModelSrc: "https://huggingface.co/.../vad-model.bin"
}
});
// With logger forwarding
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");
const modelId = await loadModel({
modelSrc: "/path/to/model.gguf",
modelType: "llm",
logger
});