
loadModel()

Loads a machine learning model from a local path, remote URL, or Hyperdrive key.

// Load new model
function loadModel(options: LoadModelOptions, rpcOptions?: RPCOptions): Promise<string>;

// Hot-reload config on an already-loaded model
function loadModel(options: ReloadConfigOptions, rpcOptions?: RPCOptions): Promise<string>;

Supports multiple model types: LLM, Whisper (speech recognition), Parakeet (NVIDIA NeMo transcription), embeddings, NMT (translation), TTS, and OCR. Handles local file paths, HTTP/HTTPS URLs, Hyperdrive URLs (pear://), and registry URLs.

When onProgress is provided, streaming is used for real-time download progress. Otherwise, a simple request-response pattern is used.

Parameters

| Name | Type | Required? | Description |
| --- | --- | --- | --- |
| options | LoadModelOptions \| ReloadConfigOptions | Yes | Configuration for loading or hot-reloading a model |
| rpcOptions | RPCOptions | No | Optional RPC transport options |

LoadModelOptions

Common fields present in all variants:

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| modelSrc | string \| ModelDescriptor | Yes | — | Model source — local path, HTTP(S) URL, Hyperdrive pear:// URL, registry URL, or a model constant object |
| modelType | string | Yes | — | The type of model — see model type variants |
| modelConfig | object | No | {} | Model-specific configuration (varies by modelType) |
| seed | boolean | No | false | Whether to seed the model on Hyperdrive after download |
| delegate | Delegate | No | — | Delegation configuration for remote inference |
| onProgress | (progress: ModelProgressUpdate) => void | No | — | Callback for real-time download progress |
| logger | Logger | No | — | Logger instance — model operation logs are forwarded to this logger |

Delegate

Optional delegation configuration for remote (P2P) inference:

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| topic | string | Yes | — | P2P topic for delegation |
| providerPublicKey | string | Yes | — | Provider's public key |
| timeout | number | No | — | Timeout in milliseconds (min 100) |
| fallbackToLocal | boolean | No | false | Whether to fall back to local inference if delegation fails |
| forceNewConnection | boolean | No | false | Force a new connection to the provider |
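A minimal sketch of a delegated load, assuming the topic and public key shown here are placeholders you replace with real values from your provider:

```typescript
// Sketch: load an LLM but delegate inference to a remote peer,
// falling back to local inference if the peer is unreachable.
const modelId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  delegate: {
    topic: "my-inference-topic",              // placeholder P2P topic
    providerPublicKey: "<provider-public-key>", // placeholder
    timeout: 5000,                            // milliseconds, min 100
    fallbackToLocal: true
  }
});
```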

ReloadConfigOptions

Hot-reload configuration on an already-loaded model without reloading the model weights. Currently supported for Whisper models only.

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| modelId | string | Yes | The ID of an existing loaded model (16-char hex) |
| modelType | string | Yes | The type of model (must match the loaded model) |
| modelConfig | object | Yes | New configuration to apply |
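A sketch of a hot-reload call, assuming `modelId` was returned by an earlier loadModel() call for a Whisper model:

```typescript
// Sketch: change the transcription config without reloading weights.
const sameId = await loadModel({
  modelId,               // 16-char hex ID from the original load
  modelType: "whisper",  // must match the loaded model
  modelConfig: {
    language: "de",
    translate: true
  }
});
```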

Model type variants

The modelType field determines which variant of modelConfig is accepted.

"llm"

All LLM-specific fields live inside modelConfig. See LLM modelConfig for the full reference.

"whisper"

All Whisper-specific fields live inside modelConfig. See Whisper modelConfig for the full reference.

"parakeet"

NVIDIA NeMo Parakeet models for speech recognition. modelConfig is required.

See Parakeet modelConfig for the full reference.

"embeddings"

All embeddings fields live inside modelConfig. See Embeddings modelConfig for the full reference.

"nmt"

modelConfig is required and is a discriminated union on engine. See NMT modelConfig for the full reference.

"tts"

modelConfig is required and is a discriminated union on ttsEngine:

Chatterbox engine (ttsEngine: "chatterbox"):

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| ttsEngine | "chatterbox" | Yes | Engine discriminator |
| language | "en" \| "es" \| "de" \| "it" | — | Output language |
| ttsTokenizerSrc | string \| ModelDescriptor | — | Tokenizer model source |
| ttsSpeechEncoderSrc | string \| ModelDescriptor | — | Speech encoder model source |
| ttsEmbedTokensSrc | string \| ModelDescriptor | — | Embed tokens model source |
| ttsConditionalDecoderSrc | string \| ModelDescriptor | — | Conditional decoder model source |
| ttsLanguageModelSrc | string \| ModelDescriptor | — | Language model source |
| referenceAudioSrc | string \| ModelDescriptor | — | Reference WAV file for voice cloning |

Supertonic engine (ttsEngine: "supertonic"):

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| ttsEngine | "supertonic" | Yes | Engine discriminator |
| language | "en" \| "es" \| "de" \| "it" | — | Output language |
| ttsTokenizerSrc | string \| ModelDescriptor | — | Tokenizer model source |
| ttsTextEncoderSrc | string \| ModelDescriptor | — | Text encoder model source |
| ttsLatentDenoiserSrc | string \| ModelDescriptor | — | Latent denoiser model source |
| ttsVoiceDecoderSrc | string \| ModelDescriptor | — | Voice decoder model source |
| ttsVoiceSrc | string \| ModelDescriptor | — | Voice .bin file source |
| ttsSpeed | number | — | Speech speed multiplier |
| ttsNumInferenceSteps | number | — | Number of inference steps |
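A sketch of a Supertonic load; every source below is a placeholder to replace with your own model locations:

```typescript
// Sketch: load a Supertonic TTS model from its component parts.
const modelId = await loadModel({
  modelSrc: "<acoustic-model-source>",          // placeholder
  modelType: "tts",
  modelConfig: {
    ttsEngine: "supertonic",
    language: "en",
    ttsTokenizerSrc: "<tokenizer-source>",
    ttsTextEncoderSrc: "<text-encoder-source>",
    ttsLatentDenoiserSrc: "<latent-denoiser-source>",
    ttsVoiceDecoderSrc: "<voice-decoder-source>",
    ttsVoiceSrc: "<voice-bin-source>",
    ttsSpeed: 1.0
  }
});
```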

"ocr"

All OCR-specific fields live inside modelConfig. See OCR modelConfig for the full reference.

Custom plugin

Any modelType string that is not a built-in type. modelConfig accepts Record<string, unknown>.
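A sketch of a custom-plugin load; the plugin name and config keys here are hypothetical:

```typescript
// Sketch: any non-built-in modelType string is routed to a custom plugin.
const modelId = await loadModel({
  modelSrc: "/path/to/custom-model.bin",
  modelType: "my-custom-plugin",       // hypothetical plugin name
  modelConfig: { anyKey: "anyValue" }  // Record<string, unknown>
});
```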

modelConfig reference

LLM modelConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| ctx_size | number | 1024 | Context window size |
| device | string | "gpu" | Device to use |
| gpu_layers | number | 99 | Number of layers offloaded to GPU |
| system_prompt | string | "You are a helpful assistant." | System prompt |
| temp | number | — | Temperature (0–2) |
| top_p | number | — | Top-p sampling (0–1) |
| top_k | number | — | Top-k sampling (0–128) |
| seed | number | — | Random seed |
| predict | number | — | Max tokens to predict. -1 = until stop token, -2 = until context filled |
| lora | string | — | LoRA adapter path |
| no_mmap | boolean | — | Disable memory-mapped I/O |
| verbosity | 0 \| 1 \| 2 \| 3 | — | Engine verbosity — use exported VERBOSITY constant |
| presence_penalty | number | — | Presence penalty |
| frequency_penalty | number | — | Frequency penalty |
| repeat_penalty | number | — | Repeat penalty |
| stop_sequences | string[] | — | Custom stop sequences |
| n_discarded | number | — | Number of discarded tokens |
| tools | boolean | — | Enable tool calling support |
| projectionModelSrc | string \| ModelDescriptor | — | Projection model source for multimodal models |

Whisper modelConfig

Common fields:

| Field | Type | Description |
| --- | --- | --- |
| language | string | Language code (e.g., "en") |
| translate | boolean | Whether to translate to English |
| strategy | "greedy" \| "beam_search" | Sampling strategy |
| temperature | number | Temperature |
| initial_prompt | string | Initial prompt for the decoder |
| detect_language | boolean | Auto-detect language |
| vad_params | object | VAD parameters — { threshold?, min_speech_duration_ms?, min_silence_duration_ms?, max_speech_duration_s?, speech_pad_ms?, samples_overlap? } |
| audio_format | "f32le" \| "s16le" | Audio format |
| contextParams | object | Context parameters — { model?, use_gpu?, flash_attn?, gpu_device? } |
| miscConfig | object | Miscellaneous config — { caption_enabled? } |
| vadModelSrc | string \| ModelDescriptor | VAD model source for voice activity detection |

Additional fields: n_threads, n_max_text_ctx, offset_ms, duration_ms, audio_ctx, no_context, no_timestamps, single_segment, print_special, print_progress, print_realtime, print_timestamps, token_timestamps, thold_pt, thold_ptsum, max_len, split_on_word, max_tokens, debug_mode, tdrz_enable, suppress_regex, suppress_blank, suppress_nst, length_penalty, temperature_inc, entropy_thold, logprob_thold, greedy_best_of, beam_search_beam_size. All optional. See whisperConfigSchema in the source for details.

Parakeet modelConfig

modelConfig is required. Parakeet models support three variants via modelType: "tdt" (default), "ctc", and "sortformer".

Runtime config:

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| modelType | "tdt" \| "ctc" \| "sortformer" | "tdt" | Parakeet model variant |
| maxThreads | number | — | Maximum inference threads |
| useGPU | boolean | — | Use GPU acceleration |
| sampleRate | number | — | Audio sample rate |
| channels | number | — | Audio channels |
| captionEnabled | boolean | — | Enable caption mode |
| timestampsEnabled | boolean | — | Enable timestamps in output |

Model sources (all string \| ModelDescriptor, all optional):

| Field | Description |
| --- | --- |
| parakeetEncoderSrc | TDT encoder model source |
| parakeetEncoderDataSrc | TDT encoder data source |
| parakeetDecoderSrc | TDT decoder model source |
| parakeetVocabSrc | TDT vocabulary source |
| parakeetPreprocessorSrc | TDT preprocessor source |
| parakeetCtcModelSrc | CTC model source |
| parakeetCtcModelDataSrc | CTC model data source |
| parakeetTokenizerSrc | CTC tokenizer source |
| parakeetSortformerSrc | Sortformer model source |
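A sketch of a Parakeet TDT load; the source values are placeholders:

```typescript
// Sketch: load a Parakeet TDT transcription model.
const modelId = await loadModel({
  modelSrc: "<parakeet-model-source>",          // placeholder
  modelType: "parakeet",
  modelConfig: {
    modelType: "tdt",                           // default variant
    useGPU: true,
    sampleRate: 16000,
    timestampsEnabled: true,
    parakeetEncoderSrc: "<encoder-source>",     // placeholders
    parakeetDecoderSrc: "<decoder-source>",
    parakeetVocabSrc: "<vocab-source>",
    parakeetPreprocessorSrc: "<preprocessor-source>"
  }
});
```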

Embeddings modelConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| gpuLayers | number | 99 | Number of layers offloaded to GPU |
| device | "gpu" \| "cpu" | "gpu" | Device to use |
| batchSize | number | 1024 | Embedding batch size |
| pooling | "none" \| "mean" \| "cls" \| "last" \| "rank" | — | Pooling strategy |
| attention | "causal" \| "non-causal" | — | Attention type |
| embdNormalize | number | — | Embedding normalization (integer) |
| flashAttention | "on" \| "off" \| "auto" | — | Flash attention toggle |
| mainGpu | number \| "integrated" \| "dedicated" | — | GPU device selection |
| verbosity | 0 \| 1 \| 2 \| 3 | — | Engine verbosity — use exported VERBOSITY constant |
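A sketch of an embeddings load on CPU with mean pooling; the model URL is a placeholder:

```typescript
// Sketch: load an embeddings model, overriding the GPU defaults.
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../embedding-model.gguf", // placeholder
  modelType: "embeddings",
  modelConfig: {
    device: "cpu",
    batchSize: 512,
    pooling: "mean",
    flashAttention: "auto"
  }
});
```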

NMT modelConfig

Discriminated union on engine. Common generation parameters (all optional):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | "full" | "full" | Translation mode |
| beamsize | number | 4 | Beam size |
| lengthpenalty | number | 1.0 | Length penalty |
| maxlength | number | 512 | Max output length |
| repetitionpenalty | number | 1.0 | Repetition penalty |
| norepeatngramsize | number | 0 | No-repeat n-gram size |
| temperature | number | 0.3 | Temperature |
| topk | number | 0 | Top-k sampling |
| topp | number | 1.0 | Top-p sampling |

Engine-specific:

  • Opus: from/to accept "en" | "de" | "es" | "it" | "ru" | "ja"
  • Bergamot: from/to accept 24 languages (en, ar, bg, ca, cs, de, es, et, fi, fr, hu, is, it, ja, ko, lt, lv, nl, pl, pt, ru, sk, sl, uk, zh). Additional fields: srcVocabSrc, dstVocabSrc, normalize, pivotModel
  • IndicTrans: from/to accept 26 Indic language codes (e.g., "eng_Latn", "hin_Deva")

Bergamot pivotModel (optional) — for translation via an intermediate language:

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| modelSrc | string \| ModelDescriptor | Yes | Pivot model source |
| srcVocabSrc | string \| ModelDescriptor | — | Source vocabulary file |
| dstVocabSrc | string \| ModelDescriptor | — | Destination vocabulary file |
| normalize | number | — | Normalization factor |

Plus all common generation parameters above.
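A sketch of a pivoted Bergamot load (de → es through English). The engine discriminator value and all model/vocabulary sources shown are assumptions; check the NMT modelConfig reference for the exact field values:

```typescript
// Sketch: Bergamot translation de -> es, pivoting via English.
const modelId = await loadModel({
  modelSrc: "<de-en-model-source>",        // placeholder
  modelType: "nmt",
  modelConfig: {
    engine: "bergamot",                    // assumed discriminator value
    from: "de",
    to: "es",
    srcVocabSrc: "<de-en-src-vocab>",      // placeholders
    dstVocabSrc: "<de-en-dst-vocab>",
    pivotModel: {
      modelSrc: "<en-es-model-source>",
      srcVocabSrc: "<en-es-src-vocab>",
      dstVocabSrc: "<en-es-dst-vocab>"
    },
    beamsize: 4
  }
});
```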

OCR modelConfig

| Field | Type | Description |
| --- | --- | --- |
| langList | string[] | Languages to detect |
| useGPU | boolean | Use GPU acceleration |
| timeout | number | Timeout in milliseconds |
| pipelineMode | "easyocr" \| "doctr" | OCR pipeline mode |
| magRatio | number | Magnification ratio for detection |
| defaultRotationAngles | number[] | Rotation angles to try |
| contrastRetry | boolean | Retry with contrast adjustment |
| lowConfidenceThreshold | number | Threshold for low-confidence filtering |
| recognizerBatchSize | number | Batch size for recognizer |
| decodingMethod | "ctc" \| "attention" | Decoding method |
| straightenPages | boolean | Straighten pages before recognition |
| detectorModelSrc | string \| ModelDescriptor | Detector model source |
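A sketch of an OCR load using the doctr pipeline; both sources are placeholders:

```typescript
// Sketch: load an OCR model with a separate detector model.
const modelId = await loadModel({
  modelSrc: "<ocr-recognizer-source>",      // placeholder
  modelType: "ocr",
  modelConfig: {
    langList: ["en"],
    pipelineMode: "doctr",
    useGPU: true,
    straightenPages: true,
    detectorModelSrc: "<ocr-detector-source>" // placeholder
  }
});
```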

ModelProgressUpdate

| Field | Type | Description |
| --- | --- | --- |
| type | "modelProgress" | Event type |
| downloaded | number | Bytes downloaded so far |
| total | number | Total bytes expected |
| percentage | number | Download percentage |
| downloadKey | string | Unique download key (use with cancel()) |
| shardInfo | object | Shard progress (optional, for sharded models) |
| shardInfo.currentShard | number | Current shard index |
| shardInfo.totalShards | number | Total number of shards |
| shardInfo.shardName | string | Current shard file name |
| shardInfo.overallDownloaded | number | Total bytes downloaded across all shards |
| shardInfo.overallTotal | number | Total bytes across all shards |
| shardInfo.overallPercentage | number | Overall percentage across all shards |
| onnxInfo | object | ONNX multi-file progress (optional, for ONNX models) |
| onnxInfo.currentFile | string | Current file being downloaded |
| onnxInfo.fileIndex | number | Current file index |
| onnxInfo.totalFiles | number | Total number of files |
| onnxInfo.overallDownloaded | number | Total bytes downloaded across all files |
| onnxInfo.overallTotal | number | Total bytes across all files |
| onnxInfo.overallPercentage | number | Overall percentage across all files |
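For sharded or multi-file models, the per-file percentage resets for each file, so progress UIs usually prefer the overall percentage when shardInfo or onnxInfo is present. A minimal helper sketch (the interface below types only the fields used here, not the SDK's full type):

```typescript
// Local, partial typing of the progress fields used below.
interface ProgressLike {
  percentage: number;
  shardInfo?: { overallPercentage: number; currentShard: number; totalShards: number };
  onnxInfo?: { overallPercentage: number; fileIndex: number; totalFiles: number };
}

// Prefer overall progress for sharded / multi-file downloads,
// fall back to the plain percentage otherwise.
function describeProgress(p: ProgressLike): string {
  if (p.shardInfo) {
    return `shard ${p.shardInfo.currentShard}/${p.shardInfo.totalShards}: ${p.shardInfo.overallPercentage}% overall`;
  }
  if (p.onnxInfo) {
    return `file ${p.onnxInfo.fileIndex}/${p.onnxInfo.totalFiles}: ${p.onnxInfo.overallPercentage}% overall`;
  }
  return `${p.percentage}%`;
}
```

Pass it through the callback, e.g. `onProgress: (u) => console.log(describeProgress(u))`.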

Returns

Promise<string> — Resolves to the model ID (used to reference the model in subsequent API calls).

Throws

| Error | When |
| --- | --- |
| MODEL_LOAD_FAILED | Model loading fails |
| STREAM_ENDED_WITHOUT_RESPONSE | Streaming ends without a final response (when using onProgress) |
| INVALID_RESPONSE_TYPE | Response type does not match expected "loadModel" |
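A sketch of handling a failed load. The property carrying the error name (`code` here) is an assumption; check the SDK's error type for the exact shape:

```typescript
// Sketch: recover from a load failure, rethrow anything unexpected.
try {
  const modelId = await loadModel({
    modelSrc: "/path/to/model.gguf",
    modelType: "llm"
  });
} catch (err: any) {
  if (err?.code === "MODEL_LOAD_FAILED") {   // assumed error shape
    console.error("Model failed to load:", err.message);
  } else {
    throw err;
  }
}
```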

Example

// Local file path
const modelId = await loadModel({
  modelSrc: "/home/user/models/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});

// Remote URL with progress tracking
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../model.gguf",
  modelType: "llm",
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.percentage}%`);
  }
});

// Hyperdrive URL
const modelId = await loadModel({
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});

// Multimodal model with projection
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../main-model.gguf",
  modelType: "llm",
  modelConfig: {
    ctx_size: 512,
    projectionModelSrc: "https://huggingface.co/.../projection-model.gguf"
  }
});

// Whisper with VAD model
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../whisper-model.gguf",
  modelType: "whisper",
  modelConfig: {
    language: "en",
    strategy: "greedy",
    vadModelSrc: "https://huggingface.co/.../vad-model.bin"
  }
});

// With logger forwarding
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");

const modelId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  logger
});
