Profiler
Measure and export timing metrics across model loading, inference, and P2P delegation.
Overview
The @qvac/sdk npm package exposes a profiler object that you can import to measure and analyze how long SDK operations take in your application. You can enable profiling in two ways:
- Global: call profiler.enable() to profile all subsequent operations until profiler.disable() is called.
- Per-call: pass { profiling: { enabled: true } } in the options of an individual function call.
You can also enable profiling globally and use a per-call override to opt out of specific calls by passing { profiling: { enabled: false } }. Data collected through both modes is stored in and exported from the same profiler singleton. See Support for the list of SDK operations that support profiling.
profiler is a process-wide singleton: its state persists until the process exits or you change it (for example, profiler.clear() discards collected data).
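Because the singleton accumulates data from every profiled operation in the process, it can help to measure one phase in isolation. The following is a minimal sketch, not part of the SDK: profileWindow and ProfilerLike are hypothetical names, and ProfilerLike assumes only the enable(), disable(), clear(), isEnabled(), and exportSummary() methods listed in the profiler section. It clears earlier data, enables profiling if needed, runs your work, and restores the previous enabled state.

```typescript
// Hypothetical structural type covering only the profiler methods used below.
// The real profiler exposes more methods, and its enable() also accepts options.
interface ProfilerLike {
  enable(): void;
  disable(): void;
  clear(): void;
  isEnabled(): boolean;
  exportSummary(): string;
}

// Hypothetical helper: run `work` inside a clean profiling window and return
// the summary for that window only. clear() drops data collected earlier in
// the process; the finally block restores the previous enabled/disabled state.
async function profileWindow(
  profiler: ProfilerLike,
  work: () => Promise<void>,
): Promise<string> {
  const wasEnabled = profiler.isEnabled();
  profiler.clear();
  if (!wasEnabled) profiler.enable();
  try {
    await work();
    return profiler.exportSummary();
  } finally {
    if (!wasEnabled) profiler.disable();
  }
}
```

With the SDK you might call it as profileWindow(profiler, async () => { await embed({ modelId, text: "hello" }); }). Since clear() discards everything collected before the window, drop that call if you still need earlier data.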
profiler
The profiler object provides the following methods:
- profiler.enable(options?): enable profiling globally
- profiler.isEnabled(): return whether profiling is enabled
- profiler.onRecord(callback): subscribe to profiling events in real time
- Export collected data using any of the following:
  - profiler.exportSummary(): return a high-level summary string
  - profiler.exportTable(): return a detailed table of aggregated metrics
  - profiler.exportJSON(options?): return a full export as structured JSON
- profiler.disable(): disable profiling
- profiler.clear(): clear collected data
See API — profiler for the complete reference, including the full metrics catalog for each export function and detailed usage.
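exportTable() already aggregates metrics, but you can also post-process exportJSON() yourself. The sketch below summarizes a single gauge for one operation from recentEvents. The ProfilerEvent type is an assumption that mirrors only the fields read in the global example later on this page (op, kind, gauges), and gaugeStats is a hypothetical helper. As in that example, it keeps only operation-level events (kind === "handler").

```typescript
// Assumed shape of one entry in exportJSON().recentEvents, limited to the
// fields used in the global example on this page. The real type may differ.
interface ProfilerEvent {
  op: string;
  kind: string;
  gauges?: Record<string, number>;
}

interface GaugeStats {
  count: number;
  min: number;
  max: number;
  mean: number;
}

// Hypothetical helper: collect one gauge (e.g. "totalLoadTime") from the
// handler-level events of a single operation and summarize it.
// Returns undefined when no matching event carries that gauge.
function gaugeStats(
  events: ProfilerEvent[] | undefined,
  op: string,
  gauge: string,
): GaugeStats | undefined {
  const values: number[] = [];
  for (const e of events ?? []) {
    if (e.op !== op || e.kind !== "handler") continue; // skip RPC phase events
    const v = e.gauges?.[gauge];
    if (typeof v === "number") values.push(v);
  }
  if (values.length === 0) return undefined;
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    count: values.length,
    min: Math.min(...values),
    max: Math.max(...values),
    mean: sum / values.length,
  };
}
```

For example: gaugeStats(profiler.exportJSON().recentEvents, "loadModel", "totalLoadTime"). Because it reads recentEvents, the result covers only the events present in that list, which may not include every call since profiling started.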
Enable profiling
By default, profiling is disabled. You can enable it globally or per-call. If you enable profiling globally, you can opt out of individual function calls.
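The rules above can be summarized as a small decision function. This is an illustrative model of the documented behavior, not the SDK's implementation: isCallProfiled and PerCallProfiling are hypothetical names, and the model deliberately ignores operationFilters and every option other than enabled.

```typescript
// Hypothetical model of the per-call profiling option (only `enabled` modeled).
interface PerCallProfiling {
  enabled?: boolean;
}

// A per-call `enabled` value, when present, decides for that call: it opts in
// while profiling is globally disabled, or opts out while it is globally enabled.
// Without it, the call follows the global state set by enable()/disable().
function isCallProfiled(globalEnabled: boolean, perCall?: PerCallProfiling): boolean {
  return perCall?.enabled ?? globalEnabled;
}
```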
Global
Enable profiling globally by calling profiler.enable():
import { profiler } from "@qvac/sdk";
profiler.enable({
mode: "verbose", // "summary" (default) | "verbose"
includeServerBreakdown: true, // include server-side timing in responses
operationFilters: ["completion"], // only profile these operations (empty = all)
});
Per-call
To profile a single call, pass the profiling option when invoking the function:
import { embed } from "@qvac/sdk";
// modelId is the ID returned by an earlier loadModel() call
await embed(
{ modelId, text: "hello" },
{
profiling: {
enabled: true,
includeServerBreakdown: true,
mode: "verbose",
},
},
);
See Support for the list of functions that accept the profiling option.
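Because the per-call enabled flag is an ordinary boolean, you can compute it at call time, for example to profile only a fraction of requests in a busy service. The helpers below (everyNth and profilingOption) are hypothetical and not part of @qvac/sdk; they only build the { profiling: { enabled } } object shown above.

```typescript
// Hypothetical sampler: returns true on the 1st, (n+1)th, (2n+1)th, ... call.
function everyNth(n: number): () => boolean {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  let count = 0;
  return () => count++ % n === 0;
}

// Build the per-call options object passed as the second argument.
function profilingOption(enabled: boolean): { profiling: { enabled: boolean } } {
  return { profiling: { enabled } };
}

// Intended use (requires @qvac/sdk and a loaded model):
// const sample = everyNth(10);
// await embed({ modelId, text }, profilingOption(sample()));
```

With profiling disabled globally, this profiles one call in every n; with profiling enabled globally, the false values opt the remaining calls out.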
Opt-out
If you enabled profiling globally, you can pass the profiling option to opt out of a specific call:
import { embed } from "@qvac/sdk";
// Profiling is enabled globally; skip it for this call only
await embed(
{ modelId, text: "hello" },
{ profiling: { enabled: false } },
);
Support
The following SDK operations support profiling:
completion() | downloadAsset() | embed() | invokePlugin() | invokePluginStream() | loadModel() | ocr() | ragChunk() | ragCloseWorkspace() | ragDeleteEmbeddings() | ragDeleteWorkspace() | ragIngest() | ragListWorkspaces() | ragReindex() | ragSaveEmbeddings() | ragSearch() | textToSpeech() | transcribe() | transcribeStream() | translate()
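If you pass operationFilters to profiler.enable(), a misspelled name may go unnoticed. The sketch below checks filter entries against the list above. It assumes that operationFilters entries use these bare function names, as in the operationFilters: ["completion"] example; PROFILED_OPERATIONS and unknownOperationFilters are hypothetical names. An empty filter list is valid and, per the comment in the global example, means all operations are profiled.

```typescript
// Operation names from the Support list, without parentheses.
// Assumption: operationFilters entries use these bare names.
const PROFILED_OPERATIONS = new Set<string>([
  "completion", "downloadAsset", "embed", "invokePlugin", "invokePluginStream",
  "loadModel", "ocr", "ragChunk", "ragCloseWorkspace", "ragDeleteEmbeddings",
  "ragDeleteWorkspace", "ragIngest", "ragListWorkspaces", "ragReindex",
  "ragSaveEmbeddings", "ragSearch", "textToSpeech", "transcribe",
  "transcribeStream", "translate",
]);

// Hypothetical check: return filter entries that do not name a supported
// operation (typos such as "completions", or calls such as "unloadModel"
// that are not in the Support list), so they can be fixed before enable().
function unknownOperationFilters(filters: readonly string[]): string[] {
  return filters.filter((name) => !PROFILED_OPERATIONS.has(name));
}
```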
Examples
Global
The following script enables profiling globally, loads a model, runs a completion, and exports timing data in all available formats:
import { completion, loadModel, unloadModel, LLAMA_3_2_1B_INST_Q4_0, profiler, } from "@qvac/sdk";
try {
// Enable profiling globally
profiler.enable({
mode: "verbose",
includeServerBreakdown: true,
});
console.log("Profiler enabled:", profiler.isEnabled());
const modelId = await loadModel({
modelSrc: LLAMA_3_2_1B_INST_Q4_0,
modelType: "llm",
onProgress: (p) => console.log(` ${p.percentage.toFixed(1)}%`),
});
console.log("Model loaded:", modelId);
console.log("\n→ Running completion...");
const result = completion({
modelId,
history: [{ role: "user", content: "Say hello in one sentence." }],
stream: true,
});
for await (const token of result.tokenStream) {
process.stdout.write(token);
}
console.log();
await unloadModel({ modelId });
// Export profiling data
console.log("\n=== Profiler Summary ===");
console.log(profiler.exportSummary());
console.log("\n=== Profiler Table ===");
console.log(profiler.exportTable());
const json = profiler.exportJSON();
console.log("\n=== Load Model Metrics ===");
// Filter for operation-level event (kind: "handler"), not RPC phase events
const loadModelEvent = json.recentEvents?.find((e) => e.op === "loadModel" && e.kind === "handler");
if (loadModelEvent) {
const tags = loadModelEvent.tags ?? {};
const gauges = loadModelEvent.gauges ?? {};
console.log(" sourceType:", tags["sourceType"] ?? "(not set)");
console.log(" cacheHit:", tags["cacheHit"] ?? "(not set)");
console.log(" totalLoadTime:", gauges["totalLoadTime"], "ms");
console.log(" modelInitializationTime:", gauges["modelInitializationTime"], "ms");
if (tags["cacheHit"] !== "true") {
console.log(" downloadTime:", gauges["downloadTime"] ?? "(cached)", "ms");
console.log(" totalBytesDownloaded:", gauges["totalBytesDownloaded"] ?? "(cached)");
console.log(" downloadSpeedBps:", gauges["downloadSpeedBps"] ?? "(cached)");
}
else {
console.log(" (download metrics omitted - cache hit)");
}
if (gauges["checksumValidationTime"] !== undefined) {
console.log(" checksumValidationTime:", gauges["checksumValidationTime"], "ms");
}
}
else {
console.log(" (no loadModel handler event captured)");
// Debug: show what ops are available
const ops = [...new Set(json.recentEvents?.map((e) => `${e.op}:${e.kind}`) ?? [])];
console.log(" Available ops:", ops.join(", "));
}
console.log("\n=== Profiler JSON (structure) ===");
console.log(" aggregates:", Object.keys(json.aggregates).length, "metrics");
console.log(" recentEvents:", json.recentEvents?.length ?? 0, "events");
console.log(" config:", json.config);
// Disable profiling
profiler.disable();
console.log("\nProfiler disabled:", !profiler.isEnabled());
}
catch (error) {
console.error("Error:", error);
process.exit(1);
}
Per-call
The following script keeps the profiler disabled globally and selectively profiles individual embed() calls using the per-call option:
import { embed, loadModel, unloadModel, GTE_LARGE_FP16, profiler, } from "@qvac/sdk";
try {
profiler.disable();
console.log("Profiler globally enabled:", profiler.isEnabled());
const modelId = await loadModel({
modelSrc: GTE_LARGE_FP16,
modelType: "embeddings",
onProgress: (p) => console.log(` ${p.percentage.toFixed(1)}%`),
});
console.log("Model loaded:", modelId);
console.log("\n=== Embed with per-call profiling ===");
const embedding1 = await embed({ modelId, text: "Profile this specific call" }, { profiling: { enabled: true, includeServerBreakdown: true } });
console.log("Embedding dimensions:", embedding1.length);
console.log("\n=== Embed without profiling ===");
const embedding2 = await embed({
modelId,
text: "This call is not profiled",
});
console.log("Embedding dimensions:", embedding2.length);
console.log("\n=== Embed with profiling explicitly disabled ===");
const embedding3 = await embed({ modelId, text: "Profiling explicitly disabled for this call" }, { profiling: { enabled: false } });
console.log("Embedding dimensions:", embedding3.length);
await unloadModel({ modelId });
console.log("\n=== Profiler Summary (per-call data only) ===");
console.log(profiler.exportSummary());
}
catch (error) {
console.error("Error:", error);
process.exit(1);
}
Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.