Profiler

Measure and export timing metrics across model loading, inference, and P2P delegation.

Overview

The @qvac/sdk npm package exposes a profiler object that you can import to measure and analyze how long SDK operations take in your application. You can enable profiling in two ways:

Global: call profiler.enable() to profile all subsequent operations until profiler.disable() is called.
Per-call: pass { profiling: { enabled: true } } in the options of an individual function call.

You can also enable profiling globally and use a per-call override to opt out of specific calls by passing { profiling: { enabled: false } }. Data collected through both modes is stored in and exported from the same profiler singleton. See Support for the list of SDK operations that support profiling.

profiler is a process-wide singleton — its state persists until the process exits.

`profiler`

The profiler object provides the following methods:

profiler.enable(options?) — enable profiling globally
profiler.isEnabled() — return whether profiling is enabled
profiler.onRecord(callback) — subscribe to profiling events in real time
Export collected data using any of the following:
- profiler.exportSummary() — return a high-level summary string
- profiler.exportTable() — return a detailed table of aggregated metrics
- profiler.exportJSON(options?) — return a full export as structured JSON
profiler.disable() — disable profiling
profiler.clear() — clear collected data

See API — profiler for the complete reference, including the full metrics catalog for each export function and detailed usage.

Enable profiling

By default, profiling is disabled. You can enable it globally or per-call. If you enable profiling globally, you can opt out of individual function calls.

Global

Enable profiling globally by calling profiler.enable():

import { profiler } from "@qvac/sdk";

profiler.enable({
  mode: "verbose",                 // "summary" (default) | "verbose"
  includeServerBreakdown: true,    // include server-side timing in responses
  operationFilters: ["completion"], // only profile these operations (empty = all)
});
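The comment on operationFilters states that an empty list means every operation is profiled. That rule can be sketched as a small standalone function. The helper name matchesOperationFilters is hypothetical and not part of @qvac/sdk, and exact string matching on the operation name is an assumption made only for illustration.

```javascript
// Hypothetical helper, not part of @qvac/sdk. It models the documented rule
// "empty = all"; exact string matching on the operation name is an assumption.
function matchesOperationFilters(op, operationFilters = []) {
  // No filters configured: every operation is eligible for profiling.
  if (operationFilters.length === 0) return true;
  // Otherwise only the listed operations are eligible.
  return operationFilters.includes(op);
}

console.log(matchesOperationFilters("completion", ["completion"])); // listed operation
console.log(matchesOperationFilters("embed", ["completion"]));      // not listed
console.log(matchesOperationFilters("embed", []));                  // empty list, all operations
```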

Per-call

To profile a single call, pass the profiling option when invoking the function:

await embed(
  { modelId, text: "hello" },
  {
    profiling: {
      enabled: true,
      includeServerBreakdown: true,
      mode: "verbose",
    },
  },
);

See Support for the list of functions that accept the profiling option.

Opt-out

If you enabled profiling globally, you can pass the profiling option to opt out of a specific call:

await embed(
  { modelId, text: "hello" },
  { profiling: { enabled: false } },
);
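Taken together, the global switch and the per-call option follow a simple precedence: an explicit per-call enabled value wins, and the global state applies otherwise. The sketch below models that rule in isolation. resolveProfiling is a hypothetical function written for this page, not an SDK export, and it says nothing about how the SDK implements the check internally.

```javascript
// Hypothetical helper, not part of @qvac/sdk. It only models the precedence
// described above: an explicit per-call `enabled` flag overrides the global state.
function resolveProfiling(globalEnabled, callOptions = {}) {
  const perCall = callOptions.profiling?.enabled;
  // Use the per-call value when it is an explicit boolean; otherwise fall back
  // to whatever the global switch currently says.
  return typeof perCall === "boolean" ? perCall : globalEnabled;
}

console.log(resolveProfiling(false, { profiling: { enabled: true } })); // per-call opt-in
console.log(resolveProfiling(true, { profiling: { enabled: false } })); // per-call opt-out
console.log(resolveProfiling(true));                                    // no override
```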

Support

The following SDK operations support profiling:

Examples

Global

The following script enables profiling globally, loads a model, runs a completion, and exports timing data in all available formats:

profiling-basic.js

import { completion, loadModel, unloadModel, LLAMA_3_2_1B_INST_Q4_0, profiler } from "@qvac/sdk";
try {
    // Enable profiling globally
    profiler.enable({
        mode: "verbose",
        includeServerBreakdown: true,
    });
    console.log("Profiler enabled:", profiler.isEnabled());
    const modelId = await loadModel({
        modelSrc: LLAMA_3_2_1B_INST_Q4_0,
        modelType: "llm",
        onProgress: (p) => console.log(`  ${p.percentage.toFixed(1)}%`),
    });
    console.log("Model loaded:", modelId);
    console.log("\n→ Running completion...");
    const result = completion({
        modelId,
        history: [{ role: "user", content: "Say hello in one sentence." }],
        stream: true,
    });
    for await (const token of result.tokenStream) {
        process.stdout.write(token);
    }
    console.log();
    await unloadModel({ modelId });
    // Export profiling data
    console.log("\n=== Profiler Summary ===");
    console.log(profiler.exportSummary());
    console.log("\n=== Profiler Table ===");
    console.log(profiler.exportTable());
    const json = profiler.exportJSON();
    console.log("\n=== Load Model Metrics ===");
    // Filter for operation-level event (kind: "handler"), not RPC phase events
    const loadModelEvent = json.recentEvents?.find((e) => e.op === "loadModel" && e.kind === "handler");
    if (loadModelEvent) {
        const tags = loadModelEvent.tags ?? {};
        const gauges = loadModelEvent.gauges ?? {};
        console.log("  sourceType:", tags["sourceType"] ?? "(not set)");
        console.log("  cacheHit:", tags["cacheHit"] ?? "(not set)");
        console.log("  totalLoadTime:", gauges["totalLoadTime"], "ms");
        console.log("  modelInitializationTime:", gauges["modelInitializationTime"], "ms");
        if (tags["cacheHit"] !== "true") {
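            // This branch runs only when the model was not served from cache,
            // so a missing gauge means "not reported" rather than "cached".
            console.log("  downloadTime:", gauges["downloadTime"] ?? "(not set)", "ms");
            console.log("  totalBytesDownloaded:", gauges["totalBytesDownloaded"] ?? "(not set)");
            console.log("  downloadSpeedBps:", gauges["downloadSpeedBps"] ?? "(not set)");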
        }
        else {
            console.log("  (download metrics omitted - cache hit)");
        }
        if (gauges["checksumValidationTime"] !== undefined) {
            console.log("  checksumValidationTime:", gauges["checksumValidationTime"], "ms");
        }
    }
    else {
        console.log("  (no loadModel handler event captured)");
        // Debug: show what ops are available
        const ops = [...new Set(json.recentEvents?.map((e) => `${e.op}:${e.kind}`) ?? [])];
        console.log("  Available ops:", ops.join(", "));
    }
    console.log("\n=== Profiler JSON (structure) ===");
    console.log("  aggregates:", Object.keys(json.aggregates).length, "metrics");
    console.log("  recentEvents:", json.recentEvents?.length ?? 0, "events");
    console.log("  config:", json.config);
    // Disable profiling
    profiler.disable();
    console.log("\nProfiler disabled:", !profiler.isEnabled());
}
catch (error) {
    console.error("Error:", error);
    process.exit(1);
}
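The event lookup in the script above can be factored into a small reusable function. The sketch below assumes only the fields the script itself reads (recentEvents, op, kind, gauges). handlerGauges is a hypothetical helper, and sampleExport is a placeholder object with illustrative values, not real profiler output.

```javascript
// Hypothetical helper, not part of @qvac/sdk: returns the gauges of the first
// operation-level ("handler") event recorded for `op`, or null if none exists.
function handlerGauges(exported, op) {
  const event = exported.recentEvents?.find(
    (e) => e.op === op && e.kind === "handler",
  );
  return event ? (event.gauges ?? {}) : null;
}

// Placeholder object shaped like the fields the script reads. The values and the
// non-handler kind name are illustrative only.
const sampleExport = {
  recentEvents: [
    { op: "loadModel", kind: "placeholder-phase", gauges: {} },
    { op: "loadModel", kind: "handler", gauges: { totalLoadTime: 1200 } },
  ],
};

console.log(handlerGauges(sampleExport, "loadModel")); // gauges of the handler event
console.log(handlerGauges(sampleExport, "embed"));     // null, no such event recorded
```

Returning null for a missing event keeps the "no handler event captured" case distinct from an event that exists but carries no gauges.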

Per-call

The following script keeps the profiler disabled globally and selectively profiles individual embed() calls using the per-call option:

profiling-per-call.js

import { embed, loadModel, unloadModel, GTE_LARGE_FP16, profiler } from "@qvac/sdk";
try {
    // Keep profiling disabled globally; only calls that opt in are recorded
    profiler.disable();
    console.log("Profiler globally enabled:", profiler.isEnabled());
    const modelId = await loadModel({
        modelSrc: GTE_LARGE_FP16,
        modelType: "embeddings",
        onProgress: (p) => console.log(`  ${p.percentage.toFixed(1)}%`),
    });
    console.log("Model loaded:", modelId);
    console.log("\n=== Embed with per-call profiling ===");
    const embedding1 = await embed({ modelId, text: "Profile this specific call" }, { profiling: { enabled: true, includeServerBreakdown: true } });
    console.log("Embedding dimensions:", embedding1.length);
    console.log("\n=== Embed without profiling ===");
    const embedding2 = await embed({
        modelId,
        text: "This call is not profiled",
    });
    console.log("Embedding dimensions:", embedding2.length);
    console.log("\n=== Embed with profiling explicitly disabled ===");
    const embedding3 = await embed({ modelId, text: "Profiling explicitly disabled for this call" }, { profiling: { enabled: false } });
    console.log("Embedding dimensions:", embedding3.length);
    await unloadModel({ modelId });
    console.log("\n=== Profiler Summary (per-call data only) ===");
    console.log(profiler.exportSummary());
}
catch (error) {
    console.error("Error:", error);
    process.exit(1);
}

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
