Lightning Pose API

Train function

lightning_pose.train.train(cfg: DictConfig | ListConfig, model_dir: str | Path | None = None, skip_evaluation: bool = False) → Model

Train a model using the configuration cfg, saving outputs to model_dir.

Parameters:

cfg – hydra config object.
model_dir – directory to save model outputs; defaults to cwd if unspecified.
skip_evaluation – if True, skip post-training evaluation.

Returns:

trained Model instance.

To train a model using config.yaml and output to outputs/doc_model:

import os
from lightning_pose.train import train
from omegaconf import OmegaConf

cfg = OmegaConf.load("config.yaml")
os.chdir("outputs/doc_model")
train(cfg)

To override settings before training:

cfg = OmegaConf.load("config.yaml")
overrides = {
    "training": {
        "min_epochs": 5,
        "max_epochs": 5
    }
}
cfg = OmegaConf.merge(cfg, overrides)
train(cfg)

Training returns a Model object, which is described next.

Model class

The Model class provides an easy-to-use interface to a lightning-pose model. It supports running inference and accessing model metadata. The set of supported Model operations will expand as we continue development.

You create a model object using Model.from_dir:

from lightning_pose.api.model import Model

model = Model.from_dir("outputs/doc_model")

Then, to predict on new data:

model.predict_on_video_file("path/to/video.mp4")

or:

model.predict_on_label_csv("path/to/csv_file.csv")

To predict on a single numpy frame (no file I/O):

import numpy as np

frame = np.zeros((256, 256, 3), dtype=np.uint8)  # placeholder (H, W, 3) uint8 RGB frame
result = model.predict_frame(frame)
keypoints = result["keypoints"]   # (num_kp, 2) float32
confidence = result["confidence"] # (num_kp,) float32

API Reference

class lightning_pose.api.model.Model

High-level interface for inference with a trained lightning-pose model.

Load a saved model with Model.from_dir, then call prediction methods directly. Model weights are loaded lazily on the first prediction call.

model_dir

absolute path to the directory the model is stored in.

Type: pathlib.Path

config

the model configuration as a ModelConfig object.

Type: lightning_pose.api.model_config.ModelConfig

model

the underlying PyTorch model; None until the first prediction call.

Type: lightning_pose.models.heatmap_tracker.HeatmapTracker | lightning_pose.models.heatmap_tracker.SemiSupervisedHeatmapTracker | lightning_pose.models.heatmap_tracker_mhcrnn.HeatmapTrackerMHCRNN | lightning_pose.models.heatmap_tracker_mhcrnn.SemiSupervisedHeatmapTrackerMHCRNN | lightning_pose.models.heatmap_tracker_multiview.HeatmapTrackerMultiviewTransformer | lightning_pose.models.heatmap_tracker_multiview.SemiSupervisedHeatmapTrackerMultiviewTransformer | lightning_pose.models.regression_tracker.RegressionTracker | lightning_pose.models.regression_tracker.SemiSupervisedRegressionTracker | None

Examples

>>> from lightning_pose.api import Model
>>> model = Model.from_dir("outputs/2024-01-01/12-00-00")

Single-frame inference (no file I/O):

>>> import numpy as np
>>> frame = np.zeros((256, 256, 3), dtype=np.uint8)
>>> result = model.predict_frame(frame)
>>> result["keypoints"].shape    # (num_keypoints, 2)
>>> result["confidence"].shape   # (num_keypoints,)

Predict on a video file:

>>> pred_result = model.predict_on_video_file("path/to/video.mp4")
>>> pred_result.predictions   # pd.DataFrame with MultiIndex columns
>>> pred_result.metrics       # ComputeMetricsSingleResult or None

Predict on a labeled CSV (also computes pixel error):

>>> pred_result = model.predict_on_label_csv("path/to/CollectedData.csv")

property cfg: DictConfig | ListConfig: The model configuration as an omegaconf.DictConfig.

config: ModelConfig: The model configuration stored as a ModelConfig object. ModelConfig wraps the omegaconf.DictConfig and provides util functions over it.

cropped_csv_file_path(csv_file_path: str | Path) → Path

Return the path where a cropzoom-adjusted CSV file will be saved.

Parameters: csv_file_path – path to the original labeled CSV file.
Returns: path of the form {model_dir}/image_preds/{csv_name}/cropped_{csv_name}.
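
The path pattern above can be illustrated with pathlib. This is a minimal sketch, not the library's implementation: the helper name cropped_csv_path is hypothetical, and it assumes that csv_name means the CSV file name including its extension.

```python
from pathlib import Path


def cropped_csv_path(model_dir: str, csv_file_path: str) -> Path:
    """Build {model_dir}/image_preds/{csv_name}/cropped_{csv_name}.

    Assumption: csv_name is the file name of the CSV, extension included.
    """
    csv_name = Path(csv_file_path).name
    return Path(model_dir) / "image_preds" / csv_name / f"cropped_{csv_name}"


path = cropped_csv_path("outputs/doc_model", "data/CollectedData.csv")
print(path.as_posix())
```

In practice, call model.cropped_csv_file_path(...) rather than rebuilding the path by hand, so that the result follows whatever layout the installed version uses.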

cropped_data_dir() → Path: Return the directory where cropzoom-cropped images are saved.

cropped_videos_dir() → Path: Return the directory where cropzoom-cropped videos are saved.

static from_dir(model_dir: str | Path, precision: Literal['fp32', 'fp16', 'bf16'] = 'fp32') → Model

Create a Model instance for a model stored at model_dir.

Parameters:

model_dir – path to a model output directory containing config.yaml and a .ckpt checkpoint file.
precision – precision to run inference at. One of "fp32" (default), "fp16", or "bf16" – same strings as the litpose predict --precision CLI flag. Does not affect the checkpoint itself – weights stay fp32 on disk; this only controls the precision used during the forward pass.

Returns:

Model ready for inference. Weights are loaded lazily on the first prediction call.

Examples

>>> from lightning_pose.api import Model
>>> model = Model.from_dir("outputs/2024-01-01/12-00-00")
>>> model.config.is_multi_view()
False

Run inference in FP16:

>>> model = Model.from_dir("outputs/2024-01-01/12-00-00", precision="fp16")

image_preds_dir() → Path: Return the directory where image/CSV predictions are saved.

labeled_videos_dir() → Path: Return the directory where prediction-annotated videos are saved.

model_dir: Path: Directory the model is stored in.

property pl_precision: Literal['32-true', '16-mixed', 'bf16-mixed']

PyTorch Lightning Trainer precision string for self.precision.

Internal plumbing for the two pl.Trainer construction sites in lightning_pose.utils.predictions. User-facing code should read/set self.precision ("fp32"/"fp16"/"bf16") instead.
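
The relationship between the user-facing precision strings and the Trainer precision strings can be sketched as a lookup table. The pairing below is an assumption inferred from the order of the two Literal types in the signatures above; the table and the helper to_pl_precision are hypothetical names, not part of the library.

```python
# Hypothetical lookup table. The pairing is inferred from the order of the
# two Literal types in the documented signatures, not taken from the source.
PL_PRECISION = {
    "fp32": "32-true",
    "fp16": "16-mixed",
    "bf16": "bf16-mixed",
}


def to_pl_precision(precision: str) -> str:
    """Translate a user-facing precision string to a Trainer precision string."""
    try:
        return PL_PRECISION[precision]
    except KeyError:
        allowed = ", ".join(sorted(PL_PRECISION))
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {allowed}"
        ) from None


print(to_pl_precision("fp16"))
```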

precision: Literal['fp32', 'fp16', 'bf16'] = 'fp32'

Precision used for inference: "fp32", "fp16", or "bf16" (same strings as the litpose predict --precision CLI flag). Does not affect the checkpoint on disk.

predict_frame(frame_rgb: ndarray, bbox: tuple[int, int, int, int] | None = None) → dict[str, ndarray]

Single-frame inference. No file I/O, no DALI.

Preprocessing uses cv2 (not DALI). Results will differ numerically from predict_on_video_file due to interpolation and normalization differences. Do not mix results from the two paths in quantitative analysis.

For MHCRNN (context) models, pass a (T, H, W, 3) array where T is the temporal context length (typically 5). Passing a single frame to a context model raises ValueError — use predict_on_video_file for proper temporal inference.

The first call triggers model loading and CUDA initialization, which may take several seconds. Subsequent calls are fast (~5-50ms depending on backbone). For latency-sensitive loops, call once on a dummy frame before entering the loop.
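
The warm-up advice can be sketched as follows. LazyModel is a hypothetical stand-in that only simulates a one-time loading cost; with a real model you would call model.predict_frame on a dummy uint8 frame instead.

```python
import time


class LazyModel:
    """Hypothetical stand-in for a model that loads its weights on first use."""

    def __init__(self) -> None:
        self.loaded = False

    def predict_frame(self, frame):
        if not self.loaded:
            time.sleep(0.05)  # simulate the one-time loading cost
            self.loaded = True
        return {"keypoints": [], "confidence": []}


model = LazyModel()
dummy_frame = [[[0, 0, 0]]]  # placeholder; a real call would pass a uint8 array

# Warm-up call: pays the loading cost once, outside the timed loop.
model.predict_frame(dummy_frame)

latencies = []
for _ in range(3):
    start = time.perf_counter()
    model.predict_frame(dummy_frame)
    latencies.append(time.perf_counter() - start)

print(f"slowest call after warm-up: {max(latencies) * 1000:.3f} ms")
```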

Parameters:

frame_rgb – (H, W, 3) uint8 RGB array for standard models, or (T, H, W, 3) uint8 RGB array for context (MHCRNN) models.
bbox – Optional (x, y, w, h) crop region. Note: this is (x, y, width, height), NOT (x1, y1, x2, y2). If provided, crops first, then remaps keypoints back to original coordinates.

Returns:

dict with two entries:
"keypoints" – (num_kp, 2) float32 array of (x, y) in original frame coordinates.
"confidence" – (num_kp,) float32 in [0, 1], the likelihood/confidence per keypoint. For regression models, confidence is always 1.0.

Return type:

dict[str, ndarray]

Raises:

ValueError – If frame_rgb has wrong shape/dtype, bbox has non-positive dimensions, bbox produces an empty crop, or a context model receives single-frame input.

Examples

>>> import numpy as np
>>> frame = np.zeros((256, 256, 3), dtype=np.uint8)
>>> result = model.predict_frame(frame)
>>> result["keypoints"].shape    # (num_keypoints, 2)
>>> result["confidence"].shape   # (num_keypoints,)

With a bounding-box crop (x, y, width, height):

>>> result = model.predict_frame(frame, bbox=(100, 50, 128, 128))
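
The (x, y, width, height) convention and the remapping of keypoints back to the original frame can be sketched in plain Python. Both helpers are hypothetical and only illustrate the coordinate arithmetic. The sketch assumes keypoints are expressed in pixels of the crop before any resizing; the library's own preprocessing is not reproduced here.

```python
def xywh_to_xyxy(bbox):
    """Convert (x, y, width, height) to corner form (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    if w <= 0 or h <= 0:
        raise ValueError("bbox width and height must be positive")
    return (x, y, x + w, y + h)


def remap_to_original(keypoints_in_crop, bbox):
    """Shift keypoints from crop coordinates back to original frame coordinates.

    Assumption: keypoints are given in pixels of the un-resized crop.
    """
    x, y, _, _ = bbox
    return [(kx + x, ky + y) for kx, ky in keypoints_in_crop]


bbox = (100, 50, 128, 128)  # (x, y, width, height)
corners = xywh_to_xyxy(bbox)
remapped = remap_to_original([(10.0, 20.0), (64.0, 64.0)], bbox)
print(corners)
print(remapped)
```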

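predict_on_label_csv(csv_file, data_dir, compute_metrics, add_train_val_test_set, bbox_file) → PredictionResult

Predicts on a labeled dataset and computes error/loss metrics if applicable.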

Parameters:

csv_file – path to the CSV file of images and keypoint locations.
data_dir – root path for relative image paths in the CSV file. Defaults to the data_dir used during training.
compute_metrics – whether to compute pixel error and loss metrics on predictions.
add_train_val_test_set – set to True when predicting on the training dataset to add a set column to the output.
bbox_file – optional path to a bbox CSV produced by litpose create_bbox (or any compatible source). When provided, each frame is cropped to its bounding box before being passed to the model, and predictions are returned in the original (un-cropped) coordinate space.

Returns:

A PredictionResult object containing the predictions and metrics.

Return type:

PredictionResult

Examples

>>> result = model.predict_on_label_csv("path/to/CollectedData.csv")
>>> result.predictions           # pd.DataFrame with MultiIndex columns
>>> result.metrics.pixel_error   # mean pixel error per keypoint

Skip metric computation for faster inference:

>>> result = model.predict_on_label_csv(
...     "path/to/CollectedData.csv",
...     compute_metrics=False,
... )

predict_on_label_csv_multiview(csv_file_per_view, ...)

Version of predict_on_label_csv that gives models access to all views of each frame.

Parameters: csv_file_per_view – a list of CSV files, each from a different view of the same session; order must match view_names in the config file.

See the predict_on_label_csv docstring for the other arguments.

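predict_on_video_file(video_file, output_dir, compute_metrics, generate_labeled_video, progress_file, bbox_file) → PredictionResult

Predicts on a video file and computes unsupervised loss metrics if applicable.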

Parameters:

video_file (str | Path) – Path to the video file.
output_dir (str | Path, optional) – The directory to save outputs to. Defaults to {model_dir}/video_preds. If set to None, outputs are not saved.
compute_metrics (bool, optional) – Whether to compute pixel error and loss metrics on predictions.
generate_labeled_video (bool, optional) – Whether to save a labeled video. Defaults to False.
progress_file (Path, optional) – Path to a file to save progress information for the App. Defaults to None.
bbox_file (str | Path, optional) – Path to a per-frame bbox CSV (columns x, y, h, w; one row per frame). When provided, each frame is cropped to its bounding box before being passed to the model, and predictions are returned in the original coordinate space. Single-view only. Defaults to None.

Returns:

A PredictionResult object containing the predictions and metrics.

Return type:

PredictionResult

Examples

>>> result = model.predict_on_video_file("path/to/video.mp4")
>>> result.predictions   # pd.DataFrame, one row per frame

Save a keypoint-annotated video alongside the predictions CSV:

>>> result = model.predict_on_video_file(
...     "path/to/video.mp4",
...     generate_labeled_video=True,
... )
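
A per-frame bbox file with the documented columns (x, y, h, w; one row per frame) might be produced as sketched below. The box values are made up for illustration, and the layout beyond the four column names (for example, whether an index column is expected) is an assumption; files written by litpose create_bbox are the authoritative reference. Note that the column order here is h then w, unlike the (x, y, width, height) tuple accepted by predict_frame.

```python
import csv
import io

# Hypothetical per-frame boxes as (x, y, h, w), one tuple per video frame.
boxes = [
    (100, 50, 128, 128),
    (102, 51, 128, 128),
    (105, 53, 128, 128),
]

# Write to an in-memory buffer; swap in open("bbox.csv", "w", newline="")
# to write a real file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["x", "y", "h", "w"])
writer.writerows(boxes)

# Read the rows back to confirm there is one row per frame.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
print(len(rows), rows[0])
```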

predict_on_video_file_multiview(video_file_per_view: list[str] | list[Path], output_dir: str | Path | None = 'unspecified', compute_metrics: bool = True, generate_labeled_video: bool = False, progress_file: Path | None = None) → MultiviewPredictionResult

Version of predict_on_video_file that accesses multiple camera views of each frame.

Parameters:

video_file_per_view – a list of video files each from a different view of the same session; number of files must match view_names in the config; order does not matter as files are matched to views by filename.
output_dir – directory to save outputs to; defaults to {model_dir}/video_preds; set to None to skip saving.
compute_metrics – whether to compute pixel error and loss metrics on predictions.
generate_labeled_video – whether to save a labeled video.
progress_file – path to a file to save progress information for the App.

Returns:

A MultiviewPredictionResult object containing the predictions and metrics for each view.
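
Matching files to views by filename could work along these lines. The exact rule the library applies is not described here, so this sketch assumes one plausible rule: each view name appears as a substring of exactly one file name. The helper match_files_to_views and the example file names are hypothetical.

```python
from pathlib import Path


def match_files_to_views(video_files, view_names):
    """Pair each view name with the single file whose name contains it.

    Assumed rule, for illustration only: substring match on the file name.
    """
    if len(video_files) != len(view_names):
        raise ValueError("number of files must match number of views")
    matched = {}
    for view in view_names:
        hits = [f for f in video_files if view in Path(f).name]
        if len(hits) != 1:
            raise ValueError(f"expected exactly one file for view {view!r}")
        matched[view] = hits[0]
    return matched


# File order differs from view order; the result is keyed by view name.
files = ["videos/session1_side.mp4", "videos/session1_top.mp4"]
matched = match_files_to_views(files, ["top", "side"])
print(matched)
```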

video_preds_dir() → Path: Return the directory where video predictions are saved.

Return types

class lightning_pose.data.datatypes.PredictionResult

metrics: ComputeMetricsSingleResult | None = None

predictions: DataFrame

to_dict() → dict[str, Any]

Return predictions and metrics as a flat dict of named numpy arrays.

All arrays have shape (n_frames, n_keypoints) and share the same row order. Metric arrays are None when the metric was not computed.

Returns:

dict with keys:

keypoint_names: list of keypoint name strings.
index: list of frame identifiers (file paths or integer indices).
x: float array of predicted x coordinates.
y: float array of predicted y coordinates.
confidence: float array of per-keypoint likelihood in [0, 1].
pixel_error: float array or None.
temporal_norm: float array or None.
pca_singleview_error: float array or None.
pca_multiview_error: float array or None.

Return type:

dict[str, Any]
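
A dict with this layout can be consumed as sketched below, for example to list the frames in which each keypoint is predicted with high confidence. The dict is hand-built with nested lists standing in for the numpy arrays that to_dict() returns, and the keypoint names, values, and threshold are invented for illustration.

```python
# Hand-built stand-in with the documented keys. Nested lists replace the
# numpy arrays of shape (n_frames, n_keypoints) that to_dict() returns.
result = {
    "keypoint_names": ["nose", "tail"],
    "index": [0, 1, 2],
    "x": [[10.0, 50.0], [11.0, 51.0], [12.0, 52.0]],
    "y": [[20.0, 60.0], [21.0, 61.0], [22.0, 62.0]],
    "confidence": [[0.99, 0.40], [0.95, 0.92], [0.30, 0.97]],
    "pixel_error": None,  # metrics that were not computed are None
    "temporal_norm": None,
    "pca_singleview_error": None,
    "pca_multiview_error": None,
}

threshold = 0.9  # illustrative cutoff, not a library default

# For each keypoint (column), collect the frames (rows) above the threshold.
reliable = {}
for k, name in enumerate(result["keypoint_names"]):
    reliable[name] = [
        frame
        for row, frame in enumerate(result["index"])
        if result["confidence"][row][k] >= threshold
    ]

print(reliable)
```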

class lightning_pose.data.datatypes.MultiviewPredictionResult

metrics: dict[str, ComputeMetricsSingleResult] | None = None

predictions: dict[str, DataFrame]

to_dict() → dict[str, dict[str, Any]]

Return predictions and metrics for each view as a flat dict of named numpy arrays.

Wraps PredictionResult.to_dict() for each view.

Returns: dict keyed by view name, where each value is the to_dict() output for that view.

class lightning_pose.data.datatypes.ComputeMetricsSingleResult

pca_mv_df: DataFrame | None = None

pca_sv_df: DataFrame | None = None

pixel_error_df: DataFrame | None = None

temporal_norm_df: DataFrame | None = None
