Lightning Pose 3D

This repo and the Lightning Pose App support multi-camera projects with an arbitrary number of cameras (tested with up to six). Before starting, cameras must be synchronized across views, and the resulting video files for a given session must each contain the same number of frames.
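
The frame-count requirement can be checked before a project is created. The following is a minimal sketch, not part of Lightning Pose: the helper names `count_frames` and `check_frame_counts` are hypothetical, and `count_frames` assumes the `opencv-python` package is installed.

```python
from typing import Dict


def count_frames(video_path: str) -> int:
    """Return the frame count reported by the video container.

    Assumes opencv-python is installed. The value comes from container
    metadata, so it can be inaccurate for some codecs.
    """
    import cv2  # imported lazily so check_frame_counts works without OpenCV

    cap = cv2.VideoCapture(video_path)
    try:
        if not cap.isOpened():
            raise IOError(f"could not open {video_path}")
        return int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    finally:
        cap.release()


def check_frame_counts(counts: Dict[str, int]) -> None:
    """Raise ValueError unless every view has the same number of frames."""
    if len(set(counts.values())) > 1:
        raise ValueError(f"frame counts differ across views: {counts}")
```

For one session, build the dictionary with one entry per view, for example `{"Cam-A": count_frames("session_Cam-A.mp4"), ...}` (file names here are placeholders), and pass it to `check_frame_counts`.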

On this page:

Camera setup
Camera calibration (optional)
Data organization
Data annotation
Model training
Model inference
3D inference

Camera setup

We recommend using at least three cameras to maximize the number of views in which each keypoint is unoccluded. Position the cameras so that their viewing directions are roughly orthogonal to one another; each view then provides complementary information.

Camera calibration (optional)

Camera calibration determines the intrinsic parameters of each camera (focal length, principal point, distortion coefficients) and the extrinsic parameters that describe how the cameras are positioned and oriented relative to each other. Together, these parameters relate 2D pixel coordinates in each view to a shared 3D world coordinate system, so a point observed in two or more views can be triangulated into 3D.
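
For reference, the relationship between these parameters can be written with the standard pinhole model. This is a sketch that ignores lens distortion, and the symbols are illustrative notation, not names used by Lightning Pose or Anipose.

```latex
\lambda \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
  = K_c \left( R_c \mathbf{X} + \mathbf{t}_c \right),
\qquad
K_c = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

Here $\mathbf{X}$ is a 3D point in world coordinates, $(u, v)$ is its pixel location in camera $c$, $K_c$ holds the intrinsic parameters (focal lengths $f_x, f_y$ and principal point $c_x, c_y$), $(R_c, \mathbf{t}_c)$ are the extrinsic rotation and translation, and $\lambda$ is the depth of the point in that camera's frame.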

We recommend using the Anipose package for calibration. If you use a different calibration tool, you will need to convert your files into the expected format.

How Lightning Pose uses calibration:

3D data augmentation: calibration parameters allow geometrically consistent augmentation across views during training (see 3D augmentations and loss for details).
3D reprojection loss: calibration enables a training loss that penalizes geometrically inconsistent 2D predictions across views.
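
To make "geometrically inconsistent" concrete, the sketch below projects a single 3D point into each view and measures how far the 2D predictions fall from those projections. It is an illustration under stated assumptions, not the Lightning Pose loss: the `Camera` class is hypothetical, lens distortion is ignored, and the 3D point is assumed to be available already (for example from triangulation).

```python
import math
from typing import Dict, Sequence, Tuple


class Camera:
    """Hypothetical minimal pinhole camera (no lens distortion)."""

    def __init__(self, R: Sequence[Sequence[float]], t: Sequence[float],
                 fx: float, fy: float, cx: float, cy: float):
        self.R, self.t = R, t
        self.fx, self.fy, self.cx, self.cy = fx, fy, cx, cy

    def project(self, X: Sequence[float]) -> Tuple[float, float]:
        # World -> camera coordinates: Xc = R @ X + t
        xc, yc, zc = [
            sum(r * x for r, x in zip(row, X)) + ti
            for row, ti in zip(self.R, self.t)
        ]
        # Perspective divide, then apply the intrinsics.
        return (self.fx * xc / zc + self.cx, self.fy * yc / zc + self.cy)


def mean_reprojection_error(
    X: Sequence[float],
    cameras: Dict[str, Camera],
    predictions: Dict[str, Tuple[float, float]],
) -> float:
    """Mean pixel distance between 2D predictions and the projections of X."""
    errors = []
    for name, cam in cameras.items():
        u, v = cam.project(X)
        pu, pv = predictions[name]
        errors.append(math.hypot(u - pu, v - pv))
    return sum(errors) / len(errors)
```

If the per-view predictions all correspond to the same physical point, this error is close to zero; a prediction that drifts in one view increases it.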

Note

Camera calibration is not required to train a multi-view Lightning Pose model. However, calibration is required to obtain 3D coordinates unified across cameras (see 3D inference below).

Data organization

Using the App

Create a multi-view project by following the Create your first project guide. The App will store your data in the correct format automatically.

Without the App (or converting from another format)

See the multi-view directory structure reference for the expected layout.

Important for all users

Calibration files must be saved manually in the correct location, regardless of whether you use the App. See the calibration file format reference for the required location and format.

Data annotation

The App provides a multi-view annotation tool: you label a keypoint in two views, and the App uses the calibration information to project that label into the remaining views automatically. In general, we recommend keeping the automatically projected label in each view even when the body part is occluded; this gives the 3D data augmentation and reprojection loss a label in every view, which helps the model learn the geometric structure of the scene.

Multi-view annotation is time-consuming even with this assistance. We recommend the following workflow:

Label approximately 100 frames across as many individuals as possible.
Train an initial model.
Run inference on videos from new individuals (preferred) or new sessions from the same individuals to surface difficult frames.
Use the Viewer tab to identify those difficult frames and add them to your labeled set.
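
The Viewer tab is the interactive way to find difficult frames. As a rough offline complement, the sketch below ranks frames in one prediction CSV by mean likelihood across keypoints. It assumes the three-row-header CSV layout (scorer, bodyparts, coords) with an integer frame index in the first column; the function name is hypothetical, and low mean likelihood is only one possible heuristic for "difficult".

```python
import csv
import io
from typing import List, Tuple


def lowest_confidence_frames(csv_text: str, n: int) -> List[Tuple[int, float]]:
    """Return the n frames with the lowest mean likelihood across keypoints."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    coords = rows[2]  # third header row: x, y, likelihood, x, y, likelihood, ...
    like_cols = [i for i, c in enumerate(coords) if c == "likelihood"]
    scores = []
    for row in rows[3:]:
        if not row:  # skip blank trailing lines
            continue
        mean_like = sum(float(row[i]) for i in like_cols) / len(like_cols)
        scores.append((int(row[0]), mean_like))
    # Lowest mean likelihood first.
    return sorted(scores, key=lambda s: s[1])[:n]
```

To use it on a file, read the CSV as text (for example with `pathlib.Path(...).read_text()`) and pass the result together with the number of frames you want to review.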

In general, labeling a smaller number of frames from a larger number of individuals leads to better generalization. For example, if your labeling budget is 200 frames, labeling 20 frames from 10 separate individuals is preferable to labeling 200 frames from a single individual.

Model training

To train a model in the App, follow the walkthrough in the Create your first project guide.

For training via the CLI, see:

Training and inference (multi-view) — general training procedure for multi-view setups.
Patch masking and 3D loss — multi-view specific training features including patch masking and the 3D reprojection loss.

Model inference

Inference in the App follows the same workflow as for single-view projects.

For inference via the CLI, see:

Training and inference (multi-view) — covers the multi-view inference procedure and expected file layout.

3D inference

Note

Camera calibration information is required for 3D inference.

Once per-view 2D predictions are available, 3D coordinates can be reconstructed across cameras. We recommend the Ensemble Kalman Smoother (EKS) tool for this step (see the EKS paper). EKS can operate on predictions from a single model or from an ensemble of models; ensembling improves accuracy and provides better-calibrated uncertainty estimates than the likelihood outputs of any single network.
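
The intuition behind ensemble-based uncertainty is that models trained with different seeds agree on easy frames and disagree on hard ones. The sketch below computes only the per-frame mean and variance across models for one coordinate. It is not the EKS implementation (EKS additionally applies a Kalman smoother over time), and the function name is hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def ensemble_mean_and_variance(
    predictions: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Per-frame mean and variance across models.

    predictions[m][t] is the coordinate predicted by model m at frame t.
    """
    per_frame = list(zip(*predictions))  # regroup as one tuple per frame
    means = [mean(vals) for vals in per_frame]
    variances = [pvariance(vals) for vals in per_frame]
    return means, variances
```

A frame where the variance is large is one where the ensemble members disagree, which is a signal that no single network's likelihood output provides.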

Installation

git clone https://github.com/paninski-lab/eks
cd eks
pip install -e .

Alternatively, install from PyPI (no bundled example data):

pip install ensemble-kalman-smoother

Workflow

The recommended workflow is:

Train several Lightning Pose models with different random seeds (3+ recommended).
Run litpose predict with each model to produce per-view CSV files.
Organize the CSVs into a directory following the layout described below.
Run EKS to produce smoothed, ensembled predictions (and optionally 3D coordinates).

Input file layout

EKS expects CSVs in the Lightning Pose / DeepLabCut (DLC) format, with a three-row header (scorer, bodyparts, coords). For multi-camera setups with one CSV per view per seed, place all files in a single directory and include the camera name as a substring of each filename:

input_dir/
  session_Cam-A_rng=0.csv
  session_Cam-A_rng=1.csv
  session_Cam-A_rng=2.csv
  session_Cam-B_rng=0.csv
  session_Cam-B_rng=1.csv
  session_Cam-B_rng=2.csv
  calibration.toml          # Anipose-format calibration file
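
Collecting prediction files into this layout can be scripted. The sketch below is one way to do it, assuming you already know where each model wrote its per-view CSV; the function names are hypothetical, and the `session` prefix and `rng=` suffix simply mirror the example filenames above.

```python
import shutil
from pathlib import Path
from typing import Dict, Tuple


def eks_filename(session: str, camera: str, seed: int) -> str:
    """Build a filename such as session_Cam-A_rng=0.csv."""
    return f"{session}_{camera}_rng={seed}.csv"


def collect_predictions(
    sources: Dict[Tuple[str, int], Path],
    session: str,
    input_dir: Path,
) -> None:
    """Copy each (camera, seed) prediction CSV into input_dir.

    sources maps (camera name, seed) to the CSV written by that model.
    """
    input_dir.mkdir(parents=True, exist_ok=True)
    for (camera, seed), src in sources.items():
        shutil.copy(src, input_dir / eks_filename(session, camera, seed))
```

The calibration file is not handled here; copy it into the same directory separately if you plan to run the calibrated workflow.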

Note

Camera names must appear as substrings of the filenames, and no camera name may be a substring of another camera name.
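
Both naming rules are easy to check before running EKS. The following is a minimal sketch with a hypothetical function name; it expects only the CSV filenames, so exclude the calibration file before calling it.

```python
from typing import Dict, List, Sequence


def group_files_by_camera(
    camera_names: Sequence[str],
    filenames: Sequence[str],
) -> Dict[str, List[str]]:
    """Group CSV filenames by camera name, enforcing the substring rules."""
    # Rule 1: no camera name may be a substring of another camera name.
    for a in camera_names:
        for b in camera_names:
            if a != b and a in b:
                raise ValueError(f"camera name {a!r} is a substring of {b!r}")
    # Rule 2: every filename must contain exactly one camera name.
    groups: Dict[str, List[str]] = {name: [] for name in camera_names}
    for fname in filenames:
        matches = [name for name in camera_names if name in fname]
        if len(matches) != 1:
            raise ValueError(f"{fname!r} matches {len(matches)} camera names")
        groups[matches[0]].append(fname)
    return groups
```

For example, the names `Cam` and `Cam-B` would be rejected, because every file for `Cam-B` would also match `Cam`.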

Running EKS

For calibrated multi-camera data (nonlinear EKS with 3D triangulation):

eks multicam \
    --input-dir /path/to/input_dir \
    --camera-names Cam-A Cam-B \
    --calibration /path/to/input_dir/calibration.toml \
    --make-plot

For multi-camera data without calibration (linear EKS, smoothing only):

eks multicam \
    --input-dir /path/to/input_dir \
    --camera-names Cam-A Cam-B \
    --make-plot
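
When EKS is launched from a script, the two invocations above differ only in the calibration argument. The sketch below assembles the argument list using only the flags shown above; the function name is hypothetical and no other options are assumed.

```python
from typing import List, Optional, Sequence


def build_eks_multicam_command(
    input_dir: str,
    camera_names: Sequence[str],
    calibration: Optional[str] = None,
    make_plot: bool = True,
) -> List[str]:
    """Assemble the `eks multicam` argument list shown above."""
    cmd = ["eks", "multicam", "--input-dir", input_dir,
           "--camera-names", *camera_names]
    if calibration is not None:
        # With a calibration file, the calibrated workflow is requested.
        cmd += ["--calibration", calibration]
    if make_plot:
        cmd.append("--make-plot")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`.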

See the EKS documentation for the full list of subcommands and options, including specialized workflows for mirrored multi-camera setups.