Cropzoom pipeline

For setups where an animal is freely moving in a large arena, it’s advantageous to crop around the animal before running pose estimation. Lightning Pose calls this technique “cropzoom”. This document describes how to set up such a pipeline.

Tip

This pipeline works equally well with bounding boxes from any external source (e.g., idtracker.ai, SAM3, or your own custom detection scripts). Wherever the examples below call litpose predict [detector_model] and litpose create_bbox [detector_model], substitute your own scripts that produce bbox CSV files in the required format (see Using bboxes from an external source below). All downstream steps are identical.

Conceptual overview

A cropzoom pipeline consists of two Lightning Pose models: a “detector model” and a “pose model”.

  • The detector model operates on the full image of the arena.

  • The pose model operates on the cropped animal.

Both models are trained and used for prediction like any other Lightning Pose model. We provide additional tools that help you compose them:

  • litpose create_bbox: Given the detector model’s predictions, computes per-frame bounding boxes and saves them as CSV files.

  • litpose smooth_bbox: (optional) Applies temporal smoothing to bbox CSV files, which can reduce jitter in the cropped region.

  • litpose crop: Given a directory of bbox CSV files, crops the animal out of each frame or video.

  • litpose remap: Given the pose model’s predictions and the crop bounding boxes, remaps the predictions to the original coordinate space.

Alternatively, litpose predict --bbox_dir combines the crop, predict, and remap steps into a single command — see Prediction on videos for details.

For the full command-line reference for these tools, see the CLI page sections: Create bbox, Smooth bbox, Crop, and Remap.
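
To make the remap step concrete, here is a minimal Python sketch of the coordinate arithmetic, not the code that litpose remap actually runs. The names BBox and remap_point are hypothetical, and the scale factor assumes the crop may have been resized to crop_w by crop_h pixels before prediction.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Crop box in original-frame pixels: top-left corner plus size."""
    x: float
    y: float
    h: float
    w: float


def remap_point(x_crop, y_crop, bbox, crop_w, crop_h):
    """Map a keypoint from cropped-image pixels back to original-frame pixels.

    crop_w and crop_h are the pixel dimensions of the cropped image. They differ
    from bbox.w and bbox.h only if the crop was resized (an assumption here).
    """
    x_orig = bbox.x + x_crop * (bbox.w / crop_w)
    y_orig = bbox.y + y_crop * (bbox.h / crop_h)
    return x_orig, y_orig


bbox = BBox(x=300.0, y=120.0, h=200.0, w=200.0)
# A keypoint at the centre of an unresized 200x200 crop.
print(remap_point(100.0, 100.0, bbox, crop_w=200, crop_h=200))  # (400.0, 220.0)
```

The same function covers a resized crop: if the 200-pixel box had been shrunk to 100 by 100 pixels, a prediction at (50, 50) would map back to the same original-frame point.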

Training

Training involves:

  1. Train a “detector model”.

  2. Predict on training data using the detector model.

  3. Create bounding boxes from detector predictions.

  4. Crop training data for the pose model.

  5. Train a “pose model”.

Inference

Inference involves:

  1. Predict using the “detector model”.

  2. Create bounding boxes from detector predictions.

  3. (optional) Smooth the bounding boxes.

  4. Crop the data using the bounding boxes.

  5. Predict on the cropped data using the “pose model”.

  6. Remap the pose model’s predictions to the original coordinate space.

Note

Steps 4–6 can be replaced by a single litpose predict --bbox_dir call — see Prediction on videos in the example below.
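
As an illustration of what the crop step does to a single frame, the following Python sketch extracts a bounding box from a toy image stored as a list of rows. The function crop_frame is hypothetical, and filling out-of-frame pixels with a constant is an assumption of this sketch; it says nothing about how litpose crop treats boxes that extend past the frame edge.

```python
def crop_frame(frame, x, y, h, w, fill=0):
    """Return the h-by-w region whose top-left corner is (x, y).

    frame is a list of rows (each a list of pixel values). Pixels that fall
    outside the frame are filled with `fill`; this padding policy is an
    assumption of the sketch.
    """
    n_rows = len(frame)
    n_cols = len(frame[0]) if n_rows else 0
    out = []
    for r in range(y, y + h):
        row = []
        for c in range(x, x + w):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(frame[r][c] if inside else fill)
        out.append(row)
    return out


# A 4x5 toy "frame" whose pixel value encodes its position (10 * row + col).
frame = [[10 * r + c for c in range(5)] for r in range(4)]
crop = crop_frame(frame, x=3, y=2, h=3, w=3)
print(crop)  # [[23, 24, 0], [33, 34, 0], [0, 0, 0]]
```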

Bounding box sizing

The litpose create_bbox command supports two ways to size the bounding box around the animal. The two options are mutually exclusive; if neither is provided, --crop_ratio=2.0 is used.

--crop_ratio (default)

Sizes the bounding box relative to the per-frame span of the detected keypoints. A value of 2.0 (the default) produces a box twice as wide and tall as the spread of keypoints. Use this when the animal’s apparent size varies across frames.

litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 --crop_ratio 2.0
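
The sizing rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the create_bbox implementation: the hypothetical bbox_from_ratio scales width and height independently and centres the box on the midpoint of the keypoint span, whereas the real command may differ (for example, by forcing a square box or rounding to integers).

```python
def bbox_from_ratio(keypoints, crop_ratio=2.0):
    """Return (x, y, h, w) for one frame from a list of (x, y) keypoints."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    span_w = max(xs) - min(xs)
    span_h = max(ys) - min(ys)
    # Scale the keypoint span, keeping the box centred on the span's midpoint.
    w = crop_ratio * span_w
    h = crop_ratio * span_h
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    return cx - w / 2, cy - h / 2, h, w


keypoints = [(100.0, 50.0), (140.0, 60.0), (120.0, 110.0)]
print(bbox_from_ratio(keypoints))  # (80.0, 20.0, 120.0, 80.0)
```

Here the keypoints span 40 pixels horizontally and 60 vertically, so a ratio of 2.0 gives a box 80 pixels wide and 120 pixels tall.
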

--crop_size

Produces a fixed square bounding box of the given pixel size, centred on the per-frame mean of the detected keypoints. Use this when the animal occupies a roughly consistent region of the frame and you want uniform crop dimensions.

litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 --crop_size 200
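
For comparison, a fixed-size box centred on the keypoint mean might be computed as in the Python sketch below. The function bbox_from_size is hypothetical, and the sketch deliberately does not clamp the box to the frame, since the document does not say how create_bbox handles boxes that cross the frame edge.

```python
def bbox_from_size(keypoints, crop_size=200):
    """Return (x, y, h, w) for a square box centred on the mean keypoint."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    half = crop_size / 2
    # No clamping: the top-left corner can be negative near the frame edge.
    return cx - half, cy - half, crop_size, crop_size


keypoints = [(100.0, 50.0), (140.0, 60.0), (120.0, 100.0)]
print(bbox_from_size(keypoints))  # (20.0, -30.0, 200, 200)
```

The negative y value shows that a fixed-size box can extend beyond the frame when the animal is near an edge, which is worth checking for in your own data.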

Bounding box smoothing

After running litpose create_bbox, you can optionally smooth the resulting bbox files with litpose smooth_bbox. Smoothing reduces per-frame jitter in the crop region, which can improve pose estimation quality when the detector produces noisy predictions.

Smoothed bboxes are written to a new directory alongside a metadata.json file that records the smoothing parameters. You can then pass this directory to litpose crop or litpose predict via --bbox_dir.

litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \
    --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth_w5 \
    --window 5
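
To show why smoothing reduces jitter, the Python sketch below applies a centred moving average to one bbox column. This is an assumption for illustration only: the document does not state which filter litpose smooth_bbox uses or how --window is interpreted, and moving_average is a hypothetical helper.

```python
def moving_average(values, window=5):
    """Centred moving average; the window shrinks at the ends of the sequence."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Top-left x coordinate per frame, with a single-frame detector glitch at index 2.
xs = [100, 100, 130, 100, 100, 100, 100]
smoothed = moving_average(xs, window=5)
print(smoothed)  # [110.0, 107.5, 106.0, 106.0, 106.0, 100.0, 100.0]
```

The 30-pixel jump in one frame becomes a shift of at most 10 pixels spread over neighbouring frames. In this sketch each of the x, y, h, and w columns would be smoothed independently.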

Using bboxes from an external source

Because litpose crop and litpose predict both accept a --bbox_dir argument, you can use bounding boxes produced by any external tool (e.g., idtracker.ai or SAM3), not just litpose create_bbox. Place your bbox CSV files in a directory following the naming convention below, then pass --bbox_dir to either command:

  • Videos: <video_stem>_bbox.csv (one file per video)

  • Labeled frames: bbox.csv

Each bbox CSV must have columns x, y, h, w (top-left corner and size in pixels), with one row per frame.
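
If your boxes come from an external tracker, a small script along these lines could write them in the layout described above. It is a sketch using only the Python standard library; write_bbox_csv and the example boxes are hypothetical. Whether litpose also expects an index column or integer values is not stated here, so compare the result against a file produced by litpose create_bbox before relying on it.

```python
import csv
import tempfile
from pathlib import Path


def write_bbox_csv(bbox_dir, video_path, boxes):
    """Write one (x, y, h, w) row per frame to <video_stem>_bbox.csv."""
    out_path = Path(bbox_dir) / f"{Path(video_path).stem}_bbox.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "h", "w"])  # column order from the text above
        writer.writerows(boxes)
    return out_path


# Hypothetical boxes for a three-frame video, e.g. from an external tracker.
boxes = [(80, 20, 120, 80), (82, 21, 120, 80), (85, 23, 118, 82)]
bbox_dir = tempfile.mkdtemp()
out_path = write_bbox_csv(bbox_dir, "data/videos/test_vid.mp4", boxes)
print(out_path.name)  # test_vid_bbox.csv
```

The directory containing this file is what you would pass as --bbox_dir.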

Example

This is a basic example of how to set up a cropzoom pipeline. Replace the paths to CSV and MP4 files below with your own. The example is illustrative only; in practice you may want to modify it, for example by:

  1. Using a different model type, backbone, or image_resize_dims for your detector model and pose model. This can be accomplished using different config files for the detector and pose model.

  2. Limiting train_frames and max_epochs for testing purposes.

  3. Choosing --crop_ratio or --crop_size to suit your data (see Bounding box sizing above).

We’ll use some bash variables to avoid repeating paths below:

MODEL_DIR=outputs/chickadee/cropzoom
DETECTOR_MODEL=detector_0
POSE_MODEL=pose_supervised_0

Training script

#!/bin/bash

# Train the detector model.
litpose train config.yaml --output_dir $MODEL_DIR/$DETECTOR_MODEL

# Predict on training data with the detector model.
litpose predict $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv

# Create bounding boxes from detector predictions.
# Use --crop_ratio (default) or --crop_size to control bounding box size.
litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv

# Crop images for pose model training.
litpose crop $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv

# Train the pose model.
litpose train config.yaml --output_dir $MODEL_DIR/$POSE_MODEL \
    --detector_model $MODEL_DIR/$DETECTOR_MODEL

For command-line options of the commands used above, see Create bbox and Crop.

Prediction on videos

Separate crop, predict, and remap commands

Pros

  • Intermediate cropped videos are stored on disk, making it straightforward to inspect each stage of the pipeline and diagnose potential issues in the detector or pose estimator.

Cons

  • Cropped videos are written to disk, requiring additional storage and compute.

  • An extra remap step is needed to convert predictions back to the original coordinate space.

#!/bin/bash

# Predict on the video with the detector model.
litpose predict $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4

# Create bounding boxes from detector predictions.
litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4

# Optional: smooth bboxes before cropping.
litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \
    --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth

# Crop using raw bboxes (default). Run this command or the smoothed variant below, not both:
litpose crop $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4

# Or crop using smoothed bboxes:
litpose crop $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 \
    --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth

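# Predict on the cropped video with the pose model.
litpose predict $MODEL_DIR/$POSE_MODEL \
    $MODEL_DIR/$DETECTOR_MODEL/cropped_videos/cropped_test_vid.mp4

# Remap predictions to the original coordinate space. Pass the same bbox file
# that was used for cropping (the smoothed one if you cropped with smoothed bboxes).
litpose remap $MODEL_DIR/$POSE_MODEL/video_preds/cropped_test_vid.csv \
    $MODEL_DIR/$DETECTOR_MODEL/video_preds/test_vid_bbox.csv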

Single litpose predict --bbox_dir command

Pros

  • Runs entirely in memory — no intermediate cropped files are written to disk.

  • Predictions are returned in the original coordinate space; no remap step is needed.

Cons

  • Without intermediate cropped files, it is harder to inspect the detector output or diagnose issues at each stage of the pipeline.

#!/bin/bash

# Predict on the video with the detector model.
litpose predict $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4

# Create bounding boxes from detector predictions.
litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4

# Optional: smooth bboxes.
litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \
    --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth

# Predict using raw bboxes:
litpose predict $MODEL_DIR/$POSE_MODEL data/videos/test_vid.mp4 \
    --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds

# Or predict using smoothed bboxes:
litpose predict $MODEL_DIR/$POSE_MODEL data/videos/test_vid.mp4 \
    --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth

For detailed command-line options, see Create bbox, Smooth bbox, Crop, and Remap.

Limitations

  • Pose models do not yet support PCA Multiview loss.