######################## Cropzoom pipeline ######################## For setups where an animal is freely moving in a large arena, it's advantageous to crop around the animal before running pose estimation. Lightning Pose calls this technique "cropzoom". This document describes how to set up such a pipeline. .. tip:: This pipeline works equally well with bounding boxes from any external source (e.g., `idtracker.ai `_, `SAM3 `_, or your own custom detection scripts). Wherever the examples below call ``litpose predict [detector_model]`` and ``litpose create_bbox [detector_model]``, substitute your own scripts that produce bbox CSV files in the required format (see `Using bboxes from an external source`_ below). All downstream steps are identical. Conceptual overview =================== A cropzoom pipeline consists of two Lightning Pose models: a "detector model" and a "pose model". * The detector model operates on the full image of the arena. * The pose model operates on the cropped animal. These two models are trained and predicted like any other Lightning Pose model. We provide additional tools that help you compose these models: * ``litpose create_bbox``: Given the detector model's predictions, computes per-frame bounding boxes and saves them as CSV files. * ``litpose smooth_bbox``: *(optional)* Applies temporal smoothing to bbox CSV files, which can reduce jitter in the cropped region. * ``litpose crop``: Given a directory of bbox CSV files, crops the animal out of each frame or video. * ``litpose remap``: Given the pose model's predictions and the crop bounding boxes, remaps the predictions to the original coordinate space. Alternatively, ``litpose predict --bbox_dir`` combines the crop, predict, and remap steps into a single command — see :ref:`prediction-on-videos` for details. For the full command-line reference for these tools, see the CLI page sections: :ref:`Create bbox `, :ref:`Smooth bbox `, :ref:`Crop `, and :ref:`Remap `. Training -------- Training involves: 1. Train a "detector model". 2. Predict on training data using the detector model. 3. Create bounding boxes from detector predictions. 4. Crop training data for the pose model. 5. Train a "pose model". Inference --------- Inference involves: 1. Predict using the "detector model". 2. Create bounding boxes from detector predictions. 3. *(optional)* Smooth the bounding boxes. 4. Crop the data using the bounding boxes. 5. Predict on the cropped data using the "pose model". 6. Remap the pose model's predictions to the original coordinate space. .. note:: Steps 4–6 can be replaced by a single ``litpose predict --bbox_dir`` call — see :ref:`prediction-on-videos` in the example below. Bounding box sizing =================== The ``litpose create_bbox`` command supports two ways to size the bounding box around the animal. The two options are mutually exclusive; if neither is provided, ``--crop_ratio=2.0`` is used. ``--crop_ratio`` (default) Sizes the bounding box relative to the per-frame span of the detected keypoints. A value of 2.0 (the default) produces a box twice as wide and tall as the spread of keypoints. Use this when the animal's apparent size varies across frames. .. code-block:: bash litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 --crop_ratio 2.0 ``--crop_size`` Produces a fixed square bounding box of the given pixel size, centred on the per-frame mean of the detected keypoints. Use this when the animal occupies a roughly consistent region of the frame and you want uniform crop dimensions. .. code-block:: bash litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 --crop_size 200 Bounding box smoothing ====================== After running ``litpose create_bbox``, you can optionally smooth the resulting bbox files with ``litpose smooth_bbox``. Smoothing reduces per-frame jitter in the crop region, which can improve pose estimation quality when the detector produces noisy predictions. Smoothed bboxes are written to a **new directory** alongside a ``metadata.json`` file that records the smoothing parameters. You can then pass this directory to ``litpose crop`` or ``litpose predict`` via ``--bbox_dir``. .. code-block:: bash litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \ --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth_w5 \ --window 5 Using bboxes from an external source ===================================== Because ``litpose crop`` and ``litpose predict`` both accept a ``--bbox_dir`` argument, you can use bounding boxes produced by any external tool, not just ``litpose create_bbox`` (e.g., `idtracker.ai `_, `SAM3 `_, etc.). Place your bbox CSV files in a directory following the naming convention below, then pass ``--bbox_dir`` to either command: * **Videos**: ``_bbox.csv`` (one file per video) * **Labeled frames**: ``bbox.csv`` Each bbox CSV must have columns ``x``, ``y``, ``h``, ``w`` (top-left corner and size in pixels), with one row per frame. Example ======= This is a basic example of how you can setup a cropzoom pipeline. Paths to CSV and MP4 files below should be replaced with your files. The example is illustrative only. In reality you might be interested in making modifications to this such as: 1. Using different model type, backbone, image_resize_dims for your detector model and pose model. This can be accomplished using different config files for the detector and pose model. 2. Limiting ``train_frames`` and ``max_epochs`` for testing purposes. 3. Choosing ``--crop_ratio`` or ``--crop_size`` to suit your data (see `Bounding box sizing`_ above). We'll use some bash variables to avoid repeating paths below: .. code-block:: bash MODEL_DIR=outputs/chickadee/cropzoom DETECTOR_MODEL=detector_0 POSE_MODEL=pose_supervised_0 Training script --------------- .. code-block:: bash #!/bin/bash # Train the detector model. litpose train config.yaml --output_dir $MODEL_DIR/$DETECTOR_MODEL # Predict on training data with the detector model. litpose predict $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv # Create bounding boxes from detector predictions. # Use --crop_ratio (default) or --crop_size to control bounding box size. litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv # Crop images for pose model training. litpose crop $MODEL_DIR/$DETECTOR_MODEL data/CollectedData.csv # Train the pose model. litpose train config.yaml --output_dir $MODEL_DIR/$POSE_MODEL \ --detector_model $MODEL_DIR/$DETECTOR_MODEL For command-line options of the commands used above, see :ref:`Create bbox ` and :ref:`Crop `. .. _prediction-on-videos: Prediction on videos -------------------- .. tab-set:: .. tab-item:: Predict from cropped videos **Pros** - Intermediate cropped videos are stored on disk, making it straightforward to inspect each stage of the pipeline and diagnose potential issues in the detector or pose estimator. **Cons** - Cropped videos are written to disk, requiring additional storage and compute. - An extra remap step is needed to convert predictions back to the original coordinate space. .. code-block:: bash #!/bin/bash litpose predict $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 # Optional: smooth bboxes before cropping. litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \ --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth # Crop using raw bboxes (default): litpose crop $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 # Or crop using smoothed bboxes: litpose crop $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 \ --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth litpose predict $MODEL_DIR/$POSE_MODEL \ $MODEL_DIR/$DETECTOR_MODEL/cropped_videos/cropped_test_vid.mp4 litpose remap $MODEL_DIR/$POSE_MODEL/video_preds/cropped_test_vid.csv \ $MODEL_DIR/$DETECTOR_MODEL/video_preds/test_vid_bbox.csv .. tab-item:: Predict from original videos **Pros** - Runs entirely in memory — no intermediate cropped files are written to disk. - Predictions are returned in the original coordinate space; no remap step is needed. **Cons** - Without intermediate cropped files, it is harder to inspect the detector output or diagnose issues at each stage of the pipeline. .. code-block:: bash #!/bin/bash litpose predict $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 litpose create_bbox $MODEL_DIR/$DETECTOR_MODEL data/videos/test_vid.mp4 # Optional: smooth bboxes. litpose smooth_bbox $MODEL_DIR/$DETECTOR_MODEL/video_preds \ --output_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth # Predict using raw bboxes: litpose predict $MODEL_DIR/$POSE_MODEL data/videos/test_vid.mp4 \ --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds # Or predict using smoothed bboxes: litpose predict $MODEL_DIR/$POSE_MODEL data/videos/test_vid.mp4 \ --bbox_dir $MODEL_DIR/$DETECTOR_MODEL/video_preds/bboxes_smooth For detailed command-line options, see :ref:`Create bbox `, :ref:`Smooth bbox `, :ref:`Crop `, and :ref:`Remap `. Limitations =========== * Pose models do not yet support PCA Multiview loss.