FAQs

Does Lightning Pose support greyscale (single-channel) images and videos?

Yes. Lightning Pose automatically converts single-channel (greyscale) images to three-channel RGB by replicating the luminance channel, so no pre-processing is required on your end. IR cameras, greyscale high-speed cameras, and similar rigs are fully supported.
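Lightning Pose performs this conversion internally, so you do not need to run anything like this yourself. The NumPy sketch below only illustrates what "replicating the luminance channel" means. The function name grey_to_rgb is hypothetical and is not part of the Lightning Pose API.

```python
import numpy as np

def grey_to_rgb(frame: np.ndarray) -> np.ndarray:
    """Replicate a single-channel frame into three identical channels (hypothetical helper).

    Accepts an array of shape (H, W) or (H, W, 1) and returns shape (H, W, 3).
    """
    if frame.ndim == 3 and frame.shape[-1] == 1:
        frame = frame[..., 0]  # drop the singleton channel axis
    if frame.ndim != 2:
        raise ValueError(f"expected a single-channel frame, got shape {frame.shape}")
    # stack the same luminance values into the R, G and B channels
    return np.repeat(frame[..., np.newaxis], 3, axis=-1)

grey = np.arange(12, dtype=np.uint8).reshape(3, 4)
rgb = grey_to_rgb(grey)
print(rgb.shape)
```

Because all three channels are identical, no colour information is invented. The network simply sees the same intensities on every channel.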

What video formats are supported by Lightning Pose?

Lightning Pose requires videos encoded with the h.264 codec (and the yuv420p pixel format). The codec is independent of the file extension: AVI files typically use other codecs, while MP4 files usually use h.264 (though not always). The following function checks the codec and pixel format using ffmpeg:

import subprocess

def check_codec_format(input_file: str) -> bool:
    """Return True if the video uses the h.264 codec and the yuv420p pixel format."""

    # pass arguments as a list (no shell) so paths with spaces are handled safely;
    # -hide_banner keeps ffmpeg's build information out of the searched text
    ffmpeg_cmd = ['ffmpeg', '-hide_banner', '-i', input_file]
    result = subprocess.run(ffmpeg_cmd, capture_output=True, text=True)
    # ffmpeg exits with an error because no output file is given,
    # but the stream info (codec, pixel format) is still printed to stderr
    output_str = result.stderr

    # search for the correct codec (h264) and pixel format (yuv420p)
    return 'h264' in output_str and 'yuv420p' in output_str
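The string search in check_codec_format can be fooled by matching text elsewhere in ffmpeg's output, for example a filename that contains "h264". A more targeted alternative is to ask ffprobe, which ships with ffmpeg, for only the first video stream's codec and pixel format as JSON. This is a sketch, and the helper names parse_stream_info, probe_video_stream and is_lp_compatible are hypothetical.

```python
import json
import subprocess

def parse_stream_info(ffprobe_json: str) -> dict:
    """Extract codec_name and pix_fmt of the first stream from ffprobe's JSON output."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        # no video stream found (e.g. an audio-only file)
        return {"codec_name": None, "pix_fmt": None}
    stream = streams[0]
    return {"codec_name": stream.get("codec_name"), "pix_fmt": stream.get("pix_fmt")}

def probe_video_stream(input_file: str) -> dict:
    """Query only the first video stream (v:0) of input_file with ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name,pix_fmt",
        "-of", "json",
        input_file,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_stream_info(result.stdout)

def is_lp_compatible(info: dict) -> bool:
    """True if the stream is h.264 with yuv420p pixels, matching check_codec_format."""
    return info["codec_name"] == "h264" and info["pix_fmt"] == "yuv420p"
```

Usage: is_lp_compatible(probe_video_stream("/path/to/video.mp4")). Separating the parsing from the subprocess call makes the decision logic easy to test without a video file.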

If your videos do not use the h.264 codec, the following Python code will re-encode them:

import os
import subprocess

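def reencode_video(input_file: str, output_file: str) -> None:
    """Re-encode a video with the h.264 codec using ffmpeg from a subprocess.

    Args:
        input_file: absolute path to an existing video
        output_file: absolute path to the new mp4 video that will use the h.264 codec

    """
    # check that the input file exists
    if not os.path.isfile(input_file):
        raise FileNotFoundError(f'input video does not exist: {input_file}')
    # make sure the directory for saving outputs exists (skip if output is in the cwd)
    output_dir = os.path.dirname(output_file)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
    # build the ffmpeg command: h.264 video, yuv420p pixels, copy audio, overwrite output
    ffmpeg_cmd = [
        'ffmpeg', '-i', input_file,
        '-c:v', 'libx264', '-pix_fmt', 'yuv420p',
        '-c:a', 'copy', '-y', output_file,
    ]
    # run without a shell so paths with spaces work; check=True raises if ffmpeg fails
    subprocess.run(ffmpeg_cmd, check=True)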

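You can also run the same ffmpeg command directly from the command line, for example: ffmpeg -i input.avi -c:v libx264 -pix_fmt yuv420p -c:a copy -y output.mp4 (quote any paths that contain spaces).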
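To convert a whole folder of videos, a minimal sketch is to build one argument list per file and either run it from Python or print a shell-ready version to paste into a terminal. The helper names build_reencode_cmd and plan_batch_reencode, the default suffixes, and the folder names raw_videos and videos_h264 are hypothetical. The ffmpeg options are the same ones used by reencode_video.

```python
import shlex
from pathlib import Path

def build_reencode_cmd(input_file: str, output_file: str) -> list[str]:
    """Argument list for the ffmpeg re-encode used by reencode_video."""
    return [
        "ffmpeg", "-i", input_file,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "copy", "-y", output_file,
    ]

def plan_batch_reencode(video_paths, output_dir: str, suffixes=(".avi", ".mov")) -> list[list[str]]:
    """One ffmpeg command per matching video, writing <stem>.mp4 into output_dir."""
    cmds = []
    for path in sorted(Path(p) for p in video_paths):
        if path.suffix.lower() in suffixes:
            out = Path(output_dir) / f"{path.stem}.mp4"
            cmds.append(build_reencode_cmd(str(path), str(out)))
    return cmds

video_dir = Path("raw_videos")  # hypothetical folder of input videos
if video_dir.is_dir():
    for cmd in plan_batch_reencode(video_dir.iterdir(), "videos_h264"):
        print(shlex.join(cmd))  # shell-quoted; or run with subprocess.run(cmd, check=True)
```

The default suffixes leave out .mp4 on purpose. Otherwise, writing into the input folder could make ffmpeg overwrite the file it is reading.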

How should I set image_resize_dims?

image_resize_dims controls the resolution at which the model processes each frame. The default of 256×256 works well for most datasets.

If your frames are large (more than ~1000 pixels per side), increasing to 384×384 may improve accuracy. However, this increases the number of model parameters and typically requires more training data — we recommend at least 400–500 labeled frames before trying a larger resolution. We have not found any cases where increasing beyond 384×384 provides further benefit.
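To make the resolution trade-off concrete, the sketch below computes how many original-frame pixels each model-input pixel covers along one side. The 1200-pixel frame is a hypothetical example, and downscale_factor is an illustrative helper, not part of Lightning Pose.

```python
def downscale_factor(original_px: int, resize_px: int) -> float:
    """Original-frame pixels covered by one model-input pixel along a side."""
    if resize_px <= 0:
        raise ValueError("resize_px must be positive")
    return original_px / resize_px

# hypothetical 1200 x 1200 frame at the two recommended resolutions
frame_side = 1200
for resize in (256, 384):
    factor = downscale_factor(frame_side, resize)
    print(f"{resize}x{resize}: one input pixel covers about {factor:.2f} original pixels")
```

For this frame, an error of one input pixel corresponds to roughly 4.7 original pixels at 256×256 and about 3.1 at 384×384. This is why the larger resolution can help on large frames, at the cost of more memory and data.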

For very large images (1500–2000+ pixels per side), the best approach is the cropzoom pipeline, which crops a tight region around the animal before resizing. This preserves detail that would otherwise be lost when downscaling a large frame to the model’s input resolution.
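The guidance above can be summarised as a rule of thumb. In this sketch the function name and the exact cut-offs (shorter side above 1000 px with at least 400 labeled frames for 384×384, shorter side of 1500 px or more for cropzoom) are approximations of the text, not values enforced by Lightning Pose. For intuition on why cropping helps: downscaling a 2000-pixel frame to 256 shrinks it about 7.8×, whereas a hypothetical 400-pixel crop around the animal shrinks only about 1.6×.

```python
def suggest_resolution_strategy(frame_height: int, frame_width: int, n_labeled_frames: int) -> str:
    """Approximate rule of thumb for image_resize_dims (hypothetical helper).

    Returns "256x256", "384x384", or "cropzoom".
    """
    short_side = min(frame_height, frame_width)
    if short_side >= 1500:
        # very large frames: crop around the animal before resizing
        return "cropzoom"
    if short_side > 1000 and n_labeled_frames >= 400:
        # large frames with enough labels to support the larger input
        return "384x384"
    # default that works well for most datasets
    return "256x256"
```

Usage: suggest_resolution_strategy(1024, 1280, 500) suggests trying 384x384. With only 200 labeled frames, the same call falls back to the default.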

What if I encounter a CUDA out of memory error?

Model training can be GPU-memory-intensive, particularly when using unsupervised losses, the Temporal Context Network model, multi-view datasets, or high-resolution images. For this reason we recommend a GPU with at least 8 GB of memory, and preferably 16 GB.

Users who combine several of the memory-intensive features above may still run into issues. The following techniques reduce memory consumption:

Reduce train_batch_size. Activation memory grows roughly linearly with batch size, so this is usually the most direct lever.
Enable multi-GPU training using num_gpus.
Reduce image resolution using image_resize_dims.
Enable gradient accumulation using accumulate_grad_batches. This parameter is not included in the config by default and should be added manually to the training section.
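Gradient accumulation sums gradients over several smaller batches before each optimizer step. On a single GPU, the effective batch size is therefore about train_batch_size × accumulate_grad_batches, while peak memory follows the smaller per-step batch. The hypothetical helper below picks the largest per-step batch that fits in memory and divides a target batch size evenly. It assumes single-GPU training; how train_batch_size interacts with num_gpus is not covered here.

```python
def accumulation_plan(target_batch: int, max_fitting_batch: int) -> tuple[int, int]:
    """Return (train_batch_size, accumulate_grad_batches) whose product equals target_batch.

    Chooses the largest per-step batch <= max_fitting_batch that divides target_batch.
    """
    if target_batch < 1 or max_fitting_batch < 1:
        raise ValueError("batch sizes must be positive")
    for batch in range(min(target_batch, max_fitting_batch), 0, -1):
        if target_batch % batch == 0:
            return batch, target_batch // batch
    raise AssertionError("unreachable: a batch of 1 always divides target_batch")
```

For example, if a batch of 16 runs out of memory but 8 fits, accumulation_plan(16, 8) gives (8, 2). You would then set train_batch_size to 8 and add accumulate_grad_batches: 2 to the training section. The two setups are not exactly equivalent when batch normalization is used, because its statistics are still computed per step.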

Each technique above has trade-offs. For example, smaller batches give noisier gradient estimates, and lower resolution discards image detail. The right choice depends on your setup.

See The configuration file section for more information about the above parameters.

Why does the network produce high confidence values for keypoints even when they are occluded?

When a keypoint is briefly occluded but the network can still infer its location (this will happen, for example, when using temporal context frames), high confidence values are generally acceptable. However, in some scenarios the goal is to use confidence values to explicitly track whether a keypoint is visible or hidden (e.g., quantifying whether a tongue is in or out of the mouth). In that case, if the confidence values stay too high during occlusions, try the suggestions below.
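Once the confidence values do drop during occlusions, a simple way to quantify visibility is to threshold them and extract contiguous bouts, for example tongue-out episodes. This sketch assumes you already have a 1-D array of per-frame confidence values for one keypoint, loaded from your prediction output. The 0.9 threshold and the function name visibility_bouts are hypothetical and should be tuned and checked against labeled examples.

```python
import numpy as np

def visibility_bouts(confidence: np.ndarray, threshold: float) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where confidence >= threshold."""
    visible = np.asarray(confidence) >= threshold
    # pad with False on both sides so every bout has a rising and a falling edge
    padded = np.concatenate(([False], visible, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, ends)]

conf = np.array([0.1, 0.95, 0.97, 0.2, 0.99])
print(visibility_bouts(conf, threshold=0.9))
```

Here frames 1–2 form one visible bout and frame 4 forms another, so the result is [(1, 3), (4, 5)].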

First, note that including a keypoint in the unsupervised losses, especially the PCA losses, will generally increase its confidence values even during occlusions (this is by design). If you want low confidence values during occlusions, make sure the keypoint in question is excluded from those losses.
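If your config specifies the keypoints used by the PCA loss as a list of integer indices, a small helper can compute that list with a given keypoint removed. The field name (for example data.columns_for_singleview_pca) is an assumption here; check the name in your version's config. The keypoint names and the function pca_columns_excluding are hypothetical. This covers only the PCA losses, so check any other unsupervised losses separately.

```python
def pca_columns_excluding(keypoint_names: list[str], exclude: set[str]) -> list[int]:
    """Indices of keypoints to keep in the PCA losses, dropping the excluded names."""
    unknown = exclude - set(keypoint_names)
    if unknown:
        # catch typos instead of silently keeping every keypoint
        raise ValueError(f"unknown keypoints: {sorted(unknown)}")
    return [i for i, name in enumerate(keypoint_names) if name not in exclude]

# hypothetical keypoint order, matching the column order of the labeled data
names = ["nose", "paw_left", "paw_right", "tongue"]
print(pca_columns_excluding(names, {"tongue"}))
```

The printed list ([0, 1, 2] here) is what you would paste into the PCA column field so the tongue is no longer constrained by the PCA loss.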

If this does not fix the issue, another option is to set the following field in the config file: training.uniform_heatmaps_for_nan_keypoints: true. (This field is not present in the default config but can be added.) This option forces the model to output a uniform heatmap for any keypoint that has no ground truth label in a training frame, so the model does not learn to guess where the occluded keypoint is located. This approach requires a set of training frames that includes both visible (labeled) and occluded (unlabeled) examples of the keypoint in question.
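To see why a uniform target lowers confidence, compare it with a Gaussian target for a labeled keypoint. The uniform heatmap spreads the probability mass evenly, so its peak is tiny, while the Gaussian concentrates it around the label. This NumPy sketch is illustrative only and is not Lightning Pose's internal code. The 64×64 heatmap size, sigma, and the function names are hypothetical.

```python
import numpy as np

def gaussian_heatmap(height: int, width: int, x: float, y: float, sigma: float) -> np.ndarray:
    """2D Gaussian centred on (x, y), normalised to sum to 1."""
    ys, xs = np.mgrid[0:height, 0:width]
    hm = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return hm / hm.sum()

def target_heatmap(height: int, width: int, keypoint: tuple[float, float], sigma: float = 1.25) -> np.ndarray:
    """Training target: Gaussian for a labeled keypoint, uniform for a NaN (unlabeled) one."""
    x, y = keypoint
    if np.isnan(x) or np.isnan(y):
        # no label: every location is equally likely, so no location stands out
        return np.full((height, width), 1.0 / (height * width))
    return gaussian_heatmap(height, width, x, y, sigma)

uniform = target_heatmap(64, 64, (np.nan, np.nan))
labeled = target_heatmap(64, 64, (20.0, 30.0))
print(uniform.max(), labeled.max())
```

Both targets sum to 1, but the peak of the uniform map is only 1/4096. A model trained toward it for occluded frames has no location with a high value to report.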
