FAQs
What video formats are supported by Lightning Pose?
Lightning Pose requires videos that use the h.264 codec.
AVI files typically do not use the h.264 codec, while MP4 files typically do (though not always).
The following function checks for the proper codec (h264) and pixel format (yuv420p) using ffmpeg:
import subprocess


def check_codec_format(input_file: str) -> bool:
    """Run ffmpeg to check the video codec and pixel format."""
    # pass arguments as a list so paths containing spaces are handled correctly
    ffmpeg_cmd = ['ffmpeg', '-i', input_file]
    result = subprocess.run(ffmpeg_cmd, capture_output=True, text=True)
    # ffmpeg exits with an error because no output file is given,
    # but stderr still has codec info
    output_str = result.stderr
    # search for correct codec (h264) and pixel format (yuv420p)
    return 'h264' in output_str and 'yuv420p' in output_str
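As an alternative, the same check can be sketched with ffprobe, which reports stream properties as JSON so that the codec and pixel format are compared exactly instead of searched for as substrings. This is a minimal sketch, assuming ffprobe is installed alongside ffmpeg and available on the PATH; the function names are illustrative and are not part of Lightning Pose.

```python
import json
import subprocess


def parse_ffprobe_streams(ffprobe_json: str) -> bool:
    """Return True if the first reported stream is h264 with yuv420p pixels."""
    streams = json.loads(ffprobe_json).get("streams", [])
    if not streams:
        return False
    stream = streams[0]
    return stream.get("codec_name") == "h264" and stream.get("pix_fmt") == "yuv420p"


def check_codec_format_ffprobe(input_file: str) -> bool:
    """Query the first video stream with ffprobe and check codec and pixel format."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",  # first video stream only
        "-show_entries", "stream=codec_name,pix_fmt",
        "-of", "json",
        input_file,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # ffprobe could not read the file
        return False
    return parse_ffprobe_streams(result.stdout)
```

Keeping the parsing in its own function makes it possible to test the logic without a video file on hand.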
If your videos do not use the h.264 codec, the following Python code will convert them:
import os
import subprocess


def reencode_video(input_file: str, output_file: str) -> None:
    """Reencodes video into h.264 coded format using ffmpeg from a subprocess.

    Args:
        input_file: abspath to existing video
        output_file: abspath to new mp4 video using h.264 codec

    """
    # check input file exists
    assert os.path.isfile(input_file), 'input video does not exist.'
    # check directory for saving outputs exists
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    # create ffmpeg command as a list so paths containing spaces are handled correctly
    ffmpeg_cmd = [
        'ffmpeg', '-i', input_file,
        '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-c:a', 'copy',
        '-y', output_file,
    ]
    # run command
    subprocess.run(ffmpeg_cmd)
Note that you can also run the ffmpeg command directly from the command line.
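When many videos need converting, it can help to build all of the ffmpeg commands first and inspect them before running anything. The sketch below is one way to do that; the helper names and the example paths are hypothetical, and it assumes every input file has a distinct name, since two inputs with the same stem would map to the same output file.

```python
from pathlib import Path
from typing import List


def build_reencode_cmd(input_file: str, output_file: str) -> List[str]:
    """Build the ffmpeg argument list for one h.264 / yuv420p re-encode."""
    return [
        "ffmpeg", "-i", input_file,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "copy",
        "-y", output_file,
    ]


def plan_reencodes(video_files: List[str], output_dir: str) -> List[List[str]]:
    """Map each input video to an .mp4 with the same stem inside output_dir."""
    commands = []
    for video_file in video_files:
        output_file = Path(output_dir) / (Path(video_file).stem + ".mp4")
        commands.append(build_reencode_cmd(video_file, str(output_file)))
    return commands


if __name__ == "__main__":
    # Hypothetical paths; print the planned commands for inspection.
    for cmd in plan_reencodes(["/data/raw/session1.avi"], "/data/h264"):
        print(" ".join(cmd))
```

Each planned command can then be passed to subprocess.run, or pasted into a terminal (quoting any path that contains spaces).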
What if I encounter a CUDA out of memory error?
Model training can be GPU-memory-intensive, particularly when using unsupervised losses, the Temporal Context Network model, multi-view datasets, or high-resolution images. For this reason we recommend using a GPU with at least 8 GB of memory, and preferably 16 GB.
Users who combine several of the memory-intensive features above may still run into issues. There are a few techniques available to reduce memory consumption:
- Reduce train_batch_size. Memory usage is directly proportional to batch size.
- Enable multi-GPU training using num_gpus.
- Reduce image resolution using image_resize_dims.
- Enable gradient accumulation using accumulate_grad_batches. This parameter is not included in the config by default and should be added manually to the training section.
Each technique above has trade-offs. The right choice depends on your individual situation.
See The configuration file section for more information about the above parameters.
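To make two of these trade-offs concrete, the sketch below works through the arithmetic. With gradient accumulation, gradients from several smaller batches are summed before each optimizer step, so halving train_batch_size while doubling accumulate_grad_batches keeps the number of frames per optimizer step unchanged, at the cost of more forward and backward passes per step. The batch sizes and image dimensions are hypothetical values chosen for illustration, and the helper functions are not part of Lightning Pose.

```python
from typing import Tuple


def effective_batch_size(train_batch_size: int, accumulate_grad_batches: int = 1) -> int:
    """Number of frames contributing to each optimizer step."""
    return train_batch_size * accumulate_grad_batches


def pixel_fraction(old_dims: Tuple[int, int], new_dims: Tuple[int, int]) -> float:
    """Fraction of pixels that remain after resizing (height, width) images."""
    return (new_dims[0] * new_dims[1]) / (old_dims[0] * old_dims[1])


# Hypothetical settings: a baseline and a lower-memory alternative.
baseline = {"train_batch_size": 16, "accumulate_grad_batches": 1}
low_memory = {"train_batch_size": 8, "accumulate_grad_batches": 2}

# Both settings use 16 frames per optimizer step, but the second one
# only holds 8 frames on the GPU at a time.
same_step_size = effective_batch_size(**baseline) == effective_batch_size(**low_memory)

# Resizing from 384x384 to 256x256 keeps 4/9 of the pixels per image.
remaining = pixel_fraction((384, 384), (256, 256))
```

The pixel fraction is only a rough guide to memory savings, since actual usage also depends on the model and on which losses are enabled.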
Why does the network produce high confidence values for keypoints even when they are occluded?
Generally, when a keypoint is briefly occluded and its location can be resolved by the network, we are fine with high confidence values (this will happen, for example, when using temporal context frames). However, there may be scenarios where the goal is to explicitly track whether a keypoint is visible or hidden using confidence values (e.g., quantifying whether a tongue is in or out of the mouth). In this case, if the confidence values are too high during occlusions, try the suggestions below.
First, note that including a keypoint in the unsupervised losses - especially the PCA losses - will generally increase confidence values even during occlusions (by design). If a low confidence value is desired during occlusions, ensure the keypoint in question is not included in those losses.
If this does not fix the issue, another option is to set the following field in the config file:
training.uniform_heatmaps_for_nan_keypoints: true.
[This field is not visible in the default config but can be added.]
This option will force the model to output a uniform heatmap for any keypoint that does not
have a ground truth label in the training data.
The model will therefore not try to guess where the occluded keypoint is located.
This approach requires a set of training frames that include both visible and occluded examples
of the keypoint in question.
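The idea behind a uniform heatmap target can be illustrated with a small sketch. This is not Lightning Pose's implementation: the Gaussian width, the normalization to sum to one, and the use of None to stand in for a missing (NaN) label are all assumptions made for the example.

```python
import math
from typing import List, Optional, Tuple


def target_heatmap(
    keypoint: Optional[Tuple[float, float]],
    height: int,
    width: int,
    sigma: float = 1.5,
) -> List[List[float]]:
    """Return a (height x width) target heatmap whose values sum to one.

    A labeled keypoint (x, y) yields a Gaussian bump centered on the label;
    a missing label (None) yields a uniform heatmap with no preferred location.
    """
    if keypoint is None:
        value = 1.0 / (height * width)
        return [[value] * width for _ in range(height)]
    x, y = keypoint
    raw = [
        [math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2 * sigma ** 2)) for c in range(width)]
        for r in range(height)
    ]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]
```

For an 8 x 8 heatmap, the uniform target has a peak value of 1/64, far below the peak of the Gaussian target for a labeled keypoint, so a model trained to reproduce it has no single location to commit to on occluded frames.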