HeatmapTracker

class lightning_pose.models.heatmap_tracker.HeatmapTracker(num_keypoints: int, num_targets: int | None = None, loss_factory: LossFactory | None = None, backbone: Literal['resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnet50_contrastive', 'resnet50_animal_apose', 'resnet50_animal_ap10k', 'resnet50_human_jhmdb', 'resnet50_human_res_rle', 'resnet50_human_top_res', 'resnet50_human_hand', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'vit_b_sam'] = 'resnet50', downsample_factor: Literal[1, 2, 3] = 2, pretrained: bool = True, output_shape: tuple | None = None, torch_seed: int = 123, lr_scheduler: str = 'multisteplr', lr_scheduler_params: DictConfig | dict | None = None, **kwargs: Any)[source]

Bases: BaseSupervisedTracker

Base model that produces heatmaps of keypoints from images.

Attributes Summary

num_filters_for_upsampling

Methods Summary

`create_double_upsampling_layer`(in_channels, ...)	Perform ConvTranspose2d to double the output shape.
`forward`(images)	Forward pass through the network.
`get_loss_inputs_labeled`(batch_dict)	Return predicted heatmaps and their softmaxes (estimated keypoints).
`heatmaps_from_representations`(representations)	Upsample representations to get final heatmaps.
`initialize_upsampling_layers`()	Initialize the ConvTranspose2d upsampling layers.
`make_upsampling_layers`()
`predict_step`(batch_dict, batch_idx[, ...])	Predict heatmaps and keypoints for a batch of video frames.
`run_hard_argmax`(heatmaps)	Use hard argmax on heatmaps.
`run_subpixelmaxima`(heatmaps)	Use soft argmax on heatmaps.

Attributes Documentation

num_filters_for_upsampling

Methods Documentation

static create_double_upsampling_layer(in_channels: int, out_channels: int) → ConvTranspose2d[source]: Perform ConvTranspose2d to double the output shape.
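The doubling behaviour can be checked with the output-size formula from the PyTorch ConvTranspose2d documentation. The sketch below is plain Python and only illustrates the arithmetic; the hyperparameters (kernel_size=3, stride=2, padding=1, output_padding=1) are an assumption about one common configuration that doubles the shape, not values confirmed by this page, and `conv_transpose2d_out_size` is a hypothetical helper name.

```python
def conv_transpose2d_out_size(size_in, kernel_size=3, stride=2, padding=1,
                              output_padding=1, dilation=1):
    """Output length along one spatial axis of a transposed convolution.

    Formula as given in the PyTorch ConvTranspose2d documentation.
    The default arguments are assumed values, chosen because they double the input.
    """
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# With the assumed defaults, every spatial dimension doubles exactly.
for size_in in (4, 7, 8, 12):
    assert conv_transpose2d_out_size(size_in) == 2 * size_in
```

Other hyperparameter combinations also double the shape (for example kernel_size=4, stride=2, padding=1, output_padding=0), so the formula is a quick way to verify whichever values the layer actually uses.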

forward(images: TensorType['batch', 'channels': 3, 'image_height', 'image_width'] | TensorType['batch', 'views', 'channels': 3, 'image_height', 'image_width']) → TensorType['num_valid_outputs', 'num_keypoints', 'heatmap_height', 'heatmap_width'][source]: Forward pass through the network.

get_loss_inputs_labeled(batch_dict: HeatmapLabeledBatchDict | MultiviewHeatmapLabeledBatchDict) → dict[source]: Return predicted heatmaps and their softmaxes (estimated keypoints).

heatmaps_from_representations(representations: TensorType['batch', 'features', 'rep_height', 'rep_width']) → TensorType['batch', 'num_keypoints', 'heatmap_height', 'heatmap_width'][source]: Upsample representations to get final heatmaps.

initialize_upsampling_layers() → None[source]: Initialize the ConvTranspose2d upsampling layers.

make_upsampling_layers() → Sequential[source]

predict_step(batch_dict: HeatmapLabeledBatchDict | MultiviewHeatmapLabeledBatchDict | UnlabeledBatchDict, batch_idx: int, return_heatmaps: bool | None = False) → Tuple[Tensor, Tensor] | Tuple[Tensor, Tensor, Tensor][source]

Predict heatmaps and keypoints for a batch of video frames.

Assuming a DALI video loader is passed in:

> trainer = Trainer(devices=8, accelerator="gpu")
> predictions = trainer.predict(model, data_loader)

run_hard_argmax(heatmaps: TensorType['batch', 'num_keypoints', 'heatmap_height', 'heatmap_width']) → TensorType['batch', 'num_keypoints'][source]

Use hard argmax on heatmaps.

Parameters:

heatmaps – output of upsampling layers

Returns:

tuple

hard argmax of shape (batch, num_targets)
confidences of shape (batch, num_keypoints)
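As a rough illustration of what a hard argmax over heatmaps computes, here is a minimal pure-Python sketch. It is not the library's implementation. It assumes that num_targets equals 2 * num_keypoints with coordinates laid out as (x0, y0, x1, y1, ...), and that the confidence is simply the peak heatmap value; both are assumptions made for this example, and `hard_argmax` is a hypothetical helper name.

```python
def hard_argmax(heatmaps):
    """Locate the peak of each heatmap.

    heatmaps: nested lists indexed [batch][keypoint][row][col].
    Returns (coords, confidences):
      coords      -> [batch][2 * num_keypoints], laid out as (x0, y0, x1, y1, ...)
      confidences -> [batch][num_keypoints], here the peak value (an assumption)
    """
    coords, confidences = [], []
    for frame in heatmaps:
        frame_coords, frame_conf = [], []
        for heatmap in frame:
            # Scan every pixel and keep the first location with the largest value.
            peak, x, y = max(
                ((value, col, row)
                 for row, line in enumerate(heatmap)
                 for col, value in enumerate(line)),
                key=lambda item: item[0],
            )
            frame_coords.extend([float(x), float(y)])
            frame_conf.append(peak)
        coords.append(frame_coords)
        confidences.append(frame_conf)
    return coords, confidences


# One frame, two keypoints, heatmaps of height 2 and width 3.
heatmaps = [[
    [[0.0, 0.1, 0.0],
     [0.0, 0.0, 0.9]],
    [[0.7, 0.0, 0.0],
     [0.1, 0.2, 0.0]],
]]
coords, confidences = hard_argmax(heatmaps)
assert coords == [[2.0, 1.0, 0.0, 0.0]]
assert confidences == [[0.9, 0.7]]
```

Because the result is an integer pixel index, its precision is limited by the heatmap resolution, which is the limitation the soft argmax variant addresses.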

run_subpixelmaxima(heatmaps: TensorType['batch', 'num_keypoints', 'heatmap_height', 'heatmap_width']) → TensorType['batch', 'num_keypoints'][source]

Use soft argmax on heatmaps.

Parameters:

heatmaps – output of upsampling layers

Returns:

tuple

soft argmax of shape (batch, num_targets)
confidences of shape (batch, num_keypoints)
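The idea behind a soft argmax can be sketched in a few lines of plain Python: apply a softmax to the heatmap and take the expected (x, y) coordinate under the resulting distribution, which yields subpixel estimates. This is a generic illustration, not the method's actual implementation; the `temperature` parameter and the `soft_argmax` helper name are assumptions introduced for the example.

```python
import math


def soft_argmax(heatmap, temperature=1.0):
    """Expected (x, y) location under a softmax over one 2D heatmap.

    heatmap: list of rows, each a list of floats.
    temperature: hypothetical sharpness control; lower values approach a hard argmax.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp((value - peak) / temperature) for value in row]
               for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Probability-weighted average of column (x) and row (y) indices.
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# A flat heatmap has no preferred location, so the estimate is the centre.
flat = [[0.0] * 3 for _ in range(3)]
x, y = soft_argmax(flat)
assert abs(x - 1.0) < 1e-9 and abs(y - 1.0) < 1e-9

# A single dominant peak pulls the estimate onto that pixel.
peaked = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [50.0, 0.0, 0.0]]
x, y = soft_argmax(peaked)
assert abs(x - 0.0) < 1e-6 and abs(y - 2.0) < 1e-6
```

Unlike a hard argmax, the result varies smoothly with the heatmap values, so it is differentiable and can fall between pixel centres.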

__init__(num_keypoints: int, num_targets: int | None = None, loss_factory: LossFactory | None = None, backbone: Literal['resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152', 'resnet50_contrastive', 'resnet50_animal_apose', 'resnet50_animal_ap10k', 'resnet50_human_jhmdb', 'resnet50_human_res_rle', 'resnet50_human_top_res', 'resnet50_human_hand', 'efficientnet_b0', 'efficientnet_b1', 'efficientnet_b2', 'vit_b_sam'] = 'resnet50', downsample_factor: Literal[1, 2, 3] = 2, pretrained: bool = True, output_shape: tuple | None = None, torch_seed: int = 123, lr_scheduler: str = 'multisteplr', lr_scheduler_params: DictConfig | dict | None = None, **kwargs: Any) → None[source]

Initialize a DLC-like model with the selected backbone (ResNet-50 by default).

Parameters:

num_keypoints – number of body parts
loss_factory – object to orchestrate loss computation
backbone – backbone architecture to use: a ResNet, EfficientNet, or ViT variant (see the signature for accepted values)
downsample_factor – make heatmap smaller than original frames to save memory; subpixel operations are performed for increased precision
pretrained – True to load pretrained ImageNet weights
output_shape – hard-coded image size to avoid dynamic shape computations
torch_seed – make weight initialization reproducible
lr_scheduler – how to schedule learning rate
lr_scheduler_params – params for specific learning rate schedulers; for multisteplr: milestones, gamma
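To make the multisteplr parameters concrete, the sketch below reproduces the closed-form schedule of PyTorch's MultiStepLR in plain Python: the learning rate is multiplied by gamma once for every milestone epoch that has been reached. The base learning rate, milestones, and gamma used here are hypothetical values chosen for illustration, not the library's defaults, and `multistep_lr` is a hypothetical helper name.

```python
from bisect import bisect_right


def multistep_lr(base_lr, epoch, milestones, gamma):
    """Learning rate at `epoch`: base_lr scaled by gamma once per milestone reached."""
    # bisect_right counts how many milestones are <= epoch.
    return base_lr * gamma ** bisect_right(sorted(milestones), epoch)


# Hypothetical settings: halve the learning rate at epochs 100 and 200.
params = {"milestones": [100, 200], "gamma": 0.5}

assert abs(multistep_lr(1e-3, 99, **params) - 1e-3) < 1e-12
assert abs(multistep_lr(1e-3, 100, **params) - 5e-4) < 1e-12
assert abs(multistep_lr(1e-3, 250, **params) - 2.5e-4) < 1e-12
```

A dictionary of this shape is the kind of value that could be passed as lr_scheduler_params when lr_scheduler is 'multisteplr'.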