Anti-drift pose tracker (ADPT). A Three examples of drift in deep learning-based animal behavioral analysis. Similar-object disturbance occurs when an object resembling a specific body part misleads deep learning-based methods. Inexplicable keypoint drift is caused by the network predicting a high confidence score at the wrong location. Failure to detect a keypoint is probably caused by a low predicted confidence score. B The anti-drift effects of ADPT. C The general workflow of ADPT. The network is trained to predict a confidence heatmap, LRSS, and location refinement. D The network architecture of ADPT.

Analysis of ADPT’s anti-drift performance on a mouse dataset collected by our lab. A The time course of the y-axis position of sixteen body parts extracted from a one-minute video using ADPT, DeepLabCut and SLEAP. ADPT successfully tracked all 17 body parts of a mouse, whereas DeepLabCut and SLEAP encountered inexplicable tracking drifts. B Two anti-drift examples from ADPT, where the tail drifted with DeepLabCut and SLEAP failed to detect the hind claw. C Overall percentage of frames with tracking drift and failure to detect (miss) for the three methods. ADPT demonstrated a significantly lower drift percentage than the other methods. D The percentage of frames with tracking drift (left) and failure to detect (right). Drifts came mainly from four body parts: the tip tail, the left and right hind claws, and the middle tail. E The averaged RMSE across all body parts (left) and the RMSE of the four body parts with the most drift (right). ADPT achieved a smaller RMSE than the other two tools when thresholded at 0.2. *: P<0.05, **: P<0.01, ***: P<0.001, ****: P<0.0001. RMSE: root mean square error.
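
The drift, miss, and thresholded-RMSE quantities in this caption can be illustrated with a minimal sketch. It assumes that a "miss" is a keypoint whose confidence falls below the threshold, that a "drift" is a confident keypoint lying farther than a pixel tolerance from the ground truth, and that RMSE is the root of the mean squared Euclidean error over keypoints at or above the threshold. The function names, the 20-pixel tolerance, and the sample values are hypothetical and are not taken from the ADPT code.

```python
import math

def classify_keypoint(pred_xy, true_xy, confidence,
                      conf_threshold=0.2, max_error_px=20.0):
    """Label one predicted keypoint as 'miss', 'drift', or 'ok'.

    max_error_px is an assumed tolerance, not a value from the paper.
    """
    if confidence < conf_threshold:
        return "miss"  # low confidence: keypoint treated as not detected
    error = math.dist(pred_xy, true_xy)
    return "drift" if error > max_error_px else "ok"

def thresholded_rmse(preds, truths, confidences, conf_threshold=0.2):
    """RMSE of Euclidean errors over keypoints with confidence >= threshold."""
    squared = [math.dist(p, t) ** 2
               for p, t, c in zip(preds, truths, confidences)
               if c >= conf_threshold]
    if not squared:
        return float("nan")  # nothing passed the threshold
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions for one body part over three frames.
preds = [(10, 10), (50, 50), (13, 14)]
truths = [(10, 10), (10, 10), (10, 10)]
confidences = [0.9, 0.8, 0.1]

labels = [classify_keypoint(p, t, c)
          for p, t, c in zip(preds, truths, confidences)]
rmse = thresholded_rmse(preds, truths, confidences)
```

Under these assumptions the third frame counts as a miss and is excluded from the RMSE, while the second frame counts as a drift and dominates it.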

Anti-drift performance across backgrounds and individuals, where the percentage of frames includes two types of drift phenomena: drift and miss. A The overall cross-individual anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. B After training the model 5 times on dataset shuffles, the cross-individual drift percentage for each shuffle was analyzed using one-way ANOVA. The ANOVA results revealed differences in the inference results of the SLEAP model among individuals, and no differences for ADPT or DeepLabCut. C The overall cross-background anti-drift performance of ADPT and the other methods. The drift percentage of ADPT is significantly lower than that of the other methods. D The cross-background drift percentage for each shuffle was analyzed using one-way ANOVA. The ANOVA results revealed slight differences in the inference results of the DeepLabCut model among individuals, and no differences for ADPT or SLEAP. ns.: not significant, *: P<0.05, **: P<0.01, ***: P<0.001, ****: P<0.0001.
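
For readers unfamiliar with the test used in panels B and D, the following is a minimal sketch of the one-way ANOVA F statistic, assuming each group holds the drift percentages of one individual (or background) across the training shuffles. The function name and the numbers are hypothetical. The sketch stops at the F statistic; the p-value would come from the F distribution with the returned degrees of freedom.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical drift percentages: three individuals, three shuffles each.
drift_by_individual = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [3.0, 4.0, 5.0],
]
f_stat, df_b, df_w = one_way_anova_f(drift_by_individual)
```

In practice a statistics library would normally be used instead; for example, `scipy.stats.f_oneway` returns both the F statistic and the p-value.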

Analysis of ADPT’s anti-drift performance on monkey data, showing its cross-species anti-drift ability. A The time course of the y-axis position of sixteen body parts extracted from a one-minute video using ADPT, DeepLabCut and SLEAP. ADPT successfully tracked all 17 body parts of a monkey, while the other two methods encountered tracking drift when humans appeared. B DeepLabCut and SLEAP both mistakenly located the monkey’s eyes on humans when they appeared, while ADPT achieved robust tracking. C, D The percentage of frames with tracking drift and failure to detect (miss). Drift was mainly concentrated in the limbs because of the appearance of humans.

Evaluation results on public datasets. A Sample predictions on the single fly dataset. B Mean average precision (mAP) on the fly dataset, where ADPT achieved an average accuracy of 92.8% (the best model achieved 93.27%). C RSS improved the average accuracy by 0.3% on the single fly dataset. D Relationship between the number of annotated images and the accuracy of ADPT on the fly dataset, where ADPT achieved acceptable performance with only 350 annotated images in a simple laboratory environment. Points indicate the validation accuracy of models trained on datasets with a specific number of labels. E Transformer improved the average accuracy by 0.4% on the single fly dataset. F Sample predictions on OMS_Dataset. G Root mean square error (RMSE) on OMS_Dataset, where ADPT achieved a smaller RMSE than SLEAP when threshold = 0.2, and a smaller RMSE than DeepLabCut when threshold = 0.6. P value, **: 0.001862, ns.: 0.243472, ***: 8.700e-06. H RMSE comparison on the hip and tail of OMS_Dataset. P value, ***: 0.000561, Hip ns.: 0.023766, Tail ns.: 0.336642, *: 0.035782.

Illustration of mix-up social animal dataset generation. A Frames originating from different videos and the corresponding backgrounds. B Mix-up image. C Schematic diagram of the keypoints generated by single-animal pose estimation with ADPT. D Augmented mix-up image. E Schematic diagram of the augmented annotation. F Augmented keypoints. G Augmented LRSS. H Schematic diagram of the augmented Body Affinity Fields (BAF), inspired by Part Affinity Fields (Cao et al., 2021).
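
One plausible way to build a mix-up image like the one in panels A and B is sketched below. This is an assumption for illustration, not the authors' documented procedure: the animal in the second frame is segmented by differencing against that frame's background, and the segmented pixels are pasted onto the first frame. The keypoints of both animals are then kept as the annotation. Images are small grayscale grids stored as nested lists, and all names and the difference threshold are hypothetical.

```python
def foreground_mask(frame, background, diff_threshold=30):
    """Mark pixels that differ from the background by more than the threshold."""
    return [[abs(f - b) > diff_threshold for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]

def mix_up(frame_a, frame_b, background_b, diff_threshold=30):
    """Paste the foreground of frame_b onto frame_a (same image size assumed)."""
    mask_b = foreground_mask(frame_b, background_b, diff_threshold)
    return [[pb if m else pa for pa, pb, m in zip(a_row, b_row, m_row)]
            for a_row, b_row, m_row in zip(frame_a, frame_b, mask_b)]

# Hypothetical 2x3 grayscale frames: animal A is the bright pixel in
# frame_a, animal B is the bright pixel in frame_b.
frame_a = [[10, 200, 10],
           [10, 10, 10]]
frame_b = [[12, 12, 12],
           [12, 12, 180]]
background_b = [[12, 12, 12],
                [12, 12, 12]]

mixed = mix_up(frame_a, frame_b, background_b)

# Both animals keep their original pixel coordinates, so the mixed
# annotation is simply the union of the per-animal keypoints (x, y).
keypoints_a = {"nose": (1, 0)}
keypoints_b = {"nose": (2, 1)}
mixed_annotation = [keypoints_a, keypoints_b]
```

Because the pasted animal keeps its pixel coordinates, the single-animal keypoints can be reused directly; any later augmentation would have to transform the image and the keypoints together.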

Applications of ADPT for multi-animal pose tracking. A Left: The pipeline for the multi-animal identity-pose tracking task. B Confusion matrix of the 10-mice classification (accuracy = 93.16%). C Social mice tracking pipeline with identification accuracy of 99.72%.
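
As a reminder of how an overall accuracy is read off a confusion matrix like the one in panel B, the sketch below divides the diagonal (correct identifications) by the total count. The 3-class matrix is hypothetical and much smaller than the 10-mice matrix in the figure.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy: correctly classified samples / all samples.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical confusion matrix for three identities.
confusion = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
accuracy = accuracy_from_confusion(confusion)
```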