Introduction
This year's challenge has three independent tracks:
Track 1: Single-UAV Tracking. Given the bounding box of a drone target in the initial frame, this track requires algorithms to track the target in each subsequent video frame by predicting its bounding box. When the target disappears, an invisible mark (no bounding box) must be output.
Track 2: Single-UAV Detection & Tracking. Whether a drone target exists in the initial frame is unknown. This track requires algorithms to detect and track the drone target once it appears by predicting its bounding box. When the target does not exist or disappears, an invisible mark (no bounding box) must be output.
Track 3: Multiple-UAV Tracking. Given the bounding boxes of the drone targets in the initial frame, this track requires algorithms to detect and track both the initially provided and newly appearing drone targets in each subsequent video frame by predicting their bounding boxes and unique IDs.
Challenge Guidelines
- Three independent CodaLab servers are used to evaluate and rank the submissions for Track 1, Track 2, and Track 3, respectively.
- We provide a baseline model and evaluation code on ModelScope. Please refer to the evaluation code and its output file to test your algorithm and prepare the final submission (.zip) for the CodaLab server.
- If you encounter any questions or misunderstandings, please feel free to contact us: we have set up a WeChat group and a QQ group (1025561717) for quick communication.

Participation Requirements
- The provided test data must NOT be used for training.
- NO additional training data is allowed for training or pretraining models. Public datasets such as ImageNet and COCO may be used, but additional UAV-related datasets are BANNED.
- The submission description should clearly state the algorithm framework.
Evaluation Metrics
We define the single-object tracking accuracy as:

acc = (1/T) × Σ_t [ IoU_t × δ(v_t > 0) + p_t × (1 − δ(v_t > 0)) ] − 0.2 × ( Σ_t p_t × δ(v_t > 0) / T* )^0.3
For frame t, IoU_t is the Intersection over Union (IoU) between the predicted tracking box and its corresponding ground-truth box. p_t is the predicted visibility flag: it equals 1 when the predicted box is empty (the target is predicted invisible) and 0 otherwise. v_t is the ground-truth visibility flag of the target, and the indicator function δ(v_t > 0) equals 1 when v_t > 0 and 0 otherwise. The accuracy is averaged over all frames of a sequence, where T is the total number of frames and T* is the number of frames in which the target is present in the ground truth.
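The definitions above can be sketched in Python. This is a minimal illustration, not the official evaluation code: it assumes boxes are (x, y, w, h) tuples and that an absent or predicted-invisible box is represented as None.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def tracking_accuracy(preds, gts):
    """Per-sequence accuracy for one target.

    preds/gts: per-frame boxes (x, y, w, h), or None when the box is
    empty (prediction) or the target is absent (ground truth).
    """
    T = len(preds)                                   # total frames
    T_star = sum(1 for g in gts if g is not None)    # frames with target present
    score = 0.0
    penalty = 0.0
    for p, g in zip(preds, gts):
        p_t = 1 if p is None else 0                  # predicted-invisible flag
        if g is not None:                            # delta(v_t > 0) = 1
            score += iou(p, g) if p is not None else 0.0
            penalty += p_t                           # claimed invisible while visible
        else:                                        # delta(v_t > 0) = 0
            score += p_t                             # reward correct "no box"
    acc = score / T
    if T_star > 0:
        acc -= 0.2 * (penalty / T_star) ** 0.3
    return acc
```

Perfect boxes plus a correct invisible mark yield an accuracy of 1.0, while wrongly declaring a visible target invisible is penalized below the plain frame average.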
MOTA (Multi-Object Tracking Accuracy) is a metric used to evaluate the overall performance of multi-object tracking algorithms. It combines factors like false positives, false negatives, and ID switches. The formula is:
MOTA = 1 - (FP + FN + IDS)/GT
- FP: False Positives
- FN: False Negatives
- IDS: ID Switches
- GT: total number of Ground-Truth objects (summed over all frames)
MOTA is at most 1, and it can become negative when the total number of errors exceeds the number of ground-truth objects; higher values indicate better tracking accuracy. It is a comprehensive metric that reflects overall tracking performance by accounting for all three error types.
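The formula above can be illustrated with a short sketch; the helper name and per-frame (FP, FN, IDS) tuple layout are assumptions for illustration, not part of the official evaluation code.

```python
def mota(frame_errors, num_gt):
    """MOTA = 1 - (FP + FN + IDS) / GT.

    frame_errors: iterable of (fp, fn, ids) counts, one tuple per frame.
    num_gt: total number of ground-truth objects over all frames.
    Note the result can drop below 0 when errors exceed num_gt.
    """
    fp = sum(e[0] for e in frame_errors)
    fn = sum(e[1] for e in frame_errors)
    ids = sum(e[2] for e in frame_errors)
    return 1.0 - (fp + fn + ids) / num_gt
```

For example, 1 false positive and 1 ID switch against 10 ground-truth objects give MOTA = 1 − 2/10 = 0.8.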