โ† Back to blog
Object Tracking · Benchmark · June 27, 2026 · 12 min read

Ultralytics object trackers comparison: ByteTrack, BoT-SORT & More

How do the six Ultralytics trackers actually behave on the same footage? A look at BoT-SORT, ByteTrack, OC-SORT, Deep OC-SORT, FastTrack, and TrackTrack, their internals, trade-offs, and side-by-side results on ID switches, ID stability, and FPS.

Muhammad Rizwan Munawar
Computer Vision Engineer · Founder, Rizwan AI

Picking a tracker used to be simple in the Ultralytics Python package, because there were only two of them. Today Ultralytics ships six, and they behave very differently on the same video. This post walks through all six, namely BoT-SORT, ByteTrack, OC-SORT, Deep OC-SORT, FastTrack, and TrackTrack, and then puts them side by side on the same clip to see how they handle a close-pass ID switch, how stable their IDs stay, and how fast each one runs.

A quick overview

Every tracker here follows the tracking-by-detection approach. The detector (YOLO) finds boxes in each frame on its own, and the tracker only does the data association step, matching this frame's detections to the running trajectories (tracklets) so each object keeps a stable ID. The trackers differ almost entirely in how that matching works.

A few terms you'll need along the way:

  • Kalman filter (motion model). Each track holds a small state, the position and velocity of its box, and runs a predict then update loop. It predicts where the box should be in the next frame (a constant-velocity assumption), then corrects that guess with the matched detection. During an occlusion there is nothing to correct against, so the prediction drifts. That drift causes a lot of the errors you see right after an object reappears.
  • IoU vs. ReID association. The cheap cue is IoU, the overlap between a track's predicted box and a detection. It is fast but blind to identity, so when two boxes overlap it cannot tell them apart. The expensive cue is ReID (appearance), where a small CNN turns each box into an embedding and matching compares the cosine distance between embeddings. Appearance barely changes with position, so even when two boxes overlap, person A's embedding still matches A. That is exactly why ReID prevents swaps during close passes.
  • Camera motion compensation (CMC/GMC). On a moving camera the whole scene shifts, which breaks the constant-velocity prediction. CMC estimates the global frame-to-frame transform (optical flow, ORB, or ECC) and corrects each prediction before association.
  • ID switch vs. ID flickering. An ID switch permanently hands a track to the wrong object. ID flickering (track fragmentation) is when a track blinks out for a few frames and comes back, usually after a dropped detection or a short occlusion. The better trackers either never drop the ID, or recover the original one afterwards.
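To make the IoU vs. ReID point concrete, here is a minimal sketch in plain Python. The boxes and the three-number "embeddings" are made-up toy values (real ReID embeddings have hundreds of dimensions), and the helper names are my own, not part of any Ultralytics API. It shows a close pass where both detections overlap the track's predicted box equally, so IoU cannot choose, while the appearance distance still picks the right person.

```python
import math

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine_distance(u, v):
    """1 - cosine similarity; smaller means more similar appearance."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm

# Track A's predicted box and two detections during a close pass (toy values).
track_a_box = (100, 100, 200, 300)
det_1_box = (110, 100, 210, 300)  # really person A
det_2_box = (90, 100, 190, 300)   # really person B

iou_1 = iou(track_a_box, det_1_box)
iou_2 = iou(track_a_box, det_2_box)  # same overlap as iou_1: geometry is a tie

# Toy appearance embeddings (hypothetical values).
track_a_emb = (0.9, 0.1, 0.1)
det_1_emb = (0.8, 0.2, 0.1)  # looks like A
det_2_emb = (0.1, 0.9, 0.2)  # looks like someone else

dist_1 = cosine_distance(track_a_emb, det_1_emb)
dist_2 = cosine_distance(track_a_emb, det_2_emb)
best = "det_1" if dist_1 < dist_2 else "det_2"
```

With these numbers the two IoU values are identical, while `dist_1` is far smaller than `dist_2`, which is the extra cue an appearance-aware tracker leans on.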

With that out of the way, here are the six.

1. ByteTrack: associate every box

In simple words: ByteTrack is the fast, no-frills option. It follows objects purely by where they are and where they are heading, and it pays attention even to the weak, low-confidence boxes that other trackers throw away.

Most trackers ignore detections the model is unsure about, but those faint boxes are often a real person who is half-hidden or blurry. ByteTrack (Zhang et al., ECCV 2022) keeps them. It matches in two passes: first it links the confident boxes to existing tracks, then it gives the leftover, unsure boxes a second chance to reconnect with any track that is still waiting. It uses only motion, with no idea what an object looks like, so it is very fast but can mix up two objects that look alike when they cross. Config: bytetrack.yaml.
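The two-pass idea can be sketched in a few lines of Python. This is a simplified illustration, not the ByteTrack or Ultralytics implementation: it uses greedy IoU matching where the real tracker solves an assignment problem, and the function names and thresholds (`high_thresh`, `low_thresh`, `min_iou`) are illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, dets, min_iou):
    """Pair tracks with detections, highest IoU first.

    tracks: {track_id: predicted_box}; dets: list of {"box", "score"} dicts.
    Returns ({track_id: matched_box}, {track_id: box} of unmatched tracks).
    """
    candidates = sorted(
        ((iou(box, det["box"]), tid, di)
         for tid, box in tracks.items()
         for di, det in enumerate(dets)),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for overlap, tid, di in candidates:
        if overlap < min_iou:
            break  # everything after this overlaps even less
        if tid in matches or di in used_dets:
            continue
        matches[tid] = dets[di]["box"]
        used_dets.add(di)
    unmatched = {tid: box for tid, box in tracks.items() if tid not in matches}
    return matches, unmatched

def two_pass_associate(tracks, detections, high_thresh=0.5, low_thresh=0.1, min_iou=0.3):
    """Pass 1: confident boxes. Pass 2: leftover tracks vs. low-score boxes."""
    high = [d for d in detections if d["score"] >= high_thresh]
    low = [d for d in detections if low_thresh <= d["score"] < high_thresh]
    first, leftover = greedy_match(tracks, high, min_iou)
    second, lost = greedy_match(leftover, low, min_iou)
    return {**first, **second}, lost

tracks = {1: (0, 0, 10, 10), 2: (50, 0, 60, 10)}
detections = [
    {"box": (1, 0, 11, 10), "score": 0.9},   # clearly visible person
    {"box": (51, 0, 61, 10), "score": 0.2},  # half-hidden person, weak score
]
matched, lost = two_pass_associate(tracks, detections)
```

In this toy frame, track 2 would go unmatched if only the confident box were considered. The second pass reconnects it to the weak box, so `lost` comes back empty.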

2. BoT-SORT: ByteTrack plus a bag of tricks (the default)

In simple words: BoT-SORT is ByteTrack with a few upgrades, and it is the option Ultralytics uses by default. Its standout feature is that it can tell when the camera itself is moving and correct for it.

BoT-SORT (Aharon et al., 2022) keeps ByteTrack's two-pass matching but predicts each box's position more accurately, and it cancels out camera shake or panning so the objects don't appear to "jump" (this is the Global Motion Compensation, or GMC, part). It can also compare how objects look (ReID), but here is the important catch: in Ultralytics that appearance check is turned off by default (with_reid: False). So out of the box it still decides close calls using position alone. Switch with_reid on and it becomes much steadier in crowds. Config: botsort.yaml.

3. OC-SORT: fix the prediction after an object reappears

In simple words: OC-SORT is built to recover cleanly when an object is briefly hidden and then comes back, and to handle objects that move in unpredictable ways.

When an object disappears for a moment, a normal tracker keeps guessing where it went, and those guesses drift further from reality every frame. OC-SORT (Cao et al., CVPR 2023) waits for the object to actually reappear, then goes back and fixes the path it missed using that real sighting instead of the bad guesses. It also pays attention to each object's direction of travel, which helps with sudden turns. It does not look at appearance at all, so it is very fast and great with erratic motion, but like ByteTrack it can still confuse two objects that overlap. Config: ocsort.yaml.
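A one-dimensional sketch shows why the blind guesses drift and what the repair looks like. It is a deliberate simplification: it tracks only an x-centre, skips the Kalman covariance entirely, and stands in for OC-SORT's re-update step with a plain linear interpolation between the last real sighting and the new one. All numbers and function names are illustrative assumptions.

```python
def predict_blind(last_pos, velocity, n_frames):
    """Constant-velocity guesses for frames that have no detection."""
    return [last_pos + velocity * k for k in range(1, n_frames + 1)]

def virtual_path(last_obs, new_obs, gap):
    """Linearly interpolated positions between two real sightings.

    gap is the number of frames from the last sighting to the new one.
    """
    step = (new_obs - last_obs) / gap
    return [last_obs + step * k for k in range(1, gap)]

last_obs = 100.0   # x-centre (px) at the last frame the object was seen
velocity = 5.0     # px/frame estimated before the occlusion
occluded = 4       # frames with no detection
new_obs = 108.0    # where the object actually reappears: it had slowed down

blind = predict_blind(last_obs, velocity, occluded)        # 105, 110, 115, 120
repaired = virtual_path(last_obs, new_obs, occluded + 1)   # stays between 100 and 108
corrected_velocity = (new_obs - last_obs) / (occluded + 1)

# How far the final blind guess and the final repaired point sit from the real sighting.
blind_error = abs(blind[-1] - new_obs)
repaired_error = abs(repaired[-1] - new_obs)
```

The blind prediction ends up 12 px past the real position because it kept the old speed, while the repaired path and the corrected velocity are both anchored to what was actually observed.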

4. Deep OC-SORT: OC-SORT that also remembers how objects look

In simple words: Deep OC-SORT takes OC-SORT and adds a memory of each object's appearance, so it is less likely to swap two people who pass close to each other.

Deep OC-SORT (Maggiolino et al., ICIP 2023) keeps OC-SORT's strengths and adds an appearance check (ReID) plus camera-motion correction. The clever part is that it only trusts clear, confident views of an object when building its appearance memory and ignores the blurry or half-hidden ones, so the memory stays accurate. That mix of motion and looks makes it solid in busy pedestrian scenes. The cost is extra computation for the appearance step, so it runs slower than plain OC-SORT or ByteTrack. Config: deepocsort.yaml.

5. FastTrack: appearance-aware, tuned for busy real-time scenes

In simple words: FastTrack tries to stay fast while still using appearance, and it is tuned for packed scenes like dense traffic.

FastTrack (Hashempoor & Hwang, 2024/2025) sits in the same family as ByteTrack but adds a sense of what objects look like, so it can hold IDs steady when lots of similar objects are crowded together. It was released with its own dense-traffic benchmark with dozens of objects per frame, which tells you the kind of scene it is built for. It is newer and less widely tested than the long-standing trackers, but strong on the crowded, real-time workloads it targets. Config: fasttrack.yaml.

6. TrackTrack: let each track pick its own match

In simple words: TrackTrack is the most careful and accurate of the group. It is built so that crowded, overlapping scenes don't create duplicate or flickering IDs.

Most trackers start from the detections and decide which track each one belongs to. TrackTrack (Shim et al., CVPR 2025) flips this around: each existing track looks at all the boxes and claims the one that fits it best, using both position and appearance. It is also careful about creating new IDs. If a new box overlaps an object that is already being tracked, it refuses to spawn a fresh ID for it, which stops the duplicates and flicker you often see in crowds. It is the heaviest and newest of the six, but it leads the major benchmarks and shines exactly where the others struggle: dense, occlusion-heavy scenes. Config: tracktrack.yaml.
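The "refuse to spawn a duplicate ID" rule is easy to picture in code. The sketch below is my own simplified reading of that one idea, not TrackTrack's actual initialization logic: the function name and the `max_overlap` threshold are hypothetical, and the real method also weighs confidence and appearance.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def new_track_candidates(unmatched_boxes, active_track_boxes, max_overlap=0.5):
    """Keep only unmatched boxes that do not sit on top of an existing track."""
    return [
        box for box in unmatched_boxes
        if all(iou(box, tracked) < max_overlap for tracked in active_track_boxes)
    ]

active_track_boxes = [(100, 100, 200, 300)]  # a person already being tracked
unmatched_boxes = [
    (110, 105, 205, 300),  # second, slightly shifted box on the same person
    (400, 100, 480, 300),  # someone genuinely new
]
spawn = new_track_candidates(unmatched_boxes, active_track_boxes)
```

Only the second box survives the filter, so the tracked person does not pick up a duplicate ID from the stray overlapping detection.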

Additional features

| Tracker | Config | Motion | Appearance / ReID | Camera comp. | Best for |
|---|---|---|---|---|---|
| ByteTrack | bytetrack.yaml | Kalman | No | No | Static camera, max speed |
| BoT-SORT (default) | botsort.yaml | Kalman (w,h) | Optional | Yes (GMC) | General use, moving camera |
| OC-SORT | ocsort.yaml | Kalman + ORU/OCM | No | No | Non-linear motion, light CPU |
| Deep OC-SORT | deepocsort.yaml | Kalman + ORU/OCM | Yes (adaptive) | Yes | Non-linear motion plus ID stability |
| FastTrack | fasttrack.yaml | Kalman | Yes | n/a | Dense traffic, real-time |
| TrackTrack | tracktrack.yaml | Kalman | Yes | n/a | Crowded, occlusion-heavy, best accuracy |

The experiment

Same detector, same video, six trackers, and only the tracker argument changes. Swapping a tracker in Ultralytics is a one-line change:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")

# Just change the tracker YAML to switch the algorithm:
#   botsort.yaml | bytetrack.yaml | ocsort.yaml
#   deepocsort.yaml | fasttrack.yaml | tracktrack.yaml
results = model.track(
    source="people.mp4",
    tracker="botsort.yaml",
    persist=True,
    show=True,
)

I ran all six on the same pedestrian clip and watched two moments where trackers tend to earn or lose their reputation: a close pass between two people, and how each tracker decides when to commit a brand-new ID.

Observation 1: BoT-SORT swaps an ID on a close pass

At 00:05 in the clip, a second person walks right next to ID #19, and BoT-SORT switches the ID right away. The track jumps from one person to the other the moment the boxes overlap. The other five trackers (ByteTrack, OC-SORT, Deep OC-SORT, FastTrack, and TrackTrack) all held ID #19 correctly through the same crossing.

Fig-1: At 00:05, BoT-SORT swaps ID #19 as another person passes, while the other five hold the identity.

Note: TrackTrack numbered its tracks differently from the very first frame, assigning ID #12 and ID #16 here, so read that pair as the equivalent of #11 and #19 in all the other trackers.

Why does the default tracker lose here? Because Ultralytics runs BoT-SORT with ReID disabled, so the close pass comes down to IoU and motion alone. When the two boxes overlap, the geometric cost matrix becomes ambiguous and the assignment swaps them. The fix is appearance. Either enable with_reid: True in botsort.yaml, or use a tracker whose association already includes appearance or motion-direction cues. This one clip is the clearest reason not to assume the default is always the best choice.
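One low-effort way to try the fix is to make your own copy of the tracker config with the flag flipped. The helper below is a small sketch using only the standard library; `enable_reid` is a hypothetical name of mine, and the `sample` text is a trimmed stand-in for the real file, which has more keys.

```python
import re

def enable_reid(yaml_text):
    """Return the config text with `with_reid` set to True, other lines untouched."""
    patched, count = re.subn(
        r"(?m)^([ \t]*with_reid[ \t]*:[ \t]*)\S+",
        r"\g<1>True",
        yaml_text,
    )
    if count == 0:
        raise ValueError("no with_reid key found in this config")
    return patched

# Trimmed stand-in for a BoT-SORT config (assumption: the real file has more keys).
sample = "tracker_type: botsort\nwith_reid: False\n"
patched = enable_reid(sample)
```

Save the patched text under a new name, for example `botsort_reid.yaml` (a file name I made up), and pass that path as the `tracker` argument instead of `botsort.yaml`. Expect some FPS cost, since the appearance step adds work on every frame.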

Observation 2: TrackTrack assigns IDs later, but more precisely

As highlighted in Fig-2, TrackTrack assigns some IDs a few frames later than the other trackers in crowded scenes. Its track-aware initialization (TAI) won't spawn a new ID until a detection is consistent enough to deserve one, rather than reacting to the first noisy box. In packed scenes where boxes overlap, merge, and split and confidence scores bounce around, that restraint is what stops duplicate IDs and throwaway tracks.

Fig-2: TrackTrack commits IDs #15 and #23 a few frames later than the other trackers, trading a little latency for cleaner, more stable tracks.

That few-frame delay buys clean trajectories for the rest of the clip, exactly what counting and dwell-time analytics need, where one spurious ID throws off the result.

Overall ranking for ID stability

Putting the whole clip together, here is how the six ranked on ID stability and overall tracking quality, from best to worst:

  1. TrackTrack: cleanest, most stable IDs, with track-aware initialization keeping the count honest.
  2. FastTrack: close behind, with stable appearance-aware association.
  3. Deep OC-SORT: adaptive ReID holds identities steady under occlusion.
  4. ByteTrack: solid for a motion-only tracker, thanks to its low-score recovery.
  5. OC-SORT: good motion handling, but more ID flickering without appearance.
  6. BoT-SORT: the least stable here, including the close-pass switch, with ReID off by default.

The two with the most track-ID flickering were BoT-SORT and OC-SORT. Both rely on motion and IoU with no active appearance branch, so they are the quickest to drop and re-create an ID when a detection wavers. The appearance-aware and track-aware methods (TrackTrack, FastTrack, Deep OC-SORT) sit at the top because they have an extra cue to hold an identity together when geometry alone is ambiguous.
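The ranking above comes from watching the clip, but the two failure types can also be counted if you log which track ID covers one known person in each frame. The sketch below is a rough counter under that assumption, with `None` meaning no track covered the person in that frame. It is not the formal MOT metric definition, and `id_events` is a hypothetical helper name.

```python
def id_events(ids_per_frame):
    """Count ID changes and recovered gaps for one person.

    ids_per_frame: the track ID covering the person in each frame, or None.
    Returns (id_changes, recovered_gaps), where a recovered gap is a run of
    missing frames after which the same ID comes back (flicker).
    """
    id_changes = 0
    recovered_gaps = 0
    last_id = None
    in_gap = False
    for track_id in ids_per_frame:
        if track_id is None:
            if last_id is not None:
                in_gap = True
            continue
        if last_id is not None:
            if track_id != last_id:
                id_changes += 1       # a different ID took over
            elif in_gap:
                recovered_gaps += 1   # same ID returned after blinking out
        last_id = track_id
        in_gap = False
    return id_changes, recovered_gaps

flicker_then_switch = id_events([19, 19, None, None, 19, 19, 7, 7])
new_id_after_gap = id_events([19, None, 23])
stable = id_events([19, 19, 19])
```

Here the first sequence contains one recovered flicker and one ID change, the second loses the ID for good after a gap, and the third is the clean case.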

Speed (FPS)

Accuracy is only half the decision; throughput is the other half. Heavier association (ReID embeddings, camera-motion estimation, multi-cue costs) trades frames per second for identity stability. The numbers below are the average tracker FPS on an NVIDIA GeForce RTX 3050, measured on the same clip and excluding overlay and visualization time, so this is the association cost only:

| Tracker | Avg. FPS | Avg. latency (ms/frame) | Close-pass ID result | ID-stability rank |
|---|---|---|---|---|
| TrackTrack | 17.80 | 56.08 | held | 1 (best) |
| FastTrack | 42.80 | 23.36 | held | 2 |
| Deep OC-SORT | 40.89 | 24.46 | held | 3 |
| ByteTrack | 46.60 | 21.71 | held | 4 |
| OC-SORT | 36.88 | 27.12 | held | 5 (high flicker) |
| BoT-SORT | 17.80 | 56.19 | switched #19 | 6 (high flicker) |

The trade-off is clearest at the two extremes. ByteTrack is the fastest at 46.6 FPS because it is pure motion and IoU with no appearance pass. BoT-SORT and TrackTrack are the slowest at about 17.8 FPS: BoT-SORT because of global motion compensation, and TrackTrack because of its heavy multi-cue, track-perspective association. The interesting part is the middle. FastTrack (42.8) and Deep OC-SORT (40.9) both run appearance-aware association yet stay close to ByteTrack's speed, which makes them the best accuracy-per-frame compromise in this test. TrackTrack buys the cleanest IDs, but at roughly 2.6 times the per-frame cost of ByteTrack. That is fine for offline analytics, and something to weigh carefully for real-time pipelines.
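The relative costs follow directly from the latency column of the table. The short calculation below uses those measured latencies as-is; the 30 FPS frame budget is my own assumption about a typical real-time target, not something measured in this test.

```python
# Average latency per frame (ms), copied from the table above.
latency_ms = {
    "TrackTrack": 56.08,
    "FastTrack": 23.36,
    "Deep OC-SORT": 24.46,
    "ByteTrack": 21.71,
    "OC-SORT": 27.12,
    "BoT-SORT": 56.19,
}

# Per-frame cost of each tracker relative to the fastest one, ByteTrack.
baseline = latency_ms["ByteTrack"]
relative_cost = {name: round(ms / baseline, 2) for name, ms in latency_ms.items()}

# Assumed target: a 30 FPS stream leaves about 33.3 ms per frame.
frame_budget_ms = 1000 / 30
fits_30fps = sorted(name for name, ms in latency_ms.items() if ms <= frame_budget_ms)
```

TrackTrack comes out at about 2.58 times ByteTrack's per-frame cost, which is the "roughly 2.6 times" quoted above. Under the assumed 30 FPS budget, the four mid-to-fast trackers fit and the two at 17.8 FPS do not.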

How to choose

A quick guide based on the scene you are working with:

  • Static camera, maximum speed, detector-bound pipeline: ByteTrack.
  • General purpose, anything with a moving camera: BoT-SORT (the default), and turn on with_reid for crowds and close passes.
  • Erratic, non-linear motion (sports, dance, animals) on a light compute budget: OC-SORT.
  • Non-linear motion where you also need appearance-based identity stability: Deep OC-SORT.
  • Dense traffic in real time: FastTrack.
  • Crowded, occlusion-heavy scenes where stable IDs matter most: TrackTrack.

The takeaway from this experiment is that the default is not automatically the best. BoT-SORT lost the close-pass test precisely because its appearance branch is off by default, while the other five held the identity through the crossing. Match the tracker to the scene (camera motion, motion linearity, crowd density), and always validate on your own footage, because the right answer really does change with the data.

See all six side by side on the same clip 🚀


FAQs

Q: Which tracker should I use by default with Ultralytics YOLO?
A: BoT-SORT is the default and a reasonable general-purpose choice, especially on moving cameras thanks to its built-in global motion compensation. If your camera is static and you want maximum speed, ByteTrack is lighter. For crowded scenes with frequent crossings, enable appearance (ReID) or move to Deep OC-SORT or TrackTrack.
Q: What is an ID switch in multi-object tracking?
A: An ID switch happens when the identity assigned to a real object changes. A track that was correctly following person A gets reassigned to person B, or person A reappears under a brand-new ID. It is the classic failure of the data-association step and usually spikes when objects cross, occlude one another, or move non-linearly.
Q: Why does BoT-SORT switch IDs when people pass close together?
A: By default Ultralytics runs BoT-SORT with ReID disabled, so association leans on motion and IoU. When two boxes overlap heavily during a close pass, the geometric cost becomes ambiguous and the assignment can swap the IDs. Enabling appearance features (with_reid) or using an appearance-aware tracker reduces this.
Q: Are all six trackers built into Ultralytics?
A: Recent Ultralytics releases ship all six as YAML configs: botsort.yaml, bytetrack.yaml, ocsort.yaml, deepocsort.yaml, fasttrack.yaml, and tracktrack.yaml. ByteTrack and BoT-SORT have been built in for a long time, while OC-SORT, Deep OC-SORT, FastTrack, and TrackTrack were added more recently, so pin your version to be sure.
Q: What is the difference between an ID switch and ID flickering?
A: An ID switch permanently reassigns an identity to the wrong object. ID flickering (a form of track fragmentation) is when the same track briefly disappears and reappears across a few frames, often because a detection dropped below threshold or a short occlusion broke the track. A good tracker either prevents the flicker or recovers the original ID afterwards.


Muhammad Rizwan Munawar

Computer Vision Engineer and top contributor to the YOLO project, building production AI and deep learning systems.

My course on LinkedIn Learning: Hands-On AI: Computer Vision Projects with Ultralytics and OpenCV