Tracking Overview¶
XRTracker uses model-based tracking — it knows the 3D shape of the object and matches it against the camera image in real-time. This page explains the core concepts.
How It Works¶
Every frame, the tracker:
- Receives a camera image from the active camera source
- Compares the 3D model against what it sees in the image — matching edges, boundaries, or depth surfaces depending on the active modality
- Refines the pose to best align the model with the real object
- Updates the GameObject's transform with the new position and rotation
This loop runs once per camera frame (typically 30-60 fps).
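The per-frame loop above can be sketched as follows. This is a conceptual illustration in Python, not the SDK's API: `refine` stands in for the modality matching step, and `apply_pose` stands in for updating the GameObject's transform.

```python
def track_frame(pose, image, refine, apply_pose):
    """One iteration of the per-frame loop: refine the pose so the
    model aligns with the camera image, then write the result back.
    `refine` and `apply_pose` are illustrative stand-ins for the
    modality matching step and the transform update."""
    new_pose = refine(pose, image)   # match silhouette/edge/depth cues
    apply_pose(new_pose)             # update position and rotation
    return new_pose

# Toy 1-D demo: "refine" pulls the pose halfway toward a measurement.
applied = []
pose = 0.0
for measurement in (1.0, 1.0, 1.0):
    pose = track_frame(pose, measurement,
                       lambda p, m: p + 0.5 * (m - p), applied.append)
```

Each call converges the pose toward the measured position, which is the essence of frame-to-frame refinement: start from the previous pose and correct it slightly every frame.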
Tracking Status¶
Each TrackedBody reports a TrackingStatus:
| Status | Description |
|---|---|
| NotTracking | Not tracking — waiting for detection or tracking was lost |
| Tracking | Actively tracking with good quality |
| Poor | Tracking, but quality is below the desired threshold |
Tracking starts when quality exceeds QualityToStart for several consecutive frames. Tracking stops when quality drops below QualityToStop.
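The start/stop rule above is a hysteresis: starting requires sustained good quality, while stopping uses a lower threshold so the status does not flicker. A minimal Python sketch, where the threshold values, the consecutive-frame count, and the `poor_threshold` for the Poor status are illustrative assumptions, not the SDK's defaults:

```python
from enum import Enum

class TrackingStatus(Enum):
    NotTracking = 0
    Tracking = 1
    Poor = 2

class TrackingState:
    """Hysteresis sketch: quality must exceed quality_to_start for
    start_frames consecutive frames to begin tracking, and tracking
    stops only when quality drops below quality_to_stop."""
    def __init__(self, quality_to_start=0.6, quality_to_stop=0.3,
                 start_frames=3, poor_threshold=0.5):
        self.quality_to_start = quality_to_start
        self.quality_to_stop = quality_to_stop
        self.start_frames = start_frames
        self.poor_threshold = poor_threshold  # below this: Poor status
        self.status = TrackingStatus.NotTracking
        self._good_streak = 0

    def update(self, quality):
        if self.status is TrackingStatus.NotTracking:
            # Require several consecutive good frames before starting.
            if quality > self.quality_to_start:
                self._good_streak += 1
            else:
                self._good_streak = 0
            if self._good_streak >= self.start_frames:
                self.status = TrackingStatus.Tracking
        else:
            if quality < self.quality_to_stop:
                self.status = TrackingStatus.NotTracking
                self._good_streak = 0
            elif quality < self.poor_threshold:
                self.status = TrackingStatus.Poor
            else:
                self.status = TrackingStatus.Tracking
        return self.status
```

The gap between the start and stop thresholds is what prevents rapid toggling between Tracking and NotTracking when quality hovers near a single cutoff.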
Tracking Modalities¶
XRTracker uses a primary + complementary modality architecture. You always pick one primary contour modality (Silhouette or Edge), and optionally add complementary modalities (Depth, Texture) to improve robustness.
Primary Modalities (pick one)¶
Silhouette Tracking¶
Uses the object's foreground boundary — the contour where the object separates from the background. Silhouette tracking is the most general-purpose modality and the most resilient to fast camera or object movement, since the foreground/background separation remains visible even during motion blur.
Works best with objects that have a distinct outline against their surroundings. Requires some visual contrast between the object and background.
Learn more about Silhouette Tracking
Edge Tracking¶
Uses geometric edges on the object's surface — creases, silhouette contours, and depth discontinuities. Edge tracking works in lower contrast conditions than silhouette because it relies on local gradients rather than global foreground/background separation. Well suited for large machinery, engine blocks, industrial equipment — anything with lots of internal geometric detail visible from the camera.
Edge tracking also excels when you can only see a partial view of a large object, since internal edges remain visible even when the outline is not fully in frame. However, it is more sensitive to fast motion — edge correspondences are easier to lose during rapid movement.
Learn more about Edge Tracking
Complementary Modalities (optional)¶
These modalities enhance the primary contour modality but cannot be used alone.
Depth Tracking¶
Uses depth data from LiDAR sensors (iPhone Pro, iPad Pro) or depth cameras (Intel RealSense). Matches the 3D point cloud against the model surface. Provides strong translational constraints that complement the rotational precision of contour modalities.
Depth tracking is always used in addition to silhouette or edge — it cannot be used standalone.
Learn more about Depth Tracking
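The point-cloud-to-surface matching can be illustrated with a toy residual computation. This is not the SDK's implementation: a real tracker matches against the mesh with a point-to-plane metric, while this sketch uses nearest-neighbour distances to sampled model points just to show what kind of constraint depth data contributes.

```python
import math

def depth_residuals(point_cloud, model_points):
    """For each measured 3D point from the depth sensor, take the
    distance to the nearest sampled model-surface point as a residual.
    Minimizing these residuals over candidate poses is what anchors
    the object's translation (illustrative sketch only)."""
    return [min(math.dist(p, q) for q in model_points)
            for p in point_cloud]
```

When the pose is correct, measured points lie on the model surface and residuals approach zero; a translation error shifts the whole cloud off the surface, which is why depth gives strong translational constraints.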
Texture Tracking¶
Uses surface appearance — matching the object's texture or visual features against a reference. Provides additional constraints when the object has distinctive surface patterns. Complementary to contour tracking.
To enable, check Texture Tracking on the TrackedBody component. Texture tracking works best on objects with rich, non-repetitive surface patterns (labels, printed graphics, wood grain). It adds little value on uniform or metallic surfaces.
Silhouette vs. Edge¶
| Aspect | Silhouette | Edge |
|---|---|---|
| Setup | Requires pre-generated tracking model | Works directly from mesh — no model generation |
| Contrast requirement | Needs foreground/background separation | Works with lower contrast (local gradients) |
| Fast motion | Very resilient | More sensitive to rapid movement |
| Background clutter | Sensitive — similar backgrounds confuse it | More discriminative — uses local edge features |
| Object type | General-purpose, curved shapes | Large machinery, engines, objects with internal detail |
| Computational cost | Lower | Higher |
Combining Modalities¶
Silhouette and Edge are mutually exclusive — you pick one contour method. Depth and Texture are complementary modalities that enhance either method but cannot be used alone.
| Combination | Use Case |
|---|---|
| Silhouette only | General-purpose, no depth sensor |
| Edge only | Industrial parts, cluttered backgrounds, no depth sensor |
| Silhouette + Depth | Best robustness for AR with depth sensor (iPhone Pro, RealSense) |
| Edge + Depth | Industrial parts with depth sensor |
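The combination rules in the table reduce to a simple invariant: exactly one primary contour modality, plus zero or more complementary modalities. A small Python helper sketching that validation (a hypothetical illustration, not part of the SDK):

```python
VALID_PRIMARY = {"Silhouette", "Edge"}
VALID_COMPLEMENTARY = {"Depth", "Texture"}

def validate_modalities(primary, complementary=()):
    """Enforce the combination rules described above: the primary
    must be a contour modality, and only Depth/Texture may be added
    on top of it (illustrative helper only)."""
    if primary not in VALID_PRIMARY:
        raise ValueError(f"primary must be one of {sorted(VALID_PRIMARY)}")
    invalid = set(complementary) - VALID_COMPLEMENTARY
    if invalid:
        raise ValueError(f"not complementary modalities: {sorted(invalid)}")
    return (primary, *complementary)
```

Attempting to use Depth as the primary, or Silhouette together with Edge, fails this check, matching the mutual-exclusion rule above.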
Choosing the Right Combination¶
| Scenario | Recommended |
|---|---|
| Object with distinct outline, varied background | Silhouette |
| Large machinery, engine, equipment with geometric detail | Edge |
| General-purpose with depth sensor | Silhouette + Depth |
| Low contrast scene with depth sensor | Edge + Depth |
| Object against cluttered background | Edge + Depth |
| Fast-moving objects or handheld camera | Silhouette (+ Depth if available) |
How Modalities Interact¶
When multiple modalities are active, they work together to find the best pose. Each modality contributes its own measurements, and the tracker combines all information to produce the most accurate result. Quality is reported as the best across active modalities. Adding depth doesn't slow down contour tracking — all modalities are processed together efficiently.
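The fusion idea can be sketched in one dimension. This is a deliberately simplified model, not how the tracker is implemented (a real system stacks each modality's residuals into a single optimization): each modality contributes an estimate with a confidence weight, the fused result is the weighted combination, and the reported quality is the best across active modalities, as stated above.

```python
def fuse(estimates):
    """1-D sketch of multi-modality fusion: `estimates` is a list of
    (pose_estimate, confidence) pairs, one per active modality.
    Returns the confidence-weighted pose and the reported quality,
    which is the best confidence across modalities."""
    total_w = sum(w for _, w in estimates)
    pose = sum(x * w for x, w in estimates) / total_w
    quality = max(w for _, w in estimates)
    return pose, quality
```

A low-confidence modality therefore pulls the fused pose only slightly, while the quality readout still reflects the strongest available signal.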
Enabling Combinations¶
On the TrackedBody component:
- Set Tracking Method to Silhouette or Edge
- Check Depth Tracking to add depth
Detection¶
Detection uses the initial pose — the position and rotation of the TrackedBody GameObject in the scene relative to the camera. When you click "Track" or call ForceStartTracking(), the tracker renders the model at this pose and searches for a match.
Detection tips
- Place the 3D model at the approximate real-world position relative to the camera
- The closer the initial pose, the faster and more reliable detection is
- For AR Foundation, detection uses the phone's current camera view
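The detection search above can be sketched as scoring candidate poses near the initial pose. This is a conceptual illustration only: `score` stands in for rendering the model at a candidate pose and comparing it against the camera image, and the 1-D offsets stand in for a real pose-space search.

```python
def detect(initial_pose, score, offsets, threshold=0.6):
    """Try poses near the initial pose and return the first candidate
    whose match score clears the threshold. `score`, `offsets`, and
    `threshold` are illustrative stand-ins for the real render-and-
    compare detection step."""
    for off in offsets:
        candidate = initial_pose + off
        if score(candidate) >= threshold:
            return candidate
    return None  # no match this frame: stay in NotTracking and retry
```

This is why a close initial pose helps: the nearer the starting point is to the true pose, the smaller the region that must be searched before a candidate clears the threshold.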