
Tracking Overview

XRTracker uses model-based tracking — it knows the 3D shape of the object and matches it against the camera image in real time. This page explains the core concepts.

How It Works

Every frame, the tracker:

  1. Receives a camera image from the active camera source
  2. Compares the 3D model against what it sees in the image — matching edges, boundaries, or depth surfaces depending on the active modality
  3. Refines the pose to best align the model with the real object
  4. Updates the GameObject's transform with the new position and rotation

This loop runs at the camera's frame rate (typically 30-60 fps).
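The loop above can be sketched in Python. Everything here is illustrative, not the XRTracker API: the stub modality only demonstrates step 3, iterative refinement pulling the pose estimate toward alignment over successive frames.

```python
class StubModality:
    """Toy stand-in for a tracking modality: pulls the pose toward a
    fixed target, mimicking iterative alignment against the image."""
    def __init__(self, target):
        self.target = target

    def refine(self, pose):
        # Move halfway toward the target each frame (steps 2-3 collapsed).
        return tuple(p + 0.5 * (t - p) for p, t in zip(pose, self.target))

def track_frame(pose, modality):
    # Step 1 (grab the image) and step 4 (write the transform) are
    # omitted; this shows only the match-and-refine core of the loop.
    return modality.refine(pose)

pose = (0.0, 0.0, 0.0)                        # initial position estimate
modality = StubModality(target=(1.0, 0.0, 0.0))
for _ in range(3):                            # three camera frames
    pose = track_frame(pose, modality)
# pose has now converged most of the way toward (1, 0, 0)
```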

Tracking Status

Each TrackedBody reports a TrackingStatus:

| Status | Description |
|---|---|
| NotTracking | Not tracking: waiting for detection, or tracking was lost |
| Tracking | Actively tracking with good quality |
| Poor | Tracking, but quality is below the desired threshold |

Tracking starts when quality exceeds QualityToStart for several consecutive frames. Tracking stops when quality drops below QualityToStop.
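This start/stop behavior is a hysteresis: two thresholds plus a consecutive-frame counter, so the status doesn't flicker when quality hovers near a single cutoff. A minimal sketch of that logic — the field names mirror `QualityToStart`/`QualityToStop`, but the default values and frame count here are illustrative guesses, not documented settings:

```python
class TrackingState:
    """Hysteresis between NotTracking and Tracking (illustrative sketch)."""
    def __init__(self, quality_to_start=0.6, quality_to_stop=0.3, frames_to_start=3):
        self.quality_to_start = quality_to_start
        self.quality_to_stop = quality_to_stop
        self.frames_to_start = frames_to_start
        self.tracking = False
        self.good_frames = 0

    def update(self, quality):
        if not self.tracking:
            # Require several consecutive frames above QualityToStart.
            self.good_frames = self.good_frames + 1 if quality > self.quality_to_start else 0
            if self.good_frames >= self.frames_to_start:
                self.tracking = True
        elif quality < self.quality_to_stop:
            # Only stop once quality falls below the lower threshold.
            self.tracking = False
            self.good_frames = 0
        return self.tracking

state = TrackingState()
state.update(0.7)        # 1st good frame: still not tracking
state.update(0.7)        # 2nd good frame: still not tracking
state.update(0.7)        # 3rd good frame: tracking starts
state.update(0.4)        # between thresholds: keeps tracking
```

Because 0.4 sits between `quality_to_stop` and `quality_to_start`, tracking continues — that gap is what prevents rapid start/stop oscillation.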

Tracking Modalities

XRTracker uses a primary + complementary modality architecture. You always pick one primary contour modality (Silhouette or Edge), and optionally add complementary modalities (Depth, Texture) to improve robustness.

Primary Modalities (pick one)

Silhouette Tracking

Uses the object's foreground boundary — the contour where the object separates from the background. Silhouette tracking is the most general-purpose modality and the most resilient to fast camera or object movement, since the foreground/background separation remains visible even during motion blur.

Works best with objects that have a distinct outline against their surroundings. Requires some visual contrast between the object and background.

Learn more about Silhouette Tracking

Edge Tracking

Uses geometric edges on the object's surface — creases, silhouette contours, and depth discontinuities. Edge tracking works in lower contrast conditions than silhouette because it relies on local gradients rather than global foreground/background separation. Well suited for large machinery, engine blocks, industrial equipment — anything with lots of internal geometric detail visible from the camera.

Edge tracking also excels when you can only see a partial view of a large object, since internal edges remain visible even when the outline is not fully in frame. However, it is more sensitive to fast motion — edge correspondences are easier to lose during rapid movement.

Learn more about Edge Tracking

Complementary Modalities (optional)

These modalities enhance the primary contour modality but cannot be used alone.

Depth Tracking

Uses depth data from LiDAR sensors (iPhone Pro, iPad Pro) or depth cameras (Intel RealSense). Matches the 3D point cloud against the model surface. Provides strong translational constraints that complement the rotational precision of contour modalities.

Depth tracking is always used in addition to silhouette or edge — it cannot be used standalone.
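Conceptually, depth constrains translation because each depth point yields a signed distance to the model surface, which is most sensitive to motion along the surface normal. A toy point-to-plane residual illustrates the idea (pure illustration, not XRTracker's actual math):

```python
def point_to_plane_residual(point, surface_point, surface_normal):
    """Signed distance from a depth point to the model's local tangent
    plane: dot(point - surface_point, surface_normal)."""
    return sum((p - s) * n for p, s, n in zip(point, surface_point, surface_normal))

# A depth point 2 cm in front of a model surface patch facing +z:
r = point_to_plane_residual((0.0, 0.0, 0.52), (0.0, 0.0, 0.50), (0.0, 0.0, 1.0))
# r is +0.02 m; minimizing such residuals pulls the pose toward the depth data
```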

Learn more about Depth Tracking

Texture Tracking

Uses surface appearance — matching the object's texture or visual features against a reference. Provides additional constraints when the object has distinctive surface patterns. Complementary to contour tracking.

To enable, check Texture Tracking on the TrackedBody component. Texture tracking works best on objects with rich, non-repetitive surface patterns (labels, printed graphics, wood grain). It adds little value on uniform or metallic surfaces.

Silhouette vs. Edge

| Aspect | Silhouette | Edge |
|---|---|---|
| Setup | Requires pre-generated tracking model | Works directly from mesh; no model generation |
| Contrast requirement | Needs foreground/background separation | Works with lower contrast (local gradients) |
| Fast motion | Very resilient | More sensitive to rapid movement |
| Background clutter | Sensitive; similar backgrounds confuse it | More discriminative; uses local edge features |
| Object type | General-purpose, curved shapes | Large machinery, engines, objects with internal detail |
| Computational cost | Lower | Higher |

Combining Modalities

Silhouette and Edge are mutually exclusive — you pick one contour method. Depth is a complementary modality that enhances either method but cannot be used alone.

| Combination | Use Case |
|---|---|
| Silhouette only | General-purpose, no depth sensor |
| Edge only | Industrial parts, cluttered backgrounds, no depth sensor |
| Silhouette + Depth | Best robustness for AR with depth sensor (iPhone Pro, RealSense) |
| Edge + Depth | Industrial parts with depth sensor |

Choosing the Right Combination

| Scenario | Recommended |
|---|---|
| Object with distinct outline, varied background | Silhouette |
| Large machinery, engine, equipment with geometric detail | Edge |
| General-purpose with depth sensor | Silhouette + Depth |
| Low contrast scene with depth sensor | Edge + Depth |
| Object against cluttered background | Edge + Depth |
| Fast-moving objects or handheld camera | Silhouette (+ Depth if available) |

How Modalities Interact

When multiple modalities are active, each contributes its own measurements, and the tracker refines the pose against all of them jointly. Quality is reported as the best across active modalities. All modalities are processed together, so adding depth does not slow down contour tracking.
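The quality-reporting side of this is simple to picture: the reported value is the maximum over whatever modalities are currently active. A sketch of that aggregation (the dictionary keys and the function are hypothetical, not XRTracker API):

```python
def combined_quality(modality_qualities):
    """Report quality as the best across active modalities.
    None marks a modality that is enabled but has no measurement."""
    active = [q for q in modality_qualities.values() if q is not None]
    return max(active) if active else 0.0

# Silhouette is struggling, but depth is locked on, so overall quality stays high:
q = combined_quality({"silhouette": 0.55, "depth": 0.82, "texture": None})
```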

Enabling Combinations

On the TrackedBody component:

  1. Set Tracking Method to Silhouette or Edge
  2. Check Depth Tracking to add depth

Detection

Detection uses the initial pose — the position and rotation of the TrackedBody GameObject in the scene relative to the camera. When you click "Track" or call ForceStartTracking(), the tracker renders the model at this pose and searches for a match.

Detection tips

  • Place the 3D model at the approximate real-world position relative to the camera
  • The closer the initial pose, the faster and more reliable detection is
  • For AR Foundation, detection uses the phone's current camera view
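When detection fails, a quick back-of-the-envelope check of how far the placed model sits from the real object can help. The helper below is a plain Euclidean distance in camera-relative coordinates; the positions and any notion of an acceptable offset are illustrative guesses, not documented limits:

```python
import math

def pose_offset(initial_pos, actual_pos):
    """Distance in metres between the placed model's position and the
    real object's position, both relative to the camera."""
    return math.dist(initial_pos, actual_pos)

# Model placed 0.5 m in front of the camera; object actually at 0.6 m
# depth with a 10 cm lateral offset:
d = pose_offset((0.0, 0.0, 0.5), (0.1, 0.0, 0.6))
# d is about 0.14 m -- the smaller this is, the faster detection converges
```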