Benchmarks

This section defines the detection and trajectory prediction tasks, evaluation metrics, and experimental configurations used throughout EagleVision.

3D Detection Task

The 3D detection task localizes racing vehicles in the LiDAR coordinate frame using oriented 3D bounding boxes. Only one semantic class (Car) is considered.

Evaluation range

x ∈ [-60, 60] m
y ∈ [-60, 60] m
z ∈ [-2, 4] m
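The evaluation range could be applied as in the minimal sketch below. It assumes that boxes are filtered by their center coordinates with inclusive bounds. The filtering rule and the helper name `in_eval_range` are illustrative assumptions, not the EagleVision code.

```python
import numpy as np

# Evaluation range in the LiDAR frame, in meters (from the benchmark definition).
X_RANGE = (-60.0, 60.0)
Y_RANGE = (-60.0, 60.0)
Z_RANGE = (-2.0, 4.0)


def in_eval_range(centers: np.ndarray) -> np.ndarray:
    """Boolean mask of boxes whose (x, y, z) centers lie inside the evaluation range.

    centers: array of shape (N, 3) with box centers in the LiDAR frame.
    Assumption (hypothetical rule): filtering uses box centers and inclusive bounds.
    """
    lo = np.array([X_RANGE[0], Y_RANGE[0], Z_RANGE[0]])
    hi = np.array([X_RANGE[1], Y_RANGE[1], Z_RANGE[1]])
    return np.all((centers >= lo) & (centers <= hi), axis=1)


boxes = np.array([
    [10.0, -5.0, 0.5],  # inside the range
    [75.0, 0.0, 0.0],   # x outside the range
    [0.0, 0.0, 5.0],    # z outside the range
])
mask = in_eval_range(boxes)
print(mask)  # [ True False False]
```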

Detection Metrics

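AP: Average Precision with predictions matched to ground truth by center distance, averaged over the matching thresholds {0.25, 0.5, 1.0, 2.0} m.
ATE: Average Translation Error, the center distance between matched predictions and ground truth, in meters.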
ASE: Average Scale Error.
AOE: Average Orientation Error (yaw).
NDS: a reduced nuScenes-style detection score that omits the velocity and attribute error terms.
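The sketch below illustrates center-distance matching. For a single frame, it greedily matches predictions in descending score order to the nearest unmatched ground-truth center, computes AP at each threshold, and averages over the four thresholds. It rests on several assumptions. The function names are hypothetical. Distances are measured on (x, y) centers. AP is the all-point interpolated area under the precision-recall curve. The official nuScenes devkit differs: it accumulates matches over the whole dataset and normalizes the area under the curve using only recall and precision above 10%, so the two will not give identical numbers.

```python
import numpy as np

# Center-distance matching thresholds from the benchmark definition, in meters.
THRESHOLDS_M = (0.25, 0.5, 1.0, 2.0)


def ap_at_threshold(pred_xy, scores, gt_xy, thresh):
    """AP for one frame with greedy center-distance matching (simplified sketch)."""
    if len(gt_xy) == 0 or len(pred_xy) == 0:
        return 0.0
    order = np.argsort(-scores)  # highest-confidence predictions first
    matched = np.zeros(len(gt_xy), dtype=bool)
    tp = np.zeros(len(order))
    for rank, i in enumerate(order):
        dists = np.linalg.norm(gt_xy - pred_xy[i], axis=1)
        dists[matched] = np.inf  # each ground-truth box can be matched only once
        j = int(np.argmin(dists))
        if dists[j] <= thresh:
            matched[j] = True
            tp[rank] = 1.0
    cum_tp = np.cumsum(tp)
    recall = cum_tp / len(gt_xy)
    precision = cum_tp / np.arange(1, len(order) + 1)
    # Precision envelope: best precision achievable at this recall or higher.
    envelope = np.maximum.accumulate(precision[::-1])[::-1]
    # Area under the envelope; recall only increases at true positives.
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * envelope))


def mean_ap(pred_xy, scores, gt_xy, thresholds=THRESHOLDS_M):
    """Average the per-threshold AP over all center-distance thresholds."""
    return float(np.mean([ap_at_threshold(pred_xy, scores, gt_xy, t) for t in thresholds]))


gt = np.array([[0.0, 0.0], [10.0, 0.0]])
preds = np.array([[0.1, 0.0], [10.3, 0.0], [5.0, 5.0]])
scores = np.array([0.9, 0.8, 0.7])
print(mean_ap(preds, scores, gt))  # 0.875
```

In this example, the second prediction is 0.3 m from its target. It therefore counts as a miss only at the 0.25 m threshold, which gives AP = 0.5 there and 1.0 at the other three thresholds.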

Reduced NDS

NDS = (1/10) * ( 5*AP + 3 - ATE - ASE - AOE )
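The reduced score can be checked directly against the results table. The helper below (a hypothetical name) implements the formula above and reproduces the NDS reported for the scratch-trained R row. The original nuScenes NDS clips each error term with min(1, ·) before subtracting it. The formula above does not clip, but every error in the table is below 1, so clipping would not change any reported value.

```python
def reduced_nds(ap: float, ate: float, ase: float, aoe: float) -> float:
    """Reduced nuScenes-style score: (5*AP + 3 - ATE - ASE - AOE) / 10."""
    return (5.0 * ap + 3.0 - ate - ase - aoe) / 10.0


# AP, ATE, ASE, AOE of the scratch-trained R row in the A2RL Real detection table.
score = reduced_nds(0.843, 0.1796, 0.0716, 0.0372)
print(f"{score:.5f}")  # 0.69266
```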

Detection Transfer Protocol

We denote A2RL Real as R, Indy as I, the simulator data as S, and Waymo pretraining as W. In all experiments, the checkpoint with the highest validation NDS is selected.

R (Scratch): Training on R from random initialization.
W → R: Waymo-pretrained model finetuned on R.
W → (R+S)_1:1: Joint finetuning on R and S with a 1:1 sampling ratio.
W → S₁₀ → R: Pretraining on S for 10 epochs, then finetuning on R.
W → (R + 0.1S): Finetuning on R augmented with 10% of the simulator (S) samples.
W → (R + 0.1I): Finetuning on R augmented with 10% of the Indy (I) samples.
W → I₁₀ → R: Pretraining on I for 10 epochs, then finetuning on R.
W → (S+I)₃₀ → R: Joint pretraining on S and I for 30 epochs, then finetuning on R.
W → I₅ → S₈ → R: Pretraining on I for 5 epochs, then on S for 8 epochs, then finetuning on R.
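The mixed-data setups in the list above can be illustrated with a short sampling sketch. The helper names and sampling rules are assumptions for illustration only. `(R+S)_1:1` is modeled as drawing, in each epoch, as many S samples (with replacement) as there are R samples. `R + 0.1S` and `R + 0.1I` are modeled as appending a fixed random 10% subset of the extra dataset to all of R.

```python
import random
from typing import List, Tuple

# A training sample is identified by (dataset tag, index within that dataset).
Sample = Tuple[str, int]


def mix_r_plus_fraction(n_r: int, n_extra: int, tag: str, fraction: float,
                        seed: int = 0) -> List[Sample]:
    """R + fraction*X: every R sample plus a fixed random subset of dataset X.

    Hypothetical helper; assumes the subset is drawn once, without replacement.
    """
    rng = random.Random(seed)
    k = int(round(fraction * n_extra))
    extra = rng.sample(range(n_extra), k)
    samples = [("R", i) for i in range(n_r)] + [(tag, i) for i in extra]
    rng.shuffle(samples)
    return samples


def mix_one_to_one(n_r: int, n_s: int, seed: int = 0) -> List[Sample]:
    """(R+S)_1:1: one epoch with equally many R and S samples.

    Hypothetical helper; assumes S is sampled with replacement to match len(R).
    """
    rng = random.Random(seed)
    s_idx = [rng.randrange(n_s) for _ in range(n_r)]
    samples = [("R", i) for i in range(n_r)] + [("S", i) for i in s_idx]
    rng.shuffle(samples)
    return samples


epoch = mix_r_plus_fraction(1000, 5000, "S", 0.1)
print(len(epoch), sum(tag == "S" for tag, _ in epoch))  # 1500 500
```

For example, with 1,000 R samples and 5,000 S samples, `mix_one_to_one(1000, 5000)` yields 1,000 samples from each dataset. `mix_r_plus_fraction` instead keeps R dominant, contributing only 500 S samples.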

Detection Results (A2RL Real)

Setup	AP ↑	ATE ↓	ASE ↓	AOE ↓	NDS ↑
R (Scratch)	0.843	0.1796	0.0716	0.0372	0.69266
W → R	0.890	0.1702	0.0333	0.0231	0.72234
W → (R+S)_1:1	0.847	0.1918	0.0348	0.0254	0.69830
W → S₁₀ → R	0.879	0.1580	0.0294	0.0208	0.71868
W → (R + 0.1S)	0.873	0.1589	0.0310	0.0209	0.71542
W → (R + 0.1I)	0.883	0.1666	0.0413	0.0213	0.71858
W → I₁₀ → R	0.895	0.1547	0.0320	0.0274	0.72609
W → (S+I)₃₀ → R	0.876	0.1654	0.0337	0.0186	0.71623
W → I₅ → S₈ → R	0.878	0.1632	0.0263	0.0302	0.71703
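NDS is a deterministic function of the other four columns, so the table can be sanity-checked and ranked programmatically. The sketch below copies the reported values, checks that each NDS matches the reduced-NDS formula, and prints each setup's gain over training from scratch. The dictionary layout and function names are illustrative, and the scratch-trained row is keyed as "R (Scratch)".

```python
# (AP, ATE, ASE, AOE, reported NDS) per setup, copied from the A2RL Real table.
RESULTS = {
    "R (Scratch)": (0.843, 0.1796, 0.0716, 0.0372, 0.69266),
    "W → R": (0.890, 0.1702, 0.0333, 0.0231, 0.72234),
    "W → (R+S)_1:1": (0.847, 0.1918, 0.0348, 0.0254, 0.69830),
    "W → S₁₀ → R": (0.879, 0.1580, 0.0294, 0.0208, 0.71868),
    "W → (R + 0.1S)": (0.873, 0.1589, 0.0310, 0.0209, 0.71542),
    "W → (R + 0.1I)": (0.883, 0.1666, 0.0413, 0.0213, 0.71858),
    "W → I₁₀ → R": (0.895, 0.1547, 0.0320, 0.0274, 0.72609),
    "W → (S+I)₃₀ → R": (0.876, 0.1654, 0.0337, 0.0186, 0.71623),
    "W → I₅ → S₈ → R": (0.878, 0.1632, 0.0263, 0.0302, 0.71703),
}


def reduced_nds(ap, ate, ase, aoe):
    """NDS = (5*AP + 3 - ATE - ASE - AOE) / 10."""
    return (5.0 * ap + 3.0 - ate - ase - aoe) / 10.0


def rank_by_nds(results):
    """Return (setup, NDS, gain over scratch) tuples sorted by NDS, best first."""
    baseline = results["R (Scratch)"][4]
    rows = []
    for name, (ap, ate, ase, aoe, nds) in results.items():
        # Each reported NDS should equal the formula up to rounding.
        assert abs(reduced_nds(ap, ate, ase, aoe) - nds) < 1e-5, name
        rows.append((name, nds, nds - baseline))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, nds, gain in rank_by_nds(RESULTS):
    print(f"{name:<18} NDS={nds:.5f}  gain={gain:+.5f}")
```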

Trajectory Prediction Results (ADE/FDE, lower is better)

Dataset	Train ADE ↓	Train FDE ↓	Validation ADE ↓	Validation FDE ↓	Test ADE ↓	Test FDE ↓	R ADE ↓	R FDE ↓
I^v5	0.502	0.905	1.152	2.326	0.774	1.511	0.478	0.947
I^v3	0.672	1.175	1.624	3.254	0.862	1.611	1.567	4.205
I	0.378	0.743	0.621	1.276	1.611	3.177	0.852	1.389
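ADE is the mean L2 displacement between predicted and ground-truth positions over all predicted time steps. FDE is the displacement at the final step only. The minimal sketch below assumes single-mode 2D predictions with no best-of-K selection and uses a hypothetical helper name. Errors come out in the same units as the input coordinates.

```python
from typing import Tuple

import numpy as np


def ade_fde(pred: np.ndarray, gt: np.ndarray) -> Tuple[float, float]:
    """Average and final displacement error for single-mode predictions.

    pred, gt: arrays of shape (N, T, 2) with N trajectories of T future (x, y) positions.
    """
    err = np.linalg.norm(pred - gt, axis=-1)  # (N, T): L2 error at every step
    ade = float(err.mean())                   # mean over all trajectories and steps
    fde = float(err[:, -1].mean())            # mean over trajectories at the last step
    return ade, fde


gt = np.zeros((1, 3, 2))
pred = np.array([[[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]])
print(ade_fde(pred, gt))  # (5.0, 10.0)
```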

Cross-dataset Trajectory Prediction on R

Models are trained on R and on I, respectively, and evaluated on R using ADE and FDE (lower is better).

Dataset	Train ADE ↓	Train FDE ↓	Validation ADE ↓	Validation FDE ↓	Test ADE ↓	Test FDE ↓
R	0.266	0.414	0.214	0.302	0.484	1.24
I	0.502	0.905	1.152	2.326	0.478	0.947
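As a worked comparison of the Test columns above, the snippet below computes the relative change in ADE and FDE from the R-trained row to the I-trained row. It only restates the reported numbers; the dictionary and function names are illustrative. The I-trained model reports slightly lower Test ADE and noticeably lower Test FDE.

```python
# Test-column (ADE, FDE) values copied from the cross-dataset table.
test_errors = {"R": (0.484, 1.24), "I": (0.478, 0.947)}


def relative_change(a: float, b: float) -> float:
    """Relative change from a to b, as a fraction of a."""
    return (b - a) / a


ade_change = relative_change(test_errors["R"][0], test_errors["I"][0])
fde_change = relative_change(test_errors["R"][1], test_errors["I"][1])
print(f"I vs R on the Test column: ADE {ade_change:+.1%}, FDE {fde_change:+.1%}")
# I vs R on the Test column: ADE -1.2%, FDE -23.6%
```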