Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection

1 University of Cagliari, 2 Technical University of Munich (TUM), 3 University of California Merced (UCM), 4 Universidad de Las Palmas de Gran Canaria, 5 University of California San Diego (UCSD)
ECVA European Conference on Computer Vision (ECCV'24)

*Indicates Equal Contribution

Overview

RoadSense3D is a large-scale synthetic dataset covering roadside scenarios. It contains 1.4 million labeled camera frames with 9 million labeled 3D traffic participants recorded in the CARLA simulator.

In summary:
  • The RoadSense3D dataset consists of labeled traffic scenarios recorded under various lighting and weather conditions.
  • We provide an in-depth comparison of state-of-the-art monocular 3D object detection methods.
  • We extend the Cube R-CNN model to make it compatible with various datasets.
  • We develop domain adaptation methods to improve generalization.
  • We perform extensive transfer learning experiments and ablation studies on the RoadSense3D dataset, the TUM Traffic datasets, and the DAIR-V2X dataset.
  • We open-source our code and dataset and provide some qualitative video results.

Abstract

Accurately detecting 3D objects from monocular images in dynamic roadside scenarios remains a challenging problem due to varying camera perspectives and unpredictable scene conditions. This paper introduces a two-stage training strategy to address these challenges. Our approach initially trains a model on the large-scale synthetic dataset, RoadSense3D, which offers a diverse range of scenarios for robust feature learning. Subsequently, we fine-tune the model on a combination of real-world datasets (TUM Traffic A9 Highway and DAIR-V2X-I) to enhance its adaptability to practical conditions. Experimental results of the Cube R-CNN model on challenging public benchmarks show a significant improvement in detection performance, with a mean average precision rising from 0.26 to 12.76 on the TUMTraf-A9 dataset and from 2.09 to 6.60 on the DAIR-V2X-I dataset, when performing transfer learning. Code, data, and qualitative video results are available on the project website: https://roadsense3d.github.io.
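The two-stage strategy described in the abstract (pre-train on synthetic data, then fine-tune on real data) can be illustrated with a deliberately tiny toy. The sketch below is not the authors' training code: a one-parameter linear model stands in for Cube R-CNN, and every dataset, function name, and hyperparameter in it is a hypothetical choice made for illustration. It only shows why starting the second stage from pre-trained weights can help when real data and training budget are limited.

```python
# Toy illustration only: a one-parameter linear model y = w * x stands in for
# the detector. All data, names, and hyperparameters here are hypothetical.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr, steps):
    """Run full-batch gradient descent on the MSE loss, starting from w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# "Synthetic" domain: more data, generated with slope 2.0.
synthetic_train = [(x / 2, 2.0 * x / 2) for x in range(5)]
# "Real" domain: scarce data with a slightly shifted slope of 2.2.
real_train = [(1.0, 2.2), (2.0, 4.4)]
real_test = [(3.0, 6.6)]

# Baseline: train on the real data only, from a zero initialization.
w_scratch = train(0.0, real_train, lr=0.1, steps=3)

# Stage 1: pre-train on the synthetic data.
w_pretrained = train(0.0, synthetic_train, lr=0.1, steps=50)
# Stage 2: fine-tune on the real data, starting from the pre-trained weight.
w_finetuned = train(w_pretrained, real_train, lr=0.1, steps=3)

print(f"scratch:    w={w_scratch:.3f}  test MSE={mse(w_scratch, real_test):.4f}")
print(f"fine-tuned: w={w_finetuned:.3f}  test MSE={mse(w_finetuned, real_test):.4f}")
```

With the same three real-data steps, the fine-tuned model ends closer to the real-domain slope than the baseline, because its starting point is already near the target. The actual experiments use a full 3D detector and real benchmarks, so this toy says nothing about the size of the effect there.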

Qualitative Results

Qualitative results of Cube R-CNN on the synthetic RoadSense3D test set.
We show 3D box detections of the Cube R-CNN model in class-specific colors under different lighting and weather conditions.
Qualitative results of Cube R-CNN on the TUMTraf-A9 test set. The Cube R-CNN model was trained from scratch on the TUMTraf-A9 training set and evaluated on the TUMTraf-A9 test set.
Qualitative results of Cube R-CNN with transfer learning. The Cube R-CNN model was pre-trained on RoadSense3D, fine-tuned on the TUMTraf-A9 training set, and evaluated on the TUMTraf-A9 test set.

Quantitative Results

Architecture | Pre-Train Set     | Fine-Tuning Set  | Evaluation Set  | Easy  | Moderate | Hard
Cube R-CNN   | TUMTraf-A9 Train  | -                | TUMTraf-A9 Test | 0.26  | 0.26     | 0.26
Cube R-CNN   | RoadSense3D Train | TUMTraf-A9 Train | TUMTraf-A9 Test | 12.76 | 12.76    | 12.76
Single-Step Dataset Transfer on TUMTraf-A9. We report the 3D mean average precision across the easy, moderate, and hard difficulty levels. Transfer learning involves pre-training on the synthetic RoadSense3D dataset and fine-tuning on the real-world TUMTraf-A9 dataset.
Architecture | Pre-Train Set     | Fine-Tuning Set  | Evaluation Set  | Easy | Moderate | Hard
Cube R-CNN   | DAIR-V2X-I Train  | -                | DAIR-V2X-I Test | 2.09 | 2.62     | 2.61
Cube R-CNN   | RoadSense3D Train | DAIR-V2X-I Train | DAIR-V2X-I Test | 6.60 | 8.60     | 8.65
Single-Step Dataset Transfer on DAIR-V2X-I. We report the 3D mean average precision across the easy, moderate, and hard difficulty levels. Transfer learning involves pre-training on the synthetic RoadSense3D dataset and fine-tuning on the real-world DAIR-V2X-I dataset.
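
As a worked reading of the two tables above, the absolute 3D mAP gain from transfer learning can be computed per difficulty level. The numbers below are copied from the tables, assuming the three values in each row are listed in the order easy, moderate, hard as in the captions; the dictionary layout and function name are illustrative and not part of the released code.

```python
# 3D mAP values copied from the two tables above, ordered (easy, moderate, hard).
# The dictionary layout and names are illustrative, not part of the released code.
results = {
    "TUMTraf-A9": {"scratch": (0.26, 0.26, 0.26), "transfer": (12.76, 12.76, 12.76)},
    "DAIR-V2X-I": {"scratch": (2.09, 2.62, 2.61), "transfer": (6.60, 8.60, 8.65)},
}


def absolute_gains(scratch, transfer):
    """Per-difficulty absolute mAP gain, rounded to two decimals."""
    return tuple(round(t - s, 2) for s, t in zip(scratch, transfer))


gains = {name: absolute_gains(r["scratch"], r["transfer"]) for name, r in results.items()}

for name, (easy, moderate, hard) in gains.items():
    print(f"{name}: easy +{easy:.2f}, moderate +{moderate:.2f}, hard +{hard:.2f}")
```

Absolute differences are used here rather than ratios because the TUMTraf-A9 baseline of 0.26 is close to zero, which would make a relative improvement figure hard to interpret.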

BibTeX


@inproceedings{ahmed2024transfer,
    location = {Milan, Italy},
    title = {Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection},
    pages = {19},
    booktitle={Proceedings of the 18th European Conference on Computer Vision ECCV 2024},
    organization={Springer},
    publisher = {Springer-Verlag},
    author = {Mohamed, Sondos and Zimmer, Walter and Greer, Ross and Alaaeldin Ghita, Ahmed and Castrillón-Santana, Modesto and Trivedi, Mohan M. and Knoll, Alois C. and Carta, Salvatore Mario and Marras, Mirko},
    date = {2024-09-30}
}