A survey of computer vision research for autonomous driving (English version), covering the field's open problems, datasets and state-of-the-art results.

Computer Vision for Autonomous Vehicles:
Problems, Datasets and State-of-the-Art
Joel Janai (a,*), Fatma Güney (a,*), Aseem Behl (a,*), Andreas Geiger (a,b)

(a) Autonomous Vision Group, Max Planck Institute for Intelligent Systems, Spemannstr. 41, D-72076 Tübingen, Germany
(b) Computer Vision and Geometry Group, ETH Zürich, Universitätstrasse 6, CH-8092 Zürich, Switzerland

(*) Joint first authors with equal contribution.

Preprint submitted to ISPRS Journal of Photogrammetry and Remote Sensing, April 20, 2017
arXiv:1704.05519v1 [cs.CV] 18 Apr 2017

Abstract
Recent years have witnessed amazing progress in AI-related fields such as computer vision, machine learning and autonomous vehicles. As with any rapidly growing field, however, it becomes increasingly difficult to stay up-to-date or to enter the field as a beginner. While several topic-specific survey papers have been written, to date no general survey on problems, datasets and methods in computer vision for autonomous vehicles exists. This paper attempts to narrow this gap by providing a state-of-the-art survey on this topic. Our survey includes both the historically most relevant literature as well as the current state-of-the-art on several specific topics, including recognition, reconstruction, motion estimation, tracking, scene understanding and end-to-end learning. Towards this goal, we first provide a taxonomy to classify each approach and then analyze the performance of the state-of-the-art on several challenging benchmarking datasets including KITTI, ISPRS, MOT and Cityscapes. In addition, we discuss open problems and current research challenges. To ease accessibility and accommodate missing references, we will also provide an interactive platform which allows users to navigate topics and methods and provides additional information and project links for each paper.
Keywords:
Computer Vision, Autonomous Vehicles, Autonomous Vision
Since the first successful demonstrations in the 1980s (Dickmanns & Mysliwetz (1992); Dickmanns & Graefe (1988); Thorpe et al. (1988)), great progress has been made in the field of autonomous vehicles. Despite these advances, however, it is safe to believe that fully autonomous navigation in arbitrarily complex environments is still decades away. The reason for this is two-fold: First, autonomous systems which operate in complex dynamic environments require artificial intelligence which generalizes to unpredictable situations and reasons in a timely manner. Second, informed decisions require accurate perception, yet most existing computer vision systems produce errors at a rate which is not acceptable for autonomous navigation.

In this paper, we focus on the second aspect, which we call autonomous vision, and investigate the performance of current perception systems for autonomous vehicles. Towards this goal, we first provide a taxonomy of problems and classify existing datasets and techniques using this taxonomy, describing the pros and cons of each method. Second, we analyze the current state-of-the-art performance on several popular publicly available benchmarking datasets. In particular, we provide a novel in-depth qualitative analysis of the KITTI benchmark which shows the easiest and most difficult examples based on the methods submitted to the evaluation server. Based on this analysis, we discuss open research problems and challenges. To ease navigation, we also provide an interactive online tool (http://www.cvlibs.net/projects/autonomous_vision_survey) which visualizes our taxonomy using a graph and provides additional information and links to project pages in an easily accessible manner. We hope that our survey will become a useful tool for researchers in the field of autonomous vision and lower the entry barrier for beginners by providing an exhaustive overview of the field.

There exist several other related surveys. Winner et al. (2015) explain in detail systems for active safety and driver assistance, considering both their structure and their function. Their focus is to cover all aspects of driver assistance systems, and the chapter about machine vision covers only the most basic concepts of the autonomous vision problem. Klette (2015) provides an overview of vision-based driver assistance systems, describing most aspects of the perception problem at a high level, but does not provide an in-depth review of the state-of-the-art in each task as we pursue in this paper. Complementary to our survey, Zhu et al. (2017) provide an overview of environment perception for intelligent vehicles, focusing on lane detection, traffic sign/light recognition as well as vehicle tracking. In contrast, our goal is to bridge the gap between the robotics, intelligent vehicles, photogrammetry and computer vision communities by providing an extensive overview and comparison which includes works from all fields.

1. History of Autonomous Driving

1.1. Autonomous Driving Projects

Many governmental institutions worldwide started various projects to explore intelligent transportation systems (ITS). The
PROMETHEUS project started in 1986 in Europe and involved more than 13 vehicle manufacturers and several research units from governments and universities of 19 European countries. One of
the first projects in the United States was Navlab Thorpe et al. (1988) by Carnegie Mellon University, which achieved a major milestone in 1995 by completing the first autonomous drive (https://www.cmu.edu/news/stories/archives/2015/july/look-ma-no-hands.html) from Pittsburgh, PA to San Diego, CA. After many
initiatives were launched by universities, research centers and
automobile companies, the U.S. government established the Na-
tional Automated Highway System Consortium (NAHSC) in
1995. Similar to the U.S., Japan established the Advanced Cruise-
Assist Highway System Research Association in 1996, together with many automobile companies and research centers, to foster re-
search on automatic vehicle guidance. Bertozzi et al. (2000)
survey many approaches to the challenging task of autonomous
road following developed during these projects. They concluded
that sufficient computing power is becoming increasingly available, but difficulties like reflections, wet roads, direct sunshine, tunnels and shadows still make data interpretation challenging.
Thus, they suggested the enhancement of sensor capabilities.
They also pointed out that the legal aspects related to the re-
sponsibility and impact of automatic driving on human passen-
gers need to be considered carefully. In summary, the automa-
tion will likely be restricted to special infrastructures and will
be extended gradually.
Motivated by the success of the PROMETHEUS projects to
drive autonomously on highways, Franke et al. (1998) describe
a real-time vision system for autonomous driving in complex
urban traffic situations. While highway scenarios have been
studied intensively, urban scenes have not been addressed be-
fore. Their system included depth-based obstacle detection and
tracking from stereo as well as a framework for monocular de-
tection and recognition of relevant objects such as traffic signs.
The fusion of several perception systems developed by VisLab (http://www.vislab.it) has led to several prototype vehicles including ARGO
Broggi et al. (1999), TerraMax Braid et al. (2006), and BRAiVE
Grisleri & Fedriga (2010). BRAiVE is the latest vehicle prototype and integrates all systems that VisLab has developed so far. Bertozzi et al. (2011) demonstrated the robust-
ness of their system at the VisLab Intercontinental Autonomous
Challenge, a semi-autonomous drive from Italy to China. The
onboard system allows detecting obstacles, lane markings, ditches and berms, and identifying the presence and position of a preceding vehicle. The information produced by the sensing suite is used to
perform different tasks such as leader-following and stop & go.
The PROUD project Broggi et al. (2015) slightly modified
the BRAiVE prototype Grisleri & Fedriga (2010) to drive in
urban roads and freeways open to regular traffic in Parma. Towards this goal, they enrich an openly licensed map with information about the maneuvers to be managed (e.g., pedestrian crossings, traffic lights, ...). The vehicle was able to handle complex situations such as roundabouts, intersections, priority roads, stops, tunnels, crosswalks, traffic lights, highways, and urban roads without any human intervention.
The V-Charge project Furgale et al. (2013) presents an elec-
tric automated car outfitted with close-to-market sensors. A
fully operational system is proposed including vision-only lo-
calization, mapping, navigation and control. The project sup-
ported many works on different problems such as calibration
Heng et al. (2013, 2015), stereo Häne et al. (2014), reconstruction Haene et al. (2012, 2013, 2014), SLAM Grimmett et al. (2015) and free space detection Häne et al. (2015). In addition to these research objectives, the project keeps a strong focus on
deploying and evaluating the system in realistic environments.
Google started its self-driving car project in 2009 and had completed over 1,498,000 autonomous miles by March 2016 (https://static.googleusercontent.com/media/www.google.com/lt//selfdrivingcar/files/reports/report-0316.pdf) in Mountain View, CA, Austin, TX and Kirkland, WA. Different sensors (i.a. cameras, radars, LiDAR, wheel encoders, GPS) allow the cars to detect pedestrians, cyclists, vehicles, road work and more in all directions. According to their accident reports, Google's self-driving cars were involved in only 14 collisions, 13 of which were caused by other parties. In 2016, the project was spun off as Waymo (https://www.waymo.com), an independent self-driving technology company.
Tesla Autopilot (https://www.tesla.com/autopilot) is an advanced driver assistance system developed by Tesla which was first rolled out in 2015 with version 7 of their software. The automation level of the system allows full automation but requires the full attention of the driver to take control if necessary. Since October 2016, all vehicles produced by Tesla have been equipped with eight cameras, twelve ultrasonic sensors and a forward-facing radar to enable full self-driving capability.
Long Distance Test Demonstrations:
In 1995 the team within
the PROMETHEUS project Dickmanns et al. (1990); Franke
et al. (1994); Dickmanns et al. (1994) performed the first au-
tonomous long-distance drive from Munich, Germany, to Odense,
Denmark, at velocities up to 175 km/h with about 95% au-
tonomous driving. Similarly, in the U.S. Pomerleau & Jochem
(1996) drove from Washington DC to San Diego in the ’No
hands across America’ tour with 98% automated steering yet
manual longitudinal control.
In 2014, Ziegler et al. (2014) demonstrated a 103 km ride from Mannheim to Pforzheim, Germany, known as the Bertha Benz memorial route, in a nearly fully autonomous manner. They present
an autonomous vehicle equipped with close-to-production sen-
sor hardware. Object detection and free-space analysis is per-
formed with radar and stereo vision. Monocular vision is used
for traffic light detection and object classification. Two comple-
mentary vision algorithms, point feature based and lane mark-
ing based, allow precise localization relative to manually an-
notated digital road maps. They concluded that, even though the drive was completed successfully, the overall behavior is still far inferior to the performance level of an attentive human driver.
Recently, Bojarski et al. (2016) drove autonomously 98%
of the time from Holmdel to Atlantic Highlands in Monmouth
County NJ as well as 10 miles on the Garden State Parkway
without intervention. Towards this goal, a convolutional neural
network which predicts vehicle control directly from images is
used in the NVIDIA DRIVE PX self-driving car. The system
is discussed in greater detail in Section 11.
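For readers unfamiliar with such end-to-end approaches, the following is a minimal sketch of the idea in Python (PyTorch): a convolutional network maps a front-facing camera image directly to a steering command and would be trained by regression against recorded human steering. The layer sizes and the 66x200 input resolution are illustrative assumptions, not the network used by Bojarski et al. (2016).

# Illustrative sketch (not the exact NVIDIA architecture): a small CNN that
# regresses a steering angle directly from a front-facing camera image.
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, kernel_size=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool to a fixed-size feature vector
        )
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1)
        )

    def forward(self, x):
        return self.regressor(self.features(x))  # predicted steering command

# Forward pass on a dummy batch of 66x200 RGB images (input size is an assumption).
net = SteeringNet()
steering = net(torch.randn(4, 3, 66, 200))
print(steering.shape)  # torch.Size([4, 1])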
While all of the aforementioned systems performed impressively, the general assumption of precisely annotated road maps as well as pre-recorded maps for localization demonstrates that autonomous systems are still far from human capabilities. Most importantly, robust perception from visual information, but also general artificial intelligence, is required to reach human-level reliability and react safely even in complex inner-city situations.
1.2. Autonomous Driving Competitions
The European Land Robot Trial (ELROB, http://www.elrob.org/) is a demonstration and competition of unmanned systems in realistic scenarios
and terrains, focusing mainly on military aspects such as recon-
naissance and surveillance, autonomous navigation and convoy
transport. In contrast to autonomous driving challenges, EL-
ROB scenarios typically include navigation in rough terrain.
The first autonomous driving competition focusing on road scenes (though primarily dirt roads) was initiated by the American Defense Advanced Research Projects Agency (DARPA) in 2004. The DARPA Grand Challenge 2004 offered prize money of $1 million for the first team to finish a 150 mile route which crossed the border from California to Nevada. However,
none of the robot vehicles completed the route. One year later,
in 2005, DARPA announced a second edition of its challenge
with 5 vehicles successfully completing the route (Buehler et al.
(2007)). The third competition of the DARPA Grand Chal-
lenge, known as the Urban Challenge (Buehler et al. (2009)),
took place on November 3, 2007 at the site of the George Air
Force Base in California. The challenge involved a 96 km ur-
ban area course where traffic regulations had to be obeyed while
negotiating with other vehicles and merging into traffic.
The Grand Cooperative Driving Challenge (GCDC, http://www.gcdc.net/en/; see also Geiger et al. (2012a)), a competition focusing on autonomous cooperative driving behavior, was held in Helmond, Netherlands in 2011 for the first time and in 2016 for a second edition. During the competition, teams had to negotiate convoys, join convoys and lead convoys. The winner was selected based on a system that assigned points to randomly mixed teams.
2. Datasets & Benchmarks
Datasets have played a key role in the progress of many
research fields by providing problem specific examples with
ground truth. They allow quantitative evaluation of approaches
providing key insights about their capacities and limitations.
In particular, several of these datasets Geiger et al. (2012b);
Scharstein & Szeliski (2002); Baker et al. (2011); Everingham
et al. (2010); Cordts et al. (2016) also provide online evalua-
tion servers which allow for a fair comparison on held-out test
sets and provide researchers in the field an up-to-date overview
over the state-of-the-art. This way, current progress and remain-
ing challenges can be easily identified by the research commu-
nity. In the context of autonomous vehicles, the KITTI dataset
Geiger et al. (2012b) and the Cityscapes dataset Cordts et al.
(2016) have introduced challenging benchmarks for reconstruc-
tion, motion estimation and recognition tasks, and contributed
to closing the gap between laboratory settings and challeng-
ing real-world situations. Only a few years ago, datasets with
a few hundred annotated examples were considered sufficient
for many problems. The introduction of datasets with many
hundred to thousands of labeled examples, however, has led to
spectacular breakthroughs in many computer vision disciplines
by training high-capacity deep models in a supervised fashion.
However, collecting a large amount of annotated data is not an
easy endeavor, in particular for tasks such as optical flow or
semantic segmentation. This initiated a collective effort to pro-
duce that kind of data in several areas by searching for ways
to automate the process as much as possible such as through
semi-supervised learning or synthesization.
2.1. Real-World Datasets
While several algorithmic aspects can be inspected using
synthetic data, real-world datasets are necessary to guarantee
performance of algorithms in real situations. For example, al-
gorithms employed in practice need to handle complex objects
and environments while facing challenging environmental con-
ditions such as direct lighting, reflections from specular sur-
faces, fog or rain. The acquisition of ground truth is often labor-intensive because this kind of information very often cannot be obtained directly with a sensor and instead requires tedious manual annotation. For example, Scharstein & Szeliski (2002) and Baker et al. (2011) acquire dense pixel-level annotations in a con-
trolled lab environment whereas Geiger et al. (2012b); Konder-
mann et al. (2016) provide sparse pixel-level annotations of real
street scenes using a LiDAR laser scanner.
Recently, crowdsourcing with Amazon's Mechanical Turk (https://www.mturk.com/mturk/welcome) has become very popular for creating annotations for large-scale datasets, e.g., Deng et al. (2009); Lin et al. (2014); Leal-Taixé et al. (2015); Milan et al. (2016). However, the annotation quality obtained via Mechanical Turk is often not sufficient to be considered as reference, and significant effort in post-processing and cleaning up the obtained labels is typically required. In the
following, we will first discuss the most popular computer vi-
sion datasets and benchmarks addressing tasks relevant to au-
tonomous vision. Thereafter, we will focus on datasets particu-
larly dedicated to autonomous vehicle applications.
Stereo and 3D Reconstruction:
The Middlebury stereo benchmark (http://vision.middlebury.edu/stereo/) introduced by Scharstein & Szeliski (2002) provides
several multi-frame stereo data sets for comparing the perfor-
mance of stereo matching algorithms. Pixel-level ground truth
is obtained by hand labeling and reconstructing planar compo-
nents in piecewise planar scenes. Scharstein & Szeliski (2002)
further provide a taxonomy of stereo algorithms that allows the comparison of design decisions and a test bed for quantitative evaluation. Approaches submitted to their benchmark website are evaluated using the root mean squared error and the percentage of bad pixels between the estimated and ground truth disparity maps.

Scharstein & Szeliski (2003) and Scharstein et al. (2014) introduced novel datasets to the Middlebury benchmark comprising more complex scenes and including ordinary objects like chairs, tables and plants. In both works a structured lighting system was used to create ground truth. For the latest version, Middlebury v3, Scharstein et al. (2014) generate highly accurate ground truth for high-resolution stereo images with a novel technique for 2D subpixel correspondence search and self-calibration of cameras as well as projectors. This new version achieves significantly higher disparity and rectification accuracy than those of existing datasets and allows a more precise evaluation. An example depth map from the dataset is illustrated in Figure 1.

Figure 1: The structured light system of Scharstein et al. (2014) provides highly accurate depth ground truth, visualized in color and shadings (top). A close-up view is provided in (a),(b), rounded disparities are shown in (c) and the surface obtained using a baseline method in (d). Adapted from Scharstein et al. (2014).

The Middlebury multi-view stereo (MVS) benchmark (http://vision.middlebury.edu/mview/) by Seitz et al. (2006) is a calibrated multi-view image dataset with registered ground truth 3D models for the comparison of MVS approaches. The benchmark played a key role in the advances of MVS approaches but is relatively small in size with only two scenes. In contrast, the TUD MVS dataset (http://roboimagedata.compute.dtu.dk/?page_id=36) by Jensen et al. (2014) provides 124 different scenes that were also recorded in a controlled laboratory environment. Reference data is obtained by combining structured light scans from each camera position, and the resulting scans are very dense, each containing 13.4 million points on average. For 44 scenes the full 360 degree model was obtained by rotating and scanning four times at 90 degree intervals. In contrast to the datasets so far, Schöps et al. (2017) provide scenes that are not carefully staged in a controlled laboratory environment and thus represent real-world challenges. Schöps et al. (2017) recorded high-resolution DSLR imagery as well as synchronized low-resolution stereo videos in a variety of indoor and outdoor scenes. A high-precision laser scanner allows registering all images with a robust method. The high-resolution images enable the evaluation of detailed 3D reconstruction while the low-resolution stereo images are provided to compare approaches for mobile devices.
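To make the disparity metrics mentioned above concrete, the following minimal Python/NumPy sketch computes the root mean squared error and the percentage of bad pixels for a pair of disparity maps; the bad-pixel threshold and the convention that zero marks missing ground truth are illustrative assumptions, not the exact Middlebury or KITTI protocol.

# Minimal sketch of common disparity error metrics (RMSE and percentage of
# "bad" pixels); threshold and masking conventions are illustrative assumptions.
import numpy as np

def disparity_errors(d_est, d_gt, bad_thresh=2.0):
    valid = d_gt > 0                         # assume 0 marks missing ground truth
    err = np.abs(d_est[valid] - d_gt[valid])
    rmse = np.sqrt(np.mean(err ** 2))        # root mean squared disparity error
    bad = np.mean(err > bad_thresh) * 100.0  # percentage of bad pixels
    return rmse, bad

d_gt = np.random.uniform(1, 64, (375, 1242)).astype(np.float32)
d_est = d_gt + np.random.normal(0, 1, d_gt.shape).astype(np.float32)
print(disparity_errors(d_est, d_gt))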
Optical Flow:
The Middlebury flow benchmark (http://vision.middlebury.edu/flow/) by Baker
et al. (2011) provides sequences with non-rigid motion, syn-
thetic sequences and a subset of the Middlebury stereo bench-
mark sequences (static scenes) for the evaluation of optical flow
methods. For all non-rigid sequences, ground truth flow is ob-
tained by tracking hidden fluorescent textures sprayed onto the
objects using a toothbrush. The dataset comprises eight differ-
ent sequences with eight frames each. Ground truth is provided
for one pair of frames per sequence.
Besides the limited size, real world challenges like com-
plex structures, lighting variation and shadows are missing as
the dataset necessitates laboratory conditions which allow for
manipulating the light source between individual captures. In
addition, it only comprises very small motions of up to twelve
pixels which do not admit the investigation of challenges pro-
vided by fast motions. Compared to other datasets, however, the
Middlebury dataset allows evaluating sub-pixel precision since it provides very accurate and dense ground truth. Performance is measured using the angular error (AE) and the endpoint error (EPE) between the estimated flow and the ground truth.
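Both flow measures have simple closed forms. The Python/NumPy sketch below, which assumes flow fields stored as HxWx2 arrays, illustrates the endpoint error and the Middlebury-style angular error; it is not the official evaluation code.

# Endpoint error and angular error for optical flow fields of shape (H, W, 2).
import numpy as np

def flow_errors(f_est, f_gt):
    # Endpoint error: Euclidean distance between estimated and true flow vectors.
    epe = np.linalg.norm(f_est - f_gt, axis=-1)
    # Angular error: angle between the 3D vectors (u, v, 1), as in Middlebury.
    num = 1.0 + np.sum(f_est * f_gt, axis=-1)
    den = np.sqrt(1.0 + np.sum(f_est ** 2, axis=-1)) * \
          np.sqrt(1.0 + np.sum(f_gt ** 2, axis=-1))
    ae = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return epe.mean(), ae.mean()

f_gt = np.random.uniform(-5, 5, (436, 1024, 2))
f_est = f_gt + np.random.normal(0, 0.5, f_gt.shape)
print(flow_errors(f_est, f_gt))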
Janai et al. (2017) present a novel optical flow dataset comprising complex real-world scenes, in contrast to the laboratory setting of Middlebury. High-speed video cameras are used to create accurate reference data by tracking pixels through densely sampled space-time volumes. This method allows acquiring optical flow ground truth in challenging everyday scenes in an automatic fashion, and augmenting realistic effects such as motion blur to compare methods under varying conditions. Janai et al. (2017) provide 160 diverse real-world sequences of dynamic scenes with a significantly larger resolution (1280×1024 pixels) than previous optical flow datasets and compare several state-of-the-art optical flow techniques on this data.
Object Recognition and Segmentation:
The availability of
large-scale, publicly available datasets such as ImageNet (Deng
et al. (2009)), PASCAL VOC (Everingham et al. (2010)), Mi-
crosoft COCO (Lin et al. (2014)), Cityscapes (Cordts et al.
(2016)) and TorontoCity (Wang et al. (2016)) have had a major
impact on the success of deep learning in object classification,
detection, and semantic segmentation tasks.
The PASCAL Visual Object Classes (VOC) challenge (http://host.robots.ox.ac.uk/pascal/VOC/) by
Everingham et al. (2010) is a benchmark for object classifica-
tion, object detection, object segmentation and action recogni-
tion. It consists of challenging consumer photographs collected
from Flickr with high quality annotations and contains large
variability in pose, illumination and occlusion. Since its in-
troduction, the VOC challenge has been very popular and was
yearly updated and adapted to the needs of the community un-
til the end of the program in 2012. Whereas the first challenge
in 2005 had only 4 different classes, 20 different object classes
were introduced in 2007. Over the years, the benchmark grew
in size reaching a total of 11,530 images with 27,450 ROI an-
notated objects in 2012.
In 2014, Lin et al. (2014) introduced the Microsoft COCO dataset (http://mscoco.org/) for object detection, instance segmentation and
contextual reasoning. They provide images of complex every-
day scenes containing common objects in their natural context.
The dataset comprises 91 object classes, 2.5 million annotated
instances and 328k images in total. Microsoft COCO is signifi-
cantly larger in the number of instances per class than the PAS-
CAL VOC object segmentation benchmark. All objects are an-
notated with per-instance segmentations in an extensive crowd
worker effort. Similar to PASCAL VOC, the intersection-over-
union metric is used for evaluation.
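For reference, the intersection-over-union criterion can be computed as follows for axis-aligned boxes; this generic Python sketch is an illustration, not the official PASCAL VOC or COCO evaluation toolkit.

# Intersection-over-union for two axis-aligned boxes given as (x1, y1, x2, y2).
def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143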
Tracking:
Leal-Taixé et al. (2015); Milan et al. (2016) present the MOTChallenge (https://motchallenge.net/), which addresses the lack of a centralized
benchmark for multi object tracking. The benchmark contains
14 challenging video sequences in unconstrained environments
filmed with static and moving cameras and subsumes many ex-
isting multi-object tracking benchmarks such as PETS (Ferry-
man & Shahrokni (2009)) and KITTI (Geiger et al. (2012b)).
Annotations for three object classes are provided: moving or standing pedestrians, people that are not in an upright position, and others. They use the two popular tracking measures, Multiple Object Tracking Accuracy (MOTA) and Multiple Object Tracking Precision (MOTP), introduced by Stiefelhagen et al. (2007), for the evaluation of the approaches. Detection ground truth provided by the authors allows analyzing the performance of tracking systems independently of a detection sys-
tem. Methods using a detector and methods using the detection
ground truth can be compared separately on their website.
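As a rough illustration of these two measures: MOTA aggregates missed targets, false positives and identity switches over all frames relative to the number of ground truth objects, while MOTP averages the distances of matched prediction-target pairs. The sketch below assumes per-frame counts have already been produced by a matching step and is not the benchmark's evaluation code.

# Sketch of the CLEAR MOT measures, assuming per-frame counts from a matcher:
# fn = missed targets, fp = false positives, idsw = identity switches,
# gt = ground truth objects, dists = distances of matched pairs per frame.
def mota(fn, fp, idsw, gt):
    return 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / float(sum(gt))

def motp(dists):
    matched = [d for frame in dists for d in frame]
    return sum(matched) / len(matched)

fn, fp, idsw, gt = [2, 1, 0], [1, 0, 1], [0, 1, 0], [10, 10, 10]
print(mota(fn, fp, idsw, gt))            # 1 - 6/30 = 0.8
print(motp([[0.2, 0.3], [0.25], [0.1]]))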
Aerial Image Datasets:
The ISPRS benchmark (http://www2.isprs.org/commissions/comm3/wg4/tests.html) (Rotten-
steiner et al. (2013, 2014)) provides data acquired by airborne
sensors for urban object detection and 3D building reconstruc-
tion and segmentation. It consists of two datasets: Vaihingen
and Downtown Toronto. The object classes considered in the
object detection task are building, road, tree, ground, and car.
The Vaihingen dataset provides three areas with various object
classes and a large test site for road detection algorithms. The
Downtown Toronto dataset covers an area of about 1.45 km² in the central area of Toronto, Canada. Similarly to Vaihingen,
there are two smaller areas for object extraction and building
reconstruction, as well as one large area for road detection. For
each test area, aerial images with orientation parameters, digi-
tal surface model (DSM), orthophoto mosaic and airborne laser
scans are provided. The quality of the approaches is assessed
using several metrics for detection and reconstruction. In both
cases, completeness, correctness and quality are assessed on a per-area level and a per-object level.
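Completeness, correctness and quality are standard detection-style ratios of true positives, false positives and false negatives; the minimal Python sketch below shows how they relate, with the benchmark-specific matching rule that produces these counts left out.

# Completeness (recall), correctness (precision) and quality from match counts.
def completeness(tp, fn):
    return tp / float(tp + fn)

def correctness(tp, fp):
    return tp / float(tp + fp)

def quality(tp, fp, fn):
    return tp / float(tp + fp + fn)

tp, fp, fn = 80, 10, 20
print(completeness(tp, fn), correctness(tp, fp), quality(tp, fp, fn))
# 0.8, 0.888..., 0.727...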
Figure 2: The recording platform with sensors (top left), trajectory (top center),
disparity and optical flow (top right) and 3D object labels (bottom) from the
KITTI benchmark proposed by Geiger et al. (2012b). Adapted from Geiger
et al. (2012b).
Autonomous Driving:
In 2012, Geiger et al. (2012b, 2013) introduced the KITTI Vision Benchmark (http://www.cvlibs.net/datasets/kitti/) for stereo, opti-
cal flow, visual odometry/SLAM and 3D object detection (Fig-
ure 2). The dataset has been captured from an autonomous driv-
ing platform and comprises six hours of recordings using high-
resolution color and grayscale stereo cameras, a Velodyne 3D
laser scanner and high-precision GPS/IMU inertial navigation
system. The stereo and optical flow benchmarks derived from
this dataset comprise 194 training and 195 test image pairs at
a resolution of 1280 × 376 pixels and sparse ground truth ob-
tained by projecting accumulated 3D laser point clouds onto the
image. Due to the limitations of the rotating laser scanner used
as reference sensor, the stereo and optical flow benchmark is
restricted to static scenes with camera motion.
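To make the projection step concrete, the sketch below transforms LiDAR points into the camera frame and projects them with a calibrated projection matrix. The matrix names follow the KITTI calibration file convention, but the values used here are dummy placeholders, so this is only an illustration of the procedure rather than the benchmark's ground truth pipeline.

# Sketch: project 3D LiDAR points into the image to obtain sparse depth/disparity
# ground truth. P (3x4), R_rect (4x4) and Tr_velo_to_cam (4x4) would normally be
# read from the KITTI calibration files; here they are dummy placeholders.
import numpy as np

def project_lidar_to_image(pts_velo, P, R_rect, Tr_velo_to_cam):
    pts_h = np.hstack([pts_velo, np.ones((pts_velo.shape[0], 1))])  # homogeneous
    pts_cam = R_rect @ Tr_velo_to_cam @ pts_h.T                     # 4 x N
    pts_img = P @ pts_cam                                           # 3 x N
    depth = pts_img[2]
    uv = pts_img[:2] / depth                                        # pixel coordinates
    in_front = depth > 0                                            # keep points in front of the camera
    return uv[:, in_front].T, depth[in_front]

P = np.array([[721.5, 0, 609.6, 0], [0, 721.5, 172.8, 0], [0, 0, 1, 0]])
R_rect = np.eye(4)
Tr_velo_to_cam = np.eye(4)
pts = np.random.uniform([-10, -2, 2], [10, 2, 40], (1000, 3))
uv, depth = project_lidar_to_image(pts, P, R_rect, Tr_velo_to_cam)
print(uv.shape, depth.shape)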
To provide ground truth motion fields for dynamic scenes,
Menze & Geiger (2015) have annotated 400 dynamic scenes,
fitting accurate 3D CAD models to all vehicles in motion in order to obtain flow and stereo ground truth for these objects.
The KITTI flow and stereo benchmarks use the percentage of
erroneous (bad) pixels to assess the performance of the submit-
ted methods. Additionally, Menze & Geiger (2015) combined
the stereo and flow ground truth to form a novel 3D scene flow
benchmark. For evaluating scene flow, they combine classical
stereo and optical flow measures.
The visual odometry / SLAM challenge consists of 22 stereo sequences with a total length of 39.2 km. The ground truth pose is obtained using a GPS/IMU localization unit which was
fed with RTK correction signals. The translational and rota-
tional error averaged over a particular trajectory length is con-
sidered for evaluation.
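A common way to realize this measure is to compare relative pose increments of the estimated and ground truth trajectories over sub-sequences of various lengths and average the resulting errors. The sketch below evaluates a single pair of relative poses (4x4 homogeneous matrices) and is an illustration of the idea, not the KITTI development kit.

# Sketch: translational and rotational error of an estimated relative pose
# with respect to the ground truth relative pose (both as 4x4 matrices).
import numpy as np

def relative_pose_error(T_est, T_gt):
    E = np.linalg.inv(T_gt) @ T_est          # residual transformation
    t_err = np.linalg.norm(E[:3, 3])         # translational error (pose units)
    # Rotational error: rotation angle of the residual rotation matrix.
    cos_angle = np.clip((np.trace(E[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    r_err = np.degrees(np.arccos(cos_angle))
    return t_err, r_err

T_gt = np.eye(4); T_gt[:3, 3] = [1.0, 0.0, 0.0]
T_est = np.eye(4); T_est[:3, 3] = [1.05, 0.02, 0.0]
print(relative_pose_error(T_est, T_gt))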
For the KITTI object detection challenge, a special 3D la-
beling tool has been developed to annotate all 3D objects with
3D bounding boxes for 7481 training and 7518 test images. The
benchmark for the object detection task was separated into vehicle, pedestrian and cyclist detection tasks, allowing the analysis to focus on the most important problems in the context