Appearance-Based Topological Bayesian Inference for Loop-Closing Detection in a Cross-Country Environment

Abstract

In this paper, an appearance-based environment modeling technique is presented. Based on this approach, probabilistic Bayesian inference can work together with a symbolic topological map to relocalize a mobile robot. One prominent advantage offered by this algorithm is that it can be applied to a cross-country environment where no features or landmarks are available. Furthermore, loop-closing can be detected independently of the estimated map and vehicle location. High dimensional laser measurements are projected into a low dimensional space (map space) which describes the appearance of the environment. Since laser scans from the same region share a similar appearance, after the projection they are expected to form a distinct cluster in the low dimensional space. This small cluster essentially encodes the appearance information of the specific region in the environment, and it can be approximated by a Gaussian distribution. This Gaussian model can serve as the "joint" between the topological map structure and probabilistic Bayesian inference. By employing such "joints", Bayesian inference at the metric level can be conveniently implemented on a topological level. Based on appearance, the proposed inference process is thus completely independent of local metric features. Extensive experiments were conducted using a tracked vehicle traveling in an open jungle environment. Results from live runs verified the feasibility of using the proposed methods to detect loop-closing. The performance is also given and thoroughly analyzed.

1. Introduction

During the process of simultaneous localization and mapping, when a vehicle revisits a place (called closing a loop), it needs to associate its observations with the environment it mapped previously. This problem is often referred to as loop-closing detection, or the revisiting problem. Loop-closing detection is widely acknowledged as a major problem within the SLAM community. Given consecutive measurements from a 2D range scanner and inertial sensors, the goal of this work is to localize the loop-closing at the topological level. More specifically, after dividing the map into a series of topological nodes, our objective is to identify the node where loop-closing takes place. In this work, the challenges come from the fact that no features or landmarks can be robustly detected in the cross-country environment. Here we compare the testing field with two other popular outdoor environments, as shown in Figure 1. It can be seen that, in both Victoria Park and the car park, there exist quite a few observable landmarks such as trees, walls and corners. However, in the open jungle environment, there are no apparent geometrical patterns. The first contribution of this work is an appearance model built from raw 2D range scans. All the measurement frames are segmented into a sequence of groups. Each group corresponds to a certain region in the environment; in the mapping context, it is regarded as a submap of the environment. This process is illustrated in Figure 2. Subsequently, all measurements are projected to a low dimensional space using principal component analysis (PCA). Suppose we have measurement frames from two submaps A and B, as shown in Figure 3(b); for a typical 2D laser range scan at a resolution of 0.5 degree, each frame contains 361 data points. A 3D coordinate frame is employed to illustrate this high

KEY WORDS—appearance, SLAM, topology, Bayesian, PCA, loop-closing detection

The International Journal of Robotics Research Vol. 25, No. 10, October 2006, pp. 953-983 DOI: 10.1177/0278364906068375 ©2006 SAGE Publications



Fig. 1. A comparison of environments where SLAM is performed. Sub-figures (a), (b) and (c) are photos taken at Victoria Park, Sydney; car park E, NTU; and a cross-country environment, respectively. Sub-figures (d), (e) and (f) are the corresponding raw 2D range scans taken from these environments.

Fig. 2. The consecutive scans collected when the vehicle moves are segmented into a sequence of submaps. This procedure will be elaborated in Section 3.


Fig. 3. Combining symbolic map topology with probabilistic Bayesian inference. (a) The topological graph built by segmenting the successive 2D range scans. For each new 2D scan, the conditional probability of observing this frame given each submap can be calculated. This probability model makes it possible to build a Bayesian network (e) on the topological level.

dimensional measurement space, which is denoted as x−y−z. By conducting PCA, these measurement frames are projected to a low dimensional (here, 2D) space where it is more convenient to segment them. This space is called the map space, denoted as x′−y′, see Figure 3(c). In the map space, projected measurements from the same submap are expected to gather within a compact cluster. For each of the above clusters, the distribution is approximately Gaussian, as in Figure 3(d). Compared with the huge quantity of raw range data collected in a submap, the mean/variance representation drastically reduces the computational complexity, but nevertheless captures the information contained in the raw data. The second contribution of this paper is to utilize such an appearance model to bridge the gap between the topological map representation and probabilistic Bayesian inference.

It is known that the map topology is inherently symbolic, which is difficult to infer in a numerical way. The above Gaussian distributions make it possible to build probabilistic observation models for the map topology. With these models, the popular metric level probabilistic Bayesian inference can be conveniently transplanted to the topological level, see Figure 3(e). Informed decisions can thus be made for loop-closing detection given a sequence of measurements. We demonstrate that, by combining these two techniques, the advantages of both are exploited: the presented algorithm is capable of performing Bayesian inference on a topological level, without any reliance on metric-level features. This characteristic enables the algorithm to detect loop-closing in a cross-country environment, where feature-extraction algorithms are prone to yield poor performance. Additionally, the


proposed algorithm uses sensory information that is outside the central SLAM estimation loop, so it will not use a potentially erroneous state (vehicle pose) to make a decision regarding the fusion of measurements. Therefore, even if the vehicle's self-location provided by SLAM has a large error, the loop-closing detection algorithm can still work properly. This paper is organized as follows. The next section gives some background knowledge about loop-closing detection; Section 3 describes how the map topology is constructed; Section 4 sets out the appearance modeling process for the map topology using 2D laser scans; Section 5 explains how to incorporate the appearance-based techniques into a Bayesian inference framework; finally, the results and performance analysis are presented in Section 6.

2. Background

Simultaneous localization and mapping (SLAM) algorithms try to build a model of the environment and concurrently localize the robot itself. From the fundamental work by Smith, Self, and Cheeseman (1988), and Leonard, Durrant-Whyte, and Cox (1991), to the recent convergence proof by Dissanayake et al. (2001), many SLAM algorithms addressing different SLAM issues have been developed. According to the widely referenced convergence proof presented by Dissanayake et al. (2001), when the robot revisits a place and propagates the error in the loop-closing back to all the components of the map, the map becomes more correlated. As this process iterates, the components inside the map finally become fully correlated, and thus the map converges. Given the role loop-closing plays, it is widely acknowledged that loop-closing is crucial to solving the SLAM problem. Before closing a loop, the robot should first detect it. More specifically, the robot needs to associate its current observation with a certain part of the environment it mapped some time ago. As elaborated in Kuipers and Beeson (2002), the difficulties here lie in two aspects: perceptual aliasing, in which different places appear the same; and measurement variability, in which the same place appears differently. When the algorithm cannot handle perceptual aliasing, it may take an unexplored place as somewhere already mapped and then give a false positive report. On the other hand, if the algorithm is too conservative, measurement variability will be difficult to deal with. The algorithm may then report a false negative, i.e., the vehicle cannot detect the loop-closing although it is already in the mapped place. Various schemes have been proposed to accurately detect loop-closing. Most of them fall into two classes. The first is to exploit information from a single observation and then make a deterministic judgement.
If one single frame of measurement is not yet sufficient, the second one is to accumulate information from a sequence of observations and then make a “batch” decision. A review of these two techniques follows.

2.1. Deterministic Methods

A feature map is the most popular way to describe an environment and thus detect loop-closing. In a feature map, the environment is modeled as a combination of certain geometrical patterns such as circles (Guivant and Nebot 2001), corners (Arras et al. 2003), lines (Jensfelt and Kristensen 2001) and, more recently, polylines (Veeck and Burgard 2004). A more general corner detection technique has also been developed by Madhavan and Durrant-Whyte (2004), which can be regarded as a 1D version of SIFT (Lowe 2004). For visual sensors, Zhou, Wei, and Tan (2003) developed a multi-dimensional histogram to represent the rich information within the observed image, such as colors, edges and textures. Lamon et al. (2001) introduced a low dimensional representation called an image fingerprint sequence for measuring the similarity between image frames. A similar comparison can also be applied to the image histogram, as proposed by Ulrich and Nourbakhsh (2000). For a map representation such as an occupancy grid (Elfes 1987; Konolige 1997), Gutmann and Konolige (1999) used correlation to detect possible matches between the robot's current observation and the previously built map. Such a correlation-based technique was also used by Duckett and Nehmzow (2001), in whose research the correlation was applied to the histogram of the occupancy grids. Similar to the grid map, point matching has also recently been used in SLAM to serve as the observation model in Lu and Milios (1997) and Thrun (2001). From the perspective of tracking, loop-closing can be detected when some already mapped features fall into the uncertainty gate of their predictions (Guivant and Nebot 2001); such techniques essentially calculate a weighted distance between the observed feature and the estimated one.
Since static features always have a fixed position relative to each other, Neira and Tardós (2001) proposed the joint compatibility test to exploit the inter-feature relations.

2.2. Non-Deterministic Methods

If one frame of measurement is not sufficient to make a reasonable judgement, a straightforward alternative is to accumulate information over time. A sequence of measurements taken at different times can be analyzed in a "batch" manner to verify the loop-closing. A multiple-hypotheses-tracking based loop-closing detection algorithm was proposed by Tomatis, Nourbakhsh, and Siegwart (2002). The possible closing positions compete with each other, until two of them finally become dominant. These two represent the vehicle's current position and the position in the already-built map. Markov localization, proposed by Fox, Burgard, and Thrun (1998), is another implementation of the above idea to localize a mobile robot. In contrast to Fox, who used only a single beam of laser range, Gutmann and Konolige (1999) used the correlation between the map and a whole frame of measurement for the observation model. Topological Bayesian inference reduces the high dimensional robot pose space into the space of topological nodes, whose dimensionality is much lower. The combination of map topology and Bayesian inference is found in Kuipers and Beeson (2002), in which the authors used unsupervised learning to let the algorithm learn by itself how to map observations into different topological nodes. Unfortunately, with such a scheme it is not yet known how serious the perceptual aliasing problem could be. Recently, Modayil, Beeson, and Kuipers (2004) combined the dynamic Bayesian network with map topology to build a large scale map. In the Monte Carlo sampling scheme, the distribution of the vehicle's pose within the map is represented as a set of weighted particles (Doucet, de Freitas, and Gordon 2001). Thrun, Fox, and Burgard (2000) first introduced the Monte Carlo approach to robot localization, and demonstrated attractive robustness and efficiency. Ranganathan and Dellaert (2004, 2005) further combined Monte Carlo sampling with map topology under the Markov localization framework. By doing so, the correct map topology can be learned from the space of all possible topologies. Stewart et al. (2003) recently developed a hierarchical Bayesian approach for the revisiting problem; this approach divides the environment into a set of connected local map patches. A hidden Markov process is used to model the transitions between these patches. However, it is not yet clear how well the priors over this map structure can be learned.

3. Building the Map Topology

Segmenting the observed measurements (2D range frames) into different places is a crucial procedure in submap based SLAM (Kuipers et al. 2004; Beeson, Jong, and Kuipers 2005). From the topological perspective, such segmentation abstracts continuous sensory experience into a graph of atomic structures, and these structures constitute the basic components of a topological map. From the perspective of learning, just as with other appearance-based techniques, the loop-closing detection technique in this paper needs a supervised learning process, to teach itself how to distinguish measurement frames from different regions. To perform such supervised learning, the sequence of measurement frames is first required to be automatically labeled as belonging to distinct places (submaps). All of these labeled frames then form a pool of training samples, from which further classification rules are learned. An important property of the labeling process in this work is that it is carried out online and incrementally, so the map structure is to some extent encoded in the vehicle's trajectory. If the incoming observations are only labeled according to the


similarities between them (Kuipers and Beeson 2002), e.g., using a clustering technique such as K-means (Duda, Hart, and Stork 2000), the map structure information acquired as the vehicle travels cannot be incorporated in this training process. For example, given two measurement frames that are similar but far from each other, a purely appearance-based classification technique will label them as coming from the same place. On the other hand, if the map is segmented only according to the map structure, e.g., the volume of the submap as in Guivant and Nebot (2001), observations within the same submap could be distinct. Consequently, topological reasoning would be difficult to carry out. In this paper, the above two kinds of labeling strategies are integrated. Either a change of the environment's structure, or a change of the exterior sensor observations, will divide the vehicle's experience into disjoint segments, i.e., initialize a new submap. Here, intersections of the road are used to detect shifts between the environment's structures. Such intersections are indicated by changes of the heading direction of the vehicle, because the vehicle is supposed to always navigate itself following the road. When its heading changes greatly, a reasonable assertion is that the vehicle has moved from one place to another. Appearance-based segmentation is not trivial because of the measurement variability. The desired algorithm must capture the major structure of the input range scan, which is often encoded in the low frequency domain. It should also be insensitive to the local distracters, which are in the high frequency domain. In this paper, a wavelet transform is employed to remove those high frequency details and preserve the structural information of the range scan.
The wavelet transform is a well-understood technique for information compression and noise removal; here we apply the 3-level db1 wavelet to each frame of 2D range scan, and thereby acquire a vector whose length is only 1/8 of the original measurement, as in Figure 4. By comparing the Euclidean distances between these vectors, the shifts between submaps can be detected. Interested readers can turn to Daubechies (2002) for more details. Readers may be confused to see that two dimensionality reduction techniques, PCA and wavelets, are both used here. However, please note that the above wavelet segmentation does not have any recognition capability. It is only employed to detect "new" regions, which results in the segmentation of the map into a topological graph. Whether the detected new region has already been mapped or not can only be answered by going through all the previously acquired information. This recognition process will be elaborated in the following section. The segmentation strategy is tested in a large cross-country jungle environment, as detailed in Section 6. During the trial, 19 053 frames of 2D scans were collected; the total length of the trajectory was over 3500 meters. A reference map was built to illustrate the shape of the environment, as shown in Figure 5. As indicated, there is a loop in this trajectory.
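The wavelet compression step above can be sketched in plain NumPy: the db1 (Haar) approximation at each level is just a scaled pairwise average, so three levels shrink a 361-point scan to 45 coefficients (1/8 of the original, after truncating the odd sample). This is a minimal sketch, not the paper's implementation; the distance threshold below is a hypothetical value of our own.

```python
import numpy as np

def haar_approx(scan, levels=3):
    """3-level db1 (Haar) approximation: pairwise scaled averages, repeated.
    The output length is roughly len(scan) / 2**levels."""
    a = np.asarray(scan, dtype=float)
    for _ in range(levels):
        if a.size % 2:                  # drop a trailing sample if length is odd
            a = a[:-1]
        a = (a[::2] + a[1::2]) / np.sqrt(2.0)   # Haar approximation coefficients
    return a

def new_submap(prev_scan, scan, threshold=200.0):
    """Hypothetical segmentation test: a large Euclidean distance between
    compressed consecutive scans suggests a transition to a new submap."""
    return np.linalg.norm(haar_approx(prev_scan) - haar_approx(scan)) > threshold
```

A 361-beam scan compresses to a 45-vector, and two identical scans naturally report no submap transition.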


[Figure 4 shows four panels: the raw data and the level 1, level 2 and level 3 wavelet decompositions, each plotted against sample index.]
Fig. 4. The db1 wavelet used in this work. Because a 3-level wavelet is employed, the raw data can finally be compressed to a vector whose size is only 1/8 of the original size. It can be seen that the high frequency noise is mostly removed in the compressed result.

The whole map is finally segmented into 35 submaps, based on both the heading directions of the vehicle and the similarities between consecutive measurements. The result of this online segmentation is plotted in Figures 6 and 7. The solid dots represent the changes of the vehicle's heading, ranging from −π to π. The thin curve is the change of the sensor measurements; the peaks of this curve represent large dissimilarities between successive observations, which indicate possible transitions from one submap to another. How the submaps are segmented can be easily observed from this figure: for example, submap No. 25 is initialized when both the vehicle's heading and the range observations change greatly; for submap No. 19, the heading direction of the vehicle does not change much, but the wavelet check reports a high dissimilarity.

Although the above submap representation makes it convenient to build an observational model for the map topology, it is difficult to encode the vehicle’s metric level motion into the Bayesian inference. For example, a submap can be sufficiently long that, given a reading from inertial sensors, it is impossible to predict whether the vehicle is still inside this submap or has moved outside of it. In this paper, the topological node is defined as a segment of a submap which is at a fixed resolution. By dividing the submap into topological nodes, it will be much more convenient to incorporate the vehicle’s inertial sensor measurements into the Bayesian framework. For example, in this paper, the topological node’s length is set to be 10 meters, so if the vehicle is reported to have moved 12.7 meters, it can be predicted that the vehicle has probably moved to a certain nearby node.
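As a toy illustration of the node-level prediction step, the sketch below converts a reported travel distance into a discrete prior over how many 10 m nodes the vehicle has advanced. The Gaussian weighting and its spread are hypothetical choices of our own; the paper does not specify this motion model.

```python
import numpy as np

NODE_LEN = 10.0  # topological node length in meters, as chosen in the paper

def node_transition_prior(distance, spread=6.0, n_ahead=4):
    """Hypothetical motion model: weight each candidate node advance by how
    close the travelled distance falls to that node's center, then normalize."""
    shifts = np.arange(n_ahead + 1)             # advance by 0, 1, ..., n_ahead nodes
    centers = shifts * NODE_LEN + NODE_LEN / 2  # center of each candidate node
    w = np.exp(-0.5 * ((distance - centers) / spread) ** 2)
    return w / w.sum()

prior = node_transition_prior(12.7)  # vehicle reported to have moved 12.7 m
```

With these numbers the probability mass concentrates on an advance of one node, matching the intuition in the text.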


[Figure 5 shows the reference trajectory (East vs. North, in meters), annotated with where the trajectory started, where it ended, and where the loop closing occurs.]

Fig. 5. The environment where the trial was conducted. This map was built by rendering the 2D laser scans onto the vehicle poses read from GPS/INS, which can be regarded as ground truth. Please note that this map is only used for reference and illustration purposes. It does not provide any information to the loop-closing detection algorithm.

[Figure 6 plots the change of heading and the change of appearance against measurement ID, with every third submap ID marked at the detected transitions.]

Fig. 6. The incoming measurements are segmented according to either the similarities between them or the changes of the vehicle's heading direction. In fact, the dissimilarities computed from the wavelet are much larger in magnitude; to display them in a single figure with the heading changes, the dissimilarities are scaled down. For clarity, only every third submap ID is marked.

[Figure 7 plots the 35 submaps as numbered rectangles over the trajectory (East vs. North, in meters).]

Fig. 7. The submaps are represented as rectangles; the size of each rectangle is determined by the length and maximum width of the road boundary.

4. Appearance Model for Topological Map

4.1. Advantages of Using an Appearance Model

In contrast to those techniques based on a geometrical model (Guivant and Nebot 2001; Lamon et al. 2001; Duckett and Nehmzow 2001), which try to register the sensor measurements to a model of the environment, appearance-based techniques (Krose et al. 2001; Crowley, Wallner, and Schiele 1998; Porta, Verbeek, and Krose 2005) are not designed to capture the relations between observations and the map geometry. Instead, they directly build the environment's representation in the sensor space, i.e., the space spanned by the sensor values themselves. For example, an appearance-based approach can learn what makes the observations in a corridor different from those in a square room, but it does not necessarily distinguish the doors or walls. From the perspective of feature extraction, the "features" extracted by appearance-based techniques could be uninformative to human eyes. The advantage of appearance-based techniques is that the a priori knowledge required to model the map, such as the definition of walls and corners, is no longer necessary. In other words, appearance-based techniques can work without the conventional "feature extraction" routine; this is highly beneficial when the vehicle navigates in cross-country environments where features are difficult to extract. Principal component analysis (PCA) is a widely used tool to handle the high dimensional measurement space. A PCA-based recognition/localization algorithm was originally introduced to the computer vision community by Turk and Pentland (1991). Thereafter, PCA and its derivatives have achieved tremendous popularity in the pattern recognition domain, and have been successfully applied to various artificial intelligence applications, e.g., face recognition (Yang et al. 2004), object detection (Ali and Shah 2005) and bio-informatics (Yeung 2001). Also using PCA, Vlassis and Krose (1999) proposed a robot localization algorithm which used appearance information to localize a mobile robot in an indoor environment. A similar implementation for 2D range data was developed by Crowley, Wallner, and Schiele (1998), in which synthetic range scans were calculated and used to train the appearance model. To the authors' knowledge, that was the first time 2D range data were used for appearance-based mobile robot localization. Although innovative, this approach is essentially not completely appearance-based, because it still relies on a composite range map to generate synthetic scans, and building such a map could be quite challenging in an outdoor environment. A known problem of the original PCA is that it can be time-consuming to build the eigenspace as the robot travels; Artac, Jogan, and Leonardis (2002) employed an incremental approach to conduct PCA for image data. The appearance-based solution most similar to ours is the one proposed by Krose et al. (2001), in which a sophisticated algorithm is presented to calculate the probability of observing a certain scene given a robot pose. This approach,

which used a panorama camera, showed attractive localization results in an office environment. However, we believe its performance could be even better if the topological structure of the environment were incorporated in the localization process, as presented in this paper. Using a laser scanner's range data for PCA can outperform image-based PCA in two ways. First, the laser scanner's measurement frame is much smaller than an image frame: a typical 2D scan contains 361 measurements, whereas a typical 320 × 240 gray image contains 76,800 pixels. Second, images captured from cameras are prone to being affected by illumination conditions; PCA may give false results under variable light. Range data, on the other hand, are not sensitive to illumination conditions.

4.2. Eigen-Representation

Suppose that at time t, the labeling approach discussed in the previous subsection has segmented m submaps, say, $S_1, S_2, \dots, S_m$; each of them comprises a series of measurements, $s_{i,1}, s_{i,2}, \dots, s_{i,n(i)}$, where $i = 1, \dots, m$. Each measurement is a typical laser scan including 361 range measurements (scalars), one for each angle at a 0.5 degree resolution (this may be different for other operational modes of the sensor). It can be regarded as a vector of dimension 361, or, equivalently, a point in a 361-dimensional space. One of the advantages of appearance modeling is that it can automatically handle the out-of-range data, for which the laser beams are not reflected by any object. These kinds of readings are often encoded as a special value by the sensor. Appearance modeling processes these values without any discrimination and lets the data manifest the low-dimensional manifold by themselves. Frames of each submap will not be randomly distributed in this huge measurement space and thus can be described by a relatively low dimensional subspace. PCA can find the vectors that best account for the distribution of frames within the entire measurement space. These vectors define a subspace of the measurement space. Each vector is of length 361, and is a linear combination of the original measurements. At time t, the average frame of the whole measurement set is computed by:

$$\Psi_t = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n(i)} s_{i,j}}{\sum_{i=1}^{m} n(i)} = \left[ \psi_t^1 \; \psi_t^2 \; \cdots \; \psi_t^{361} \right]^T. \qquad (1)$$

As the vehicle moves and new range scans are observed, this average frame evolves over time, as in Figure 8. The jth frame of the ith submap differs from this average by a vector:

$$\phi_{i,j} = s_{i,j} - \Psi_t = \left[ \phi_{i,j}^1 \; \phi_{i,j}^2 \; \cdots \; \phi_{i,j}^{361} \right]^T.$$

Then all the $\phi_{i,j}$ from all the submaps are subjected to PCA, which seeks a set of normal vectors that can best describe the distribution of the data.
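On synthetic stand-in data, the mean frame, the difference vectors, and the projection into the map space can be sketched in a few lines of NumPy. This is only an illustrative sketch: the scan values are fabricated, the eigenvectors are obtained via an SVD of the stacked difference vectors, and λ = 2 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic stand-in for two submaps' worth of 361-beam scans (hypothetical data)
frames = np.vstack([80.0 + rng.normal(0.0, 1.0, (50, 361)),   # "submap A"
                    30.0 + rng.normal(0.0, 1.0, (50, 361))])  # "submap B"

psi_t = frames.mean(axis=0)        # average frame, as in eq. (1)
phi = frames - psi_t               # difference vectors phi_{i,j}

lam = 2                            # number of retained eigenvectors (illustrative)
# principal directions of phi via SVD; rows of Vt are unit-norm eigenvectors
_, _, Vt = np.linalg.svd(phi, full_matrices=False)
U = Vt[:lam].T                     # columns play the role of u_t^1 ... u_t^lam

def eigenframe(s):
    """Project a raw scan into the map space: U^T (s - psi_t), as in eqs. (3)-(4)."""
    return U.T @ (s - psi_t)

# scans from the two submaps land far apart along the first principal direction
W_a, W_b = eigenframe(frames[0]), eigenframe(frames[60])
```

The two synthetic submaps separate cleanly along the first component, which is the clustering behavior the map space is designed to expose.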


In most cases, only the dominant part of the distribution is necessary, and the details can be ignored (Turk and Pentland 1991). So the set of normal vectors which describes the distribution of the data can be much smaller than the original measurement data. In this implementation, the first λ eigenvectors, which correspond to the λ largest eigenvalues, are chosen. These eigenvectors are symbolized as $u_t^1, u_t^2, u_t^3, \dots, u_t^\lambda$, where:

$$u_t^k = \left[ u_t^{k,1} \; u_t^{k,2} \; \cdots \; u_t^{k,361} \right]^T, \quad k = 1, 2, \dots, \lambda. \qquad (2)$$

These eigenvectors define a space with dimensionality λ, which represents the most predominant information about the measurements. In this paper, all the map modeling is conducted in this space; for convenience, it is referred to as the map space. The first four eigenvectors are shown in Figure 9. These eigenvectors capture the statistical features of the measurements: e.g., in the first sub-figure, the points in the lower part are much denser than the ones in the upper part, because this is a common property shared by all frames of measurements (due to the fixed angular resolution of the 2D laser scanner); and these four sub-figures have the basic shape of a road, because for most of the time the vehicle traveled in a road-like environment. When a new measurement $s_x$ is available, it is projected into the map space by a simple operation:

$$w_x^k = (u_t^k)^T (s_x - \Psi_t) \qquad (3)$$

where $k = 1, 2, \dots, \lambda$. This describes a set of point-by-point multiplications and summations. These weights (scalars) form a low dimensional vector which can be used to represent the measurement frame $s_x$:

$$W_x = \left[ w_x^1 \; w_x^2 \; \cdots \; w_x^\lambda \right]^T. \qquad (4)$$

The vector $W_x$ essentially describes the contribution of each eigenvector in representing the input measurement frame $s_x$, by treating these eigenvectors as a basis set for measured frames. This low dimensional vector is the core of our appearance model. It provides SLAM with a convenient tool to represent measurement frames, as well as the local environment (by averaging all the measurement frames inside). For convenience, in this context, such a projection of a measurement frame in the eigenspace is called an eigenframe. As can be observed, this modeling process is completely independent of any metric features or landmarks.

4.3. Computing the Probabilistic Observational Models

A probabilistic model in the map space is necessary for a Bayesian inference process. Given a certain submap $S_i$, where $i \in \{1, \dots, m\}$, we can project all measurement frames within this submap into the map space using (3). There will then be n(i) vectors of length λ, denoted here as:

$$W_i^1, W_i^2, \dots, W_i^{n(i)}. \qquad (5)$$

[Figure 8 shows four panels: the average frame computed over measurements No. 1-1001, No. 1-1301, No. 1-1601 and No. 1-2001, each plotted as East vs. North in meters.]

Fig. 8. The average of all the collected measurements. It can be seen that when the vehicle moves forward, the average of the training pool evolves accordingly.

The center of this cluster corresponds to the vector which best describes this submap:

\bar{W}_i = \frac{1}{n(i)} \sum_{j=1}^{n(i)} W_i^j.    (6)

This center's estimate comes with a variance, which is computed by:

\sigma_i = \frac{1}{n(i)-1} \sum_{j=1}^{n(i)} \| \bar{W}_i - W_i^j \|^2    (7)

where \sigma_i can also be regarded as the trace of a diagonal covariance matrix which shows how these n(i) points are distributed in the map space. Here this distribution is approximated as Gaussian:

W_i \sim N[\bar{W}_i, (\sigma_i)^2].    (8)
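A minimal sketch of this submap model, together with the observation likelihood of eq. (9) below; the names and the NumPy formulation are ours, not the paper's code.

```python
import numpy as np

def submap_model(eigenframes):
    """Gaussian appearance model of one submap, eqs. (6)-(8).

    eigenframes: (n_i, lam) array holding the vectors W_i^1 ... W_i^{n(i)}.
    Returns the cluster center W_bar_i and the scalar spread sigma_i."""
    w_bar = eigenframes.mean(axis=0)                                  # eq. (6)
    n_i = len(eigenframes)
    sigma = np.sum(np.linalg.norm(w_bar - eigenframes, axis=1) ** 2) / (n_i - 1)  # eq. (7)
    return w_bar, sigma

def obs_likelihood(w_t, w_bar, sigma):
    """Unnormalized likelihood of eigenframe W_t under submap S_i, eq. (9)."""
    return np.exp(-0.5 * np.linalg.norm(w_bar - w_t) ** 2 / sigma ** 2)

# Usage: a frame lying exactly at the cluster center has likelihood 1
rng = np.random.default_rng(1)
frames = rng.normal(size=(50, 40))
w_bar, sigma = submap_model(frames)
like = obs_likelihood(w_bar, w_bar, sigma)
print(like)   # 1.0
```

Note that, as in the paper, these likelihoods are not normalized; normalization happens implicitly when the Bayesian posterior over nodes is renormalized at each step.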

Given an incoming measurement z_t at time t, after projecting it into the map space using (3), we obtain its eigenframe W_t. Let Y_t^i represent the fact that the vehicle is within submap S_i at time t; the probability of observing z_t conditioned on Y_t^i can therefore be calculated as:

p(z_t | Y_t^i) = \exp\left( -\frac{1}{2} \frac{\| \bar{W}_i - W_t \|^2}{(\sigma_i)^2} \right).    (9)

This formulation is of great importance in vehicle localization. It provides a probabilistic way to model the connection between the 2D scanner's observations and the places (nodes) in the topological map, without any knowledge of features or landmarks.

Chen and Wang / Loop-Closing Detection

[Figure panels omitted: the eigenvectors corresponding to the (a) 1st, (b) 2nd, (c) 3rd and (d) 4th biggest eigenvalues, plotted in x/y coordinates (meters).]

Fig. 9. The eigenvectors corresponding to the biggest 4 eigenvalues. Please note that these figures are expressive: they catch the basic shape of the measurements acquired in the testing field. A problem of displaying these eigenvectors is that they are not at the scale of the original measurement frames, so they are displayed after normalization.

5. Appearance-Based Bayesian Inference on a Topological Level

Because of the existence of perceptual aliasing and measurement variability, the probability in (9) alone is not sufficient to detect loop-closing. Therefore, to improve the robustness of the loop-closing detection, a Bayesian inference process was introduced so that we could fuse information acquired from a sequence of observations. In most cases, matching a sequence of measurements against the previously built map is an exhausting task, because the number of possible solutions can be exponentially high.

The Markov assumption offers an effective way to bypass the above matching problem. It assumes that only the one-step-previous actions/states can affect the vehicle's current state. This essentially divides the whole expensive matching problem into a chain of small matching problems. The correct candidate is expected to survive all the individual tests, while the false ones are not. Bayes' rule then provides an efficient way to connect these individual tests in a probabilistic manner, so that the matching results can be propagated through the whole Markov chain to the end.

5.1. Topological Bayesian Inference

From the perspective of loop-closing detection, the task of Bayesian inference is to localize the vehicle's current position within its previously built map (the topological network). A probability is assigned to each topological node T^x; the goal of loop-closing detection is then to find the topological node T with the highest probability of containing the vehicle:

T = \arg\max_x p(T_t^x | Z_t, U_{t-1}, SH_{t-1}), \quad x = 1, 2, \ldots, n    (10)

where n is the total number of topological nodes, Z_t is the whole set of observations up to time t, and SH_{t-1} and U_{t-1} are, respectively, the sets of detected shifts between submaps and of transitions between topological nodes up to time t-1. At each time instance t, Bayesian inference calculates the vehicle's position distribution over the topological node space, p(T_t | Z_t, U_{t-1}, SH_{t-1}). To be consistent with Section 4.2, here m is used to denote the total number of submaps, so this probability can be further marginalized over submaps:

p(T_t | Z_t, U_{t-1}, SH_{t-1}) = \sum_{j=1}^{m} p(T_t | Y_t^j, Z_t, U_{t-1}, SH_{t-1}) \, p(Y_t^j | Z_t, U_{t-1}, SH_{t-1}).    (11)

Since each topological node is inside a definite submap, the conditional probability p(T_t | Y_t^j) can be calculated as:

p(T_t | Y_t^j) = \begin{cases} p(T_t) & \text{if } T_t \in Y_t^j \\ 0 & \text{otherwise.} \end{cases}    (12)

This probability can be expressed by a function \nu(T_t, Y_t^j) which takes the value 1 when T_t \in Y_t^j and 0 in other cases. Therefore, eq. (11) can be re-written as:

p(T_t | Z_t, U_{t-1}, SH_{t-1}) = \sum_{j=1}^{m} \nu(T_t, Y_t^j) \, p(T_t | Z_t, U_{t-1}, SH_{t-1}) \, p(Y_t^j | Z_t, U_{t-1}, SH_{t-1}).    (13)

The second item on the right side of (13) is the estimate of the vehicle's state in the topological node space. By applying Bayes' rule and assuming that the estimation problem is Markovian, it can be calculated as:

p(T_t | Z_t, U_{t-1}, SH_{t-1}) = p(T_t | z_t, Z_{t-1}, U_{t-1}, SH_{t-1}) = p(z_t | T_t, Z_{t-1}, u_{t-1}, sh_{t-1}) \, p(T_t | Z_{t-1}, u_{t-1}, sh_{t-1}).    (14)

Since the observation is not affected by the vehicle motion or by previous observations, the Z_{t-1}, u_{t-1} and sh_{t-1} in the first item on the right side can be omitted. We may further notice that the topological nodes do not have any appearance characteristics; therefore, the observation z_t is actually independent of the topological node, and p(z_t | T_t) can be regarded as a constant:

p(z_t | T_t, Z_{t-1}, u_{t-1}, sh_{t-1}) = p(z_t | T_t) = c.    (15)

The second item on the right-hand side of (14) calculates the prior over the topological nodes. As explained in Section 3, the topological nodes only model the vehicle's motions and do not encode any appearance information. Therefore, they are labeled continuously, and the transitions on the topological level are completely independent of the shifts on the submap level. We can then drop the item sh_{t-1}:

p(T_t | Z_{t-1}, u_{t-1}, sh_{t-1}) = p(T_t | Z_{t-1}, u_{t-1}) = \sum_{i=1}^{n} p(T_t | T_{t-1}^i, u_{t-1}) \, p(T_{t-1}^i | Z_{t-1})    (16)

where p(T_t | T_{t-1}^i, u_{t-1}) is the transitional model in the topological node space, and p(T_{t-1}^i | Z_{t-1}) is the state of the topological node in the previous step.

The second item on the right side of (11) is the estimate of the vehicle's state in the submap space. As with the estimate for the topological node in (14), this likelihood can be computed using Bayesian inference based on a Markovian assumption:

p(Y_t^j | Z_t, U_{t-1}, SH_{t-1}) = p(z_t | Y_t^j, Z_{t-1}, u_{t-1}, sh_{t-1}) \, p(Y_t^j | Z_{t-1}, u_{t-1}, sh_{t-1})    (17)

where the first item can be simplified into p(z_t | Y_t^j), because the observations are apparently independent of the vehicle's movement and previous observations. This probability represents exactly the observational model we constructed in Section 4. The second item, p(Y_t^j | Z_{t-1}, u_{t-1}, sh_{t-1}), is the prior probability of the vehicle's state in the submap space. Given the fact that the motion between submaps is independent of the motion between topological nodes, we can drop the item u_{t-1}

and then this probability is calculated as:

p(Y_t^j | Z_{t-1}, u_{t-1}, sh_{t-1}) = \sum_{k=1}^{m} p(Y_t^j | Y_{t-1}^k, sh_{t-1}) \, p(Y_{t-1}^k | Z_{t-1}).    (18)

All of the above inferences can be illustrated by the Bayesian inference network in Figure 10.

5.2. Motion Models in Submap Space and Node Space

It can be seen that, in contrast to conventional Markov localization (Fox, Burgard, and Thrun 1998), two kinds of transitional models (also referred to as motion models) exist here: transitions among submaps, and transitions among topological nodes. The transition between topological nodes is formulated as:

p(T^i | T^j, u_{t-1})    (19)

where i and j are the IDs of two certain topological nodes. This probability is used to model the predicted motion read from the inertial sensors. Given an odometry input v_{t-1}, the number of topological nodes that the vehicle has traveled since the last time instance can be computed as:

u_{t-1} = \left[ \frac{v_{t-1}}{\phi_{node}} \right]    (20)

where \phi_{node} is the fixed size of the topological node. This equation reveals the advantage of introducing the topological node level in the map representation hierarchy: by doing so, the continuous dead-reckoning process becomes discrete and can therefore be integrated with the other discrete variables in the Bayesian inference process. Obviously u_{t-1} is not an exact estimate, because of the round operator [\cdot]; the transitional probability can thus be finally formulated as:

p(T^i | T^j, u_{t-1}) = \exp\left( -\frac{(u_{t-1} - |i - j|)^2}{\sigma_t^2} \right)    (21)

where \sigma_t is a manually set parameter to model the confidence we have when we calculate the number of topological nodes from odometry. For instance, if the vehicle has traveled 50 meters and the topological node's size \phi_{node} is 10 meters, equation (20) will show that the vehicle has traveled 5 nodes, while actually it could be only 4 nodes. There is an ambiguity here, and \sigma_t is employed to model such uncertainty.

Basically, the above equation biases the inference toward non-loop-closing. It assumes that if one loop-closing has happened, in the following few steps the vehicle's trajectory should be consecutive in terms of both time and geography; in other words, another loop-closing is not so likely to happen again. We recognize that this may be quite a strong assumption, especially in an indoor environment, which is quite compact and complicated. However, as we noticed in the experiments, in the outdoor jungle environment for which the proposed algorithm is designed, the environment is quite sparse and loop-closing does not happen frequently. On the other hand, removing this component is equivalent to assuming that the motion of the vehicle could be "random" as a result of a potential loop-closing, and then the valuable odometry information would not be exploited in the inference process.

The transitional probability from submap j to submap i is denoted as:

p(Y^i | Y^j, sh_{t-1})    (22)

where sh_{t-1} represents the report from the submap segmentation routine. Apparently, if a submap shift is detected, there are two possible explanations: either the detection is correct and the vehicle has moved to the next submap, with probability \gamma; or the detection is a false alarm and the vehicle is still in the current submap, with probability 1 - \gamma. The transitional probability can be calculated as follows:

p(Y^i | Y^j, sh_{t-1}) = \begin{cases} \gamma & i = j + 1 \\ 1 - \gamma & i = j \\ 0 & \text{else.} \end{cases}

Please note that in the above equation, the situation could exist in which Y^i is the first submap and Y^j is the last one; here i = j + 1 still stands.
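The two transition models of eqs. (20)-(23) can be sketched as follows. This is our illustrative reading, not the paper's code: the wrap-around from the last submap to the first is implemented with a modulo, and |i - j| is used for the node count, both assumptions on our part.

```python
import numpy as np

def node_transition(i, j, v, phi_node, sigma_t):
    """Topological-node motion model, eqs. (20)-(21).

    v: distance travelled since the last step (from the IMU);
    phi_node: fixed node size; sigma_t: the hand-set confidence parameter."""
    u = round(v / phi_node)                                 # eq. (20)
    return np.exp(-((u - abs(i - j)) ** 2) / sigma_t ** 2)  # eq. (21)

def submap_transition(i, j, n_submaps, gamma):
    """Submap motion model: a detected shift is correct with probability
    gamma, a false alarm with probability 1 - gamma. The modulo models
    the cyclic case where the last submap leads back to the first."""
    if i == (j + 1) % n_submaps:
        return gamma
    if i == j:
        return 1.0 - gamma
    return 0.0

# 50 m at 10 m per node: moving 5 nodes ahead gets the highest weight
p_node = node_transition(5, 0, v=50.0, phi_node=10.0, sigma_t=1.0)
# Shift reported while in the last submap (No. 7 of 8): wraps to No. 0
p_sub = submap_transition(0, 7, n_submaps=8, gamma=0.9)
print(p_node, p_sub)   # 1.0 0.9
```

Plugging these two models into the recursions of eqs. (16) and (18), and combining them through eq. (13), yields the full topological Bayes filter; gamma and sigma_t are tuning parameters.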

6. Experimental Results

6.1. Platform

Experiments were carried out to test the performance of the appearance-based topological Bayesian inference. The platform for the experiment is a tracked vehicle. For testing purposes, the sensors were also mounted on a pickup (Figure 11) in the same layout as on the tracked vehicle. More details about the experiments can be found in Ng et al. (2004).

Since the test platform is a tracked vehicle, its motion cannot be read through dead-reckoning. As elaborated in previous sections, the proposed technique utilizes only the local vehicle transformations, i.e., translation and steering. These data can be conveniently calculated from the measurements of the IMU. Although its error accumulates over time, a commercial IMU is already accurate enough for the segmentation purpose. It must be noted that the GPS data used in this paper are only for reference and result analysis; during the experiments, no GPS/INS information is involved.

Fig. 10. The Bayesian inference network in our topological Bayesian inference.

Fig. 11. The pickup used to simulate the layout of the sensors.

6.2. Testing Environment

The testing field is a square-like environment in a cross-country jungle. A photo of the testing field is shown in Figure 1. In the experiment, the vehicle traveled a trajectory of about 700 meters, and collected more than 4000 frames of range measurements. Because of some mechanical problems, the vehicle stopped for a short while during the trial, so we manually truncated the data frames corresponding to the duration when the vehicle stopped. In this environment, the map is segmented into 12 submaps. For the readers' convenience, we plot only the first 8 of them, which correspond to the first loop (see Figure 13). The topological nodes are labeled accordingly in Figure 14. As can be observed, the topological nodes are indexed sequentially throughout the map, rather than within each individual submap. This is consistent with the Bayesian inference network in Figure 10, which shows that the transitions in the topological node space are independent of the submap space.

6.3. The λ

To explore the eigenvalue spectrum, in Figure 15 we plot the percentage of variance that the first n eigenvectors account for. This figure shows that, although we have a large quantity of data, the first few eigenvectors are already sufficient to describe the data in the eigenspace. In this paper, the first 40 of them are used to represent the environment information; mathematically, λ = 40. As demonstrated, these 40 eigenvectors provide about 90 percent of the total observed information.

To examine how the λ in (4) affects the performance of the Bayesian inference, in Figure 16 we plot the estimation results under different λ. This figure shows that the estimation error is large when λ is either too small (e.g., 3 or 5) or too big (150 or 300). This agrees with the appearance model's theory: a too-small λ means that only very little of the observed information is used in the calculation; in other words, a lot of valuable knowledge is discarded in the PCA process, and the Bayesian inference then gives poor estimates with insufficient information. In contrast, a large λ results in the incorporation of a lot of redundant information, such as noise and small distractors. Such redundant information can easily confuse the Bayesian inference and lead to a false loop-closing report.
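Choosing λ from the curve in Figure 15 amounts to thresholding the normalized accumulated eigenvalues. A sketch, using a synthetic fast-decaying spectrum since the actual spectrum values are not published:

```python
import numpy as np

def choose_lambda(eigenvalues, target=0.9):
    """Smallest lambda whose normalized accumulated eigenvalues reach
    `target` -- i.e., a horizontal cut through the curve of Figure 15."""
    sorted_vals = np.sort(eigenvalues)[::-1]
    ratios = np.cumsum(sorted_vals) / np.sum(sorted_vals)
    return int(np.searchsorted(ratios, target) + 1)

# Synthetic, fast-decaying spectrum standing in for the real one
vals = 1.0 / np.arange(1, 362) ** 2
lam = choose_lambda(vals, target=0.9)
print(lam)
```

For the real spectrum of Figure 15, this cut at 90 percent gives λ = 40; as noted below, any λ in roughly 20-80 works comparably well, so the exact threshold is not critical.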

[Figure omitted: map of the testing environment, East vs. North in meters.]

Fig. 12. Map of the testing environment. As with the map in Figure 5, this map is built from GPS/INS data and is for reference use only.

[Figure omitted: submaps M1-M8 (and submap 13) plotted in East/North coordinates.]

Fig. 13. The environment is divided into a set of submaps. Each submap has its own coordinate system, which is indicated by two arrows. The size of each submap is represented by a dashed rectangle.

[Figure omitted: topological node indices plotted in East/North coordinates.]

Fig. 14. Topological nodes are obtained by further segmenting the submaps at a fixed resolution. All the topological nodes are indexed sequentially; because of limited space, we plot only one index for every three topological nodes.

[Figure omitted: normalized accumulated eigenvalues vs. sorted eigenvector ID.]

Fig. 15. The X-axis represents the sorted ID of the eigenvectors; the Y-axis represents the normalized accumulated eigenvalues up to the corresponding ID.

[Figure omitted: estimation error curves for the ground truth and λ = 3, 5, 60, 150 and 300, with the initialization region marked; units are topological nodes.]

Fig. 16. The error between the estimation and the GPS ground truth. The X-axis is the ground truth; coordinates on the Y-axis represent the distances from the estimation to the vehicle's real position; the units are topological nodes. For example, the coordinate (x = 5, y = 2) means that at the fifth estimation, the error is 2 topological nodes away from the topological node in which the vehicle actually is.

In short, we can regard appearance modeling as a process that collects only the information useful for distinguishing different places and removes that which is useless. Too small a λ over-removes, damaging the useful information, while too large a λ under-removes, keeping useless information that gives false "features". Although such over-removal and under-removal situations exist, choosing λ is actually not a problem in practice: according to our experiments, for λ ranging from 20 to 80, the inference gives satisfactory results.

6.4. From Euclidean Space to Mapspace

The map modeling in eigenspace can capture most of the properties of the environment in the Euclidean space. In Figure 17, submaps No. 1 and No. 2 are depicted in the global coordinates; it can be observed that these two submaps are very different. As a result, their representations in the eigenspace also demonstrate great dissimilarity, as shown in Figure 18.

Two measurement frames from submaps No. 1 and No. 2 are depicted in Figure 19. Their corresponding eigenframes are shown in Figure 20. These two eigenframes are distinct, which agrees with the fact that they are from different submaps. By comparing them with Figures 18(a) and (b), the high similarity between submap No. 1's eigen-representation and eigenframe No. 3161 can be observed. Such similarity is also found between submap No. 2 and eigenframe No. 3351. These facts prove the validity of detecting loop-closing in the mapspace.

6.5. Loop-Closing Detection Results

We calculate the observation probability for each measurement frame conditioned on each submap; the results are shown in Figure 21. The flat lines in the center correspond to the time when the vehicle stopped (the sensor was still working). It can be noticed that there are at least 4 submaps which have similar probabilities at the point of loop-closing (near measurement No. 3000). In this case, submap No. 1, which has the

[Figure panels omitted: (a) submap No. 1 and (b) submap No. 2 in East/North coordinates.]

Fig. 17. The two different submaps in the global coordinates.

[Figure panels omitted: averaged eigenframe values (item ID vs. value) for (a) submap No. 1 and (b) submap No. 2.]

Fig. 18. The averaged eigenframes used to represent the two submaps in Figure 17. Since the submaps' appearances are completely different, their representations using eigenframes are also distinct.

[Figure panels omitted: 2D scans No. 3161 and No. 3351, x vs. y in meters.]

Fig. 19. These two frames are from different submaps.

[Figure panels omitted: eigenframe values (item ID vs. value) for eigenframes No. 3161 and No. 3351.]

Fig. 20. The eigenframes computed for the two range scans in Figure 19. These two eigenframes are significantly different. By comparing them with Figures 18(a) and (b), respectively, it can be seen that a measurement and the submap from which it is observed are close in the eigenspace. This property, along with the dissimilarities shown previously, demonstrates the validity of conducting loop-closing detection in the eigenspace.

[Figure omitted: p(z|submap) curves for submaps 1-8 over measurement IDs; annotations mark the revisit of submap 1, the stretch where the vehicle stopped, and vertical lines marking the initialization of each new submap.]

Fig. 21. The observational probability for measurement z conditioned on different submaps. For the readers' convenience, only the first 8 submaps' curves are plotted. These probabilities are all high because they are not yet normalized. Please note that no GPS or other positioning sensors' measurements are used to acquire this result.

highest probability, is the correct result. However, it is still dangerous to simply use a nearest-neighbor criterion to detect the loop-closing. When a possible loop-closing is detected, the topological Bayesian inference procedure is initialized to confirm it. The results of the Bayesian inference are depicted in the Appendix (Figures 24 to 35). In each figure, the probability distribution over the topological space is plotted; the corresponding geometric information about each topological node can be found in Figure 14. The vehicle's actual position read from GPS is marked by an arrow, so the proposed algorithm's performance can be observed by comparing the node with the highest probability against the indicated one. Given that the whole sequence of measurements verifies the loop-closing hypothesis, a batch decision can thus be made that the vehicle revisited submap No. 1 at measurement ID 3000.
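The batch decision described above can be sketched as follows; the acceptance ratio and minimum step count are hypothetical tuning parameters of ours, not values from the paper.

```python
def confirm_loop_closing(posteriors, candidate, min_steps=5, win_ratio=0.8):
    """Batch decision over a sequence of inference steps.

    posteriors: one probability list over topological nodes per step
    (the kind of distribution shown in Figures 24-30). The hypothesis
    survives only if the candidate node is the argmax often enough;
    min_steps and win_ratio are illustrative, not values from the paper."""
    if len(posteriors) < min_steps:
        return False          # still initializing, evidence insufficient
    hits = sum(p.index(max(p)) == candidate for p in posteriors)
    return hits / len(posteriors) >= win_ratio

# Node 1 wins 5 of 6 steps: a temporary error (cf. Figure 29) is tolerated
steps = [[0.1, 0.6, 0.3]] * 5 + [[0.5, 0.2, 0.3]]
print(confirm_loop_closing(steps, candidate=1))   # True
```

Requiring a run of consistent wins, rather than a single nearest-neighbor match, is what lets the inference recover from the transient conflicts discussed in the Appendix figures.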

6.6. Computational Efficiency

We also computed the time requirement of conducting PCA, more specifically, of computing the parameters of the projection from the measurement space to the map space. The algorithms are implemented in Matlab and run on a computer with one Pentium IV 2.0 GHz processor. We found that for 2000 frames, or correspondingly 300 meters, the initialization costs less than 10 seconds. Since the loop-closing detection algorithm only builds the topological-level environment model, it is not required to be embedded into the local boundary map building, so the loop-closing detection and Markov localization can be run on an individual computer. Additionally, PCA is conducted only when a new submap is initialized; as noticed in Figures 12 and 5, such shifts do not happen very often. According to our trials, a submap shift is detected on average every 1 minute. As a result, a 10-second lag is still affordable for the algorithms. After the projection from the measurement space to the map space is computed, the eigenframe can be computed in constant time (less than 20 ms) for each incoming measurement. Even though the submap number scales with travelled distance, in most of the current SLAM literature this number will not surpass 100, and the Bayesian inference can still run in real time.

7. Conclusion and Discussion

This paper presents an innovative approach for detecting loop-closing for SLAM in a highly unstructured cross-country environment where no geometrical landmarks are available. We elaborate how to use a linear dimensionality reduction technique, i.e., PCA, to model the environment's appearance.

[Figure omitted: time (seconds) required to initialize each submap vs. measurement ID.]

Fig. 22. The time required when a submap is initialized. As the measurement frames accumulate, the algorithm requires more and more time to calculate the mapping parameters from the measurement space to the map space.

[Figure omitted: 3D scatter of submaps 1, 2 and 3 in the illustrative map space.]

Fig. 23. The 3-dimensional illustrative map space. Three submaps are plotted here: triangles, crosses and circles represent submaps 1, 2 and 3 respectively. These three submaps' geometrical information can be found in Figures 5 and 7. As can be seen, PCA sometimes cannot properly handle the nonlinearity of the manifold.


After the high-dimensional measurements are projected into a low-dimensional manifold, their distributions are approximated by a series of Gaussian models. The major contribution of this paper is to use these Gaussian models to calculate the observation probability for Bayesian inference. By doing so, the appearance model of the environment can be integrated into the topological Bayesian inference process, so that metric-level feature information is no longer required. The experimental results demonstrate that the proposed technique can robustly detect large-scale loop-closing in a cross-country environment.

Another desirable characteristic of the proposed algorithm is that, in contrast to conventional loop-closing detection techniques which rely on feature tracking, no vehicle pose estimation is necessary in our inference framework. In other words, the loop-closing detection in this work is outside the central SLAM estimation loop, so even when the vehicle's localization error is huge, the detection algorithm can nevertheless work properly. This is especially important for a tracked vehicle moving through a cross-country environment, because in this case the error of the vehicle's self-localization can become quite large after even a short distance. With such a large error, it is impossible to use any "gating" mechanism to determine whether the vehicle has re-visited a certain place.

Although the presented algorithm has several advantages, it nevertheless has its own limitations. Currently, we can see that the presented algorithm could be further improved in the following two aspects.

7.1. Handling Nonlinear Manifolds

PCA and linear discriminant analysis (LDA) assume that the extracted features are linear functions of the input data. However, such a linearity assumption may sometimes not lead to good results. Such a nonlinear manifold is illustrated in Figure 23.
As can be seen, the distribution of projected input data from three submaps cannot be linearly segmented in the 3D space. This limitation of PCA has been widely discussed in the pattern recognition domain. Recent research has found that the observations are often controlled by a small number of factors such as the view angle. Such a relationship, even though

nonlinear globally, is often smooth and approximately linear in a local region. More powerful algorithms have been employed to exploit such local linearity, e.g., LLE (Roweis and Saul 2000) and ISOMAP (Tenenbaum, de Silva, and Langford 2000). Recently, innovative applications of nonlinear dimensionality reduction techniques in mobile robotics have been individually developed by Kumar, Guivant, and Durrant-Whyte (2004) and Kumar et al. (2005). Here the authors argue that integrating more powerful dimensionality reduction tools into the appearance-based topological Bayesian inference framework is a promising research direction and deserves further investigation.

7.2. Handling Viewpoint Variance

As elaborated in previous sections, appearance-based loop-closing detection essentially combines two steps. The first is a supervised learning process which teaches the algorithm how to distinguish different places. In the second step, it uses the learned knowledge to classify the newly observed data. It is then important to understand that the appearance-based approaches rely completely on the given samples to understand the environment. If there are no samples from a certain aspect of the environment, the appearance-based technique cannot learn from them. Consequently, it cannot recognize such a place in the future, even if the vehicle has already been there with a different pose. In short, such techniques are not originally supposed to be viewpoint-invariant. However, as observed in Figure 13, when the vehicle visits and re-visits submaps No. 1, 2, . . . , 5, its trajectories are not exactly the same. The proposed algorithm still works properly in this situation, which demonstrates the appearance model's capability to handle moderate viewpoint variance.
Meanwhile, it can also be noticed that, in a cyclic environment as in Figure 13, because of the road constraint and the fact that the vehicle is non-holonomic, for a fixed sensor, its viewpoint cannot change vastly during the revisiting unless the vehicle goes through a different trajectory. So in a cyclic environment, the viewpoint is often naturally constrained.

Appendix

See Figures 24–35.

[Figure panels omitted: probability vs. node ID at (a) step 3001, (b) step 3051, (c) step 3101 and (d) step 3151, with the estimation and the GPS position marked.]

Fig. 24. The probability distribution over submaps and the topological network at the positions corresponding to No. 3001 to No. 3151. As observed in (b), the estimation of the proposed algorithm does not match the actual position read from GPS, because the Markov process is currently in initialization and the available information is not yet sufficient to correctly compute where the vehicle is.

[Figure panels omitted: probability vs. node ID at (a) step 3201 and (b) step 3251.]

Fig. 25. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3201 to No. 3251. Compared with Figure 24, it can be seen that the estimated vehicle position has become much more accurate as new sensor data are received.

[Figure panels omitted: (a) the GPS position at frame No. 3301; (b) p(z_t | Y_t^j) over the submaps.]

Fig. 26. A topological shift detected at iteration No. 3301. The vehicle is supposed to move from its previous submap to a new one. All the submaps' observational probabilities are recalculated and plotted as above.

[Figure omitted: probability vs. node ID at step 3351, with the estimation and the GPS position marked.]

Fig. 27. The probability distribution over submaps and the topological network at the position corresponding to No. 3351. Up to this point, the estimations from the Bayesian inference are satisfactory.

[Figure panels omitted: (a) the GPS position at frame No. 3401; (b) p(z_t | Y_t^j) over the submaps.]

Fig. 28. Another topological shift detected at iteration No. 3401. According to the information gathered previously, this shift should have been detected a short while later; it is reported now because the vehicle's current trajectory is different from the one in the first loop. So there is actually a conflict between the Bayesian estimation and the new observation. If such a conflict persists in the following iterations, the Bayesian inference will report that the loop-closing hypothesis at iteration 3000 is false.

978

THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH / October 2006

[Plots: observation probability versus node ID at iterations (a) 3451, (b) 3501, (c) 3551 and (d) 3601, with the Estimation and GPS nodes marked.]

Fig. 29. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3451 to No. 3601. The conflict introduced in Figure 28 "confused" the Bayesian inference process; consequently, the error of our algorithm grows over these four steps. The question to be answered is: how do we judge whether such errors are caused by a wrong loop-closing hypothesis or by temporary observation error?
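One hedged way to answer the question posed in the caption is to require the disagreement to persist before rejecting the hypothesis, treating short-lived conflicts as observation noise. The class below is an illustrative sketch only; the counter threshold, node-distance threshold and method names are assumptions, not the paper's mechanism.

```python
class LoopClosingVerifier:
    """Hypothetical sketch: distinguish a wrong loop-closing hypothesis
    from a temporary observation error by requiring the conflict between
    the Bayesian estimate and the observation to persist over several
    consecutive iterations."""

    def __init__(self, max_conflicts=5, max_node_distance=2):
        self.max_conflicts = max_conflicts        # consecutive conflicts tolerated
        self.max_node_distance = max_node_distance
        self.conflicts = 0

    def step(self, estimated_node, observed_node):
        """Return 'rejected' once the disagreement has persisted long
        enough, 'pending' while it may still be transient, and
        'consistent' when estimate and observation agree."""
        if abs(estimated_node - observed_node) > self.max_node_distance:
            self.conflicts += 1
        else:
            self.conflicts = 0                    # transient error: forgive
        if self.conflicts >= self.max_conflicts:
            return "rejected"
        return "pending" if self.conflicts else "consistent"
```

Under this rule, the four erroneous steps shown above would raise the counter but, because correct observations resume afterwards, the counter resets and the hypothesis survives.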


[Plot: observation probability versus node ID, with the Estimation and GPS nodes marked.]

Fig. 30. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3651. The correct observations arriving continuously since iteration No. 3351 finally compensate for that error. In this figure, the estimate is quite close to the vehicle's actual position.

[Plots: (a) GPS position at iteration/frame 3701 (East vs. North, m), with the current position marked and the note "Topological shift detected"; (b) p(zt | Ytj) shaded over the topological network.]

Fig. 31. A topological shift is detected at iteration 3701. The vehicle's actual position is depicted in (a); the observational probability of each submap is plotted in gray shading in (b).
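A simple decision rule consistent with these plots is to declare a topological shift when an adjacent submap explains the incoming measurement better than the current one. The function below is a hypothetical sketch of such a test; `likelihoods` and `neighbors` are illustrative names, not the paper's exact criterion.

```python
def detect_topological_shift(current_node, likelihoods, neighbors):
    """Hypothetical shift test: report a move to an adjacent node when
    that neighbor explains the new measurement better than the current
    submap does.

    likelihoods -- likelihoods[j] = p(z_t | node j)
    neighbors   -- maps a node to its adjacent nodes in the
                   topological network
    """
    best = max(neighbors[current_node], key=lambda j: likelihoods[j])
    if likelihoods[best] > likelihoods[current_node]:
        return best       # topological shift detected
    return current_node   # stay in the current submap
```

Restricting the comparison to topological neighbors keeps the test local: a distant submap with a coincidentally similar appearance cannot trigger a spurious shift.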


[Plot: observation probability versus node ID, with the Estimation and GPS nodes marked.]

Fig. 32. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3751. The topological Bayesian inference gives an excellent estimate of the vehicle's pose. In the loop-closing context, the continuously arriving information verifies the loop-closing hypothesis.

[Plots: (a) GPS position at iteration/frame 3801 (East vs. North, m), with the current position marked and the note "Topological shift detected"; (b) p(zt | Ytj) shaded over the topological network.]

Fig. 33. A topological shift is detected at frame No. 3801, after which the likelihood of the vehicle's current position is updated.


[Plot: observation probability versus node ID, with the Estimation and GPS nodes marked.]

Fig. 34. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3851.

[Plots: observation probability versus node ID at iterations (a) 3901 and (b) 3951, with the Estimation and GPS nodes marked.]

Fig. 35. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3901 and No. 3951. The estimates from the Bayesian inference match the ground truth well, which demonstrates the accuracy and robustness of the proposed algorithm.


Acknowledgments We appreciate the help of Dr Javier Ibañez-Guzmán and Ng Teck Chew in conducting the experiments.
