Robots are often relegated to the realm of fantasy. But Minerva, an interactive tour-guide robot, which was successfully exhibited in the Smithsonian Museum, has brought robotics to everyday life.
Robots: Cuddly or Deadly?
From the terrifying annihilators of Terminator II to the cute, artificial creature of Short Circuit, contemporary science fiction has done so much to shape people’s conception of robots that it may be difficult to assess the real possibilities of robots in everyday life. That is why you might be pleasantly surprised by Minerva, a robot designed by a research team from Carnegie Mellon with the purpose of assisting people in public places. Specifically, she exists in Smithsonian’s National Museum of American History. She actively approaches people, offers tours, and guides them from exhibit to exhibit. She even entertains by singing or smiling when she is happy. But just as a person who is temporarily blinded or blocked by another person’s hand or body would become frustrated after a while, Minerva too becomes frustrated when someone blocks her way and responds by honking her horn. Her combination of abilities makes her a pioneering example of a successful autonomous robotic tour guide in a crowded, public environment. By examining her ability to learn maps and her human-robot interface, one can see that Minerva is a prime specimen to demonstrate the growing relevance of autonomous mobile robotics in everyday life.
Navigation
One primary task of any tour guide, robot or human, is clearly to navigate. In fact, navigation is required for any mobile agent. Knowledge required for navigation can be broken into two distinct components: knowledge of current position and knowledge of how to get to a desired position from a current position. The knowledge of current position carries the name of localization while the knowledge of how to get to a desired position from the current position is mapping. Also, in building a robust, autonomous robot, adaptation is a necessary ingredient [1]. In the context of navigation, adaptation can come in the form of the robot’s ability to react appropriately to a wide and diverse range of situations. In implementing such flexibility, the robot must also be as independent as possible of its own physical deficiencies in perception.
In the case of Minerva, these perceptive shortcomings include limits on the range of her lasers and sonar range-finders, which do not allow the robot to “see” anything to use as a reference point in sufficiently wide-open spaces [2]. Minerva’s odometer also collects errors fairly quickly over long ranges, eventually causing dramatic errors in the robot’s belief of its position by pure dead reckoning. Even cameras can fail in regions that lack sufficient visual structure, such as blank walls or ceilings [2]. Therefore, in obtaining localization and mapping information, Minerva should be as independent as possible of her dead reckoning and sonar data. The presence of many moving people immensely complicates the perceived environment. The robot must dynamically be able to differentiate what is an environmental feature and what is a human being, a task that requires considerable adaptation. All in all, adaptation is key.
Learning to Navigate
In fact, she should try to learn for herself where she is and how she should get to where she intends to be. She should dynamically create her own map information. However, since Minerva does not accept any of her sensory information as absolutely accurate, and in fact takes minimal perceptory accuracy for granted, there is significant uncertainty in her learning. Probability is the most appropriate analytical tool to use for problems involving uncertainty. The most well-known and effective localization algorithms about uncertainty are the probabilistic Markov localization algorithms [3]. Many of these algorithms are passive. That is to say, while figuring her position, the robot using passive localization does not re-adjust herself to optimize the angle at which she can obtain information.
Minerva, however, uses active Markov localization, which allows her to re-adjust herself at any point during the localization process [2]. This gives her greater independence from her physical situation and thus generates more autonomy and robustness. In conjunction with this qualitative statement, empirical evidence also shows that active Markov localization significantly improves robotic navigational performance in comparison to passive Markov localization (Fox and Burgard).
In addition to utilizing the robust tool of active Markov localization, Minerva also takes advantage of a very traditional human navigational technique: coastal navigation. Just as being close to land allows sailors to determine with high accuracy where they are, mobile robots take into account the density of certain features (such as walls) to determine with greater accuracy where they are in order to reduce the probability of getting lost. Studies demonstrate that this coastal navigation technique also optimized Minerva’s performance [2].
To address the problem of actual mapping, Minerva also uses active probabilistic methods in learning her maps so as to maximize her self-sufficiency and adaptability (see Fig. 1). It should be noted that mapping is not independent of localization, as errors in odometry have to be corrected in the process of building a map [4], especially occupancy maps. These maps are based on sonar and odometry data as well as the approximate error in both types of data. The EM algorithm alternates localization and mapping to find the most probably occupancy configuration (i.e. layout of occupied space, including location of walls) [4]. However, due to the size and density of people in the museum, who also of course seem to be “occupied space” via sonars, the occupancy maps are further enhanced by the texture maps of the ceiling obtained from a camera pointed to the ceiling [4]. Since it is less likely that a person will block the robot from directly above, the texture maps of the ceiling eliminate the difficult-to-solve global alignment problem created by blockage by people [4].
People skills
We have seen that the methods Minerva uses in localization and mapping are theoretically optimal in allowing her to adapt to a complex human environment and that this theoretical optimality has been supported by empirical studies. Interestingly, the most significant demonstration of Minerva’s navigational robustness in real, human environments is quite direct. During two weeks of operation, she traveled more than 44 km at speeds of up to 163 cm/sec amidst thousands of people without injuring anybody and without getting lost or misleading people [4].
In this remarkable and unprecedented accomplishment, Minerva’s human interface must be given its due credit. The fact is that Minerva’s brother project, Rhino, built by the same group of researchers and installed as a tour guide for Deutsches Museum Bonn, possessed very similar control and navigation software. Yet Rhino was not quite as successful a tour guide as Minerva [1]. This is largely attributed to Rhino’s lack of Minerva’s human-robot interaction abilities [1]. Among these abilities include Minerva’s dynamic learning of behaviors that attract people via her 201 attraction experiments [1]. Included among these experiments is her ability to utter human sentences and simple natural language processing. Very importantly, she has the ability to express her intent and “emotions” via a simple finite state machine (i.e. a simple flow-chart logic) and her physical attributes, which may be the most basic and essential contributor to her success in human interaction. Indeed, the most direct and conspicuous level of the human-robot interface is in the form of physical attributes of the robot. If the human-robot interface is at all important in the robot’s task of assisting people, then the specifications of the physical attributes are quite important as well. It turns out that this human-robot interface is of great significance to the tasks of a tour guide agent: two primary tour guide tasks are attracting people and engaging people, both of which are heavily interactive duties [1]. The approach of the Minerva project team in implementing effective interaction was to create a system which acts in a believable manner while interacting with people in the given context of spontaneous short term interaction [1]. To implement this believability, one aspect that needs to be considered is the focal point, and another is emotional state. The focal point of Minerva, much like the face of a human being, is the single location upon which people may focus their attentions during interaction with the robot. In implementing believability through the focal point design, Minerva’s architects attempted to present as recognizable and intuitive an interface as possible: the caricature of a human face [1].
Since Minerva is one of the first experiments in robotic interaction with novice users (e.g. the general public), the design team deduced their own specifications for their robotic face, based purely on the intent of making the face humanlike and capable of communicating intent [1]. Only elements necessary for the degree of expression appropriate a tour guide robot were used. Since a simple face consisting of two eyes with eyebrows and a mouth is both generally recognizable and can portray the range of emotions relevant to tour guide interaction, such a layout was chosen for Minerva [1].
Because three-dimensional objects require intelligent control, while flat moving images most likely result from the playback of a stored sequence, it was determined that a physically implemented movable face would be more believable and appropriate than a flat display. This, combined with the observation that a three-dimensional head can be viewed from many angles, spurred Minerva’s designers to implement a three dimensional rotatable and tiltable head with face of four degrees of freedom (one for each of the eyebrows and two for the mouth) [1]. It was via this implementation of her focal point that Minerva was able to seem autonomous and express emotions/intent with her movable eyebrows and lips that are able to smile. This implementation proved to be effective: as evaluated by children under ten years old, Minerva’s expressiveness fell somewhere in between a monkey and a human being [1].
The future
Minerva is a useful robot that navigates accurately and efficiently in a complex human environment, is friendly with and attracts people, and is so believable that it is compared to a human being by children. Through Minerva’s performance we see that it is quite possible for robots to successfully integrate into the day-to-day world of humans. Research is continuously being done to improve the probabilistic and active learning techniques used by Minerva. While one might not need to venture so far as to blindly accept the prophecy of Hans Moravec of Carnegie Mellon’s Mobile Robot Lab, some predict that we have “fifty years, tops, until the robots exceed us” [5]. It is, nevertheless, clear that one way or another, with practical models such as Minerva or PicaBot (Fig. 2) to guide the way, robots will become increasingly relevant in our daily lives – in reality, not just fantasy.