Multiple Categorization by iCub: Learning Relationships between Multiple Modalities and Words*

Akira Taniguchi¹, Tadahiro Taniguchi¹, and Angelo Cangelosi²

Abstract— Human infants can acquire word meanings by estimating the relationships among multiple situations and words. In this paper, we propose a Bayesian probabilistic model that can learn multiple categorizations and words related to any of four modalities (action, object, position, and color). This paper focuses on cross-situational learning using the co-occurrence of sentences and situations. We conducted a learning experiment using the humanoid robot iCub. In this experiment, a human tutor describes to the robot a sentence about the object of visual attention and the action of the robot. The experimental results show that the proposed method was able to estimate the multiple categorizations and to accurately learn the relationships between multiple modalities and words.

Fig. 1. Overview of the proposed method: the tutor utters a sentence, e.g., "grasp green front cup", describing the robot's action and the position, color, and object of attention.

I. INTRODUCTION

Human infants can acquire word meanings by estimating the relationships between multiple situations and words. For example, if an infant grasps a green cup at hand, the parent may describe the infant's action with a sentence such as "grasp green front cup". In this case, the infant does not know the relationship between the words and the situation, because the infant has not yet acquired the word meanings. In other words, the infant cannot determine whether the word "green" indicates an action, an object, or a color. However, we consider that the infant can learn that the word "green" represents the green color by observing the co-occurrence of the word "green" with green objects in various situations, e.g., "touch green box" and "green right ball". This is called cross-situational learning, which has been studied both in children [1] and in models of simulated agents and robots [2]. The goal of this paper is to develop a novel method for learning the relationships between words and four modalities (action, object, color, and position) from the robot's experience of hearing sentences that describe object manipulation scenes.

Many studies of language acquisition focus on the estimation of phonemes and words from speech signals [3]–[5]. In this paper, we assume that the robot can recognize spoken words without errors, as this work focuses specifically on 1) the categorization for each modality, 2) the learning of the relationships between words and modalities, and 3) the grounding of words in multiple categories. Peniak et al. [6] performed action learning by a multiple time-scale recurrent neural network. Attamimi et al. [7] estimated the relationships among words and multiple concepts by weighting the learned words according to their mutual information as a post-processing step. In this paper, we perform cross-situational learning, including action learning, with a Bayesian probabilistic model. The proposed method can estimate multiple categories and the relationships of words and modalities simultaneously.

*This work was partially supported by JST, CREST.
¹Akira Taniguchi and Tadahiro Taniguchi are with Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan {a.taniguchi, taniguchi}@em.ci.ritsumei.ac.jp
²Angelo Cangelosi is with the Centre for Robotics and Neural Systems, Plymouth University, Plymouth, PL4 8AA, United Kingdom [email protected]

II. OVERVIEW OF THE TASK

Fig. 1 shows an overview of the task. The training procedure consists of the following steps: 1) the robot stands in front of a table with objects on it; 2) the robot performs an action on an object; 3) the human tutor utters a sentence about the object and the action of the robot; 4) the robot processes the sentence to discover the meanings of the words. This process (steps 1–4) is carried out many times in different situations. The robot learns word meanings and multiple categories by using visual, tactile, and proprioceptive information, as well as words.

III. MULTIPLE CATEGORIZATIONS AND WORD MEANING LEARNING

Fig. 2 shows the graphical model of the proposed method. The n-th word in the d-th trial is denoted as w_dn. An index of a word distribution is denoted as l. The model associates the word distributions θ with the categories z_dm of four modalities, namely, the action a_d, the position p_dm of the object on the table, the object feature o_dm, and the object color c_dm. The index of the object of attention selected by iCub from the multiple objects on the table is denoted as A_d = m. The sequence representing the modality associated with each word in the sentence is denoted as F_d, e.g., F_d = (a, p, c, o). The number of categories for each modality is K. The number of words in the sentence is N. The set of word distributions is denoted as θ = {θ_{l=(F_dn, z_dm^{F_dn})} | F_dn ∈ {o, c, p, a}, z_dm^{F_dn} ∈ {1, 2, ..., K^{F_dn}}}. The action category φ_k^a, the position category φ_k^p, the object category φ_k^o, and the color category φ_k^c are each represented by a Gaussian distribution. The hyperparameter λ of the uniform distribution encodes the constraint that each modality occurs exactly once in the sentence. The hyperparameter of the mixture weights π is denoted as α, the hyperparameter of the Gaussian-inverse-Wishart distribution as β, and the hyperparameter of the Dirichlet distribution as γ. The parameters of the proposed model are estimated by Gibbs sampling: the learning algorithm repeatedly samples each parameter from its conditional posterior distribution given all of the others.

Fig. 2. The proposed graphical model; the action, position, color, and object categories are each represented by a component in Gaussian mixture models (GMMs). A word distribution is related to a category on the GMMs.

Fig. 3. Word probability distribution across the multiple categories (action words "touch", "grasp", "lookat", "reach"; position words "front", "right", "left", "far"; object words "box", "ball", "cup"; color words "green", "red", "blue"; category indices a0–a4, p0–p4, o0–o4, c0–c4).
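As an illustration of the Gibbs-sampling scheme, the following is a minimal sketch of a Gibbs sampler for a one-dimensional Gaussian mixture with a known, shared variance and a Normal prior on each component mean. It is a simplified stand-in, not the paper's full sampler: it omits the word distributions, the mixture-weight resampling, and the Gaussian-inverse-Wishart prior, and the function name `gibbs_gmm_1d` and all parameter values are our own.

```python
import math
import random

def gibbs_gmm_1d(data, K=2, iters=200, sigma=0.5,
                 prior_mu=0.0, prior_var=100.0, seed=0):
    """Minimal Gibbs sampler for a 1-D Gaussian mixture.

    Assumes a known, shared observation variance sigma**2 and a
    Normal(prior_mu, prior_var) prior on each component mean; the
    paper's model instead uses Gaussian-inverse-Wishart priors.
    """
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in data]       # component assignments
    mu = [rng.gauss(0.0, 1.0) for _ in range(K)]  # component means

    for _ in range(iters):
        # Resample each assignment given the current means
        # (uniform mixture weights, for simplicity).
        for i, x in enumerate(data):
            logp = [-(x - mu[k]) ** 2 / (2 * sigma ** 2) for k in range(K)]
            m = max(logp)
            p = [math.exp(lp - m) for lp in logp]
            r = rng.random() * sum(p)
            acc = 0.0
            for k in range(K):
                acc += p[k]
                if r <= acc:
                    z[i] = k
                    break
        # Resample each mean from its conjugate Normal posterior;
        # an empty component falls back to the prior.
        for k in range(K):
            xs = [x for x, zi in zip(data, z) if zi == k]
            n = len(xs)
            post_var = 1.0 / (1.0 / prior_var + n / sigma ** 2)
            post_mean = post_var * (prior_mu / prior_var + sum(xs) / sigma ** 2)
            mu[k] = rng.gauss(post_mean, math.sqrt(post_var))
    return z, sorted(mu)

# Two well-separated clusters; the sampler should typically place
# one component mean near each cluster.
data = [0.1, -0.2, 0.05, 5.1, 4.9, 5.2]
z, mu = gibbs_gmm_1d(data)
```

The full model alternates the same kind of conditional updates over all parameters (assignments, means, mixture weights, and word distributions), rather than only the two shown here.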

IV. EXPERIMENT

We conducted an experiment on the learning of categories for each modality and of the words associated with each category. The number of action trials for learning was D = 20. The number of objects M on the table in each trial ranged from one to three. The number of words in each sentence was N = 4. We assume that a word related to each modality is spoken exactly once in each sentence. The word order was F_d = (a, p, c, o) in all of the sentences. The experiment used 14 words. The number of categories for each modality was K = 5. Fig. 3 shows the learned word probability distributions θ; higher probability values are represented by darker shades. Fig. 4 shows the learning result for the position category φ^p. Fig. 5 shows some examples of the categorization results for the object and color categories. F_d was correctly estimated for all of the data. The results demonstrate that the proposed method was able to accurately associate each word with its respective modality.
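The association visible in Fig. 3, where each word's probability mass concentrates in the categories of a single modality, can be read off mechanically. The sketch below assumes a learned word-distribution table; the `theta` values, modality names, and the function `modality_of_word` are hypothetical illustrations, not the distributions learned in the experiment.

```python
def modality_of_word(theta, word):
    """Sum a word's probability over all categories of each modality
    and return the modality with the largest total mass."""
    totals = {}
    for (modality, _k), dist in theta.items():
        totals[modality] = totals.get(modality, 0.0) + dist.get(word, 0.0)
    return max(totals.items(), key=lambda kv: kv[1])[0]

# Hypothetical learned distributions: two modalities, two categories each,
# keyed by (modality, category index) as in the model's l = (F, z) indexing.
theta = {
    ("action", 0): {"grasp": 0.90, "green": 0.05},
    ("action", 1): {"touch": 0.90, "green": 0.05},
    ("color", 0): {"green": 0.85, "grasp": 0.05},
    ("color", 1): {"red": 0.90, "grasp": 0.02},
}
```

With these values, "green" accumulates most of its mass under the color distributions and "grasp" under the action distributions, which is the same word-to-modality reading as in Fig. 3.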

Fig. 4. Learning result for the position category; for example, the index of position category p1 corresponds to the word "left". Each colored point group represents the Gaussian distribution of one position category (p0–p4), and the crosses of each color represent the object positions in the learning data.
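The cross-situational effect underlying these results, namely that a word's referent is whatever recurs with it across varied situations, can be illustrated with a toy co-occurrence counter. The symbolic features (e.g., `col:green`) and the functions below are hypothetical stand-ins for the perceptual categories, not part of the proposed Bayesian model.

```python
from collections import defaultdict

def cooccurrence_counts(trials):
    """Count how often each word co-occurs with each situation feature.

    trials: list of (sentence, situation) pairs, where sentence is a
    list of words and situation is a list of symbolic features.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for words, features in trials:
        for w in words:
            for f in features:
                counts[w][f] += 1
    return counts

def best_referent(counts, word):
    """Return the feature that co-occurs most often with `word`."""
    return max(counts[word].items(), key=lambda kv: kv[1])[0]

# Toy data mirroring the paper's example sentences.
trials = [
    (["grasp", "green", "front", "cup"],
     ["act:grasp", "col:green", "pos:front", "obj:cup"]),
    (["touch", "green", "left", "box"],
     ["act:touch", "col:green", "pos:left", "obj:box"]),
    (["grasp", "red", "right", "ball"],
     ["act:grasp", "col:red", "pos:right", "obj:ball"]),
]
counts = cooccurrence_counts(trials)
```

After these three trials, "green" has co-occurred twice with the green-color feature but only once with any other feature, so simple counting already disambiguates it, which is the intuition the probabilistic model formalizes.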

V. CONCLUSION

In this paper, we have proposed a learning method that can estimate multiple categories and the relationships between words and multiple modalities. The experimental results showed that it is possible for a robot to learn, from co-occurrence alone, which word in a sentence refers to which element of a complex situation. In the experiment, every uttered sentence contained four words; however, we consider that the proposed method can also learn word meanings from more uncertain sentences, i.e., sentences with four words or fewer in a changing order. The proposed method did not take into account grammatical information or sentences of five or more words, which we will try to address in our future studies. In ongoing work, our focus will be on new experiments with action generation and action description tasks. In addition, we plan to consider integrating the proposed method into Bayesian probabilistic models of object concepts [8] and spatial concepts [9].

Fig. 5. Some examples of the categorization results: (a) object categories (o0–o4) and (b) color categories (c0–c4).

REFERENCES

[1] K. Smith, A. D. Smith, and R. A. Blythe, "Cross-situational learning: An experimental study of word-learning mechanisms," Cognitive Science, vol. 35, no. 3, pp. 480–498, 2011.
[2] J. F. Fontanari, V. Tikhanoff, A. Cangelosi, R. Ilin, and L. I. Perlovsky, "Cross-situational learning of object–word mapping using neural modeling fields," Neural Networks, vol. 22, no. 5, pp. 579–585, 2009.
[3] D. Roy and A. Pentland, "Learning words from sights and sounds: A computational model," Cognitive Science, vol. 26, no. 1, pp. 113–146, 2002.
[4] S. Goldwater, T. L. Griffiths, and M. Johnson, "A Bayesian framework for word segmentation: Exploring the effects of context," Cognition, vol. 112, no. 1, pp. 21–54, 2009.
[5] T. Taniguchi, R. Nakashima, H. Liu, and S. Nagasaka, "Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals," Advanced Robotics, vol. 30, no. 11-12, pp. 770–783, 2016.
[6] M. Peniak, D. Marocco, J. Tani, Y. Yamashita, K. Fischer, and A. Cangelosi, "Multiple time scales recurrent neural network for complex action acquisition," in ICDL/EpiRob Joint Conference, 2011.
[7] M. Attamimi, Y. Ando, T. Nakamura, T. Nagai, D. Mochihashi, I. Kobayashi, and H. Asoh, "Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models," Advanced Robotics, vol. 30, no. 11-12, pp. 806–824, 2016.
[8] T. Nakamura, T. Araki, T. Nagai, and N. Iwahashi, "Grounding of word meanings in latent Dirichlet allocation-based multimodal concepts," Advanced Robotics, vol. 25, no. 17, pp. 2189–2206, 2011.
[9] A. Taniguchi, T. Taniguchi, and T. Inamura, "Spatial concept acquisition for a mobile robot that integrates self-localization and unsupervised word discovery from spoken sentences," IEEE Transactions on Cognitive and Developmental Systems, 2016.
