Dialogic RSA: A Bayesian Model of Pragmatic Reasoning in Dialogue
Kenneth Nichols, Australian National University
Introduction • Dialogic RSA (dRSA) is an instantiation of the Rational Speech Acts (RSA) model (Frank & Goodman, 2012; Goodman & Stuhlmüller, 2013), enriched with a simple model of turn-taking and context: one that grounds pragmatic inference within the structure of repeated games, game structures that contain multiple sequences of events. • The resulting model allows RSA-like agents to solve sequential reasoning tasks containing multiple ordered subgoals, using (potentially) vague and ambiguous language as the only input. dRSA offers a principled way of formalising pragmatic inference tasks that cannot be straightforwardly described using non-sequential or ‘one-shot’ games.
Why does order matter in dialogue? In everyday dialogue, the order in which utterances are produced and encountered matters. If I (speaker A) were trying to teach you (speaker B) how to bake a cake, the following dialogue (1) would be sub-optimal, since it 1. does not correctly communicate how to bake a cake, and 2. would therefore fail to solve the underlying task at hand.
• Reference games are reasoning tasks designed to study cooperative linguistic interaction (Lewis, 1969). In the baseline RSA model, a complete specification of the world is contained within the structure of one-shot games.
Experimental paradigm: Treasure Hunt • Treasure Hunt is inspired by existing work on the sequential dynamics of dialogue, namely the Cards Game (Potts, 2012; Djalali, Clausen, & Lauer, 2011). • The task is intended to be simple enough for artificial agents to learn, and intuitive enough for humans to play, allowing for straightforward statistical and empirical verification of the model. • As in baseline RSA, participants do not always reliably, predictably, or efficiently convey information about the goal of the communication in their utterances, and the listener must account for this through pragmatic enrichment. • We model a dialogic speaker N and dialogic listener L, who reason about a literal speaker and listener respectively.
• A single game of Treasure Hunt contains four separate episodes which are concatenated to form a dialogue.
Experimental results
• Episodes are analogous to turns in natural language dialogue, as each subsequent episode is intended to bring the task at hand closer to completion by contributing new information or updating past information.
We consider the board configuration in figure 1b. The bar chart in figure 2 shows the predictions that baseline RSA and dRSA make for the set of all utterance sequences that could correctly convey the switch order shown in figure 1a.
[Figure 1d: the vocabulary V = {JleftK, JrightK, JtopK, JbottomK, JmiddleK, Jtop leftK, Jtop rightK, Jbottom LK, Jbottom RK}, each word denoting a grid region G defined by coordinate constraints such as i ≤ 5 and j ≥ 6.]
• The task of the dialogic speaker is to find a set of utterances D̂ that communicates the switch order to L with minimal ambiguity.
Figure 1: Figures 1a and 1b depict how the Gridworld environment is presented to dialogic agents during experimentation. Figure 1c gives three of the nine possible Gridworld clusterings. Figure 1d displays the semantic values for each grid cluster.
• The agent does this by extracting structure from a stochastic utterance production process N: namely, how utterances relate to one another as a sequence, rather than as isolated individual utterances. • The agent uses this structure to steer the free evolution of N in a more desirable direction.
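One way to picture this structure extraction is fitting bigram transition probabilities to observed utterance sequences. The corpus, the utterance labels, and the bigram simplification below are illustrative assumptions, not the model's actual estimator.

```python
from collections import Counter, defaultdict

# Toy dialogue corpus: each inner list is one sequence of utterances.
# (Labels are illustrative stand-ins, not data from the paper.)
corpus = [["TL", "TR", "M"], ["TL", "TR", "B"], ["TL", "M", "B"]]

def transition_probs(sequences):
    """Estimate P(next utterance | current utterance) from bigram counts."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}
```

On this toy corpus, "TL" is followed by "TR" with probability 2/3 and by "M" with probability 1/3: exactly the kind of sequential dependency a one-shot RSA model cannot represent.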
• This is a measure of the dependent information between the utterances in the sequence, which, taken as a whole, forms the predictive information of N.
Box 2. Figure 2: Predictions of baseline RSA compared to dRSA (α = 1) in the Treasure Hunt task, over the enumeration of all possible dialogues. The black bars represent uniform utterance selection over all truth values.
How do we construct the optimal sequence of utterances? • We use the information bottleneck method to construct dialogue sequences. • It works by constraining the free evolution of the stochastic process: we find a minimally sufficient statistic of its past, one that retains the predictive information of N. • Minimising the first term represents the desire to find a concise dialogue that reduces ambiguity; maximising the second term represents the desire to maximise the ability to predict the terminal goal.
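The trade-off above can be sketched with plug-in estimates of the two mutual-information terms. The pairing of variables as sample lists and the toy data in the test are simplifying assumptions, not the paper's estimator.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in nats from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log(n * c / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def bottleneck_objective(compressed, past, future, beta=1.0):
    """F = I(Dhat_p; D_p) - beta * I(Dhat_p; D_f):
    compression cost minus (weighted) predictive value; lower is better."""
    return (mutual_information(list(zip(compressed, past)))
            - beta * mutual_information(list(zip(compressed, future))))
```

With β = 1 and a compressed representation identical to the past, the two terms cancel and F = 0.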
(d) The vocabulary V and denotational semantics C
(c) Jtop leftK, JbottomK, and JmiddleK
(b) Gridworld as seen by L.
Treasure Hunt rules
(a) Gridworld as seen by N
Optimal utterance selection over a short time horizon
Finally, we have the dialogic listener function, L, given as equation 5. The pragmatic listener infers the possible meaning of u by simulating the generative process that the speaker underwent when making the choice to utter u.

PL(Ŝi | ui) ∝ PN(ui | Ŝi, u1:i−1) · P(Ŝi)    (5)
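A runnable sketch of this Bayesian inversion. The states, words, and the toy inner speaker below are invented for illustration; in the full model the inner process would be the dialogic speaker N.

```python
# Toy reference game: three states, three words with overlapping extensions.
states = ["s1", "s2", "s3"]
true_of = {"a": {"s1", "s2"}, "b": {"s2", "s3"}, "c": {"s3"}}

def speaker(u, s):
    """P(u | s): a toy speaker choosing uniformly among utterances true of s."""
    options = [w for w, ext in true_of.items() if s in ext]
    return 1 / len(options) if s in true_of[u] else 0.0

def dialogic_listener(u):
    """P_L(S | u) proportional to P_speaker(u | S) * P(S), uniform prior."""
    scores = {s: speaker(u, s) for s in states}
    z = sum(scores.values())
    return {s: p / z for s, p in scores.items()}
```

Hearing "a", this listener favours s1 (probability 2/3) over s2 (1/3): a speaker at s2 could also have said "b", so choosing "a" is weaker evidence for s2.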
• We model an information asymmetry between L and N by assuming that L has an occluded perspective of the world.
dRSA seeks to model the problem of choosing the correct order of utterances by ensuring that agents learn how each utterance relates to the others relative to a well specified shared task, over a finite (short) time horizon. To clarify this problem, consider the arrangement of the world in figure 1a, and then consider why the sequence “TL, M, TR, M” might mean something different from “T, TL, TR, B”.
• Both interlocutors share the same vocabulary and denotational semantics of utterances. We denote the shared vocabulary as V = {w1, ..., wn }. The words in V map to objects in the context, the environment where the reference game takes place, denoted C = {s1, ..., sn }.
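As a concrete rendering of this setup, V can be a set of predicates over grid coordinates. The 10×10 grid and the specific regions below are assumptions in the spirit of figure 1d, not the paper's exact semantics.

```python
# A 10x10 Gridworld with 1-based (i, j) coordinates, i = row, j = column.
# The regions assigned to each word are illustrative assumptions.
GRID = [(i, j) for i in range(1, 11) for j in range(1, 11)]

SEMANTICS = {
    "left":   lambda i, j: j <= 5,
    "right":  lambda i, j: j >= 6,
    "top":    lambda i, j: i >= 6,
    "bottom": lambda i, j: i <= 5,
    "middle": lambda i, j: 3 <= i <= 7 and 3 <= j <= 7,
}

def denotation(word):
    """The set of cells in the context C that `word` truthfully describes."""
    return {cell for cell in GRID if SEMANTICS[word](*cell)}
```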
PL0(Si | ui) ∝ δJuiK(Si) · P(Si)    (3)
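A runnable sketch of this literal listener on a toy 2×2 context; the states and denotations are invented for illustration.

```python
# Toy context: four cells; "top" is true of the upper row, "left" of the
# left column. These denotations are illustrative stand-ins.
states = [(1, 1), (1, 2), (2, 1), (2, 2)]
denotation = {"top": {(2, 1), (2, 2)}, "left": {(1, 1), (2, 1)}}

def literal_listener(u):
    """P_L0(S | u): a uniform prior truncated (via the delta on [[u]]) to
    the cells where u is literally true, then renormalised."""
    truthy = [s for s in states if s in denotation[u]]
    return {s: (1 / len(truthy) if s in denotation[u] else 0.0)
            for s in states}
```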
Next in the pipeline is the dialogic speaker N, given as equation 4. Intuitively, the dialogic speaker function encodes the pressure for a speaker to be informative at the current state of the dialogue, by considering what it uttered in the past.

PN(ui | Si, u1:i−1) ∝ exp(α · log PL0(Si | ui))    (4)
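A hedged, runnable rendering of such a speaker: utterances are scored by how sharply a literal listener, restricted by the dialogue so far, picks out the intended switch. Intersecting past denotations is an illustrative assumption about how history enters, not the paper's exact formulation.

```python
# Toy 2x2 context with an extra maximally specific word "here".
# States, words, and regions are illustrative assumptions.
states = [(1, 1), (1, 2), (2, 1), (2, 2)]
denotation = {"top": {(2, 1), (2, 2)},
              "left": {(1, 1), (2, 1)},
              "here": {(2, 1)}}

def dialogic_speaker(target, history, alpha=1.0):
    def l0(u):
        region = set(denotation[u])
        for v in history:            # condition on what was said before
            region &= denotation[v]
        return 1 / len(region) if target in region else 0.0

    # p ** alpha normalised == softmax of alpha * log p over utterances
    scores = {u: l0(u) ** alpha for u in denotation}
    z = sum(scores.values())
    return {u: s / z for u, s in scores.items()}
```

With no history, "here" is preferred for the target (2, 1); after "top" has been uttered, "left" becomes almost as good, since the remaining ambiguity is already small.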
(1) A: Add the flour and water.
B: Uh huh.
A: Bake for 40 mins.
B: Yep.
A: Add the eggs and sugar.
B: Huh..?
The dRSA model begins with the literal listener function, L0, which receives an utterance ui as input at the beginning of each episode. The L0 function remains largely unchanged from baseline RSA: it enumerates the truth conditions of ui by querying the context model C and determining which switches lie in the region denoted by ui.
Dialogic RSA
Box 1.
F = I[D̂p; Dp] − β · I[D̂p; Df]    (1)
• The predictive information is the mutual information between the past realisations of N and the future realisations of N that maximise a terminal reward.
I(Dp; Df) = E[ log ( P(Df | Dp) / P(Df) ) ]    (2)
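This quantity can be estimated from sampled dialogues. The adjacent-pair windowing (past = current utterance, future = next utterance) and the toy data in the test are simplifying assumptions, not the paper's procedure.

```python
import math
from collections import Counter

def predictive_information(sequences):
    """Plug-in estimate (in nats) of I(Dp; Df) over adjacent-utterance
    pairs: the expectation of log P(Df | Dp) / P(Df) under the
    empirical joint distribution of (current, next) utterances."""
    pairs = [(a, b) for seq in sequences for a, b in zip(seq, seq[1:])]
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    return sum((c / n) * math.log(n * c / (px[x] * py[y]))
               for (x, y), c in pxy.items())
```

For sequences whose adjacent utterances are independent the estimate is 0; a deterministic alternation between two utterances yields log 2 nats.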
References
Djalali, A., Clausen, D., & Lauer, S. (2011). Modeling expert effects and common ground using Questions Under Discussion. In AAAI Fall Symposium Series (pp. 10–15). Retrieved from http://www.aaai.org/ocs/index.php/FSS/FSS11/paper/download/4186/4502
Frank, M. C., & Goodman, N. D. (2012). Predicting pragmatic reasoning in language games. Science, 336(6084), 998. doi: 10.1126/science.1218633
Goodman, N. D., & Stuhlmüller, A. (2013). Knowledge and implicature: Modeling language understanding as social cognition. Topics in Cognitive Science, 5(1), 173–184. doi: 10.1111/tops.12007
Lewis, D. (1969). Convention. Cambridge, MA: Harvard University Press. doi: 10.1002/9780470693711
Potts, C. (2012). Goal-driven answers in the Cards dialogue corpus. In N. Arnett & R. Bennett (Eds.), Proceedings of the 30th West Coast Conference on Formal Linguistics (pp. 1–20). Somerville, MA.