The Many-Facet Rasch Model in the Analysis of the Go/No Go Association Task

Michelangelo Vianello (*) Egidio Robusto (*)

(*) Department of Applied Psychology, University of Padova

Corresponding author: [email protected]

Keywords: Many-Facet Rasch Model; Implicit Association Test; Go/No Go Association Task


Abstract

This paper provides a Many-Facet Rasch Measurement (MFRM) analysis of Go/No Go Association Task (GNAT)-based measures of implicit attitudes toward sweet and salty food. We describe the statistical model and the strategy we adopted to score the GNAT and we emphasize that, when analyzing implicit measures, MFRM indexes have to be interpreted in a peculiar way. In comparison with traditional scoring algorithms, an MFRM analysis of implicit measures provides some additional information, and suffers from fewer limitations and assumptions. MFRM might help to overcome some limitations of current implicit measures, since it directly addresses some known issues and potential confounds, such as those related to a rational zero point, to the arbitrariness of the metric, and to participants’ task-set switching ability.


During the last 20 years, research on implicit methods has increased exponentially. Major implicit (or indirect) techniques include Evaluative Priming (EP, Fazio, Sanbonmatsu, Powell & Kardes, 1986) and the Implicit Association Test (IAT, Greenwald, McGhee & Schwartz, 1998), which has been followed in rapid succession by many other tests, such as the Go/No Go Association Task (GNAT, Nosek & Banaji, 2001), the Extrinsic Affective Simon Task (EAST, De Houwer, 2003), the Affect Misattribution Procedure (AMP, Payne, Cheng, Govorun & Stewart, 2005), the Single Category-IAT (SC-IAT, Karpinski & Steinman, 2006) and the Sorting Paired Features (SPF, Bar-Anan, Nosek & Vianello, 2009). Although these techniques may involve very different procedures, they all share the aim of circumventing the corrective processes involved in explicit measures (e.g. questionnaires), which may be heavily influenced by social desirability or impression management strategies. Indeed, implicit techniques do not rely on introspection. Instead, they provide behavioral measures of association strengths among mental representations, and they all rely on the assumption that the processing of a stimulus increases the accessibility of associated concepts (Higgins, 1996). For example, in an IAT for measuring implicit racial bias (one of the most common applications), participants categorize stimuli into superordinate categories in two different sorting conditions. In one condition, participants categorize items representing “whites” (e.g., faces of white people) and “good words” (e.g., “good”, “beautiful”) with one response key, while categorizing items representing “blacks” (faces of black people) and “bad words” (“bad”, “evil”) using another response key. In the other condition, participants categorize the same stimuli, but in different pairs: “white” and “bad” items are categorized with one key while “black” and “good” items are categorized with the other.
The first condition (white-good) is typically easier than the second (white-bad, see Nosek et al., 2006). The individual difference in speed and/or accuracy between conditions is interpreted as a measure of participants’ implicit preference for whites over blacks, which has often been interpreted as a measure of implicit prejudice. Although existing measures of association strengths use distinct procedures and may tap different associative processes, they all derive their evaluations from comparisons between participants’ performance at different categorization or recognition tasks. For instance, individual scores at a Race-EP are obtained by comparing responses to targets that were preceded by a stimulus priming the concept “Black” to responses that were preceded by a neutral prime, or by a “White” prime. Following the logic of response competition tasks rather than of sequential priming, the GNAT derives individual scores according to signal detection theory (Green & Swets, 1966). Hence, the individual measure of implicit association, which is called sensitivity (d'), is computed by subtracting the standardized proportion of False alarms (incorrect responses to distracters) from the standardized proportion of Hits (correct responses to targets). The d' represents the individual’s ability to discriminate signals (target stimuli) with noise from noise alone (distracter stimuli). All implicit techniques are characterized by specific strengths and limitations. GNAT, EP, SC-IAT, AMP and SPF have an advantage over the IAT in that they provide measurements of implicit associations that are not relative. For example, an IAT on racial prejudice provides a measure of participants’ implicit association of white people and good relative to the association between black people and bad. Nonetheless, GNAT and EP are characterized by a notable lack of reliability, compared both with the IAT and with explicit techniques.
The GNAT has values of internal consistency between .1 and .3 (Nosek & Banaji, 2001) while EP has even lower values (Bosson, Swann, & Pennebaker, 2000; Fazio & Olson, 2003; Olson & Fazio, 2003). Furthermore, the d' analysis requires that hit and false alarm rates be neither 0% nor 100%, and corrections have to be applied in these cases (Banaji & Greenwald, 1995). In addition, this analysis cannot be applied to participants with an error rate higher than 50%. Lastly, d' values are difference scores, which have often been criticized because their reliability is low and depends on the correlation between the original variables (see Nunnally & Bernstein, 1994, for an analytical demonstration).

This study introduces an alternative model to analyze GNAT-based measures of implicit associations. The next section introduces the model.
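As a point of reference for the traditional scoring discussed above, the d' computation can be sketched as follows. This is a minimal illustration and not the scoring script used in the study: the function name and its arguments are hypothetical, and no correction for rates of 0 or 1 is applied (the function simply refuses such cases).

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(hit rate) - z(false-alarm rate).

    Hypothetical helper: arguments are raw response counts for one
    participant in one critical block.
    """
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    # Rates of exactly 0 or 1 have no finite z score; a correction
    # would have to be applied before calling this function.
    if not (0.0 < hit_rate < 1.0 and 0.0 < fa_rate < 1.0):
        raise ValueError("hit and false-alarm rates must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A participant who responds at chance (equal hit and false-alarm rates) obtains d' = 0, whereas higher hit rates combined with lower false-alarm rates yield positive values.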

The MFRM and its main advantages

The Many-Facet Rasch Model (Linacre, 1989) derives from the Simple Logistic Model (SLM; Rasch, 1960/1980). Given a response xni to a test item, which is one if the response is correct and zero if the response is incorrect, the ability βn of individual n, and the difficulty δi of item i, the SLM takes the following mathematical form

$$P(X_{ni} = x_{ni} \mid \beta_n, \delta_i) = \frac{\exp[x_{ni}(\beta_n - \delta_i)]}{1 + \exp(\beta_n - \delta_i)} \tag{1}$$

We can note that the model expresses, as in a logistic regression, the probability of obtaining a certain response as a function of the difference between the ability of the individual and the difficulty of the item (βn − δi). The more able the individual and the easier the item, the more probable a correct response; the less able the individual and the more difficult the item, the less probable a correct response. If we use Equation 1 to calculate the probabilities associated with the events “correct response” and “incorrect response”, we obtain, respectively

$$P(X_{ni} = 1 \mid \beta_n, \delta_i) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)} \tag{2}$$

$$P(X_{ni} = 0 \mid \beta_n, \delta_i) = \frac{1}{1 + \exp(\beta_n - \delta_i)} \tag{3}$$

Taking the logarithm of the ratio of (2) to (3), we obtain

$$\ln \frac{P(X_{ni} = 1 \mid \beta_n, \delta_i)}{P(X_{ni} = 0 \mid \beta_n, \delta_i)} = \ln \frac{\exp(\beta_n - \delta_i) / [1 + \exp(\beta_n - \delta_i)]}{1 / [1 + \exp(\beta_n - \delta_i)]} = \beta_n - \delta_i \tag{4}$$
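The relation between Equations 1 and 4 can be checked numerically. The sketch below is only illustrative; the function names and the example values of β and δ are assumptions chosen for the demonstration.

```python
import math


def p_response(x, beta, delta):
    """Equation 1: probability of response x (1 = correct, 0 = incorrect)
    for a person of ability beta on an item of difficulty delta."""
    return math.exp(x * (beta - delta)) / (1.0 + math.exp(beta - delta))


def log_odds(beta, delta):
    """Equation 4: log of the ratio between the probabilities of a
    correct and of an incorrect response."""
    return math.log(p_response(1, beta, delta) / p_response(0, beta, delta))
```

For any pair of values the two probabilities sum to one, and `log_odds(beta, delta)` returns `beta - delta` up to floating-point error, which is the additive logit form used in the rest of the paper.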

As is evident in this formulation, we can introduce further parameters (facets) that all lie on the same trait. It therefore follows, as far as the application of the model to implicit techniques is concerned, that it is possible to introduce a third parameter that accounts for the effect of the association task on the probability of obtaining a given response. This parameter is defined as the “condition of association” (γj), which assumes a different value for each different task (often called critical block) that is analyzed in the same model. For instance, in a 2-block GNAT measuring implicit prejudice toward black people, one critical block (condition of association) would ask the participant to identify stimuli representing black people and good words (j = 1) and another one would ask to identify black people and bad words (j = 2). If the GNAT employed four critical blocks, j = 3 would represent, e.g., the block white people and good words and j = 4 would represent, e.g., white people and bad words. Hence the model assumes the following 3-facet formulation

$$\ln \frac{P(X_{nij} = 1 \mid \beta_n, \delta_i, \gamma_j)}{P(X_{nij} = 0 \mid \beta_n, \delta_i, \gamma_j)} = \beta_n - \delta_i - \gamma_j \tag{5}$$
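To make Equation 5 concrete, the following sketch converts the three-facet logit back into the probability of a correct response. The function name and the parameter values used in the comments are illustrative assumptions, not estimates from the study.

```python
import math


def p_correct(beta, delta, gamma):
    """Probability of a correct response under Equation 5.

    beta: participant ability, delta: stimulus difficulty,
    gamma: difficulty of the condition of association (all in logits).
    """
    logit = beta - delta - gamma
    return 1.0 / (1.0 + math.exp(-logit))


# With ability and stimulus held fixed, a more difficult condition
# (larger gamma) lowers the probability of a correct response.
easy_block = p_correct(beta=1.0, delta=0.0, gamma=0.5)
hard_block = p_correct(beta=1.0, delta=0.0, gamma=1.0)
```

Because the three parameters enter the logit additively, a change of one logit in any facet has the same effect on the log-odds of a correct response.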


The Rasch model parameters are additive, fully satisfying one of the essential requisites of interval measures, and they are based on the transformation of scores into a logit scale, a logarithmic transformation of the odds of producing a particular response given certain conditions (participants’ ability, stimuli recognizability, and conditions’ difficulty). In Equation 5, the logit can be seen as the dependent variable, whereas the various factors (for example participants, items and conditions) act as independent variables that influence (or control) the response. The MFRM is a member of the Rasch family of models; therefore it is characterized by specific objectivity (or relational invariance), linearity, and measurement units (for a discussion of these properties see, for example, Andrich, 1988). Specific objectivity (SO) is one of the most interesting features of Rasch models for the analysis of implicit measures. SO postulates that within the same frame of reference, the comparison between objects (e.g. the difficulty of conditions of association) should be independent of other objects (e.g. subjects’ ability). A detailed explanation and proof of SO is provided in the Appendix.

Sufficient Statistics. In Rasch models, the marginal sums of the data matrix are sufficient statistics to estimate the parameters. In the case of Equation 5, given a participants × conditions × items data matrix in which the response of subject n to item i in condition j is xnij, and where xnij has the value 0 if a wrong answer is given and 1 if a correct answer is given, the sufficient statistics for parameter estimates are:

$\sum_i \sum_j x_{nij}$ for participants, $\sum_n \sum_j x_{nij}$ for stimuli, and $\sum_n \sum_i x_{nij}$ for conditions. Hence, sufficient statistics can be interpreted as accuracy scores.

In summary, there are many advantages to using the MFRM to analyze association strengths: 1. all the facets lie on a common dimension of categorization accuracy; 2. as a consequence of SO, the measures obtained by the model are sample-, item- and condition-free (hence any parameter estimate can be compared with any other); 3. specific goodness-of-fit statistics allow us to assess the fit of the data to the model, and they help us to interpret the results; such statistics allow an examination of the data that is not only general or comprehensive, such as that available from internal consistency statistics, but also evaluates the fit of each single item, participant or association condition; 4. the experimental procedure can be limited to the simple registration of correct or incorrect responses, while still yielding interval measures that express the latent trait; 5. the estimation of the parameters and the calculation of the associated measurement errors provide a simple and direct means of determining the significance of the differences between the experimental conditions, which will then represent individual and group-level measures of implicit associations.

In the following section we provide an MFRM analysis of a GNAT study, highlighting how this model helps us to answer important research questions regarding, for instance, the significance of group- and individual-level scores of implicit association, the quality of the stimuli utilized, potential confounding factors (e.g. task-set switching ability), the appropriateness of the sample size, and many others.
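The sufficient statistics described above are simply marginal sums of the 0/1 accuracy matrix. The sketch below illustrates this on a toy data set; the nested-list layout `x[n][j][i]` and the function name are assumptions made for the example, and the data are invented.

```python
def marginal_sums(x):
    """Return the accuracy totals for participants, stimuli and conditions.

    x[n][j][i] is 1 if participant n answered stimulus i correctly in
    condition j, and 0 otherwise (hypothetical layout).
    """
    n_part, n_cond, n_stim = len(x), len(x[0]), len(x[0][0])
    participants = [sum(sum(block) for block in x[n]) for n in range(n_part)]
    stimuli = [
        sum(x[n][j][i] for n in range(n_part) for j in range(n_cond))
        for i in range(n_stim)
    ]
    conditions = [sum(sum(x[n][j]) for n in range(n_part)) for j in range(n_cond)]
    return participants, stimuli, conditions


# Toy data: 2 participants x 2 conditions x 3 stimuli.
toy = [
    [[1, 1, 0], [1, 0, 0]],  # participant 0: condition 0, condition 1
    [[1, 1, 1], [0, 1, 0]],  # participant 1: condition 0, condition 1
]
participants, stimuli, conditions = marginal_sums(toy)
```

Each of the three lists sums to the same grand total of correct responses, which reflects the fact that the facets are estimated from different margins of the same data matrix.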

Method

Participants, Materials and Procedure

The study sample consisted of 60 psychology undergraduates from the University of Padua who participated in the study for no reward. The experimental procedure (run with the Inquisit software) provided a GNAT for the evaluation of participants’ associations among sweet food, salty food, good words and bad words (evaluative attributes).

The GNAT is a single-category association task, in which participants are asked to “catch” words or images belonging to two categories (attribute and target) by pressing the space bar (Go) and to ignore (No Go) all other stimuli (distracters). In one of the two critical blocks (conditions of association), participants were asked to press the space bar if the stimulus in the centre of the screen belonged to one of two categories indicated under the stimulus (“Sweets” and “Good words”), and to do nothing if the stimulus did not belong to either of these two categories. Distracters were bad words and images of salty food. In a second critical block, participants were asked to press the space bar if the stimulus belonged to the categories “Sweets” and “Bad words”. Distracters in this block were good words and images of salty food. The GNAT effect is based on the individual difference in performance (accuracy) between these two critical blocks. Two more critical blocks were provided to evaluate the implicit attitude toward salty food. Errors were followed by a red cross. Stimuli appeared in black on a white background. Figure 1 provides all stimuli used in the procedure. In this figure, upper case labels (e.g. BEAUTIFUL) refer to words and lower case labels (e.g. Salty1.jpg) refer to photographs of actual products. The fixation point lasted 400 ms; the response windows lasted from 500 to 650 ms for distracters and from 700 to 850 ms for target stimuli. Following a three-wave longitudinal design, participants completed a first GNAT at time 1, a second GNAT after a week (time 2) and a third GNAT a month after time 1 (time 3).

The MFRM and the analysis of the GNAT

We adopted a four-facet Rasch Model. The dependent variable “accuracy” is represented by the value 0 in the case of an error, and by the value 1 in the case of a correct response. The parameters β, δ, γ, and τ represent the location on the latent trait of participants, stimuli, conditions of association, and time when the measures were taken, respectively. The four conditions asked participants to identify sweet food and good words, sweet food and bad words, salty food and good words, and salty food and bad words.

Results

Data analysis was completed using Facets v. 3.66.1 (Linacre, 2009a). All fit indexes of the model were satisfactory. The data log-likelihood chi-square is L2(26217) = 19054.16 (p > .99). L2 is an index of global fit that provides a test of the divergence between observed and expected scores with degrees of freedom equal to the number of observations less the number of free parameters (Fisher, 1970). This statistic often shows significant misfit, especially when the d.f. is high. Moreover, when analyzing automatic association data, participants are expected to perform differently across the conditions (i.e. critical blocks), and this situation increases the global misfit represented by the log-likelihood chi-square. For this reason, Infit and Outfit indexes are more useful.

Infit and Outfit represent the relationship between observed and model-derived response probabilities and they range from zero to infinity. Statistics equal to or near 1 indicate close correspondence between observed and expected values; statistics above 1 indicate the presence of greater variance than that modeled (Noise); statistics below 1 indicate the existence of lower variance in the data than that predicted by the model (Muting). Infit/Outfit values higher than 2 signal the presence of serious distortions in the data; values between 1.5 and 2 indicate the presence of distortions that do not, however, bias the overall goodness-of-fit of the measurement system; values between .5 and 1.5 indicate a good fit of the data to the model; values less than .5 signal the presence of distortions capable of artificially inflating reliability measures and internal consistency without altogether biasing the overall goodness-of-fit of the measurement system (Linacre, 2009a; Linacre & Wright, 1994).

The difference between the Infit and the Outfit values derives from the way in which such statistics are calculated. Both are based on the differences, calculated for all responses to all stimuli for each participant, between observed responses and model-derived response probabilities. These residuals ($R_{nij} = E_{nij} - X_{nij}$) are given by the difference between the model-derived expected scores ($E_{nij}$) and the observed scores ($X_{nij}$), and they can be standardized by dividing them by the square root of the variance of the expected scores ($V_{E_{nij}}$). Details on how $E_{nij}$ and $V_{E_{nij}}$ are computed can be found in Myford and Wolfe (2003). The Outfit statistic is a mean of the squares of the standardized residuals (that is, a mean of the variances), while the Infit statistic is calculated by weighting each squared standardized residual by the variance of: a) each participant, if the statistic regards an item or elements of other facets; b) each item, if the statistic regards a participant. For example, the Outfit of a participant is computed according to Equation 6:

$$\text{Outfit} = \frac{\sum_{j=1}^{J} \sum_{i=1}^{I} Z^2_{R_{nij}}}{JI} \tag{6}$$

where $Z_{R_{nij}}$ is the standardized difference between model-derived expected scores and those observed in the data (standardized residual); J is the number of conditions in the analysis; I is the number of stimuli. The Infit for the same participant is weighted in order to give less importance to extreme scores, and is computed according to Equation 7:

$$\text{Infit} = \frac{\sum_{j=1}^{J} \sum_{i=1}^{I} Z^2_{R_{nij}} V_{E_{nij}}}{\sum_{j=1}^{J} \sum_{i=1}^{I} V_{E_{nij}}} \tag{7}$$
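Equations 6 and 7 can be sketched for a single participant as follows. This is a simplified illustration rather than the Facets implementation: it assumes dichotomous responses, for which the expected score is the model probability p of a correct response and its variance is p(1 − p); the function name and the flat list layout (one entry per condition-by-stimulus response) are hypothetical.

```python
def fit_statistics(observed, expected):
    """Return (outfit, infit) for one participant.

    observed: 0/1 responses; expected: model probabilities of a correct
    response for the same responses, each strictly between 0 and 1.
    """
    # Assumption: dichotomous case, variance of the expected score = p(1 - p).
    variances = [p * (1.0 - p) for p in expected]
    # Squared standardized residuals, Z^2 = (E - X)^2 / V.
    z_squared = [(p - x) ** 2 / v for x, p, v in zip(observed, expected, variances)]
    outfit = sum(z_squared) / len(z_squared)  # Equation 6
    infit = sum(z * v for z, v in zip(z_squared, variances)) / sum(variances)  # Equation 7
    return outfit, infit


# One unexpected error on a very easy response (p = .9) among three responses.
outfit, infit = fit_statistics([0, 1, 1], [0.9, 0.5, 0.5])
```

In this toy case the single surprising error inflates the Outfit more than the Infit, because the variance weights in Equation 7 down-weight responses whose expected probability is far from .5.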

Because of this weighting, the Outfit statistic places greater emphasis on the residuals associated with responses that are farther from the measure of a given element, whereas the Infit statistic gives greater emphasis to those responses that are nearest to the measure of a given element (Bond & Fox, 2001; Wright & Masters, 1982). In our data, for the facet association conditions, the Infit ranges from .95 (Salty+Good) to 1.02 (Salty+Bad). For the time facet, Infit ranges from .98 to 1.02; for the participants facet, Infit values range from .94 to 1.05 and for the items facet, Infit values range from .95 to 1.08, all well within the acceptable range. Following Smith (1996, 2002), we can interpret this result as evidence of unidimensionality of the latent trait. Because previous studies have found that positive and negative formulations of items sometimes load on different dimensions, we deemed it important to provide further evidence of unidimensionality. Hence, following Linacre (2009b), we estimated individual abilities (βn) at the categorization task in two different models. The first one included only positive words, and the second one included only negative words (alternatively, the same test could also have been run for sweet and salty food, yet we think the latter would be less relevant). The uncorrected zero-order Pearson correlation between the two series of participants’ abilities is positive and nearly perfect (r > .99). Finally, a Principal Components Analysis was run on standardized residuals. Unidimensionality is supported by this analysis as well, as the largest eigenvalue is equal to 2.54. In the estimated model, measurement error (i.e. the SE of the estimates) is quite low, both for the conditions and for the items (between .03 and .16), indicating a good level of measurement accuracy.

Figure 1 represents how the various elements of the four facets lie on the “categorization accuracy” latent trait. The first column concerns the axis of the latent trait on which the various measures lie, and the values displayed are on the logit scale. The second column shows the difficulty of the stimuli. The third column shows the difficulty of each observation (time), the fourth provides participants’ ability and the fifth column shows the location of the four conditions of association analyzed.

Insert Figure 1 about here

For the conditions facet (see Table 1), a chi-square may be used to test whether all elements of the facet (four in this case) have the same logit value. The Fixed (all same) Chi-square tests the hypothesis that all the elements of the facet have the same logit in the population, in relation to the measurement error (SE). Hence, it tests the hypothesis that there is no group-level implicit association between targets and attributes. An approximation to the theoretical distribution of chi-square can be obtained as indicated in Equation (8),

$$\chi^2 = \sum_j \frac{\gamma_j^2}{SE_j^2} - \frac{\left( \sum_j \frac{\gamma_j}{SE_j^2} \right)^2}{\sum_j \frac{1}{SE_j^2}} \tag{8}$$

where the statistics are calculated for the facet conditions, with j = 1, ..., L and d.f. = L – 1, where L is the number of elements in the facet.
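A minimal sketch of the fixed (all same) chi-square is given below, written as a precision-weighted sum with weights 1/SE². The function name is hypothetical and the input values in the comment are invented for illustration; they are not the estimates reported in Table 1.

```python
def fixed_chi_square(measures, standard_errors):
    """Return (chi_square, degrees_of_freedom) for the hypothesis that
    all elements of a facet share the same logit measure."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    weighted_squares = sum(w * m ** 2 for w, m in zip(weights, measures))
    weighted_sum = sum(w * m for w, m in zip(weights, measures))
    chi_square = weighted_squares - weighted_sum ** 2 / sum(weights)
    return chi_square, len(measures) - 1


# Invented example: two conditions one logit apart, each with SE = .10.
chi_square, df = fixed_chi_square([0.5, -0.5], [0.1, 0.1])
```

When all measures are identical the statistic is zero regardless of the standard errors, and it grows as the measures spread apart relative to their precision.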

Insert Table 1 about here

In this case the chi-square value is 353.10 (with 3 d.f. and p < .001), thus at least one association is significantly different from the others. In Rasch measures, it is possible to compare different logits by dividing their difference by the square root of the sum of their error variances

$$t = \frac{\gamma_1 - \gamma_2}{\sqrt{SE_1^2 + SE_2^2}} \tag{9}$$
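Equation 9 amounts to the following computation. The function name and the numerical values are illustrative assumptions only.

```python
import math


def logit_contrast_t(measure_a, se_a, measure_b, se_b):
    """Standardized difference between two logit measures (Equation 9)."""
    joint_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (measure_a - measure_b) / joint_se


# Invented example: measures of .50 and .10 logits with SEs of .03 and .04.
t_value = logit_contrast_t(0.50, 0.03, 0.10, 0.04)
```

Here the joint standard error is .05, so a difference of .40 logits corresponds to a t of about 8.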

The standardization provides a value belonging to the Student’s t distribution, with degrees of freedom equal to the sum of “free” observations of the two elements. In this case, standardized estimates help to compare the four different associations under investigation. In our data, the Salty+Bad association is the most difficult and specifically it is significantly more difficult than the condition Salty+Good (t(1) = 14.60; p < .001). The condition Sweet+Good is more difficult than the condition Sweet+Bad (t(1) = 13.80; p < .001). The conditions Sweet+Good and Salty+Bad do not differ significantly in difficulty (t(1) = 3.00; p = .20), and neither do the conditions Sweet+Bad and Salty+Good (t(1) = 2.60; p > .23). Hence we observed a positive implicit attitude towards salty food and a negative implicit attitude towards sweet food of approximately the same intensity.

In order to further analyze the implicit associations we measured, it is possible to consider three different indexes: the G, H, and R statistics. These are based on the same information, but they highlight different aspects of it. For example, when the elements of the facets are expected to be homogeneous, R is most useful, since a value below .5 indicates that any differences of logit can be completely attributed to measurement error (Linacre, 2009a). Alternatively, if differentiation is expected (as is the case when the aim is to discriminate between the abilities of individuals), indexes G and H are more useful.

The Separation ratio (G) represents a measure of the difference between the scores obtained by the elements of the facet in relation to their precision (Linacre, 2009a; Myford & Wolfe, 2003). It is expressed as the ratio between the “true” standard deviation (that is, the standard deviation of the estimates corrected for measurement error: (adj SD)² = SD² − RMSE²) and the root mean square standard error of the elements (RMSE). Therefore, G = (adj SD)/RMSE (see Linacre, 2009b for computational details). The separation ratio (G) is extremely important in the analysis of the critical blocks (e.g. good vs. bad) utilized in the experimental procedures. If only two conditions are included in the analysis, their separation ratio is a measure of the mean automatic association effect among participants. The G of the facet conditions can be interpreted as a measure of the sensitivity of the instrument, and therefore it is the first index to look at in, for example, a study where groups are strongly polarized and where the expected value is obviously elevated. The separation of the participant facet is not as important as in traditional tests, where it represents a measure of the resulting discrimination. In classical intelligence and attainment tests we would expect a high person separation value. In the case of implicit measures it is different, because the measure of association is based on a comparison (bias/interaction analysis) between the performance in one condition (e.g. bad) and that in another condition (e.g. good). In implicit techniques, the general level of performance (speed and accuracy of response) is not of direct interest. We could, in theory, obtain an optimal measure of implicit association even without discriminating between participants in terms of their ability in completing the tasks. G for the participants’ facet simply gives us an idea of how much the difficulty of the procedure varies across participants; all things being equal, it is preferable to obtain a measure that is equally difficult for all participants, and therefore we expect low indexes of separation between participants. As far as the facet item is concerned, G provides useful information concerning the degree to which the stimuli represent the trait examined.

The second statistic on separation that we describe is the Separation index (H), which is closely related to G, as it defines the number of different groups (heterogeneous between themselves but internally homogeneous) that can be identified within the facet (Wright & Masters, 1982). If the cut-off point is set to three standard deviation points, and the standard deviation for measurement error is considered, then H = (4G + 1)/3. H is useful while interpreting the participants and stimuli facets, and less useful for the conditions facet. Indeed, H assumes that the estimates are normally distributed (Wright & Masters, 2002). When the elements of a facet are too few to run a test of normality, H cannot be meaningfully computed. The last statistic of this group is Separation Reliability (R), which indicates how well the elements of a facet separate out to reliably represent the facet. It is an estimate of the ratio of true variance to observed variance, therefore: R = (True SD)² / (Observed SD)² = G² / (1 + G²), where Observed SD is the standard deviation of the estimates (not corrected for measurement error). If R < .5, the value of G (separation) is probably due to measurement error. The expected value is low if homogeneity is expected among the elements of a facet and high if separation is expected. For example, in a situation in which several judges rate a series of participants on N factors, it would be expected that R is high for factors and for participants, but low for judges, so that we can say that the factors measure the same dimension, that the ratings have discriminated between the various participants, and that the judges are consistent in their rating (Myford & Wolfe, 2003). In the case of experimental procedures for the assessment of automatic associations, the reliability (R) of the items gives us a measure of their equivalence (or interchangeability). Thus, it is desirable to obtain low reliability indexes for the facet item.
In our data, the separation index G of the conditions facet is 12.70, hence the group-level implicit association is very high and reliable (R = .99).
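The three separation statistics can be sketched as follows, assuming that the "true" variance is obtained by subtracting the squared RMSE from the observed variance of the estimates. Function names are hypothetical, and the observed SD and RMSE used in the test-style example are invented.

```python
import math


def separation_statistics(observed_sd, rmse):
    """Return (G, H, R) from the observed SD of the estimates and the RMSE."""
    # "True" variance: observed variance corrected for measurement error.
    true_variance = max(observed_sd ** 2 - rmse ** 2, 0.0)
    g = math.sqrt(true_variance) / rmse  # separation ratio
    h = (4.0 * g + 1.0) / 3.0            # separation index
    r = g ** 2 / (1.0 + g ** 2)          # separation reliability
    return g, h, r


def reliability_from_separation(g):
    """Separation reliability expressed directly as a function of G."""
    return g ** 2 / (1.0 + g ** 2)
```

Applying `reliability_from_separation` to the separation ratios reported in this paper reproduces the reported reliabilities after rounding to two decimals: G = 12.70 gives .99 for the conditions facet, G = 3.25 gives .91 for participants, and G = 1.74 gives .75 for stimuli.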

Turning to the other facets, participants were centered around a logit value of 0 (SD = .51). Among them, we can find 3 outliers below logit -1.00 (least accurate) and one above logit 1.00 (most accurate). Both a visual inspection of Figure 1 and the Kolmogorov-Smirnov test (Chakravarti, Laha, & Roy, 1967) suggest that the measures of stimuli and participants can be considered normally distributed (Zstimuli = .789, p = .56; Zpart = .48, p = .98), hence the assumptions for the computation of H are likely to be met. The separation ratio (G) of the facet participants was 3.25. The index of separation indicates that the number of heterogeneous groups that can be identified is 4 (at three SDs from one another). The reliability of this separation (R) is .91. These values indicate moderate individual differences in the ability of categorizing stimuli in the GNAT. As already noted, participants’ ability at the categorization task is not of direct interest when analyzing an implicit measure, but low person separations are preferable, because they would suggest that the technique is not influenced by participants’ task-set switching ability (see e.g. McFarland & Crouch, 2002). MFRM estimates, however, are sample-independent (a consequence of SO, see Appendix), hence even if the technique might suffer from such a potential confound, the Rasch estimates of the implicit associations do not. The MFRM also provides useful information on the stimuli used to measure the implicit construct under investigation. Poorly fitting items are dangerous for implicit techniques because they might be ambiguous or they might represent concepts other than those activated by the other stimuli, whereas extreme items are dangerous because they might trigger reverse priming effects (Glaser & Banaji, 1999). In this case, the fit indexes for the stimuli facet are all well inside the acceptable range (.94 < Infit < 1.08), suggesting unidimensionality. Yet, as can be seen in Figure 1, some extreme items (> 2 SDs from the mean logit) were included in the procedure, and should be avoided in future applications. Specifically, they are pictures of mass-produced jelly rolls, which were very different from the home-made cakes represented by the other stimuli.

On the other hand, a very attractive chocolate cake (a “Sacher torte”) turned out to be too easy to categorize and should be avoided as well. The reliability of the stimuli is .75 (G = 1.74; H = 2.76), meaning that the sample is sufficiently large (although not huge) and that the stimuli measures are adequately spread. The time when measures were taken shows a learning effect that influences the categorization task (χ2(2) = 189.60, p < .001; G = 7.46; R = .98). Time 1 is the most difficult, and times 2 and 3 do not differ from each other (t(1) = 1.20; p = .44). The second and third time participants took the GNAT, they were more accurate. However, this learning effect does not influence the individual estimates of implicit attitude toward sweet food (F(2,56) = .36; p = .70) and salty food (F(2,56) = 1.01; p = .37).

MFRM also provides the possibility of running bias/interaction analyses between two or more facets. The “Differential Person Functioning” (DPF) analyzes the interaction between the elements of the facet “Participants” and elements of other facets. Of particular interest in this context is the interaction with the facet “Conditions”. The Bias index involves introducing an interaction parameter into the model between the facets (e.g. ξnj for the interaction participants by conditions). With the aim of evaluating interaction significance, such parameters are often transformed into t values according to the following formula

$$t = \frac{\xi_{nj}}{\sqrt{SE_n^2 + SE_j^2}} \tag{10}$$

Bias terms (ξnj) are calculated using a two-stage calibration. In the first stage the incomplete model, without interaction, is estimated. Subsequently, all the parameters are anchored to the values that have been previously calculated, and only ξnj is estimated (Linacre, 2009a). The bias term represents the distance between expected and observed scores in logit units. In our analysis, it is an estimate of the implicit association between each pair of categories included in the task (i.e. Salty+Bad, Salty+Good, Sweet+Bad, Sweet+Good). Differential estimates of the implicit attitude toward the target can be obtained by subtracting the value of the beta parameter of a condition (e.g. Sweet+Bad) from the value of the beta parameter of another condition (e.g. Sweet+Good). These are also called pairwise contrasts, and their standard error is $SE_{nj} = \sqrt{SE_n^2 + SE_j^2}$, also called joint SE (Linacre, 2009a). Dividing the contrast by its joint SE, a t is obtained.

Insert Figure 2 about here

The plot in Figure 2 provides individual t values from the pairwise comparison between the negative and the positive condition of each target (sweet and salty food). Altogether, 15 participants show significant implicit associations. Six of them, who are highlighted by squares in the figure, show a positive implicit attitude toward sweet food. Five show a positive implicit attitude toward salty food (circles) and four show a negative implicit attitude toward salty food (triangles). As an example, Table 2 provides relevant information for the 9 participants who showed a significant positive or negative implicit evaluation of salty food.

Insert Table 2 about here

In this table, the “Salty+Good” and “Salty+Bad” columns show the difficulty (in logits) of the conditions, together with the associated standard error. The “Contrast” columns display the difference in logits between the two conditions, the standard error of the contrast, the value of t associated with this difference, the degrees of freedom and the associated significance level (two-tailed).

The t values that can be computed with a DPF analysis are standardized, usually reliable, easily interpretable and approximately normally distributed (if d.f. > 30) estimates of the individual implicit association studied. They can be computed both for a single element (dividing the Rasch measure by its SE) and for a difference of logits (using the joint SE). The reliability (r00) of these MFRM-derived implicit association scores can be computed according to its classical definition (true variance + error variance = observed variance). Specifically, the variance of the single (ξnj) or pairwise (contrast) measure of association across subjects (“true” variance) should be divided by the sum of true variance and error variance (the mean across subjects of the squared standard errors). Table 3 provides mean Rasch measures of association, the standard deviations and the reliabilities of these measures, separately by the time at which the GNAT was taken.

As can be seen in Table 4, these estimates are substantially correlated with the scoring procedure originally suggested for the GNAT (d'), but they never share more than 38% of variance. The statistic d' is computed as the difference between the standardized proportion of correct responses to targets (Hits) and the standardized proportion of errors to distracters (False Alarms). According to Signal Detection Theory (Green & Swets, 1966), this index provides a measure of the individual ability to discriminate signal (target stimuli) from noise (distracters). Although the limited overlap we found between the two alternative scoring methods might be due to the low reliability of d' scores (mean α = .22), we think the two measures are actually different, and that Rasch estimates are preferable because they are more reliable (mean r00 = .58). A direct comparison against concurrent measures (e.g. other implicit and/or explicit measures of the same construct, or measures of actual behavior theoretically related to the construct under investigation) might indicate which model, and related scoring, better fits the needs and expectations of a researcher using the GNAT or any other implicit technique.
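The two scoring quantities discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reliability follows the verbal definition given in the text, and the use of the sample (n - 1) variance is our assumption, since the text does not specify which estimator was used.

```python
from statistics import NormalDist, mean, variance

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(Hits) - z(False Alarms), with z the inverse standard normal CDF."""
    z = NormalDist().inv_cdf  # raises StatisticsError for proportions of 0 or 1
    return z(hit_rate) - z(false_alarm_rate)

def association_reliability(measures, standard_errors):
    """Variance of the measures across subjects divided by the sum of that
    variance and the mean squared standard error, as described in the text."""
    spread = variance(measures)  # assumption: sample (n - 1) variance
    error = mean(se ** 2 for se in standard_errors)
    return spread / (spread + error)

# Hypothetical proportions for one participant in one condition.
print(round(d_prime(0.80, 0.20), 2))
```

Note that the inverse normal transformation is undefined for proportions of exactly 0 or 1, so `d_prime` raises an error in those cases rather than returning a score.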

Insert Table 3 about here

Insert Table 4 about here

Discussion

The present study has demonstrated the applicability of the MFRM to data obtained using the GNAT procedure, and the multiple ways in which it can be used. The most relevant aspects will now be discussed.

Latent trait. The statistical procedure allows the determination of the goodness-of-fit of the data to the model through a specific index of global fit (L2) and through item-level standardized Infit and Outfit statistics that, if adequate, allow us to consider the facets as aspects of the same trait, with a common measurement unit. Our results showed that all standardized Infit and Outfit statistics were far from the critical level of 2. Furthermore, we showed that participants’ parameters do not change whether the positive or the negative stimuli alone are included in the analysis, and that a principal components analysis on standardized residuals extracts components with an eigenvalue smaller than 3 (see Linacre, 2009b). Hence, we can say that a unidimensional latent trait has been defined as precision on the association task, for participants, items, time of observation, and conditions of association.

Reliability analysis. Specific indices (R, G and H) allow the analysis of the reliability of the separation of the elements in each facet. In particular, the Person Separation Index (H = 4.66) has shown that at least four groups of participants can be identified for which the task is differently difficult. The Item Reliability Index (R = .75) showed that the sample size is adequate, although not huge. The Separation Ratio of the Time facet (G = 7.46) showed that a learning effect took place, since participants are much more accurate at Times 2 and 3. The Reliability Index for the last facet suggests that the conditions are reliably different (R = .99), and indicates the existence of a group-level positive implicit attitude toward salty food and a negative implicit attitude toward sweet food.

Item analysis. The Infit and Outfit values indicate the goodness-of-fit of the items to the latent trait and allow the selection of the more unidimensional stimuli. Furthermore, this analysis can be extended to any other facet of the model as, for example, the conditions or the participants. The estimates of the parameters and the associated standard errors allow the statistical evaluation of the different/identical location of the stimuli on the latent trait, and, therefore, their recognizability. In implicit techniques the choice of stimuli is extremely important, because they directly affect the validity of the measure. They should adequately represent the subject of the study, and they should all be equally recognizable. When response times or recognition errors of a categorization task are analyzed, the δ parameters, which typically reflect item difficulty, represent both the “recognizability” and the “prototypicality” of the stimuli with respect to the nominal category of interest. A longer word may take more time to read, and consequently more time to respond to. But the prototypicality of a stimulus can also influence the accuracy (or the speed) of the response, both in reading and in making decisions. For example, the stimulus “Duck-billed platypus” is not especially representative of the category “Mammals”, which would be better represented by a stimulus such as “Monkey” or “Elephant”. The analysis of the δ parameters allows the diagnosis of any anomalies arising from the choice of stimuli.

Interactions. The model allows an analysis of the interaction between various facets. We considered interactions between participants and conditions (Differential Person Functioning), because this analysis provides individual estimates of the implicit associations under investigation and represents a scoring procedure that is different from that already present in the literature (d', Nosek & Banaji, 2001). Although we have seen that our Rasch-based estimates of implicit association are much more reliable than d' scores, future criterion-related validity studies might suggest which scoring best suits the needs of an implicit measurement. However, from a mathematical point of view, t values derived from a Rasch analysis are superior to d' for several reasons. First, d' cannot be computed for participants with an error rate greater than 50%, or with a proportion of hits or false alarms equal to zero or 100%. Second, t values are preferable because we know their distribution and because they are computed from measures that are independent, by definition, from the elements of all other facets included in the analysis. As a consequence, there is no risk that a Rasch-based measure of implicit association is influenced by participants’ ability or by the time when measures are taken, which have been found to be serious potential confounds (McFarland & Crouch, 2002; Robusto, Cristante & Vianello, 2008). Lastly, Blanton and Jaccard (2006) recently raised the issue of the arbitrariness of the metric of implicit measures. The MFRM elegantly solves this potential problem, because it attributes a rational zero point to the measurement undertaken, which can be fixed to the mean of the implicit measure obtained.
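The separation indices discussed under “Reliability analysis” are linked by standard Rasch relations: G is the ratio of the error-adjusted standard deviation of the measures to their root mean square error, R = G²/(1 + G²), and H = (4G + 1)/3. The sketch below, with hypothetical function names, applies these relations to the rounded G values reported for the Time and Conditions facets; small discrepancies from the reported indices are to be expected, because the published values are rounded.

```python
def reliability_from_separation(g):
    """Separation reliability R implied by the separation ratio G."""
    return g ** 2 / (1 + g ** 2)

def strata_from_separation(g):
    """Number of statistically distinct strata H implied by G."""
    return (4 * g + 1) / 3

# Rounded separation ratios reported in the text and in Table 1.
for facet, g in (("Time", 7.46), ("Conditions", 9.28)):
    r = reliability_from_separation(g)
    h = strata_from_separation(g)
    print(f"{facet}: G = {g:.2f}, R = {r:.2f}, H = {h:.2f}")
```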

Conclusions

Implicit measures have had a great impact on psychological research and have been extensively studied. At the time this paper was written, the three most used techniques (EP, Fazio et al., 1986; IAT, Greenwald et al., 1998; GNAT, Nosek & Banaji, 2001) had been cited in almost 3500 different papers (source: Google Scholar). Yet, they are far from being a closed chapter, from both a theoretical and a methodological point of view. As far as theory is concerned, many contributions keep clarifying what an implicit measure is (and is not). For instance, Gawronski, LeBel and Peters (2007) recently questioned some common assumptions about the stability, the lack of consciousness, and the resistance to social desirability of implicit measures, concluding that the available evidence is still equivocal. As far as method is concerned, a number of papers have scrutinized the psychometric properties of implicit measures, which have also been recently and comprehensively reviewed or meta-analyzed in many others (e.g. Greenwald, Poehlman, Uhlmann, & Banaji, 2009; Hofmann & Schmitt, 2008; Lane, Banaji, Nosek & Greenwald, 2007; Schnabel, Asendorpf & Greenwald, 2008). In addition, some contributions proposed empirically validated scoring algorithms (Greenwald, Nosek & Banaji, 2003), multinomial models to separate automatic and controlled processes (Conrey et al., 2005) and applications of formal models previously developed to analyze latencies (Klauer, Voss, Schmitz & Teige-Mocigemba, 2007).

Yet, many methodological issues are still open. For example, no previous contribution successfully solved the problems of metric arbitrariness (Blanton & Jaccard, 2006) and those related to participants’ ability (McFarland & Crouch, 2002). These issues have been addressed in this paper, in which the MFRM was proposed as a formal model for the analysis of the GNAT. Specifically, we proposed an analysis strategy that fits implicit measures and the most common needs of the researchers who use them, showing that the MFRM provides a number of statistics and useful information that are not available in other models. For example, we described the Separation Index G, which gives a sample-level effect size of the intensity of the implicit associations under investigation; the logit scale and the fit indexes, according to which bad stimuli can be avoided in future applications; and finally the bias term and the associated t value as individual estimates of the implicit associations. Notably, we found that Rasch-based individual measures of implicit associations are much more reliable than the d' scores originally proposed for the GNAT. Furthermore, among the many benefits, the MFRM resolves the issue of arbitrariness (Blanton & Jaccard, 2006) because MFRM estimates are centered by construction around a rational zero point (e.g. their mean). Lastly, Rasch-based individual measures of implicit association are, by definition, independent from participants’ task-set switching ability; hence they also prevent the implicit measure from being affected by a potential confound that was first studied by McFarland and Crouch (2002).

The measurement model and the analysis strategy we adopted in this paper should easily fit other implicit techniques such as the AMP (Payne et al., 2005). Yet, more formal research has to be conducted before a Many-Facet Rasch model for continuous variables can be used to analyze latency-based techniques of implicit constructs such as the IAT. In the meantime, we hope this paper will motivate researchers to use the MFRM to analyze errors (e.g. in a GNAT) and dichotomous choices (e.g. in an AMP), in order to construct high-quality and readily useful measures of the implicit constructs they are investigating.

References

Andrich, D. (1988). Rasch models for measurement. Beverly Hills: Sage.

Bar-Anan, Y., Nosek, B. A., & Vianello, M. (2009). The sorting paired features task: A measure of association strengths. Experimental Psychology, 56(5), 329-343.

Banaji, M., & Greenwald, A. G. (1995). Implicit gender stereotyping in judgments of fame. Journal of Personality and Social Psychology, 68(1), 181-198.

Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. American Psychologist, 61(1), 27-41.

Bond, T. G., & Fox, C. M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah: Erlbaum.

Bosson, J. K., Swann, W. B., & Pennebaker, J. W. (2000). Stalking the perfect measures of implicit self-esteem: The blind men and the elephant revisited? Journal of Personality and Social Psychology, 79(4), 631-643.

Chakravarti, I. M., Laha, R. G., & Roy, J. (1967). Handbook of methods of applied statistics, Volume I. New York: Wiley.

Conrey, F. R., Sherman, J. W., Gawronski, B., Hugenberg, K., & Groom, C. J. (2005). Separating multiple processes in implicit social cognition: The Quad Model of implicit task performance. Journal of Personality and Social Psychology, 89(4), 469-487.

De Houwer, J. (2003). The extrinsic affective Simon task. Experimental Psychology, 50(2), 77-85.

Fazio, R. H., & Olson, M. A. (2003). Implicit measures in social cognition research: Their meaning and uses. Annual Review of Psychology, 54, 297-327.

Fazio, R. H., Sanbonmatsu, D. M., Powell, M. C., & Kardes, F. R. (1986). On the automatic activation of attitudes. Journal of Personality and Social Psychology, 50(2), 229-238.

Fisher, R. A. (1970). Statistical methods for research workers (14th ed.). Edinburgh: Oliver and Boyd.

Fischer, G. H. (1995). Derivations of the Rasch model. In G. H. Fischer & I. W. Molenaar (Eds.), Rasch models: Foundations, recent developments, and applications. New York: Springer.

Gawronski, B., LeBel, E. P., & Peters, K. R. (2007). What do implicit measures tell us? Scrutinizing the validity of three common assumptions. Perspectives on Psychological Science, 2, 181-193.

Glaser, J., & Banaji, M. R. (1999). When fair is foul and foul is fair: Reverse priming in automatic evaluation. Journal of Personality and Social Psychology, 77(4), 669-687.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.

Greenwald, A. G., McGhee, D. E., & Schwartz, J. L. K. (1998). Measuring individual differences in implicit cognition: The implicit association test. Journal of Personality and Social Psychology, 74(6), 1464-1480.

Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85(2), 197-216.

Greenwald, A. G., Poehlman, T. A., Uhlmann, E., & Banaji, M. R. (2009). Understanding and using the Implicit Association Test: III. Meta-analysis of predictive validity. Journal of Personality and Social Psychology, 97, 17-41.

Higgins, E. T. (1996). Knowledge activation: Accessibility, applicability and salience. In E. T. Higgins & A. W. Kruglanski (Eds.), Social psychology: Handbook of basic principles (pp. 133-168). New York: Guilford.

Hofmann, W., & Schmitt, M. (2008). Advances and challenges in the indirect measurement of individual differences at age 10 of the Implicit Association Test. European Journal of Psychological Assessment, 24(4), 207-209.

Karpinski, A., & Steinmen, R. B. (2006). The Single Category Implicit Association Test as a measure of implicit social cognition. Journal of Personality and Social Psychology, 91(1), 16-32.

Klauer, K. C., Voss, A., Schmitz, F., & Teige-Mocigemba, S. (2007). Process components of the Implicit Association Test: A diffusion-model analysis. Journal of Personality and Social Psychology, 93(3), 353-368.

Lane, K. A., Banaji, M. R., Nosek, B. A., & Greenwald, A. G. (2007). Understanding and using the Implicit Association Test: IV: Procedures and validity. In B. Wittenbrink & N. Schwarz (Eds.), Implicit measures of attitudes: Procedures and controversies (pp. 59-102). New York: Guilford Press.

Linacre, J. M. (1989). Multi-facet Rasch measurement. Chicago: MESA Press.

Linacre, J. M. (2009a). Facets Rasch measurement computer program. Chicago: Winsteps.com.

Linacre, J. M. (2009b). A user's guide to WINSTEPS/MINISTEP Rasch-model computer programs. Chicago: Winsteps.com.

Linacre, J. M., & Wright, B. D. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8, 370. Available at http://rasch.org/rmt/rmt83.htm.

McFarland, S. G., & Crouch, Z. (2002). A cognitive skill confound on the Implicit Association Test. Social Cognition, 20(6), 483-510.

Myford, C. M., & Wolfe, E. W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.

Nosek, B. A., & Banaji, M. R. (2001). The Go/No-go Association Task. Social Cognition, 19(6), 625-666.

Nosek, B. A., Greenwald, A. G., & Banaji, M. R. (2006). The Implicit Association Test at age 7: A methodological and conceptual review. In J. A. Bargh (Ed.), Social psychology and the unconscious: The automaticity of higher mental processes (pp. 265-292). Philadelphia, PA: Psychology Press.

Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York, NY: McGraw-Hill.

Olson, M. A., & Fazio, R. H. (2003). Relations between implicit measures of prejudice: What are we measuring? Psychological Science, 14(6), 636-639.

Payne, B. K., Cheng, C. M., Govorun, O., & Stewart, B. (2005). An inkblot for attitudes: Affect misattribution as implicit measurement. Journal of Personality and Social Psychology, 89(3), 277-293.

Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. Chicago: The University of Chicago Press. (Original work published 1960).

Robusto, E., Cristante, F., & Vianello, M. (2008). Assessing the impact of replication on Implicit Association Test effects by means of the Extended Logistic Model for the Assessment of Change. Behavior Research Methods, 40(4), 954-960.

Schnabel, K., Asendorpf, J. B., & Greenwald, A. G. (2008). Assessment of individual differences in implicit cognition: A review of IAT measures. European Journal of Psychological Assessment, 24(4), 210-217.

Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Structural Equation Modeling, 3(1), 25-40.

Smith, R. M. (2002). Detecting and evaluating the impact of multidimensionality using item statistics and principal component analysis of residuals. Journal of Applied Measurement, 3(2), 205-231.

Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.

Wright, B. D., & Masters, G. N. (2002). Number of person or item strata. Rasch Measurement Transactions, 16, 888.

Appendix

Specific Objectivity in Rasch Measurement

Rasch’s favorite example to explain the meaning of specific objectivity (Rasch, 1960/1980) was related to the joint definition and measurement of mass and force in classical mechanics (see Fischer, 1995). Let $O_v$, $v = 1, 2, \ldots$, be rigid bodies and let their masses be $M_v$. Furthermore, let there be some experimental conditions in which forces $F_i$ are applied to each of the masses, so as to produce accelerations $A_{vi}$. According to Newton's second law (Force = Mass × Acceleration), acceleration is proportional to the force exerted on the object and inversely proportional to the object’s mass: $A_{vi} = M_v^{-1} F_i$. Therefore any two masses $M_v$ and $M_w$ can be compared according to the quotient

$$\frac{A_{vi}}{A_{wi}} = \frac{M_v^{-1} F_i}{M_w^{-1} F_i} = \frac{M_w}{M_v}.$$

This implies that the two masses can be compared independently of (i.e. without knowledge of) the forces that are applied. Rasch (1960/1980) developed a model in which these specifically objective comparisons can be applied to subjects and items of a test. The demonstration starts from the observation that the probability of obtaining a given pattern of responses x from a subject n to k items is the product of the probabilities that subject n has of giving each response:

$$P(x_{n1}, x_{n2}, \ldots, x_{nk}) = P(x_{n1})\,P(x_{n2}) \cdots P(x_{nk}). \qquad (A1)$$

Equation 1 describes the probability of a response $X_{ni} \in \{0, 1\}$,

$$P(X_{ni} = x_{ni} \mid \beta_n, \delta_i) = \frac{\exp[x_{ni}(\beta_n - \delta_i)]}{1 + \exp(\beta_n - \delta_i)},$$

hence Equation A1 can be written as

$$\begin{aligned}
P(x_{n1}, x_{n2}, \ldots, x_{nk})
&= \frac{\exp[x_{n1}(\beta_n - \delta_1)]\,\exp[x_{n2}(\beta_n - \delta_2)] \cdots \exp[x_{nk}(\beta_n - \delta_k)]}{[1 + \exp(\beta_n - \delta_1)]\,[1 + \exp(\beta_n - \delta_2)] \cdots [1 + \exp(\beta_n - \delta_k)]} \\
&= \frac{\exp(x_{n1}\beta_n - x_{n1}\delta_1 + x_{n2}\beta_n - x_{n2}\delta_2 + \ldots + x_{nk}\beta_n - x_{nk}\delta_k)}{\prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]} \\
&= \frac{\exp[\beta_n (x_{n1} + x_{n2} + \ldots + x_{nk}) - x_{n1}\delta_1 - x_{n2}\delta_2 - \ldots - x_{nk}\delta_k]}{\prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]}.
\end{aligned}$$

Now, given that $x_{n1} + x_{n2} + \ldots + x_{nk} = \sum_{i=1}^{k} x_{ni} = r_n$ is the raw score of subject n on the k items, we can summarize by writing:

$$P(x_{n1}, x_{n2}, \ldots, x_{nk}) = \frac{\exp\left(\beta_n r_n - \sum_{i=1}^{k} x_{ni}\delta_i\right)}{\prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]} \qquad (A2)$$

Consider two items (k = 2). If the sum of their scores is $r_n = 1$, the possible patterns are (1,0) and (0,1), and their probabilities, following Equation A2, are:

$$P(1,0) = \frac{\exp(\beta_n \cdot 1 - 1 \cdot \delta_1 - 0 \cdot \delta_2)}{\prod_{i=1}^{2} [1 + \exp(\beta_n - \delta_i)]} = \frac{\exp(\beta_n - \delta_1)}{\prod_{i=1}^{2} [1 + \exp(\beta_n - \delta_i)]}$$

$$P(0,1) = \frac{\exp(\beta_n \cdot 1 - 0 \cdot \delta_1 - 1 \cdot \delta_2)}{\prod_{i=1}^{2} [1 + \exp(\beta_n - \delta_i)]} = \frac{\exp(\beta_n - \delta_2)}{\prod_{i=1}^{2} [1 + \exp(\beta_n - \delta_i)]}$$

As a consequence, the probability of a given $r_n$ is the sum of the probabilities of all possible patterns that can produce $r_n$:

$$P(r_n = x_{n1} + x_{n2} + \ldots + x_{nk}) = \sum_{(x_{n1}, x_{n2}, \ldots, x_{nk}) \mid r_n} \frac{\exp\left(\beta_n r_n - \sum_{i=1}^{k} x_{ni}\delta_i\right)}{\prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]} \qquad (A3)$$

The conditional probability of a pattern given $r_n$ is the ratio between the probability of that pattern and the sum of the probabilities of all patterns with the same $r_n$. Given two items and $r_n = 1$, the conditional probability of a specific pattern, (1,0) for example, is

$$P\big((1,0) \mid r_n = 1\big) = \frac{P(1,0)}{P(1,0) + P(0,1)},$$

in which the denominator is justified because with dichotomous responses there are two patterns for which $r_n = 1$: (1,0) and (0,1). The conditional probability of a pattern given $r_n$ is therefore obtained by dividing (A2) by (A3). The product in the denominator of (A2) is the same for every pattern and cancels out, and the factor $\exp(\beta_n r_n)$ is common to the numerator and to every term of the sum:

$$\begin{aligned}
P(x_{n1}, x_{n2}, \ldots, x_{nk} \mid r_n)
&= \frac{\exp\left(\beta_n r_n - \sum_{i=1}^{k} x_{ni}\delta_i\right) \Big/ \prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]}{\sum_{(x_{n1}, \ldots, x_{nk}) \mid r_n} \exp\left(\beta_n r_n - \sum_{i=1}^{k} x_{ni}\delta_i\right) \Big/ \prod_{i=1}^{k} [1 + \exp(\beta_n - \delta_i)]} \\
&= \frac{\exp(\beta_n r_n)\,\exp(-x_{n1}\delta_1)\,\exp(-x_{n2}\delta_2) \cdots \exp(-x_{nk}\delta_k)}{\exp(\beta_n r_n) \sum_{(x_{n1}, \ldots, x_{nk}) \mid r_n} \exp(-x_{n1}\delta_1)\,\exp(-x_{n2}\delta_2) \cdots \exp(-x_{nk}\delta_k)} \\
&= \frac{\exp\left(-\sum_{i=1}^{k} x_{ni}\delta_i\right)}{\sum_{(x_{n1}, \ldots, x_{nk}) \mid r_n} \exp\left(-\sum_{i=1}^{k} x_{ni}\delta_i\right)}
\end{aligned} \qquad (A4)$$

Notice that $\beta_n$ is no longer present in (A4). Hence, it is evident that:

− the conditional probability of a response pattern given $r_n$ is a function only of the items’ difficulties (δ); furthermore, since the pattern of responses of each individual does not contain more information about $\beta_n$ than that provided by $r_n$, $r_n$ is considered a sufficient statistic for estimating $\beta_n$;

− the estimation of items’ difficulties is completely independent from subjects’ ability.

This demonstration is analogous for each facet of the Rasch model.
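The result in (A4) can be checked numerically. The following Python sketch, which is only an illustration with hypothetical item difficulties and abilities, computes the conditional probability of a response pattern given its raw score by brute-force enumeration of all patterns with the same score, and shows that the value does not change when the ability parameter changes.

```python
from itertools import product
from math import exp, prod

def pattern_probability(pattern, beta, deltas):
    """Joint probability of a dichotomous response pattern (Equations A1 and A2)."""
    return prod(
        exp(x * (beta - d)) / (1 + exp(beta - d)) for x, d in zip(pattern, deltas)
    )

def conditional_probability(pattern, beta, deltas):
    """P(pattern | raw score), summing over all patterns with the same raw score."""
    r = sum(pattern)
    same_score = [p for p in product((0, 1), repeat=len(deltas)) if sum(p) == r]
    total = sum(pattern_probability(p, beta, deltas) for p in same_score)
    return pattern_probability(pattern, beta, deltas) / total

deltas = [-0.5, 0.0, 1.2]      # hypothetical item difficulties (logits)
pattern = (1, 0, 1)            # raw score r = 2
for beta in (-1.0, 0.0, 2.0):  # hypothetical abilities (logits)
    print(beta, round(conditional_probability(pattern, beta, deltas), 6))
```

The three printed conditional probabilities coincide, because the ability parameter cancels out exactly as in the derivation of (A4).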

Table 1. The “Association Conditions” Measurement Report provides estimates of implicit associations at the group level, single and group fit indexes, and indexes of separation between conditions.

                                              Infit            Outfit
Condition    Obs. Score  Measure  Model SE    MnSq    ZStd     MnSq    ZStd
Salty+Bad    5382        -.23     .03         1.02    1.00     1.07    2.6
Sweet+Good   5538        -.38     .03         1.00    .10      1.01    2
Salty+Good   5947        -.96     .04         .98     -.50     .92     -2.0
Sweet+Bad    5974        -1.07    .04         .99     -.40     .97     -.7
Mean         5710.3      -.66     .04         1.00    .10      .99     .00
SD           256.4       .36      .00         .01     .60      .06     1.7

Notes: RMSE = .04 (Population); Adj SD = .36; G = 9.28; H = 12.70; R = .99; Fixed (all same) χ2(3) = 353.2, p < .001

Table 2. Differential Person Functioning analysis of nine participants that showed significant implicit associations toward sweet or salty foods.

             Salty+Bad        Salty+Good       Contrast (Bad-Good)
Participant  Measure  SE      Measure  SE      Measure  SE     t       df    p(t)
8            -.15     .24     -1.28    .22     1.13     .33    1.64    219   <.001
3            .59      .31     -.45     .29     1.04     .42    -1.86   222   .014
36           .39      .29     -.57     .27     .96      .4     -1.35   219   .016
44           -.54     .22     -1.41    .21     .87      .31    -1.38   219   .005
14           -.3      .23     -1.11    .23     .81      .33    -1.38   219   .014
1            -.88     .21     .21      .37     -1.09    .42    -1.51   203   .010
54           -.34     .23     .94      .51     -1.28    .56    -1.81   192   .023
19           -.76     .21     .56      .42     -1.32    .47    1.77    197   .005
32           -.76     .21     .72      .46     -1.48    .5     -2.24   192   .003

Table 3. Means, Standard Deviations and Reliabilities of the Rasch measures of implicit association.

Implicit Association    Time   Mean   SD     Reliability
Sweet+Bad               1      .14    .83    .58
                        2      .31    .91    .50
                        3      .33    .97    .51
Sweet+Good              1      .14    .50    .51
                        2      .01    .75    .60
                        3      .18    .86    .61
Salty+Bad               1      .17    .71    .64
                        2      .16    .85    .67
                        3      .12    .70    .58
Salty+Good              1      .25    .89    .61
                        2      .25    .86    .46
                        3      .31    .92    .48
Sweet (Bad vs. Good)    1      .01    .96    .57
                        2      .30    1.18   .50
                        3      .03    1.20   .55
Salty (Bad vs. Good)    1      -.08   1.06   .63
                        2      -.09   1.22   .62
                        3      -.22   1.05   .55

Notes: Values in the third column are mean ξnj for the associations Sweet+Bad, Sweet+Good, Salty+Bad, and Salty+Good and mean contrasts for the associations Sweet (Bad vs. Good) and Salty (Bad vs. Good). The reliability is computed according to its classical definition (true variance + error variance = observed variance). Specifically, we divided the variance of the single (ξnj) or pairwise (contrast) measure of association across subjects (“true” variance) by the sum of true variance and error variance (the mean across subjects of the squared standard errors).


Table 4 Zero-Order Pearson correlations among MFRM standardized estimates of individual associations (t) and d' scores.

Sweet+Bad (t) Sweet+Bad (d')

.560**

Sweet+Good (t) -.120

Salty+Good (t)

-.349**

-.002

-.178

-.184 -.148

Sweet+Good (d')

-.144

Salty+Bad (d')

-.240

-.179

.498**

Salty+Good (d')

-.034

-.255*

-.299*

Note: ** p < .01; * p < .05

.499**

Salty+Bad (t)

.616**

Figure Captions

Figure 1. Elements of the four facets on the latent trait “Accuracy”.

Figure 2. Plot of individual estimates (t) of the implicit attitude toward sweet and salty food.

Figure 1. [Variable map locating the elements of the four facets (Stimuli's difficulty, Time difficulty, Participants' ability, Blocks' difficulty) on the common scale “Measure's Logits (Accuracy)”, which runs from “More” to “Less”.]

Note: L2(26217) = 19054.16, p = 1.

Figure 2. [Plot of the pairwise t values (vertical axis “Conditions: pairwise t values”, from -5 to 5) for each participant (horizontal axis “Participants”), with two series: Sweet+Bad vs. Sweet+Good and Salty+Bad vs. Salty+Good.]

Notes: positive t values indicate positive implicit attitude; t values higher than |2| indicate a significant implicit association.