Journal of Behavioral Decision Making J. Behav. Dec. Making, 17: 333–348 (2004) Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/bdm.474

Combining Advice: The Weight of a Dissenting Opinion in the Consensus

CLARE HARRIES1*, ILAN YANIV2 and NIGEL HARVEY1
1 University College London, UK
2 Hebrew University of Jerusalem, Israel

ABSTRACT

We present two studies that evaluate how people combine advice and how they respond to outlying opinions. In a preliminary study, we found that individuals use discounting strategies when they encounter an extreme opinion in a small sample of opinions taken only once (a one-shot advice-taking situation). The main study examines the influence of outlying opinions (which may or may not be accurate) within a learning paradigm with feedback. This study shows that it is easy to reinforce a discounting strategy (with feedback) whereas it is more difficult to counteract this default strategy. In the discussion we consider cognitive, statistical, and strategic justifications for discounting opinions, from both theoretical and practical points of view. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: advice taking; combining forecasts; judgment; dissenting opinions; outliers

In realistic decisions people typically consult only a small number of advisors. For example, in making editorial decisions journal editors often consult the opinions of two or three independent referees. Similarly, patients typically seek two or three opinions before undertaking an important medical decision. If all opinions are in consensus (e.g., three journal referees give a paper a low rating and recommend that it be rejected), then the course of action is clear since almost any reasonable combination strategy would yield the same result. Consider, however, the situation where some advisors (e.g., two journal referees) advocate one course of action (e.g., rejection of the paper) while a third advisor offers an opposing opinion (a strong acceptance recommendation). In situations where few opinions are consulted, the choice of a strategy for combination may prove critical and the appropriate approach to decision-making is open to debate (see Baumeister, 2001, for just such a debate). Consider solutions to this aggregation problem. Suppose two hypothetical business forecasters predict that the sales volume of their company will increase by 8% and 10%, respectively, whereas a third manager

* Correspondence to: Clare Harries, Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK. E-mail: [email protected] Contract/grant sponsor: Economic and Social Research Council, UK; contract/grant number: R000236827. Contract/grant sponsor: Israel Science Foundation; contract/grant number: 822.

Copyright © 2004 John Wiley & Sons, Ltd.


predicts a surge of 25%. Whereas exclusive acceptance of the outlier would yield a final estimate of 25%, and an outlier-discounting approach would generate a global forecast in the neighborhood of 9%, an equal weighting strategy (the mean) would result in a prediction of 14%. Clearly, with a small number of opinions consulted, different combination strategies yield very different final predictions. The focus of the present research is on how individuals solve this aggregation problem. In particular, how do decision makers treat an outlying opinion when aggregating several pieces of advice? We shall argue that the default strategy in novel situations is to discount an outlying opinion but that the choice of strategy depends upon experience. For example, a manager may have received employees’ forecasts on past sales items; her aggregation strategy for their current sales forecasts may depend on this previous experience.
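To make the arithmetic of this example concrete, the three combination strategies can be sketched in a few lines of Python (the forecast figures are those of the hypothetical managers in the text; treating the 25% forecast as the outlier is given):

```python
advice = [8.0, 10.0, 25.0]  # the three managers' sales forecasts (%)

take_the_outlier = advice[2]              # exclusive acceptance of the outlier: 25%
discount_outlier = sum(advice[:2]) / 2    # drop the outlier, average the rest: 9%
equal_weight = sum(advice) / len(advice)  # mean of all three: ~14%
```

With only three opinions in the sample, the three strategies disagree by a factor of almost three, which is precisely the point of the example.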

Normative bases for discounting outlier opinions

An input judgment is called "outlying" if it is extreme relative to the other opinions in the sample to be combined. Taking the median discounts extreme values, compared with an equal weighting strategy. Similarly, strategies that select just one opinion are more likely to discount the outlier than the other opinions. Trimming strategies eliminate any outliers from the sample prior to combining the remaining opinions.

One can argue that pieces of advice should be equally weighted under conditions of ignorance (when the general goodness of each source is unknown). However, it is well known that people making decisions within group contexts tend to conform to the views of the majority (e.g., Sherif, 1935). Although such findings are often interpreted as demonstrating that people comply with social pressure, there is also recognition that people may conform because doing so leads to better decisions. "From birth on, we learn that the perceptions and judgments of others are frequently reliable sources of evidence about reality" (Deutsch & Gerard, 1955, p. 635) so that "group majorities may frequently be regarded as 'tools' or problem-solving aids, whose general value has been established through past experience" (Penner & Davis, 1969, p. 299). More recently, Henrich and Boyd (1998), Boyd and Richerson (2001), and Gigerenzer (in press) have argued that conforming, or the do-what-the-majority-do heuristic, is adaptive under the sort of conditions that often hold in natural environments (though see Kameda & Nakanishi, 2002, for conditions in which it is of limited benefit).

In fact, the usefulness of discounting an outlier, statistically speaking, depends on the parent distribution from which the sample has been drawn. Yaniv (1997) outlined several justifications for the use of discounting strategies.
One of these is based on considerations of the distribution of judgmental estimates around the true answer. He conducted a computer simulation of the aggregation of samples of opinions. The opinions were sampled from pools of judgmental estimates (of historical dates) obtained from respondents who had participated in a prior study. The estimates were thus ecologically representative of estimates that might be obtained in daily life when seeking answers to questions. The rationale for simulating the aggregation of samples of estimates was that the usefulness of any aggregation scheme ultimately depends on the properties of the judgments being aggregated (e.g., the variability of opinions, their bias, and the frequency of extreme opinions). The simulation results suggested that taking the median achieves greater accuracy than simple averaging, as does conditional trimming (in this case, the selective exclusion of outlying estimates that were more than two standard deviations away from the average). Moreover, according to De Groot (1986, pp. 564–569; Wilcox, 1992), given an underlying symmetric distribution with relatively thick tails, a trimmed sample mean is to be preferred to the raw sample mean as an estimator of the central tendency of the distribution (see also Streiner, 2000). In fact, there is some evidence that the distribution of many human test results, opinions, and judgments is not normal. In an analysis of large samples of results (general distributions, ability measures, psychometric measures, test results), Micceri (1989) found that the performance distributions of more than half of these samples had at least one heavy tail and about a fifth had small tails. In other words, in many samples drawn from human responses, discounting outliers would be a preferable aggregation strategy to taking the mean when the accuracy of the sources of the opinions is unknown. Arguably, if otherwise ignorant as to the quality of reviewers' judgments, an editor should examine the statistical structure of the judgmental ratings of papers, and then, if appropriate, combine judgments using a trimming strategy.

In his analysis of the benefits of combining forecasts, Armstrong (2001) advocates trimming extreme forecasts on the grounds of the role of "miscalculations, errors in data or misunderstanding" in creating large errors. This is compatible with an argument based on the underlying distributions. In fact, he advocates trimming unconditionally when combining five or more forecasts from any source, and cites evidence demonstrating the benefits of the median over the mean.

The results of the computer simulations and the statistical arguments establish useful benchmarks for assessing human behavior. Arguably, there is an infinite number of potential schemes that decision-makers could use for aggregation; it is practically impossible to test, or even enumerate, any great number of them. We focus on some basic aggregation policies (mean, median, and conditional trimming) because each presents a different approach to the aggregation problem.
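The flavour of such simulations can be conveyed with a toy version: opinions are drawn from a heavy-tailed mixture around a known truth, and the mean and median of small samples are compared. The distributions, sample size, and mixture weights below are our own illustrative choices, not those of the original simulation:

```python
import random
import statistics

random.seed(0)
TRUTH = 0.0
N_TRIALS = 10_000

def draw_opinion():
    # Heavy-tailed mixture: mostly moderate errors, occasionally a wild outlier.
    if random.random() < 0.9:
        return random.gauss(TRUTH, 10.0)
    return random.gauss(TRUTH, 100.0)

mean_err = median_err = 0.0
for _ in range(N_TRIALS):
    opinions = [draw_opinion() for _ in range(5)]
    mean_err += abs(statistics.mean(opinions) - TRUTH)
    median_err += abs(statistics.median(opinions) - TRUTH)

print(mean_err / N_TRIALS, median_err / N_TRIALS)
```

With these settings the median's average error comes out well below the mean's, illustrating why outlier-discounting strategies can dominate equal weighting when the parent distribution has heavy tails.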

Psychological motivations for discounting outlier opinions

Whereas arguments as to why an outlier should be discounted pertain to the underlying statistical distribution of advice and to accuracy, an account of why an outlier actually is discounted involves the judge's goals, perception, and effort. An outlier might be discounted as part of a deliberate strategy to avoid dissonance and reduce effort. The need to create internal consistency or harmony among opinions (and actions) runs through much of the literature in social cognitive psychology. Indeed, a variety of mental processes in which the goal is to achieve consonance have been documented (e.g., Festinger, 1957; Heider, 1958). Selection of redundant information may be seen as increasing consonance (Soll, 1999). Trimming is just one mechanism for creating consonance among dissonant pieces of information. Decision-makers may rely on a basic cognitive process that trims inconsistent data as part of a generalized strategy for resolving inconsistencies and conflicts among input opinions. But decision-makers who trim outlying opinions may, unknowingly, be ignoring their best data: although dissenting estimates differ from the consensus, they are not necessarily wrong. Moreover, a tendency to resolve inconsistencies by trimming outlying opinions, like discounting evidence that challenges one's prior beliefs, can hamper proper revision of beliefs (Bochner & Insko, 1966; Lord, Ross, & Lepper, 1979; Yaniv, 2004a).

In a different vein, an outlier might be discounted in pursuit of a particular strategy aimed at maximizing accuracy. Taking the median in a one-shot situation is a good strategy in that it uses all the pieces of advice but is not overly influenced by the magnitudes of the outliers. However, people who deliberately discount outliers from some particular source will be slower to learn if that source actually produces accurate opinions.
In repeated situations, the perceived merit of different strategies depends on the advice-taker's experience. Faced again and again with pieces of advice to combine from the same set of sources, decision-makers could utilize any of a number of one-shot strategies based on simple descriptive statistics of the sort we have discussed. Alternatively, they could adopt one of a number of long-term strategies appropriate to repeated judgment situations with feedback. In such situations, advisors develop reputations, and the decision-maker forms impressions of these advisors based on their past achievements (Harvey, Harries, & Fischer, 2000; Yaniv & Kleinberger, 2000). In such repeat situations, the decision-maker can rely on feedback to evaluate the feasibility of various aggregation policies (such as different weighting policies) to judge which best suit the types of advice that are available. The suggestion here is that decision makers will entertain several policies or rules. Studies of concept formation and multiple cue probability learning suggest that ecological and, perhaps, cognitive factors lead individuals to prefer particular rules or hypotheses over others.


In their study of concept formation, Bruner, Goodnow, and Austin (1956) found that many people tested hypotheses about the nature of the target concept sequentially, using a win–stay lose–shift strategy. Later, the same type of model was applied to the learning of functional relations between variables in multiple cue probability learning tasks (e.g., Brehmer, 1974; Sniezek & Naylor, 1978). It was found that certain relations were easier to pick up than others (e.g., positive linear relations were easier than negative linear relations, and both of these were easier than curvilinear relations). Hence, Brehmer (1974) proposed a hierarchy of rules (or hypotheses) that individuals entertain when they consider relationships in the environment, with some relations being more natural for them to expect than others. We suggest that in solving the aggregation problem, people may also entertain a series of strategies such that default strategies precede more elaborate ones. For instance, strategies that discount outliers may be treated as defaults and hence initially take precedence over strategies that use all the information. Similarly, one-shot strategies may precede those that weight and integrate information in accordance with the past accuracy of the sources. Like Brehmer's rules, the use of different combination strategies is likely to depend both on how simple they are and on how appropriate they are to the statistical environment of the task.

Overview of studies

In the present research, we do not aim to investigate all possible strategies but to focus on a restricted set that statistical and theoretical considerations suggest may be used by judges. In our preliminary study we considered various one-shot combination strategies, such as equal weighting (the average), the median (which is insensitive to extreme values and thus effectively discounts extreme opinions), the midrange (which is sensitive only to extremes), and other measures of central tendency.
These one-shot strategies assume that decision-makers have no prior information on which to base their weighting of the opinions. In particular, they use only the opinions presented to them and require no access to any additional information about the advisors' ability, past accuracy, or confidence. In our main study, we again evaluate these one-shot combination strategies and also compare them with long-term strategies that use information about advisors' past accuracy. These long-term strategies rely on the decision-maker's experience with the advisors' earlier performance to guide their weighting policies. We focus on people's ability to change from the default strategy used in one-shot situations to other, more complex strategies. In both studies our participants make quantitative judgments.

PRELIMINARY STUDY

The first study was intended to help us conceptualize the issues of strategy in a one-shot situation, and to set the stage for the larger computerized study. The study involved a short paper-and-pencil task in which participants aggregated a set of opinions. Their task was embedded in the following fictional scenario:

    Imagine that you are traveling to different countries identified here as A, B, and C. In each country you meet groups of other students from your home country. You use these encounters to ask them questions about that country, and in particular about the year the capital city of that country was founded and your current distance from that city. In each case, you will receive the opinions of several students. They each answer your question on the basis of their memory and best judgment. Your task is to determine what you think the true answer might be, based on the opinions given to you.

Figure 1 shows a pair of questions used in the study. The two questions involve the same estimates, except that the third opinion in the second question (360) is shifted four standard deviations away from its value in the first question (770).
This made it an outlier relative to the remaining opinions in the advice set. More details on the construction of the outliers are provided below. Each question pertained to a different foreign country, which was identified by a different letter (e.g., Country A, Country B), and involved a different set of advisors.


Figure 1. Sample questions

In the analysis we considered the fit of the following one-shot aggregation strategies to the aggregate estimates produced by the respondents:

Mean: the average of all opinions.
Median: the median of all opinions; half the opinions are greater than this value and half are less than it.
Trimmed mean: the most extreme opinion (the furthest point from the median) is dropped and the mean is calculated across the remaining opinions.
Conditionally-trimmed mean: the trimmed mean is taken only if the most extreme opinion is more than four standard deviations away from the median; otherwise, the mean of all opinions is taken.
Midrange: the midpoint between the two most extreme opinions.
Take the outlier: a strategy that uses only the outlier (used as a benchmark).

The different strategies were chosen on two grounds. First, we wanted to use schemes that were grounded in standard descriptive statistics. Second, we wanted to use schemes that would help to clarify the principles that individuals use in aggregating opinions. For instance, the mean represents the equal weighting of opinions, whereas conditional trimming represents the attempt to look for and focus on the consensual opinions. The midrange and trimmed mean (and also take the outlier) helped us establish benchmarks relative to which we could assess other schemes.
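Under one reading of these definitions, the six strategies can be sketched as below. Where the text leaves details open, we have made assumptions: in the conditionally-trimmed mean we take the standard deviation to be that of the remaining opinions (as in the z-score computation described in the Method), and all function names are ours:

```python
import statistics

def most_extreme(opinions):
    """The opinion furthest from the median of the sample."""
    med = statistics.median(opinions)
    return max(opinions, key=lambda x: abs(x - med))

def mean(opinions):
    return statistics.mean(opinions)

def median(opinions):
    return statistics.median(opinions)

def trimmed_mean(opinions):
    rest = list(opinions)
    rest.remove(most_extreme(opinions))  # drop the single most extreme opinion
    return statistics.mean(rest)

def conditionally_trimmed_mean(opinions, z_cut=4.0):
    # Trim only if the most extreme opinion lies more than z_cut SDs from the
    # median; the SD of the *remaining* opinions is our assumption.
    extreme = most_extreme(opinions)
    rest = list(opinions)
    rest.remove(extreme)
    if abs(extreme - statistics.median(opinions)) > z_cut * statistics.stdev(rest):
        return statistics.mean(rest)
    return statistics.mean(opinions)

def midrange(opinions):
    return (min(opinions) + max(opinions)) / 2

def take_the_outlier(opinions):
    return most_extreme(opinions)
```

For an illustrative sample such as [60, 64, 66, 100], the strategies diverge sharply: the median is 65, the midrange is 80, and the trimmed mean is about 63.3.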


Method

The participants (N = 62) were undergraduate social science students at the Hebrew University of Jerusalem. They were divided into two groups. Participants in each group received different (paper-and-pencil) questionnaires, each including six questions requesting them to provide their best estimates. For each respondent, three of the questions contained opinions that included an outlier and three did not. Two of the questions contained three opinions, two contained four opinions, and two contained five opinions. No feedback was given on performance, and the participants were told that each set of opinions had been produced by a different group of individuals. In the consensus condition, opinions were similar in the sense that they all had z-scores less than 2.0, whereas in the outlier condition, the outlier had a z-score greater than 2.0 and the remaining opinions had z-scores less than 2.0. To compute the z-score of a given opinion, the mean and standard deviation of the remaining opinions were first calculated. The degree of similarity of each opinion was then expressed as a z-score relative to the mean and SD of the remaining opinions in that particular sample. For example, in Figure 1, the third opinion (770) had a z-score of 0.43, whereas the z-score of 360 (in the outlier set) was 5.25. It should be kept in mind that, when there are no outliers, the conditionally-trimmed mean and the mean strategy yield the same best estimate.
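The z-score computation just described, in which each opinion is judged against the mean and SD of the remaining opinions, can be sketched as follows (the function name is ours):

```python
import statistics

def outlier_z_score(opinions, i):
    """z-score of opinions[i] relative to the mean and SD of the others."""
    rest = [x for j, x in enumerate(opinions) if j != i]
    return (opinions[i] - statistics.mean(rest)) / statistics.stdev(rest)
```

Under this measure, an opinion with an absolute z-score greater than 2.0 counts as an outlier.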

Results

An aggregate estimate was calculated using each of the strategies for each of the samples presented to respondents. To evaluate how well the various strategies fit the respondents' estimates, we calculated the mean absolute difference (MAD) for each strategy for each respondent. The results, averaged across respondents, are shown in Table 1. As might be expected, the fit of each of the strategies was substantially worse in the presence of an outlier. Importantly, however, certain strategies that ignored outliers provided a better fit to respondents' estimates. Strategies that give special weight to the outlier (i.e., midrange and "take the outlier") performed especially poorly. (We note, in passing, that the poor performance of these strategies might not seem surprising, but we have included them in order to provide benchmarks for the results of the main study, which involved learning.) A repeated measures analysis of variance was performed on the fit measure with outlier condition (consensus, outlier) and strategy (6 levels) as within-subject factors. The analysis revealed significant effects of outlier condition, F(1, 246) = 640.9, p < 0.0001, strategy, F(5, 1230) = 422.4, p < 0.0001, and a condition-by-strategy interaction, F(5, 1230) = 195.4, p < 0.0001.

Table 1. Fit (mean absolute difference) of each strategy by outlier condition: preliminary study*

                            Condition
    Strategy type           Consensus    Outlier
    Median                  18.4         64.9
    Conditional trim**      21.6         78.7
    Trim                    24.4         78.7
    Mean                    21.6         90.3
    Midrange                29.5         146.3
    Take the outlier        87.2         409.9

*Lower numbers indicate better fit; the best-fitting strategy in each condition is the median.
**In the consensus condition, the conditional trimmed mean, by definition, equals the mean (since no opinions are trimmed). In the outlier condition, the conditional trimmed mean, by definition, equals the trimmed mean (since outliers are trimmed in both).
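The MAD fit measure amounts to averaging, across a respondent's questions, the absolute gap between the strategy's aggregate and the respondent's own estimate. A minimal sketch (the helper name and example numbers are ours):

```python
import statistics

def mad_fit(strategy, samples, respondent_estimates):
    """Mean absolute difference between a strategy's aggregates and one
    respondent's estimates across that respondent's question samples."""
    return statistics.mean(
        abs(strategy(sample) - estimate)
        for sample, estimate in zip(samples, respondent_estimates)
    )
```

For example, `mad_fit(statistics.median, [[700, 750, 770], [400, 420, 900]], [750, 430])` compares the median of each sample (750 and 420) with the respondent's estimates (750 and 430), giving a MAD of 5.0; lower values indicate better fit.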


The analysis of variance was followed by three planned pair-wise comparisons (paired t-tests) that were theoretically critical for evaluating the fits of the various strategies. In both conditions the median was a significantly better fit than the other strategies (p < 0.0001), and the midrange was significantly worse than the mean (p < 0.0001). The conditional trimmed mean was a significantly better fit than the trimmed mean in the consensus condition (p < 0.0001) and equal to it (by definition) in the outlier condition. Thus, of the basic schemes tested, the median emerges as the best fit to people's aggregation strategies. We address the implications of these results next.

Discussion

This preliminary study was concerned with aggregation in one-shot situations. No information was provided about the past performance of the advisors; feedback was not given, and reputations, therefore, could not be formed (Yaniv & Kleinberger, 2000). Under the assumption that the shape of the underlying distribution is normal, the statistical norm would be to combine pieces of information by giving them equal weight (i.e., by taking the mean of the advice). Under such a strategy, outliers would have the same influence as any other information. Our participants did not appear to follow such a strategy; instead they appeared to discount the outlier. The strategy that best matched their responses was the median of opinions. This strategy was the best fit both when there was an outlier in the sample and when there was a consensus. The advantage of the median is emphasized when there is an outlier; when there is none, it makes less difference whether one uses the mean or the median (although in our data the median still emerged as the best fit).
This preliminary study establishes that the median is used as a default strategy for combining opinions in one-shot situations, in which information about the advisors' past performance is not available and respondents cannot, therefore, form any impressions about their abilities. In the next study, we consider the use of various strategies in a repeated forecasting situation.

MAIN STUDY

Our preliminary study showed that taking the median is a default strategy in a novel situation. It results in outliers being discounted. One rationale for discounting an outlier in an everyday setting is that it is liable to be far from the actual value, whereas the consensus of the sample will tend to be close to the truth. This is based on the assumption that the mass of the distribution (of opinions) is located near the truth. However, there are situations in which a single expert produces advice that is both accurate and outlying. For example, only one reviewer may spot methodological flaws in a paper, or only one reviewer may realize its theoretical importance (Baumeister, 2001). Here we look at participants' ability to learn to trust this outlier. The relative difficulty of acquiring reputation in these conditions will tell us which strategy is natural for people to use. If outlier discounting is used as the default strategy and is the result of consonance seeking, it should be hard for people to stop using it even after getting feedback that it is inaccurate. It should be relatively difficult for participants to pick up a weighting strategy that disagrees with their natural approach to combining opinions. In this study, each individual's strategy was examined across 50 trials. On each trial a sample of forecasts was presented that included one outlier. The initial ten trials, as well as the last ten trials, were run without feedback. Feedback was given on the middle 30 trials.
Comparing performance on the first and last blocks of ten trials enabled us to assess the effects of feedback. In some conditions the outlier was misguided (thereby justifying trimming), while in others the outlier opinion was more accurate than the consensus. Thus, we could test respondents' reactions to these different conditions and, in particular, how they learn which advisor(s) to trust (consensus or outlier). We also examined the effect of whether advisors were consistently or inconsistently named across trials. We expected consistent naming to facilitate development of appropriate weighting strategies for two reasons.


The first was concerned with cue redundancy. In the consistent naming conditions, participants would be able to reject bad advice (outlier incorrect conditions) or select good advice (outlier correct conditions) on the basis of two redundant cues: the name of the source of advice and whether the source produces outlying advice. In the inconsistent naming condition, however, only a single cue would be available for this purpose: whether the source produced outlying advice. Following Brunswik (1952), it is reasonable to argue that increasing the number of means to reach a goal state (an appropriate weighting strategy) should increase the probability that the goal state is reached. The second reason we anticipated consistent naming to have an advantage over inconsistent naming was concerned with plausibility. In a single group of advisors, it is moderately unlikely that an outlier will provide better advice than the consensus. In a number of different groups of advisors, however, it is highly unlikely, though not impossible, that the outliers in each group will produce the best advice. Participants may well be affected by this type of statistical consideration.

Method

Participants were told to imagine that they had just started a new job that involved traveling to different parts of the United Kingdom. The day before traveling to each place they wanted to know what the weather would be like, and so received the opinions of five work colleagues who were already there. They used this advice to make their own estimate of the maximum temperature. In a computer-based task, each participant aggregated five pieces of advice into their own forecast of the maximum temperature for each of fifty places. For the first and last ten places they did not receive feedback. For the middle 30 places, after making each forecast they received feedback: the actual maximum temperature.

Participants and design. Sixty people participated in the study at the Department of Psychology, University College London. They were randomly assigned to one of four conditions in a 2 × 2 between-participants design. The naming of the advisors in the different places was either constant or changing. The actual outcome was either around the majority opinion (consensus-correct condition) or around the outlier (outlier-correct condition).

Stimulus materials. For each of 30 places, five pieces of advice (four conformers and one outlier) were selected from a pool of estimates of maximum temperature. The pool of weather judgments had been given by 52 participants who, in a separate study, were each asked to estimate the maximum temperature and the likelihood of rain for each of 30 places. For the purpose of the present study, five of these pieces of advice were selected for each place using the following process. First, twenty samples of five pieces of advice were drawn at random, with replacement, from the pool of 52 temperature estimates. From each set of 20 samples, one sample was chosen (by the experimenters) with four similar forecasts and one outlier.
For half the places, the outlier's estimate was higher than that of the consensus; for half it was below. The samples were appropriate in the sense that they were drawn from pools of actual estimates made by ordinary people about real places. Just as the hypothetical work colleagues' estimates of the following day's weather would have been based on knowledge of the weather there today, these people knew the weather on one day and estimated the following day's weather. For each of the 30 sets of five pieces of advice, an actual outcome was generated artificially. In the consensus-correct conditions, the outcome was selected at random from a normal distribution centered on one of the conforming pieces of advice, with a standard deviation of one unit. All participants in the consensus-correct conditions saw the same set of outcomes. In the outlier-correct conditions, the outcome was selected at random from a normal distribution centered on the outlier, with a standard deviation of one unit. All participants in the outlier-correct conditions also saw the same set of outcomes.
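The outcome-generation procedure can be sketched as below. The function signature, and the assumption that the conforming piece of advice on which the distribution is centred is chosen at random, are ours:

```python
import random

def generate_outcome(advice, outlier_index, condition, sd=1.0):
    """Draw an 'actual maximum temperature' from a normal distribution
    (SD = 1 unit) centred either on the outlier (outlier-correct) or on
    one of the conforming pieces of advice (consensus-correct)."""
    if condition == "outlier-correct":
        centre = advice[outlier_index]
    else:  # "consensus-correct"
        conformers = [a for i, a in enumerate(advice) if i != outlier_index]
        centre = random.choice(conformers)
    return random.gauss(centre, sd)
```

With advice such as [15, 16, 15, 14, 25] (the 25 °C forecast being the outlier), the outlier-correct outcome falls near 25 °C while the consensus-correct outcome falls near 15 °C, so feedback either validates or discredits the dissenter.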


On each trial, the five pieces of advice appeared horizontally on a screen against a background picture of clouds, next to the label "Maximum temperature °C". Beneath them appeared a display of the names of five advisors. Beneath this appeared the phrase "Your estimate of the maximum temperature", a text box above a horizontal scroll bar, and a "confirm" button. Moving the scroll bar to the right increased the estimate up to a maximum of 25 °C; moving it to the left decreased the estimate down to a minimum of −3 °C (a range corresponding to 27–77 °F). Participants used the scroll bar to produce their forecasts and then clicked the confirm button. In the constant-names conditions, the five names of advisors were assigned randomly once and then remained the same throughout the task. In the changing-names conditions, the names were picked randomly for each trial. Each set of five names was picked at random from pools of over 300 forenames and surnames of students in the psychology department at University College London. Altogether participants responded to 50 trials. All of them saw the same middle 30 sets of five pieces of advice (in an order that was randomized for each participant) and received feedback. In addition, ten of the 30 sets were selected at random for presentation without feedback at the beginning and at the end of the experiment. A different selection of ten sets was made for each participant. Each place was labeled only by a number according to its sequential position; the actual names of places in the UK were never displayed, so that participants could use only the advice to make their forecasts.

Results

We carried out two analyses. First, we evaluated the fit of one-shot strategies on the no-feedback trials (the first and last ten trials). Second, we looked at the change in the relative weighting of the outlier forecast as a function of the experimental factors.

Fit of strategies.
The fit of the strategies used in the analysis of our preliminary study is shown in Table 2. The strategies were compared over the first ten trials using three orthogonal planned t-tests. The fit of a strategy is the mean absolute difference (MAD) between the strategy-based forecast and the human forecast; lower numbers therefore indicate better fit. The median had a significantly better fit than the mean (MAD = 1.09 and 1.23, respectively; t(59) = 2.39, p < 0.05). The conditional trimmed mean had a significantly better fit than the trimmed mean (MAD = 1.18 and 1.35, respectively; t(59) = 3.26, p < 0.05). The midrange (MAD = 2.14) had a significantly worse fit than the others (average MAD of the other strategies = 1.24; t(59) = 7.70, p < 0.001). Thus, over the first ten trials, where there was no feedback, the results were similar to those in the preliminary study; the participants' aggregation behavior was best described by the median of the pieces of advice.

Table 2. Fit (mean absolute differences) of each strategy by condition: main study*

                          Outlier correct                  Consensus correct
                     Constant names  Changing names   Constant names  Changing names
Block 1
  Median                  0.93            0.85             1.27            1.29
  Mean                    1.08            1.10             1.42            1.29
  Conditional trim**      1.03            1.05             1.32            1.32
  Trimmed mean            1.15            1.18             1.39            1.67
  Midpoint                2.16            2.03             2.36            2.02
  Take the outlier        6.35            6.18             6.45            5.90
Block 5
  Median                  5.88            4.56             1.32            1.16
  Mean                    5.04            4.07             1.86            1.37
  Conditional trim**      5.53            4.33             1.54            1.31
  Trimmed mean            6.35            4.98             1.26            1.23
  Midpoint                3.80            3.40             3.20            2.42
  Take the outlier        1.05            2.84             7.47            6.54

*Lower numbers indicate better fit. In the original, the best-fitting strategy in each condition was shown in bold.
**Trim advice that is more than 2 standard deviations from the median.

The strategy "take the outlier" is also included in Table 2. This strategy is a very poor fit in the first ten trials in all conditions. However, as Table 2 shows, it provided the best fit in the last ten trials of the outlier-correct conditions. Its advantage was especially marked when names were constant. We analyze the weight given to the outlier in the next section to show this change in strategy more clearly.

Relative weight of outlier (RW). In order to investigate participants' use of the outliers, we created a measure that captures the influence of the outlier (relative to the other forecasts) on participants' final forecast. The measure was defined as follows:

RW = (MCon - judgment) / (MCon - outlier),

where RW is the relative weight of the outlier on a trial, MCon is the average of the opinions of the four consensus advisors on that trial, outlier is the opinion of the outlying advisor, and judgment is the participant's forecast. A relative weight of 1.0 indicates that the participant's judgment is based completely on the outlying advice (it matches the outlier); a relative weight of 0.0 indicates that the participant's judgment is based completely on the consensus (it matches the trimmed mean). If respondents give equal weight to all five opinions, the resulting relative weight will be 0.2. Therefore 0.2 is an important benchmark against which we can assess the respondents' relative weights.

The lines in Figure 2 track the changes in the relative weights assigned to the outliers over the five blocks of the study. In addition, four benchmarks are shown. These are the relative weights based on: (1) equal weighting; (2) the median strategy; (3) the correct advisor in the outlier-correct condition; and (4) the best advisor in the consensus-correct condition.

We analyzed the relative weights in the first and last blocks, looking at the effects of naming and criterion type in each. Analysis of the first block showed no significant effect of naming (constant vs. changing) or of criterion type (outlier correct vs. consensus correct). Analysis of the last block revealed a significant main effect of criterion type (F(1, 56) = 113.09, p < 0.05), no main effect of naming (F(1, 56) = 3.71, p = 0.059), and a significant interaction between naming and criterion (F(1, 56) = 7.15, p < 0.05). The nature of this interaction is clarified by the following analyses. In the outlier-correct conditions, participants seeing constant names placed a significantly higher weight on the outlier forecast than those who saw changing names (F(1, 28) = 5.95, p < 0.05).
In the consensus-correct conditions, the weight participants placed on the outlier forecast did not differ significantly between those seeing constant names and those seeing changing names. In other words, participants in all conditions gave the same weight to the outlier in the first block, but by the fifth block those in the outlier-correct conditions weighted it more highly than those in the consensus-correct conditions. Those with consistently named correct outliers weighted them more highly still.

Figure 2. Relative weight in different conditions across blocks of trials in the main study

Discussion

On the first ten trials people behaved as they did in the one-shot situation examined in our preliminary study: their judgments approximated the median of the five advisors. This was true in all conditions. The strategy of relying primarily on the consensus to make the best estimate needs little reinforcement to continue. When the best advisor was part of the consensus, the default strategy of taking the median was appropriate throughout the task and seemed to be maintained in preference to any other strategy. However, when feedback indicated that the outliers were correct, this default policy was modified. This took some time: about 10 to 20 trials were needed for most people to learn to rely on outliers. Furthermore, learning to rely on the outlier was much quicker when that advisor was consistently named. As we mentioned above, this may have occurred for either of two reasons. First, there was cue redundancy only when the advice was consistently named: in that condition, good or poor advice could be identified by the name of its source as well as by whether it was outlying. Second, people in the changing-names condition may have been more reluctant to accept the outlier's opinion as correct, since it is even less likely that the outlying opinion will be the correct one across many different groups than within a single group.
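To make the analyses above concrete, the following sketch (our own reconstruction, not the authors' code; function names and the example numbers are hypothetical) computes the candidate one-shot aggregation strategies for a set of five opinions and the relative-weight measure RW. Note that giving equal weight to all five opinions yields RW = 0.2, the benchmark used in the text.

```python
import statistics

def strategy_forecasts(advice):
    """Candidate one-shot aggregation strategies compared in the fit analysis.
    Illustrative reconstruction; details may differ from the authors' implementation."""
    s = sorted(advice)
    med = statistics.median(s)
    sd = statistics.stdev(s)  # sample SD; the paper does not specify sample vs. population
    return {
        "median": med,
        "mean": statistics.mean(s),
        "trimmed mean": statistics.mean(s[1:-1]),  # drop highest and lowest
        # conditional trim: drop advice more than 2 SDs from the median
        "conditional trim": statistics.mean([x for x in s if abs(x - med) <= 2 * sd]),
        "midpoint": (s[0] + s[-1]) / 2,
    }

def relative_weight(advice, judgment):
    """RW = (MCon - judgment) / (MCon - outlier).
    0.0 means the judgment matches the consensus mean; 1.0 means it matches the outlier."""
    med = statistics.median(advice)
    # treat the single opinion farthest from the median as the outlier
    i_out = max(range(len(advice)), key=lambda i: abs(advice[i] - med))
    outlier = advice[i_out]
    m_con = statistics.mean(x for i, x in enumerate(advice) if i != i_out)
    return (m_con - judgment) / (m_con - outlier)
```

For example, with advice of 14, 15, 16, 17, and 22°C, the consensus mean is 15.5 and a judgment of 16.8 (the equal-weighted mean of all five opinions) gives RW = (15.5 - 16.8) / (15.5 - 22) = 0.2 exactly, while a judgment matching the outlier gives RW = 1.0.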

GENERAL DISCUSSION

The occurrence of an outlying opinion is common in both of the situations that we have studied. In one-shot situations, decision-makers combine several pieces of advice from different advisors with whom they have no


prior experience (e.g., asking directions in a foreign city or consulting several doctors about a rare medical condition). In iterated tasks, decision-makers combine several pieces of advice from advisors with whom they have had repeated experience. An example of this situation is forming an opinion about a new movie based on the reviews of familiar movie critics who write for daily newspapers. Here decision-makers can form an impression of the advisors over a period of time and learn to weight their opinions on the basis of differential characteristics of performance such as historical accuracy (Harvey & Fischer, 1997; Harvey, Harries, & Fischer, 2000; Yaniv & Kleinberger, 2000).

Our studies suggest that, in both one-shot and iterated judgment situations, decision-makers effectively discount extreme, outlying opinions by using a default strategy that approximates to the median. Consensus-based strategies (e.g., outlier discounting) seem natural for aggregation, easy to reinforce with feedback, and hard to eradicate. Indeed, in a similar advice-combining situation (but without outliers), Budescu, Rantilla, Yu, and Karelitz (2003) found that the median was the best-fitting model among the strategies that did not take into account information about advisors' accuracy and the number of cues that they had available.

Justifications for discounting outlier opinions

Brehmer (1980) noted that certain relationships between variables are easier to acquire than others, suggesting a hierarchy of hypotheses that people entertain when learning: only if the simpler ones are refuted can more complex ones be entertained. This type of explanation suggests one possible motivation behind use of the median as a default strategy: it is easier to implement than the others. But there are other possible motivations for its use as the default, such as people's motivation to produce cognitive consonance, and the possibility that they are also driven by socially strategic or statistically normative factors. The pattern of results in our studies allows us to assess the importance of these explanations, and we shall consider them in turn.

Considerations of strategic behavior in judgment may lead participants to favor trimming. For illustration, imagine the head of a committee who adopts a combination policy that ensures that members' opinions are equally weighted. Under such a policy, an individual member might strategically influence the aggregate opinion by announcing an unduly extreme evaluation. Combination policies with provisions for conditionally trimming extreme opinions could attenuate strategic behavior of this sort. Indeed, median aggregation of opinions from multiple sources has been shown to be less prone to the influence of strategic voting than other trimmed-mean procedures (Hurley & Lior, 2002). The risk of being trimmed could dissuade a judge from the temptation to exaggerate (Yaniv, 1997, 2004b). This problem has been recognized in certain sports settings, such as diving and gymnastics competitions. The use of a median-type strategy protects against a judge who develops a "liking" for a certain participant and thus, consciously or unconsciously, produces an exaggerated evaluation.
The use of policies that down-weight those extreme judgments may dissuade judges from acting strategically and, at any rate, attenuate their influence more effectively than would an equal-weighting policy.2 However, although strategic concerns may be a justification for trimming or discounting in socially rich environments, there was no evidence that participants were acting strategically in our tasks or that they believed they were being strategically manipulated by the advisors. Furthermore, half the participants saw different advisors on each trial (name-changing condition) but still used the median. Although we

2 One of the authors noted this in a competition of floor gymnastics for junior school students where four or more judges evaluated the performance of each competitor using a 100-point scale. Multiple evaluations are typically used since these judgments are inherently less reliable than, say, those of athletic performance. The evaluations are then converted into an overall score for a given exercise. The combination policy, in the words of one gymnastics trainer, was "to average just the two central judgments out of the four." This, in fact, amounts to taking the median of the four opinions. The rationale for this is presumably not to facilitate computations (the administrative desk at the competition is well equipped with calculators), but to down-weight extreme judgments.


can reject this as an explanation for the use of the median on our tasks, such socially manipulative strategies may be real in other situations.

Another justification for a median strategy is that participants are following a statistical norm based on an understanding of the distributional properties of the judgment set. One could attribute participants' behavior to a rational understanding of the distributional properties of the judgmental errors, such as the thickness of the distribution's tails and the prevalence of outliers. Such normative considerations favor use of the median. However, this argument would lead us to expect different strategies in the absence of outliers (whose absence presumably suggests that the distribution has normal rather than heavy tails). Our results suggest that strategies were not based purely on statistical grounds: in our preliminary study, the median strategy was used both when outliers were present and when they were not;3 in the main study there was no significant switch to more accurate strategies in the consensus-correct conditions as learning occurred.

The median strategy provides one way of dealing with both the strategic and the statistical motivations for trimming outliers. However, as we have seen, these motivations do not appear to have been the primary reasons for its use in our experiments. Instead, the most likely explanation for its use is its simplicity: it is easy to use because it involves merely sorting the advice and then picking a single piece of it. It has other advantages too. All opinions are taken into account rather easily, without undue influence from apparent outliers, from the apparent consensus, or from the underlying but unknown structure of the variable. It coincides with the human tendency to resolve inconsistencies and conflicts among input opinions by removing the dissonant data.
It is not surprising, then, that an alternative strategy must produce considerable gains in accuracy before this default is overturned.
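The bounded influence that motivates trimming can be seen in a two-line comparison (a sketch with made-up committee scores): a single exaggerated opinion drags the equal-weighted mean but leaves the median untouched.

```python
import statistics

honest = [7.0, 7.5, 8.0, 8.5, 9.0]      # hypothetical committee ratings
strategic = [7.0, 7.5, 8.0, 8.5, 15.0]  # the last member exaggerates upward

# equal weighting passes the exaggeration straight through ...
mean_shift = statistics.mean(strategic) - statistics.mean(honest)        # ~1.2
# ... whereas the median ignores it entirely
median_shift = statistics.median(strategic) - statistics.median(honest)  # 0.0
```

Under the median, no single member can move the aggregate beyond the opinion of a middle-ranked colleague, which is precisely why exaggeration loses its strategic appeal.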

Relation to research on minority dissenters in small-group decision making

Our work using advice-taking tasks can be compared to that on small-group decision making. Research on small groups using social decision schemes (SDS) (see, e.g., Stasser, 1999; Parks & Kerr, 1999; Levine, 1999; De Dreu & De Vries, 2001) has examined the appropriateness of different hypothesized models for describing the "rules or functions, by which individual contributions are translated into group products" (Levine, 1999, p. 23). Rules based on combinatorial mathematics have been tested when the group output is a categorical judgment; statistical decision schemes such as the mean and median have been tested when the group output is quantitative. We have used just such an approach here.

However, advice taking by one judge differs systematically from small-group decision making on a number of counts. For example, Savadori, Van Swol, and Sniezek (2001) found that judge-advisor systems led to less repetition of information held uniquely by participants, less equality of participation, and less consensus seeking compared to groups. All the same, our results on minority influence in advice taking do parallel those in small-group decision making. Using SDS techniques, small-group researchers have demonstrated that when the group task is intellective, correct minority opinions that are demonstrably true will prevail. However, in judgmental tasks, arguments are less easily made and the truth of any one opinion is not immediately obvious. In such cases, minority opinion will be overruled by that of the majority (see Smith, Tindale, & Anderson, 2001). The tasks in both our experiments were judgmental: the truth of an opinion was not immediately obvious and could not be demonstrated. In parallel with the work on small-group decision making, our participants' primary strategy was to discount minority opinions.
Such "truth-wins" and "majority-wins" models, based on combinatorial mathematics, have been applied to situations in which the group output is a categorical decision or a choice (such as guilty or not guilty). Where the group output is a quantitative judgment (such as a compensation figure), Hinsz (1999) suggests that

3 When no outliers were present, the median strategy was identical to the mean. It may be that participants calculated the mean in this situation but used the median when outliers were present. Parsimony would lead us to assume that they did not, and that the strategy of taking the median is not based on an understanding of the underlying distribution.


the role of arguments (backing one option rather than another) is less relevant; he found that the median is one of the best models of group quantitative decision making. Again, our results parallel Hinsz's: our tasks were quantitative, and the best-fitting model of primary advice taking was use of the median.

Those working on small-group decision making have suggested explanations for these findings that are in some respects analogous to those that we outlined above in the context of our own results. They have interpreted their results primarily in terms of social motivations and the need for accuracy. Reliance on the majority in judgmental tasks has been interpreted as use of a "consensus implies correctness" heuristic (Henrich & Boyd, 1998; Boyd & Richerson, 2001; Kameda & Nakanishi, 2002; Gigerenzer, in press), whereas minority opinions are more influential in group decision making if a person is consistent in putting them forward (or the truth is demonstrable, as may be the case in intellective tasks). Our results showed that, when the outlying advice was correct, consistently named outliers influenced participants' judgments more readily than inconsistently named outliers. Our interpretation of this was in terms of cue redundancy and scenario plausibility.

Strengths and weaknesses of the current research

Our studies appear to be the first to explore the role of minority opinion in advice taking. The parallels with work on small-group decision making suggest that the relative role of minority opinion in intellective tasks and in non-quantitative judgments could be investigated in advice taking as well. The approach we used allowed us to examine specific hypotheses in advice taking, just as the use of social decision schemes has done in group decision making. Our methods were designed to examine, first, the use of outlying opinions in one-shot advice-taking situations and, second, the ability to learn to use truthful minority opinions when taking advice over a number of trials.

Will these results generalize outside the laboratory? We demonstrated that, when given the goal of maximizing accuracy, participants can diverge from their default behavior and learn to rely on accurate outlying opinions when advisors are consistently named. They are slower to do so when advisors are inconsistently named. It may well be that participants in other advice-taking situations have goals other than accuracy (see Harvey & Fischer, 1997), or make use of other information about the advisors, such as that concerned with trust and expertise (Sniezek & van Swol, 2001). Further research is needed to reveal how these factors interact with the presence of minority opinions.

Conclusions

Our findings imply that there is a hierarchy of rules for solving the problem of combining advice. The default (in one-shot situations) is to use a median-like strategy, a comparatively easy strategy that discounts potentially over-influential outliers. In the presence of feedback, and with repeated experience of different sources of advice, people come to weight information according to its accuracy and can eventually learn to rely on discordant but correct advisors.
Such results suggest that people's common working assumption is that the consensus opinion is correct while the outlying opinion is better discounted.

ACKNOWLEDGEMENTS

The research was supported by Economic and Social Research Council grant R000236827 to Nigel Harvey, and by Israel Science Foundation grant 822 to Ilan Yaniv.

REFERENCES

Armstrong, J. S. (2001). Combining forecasts. In J. S. Armstrong (Ed.), Principles of forecasting: A handbook for researchers and practitioners (pp. 417–439). Norwell, MA: Kluwer.


Baumeister, R. F. (2001). Dissenting reviews. Dialogue, Official Newsletter of the Society for Personality and Social Psychology, March 25th, 15.
Bochner, S., & Insko, C. A. (1966). Communicator discrepancy, source credibility, and opinion change. Journal of Personality and Social Psychology, 4, 614–621.
Boyd, R., & Richerson, P. J. (2001). Norms and bounded rationality. In G. Gigerenzer, & R. Selten (Eds.), Bounded rationality: The adaptive toolbox (pp. 281–296). Cambridge, MA: MIT Press.
Brehmer, B. (1974). Hypotheses about relations between scaled variables in the learning of probabilistic inference tasks. Organizational Behavior and Human Performance, 11, 1–27.
Brehmer, B. (1980). In one word: not from experience. Acta Psychologica, 45, 223–241.
Bruner, J. S., Goodnow, J. J., & Austin, G. A. (1956). A study of thinking. New York: Wiley.
Brunswik, E. (1952). The conceptual framework of psychology. Chicago: University of Chicago Press.
Budescu, D. V., Rantilla, A. K., Yu, H.-T., & Karelitz, T. M. (2003). The effects of asymmetry among advisors on the aggregation of their opinions. Organizational Behavior and Human Decision Processes, 90, 178–194.
De Dreu, C. K. W., & De Vries, N. K. (2001). Group consensus and minority influence: Implications for innovation. Oxford, UK: Blackwell Publishers.
De Groot, M. H. (1986). Probability and statistics (2nd ed.). Reading, MA: Addison-Wesley.
Deutsch, M., & Gerard, H. B. (1955). A study of normative and social influences upon individual judgment. Journal of Abnormal and Social Psychology, 51, 629–636.
Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press.
Gigerenzer, G. (in press). Bounded rationality: the study of smart heuristics. In D. J. Koehler, & N. Harvey (Eds.), Handbook of judgment and decision research. Oxford, UK: Blackwell Publishing.
Harvey, N., & Fischer, I. (1997). Taking advice: accepting help, improving judgment and sharing responsibility. Organizational Behavior and Human Decision Processes, 70, 117–133.
Harvey, N., Harries, C., & Fischer, I. (2000). Using advice and assessing its quality. Organizational Behavior and Human Decision Processes, 81, 252–273.
Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley.
Henrich, J., & Boyd, R. (1998). The evolution of conformist transmission and the emergence of between-group differences. Evolution and Human Behavior, 19, 215–241.
Hinsz, V. B. (1999). Group decision making with responses of a quantitative nature: the theory of social decision schemes for quantities. Organizational Behavior and Human Decision Processes, 80, 28–49.
Hurley, W. J., & Lior, D. U. (2002). Combining expert judgment: on the performance of trimmed mean vote aggregation procedures in the presence of strategic voting. European Journal of Operational Research, 140, 142–147.
Kameda, T., & Nakanishi, D. (2002). Cost-benefit analysis of social/cultural learning in a nonstationary uncertain environment: an evolutionary simulation and an experiment with human subjects. Evolution and Human Behavior, 23, 373–393.
Levine, J. M. (1999). Transforming individuals into groups: some hallmarks of the SDS approach to small-group research. Organizational Behavior and Human Decision Processes, 80, 21–27.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: the effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37, 2098–2109.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Parks, C. D., & Kerr, N. L. (1999). Twenty-five years of social decision scheme theory. Organizational Behavior and Human Decision Processes, 80, 1–2.
Penner, L. A., & Davis, J. H. (1969). Conformity and the "rational" use of unanimous majorities. Journal of Social Psychology, 78, 299–300.
Savadori, L., Van Swol, L. M., & Sniezek, J. A. (2001). Information sampling and confidence within groups and judge advisor systems. Communication Research, 28, 737–771.
Sherif, M. (1935). A study of some social factors in perception. Archives of Psychology, 27, No. 187.
Smith, C. M., Tindale, R. S., & Anderson, E. M. (2001). The impact of shared representations on minority influence in freely interacting groups. In C. K. W. De Dreu, & N. K. De Vries (Eds.), Group consensus and minority influence: Implications for innovation (pp. 183–200). Oxford, UK: Blackwell Publishers.
Sniezek, J. A., & Naylor, J. C. (1978). Cue measurement and functional hypothesis testing in cue probability learning. Organizational Behavior and Human Performance, 22, 367–374.
Sniezek, J. A., & van Swol, L. M. (2001). Trust, confidence and expertise in a judge-advisor system. Organizational Behavior and Human Decision Processes, 84, 288–307.


Soll, J. B. (1999). Intuitive theories of information: beliefs about the value of redundancy. Cognitive Psychology, 38, 317–346.
Stasser, G. (1999). A primer of social decision scheme theory: models of group influence, competitive model-testing, and prospective modeling. Organizational Behavior and Human Decision Processes, 80, 3–20.
Streiner, D. L. (2000). Do you see what I mean? Indices of central tendency. Canadian Journal of Psychiatry, 45, 833–836.
Wilcox, R. R. (1992). Why can methods for comparing means have relatively low power, and what can you do to correct the problem? Current Directions in Psychological Science, 1, 101–105.
Yaniv, I. (1997). Weighting and trimming: heuristics for aggregating judgments under uncertainty. Organizational Behavior and Human Decision Processes, 69, 237–249.
Yaniv, I. (2004a). Receiving other people's advice: influence and benefit. Organizational Behavior and Human Decision Processes, 93, 1–13.
Yaniv, I. (2004b). The benefit of additional opinions. Current Directions in Psychological Science, 13, 75–78.
Yaniv, I., & Kleinberger, E. (2000). Advice taking in decision making: egocentric discounting and reputation formation. Organizational Behavior and Human Decision Processes, 83, 260–281.

Authors' biographies:

Clare Harries is a lecturer in the Department of Psychology at University College London. Her research interests include implicit and explicit processes in judgment and decision making, advice taking and giving, and strategic decision making.

Nigel Harvey is Professor of Judgment and Decision Research at University College London. His main interests include judgmental forecasting, advice giving and taking, subjective probability, and learning processes in judgment and decision making.

Ilan Yaniv teaches in the Department of Psychology at the Hebrew University, and is a member of the Center for the Study of Rationality and of the Swiss Centre for Conflict Research. His research interests involve social and cognitive issues in judgment, decision making, and negotiation processes.

Authors' addresses:

Clare Harries, Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK.
Nigel Harvey, Department of Psychology, University College London, Gower Street, London WC1E 6BT, UK.
Ilan Yaniv, Department of Psychology, Hebrew University of Jerusalem, Mt. Scopus, Jerusalem 91905, Israel.
