Identifying Perspectives in Text and Video
Thesis Proposal

Wei-Hao Lin
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

December 24, 2006

Committee:

Alexander G. Hauptmann (chair)
William W. Cohen
Eric Xing
Janyce Wiebe (University of Pittsburgh)

Abstract

Polarizing discussions about social controversies (e.g., abortion) and regional conflicts (e.g., the Palestinian-Israeli conflict) commonly take place in broadcast news, newspapers, and blogs. To facilitate mutual understanding between people holding different beliefs, everyone needs to become more aware of the different viewpoints toward an issue. Can computers learn the elusive concept of “perspective” to highlight the different perspectives in everyday communication? We propose to study how different perspectives are reflected in text and video. First, given two document collections, how do computers determine if they are written from different perspectives? Second, how do computers identify the perspective from which an individual document is written? Third, can computers highlight key paragraphs and sentences that strongly reflect a particular perspective, so that computer predictions about a document's perspective can be justified? Finally, can computers go beyond text and learn to identify the perspective from which broadcast news programs are produced? We approach the above research questions in a statistical learning framework. Experimental results show that by comparing the divergence between the statistical distributions of two document collections, we can successfully determine whether the collections are written from different perspectives (e.g., Democratic and Republican candidates in presidential debates). We show that statistical learning methods can identify the perspective from which a document is written with high accuracy. Based on these results, we propose to test if the statistical regularities hold for a variety of document collections. We propose to develop a joint model of topic and perspective to automatically uncover the latent structure of contrasting perspectives on a topic. Finally, we plan to conduct an annotation study on sentence-level perspectives so that the predictions from our model can be directly evaluated.


Contents

1 Introduction
  1.1 Outline of Thesis Proposal
2 Literature Review
  2.1 Perspectives in Text
    2.1.1 Computer Simulation of Belief Systems
    2.1.2 Subjectivity, Sentiment, and Discourse Analysis
    2.1.3 Text Categorization and Topic Models
  2.2 Perspectives in Video
3 Experimental Data
  3.1 Text Data
  3.2 Video Data
4 Regularities of Different Perspectives in Text
  4.1 Identifying Text Collections of Different Perspectives
  4.2 Quality of Monte Carlo Estimates
  4.3 Test of Different Perspectives
  4.4 Personal Writing Styles or Perspectives?
  4.5 Origins of the Differences
5 Regularities of Individual Perspectives in Text
  5.1 Modeling Perspectives at the Document Level
  5.2 Identifying Perspectives at the Document Level
  5.3 Latent Sentence Perspective Models
  5.4 Identifying Perspectives at the Sentence Level
6 Regularities of Different Perspectives in Video
7 Proposed Work
  7.1 Explore Problem Space
  7.2 Evaluation of Sentence-Level Perspectives
  7.3 Joint Model of Topic and Perspective
8 Schedule
A Gibbs Samplers for Modeling Individual Perspectives

1 Introduction

We propose to investigate how different perspectives are reflected in text and video. By perspective we mean a point of view, for example, the perspective of Democrats or Republicans. Specifically, we define “perspective” in this proposal as follows:

Definition 1. Perspectives manifest themselves when two or more parties attach importance differently to aspects of a topic.

A document in this proposal is defined in its broadest sense, including text and video stories. Political conflicts and social controversies involve two or more parties holding different beliefs. A conflict will not exist if the two groups agree with each other. An issue will not be controversial if there are not two different views. For example, two presidential candidates, John Kerry and George W. Bush, gave the following answers to a question on abortion during the third presidential debate in 2004:

(1)

Kerry: What is an article of faith for me is not something that I can legislate on somebody who doesn’t share that article of faith. I believe that choice is a woman’s choice. It’s between a woman, God and her doctor. And that’s why I support that.

(2)

Bush: I believe the ideal world is one in which every child is protected in law and welcomed to life. I understand there’s great differences on this issue of abortion, but I believe reasonable people can come together and put good law in place that will help reduce the number of abortions.

The above examples show that the two candidates expressed two very different perspectives on abortion. One candidate takes a so-called “pro-choice” position that values a woman's choice, while the other takes a “pro-life” position that values the life of an unborn child. Perspectives do not automatically manifest themselves when any two documents are contrasted. Take the following sentences from Reuters newswire as examples.

(3)

Gold output in the northeast China province of Heilongjiang rose 22.7 pct in 1986 from 1985’s level, the New China News Agency said.

(4)

Exco Chairman Richard Lacy told Reuters the acquisition was being made from Bank of New York Co Inc, which currently holds a 50.1 pct, and from RMJ partners who hold the remainder.

The above sentence pair does not exhibit opposing “perspectives” as strongly as the Kerry-Bush answers cited earlier. Rather, as the Reuters indexers did, people would label Example 3 as “GOLD” and Example 4 as “ACQuisition”, namely, as two topics, not two perspectives. Why do people perceive different perspectives in the pair of Example 1 and Example 2, but no perspectives in the pair of Example 3 and Example 4? We propose to investigate the following question:

Research Question 1. Can computer programs automatically determine if two document collections are written from two perspectives?

There are four key elements in our definition of a perspective: topic, aspects, two or more parties, and importance. The GOLD-ACQ example does not exhibit perspective because, by our definition, the two sentences are on two different topics, while the Kerry-Bush examples focus on the same topic. We illustrate how the Kerry-Bush examples (Example 1 and Example 2) fit our definition in Table 1. An answer to Research Question 1 is not only of great scientific interest to computational linguistics, but will also enable automatic discovery of contrasting perspectives. Political analysts regularly monitor the

Element              | Kerry-Bush examples
---------------------|-------------------------------------------------
Topic                | Abortion
Aspects              | A woman's choice vs. the life of an unborn child
Two or more parties  | Kerry and Bush
Importance           | Response to the question

Table 1: How the Kerry-Bush examples fit our definition of a perspective

positions that foreign countries take on international and domestic issues. Media analysts frequently survey broadcast news, newspapers, and blogs for differing viewpoints. In contrast to costly human monitoring, computer programs can detect the existence of contrasting perspectives among huge collections of documents. In addition to discovering document collections that contain opposing perspectives, analysts are interested in identifying documents that are written from a particular perspective. For example, in the context of the Palestinian-Israeli conflict:

(5)

The inadvertent killing by Israeli forces of Palestinian civilians – usually in the course of shooting at Palestinian terrorists – is considered no different at the moral and ethical level than the deliberate targeting of Israeli civilians by Palestinian suicide bombers.

(6)

In the first weeks of the Intifada, for example, Palestinian public protests and civilian demonstrations were answered brutally by Israel, which killed tens of unarmed protesters.

Example 5 is written from the Israeli perspective; Example 6 is written from the Palestinian perspective. Analysts who follow the development of the Israeli-Palestinian conflict want not only to know that Example 5 and Example 6 are written from opposing perspectives, but also to look for more documents that are written from a particular perspective of interest. People knowledgeable about the Israeli-Palestinian conflict can easily identify the perspective from which a document was written, but human review, again, is costly when there are huge numbers of documents. Computer programs that can automatically identify the perspective from which a document is written will be a valuable tool for people analyzing text from different perspectives. We propose to study the following research question:

Research Question 2. Can computers learn to identify the perspective from which a document is written?

We can evaluate the effectiveness of such computer programs in the task of predicting the perspective of a document. The more effective the computer programs are, the higher classification accuracy they will achieve. When an issue is discussed from different perspectives, not every sentence strongly reflects the overall perspective an author holds. For example, the following sentences were written by one Palestinian and one Israeli, respectively:

(7)

The Rhodes agreements of 1949 set them as the ceasefire lines between Israel and the Arab states.

(8)

The green line was drawn up at the Rhodes Armistice talks in 1948-49.

Example 7 and Example 8 introduce the background of the ceasefire line drawn in 1949, and no explicit perspectives are expressed. Analysts who sift through huge collections of documents are interested in

not only quickly retrieving documents of a particular perspective of interest (i.e., Research Question 1), but also identifying which part of a document strongly reflects a perspective. We propose to study the following research question:

Research Question 3. Can computers discriminate between sentences that strongly express a perspective and sentences that do not?

Perspectives, however, are not fixed. Beliefs that were not widely held before may become popular later. Similarly, common consensus on political and social issues may break down, resulting in two new, opposing perspectives. We propose to study the following question:

Research Question 4. Can computers detect the emergence and split of perspectives?

So far we have made an assumption about learning different perspectives: that annotated data are available. By assuming each training document is labeled with a topic and a perspective, we can “train” our statistical models to distinguish different perspectives on a topic. Annotated data, however, are prohibitively expensive to obtain. We are thus interested in pursuing the previous research questions in an unsupervised fashion, and propose to study the following question:

Research Question 5. Can computers automatically identify topics and their differing perspectives at the same time?

We plan to develop a joint model of topic and perspective to uncover the latent structure of different perspectives on a topic simultaneously. Text is not the only medium in which perspectives are regularly expressed. Video has been a popular medium for conveying subjective beliefs and values. For example, the two news clips in Figure 1 from different countries portray the same news story about “Arafat's funeral” from different perspectives.

(a) From an Arabic news channel (LBC)

(b) From an American news channel (NBC)

Figure 1: News clips from different countries on the same story, “Arafat’s funeral” (from TRECVID’05, see Section 3.2). Not every two news clips sharing similar characteristics will exhibit different perspectives. For example, two news clips in Figure 2 are both broadcast in Chinese, but viewers will not perceive different perspectives as strongly as those in Figure 1. To identify perspectives expressed in video, we propose to address the following research question: 3

(a) From a Chinese news channel (NTDTV)

(b) From a Chinese news channel (CCTV)

Figure 2: News clips in the same language, Chinese (from TRECVID’05, see Section 3.2).

Research Question 6. Can computers determine if two sets of broadcast news clips are produced from different perspectives?

We focus on broadcast news video in this proposal. Broadcast television news is nowadays the predominant way people come to understand the world. News networks set their own agendas and can greatly shape how people perceive economic, political, and social issues. A recent poll conducted by the University of Maryland and Knowledge Networks (Kull, 2003) shows that the number of misconceptions respondents hold about the Iraq War varies significantly with their main news source: 80% of respondents whose primary news source is FOX have one or more misconceptions, while among people whose primary source is CNN, 50% have misconceptions. Computer programs that can automatically identify the perspective from which broadcast news video is produced will facilitate mutual understanding between people of different cultures and beliefs. Such programs can highlight which parts of a news video strongly reflect a perspective, and help news viewers become more aware of the bias of individual news networks. Furthermore, computer programs can point viewers to news stories on the same issue presented from opposing perspectives by other news networks. News viewers are thus encouraged to consider controversial issues from broader and multiple viewpoints. Automatically identifying perspectives in broadcast news video also enables analysts to monitor broadcast news on a larger scale. Political analysts constantly watch broadcast TV news from multiple news networks to detect any policy shift and opinion change. Counter-terrorism agencies continuously monitor domestic and foreign broadcast news to identify possible terrorist activities (Popp et al., 2004). Automatic systems can alert analysts to perspective changes and provide evidence from multiple modalities of video. One key question in identifying perspectives in video is how to represent video.
Unlike text, video has no clear choice of discrete units. We adopt a bag-of-“concepts” representation in this proposal. Similar to a bag-of-words representation, a video frame is represented as a bag of high-level visual concepts. For example, the video frame in Figure 4 is represented as a bag of concepts from the LSCOM Annotations (see Section 3.2

for more details). Human annotations on video, however, are prohibitively expensive to obtain. Machine learning algorithms such as Support Vector Machines (SVMs) have been shown to be effective in classifying many concepts (Naphade & Smith, 2004). Instead of relying on human annotations, we can develop video concept classifiers based on labels and video data (e.g., from TREC Video Track 2005 (Over et al., 2005)), and apply the classifiers to automatically generate a bag-of-concepts representation of a video shot. We propose to assess to what degree computers can identify different perspectives from automatically detected concepts instead of manual annotations.

Research Question 7. Can computers still effectively identify different perspectives with machine-learned concept detectors?

We are interested in how the accuracy of the video concept classifiers affects the performance of identifying perspectives in video. Finally, a multimedia document (e.g., broadcast news video) consists of multiple modalities, all of which can contribute to expressing different perspectives. Figure 3 shows a broadcast news video that consists of both text and video. The textual modality includes closed captions, ASR transcripts (in Arabic), and the English translations of the transcripts. We propose to study the following question:

Research Question 8. Will computers perform better in identifying the perspectives of multimedia documents by jointly modeling textual and visual modalities?

The text-based model captures topics and perspectives based on a bag-of-words representation, and the image-based model uses a bag-of-concepts representation. There is one perspective and one topic variable for each video shot, but two different multinomial distributions generate the text words and the video concepts.
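As an illustration of this representation, the following sketch maps detector scores for one shot to a bag of concepts; the concept names, scores, and confidence threshold are all invented for the example and are not taken from LSCOM or TRECVID.

```python
from collections import Counter

def bag_of_concepts(shot_detections, threshold=0.5):
    """Map a shot's concept-detector scores to a bag (multiset) of concepts.

    shot_detections: dict of concept name -> detector confidence in [0, 1].
    Concepts scoring above the threshold are kept, mirroring how automatic
    detectors could stand in for manual annotations.
    """
    return Counter(c for c, score in shot_detections.items() if score >= threshold)

# Hypothetical detector output for one video shot.
detections = {"crowd": 0.92, "flag": 0.71, "outdoor": 0.88, "anchor": 0.10}
print(bag_of_concepts(detections))
```

Lowering the threshold trades precision for recall in the resulting representation, which is one way the accuracy of the detectors (Research Question 7) interacts with perspective identification.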

Figure 3: A broadcast news clip (from an Arabic channel in TRECVID’05, see Section 3.2) consists of text and video modalities.


1.1 Outline of Thesis Proposal

The rest of the proposal is organized as follows. We first review relevant literature on identifying perspectives in Section 2, covering perspectives reflected in text and video. Experimental data are described in Section 3. We then present three pieces of supporting evidence to show why it is feasible to study perspectives in a statistical learning framework, in Section 4, Section 5, and Section 6. Finally, we present proposed work in Section 7. In Section 4 we first show that documents of different perspectives can be reliably distinguished from other kinds of text collections. To address Research Question 1, we develop statistical models for document collections, and measure the “distance” between the models of two collections. We measure how well the model-based approach can distinguish document collections that are written from contrasting perspectives (e.g., Palestinian vs. Israeli) from document collections that are not written from different perspectives (e.g., different news topics). The preliminary results show that the model-based approach can successfully separate document collections of different perspectives from document collections under other conditions. In Section 5 we show that individual perspectives also exhibit strong statistical regularities. We approach the task of identifying the perspective of a document (i.e., Research Question 2) as a classification problem. We evaluate how well statistical learning algorithms can learn a perspective reflected in word usage. The experimental results show that machine learning algorithms perform very well, and can predict the perspective from which a document is written with high accuracy. We show in Section 6 that video documents, represented as bags of concepts, exhibit statistical regularities similar to those of text documents of different perspectives.
The regularities based on Kullback-Leibler divergence appear consistently in text and video, and pave the way for the proposed work in Section 7.
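As an illustration of the divergence-based comparison, the sketch below estimates smoothed unigram distributions for two tiny, invented document collections and computes their Kullback-Leibler divergence. The toy data and the smoothing constant are assumptions for the example, not the exact model used in Section 4.

```python
from collections import Counter
from math import log

def unigram_dist(docs, vocab, alpha=0.01):
    """Additively smoothed unigram distribution over a shared vocabulary."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) in nats; p and q must share the same support."""
    return sum(p[w] * log(p[w] / q[w]) for w in p)

# Hypothetical miniature collections (e.g., answers from two debate candidates).
a = [["choice", "woman", "faith"], ["choice", "doctor"]]
b = [["life", "child", "law"], ["law", "abortions"]]
vocab = {w for doc in a + b for w in doc}
p, q = unigram_dist(a, vocab), unigram_dist(b, vocab)
print(kl_divergence(p, q))  # larger values suggest more dissimilar collections
```

Smoothing keeps every probability strictly positive, so the divergence is always finite even when a word appears in only one collection.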

2 Literature Review

2.1 Perspectives in Text

2.1.1 Computer Simulation of Belief Systems

Perspectives are manifested when authors hold different beliefs, and research on modeling belief systems is highly relevant to our work. Abelson and Carroll pioneered simulating the belief systems of individuals on computers (Abelson & Carroll, 1965). The simulation system, also known as the Goldwater machine, represented the beliefs of a right-wing politician on foreign policy in the Cold War as a set of English sentences consisting of a subject followed by a verb and an object, for example, “Cuba subverts Latin America.” Abelson (1973) later extended the simple sentence-based representation to a hierarchical representation. The extended representation, closely following Schank and Abelson (1977)'s framework of knowledge representation, distinguished between the actions and purposes of an actor, captured a sequence of actions for a purpose, and modeled interactions between multiple actors. Carbonell (1978) proposed POLITICS, a simulation system that can interpret a political event in text from two conflicting ideologies, e.g., conservative and liberal (Carbonell, 1979). POLITICS focused on understanding the goals of actors, and a new structure, goal trees, was developed to perform “counter-planning”, that is, to thwart other actors in achieving their goals. The research questions raised in Section 1, however, have not been fully addressed in previous work. Computer simulation in previous work was not an end, but a means of making assumptions about human belief


systems explicit. Therefore, early computer simulation programs could neither determine whether two text documents expressed conflicting views nor predict the belief the author of a document holds. Beliefs in previous work were manually collected and translated into computer-readable forms, which is very different from our goal of automatically learning perspectives from a collection of documents. Previous work takes a top-down approach to modeling beliefs, while our approach in this proposal is bottom-up. Manually constructed knowledge bases have been known to suffer from the “acquisition bottleneck” (Buchanan et al., 1983), and are difficult to transfer to new domains. Learning one's attitude toward an issue directly from written or spoken documents was considered impossible in previous work. Abelson and Carroll expressed a very pessimistic view of the possibility of learning beliefs from text without putting in any prior knowledge:

The simulation of the belief systems of other individuals [other than Goldwater] with very different views is also being contemplated, but this step cannot be undertaken lightly since the paraphrasing procedure [one method of manually representing beliefs in computers (see Abelson & Carroll, 1965, pg. 29)] is extremely difficult. One might suppose that fully automatic content analysis methods could be applied to the writings and speeches of public figures, but there is an annoying technical problem which renders this possibility a vain hope.

We do not subscribe to this view. Instead, we believe that statistical modeling allows perspectives to be learned from training documents without human supervision. Part of our goal in this proposal is to show to what degree statistical learning can learn perspectives automatically.

2.1.2 Subjectivity, Sentiment, and Discourse Analysis

Recently there has been increasing interest in subjectivity and sentiment analysis. Subjective language is used to express opinions, emotions, and sentiments. There have been studies on learning subjective language (Wiebe et al., 2004; Riloff et al., 2003; Riloff & Wiebe, 2003), identifying opinionated documents (Yu & Hatzivassiloglou, 2003) and sentences (Yu & Hatzivassiloglou, 2003; Riloff et al., 2003; Riloff & Wiebe, 2003), and discriminating between positive and negative language (Yu & Hatzivassiloglou, 2003; Turney & Littman, 2003; Pang et al., 2002; Dave et al., 2003; Nasukawa & Yi, 2003; Morinaga et al., 2002). By its very nature we expect much of the language presenting a perspective or point of view to be subjective and opinionated. Labeling a document or a sentence as subjective, however, does not by itself solve the problem of identifying perspectives. Given two document collections, we can apply subjectivity classifiers to estimate how many subjective sentences and documents each contains. The proportion of subjective sentences or documents, however, does not answer whether the two collections contain opposing perspectives (i.e., Research Question 1). As we will show later in Section 3.1, there is a large proportion of subjective sentences in documents of contrasting perspectives, but the proportion by itself does not indicate whether two document collections are written from the same or different perspectives. Research on the automatic classification of movie or product reviews as positive or negative (Turney & Littman, 2003; Nasukawa & Yi, 2003; Mullen & Collier, 2004; Beineke et al., 2004; Pang & Lee, 2004; Morinaga et al., 2002; Hu & Liu, 2004) is similar to identifying individual perspectives (i.e., Research Question 2). By our definition of a perspective, the sentiment expressed in movie or product reviews is one kind of perspective. Reviews use sentiment-laden language to assign subjective importance to various aspects of a movie (e.g., actors, plot, lighting) or a product. In domains other than movie or product reviews, however, the importance of different aspects is assigned in a manner more complex than plain sentiment. It is not clear whether the way importance is assigned when people express perspectives on social and political issues is as clear-cut as the sentiment in movie or product reviews.

How different perspectives are expressed in political discourse has been studied in the field of discourse analysis (van Dijk, 1988; Pan et al., 1999; Fang, 2001; Geis, 1987; Wilson, 1990). Although this research has goals similar to ours, it does not take a computational approach to analyzing large collections of documents. To the best of our knowledge, our approach of automatically identifying perspectives in discourse is unique.

2.1.3 Text Categorization and Topic Models

From comprehensive comparisons between competing classifiers (Yang & Liu, 1999), to feature selection (Yang & Pedersen, 1997), to new algorithms (Joachims, 1998; McCallum & Nigam, 1998), to the utilization of unlabeled data (Nigam et al., 2000), to new kinds of text documents (Klimt & Yang, 2004; Lewis et al., 2004), the problem of text categorization has been extensively studied (see also the survey by Sebastiani (2002)), and this work has shown that text documents can be classified into pre-defined categories with high accuracy. We borrow many techniques and much methodology from text categorization. The most popular and successful representation of text is the bag of words. Each document is represented as a vector in which each coordinate is the count of a term within the document, i.e., term frequency (TF), often weighted by the inverse of the number of documents in which the term appears, i.e., inverse document frequency (IDF). The bag-of-words representation ignores word order, does not utilize the rich information in syntax and semantics, and makes the strong assumption that words are independent of each other, which is not true in natural languages. However, the bag-of-words representation has been shown to be very effective in many natural language processing tasks, including text categorization (Sebastiani, 2002) and information retrieval (Lewis, 1998). One can regard two contrasting perspectives as two categories, and approach the task of identifying the perspective from which a text document was written (i.e., Research Question 2) as text categorization. However, text categorization has so far focused on “topical” documents (e.g., news topics in the Reuters corpus), and it is not clear how successful this approach will be for “perspective” documents.
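A minimal sketch of the TF-IDF weighting described above follows; real systems typically add length normalization and smoothed IDF variants, and the toy documents here are invented.

```python
from collections import Counter
from math import log

def tfidf_vectors(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count within a document; IDF is log(N / df),
    where df is the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    return [
        {term: tf * log(n / df[term]) for term, tf in Counter(doc).items()}
        for doc in docs
    ]

docs = [["gold", "output", "rose"], ["acquisition", "bank", "bank"], ["gold", "price"]]
vecs = tfidf_vectors(docs)
# "gold" appears in 2 of 3 documents, so each occurrence is weighted log(3/2).
```

Note that a term appearing in every document receives IDF log(1) = 0, i.e., it carries no discriminative weight.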
Research on topic models (Blei et al., 2003; Griffiths & Steyvers, 2004; Rosen-Zvi et al., 2004; McCallum et al., 2004) (see also the survey by Steyvers and Griffiths (In Press)) shows promising results on recovering the latent structure in topical documents. Topic models provide a solid foundation for us to further investigate the interaction between topics and perspectives. Recent work on the evolution of topics (Xing, 2005; Blei & Lafferty, 2006) is of particular interest, and we plan to extend it to model the temporal aspect of perspectives (i.e., Research Question 4).

2.2 Perspectives in Video

Video has been a popular medium for expressing social and political perspectives. There has been work in multimedia that makes viewers more aware of the perspectives in video. Minions (Ireson, 2004) is an interactive art installation that confronts visitors with video from two religious perspectives, Christianity and Islam. VOX POPULI (Bocconi & Nack, 2004) is a computer system that can assemble a documentary from a pool of interview clips based on a viewer's position on an issue, e.g., the “Iraq War.” Beyond the art installation and video generation work, very little work in the field of multimedia studies the problem of identifying different perspectives in video. The perspective of a video in previous work is either assumed to be known or manually labeled. Manual annotation makes it almost impossible to analyze large numbers of videos. Instead, we are interested in developing automatic methods of identifying the perspectives of videos.


Miyamori et al. (2005) present a system that can summarize TV sports programs based on the “perspective” of a viewer expressed in on-line chat rooms. On the surface, Miyamori et al. (2005) is very similar to VOX POPULI, and the only difference is the type of video. However, there is a major difference between the two video summarization works. We contrast the two studies with our proposal in Table 2.

           | Miyamori et al. (2005)       | Bocconi and Nack (2004) | Our Proposal
-----------|------------------------------|-------------------------|---------------
Topic      | Sports                       | War                     | ?
Data       | Fixed TV programs            | Interview Clips         | ?
Aspects    | Video shots                  |                         | ?
Importance | Inferred from chat-room logs |                         | ?
Output     | Video summaries              | Video summaries         | Video segments

Table 2: Comparison between (Miyamori et al., 2005; Bocconi & Nack, 2004) and our proposal

Under our definition of perspective, because the perspectives of the interview clips are already labeled, VOX POPULI need not infer the perspective of a video. By contrast, Miyamori et al. (2005) need to infer the perspectives of users and generate summaries accordingly; they infer how users attach importance to the video shots of a TV sports program from chat-room logs. In this proposal we are interested in a more challenging problem. Given two sets of video segments, how do we know if they contain different perspectives? We neither know what they talk about (i.e., the topic) nor have access to the complete raw footage from which they were chosen. Furthermore, how importance is attached is hidden and must be inferred from the produced videos. There has been work on linking stories on the same topic across news sources (Zhang et al., 2004; Zhai & Shah, 2005), and this will be a necessary component in our system. Visual similarity between two news stories has been shown to be of moderate help, while text similarity (from closed captions or ASR transcripts) contributes much more to the success of linking stories in broadcast news (Zhai & Shah, 2005).
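As a sketch of the text-based story linking cited above, the following computes cosine similarity between bag-of-words representations of transcripts; the story fragments are invented stand-ins for ASR output, and real linking systems would combine this with visual similarity.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical transcript fragments from three news stories.
story1 = Counter("arafat funeral cairo mourners arafat".split())
story2 = Counter("mourners gather for arafat funeral".split())
story3 = Counter("gold output rose in china".split())
print(cosine(story1, story2) > cosine(story1, story3))  # True: same-topic stories are closer
```

Stories whose similarity exceeds a tuned threshold would be linked as covering the same topic, after which their perspectives can be compared.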

3 Experimental Data

3.1 Text Data

We prepared two corpora consisting of documents that were written or spoken from contrasting perspectives. The first corpus, bitterlemons, contains documents written from an Israeli or a Palestinian perspective. The second corpus, 2004 Presidential Debates, consists of spoken documents from Kerry and Bush in the 2004 presidential debates. To test how well our proposed methods can distinguish document collections of contrasting perspectives from documents with no perspectives, we need a corpus whose collections differ in some way other than “perspective.” We focus on a particular difference, topicality, and choose a corpus, Reuters-21578, that contains news stories on different topics. The bitterlemons corpus consists of the articles published on the website http://bitterlemons.org/. The website is set up to “contribute to mutual understanding [between Palestinians and Israelis] through the open exchange of ideas.”[1] Every week an issue about the Israeli-Palestinian conflict is selected for discussion (e.g., “Disengagement: unilateral or coordinated?”), and a Palestinian editor and an Israeli editor each contribute one article addressing the issue. In addition, the Israeli and Palestinian editors invite

1 http://www.bitterlemons.org/about/about.html


one Israeli and one Palestinian to express their views on the issue (sometimes in the form of an interview), resulting in a total of four articles in a weekly edition. We collected a total of 594 articles published on the website from late 2001 to early 2005. We chose the bitterlemons website for two reasons. First, each article is already labeled as either Palestinian or Israeli by the editors. Second, the bitterlemons corpus enables us to test the generalizability of the proposed methods in a very realistic setting: training on articles written by a small number of writers (two editors) and testing on articles from a much larger group of writers (more than 200 different guests). We removed metadata from all articles, including edition numbers, publication dates, topics, titles, author names, and biographies. We used OpenNLP Tools2 to automatically determine sentence boundaries, and reduced word variants using the Porter stemming algorithm (Porter, 1980). To test whether the ratio of subjective to objective sentences helps distinguish one perspective from the other, as mentioned in Section 2.1.2, we estimated the subjectivity of each sentence using an automatic subjective sentence classifier (Riloff & Wiebe, 2003). We found that 65.6% of Palestinian sentences and 66.2% of Israeli sentences were classified as subjective. The almost equal percentages of subjective sentences in the two perspectives suggest that while perspectives are largely expressed in subjective language, the amount of subjective language in a document is not necessarily indicative of its perspective; one perspective is not necessarily more subjective than the other. The 2004 Presidential Debates corpus consists of the spoken transcripts of the three Bush-Kerry debates in 2004. The transcripts are from the Commission on Presidential Debates.3 We segmented the transcripts according to the speaker tags in the transcripts.
Each spoken document was either an answer to a question or a rebuttal. The words from the moderators were discarded. The Reuters-21578 corpus4 consists of Reuters newswire stories from 1987. Reuters-21578 is one of the most common testbeds for (topical) text categorization. Each document is classified into none, one, or more of 135 categories (e.g., “Mergers/Acquisitions” and “U.S. Dollars”). The number of documents per category is not evenly distributed (median 9.0, mean 105.9). To perform reliable statistical estimation, we consider only the seven most frequent categories (more than 500 documents each) in our experiments: ACQ, CRUDE, EARN, GRAIN, INTEREST, MONEY-FX, and TRADE (in the Reuters codes). The number of documents, average document length, and vocabulary size of the three text corpora are summarized in Table 3.

3.2 Video Data

Our video corpus is from the development set of the 2005 TREC Video Evaluation (TRECVID) (Over et al., 2005). Similar to the text version of TREC, TRECVID provides large video collections and queries for researchers to evaluate their systems on shot detection, high-level feature extraction, and video retrieval. The TRECVID’05 video collection is comprised of broadcast news programs recorded in late 2004 in three languages: Arabic, Chinese, and English. Every video is a one-hour or half-hour news program. The number of videos in each language is listed in Table 4. Each video is pre-divided into segments, roughly corresponding to news stories. How do we represent video in a format that computers can manipulate? While the natural choice of basic unit for text documents is the word, it is not clear how to represent video, a continuous stream of imagery. This is the fundamental question of image and video indexing (Rasmussen, 1997). We adopt a

2 http://sourceforge.net/projects/opennlp/
3 http://www.debates.org/pages/debtrans.html
4 http://www.ics.uci.edu/∼kdd/databases/reuters21578/reuters21578.html


Corpus                      Subset              |D|    |d̄|     V
bitterlemons                Palestinian         290    748.7   10309
                            Israeli             303    822.4   11668
                            Palestinian Editor  144    636.2    6294
                            Palestinian Guest   146    859.6    8661
                            Israeli Editor      152    819.4    8512
                            Israeli Guest       151    825.5    8812
2004 Presidential Debates   Kerry               178    124.7    2554
                            Bush                176    107.8    2393
                            1st Kerry            33    216.3    1274
                            1st Bush             41    155.3    1195
                            2nd Kerry            73    103.8    1472
                            2nd Bush             75     89.0    1333
                            3rd Kerry            72    104.0    1408
                            3rd Bush             60     98.8    1281
Reuters-21578               ACQ                2448    124.7   14293
                            CRUDE               634    214.7    9009
                            EARN               3987     81.0   12430
                            GRAIN               628    183.0    8236
                            INTEREST            513    176.3    6056
                            MONEY-FX            801    197.9    8162
                            TRADE               551    255.3    8175

Table 3: The number of documents |D|, average document length |d̄|, and vocabulary size V of the three text corpora.

Language   Channels           Duration
Arabic     LBC                33
Chinese    CCTV, NTDTV        52
English    CNN, NBC, MSNBC    73

Table 4: The channels and the duration of broadcast news video (in hours) in each language in the TRECVID’05 video archive.


concept-based representation that represents an image as a set of “concepts.” Examples of concept-based indexing are shown in Figure 4.

(a) Vehicle, Armed Person, Sky, Outdoor, Desert, Armored Vehicles, Daytime Outdoor, Machine Guns, Tanks, Weapons, Ground Vehicles; (b) Walking, Hill, Male Person, Civilian Person, Standing, Standing, Logos Full Screen, Adult, Adult, Walking Running, Sky, Animal, Person, Outdoor, Clouds, Daytime Outdoor, Flags, Group, Powerplants, Suits

Figure 4: The key frames of two shots from TRECVID’05 and their LSCOM annotations.

The Large-Scale Concept Ontology for Multimedia (LSCOM) is a collection of concepts designed to support concept-based video retrieval (Hauptmann, 2004). Each segment in TRECVID’05 is comprised of video shots, and one of the video frames in a shot is chosen as the key frame. Every shot is manually checked for the presence of each of 856 concepts (the complete list can be found at http://www.ee.columbia.edu/ln/dvmm/lscom/ConceptInfo.html). Suppose the key frame of a video shot is the left image in Figure 4. The shot is then represented as an 856-dimensional vector, where a component of the vector is one if the corresponding concept is present and zero otherwise. In effect, we represent an image as a bag of concepts.
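As a concrete illustration, the mapping from a shot's annotations to such a binary vector can be sketched as follows. The miniature concept list here is only illustrative; the actual LSCOM inventory has 856 entries.

```python
import numpy as np

# Hypothetical miniature inventory; the real LSCOM list has 856 concepts.
CONCEPTS = ["Vehicle", "Armed Person", "Sky", "Outdoor", "Desert", "Walking"]
INDEX = {c: i for i, c in enumerate(CONCEPTS)}

def bag_of_concepts(annotations):
    """Map the concepts annotated in a shot's key frame to a binary
    vector: component i is 1 if concept i is present, 0 otherwise."""
    v = np.zeros(len(CONCEPTS), dtype=int)
    for concept in annotations:
        if concept in INDEX:
            v[INDEX[concept]] = 1
    return v
```

Note that the vector records only presence or absence, not how many times a concept appears in a shot.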

4 Regularities of Different Perspectives in Text

We take a model-based approach to show that there exist statistical regularities by which document collections of different perspectives can be automatically distinguished. A document is represented as a point in a V-dimensional space, where V is the vocabulary size. Each coordinate is the frequency of a word within the document, that is, its term frequency. We assume that a collection of N documents, y1, y2, ..., yN, is generated by the following sampling process:

θ ∼ Dirichlet(α)
yi ∼ Multinomial(ni, θ).

We first sample a V-dimensional vector θ from a Dirichlet prior distribution with hyper-parameter α, and then repeatedly sample documents yi from a Multinomial distribution conditioned on the parameter θ, where ni is the length of the i-th document in the collection and is assumed to be known and fixed.
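The two-stage sampling process can be sketched with NumPy's random generator; the vocabulary size, hyper-parameter, and document lengths below are toy values, not the ones used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

V = 5                       # vocabulary size (toy value)
alpha = np.ones(V)          # symmetric Dirichlet hyper-parameter
doc_lengths = [10, 20, 15]  # the n_i, assumed known and fixed

# Sample theta once for the collection, then one document per n_i.
theta = rng.dirichlet(alpha)
docs = [rng.multinomial(n, theta) for n in doc_lengths]
# Each docs[i] is a V-dimensional term-frequency vector summing to n_i.
```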


We update our knowledge about θ with the information in the documents by Bayes’ Theorem:

p(θ|A) = p(A|θ)p(θ) / p(A) = Dirichlet(θ | α + Σ_{yi ∈ A} yi).

The posterior distribution p(θ|·) is again a Dirichlet distribution, because the Dirichlet distribution is a conjugate prior for the Multinomial distribution. We are interested in comparing the θ of two document collections A and B, but θ is not directly observable. How can we measure the difference between the two posterior distributions p(θ|A) and p(θ|B)? One way to measure the difference between distributions is the Kullback-Leibler (KL) divergence (Kullback & Leibler, 1951), defined as follows:

D(p(θ|A) || p(θ|B)) = ∫ p(θ|A) log [p(θ|A) / p(θ|B)] dθ.   (9)

Directly calculating the KL divergence according to (9) involves a high-dimensional integral and is difficult. Alternatively, we approximate the value of the KL divergence using Monte Carlo methods as follows:

1. Sample θ1, θ2, ..., θM from Dirichlet(θ | α + Σ_{yi ∈ A} yi).
2. Return D̂ = (1/M) Σ_{i=1}^{M} log [p(θi|A) / p(θi|B)] as a Monte Carlo estimate of D(p(θ|A) || p(θ|B)).

Algorithms for sampling from a Dirichlet distribution can be found in (Ripley, 1987). As M → ∞, the Monte Carlo estimate converges to the true KL divergence by the Law of Large Numbers.
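The two-step Monte Carlo procedure can be sketched as follows. The helper names `dirichlet_logpdf` and `mc_kl` are ours, and the Dirichlet log density is written out with `gammaln` rather than taken from a library; the posterior parameters passed in would be α plus the summed word counts of each collection.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_logpdf(theta, alpha):
    """Log density of Dirichlet(alpha) evaluated at theta."""
    return (gammaln(alpha.sum()) - gammaln(alpha).sum()
            + ((alpha - 1.0) * np.log(theta)).sum())

def mc_kl(alpha_a, alpha_b, M=1000, seed=0):
    """Monte Carlo estimate of D(p(theta|A) || p(theta|B)) for two
    Dirichlet posteriors with parameters alpha_a and alpha_b."""
    rng = np.random.default_rng(seed)
    thetas = rng.dirichlet(alpha_a, size=M)  # step 1: sample from p(theta|A)
    ratios = [dirichlet_logpdf(t, alpha_a) - dirichlet_logpdf(t, alpha_b)
              for t in thetas]               # step 2: average the log ratios
    return float(np.mean(ratios))
```

When the two parameter vectors are identical every log ratio is zero, so the estimate is exactly zero, matching the property that KL divergence vanishes only for identical distributions.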

4.1 Identifying Text Collections of Different Perspectives

A test of different perspectives is acute only if it distinguishes document collection pairs of different perspectives from document collection pairs that differ in other ways or not at all. We evaluate the proposed test of different perspectives by applying it to the following four types of document collection pairs (A, B):

Different Perspectives (DP) A and B are written from different perspectives. For example, A is written from the Palestinian perspective and B from the Israeli perspective.

Same Perspective (SP) A and B are written from the same perspective. For example, A and B both consist of the words spoken by Kerry.

Different Topics (DT) A and B are written on different topics. For example, A is about acquisitions and B is about crude oil.

Same Topic (ST) A and B are written on the same topic. For example, A and B are both about earnings.

The effectiveness of the proposed test of different perspectives can be measured by how well the distribution divergence of DP document collection pairs is separated from the distribution divergence of SP, DT, and ST pairs. The less the overlap, the more acute the test of different perspectives. To account for the considerable variation in the number of words and vocabulary size across corpora (see Table 3), we normalize the total number of words in a document collection to a constant K, and consider

only the top C% most frequent words in the document collection pair when estimating the KL divergence. We varied the values of K and C, and found that K changes the absolute scale of the KL divergence but does not affect the rankings of the four conditions. The rankings among the four conditions are consistent when C is small. We report only the results for K = 1000 and C = 10 in this proposal due to space limits. No stemming is performed and no stopwords are removed, but case is ignored in the indexing process.

There are two kinds of variance in the estimation of the divergence between two posterior distributions that should be carefully checked. The first kind of variance is attributed to the Monte Carlo methods. We assess the Monte Carlo variance by calculating a 100(1 − α) percent confidence interval as follows:

[D̂ − Φ⁻¹(1 − α/2) σ̂/√M, D̂ + Φ⁻¹(1 − α/2) σ̂/√M],

where σ̂² is the sample variance of the log-ratio terms log p(θi|A)/p(θi|B), i = 1, ..., M, and Φ⁻¹(·) is the inverse of the standard normal cumulative distribution function. The second kind of variance is due to the intrinsic uncertainty of the data-generating process. We assess this second kind of variance by repeating each document collection pair with 1000 bootstrapped samples, i.e., sampling with replacement.
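A minimal sketch of the normal-approximation confidence interval, computed from the per-sample log-ratio terms of the Monte Carlo estimate; the function name is ours, not from an existing toolkit.

```python
import numpy as np
from scipy.stats import norm

def mc_confidence_interval(log_ratios, alpha=0.05):
    """Normal-approximation 100(1 - alpha)% confidence interval for a
    Monte Carlo KL estimate, given the per-sample log-ratio terms."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    M = len(log_ratios)
    d_hat = log_ratios.mean()
    se = log_ratios.std(ddof=1) / np.sqrt(M)  # sigma_hat / sqrt(M)
    z = norm.ppf(1.0 - alpha / 2.0)           # Phi^{-1}(1 - alpha/2)
    return d_hat - z * se, d_hat + z * se
```

The interval width shrinks at the rate 1/√M, which is why increasing the number of Monte Carlo samples tightens the intervals reported below.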

4.2 Quality of Monte Carlo Estimates

The Monte Carlo estimates of the KL divergence for several document collection pairs are listed in Table 5. We can see that the 95% confidence intervals are tight around the Monte Carlo estimates of the KL divergence. The other document collection pairs show similarly tight confidence intervals – in fact, arbitrary precision can be obtained by increasing the number of samples M – and thus a complete list of the results is omitted. Because the Monte Carlo estimates are close to the true values, we treat them as exact and do not report confidence intervals in the rest of the proposal.

A            B            D̂        95% CI
ACQ          ACQ          2.76     [2.62, 2.89]
Palestinian  Palestinian  3.00     [3.54, 3.85]
Palestinian  Israeli      27.11    [26.64, 27.58]
Israeli      Palestinian  28.44    [27.97, 28.91]
Kerry        Bush         58.93    [58.22, 59.64]
ACQ          EARN         615.75   [610.85, 620.65]

Table 5: The Monte Carlo estimate D̂ and 95% confidence interval (CI) of the Kullback-Leibler divergence of some document collection pairs (A, B), with the number of Monte Carlo samples M = 1000.

Note that KL divergence is not symmetric. For example, the KL divergence of the pair (Israeli, Palestinian) is not necessarily the same as that of (Palestinian, Israeli). KL divergence is guaranteed to be nonnegative (Cover & Thomas, 1991) and equals zero only when the document collections A and B are exactly the same. The divergence of (ACQ, ACQ) is close to but not exactly zero because A and B are different samples of documents from the ACQ category.

4.3 Test of Different Perspectives

Now we present the main result of using distribution divergence to test whether two document collections are written or spoken from different perspectives. We calculate the KL divergence between the posterior distributions of document collection pairs in the four conditions using Monte Carlo methods, and plot the results in Figure 5.

Figure 5: The values of KL divergence of the document collection pairs in four conditions: Different Perspectives (DP), Same Perspective (SP), Different Topics (DT), and Same Topic (ST). Note that the y-axis is in log scale. The horizontal lines are drawn at the points with equivalent densities (based on kernel density estimation; see (Lin & Hauptmann, 2006) for more details).

The test of different perspectives based on statistical distribution divergence is shown to be very acute. The KL divergence of the document collection pairs in the DP condition falls mostly in the middle range, well separated from the high KL divergence of the pairs in the DT condition and from the low KL divergence of the pairs in the SP and ST conditions. By simply calculating the KL divergence of a document collection pair, we can predict that the two collections were written from different perspectives if the value falls in the middle range, from different topics if the value is very large, and from the same topic or perspective if the value is very small.
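The resulting decision rule can be sketched as a simple thresholding function. The cutoff values below are illustrative placeholders; the actual boundaries are the density-based crossover points estimated by kernel density estimation, as in the Figure 5 caption.

```python
def classify_pair(kl, low=5.0, high=100.0):
    """Toy decision rule for a document collection pair based on where
    its KL divergence falls. The thresholds are illustrative
    placeholders, not the density-based boundaries of Figure 5."""
    if kl < low:
        return "same topic or same perspective"
    if kl < high:
        return "different perspectives"
    return "different topics"
```

Running the Table 5 estimates through this toy rule, 2.76 for (ACQ, ACQ) lands in the low range, 27.11 for (Palestinian, Israeli) in the middle range, and 615.75 for (ACQ, EARN) in the high range.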

4.4 Personal Writing Styles or Perspectives?

One may suspect that the mid-range distribution divergence is attributable to personal speaking or writing styles and has nothing to do with different perspectives. The doubt is reasonable, since half of the bitterlemons corpus was written by one Palestinian editor and one Israeli editor (see Table 3), and the debate transcripts came from only two candidates. We test this hypothesis, and show that the mid-range distribution divergence is indeed attributable to different perspectives, by examining a counterexample: the document collection pair (Israeli Guest, Palestinian Guest) in the Different Perspectives condition. There are more than 200 different authors in each of the Israeli Guest and Palestinian Guest collections. The distribution divergence of this pair therefore cannot be attributed to writing styles, but mostly to different perspectives. We compare the distribution divergence of the pair (Israeli Guest, Palestinian Guest) with the others in Figure 6. The results show that the distribution divergence of the (Israeli Guest, Palestinian Guest) pair, like the other pairs in the DP condition, falls in the middle range, and is well separated from SP and ST in the low range and DT in the high range. Therefore we become more confident that the test of different perspectives based on distribution divergence indeed captures different perspectives, not personal writing or speaking styles.

Figure 6: The average KL divergence of document collection pairs in the bitterlemons Guest subset (Israeli Guest vs. Palestinian Guest) and the ST, SP, DP, and DT conditions. The horizontal lines are the same ones estimated in Figure 5.

4.5 Origins of the Differences

The effectiveness of the test of different perspectives is clearly demonstrated in Figure 5, but one may wonder why the distribution divergence of document collection pairs with different perspectives falls in the middle range, and what causes the large and small divergences of the document collection pairs with different topics (DT) and with the same topic (ST) or perspective (SP), respectively. We answer these questions by taking a closer look at the causes of the distribution divergence in our model. We compare the expected marginal difference of θ between the two posterior distributions p(θ|A) and p(θ|B). The marginal distribution of the i-th coordinate of θ, that is, of the i-th word in the vocabulary, is a Beta distribution, and thus its expected value can be easily obtained. We plot ∆θ = E[θi|A] − E[θi|B] against E[θi|A] for each condition in Figure 7. How ∆θ deviates from zero not only reaffirms the unique statistical regularities of document collections of different perspectives, but also explains the origin of the distribution divergence in the other conditions. In Figure 7d we can see that ∆θ increases with the value of θ, and the deviation from zero is much greater than in the Same Perspective (Figure 7b) and Same Topic (Figure 7a) conditions. The large ∆θ not only accounts for the large distribution divergence of the document pairs in the DT condition, but also marks a distinct word distribution across topics: a word that is frequent in one topic is unlikely to be frequent in the other topic, which is what makes readers perceive the topic of a document. At the other extreme, document collection pairs of the Same Perspective or Same Topic show very little difference in θ, which matches our intuition that documents of the same perspective or the same topic read similarly. The manner in which ∆θ varies with the value of θ in the Different Perspective condition is unique.
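Because the posteriors are Dirichlet distributions, the expected marginals needed for the ∆θ plots follow directly from the posterior parameters, E[θi] = (αi + counts_i) / Σj (αj + counts_j); a minimal sketch with helper names of our own choosing:

```python
import numpy as np

def expected_theta(alpha, counts):
    """Posterior mean of theta under Dirichlet(alpha + counts):
    E[theta_i] = (alpha_i + counts_i) / sum_j (alpha_j + counts_j)."""
    post = np.asarray(alpha, dtype=float) + np.asarray(counts, dtype=float)
    return post / post.sum()

def delta_theta(alpha, counts_a, counts_b):
    """Expected marginal difference E[theta_i|A] - E[theta_i|B]."""
    return expected_theta(alpha, counts_a) - expected_theta(alpha, counts_b)
```

Since both expected vectors sum to one, the components of ∆θ always sum to zero: any word emphasized by one collection must be balanced by de-emphasis of other words.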
The ∆θ in Figure 7c is not as small as those in the SP and ST conditions, but at the same time not as large as those in the DT condition, resulting in the mid-range distribution divergence in Figure 5. Why do document collections of different perspectives distribute this way? Since documents of different perspectives focus on closely related issues (e.g., the Palestinian-Israeli conflict in the bitterlemons corpus, or the political

Figure 7: The ∆θ vs. θ plots of typical document collection pairs in the four conditions: (a) Same Topic (ST), (b) Same Perspective (SP), (c) Different Perspective (DP), (d) Different Topics (DT). The horizontal line is ∆θ = 0.

and economic issues in the 2004 Presidential Debates corpus), they are expected to share a common vocabulary, but the different perspectives manifest themselves in subtle differences in word emphasis. We list the most frequent words and their expected Multinomial parameters θ for the Palestinian and Israeli documents in Table 6. We can see that the vocabularies of the two perspectives overlap highly, with subtle differences in θ.

E[θ|Palestinian]          E[θ|Israeli]
palestinian (0.0394)      israel (0.0341)
israel (0.0372)           palestinian (0.0255)
state (0.0095)            state (0.0089)
politics (0.0077)         settle (0.0072)
peace (0.0071)            sharon (0.0071)
international (0.0066)    peace (0.0064)
people (0.0060)           arafat (0.0059)
settle (0.0057)           arab (0.0057)
occupation (0.0055)       politics (0.0051)
sharon (0.0055)           two (0.0050)
right (0.0054)            process (0.0044)
govern (0.0049)           secure (0.0043)
two (0.0047)              conflict (0.0039)
secure (0.0044)           lead (0.0039)
end (0.0042)              america (0.0035)
conflict (0.0042)         agree (0.0034)
process (0.0042)          right (0.0034)
side (0.0038)             gaza (0.0034)
negotiate (0.0038)        govern (0.0033)

Table 6: The statistical regularity of perspectives in text: a highly overlapping vocabulary with subtle differences in word frequencies.

5 Regularities of Individual Perspectives in Text

In addition to the regularities across different perspectives shown in Section 4, in this section we show that individual perspectives exhibit strong statistical regularities, and we develop algorithms for learning perspectives in a statistical framework.

Let us denote a training corpus as a set of documents Wn and their perspective labels Dn, n = 1, ..., N, where N is the total number of documents in the corpus. Given a new document W̃ with an unknown perspective D̃, identifying the perspective amounts to calculating the following conditional probability:

P(D̃ | W̃, {Dn, Wn}_{n=1}^N).   (10)

In addition to the perspective at the document level, we are interested in how strongly each sentence in a document conveys a perspective. Let us denote the perspective intensity of the m-th sentence of the n-th document as a binary random variable Sm,n, m = 1, ..., Mn, where Mn is the total number of sentences in the n-th document. Evaluating how strongly a sentence conveys a particular perspective amounts to calculating the following conditional probability:

P(Sm,n | {Dn, Wn}_{n=1}^N).   (11)

5.1 Modeling Perspectives at the Document Level

We model the process of generating documents from a particular perspective as follows:

π ∼ Beta(απ, βπ)
θ ∼ Dirichlet(αθ)
Dn ∼ Binomial(1, π)
Wn ∼ Multinomial(Ln, θ).

First, the parameters π and θ are sampled once from their prior distributions for the whole corpus. We choose Beta and Dirichlet distributions because they are conjugate priors for the binomial and multinomial distributions, respectively. We set the hyperparameters απ, βπ, and αθ to one, i.e., non-informative priors. A document perspective Dn is then sampled from a binomial distribution with parameter π; the value of Dn is either d0 (Israeli) or d1 (Palestinian). Words are then sampled from a multinomial distribution, where Ln is the length of the document. The graphical model representation of the model is shown in Figure 8.


Figure 8: The naïve Bayes model.

The above model is commonly known as the naïve Bayes (NB) model, which is oversimplified and ignores the rich syntactic and semantic structure of a document. Nevertheless, the NB model is a building block for the model described in the next section, which incorporates sentence-level perspectives. To predict the perspective of an unseen document using the naïve Bayes model, we calculate the posterior distribution of D̃ in (10) by integrating out the parameters:

∫∫ P(D̃, π, θ | {(Dn, Wn)}_{n=1}^N, W̃) dπ dθ.   (12)

The above integral is, however, difficult, so we instead use Markov chain Monte Carlo (MCMC) methods to obtain samples from the posterior distribution. Details about MCMC sampling can be found in Appendix A.
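A minimal sketch of the point-estimation flavor of this classifier, in the spirit of the NB-M (MAP) variant evaluated below; the full Bayesian treatment integrates over π and θ via MCMC as just described. Function names and the smoothing constant are illustrative, not part of our actual implementation.

```python
import numpy as np

def train_nb(word_counts, labels, alpha=1.0):
    """MAP estimates of per-perspective word distributions under a
    symmetric Dirichlet prior. word_counts is an (N, V) term-frequency
    matrix; labels is a length-N vector with entries 0 or 1."""
    word_counts = np.asarray(word_counts, dtype=float)
    labels = np.asarray(labels)
    thetas, log_priors = {}, {}
    for d in (0, 1):
        counts = word_counts[labels == d].sum(axis=0) + alpha
        thetas[d] = counts / counts.sum()
        log_priors[d] = np.log((labels == d).mean())
    return thetas, log_priors

def predict_nb(doc, thetas, log_priors):
    """Most probable perspective for a term-frequency vector doc."""
    doc = np.asarray(doc, dtype=float)
    scores = {d: log_priors[d] + (doc * np.log(thetas[d])).sum()
              for d in thetas}
    return max(scores, key=scores.get)
```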

5.2 Identifying Perspectives at the Document Level

To evaluate how well the statistical models learn to identify perspectives expressed at the document level, we train the models on the bitterlemons corpus and calculate how accurately a model assigns the

perspective of a document in a ten-fold cross-validation manner. The average classification accuracy over the 10 folds is reported, i.e., the number of documents whose perspectives are correctly predicted divided by the total number of documents in the testing set. The accuracy of the baseline, which randomly assigns the perspective of a document as Palestinian or Israeli, is 0.5, because there are approximately equal numbers of documents from the two perspectives (see Table 3).

We evaluate three different models on the task of identifying perspective at the document level: naïve Bayes models (NB) with two different inference methods, and Support Vector Machines (SVM). NB-B uses full Bayesian inference and NB-M uses maximum a posteriori (MAP) estimation. SVM is a popular binary classifier that has been shown to be very effective in text classification (Joachims, 1998). Unlike generative probabilistic models such as naïve Bayes, SVM is based on the idea of finding a hyperplane in the feature space that separates the two classes of data points while keeping the margin as large as possible. More details about SVM can be found in (Cristianini & Shawe-Taylor, 2000). To train the SVM we represent each document as a V-dimensional feature vector, where V is the vocabulary size and each coordinate is the normalized term frequency. We use a linear kernel and find the best parameters using grid search. Our SVM implementation is based on LIBSVM (Chang & Lin, 2001).

Model      Data Set   Accuracy   Reduction
Baseline              0.5
SVM        Editors    0.9724
NB-M       Editors    0.9895     61%
NB-B       Editors    0.9909     67%
SVM        Guests     0.8621
NB-M       Guests     0.8789     12%
NB-B       Guests     0.8859     17%

Table 7: Results of identifying perspectives at the document level.

The results in Table 7 show that both the naïve Bayes models and SVM perform very well on both the Editors and Guests subsets of the bitterlemons corpus. The last column of Table 7 is the error reduction relative to SVM; we can see that the naïve Bayes models further reduce errors even where SVM already achieves high accuracy, possibly because naïve Bayes approaches its asymptotic error with smaller training sets (Ng & Jordan, 2002). By considering the full posterior distribution, NB-B further improves on NB-M, which performs only point estimation. Overall, the results strongly suggest that the choices of words made by authors, either consciously or subconsciously, reflect much of their political perspectives, and that statistical models can capture these clues well, even though both SVM and naïve Bayes make strong word-independence assumptions.

Given the performance gap between Editors and Guests, one may argue that there exist editing artifacts or writing styles, and that the statistical models capture those rather than perspectives. To further test whether the statistical models truly acquire perspectives, we conduct additional experiments in which training and testing data are mismatched, i.e., drawn from different subsets of the corpus. If what SVM and naïve Bayes learn were writing styles or editing artifacts, classification performance in the mismatched conditions would degrade considerably. The results on mismatched training and testing data are shown in Table 8. Both SVM and the two variants of naïve Bayes still perform very well on both combinations of training and testing data. As in Table 7, the naïve Bayes models outperform SVM with large error reduction rates, and NB-B consistently improves on NB-M. The conjecture that the statistical models learn writing styles or editing artifacts instead of perspectives is thus not supported. The results reaffirm that the perspective of a document

Model      Training   Testing   Accuracy   Reduction
Baseline                        0.5
SVM        Guests     Editors   0.8822
NB-M       Guests     Editors   0.9327     43%
NB-B       Guests     Editors   0.9346     44%
SVM        Editors    Guests    0.8148
NB-M       Editors    Guests    0.8485     18%
NB-B       Editors    Guests    0.8585     24%

Table 8: Identifying Document-Level Perspectives with Different Training and Testing Sets

can be reliably identified based on statistical analysis of the words chosen by writers.

5.3 Latent Sentence Perspective Models

We introduce a new binary random variable, S, to model how strongly a perspective is expressed at the sentence level. The value of S is either s1 or s0, where s1 means a sentence is written strongly from a perspective and s0 means it is not. The whole generative process is modeled as follows:

π ∼ Beta(απ, βπ)
τ ∼ Beta(ατ, βτ)
θ ∼ Dirichlet(αθ)
Dn ∼ Binomial(1, π)
Sm,n ∼ Binomial(1, τ)
Wm,n ∼ Multinomial(Lm,n, θ).

π and θ have the same semantics as in the naïve Bayes model. S is naturally modeled as a binomial variable with parameter τ; S represents how likely it is that a sentence strongly conveys a perspective. We call this model the Latent Sentence Perspective Model (LSPM) because S is not directly observed. The graphical model representation of LSPM is shown in Figure 9.

To identify the perspective D̃ of a new document with unknown sentence perspectives S̃ using LSPM, we calculate the posterior probability by summing out the possible combinations of perspective intensities of all sentences in the document and integrating out the parameters:

∫∫∫ Σ_{Sm,n} Σ_{S̃} P(D̃, Sm,n, S̃, π, τ, θ | {(Dn, Wn)}_{n=1}^N, W̃) dπ dτ dθ.   (13)

The integral in (13) is, however, very difficult to compute. Similar difficulties arise when one is interested in inferring how strongly a perspective is expressed at the sentence level. We resort to MCMC methods to sample from the posterior distributions of interest, (10) and (11); details about sampling can be found in Appendix A.

As often encountered in mixture models, there is an identifiability issue in LSPM: the values of S can be permuted without changing the likelihood function, so the meanings of s0 and s1 are ambiguous.

Figure 9: Latent Sentence Perspective Model (LSPM)

Figure 10: Two different parameterizations of θ: (a) s0 and s1 are not identifiable; (b) sharing θ_{d1,s0} and θ_{d0,s0}.

In Figure 10a we see that four θ parameters are used to represent the four possible combinations of document perspective d and sentence perspective intensity s. Without imposing any constraints on the model, s1 and s0 are exchangeable, and we can no longer interpret sentences with high s0 as sentences that express little or no perspective. Any improvement from LSPM under this parameterization cannot be fully attributed to the existence of sentences that convey little or no perspective; S may simply capture different aspects within individual perspectives, for example, s0 for Editors and s1 for Guests in the bitterlemons corpus, which may fit the specific dataset better but would fail to test our hypothesis. We solve the identifiability problem by forcing θ_{d1,s0} and θ_{d0,s0} to be identical, reducing the number of θ parameters to three, as shown in Figure 10b. Intuitively, sentences that do not strongly convey a perspective should not depend on the overall perspective of the document, and should be common to the two perspectives.

The graphical structure of the Latent Sentence Perspective Model is very similar to that of SpeClustering (Huang & Mitchell, 2006) for clustering emails with user feedback. Although SpeClustering is not specifically designed for identifying perspectives at different levels, both models are developed to capture a similar

idea: not every data point is meaningful. In SpeClustering some words are unrelated to a topic; in LSPM some sentences convey a perspective less strongly.

5.4 Identifying Perspectives at the Sentence Level

The lack of annotation of how strongly sentences convey particular perspectives in the bitterlemons corpus poses great challenges for statistical learning algorithms and for the quantitative evaluation of the proposed Latent Sentence Perspective Model (LSPM). While the posterior probability that a sentence strongly conveys a perspective, (11), is of most interest, we cannot directly evaluate the estimates from LSPM without ground truth (we propose to manually collect ground truth in Section 7.2). We can, however, still evaluate how accurately LSPM predicts the perspective of a document, because labels at the document level are available. If LSPM did not achieve similar identification accuracy after modeling sentence-level perspectives, we would doubt the quality of LSPM's predictions of how strongly a sentence conveys a perspective.

Model      Training   Testing   Accuracy
Baseline                        0.5
NB-M       Guests     Editors   0.9327
NB-B       Guests     Editors   0.9346
LSPM       Guests     Editors   0.9493
NB-M       Editors    Guests    0.8485
NB-B       Editors    Guests    0.8585
LSPM       Editors    Guests    0.8699

Table 9: Results of identifying perspectives at the document level. The experimental results are shown in Table 5.4. We copy the results of na¨ıve Bayes models from Table 8 for easy comparison between models with sentence-level modeling (LSPM) and those without (two na¨ıve Bayes models variants). The accuracy of LSPM is comparable or even slightly better than those of na¨ıve Bayes models , which is very encouraging and suggests that the proposed LSPM closely captures how perspectives are expressed at both document and sentence levels. The sentences that are inferred by LSPM with high probabilities of strong perspectives, i.e.high Pr(S˜ = s1 ), are shown in Examples 5 and 6, and sentences of high probabilities of little or no perspectives, i.e.high Pr(S˜ = s0 ), are shown in Examples 7 and 8. The comparable performance between na¨ıve Bayes models and LSPM should not be dismissed quickly. One can ignore the uncertainties at the sentence level and train na¨ıve Bayes models directly on the sentences to classify a sentence into Palestinian or Israeli perspective. Here a sentence is correctly classified if the prediction for the sentence is the same as the perspective of the document where the sentence is extracted. The accuracy is merely 0.7529, which is much lower than the accuracy previously achieved at the document level (See Table 7. Identifying perspective at the sentence level is thus much harder than at the document level, and the high accuracy achieved by LSPM is not easy, which strongly suggests the quality of prediction on sentence-level perspectives is far beyond random.
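The sentence-level baseline described above can be sketched as a multinomial naïve Bayes classifier trained on sentences labeled with their document's perspective, scoring a sentence as correct when its prediction matches the document label. The toy data and helper names below are illustrative, not the bitterlemons corpus:

```python
import math
from collections import Counter

def train_nb(sentences, labels, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing.
    sentences: list of token lists; labels: parallel list of labels."""
    vocab = {w for s in sentences for w in s}
    counts, priors = {}, Counter(labels)
    for s, y in zip(sentences, labels):
        counts.setdefault(y, Counter()).update(s)
    model = {}
    for y, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)
        model[y] = (math.log(priors[y] / len(labels)),
                    {w: math.log((c[w] + alpha) / total) for w in vocab},
                    math.log(alpha / total))  # log prob for unseen words
    return model

def predict_nb(model, sentence):
    def score(y):
        log_prior, log_probs, unseen = model[y]
        return log_prior + sum(log_probs.get(w, unseen) for w in sentence)
    return max(model, key=score)
```

Accuracy is then the fraction of sentences whose predicted label equals the label of their source document, exactly the evaluation used for the 0.7529 figure above.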

6 Regularities of Different Perspectives in Video

We propose to go beyond text and develop the test of different perspectives based on distribution divergence in the video domain (i.e., Research Question 6). Video has been a common medium for people to express

their perspectives. The same news event, for example, can be covered very differently by individual news networks. We conduct experiments to test whether any statistical regularities exist in the different perspectives expressed in video. Similar to the text version of the experiment in Section 4.1, we compare the following three combinations of two video collections in a "bag-of-concepts" representation:

• Different Perspectives (DP): news footage on the same topic but from channels in different languages, e.g., LBC vs. MSNBC on "Arafat's funeral."

• Different Topics (DT): news footage on different topics but from the same news channel, e.g., "Powell's resignation" vs. "International Olympic Committee visit Beijing" on NTDTV.

• Same Topic and Perspective (STP): news footage on the same topic from the same channel, e.g., "Ukrainian presidential election" on CCTV.

A video segment is relevant to a topic if, given a query, it is returned by a text retrieval engine that searches over ASR transcripts (and the English translations of ASR transcripts). The relevant news stories of a total of seven topics are collected:

1. Iraq war
2. United States presidential election
3. Suicide bomb in Tel Aviv
4. The resignation of Powell
5. Olympics visit Beijing
6. Ukrainian presidential election
7. Japanese hostage held in Iraq

Our retrieval engine is based on Lemur (http://www.lemurproject.org/). The experimental results are shown in Figure 11. The pattern of the values of KL divergence in the DP condition falling in the middle range is strikingly similar to the text version shown in Figure 5. The results show that statistical regularities indeed exist for different perspectives in video, and we can further exploit this characteristic in modeling different perspectives in video.
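The divergence test for two video collections can be sketched as follows: estimate a smoothed concept distribution for each collection over a shared vocabulary, then compute the KL divergence between them. The concept names and the smoothing constant below are illustrative assumptions:

```python
import math
from collections import Counter

def concept_distribution(bags, smoothing=0.01, vocab=None):
    """Estimate a smoothed unigram distribution over visual concepts
    from a collection of bag-of-concepts video segments."""
    counts = Counter(c for bag in bags for c in bag)
    if vocab is None:
        vocab = set(counts)
    total = sum(counts[c] for c in vocab) + smoothing * len(vocab)
    return {c: (counts[c] + smoothing) / total for c in vocab}

def kl_divergence(p, q):
    """KL(p || q); assumes p and q share the same support."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p)
```

Using the union of both collections' concepts as the shared vocabulary (with smoothing to avoid zero probabilities) makes the divergence well defined even when a concept appears in only one collection.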

7 Proposed Work

We propose to investigate how different perspectives are reflected in text and video documents. When people discuss controversial issues, their written or spoken words are not only about the topic, i.e., topical, but also reflect their inner attitudes and opinions toward an issue. Automatically understanding perspectives from written or spoken documents is a scientifically challenging problem. Moreover, it will enable many applications that can survey public opinion on social issues and political viewpoints on a much larger scale. Machine understanding of subjective beliefs, however, has been deemed "a vain hope." In this proposal we take up the challenge and approach the problem in a statistical learning framework. The experimental results show that perspectives can be successfully identified by modeling documents and their relationships statistically.


[Figure 11: boxplots of the values of KL divergence (roughly 50 to 250) under the STP, DP, and DT conditions]

Figure 11: The values of KL divergence of different perspectives (DP) exhibit a strikingly similar pattern to that of the text documents in Figure 5.

7.1 Explore Problem Space

Although the preliminary results in Sections 4, 5, and 6 show that statistical regularities exist for different perspectives in text and video, only a few perspectives have been tested: Palestinian vs. Israeli in the bitterlemons corpus, Kerry vs. Bush in the 2004 Presidential Debates corpus, and broadcast news video from different countries in 2004. The statistical regularities based on the values of KL divergence and the high classification accuracy of identifying individual perspectives can be further strengthened if more perspectives are systematically tested. We propose to explore further the problem space of perspectives in text and video. An ultimate question is whether the observed mid-range values of KL divergence are a special case or a general law. We will collect data of a large variety and assess whether the statistical regularities can be consistently observed. Specifically, we will collect more data differing in the following four dimensions:

Topic We plan to collect data beyond the Palestinian-Israeli conflict and United States politics in 2004. For example, the Issue Guide on the Public Agenda website (http://www.publicagenda.org/issues/issuehome.cfm) lists at least 21 issues (e.g., abortion, America's global role, right to die, etc.), each of which contains at least two differing perspectives.

Format We plan to collect data in formats other than editorials (bitterlemons) and debates (the 2004 Presidential Debates). It is of great interest to see if statistical regularities can be observed in different kinds of formats (e.g., blog postings in the BLOG06 data (Macdonald & Ounis, 2006)).

Genre Although we strongly argue that perspectives are reflected in subtle differences in word frequencies, it is still not clear how perspectives differ from subjectivity and sentiment. We propose to compare the techniques and data in subjectivity and sentiment analysis and provide direct evidence that distinguishes documents of different perspectives from subjective or sentimental documents.
Non-first person perspective So far both the bitterlemons and the 2004 Presidential Debates corpora consist of documents narrated from a first-person perspective, i.e., the perspective expressed in the document is the same as the perspective of the author. We plan to collect corpora that are not narrated from a first-person perspective, for example, newspapers. Newspaper articles may contain quotes that strongly reflect a particular perspective, but the writer of a news article may hold a totally different perspective. Documents narrated from a non-first-person perspective thus pose a greater challenge for computers to identify the overall perspective of a document. We also plan to explore video data beyond broadcast news, for example, political campaign ads. There are a total of 43 issues in 182 political ads listed at the Washington Post's Political Ads Database (http://projects.washingtonpost.com/politicalads/).

7.2 Evaluation of Sentence-Level Perspectives

Although the small but positive improvement due to sentence-level modeling in LSPM is encouraging, the results in Section 5.4 are at best suggestive of the quality of the predictions at the sentence level. We propose to investigate how consistent the sentence-level predictions of LSPM are with human annotations. We plan to recruit annotators during the summer to annotate part of the bitterlemons corpus. We will annotate a subset of sentences in the bitterlemons corpus, half of which are random samples of sentences in the corpus; the remaining half are those that LSPM predicts to most strongly reflect a perspective. Annotators will not be told whether a sentence comes from random sampling or from LSPM. At least two annotators will annotate how strongly each sentence expresses a particular perspective. We will devise an annotation scheme for sentence-level perspectives and train annotators to label sentences as strong or neutral. In the training phase, annotators will first be asked to annotate a small, common set of sentences, and sit down together afterward to discuss any disagreements. The training process repeats until the annotators achieve a high degree of agreement (e.g., a high kappa statistic (Carletta, 1996)). Then the annotators will label the remaining sentences independently. We will report the inter-rater agreement on labeling perspectives at the sentence level, and the accuracy of LSPM against the annotated ground truth.
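Agreement between two annotators can be checked with the kappa statistic (Carletta, 1996). A minimal Cohen's kappa sketch for the two-annotator, strong/neutral labeling setting described above:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over
    parallel label sequences (e.g., 'strong' vs. 'neutral')."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    # Expected agreement under independent labeling with each
    # annotator's marginal label frequencies.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)
```

Kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance; the annotator training loop above would continue until kappa is acceptably high.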

7.3 Joint Model of Topic and Perspective

The distinct statistical characteristics of different perspectives shown in Section 4 and those of individual perspectives shown in Section 5 motivate us to develop a joint model of topic and perspective. Topical words vary from topic to topic, resulting in large KL divergence. On the other hand, words that strongly reflect a perspective are commonly shared by both points of view and differ on a smaller scale in frequency, resulting in medium values of KL divergence. How frequently a word is chosen by an author in expressing his or her opinions toward an issue appears to be determined by two factors: the topic to be discussed, and the perspective the author holds. The statistical phenomenon we propose to capture is illustrated in Figure 12. The 2-D simplex represents all possible multinomial parameters one can sample from. Suppose there are three topics, T1, T2, and T3, as shown in diamonds. The distance between two topics (i.e., Ti and Tj) is large, since the values of KL divergence for different topics (DT) are large, as shown empirically in Figure 5. On the other hand, the two perspectives P3,+ and P3,− of Topic 3, shown in circles, are like two satellites closely surrounding T3, as the values of KL divergence for different perspectives (DP) are small, as shown in Figure 5. We will evaluate the proposed joint model on a corpus consisting of multiple topics and perspectives. Bitterlemons (see Section 3.1) is such a corpus: it is labeled with two perspectives (Palestinian and Israeli) as well as (weekly) topics. We will evaluate the model in terms of perplexity reduction and the accuracy of recovering topics and perspectives.



[Figure 12: three points T1, T2, T3 spread across a 2-D simplex, with P3,+ and P3,− clustered around T3]

Figure 12: A 2-D simplex for three words. Three example topics are labeled as T1, T2, and T3. Two perspectives on Topic 3 are labeled as P3,+ and P3,−.
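The satellite geometry of Figure 12 can be simulated: draw well-separated topic multinomials, derive two "perspective" distributions as small perturbations of one topic, and compare divergences. All distributions below are synthetic illustrations of the hypothesized phenomenon, not fitted model parameters:

```python
import numpy as np

def kl(p, q):
    """KL divergence between two strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

def perturb(topic, scale, rng):
    """A 'perspective' distribution: a small multiplicative
    perturbation of a topic's word distribution, renormalized."""
    noise = np.exp(scale * rng.standard_normal(topic.shape))
    p = topic * noise
    return p / p.sum()

rng = np.random.default_rng(0)
# Two well-separated topics over a 50-word vocabulary.
t1 = rng.dirichlet(np.full(50, 0.5))
t2 = rng.dirichlet(np.full(50, 0.5))
t1, t2 = t1 + 1e-12, t2 + 1e-12  # guard against zeros in the log
# Two perspectives orbiting topic 1, as in Figure 12.
p_plus = perturb(t1, 0.1, rng)
p_minus = perturb(t1, 0.1, rng)
```

Under this construction the divergence between the two perspectives of the same topic is much smaller than the divergence between topics, mirroring the DP-versus-DT pattern in Figure 5.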

8 Schedule

Table 10 summarizes the proposed work in Section 7. The proposed work is scheduled for 2007, as shown in Figure 13.

A Gibbs Samplers for Modeling Individual Perspectives

Based on the model specification described in Section 5.3, we derive the Gibbs samplers (Chen et al., 2000) for the Latent Sentence Perspective Models as follows:

\pi^{(t+1)} \sim \mathrm{Beta}\left( \alpha_\pi + \sum_{n=1}^{N} d_n + \tilde{d}^{(t+1)},\; \beta_\pi + N - \sum_{n=1}^{N} d_n + 1 - \tilde{d}^{(t+1)} \right)

\tau^{(t+1)} \sim \mathrm{Beta}\left( \alpha_\tau + \sum_{n=1}^{N} \sum_{m=1}^{M_n} s_{m,n} + \sum_{m=1}^{\tilde{M}} \tilde{s}_m,\; \beta_\tau + \sum_{n=1}^{N} M_n - \sum_{n=1}^{N} \sum_{m=1}^{M_n} s_{m,n} + \tilde{M} - \sum_{m=1}^{\tilde{M}} \tilde{s}_m \right)

\theta^{(t+1)} \sim \mathrm{Dirichlet}\left( \alpha_\theta + \sum_{n=1}^{N} \sum_{m=1}^{M_n} w_{m,n} \right)

    Task                                                                  Section  Completeness
    Identify statistical regularities of different perspectives in text      4        100%
    Identify statistical regularities of different perspectives in video     6         80%
    Identify the perspective from which a document is written                5        100%
    Collect and compare more data in different topics                       7.1         -
    Collect and compare more data in different formats                      7.1         -
    Collect and compare more data in different genres                       7.1         -
    Collect and compare more data in different narrating perspectives       7.1         -
    Collect and compare more video data                                     7.1         -
    Develop an annotation scheme and conduct pilot annotation               7.2         -
    Train annotators and label sentence-level perspectives                  7.2         -
    Compare predictions from LSPM with human annotations                    7.2         -
    Develop a joint model of topic and perspective                          7.3         -
    Implement and evaluate the joint model of topic and perspective         7.3         -

Table 10: The summary of proposed work.

[Figure 13: a Gantt-style timeline over January through December 2007 covering the following tasks: collect and compare more data in different topics, formats, genres, and narrating perspectives; collect and compare more video data; visit Al Jazeera; develop an annotation scheme and conduct pilot annotation; train annotators and label sentence-level perspectives; develop a joint model of topic and perspective; implement and evaluate the joint model of topic and perspective; wrap up experiments and finish thesis writing.]

Figure 13: A timeline of the proposed schedule in Section 8.

\Pr(S_{m,n}^{(t+1)} = s_1) \propto P(W_{m,n} \mid S_{m,n} = s_1, \theta^{(t)}) \Pr(S_{m,n} = s_1 \mid \tau^{(t+1)}, D_n)

\Pr(\tilde{D}^{(t+1)} = d_1) \propto \prod_{m=1}^{\tilde{M}} \mathrm{dbinom}(\tau_{d_1}^{(t+1)}) \prod_{m=1}^{\tilde{M}} \mathrm{dmultinom}(\theta_{d_1,m}^{(t)}) \, \mathrm{dbinom}(\pi^{(t)})

where dbinom and dmultinom are the density functions of the binomial and multinomial distributions, respectively. The superscript t indicates that a sample is drawn in the t-th iteration. We run three chains and collect 5,000 samples; the first half of the samples are discarded as burn-in. The Gibbs samplers are implemented in R (R Development Core Team, 2005).
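The conjugate Beta updates for π and τ reduce to counting. The following sketch shows one such draw per parameter (in Python with hypothetical variable names; the thesis implementation is in R):

```python
import numpy as np

def sample_pi(d, d_tilde, alpha_pi, beta_pi, rng):
    """Draw pi ~ Beta(alpha_pi + #(d=1) + d_tilde,
                      beta_pi + #(d=0) + (1 - d_tilde)),
    where d holds the training documents' 0/1 perspectives and
    d_tilde is the currently sampled test-document perspective."""
    n_pos = int(np.sum(d)) + d_tilde
    n_neg = len(d) - int(np.sum(d)) + (1 - d_tilde)
    return rng.beta(alpha_pi + n_pos, beta_pi + n_neg)

def sample_tau(s, s_tilde, alpha_tau, beta_tau, rng):
    """Same conjugate update for the sentence-intensity parameter:
    s is a flat array of 0/1 intensities for training sentences,
    s_tilde for the test document's sentences."""
    n_pos = int(np.sum(s)) + int(np.sum(s_tilde))
    n_neg = (len(s) - int(np.sum(s))) + (len(s_tilde) - int(np.sum(s_tilde)))
    return rng.beta(alpha_tau + n_pos, beta_tau + n_neg)
```

Each Gibbs sweep would call these draws, then resample θ from its Dirichlet posterior and the latent S and D indicators from the proportionality relations above.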


References

Abelson, R. P. (1973). Computer models of thought and language, chapter The Structure of Belief Systems, 287–339. W. H. Freeman and Company.

Abelson, R. P., & Carroll, J. D. (1965). Computer simulation of individual belief systems. The American Behavioral Scientist, 8, 24–30.

Beineke, P., Hastie, T., & Vaithyanathan, S. (2004). The sentimental factor: Improving review classification via human-provided information. Proceedings of the Association for Computational Linguistics (ACL-2004).

Blei, D. M., & Lafferty, J. D. (2006). Correlated topic models. Advances in Neural Information Processing Systems (NIPS).

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3.

Bocconi, S., & Nack, F. (2004). VOX POPULI: Automatic generation of biased video sequences. Proceedings of the First ACM Workshop on Story Representation, Mechanism and Context (pp. 9–16).

Buchanan, B. G., Barstow, D., Bechtal, R., Bennett, J., Clancey, W., Kulikowski, C., Mitchell, T., & Waterman, D. A. (1983). Building expert systems, chapter Constructing an Expert System, 127–167. Addison-Wesley.

Carbonell, J. G. (1978). POLITICS: Automated ideological reasoning. Cognitive Science, 2, 27–51.

Carbonell, J. G. (1979). Subjective understanding: Computer models of belief systems. Doctoral dissertation, Yale University.

Carletta, J. (1996). Assessing agreement on classification tasks: The kappa statistic. Computational Linguistics, 22.

Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: A library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Chen, M.-H., Shao, Q.-M., & Ibrahim, J. G. (2000). Monte Carlo methods in Bayesian computation. Springer-Verlag.

Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. Wiley-Interscience.

Cristianini, N., & Shawe-Taylor, J. (2000). An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press.
Dave, K., Lawrence, S., & Pennock, D. M. (2003). Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. Proceedings of the 12th International World Wide Web Conference (WWW2003).

Fang, Y.-J. (2001). Reporting the same events? A critical analysis of Chinese print news media texts. Discourse and Society, 12, 585–613.


Geis, M. L. (1987). The language of politics. Springer.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America (PNAS), 101, 5228–5235.

Hauptmann, A. G. (2004). Towards a large scale concept ontology for broadcast video. Proceedings of the Third International Conference on Image and Video Retrieval (CIVR).

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Huang, Y., & Mitchell, T. M. (2006). Text clustering with extended user feedback. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 413–420).

Ireson, B. (2004). "Minions". Proceedings of the Twelfth ACM International Conference on Multimedia.

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Proceedings of the 9th European Conference on Machine Learning (ECML).

Klimt, B., & Yang, Y. (2004). The Enron corpus: A new dataset for email classification research. Proceedings of the 15th European Conference on Machine Learning (ECML).

Kull, S. (2003). Misperceptions, the media and the Iraq war. http://www.pipa.org/OnlineReports/Iraq/IraqMedia_Oct03/IraqMedia_Oct03_rpt.pdf.

Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86.

Lewis, D., Yang, Y., Rose, T., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397.

Lewis, D. D. (1998). Naive (Bayes) at forty: The independence assumption in information retrieval. Proceedings of the 9th European Conference on Machine Learning (ECML).

Lin, W.-H., & Hauptmann, A. (2006). Do these documents convey different perspectives? A test of different perspectives based on statistical distribution divergence. Proceedings of the 42nd Conference of the Association for Computational Linguistics (ACL).

Macdonald, C., & Ounis, I. (2006). The TREC Blogs06 collection: Creating and analysing a blog test collection (Technical Report TR-2006-224). Department of Computing Science, University of Glasgow.

McCallum, A., Corrada-Emmanuel, A., & Wang, X. (2004). The author-recipient-topic model for topic and role discovery in social networks: Experiments with Enron and academic email (Technical Report UM-CS-2004-096). Department of Computer Science, University of Massachusetts Amherst.

McCallum, A., & Nigam, K. (1998). A comparison of event models for naive Bayes text classification. Proceedings of the AAAI-98 Workshop on Learning for Text Categorization.

Miyamori, H., Nakamura, S., & Tanaka, K. (2005). Generation of views of TV content using TV viewers' perspectives expressed in live chats on the web. Proceedings of the 13th ACM International Conference on Multimedia (pp. 853–861).

Morinaga, S., Yamanishi, K., Tateishi, K., & Fukushima, T. (2002). Mining product reputations on the web. Proceedings of the 2002 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Mullen, T., & Collier, N. (2004). Sentiment analysis using support vector machines with diverse information sources. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2004).

Naphade, M. R., & Smith, J. R. (2004). On the detection of semantic concepts at TRECVID. Proceedings of the Twelfth ACM International Conference on Multimedia.

Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural language processing. Proceedings of the 2nd International Conference on Knowledge Capture (K-CAP 2003).

Ng, A. Y., & Jordan, M. (2002). On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems (NIPS).

Nigam, K., McCallum, A. K., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39, 103–134.

Over, P., Ianeva, T., Kraaij, W., & Smeaton, A. F. (2005). TRECVID 2005 - an overview. Proceedings of the TREC Video Retrieval Evaluation (TRECVID) 2005.

Pan, Z., Lee, C.-C., Chen, J. M., & So, C. Y. (1999). One event, three stories: Media narratives of the handover of Hong Kong in cultural China. Gazette, 61, 99–112.

Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the Association for Computational Linguistics (ACL-2004).

Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2002).

Popp, R., Armour, T., Senator, T., & Numrych, K. (2004). Countering terrorism through information technology. Communications of the ACM, 47, 36–43.

Porter, M. (1980). An algorithm for suffix stripping. Program, 14, 130–137.

R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Rasmussen, E. M. (1997). Indexing images. Annual Review of Information Science and Technology (ARIST), 32, 169–196.

Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003).

Riloff, E., Wiebe, J., & Wilson, T. (2003). Learning subjective nouns using extraction pattern bootstrapping. Proceedings of the 7th Conference on Natural Language Learning (CoNLL-2003).

Ripley, B. D. (1987). Stochastic simulation. Wiley.

Rosen-Zvi, M., Griffiths, T., Steyvers, M., & Smyth, P. (2004). The author-topic model for authors and documents. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence (UAI) (pp. 487–494).

Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: An inquiry into human knowledge structures. Lawrence Erlbaum Associates.

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys, 34, 1–47.

Steyvers, M., & Griffiths, T. (In press). Latent semantic analysis: A road to meaning, chapter Probabilistic Topic Models. Lawrence Erlbaum.

Turney, P., & Littman, M. L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21, 315–346.

van Dijk, T. (1988). News as discourse. Hillsdale, NJ: Lawrence Erlbaum.

Wiebe, J., Wilson, T., Bruce, R., Bell, M., & Martin, M. (2004). Learning subjective language. Computational Linguistics, 30.

Wilson, J. (1990). Politically speaking: The pragmatic analysis of political language. Blackwell.

Xing, E. P. (2005). On topic evolution (Technical Report CMU-CALD-05-115). Center for Automated Learning and Discovery, School of Computer Science, Carnegie Mellon University.

Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 42–49).

Yang, Y., & Pedersen, J. O. (1997). A comparative study on feature selection in text categorization. Proceedings of the 14th International Conference on Machine Learning (ICML).

Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-2003).

Zhai, Y., & Shah, M. (2005). Tracking news stories across different sources. Proceedings of the 13th ACM International Conference on Multimedia.

Zhang, D.-Q., Lin, C.-Y., Chang, S.-F., & Smith, J. R. (2004). Semantic video clustering across sources using bipartite spectral clustering. Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME) (pp. 117–120).

