Improving query focused summarization using a look-ahead strategy

Rama B, V. Suresh, C. E. Veni Madhavan
Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
{ramab,vsuresh,cevm}@csa.iisc.ernet.in

Abstract. Query focused summarization is the task of producing a compressed text of an original set of documents based on a query. Documents can be viewed as a graph with sentences as nodes and edges added based on sentence similarity. Graph-based ranking algorithms that use a 'biased random surfer model', such as topic-sensitive LexRank, have been successfully applied to query focused summarization. In these algorithms, the random walk is biased towards the sentences which contain query relevant words. Specifically, it is assumed that the random surfer knows the query relevance score of the sentence to which he jumps; however, the neighbourhood information of that sentence is completely ignored. In this paper, we propose a look-ahead version of topic-sensitive LexRank. We assume that the random surfer not only knows the query relevance of the sentence to which he jumps but can also look N steps ahead from that sentence to find the query relevance scores of future sets of sentences. Using this look-ahead information, we identify the sentences which are indirectly related to the query by counting the number of hops needed to reach a sentence which has query relevant words. We then bias the random walk towards these indirect query relevant sentences as well as the sentences which contain query relevant words. Experimental results show a 20.2% increase in ROUGE-2 score compared to topic-sensitive LexRank on the DUC 2007 data set. Further, our system outperforms the best systems in DUC 2006 and our results are comparable to state of the art systems.

Keywords: Topic-sensitive LexRank, Look-ahead, Biased random walk

1 Introduction

Text summarization is the process of condensing a source text into a shorter version while preserving its information content [1]. Generic multi-document summarization aims at producing a summary from a set of documents that are based on the same topic. Query focused summarization is a particular kind of multi-document summarization where the task is to create a summary that answers the information need expressed in the query [14]. After the introduction of query focused summarization as the main task in the DUC^1 competitions, it has become an important field of research in both natural language processing and information retrieval.

^1 http://duc.nist.gov

In this paper, we concentrate on sentence extractive summarization, which basically involves heuristic ranking of the sentences present in the documents and picking the top-ranked sentences for the summary. Query focused summarization is a harder task than generic multi-document summarization because it expects the summary to be biased towards the query. Usually, queries are complex and not direct questions, so a summary generated by just picking the textual units which contain names or numbers would not suffice; creating an ideal summary requires a deep understanding of the documents. Further, a query has very few words, so the main challenge is to use this little information to pick the important sentences which answer the question in the query.

Several methods have been developed for query focused summarization over the years. Graph-based summarization methods based on Google's PageRank algorithm [2] have gained much attention due to their simplicity, unsupervised nature and language independence. For example, LexRank [9] builds a weighted undirected graph by considering sentences as nodes; edges are added between sentences based on a cosine similarity measure, and the PageRank algorithm is then applied to find salient sentences in the graph. Otterbacher et al. [5] proposed the idea of a biased random walk, called topic-sensitive LexRank, for question answering; it can be easily extended to query focused summarization. Wan et al. [11] came up with an improved method: instead of treating all relations between sentences equally, within-document and cross-document relationships are differentiated and separate random walk models are used.

In the existing 'biased surfer models', sentences which contain query relevant words are given high scores by making the random walk biased towards them. This is based on the assumption that the random surfer knows the query relevance score of the sentence to which he jumps. The sentences which are indirectly related to the query, however, are found only during the course of the algorithm, through the link structure of the similarity graph. In our model, we try to find the sentences which are indirectly related to the query using neighbourhood information. Specifically, we assume that the random surfer not only knows the query relevance score of the sentence to which he jumps but also has the option of looking N steps ahead from that sentence to learn more about it. We then bias the random walk towards both the indirect query relevant sentences and the ones which contain query relevant words. This results in better quality summaries. Experiments on the DUC 2006 and DUC 2007 data sets confirm that the inclusion of the look-ahead strategy yields better performance. We show that our method performs better than some recently developed methods and that our results are comparable to state of the art approaches.

The rest of the paper is organized as follows: In Section 2, we give a brief description of topic-sensitive LexRank and then we introduce our model. In Section 3, we present experiments and results. Finally, we conclude and suggest possible directions for future research in Section 4.

2 Topic-sensitive LexRank: A Revisit

Since we develop our model from topic-sensitive LexRank, we give a brief description of it in this section. Topic-sensitive LexRank uses the concept of graph-based centrality to rank the sentences and consists of the following steps. A similarity graph G is constructed using the sentences in the document set, with each sentence viewed as a node. Stopwords are removed from both the query and the sentences. Next, all words are reduced to their root form through stemming, and word ISFs (Inverse Sentence Frequencies) are calculated by the following formula:

isf_w = \log\left(\frac{N_s + 1}{0.5 + sf_w}\right)    (1)

where N_s is the total number of sentences in the cluster and sf_w is the number of sentences in which the word w appears. The relevance of a sentence s to the query q is then computed by the following formula:

rel(s|q) = \sum_{w \in q} \log(tf_{w,s} + 1) \times \log(tf_{w,q} + 1) \times isf_w    (2)

where tf_{w,s} and tf_{w,q} are the number of times w appears in s and q, respectively. Similarity between two sentences is calculated using the cosine measure weighted by word ISFs:

sim(x, y) = \frac{\sum_{w \in x, y} tf_{w,x} \, tf_{w,y} \, (isf_w)^2}{\sqrt{\sum_{x_i \in x} (tf_{x_i,x} \, isf_{x_i})^2} \times \sqrt{\sum_{y_i \in y} (tf_{y_i,y} \, isf_{y_i})^2}}    (3)
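To make Equations 1-3 concrete, the following is a minimal Python sketch, assuming the sentences and the query have already been stopword-filtered and stemmed into lists of word tokens; the function names are illustrative, not from the original system.

import math
from collections import Counter

def isf_table(sentences):
    """Inverse sentence frequency (Eq. 1) for every word in the cluster."""
    n_s = len(sentences)
    sf = Counter(w for s in sentences for w in set(s))  # sf_w: sentences containing w
    return {w: math.log((n_s + 1) / (0.5 + sf_w)) for w, sf_w in sf.items()}

def relevance(sentence, query, isf):
    """Relevance of a sentence to the query (Eq. 2); the sum runs over query words."""
    tf_s, tf_q = Counter(sentence), Counter(query)
    return sum(math.log(tf_s[w] + 1) * math.log(tf_q[w] + 1) * isf.get(w, 0.0)
               for w in tf_q)

def similarity(x, y, isf):
    """ISF-weighted cosine similarity between two sentences (Eq. 3)."""
    tf_x, tf_y = Counter(x), Counter(y)
    num = sum(tf_x[w] * tf_y[w] * isf.get(w, 0.0) ** 2
              for w in tf_x.keys() & tf_y.keys())
    norm_x = math.sqrt(sum((tf_x[w] * isf.get(w, 0.0)) ** 2 for w in tf_x))
    norm_y = math.sqrt(sum((tf_y[w] * isf.get(w, 0.0)) ** 2 for w in tf_y))
    return num / (norm_x * norm_y) if norm_x and norm_y else 0.0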

The main aim of query focused summarization is to pick sentences which are relevant to the query. Therefore, sentences which are similar to the query should get a high score. But a sentence that is similar to another high-scoring sentence in the graph should also get a high score. This is modelled using a mixture model: considering the entire graph, the score p(s|q) of sentence s given query q is determined as the sum of its relevance to the query and its similarity to the other sentences in the document cluster.

p(s|q) = d \, \frac{rel(s|q)}{\sum_{z \in C} rel(z|q)} + (1 - d) \sum_{v \in C} \frac{sim(v, s)}{\sum_{z \in C} sim(v, z)} \, p(v|q)    (4)

d is known as the damping factor, which is a trade-off between the similarity of a sentence to the query and to the other sentences in the cluster. Equation 4 can be explained using a random walk model as follows. Imagine a random surfer jumping from one node to another on the graph. At each step, the random surfer does one of two things: with probability d, he randomly jumps to a sentence (random jump) with a probability proportional to its relevance to the query; or with probability 1 − d he follows an outlink to reach one of the neighbouring nodes (forward jump) with a probability proportional to the edge weight in the graph. Since we want to give high scores to sentences which are similar to the query, usually d > 1 − d in Equation 4. Experiments have shown that d = 0.7 gives the best results [3].

We now introduce a few key terms which help in understanding topic-sensitive LexRank better.

Direct query relevant sentence: a sentence which has at least one word overlapping with the query.

Indirect query relevant sentence: a sentence which does not have any query related words but is in the vicinity of 'direct query relevant' sentences in the graph.

N-step indirect query relevant sentence: an 'indirect query relevant' sentence such that at least one of the sentences N hops away from it is 'direct query relevant', while none of the sentences fewer than N hops away is 'direct query relevant'.

We assume that the similarity graph contains at least one 'direct query relevant' sentence. Topic-sensitive LexRank uses the biased random walk to boost the scores of both direct and 'indirect query relevant' sentences in a single equation. In a random jump, 'direct query relevant' sentences are preferred, as the random surfer knows the query relevance of the sentence to which he jumps. The random jump therefore boosts the scores of only the 'direct query relevant' sentences; the scores of 'indirect query relevant' sentences are unaffected, as they have zero similarity with the query. The forward jump, on the other hand, increases scores based on sentence similarity: sentences adjacent to other high-scoring sentences end up getting high scores too. This helps 'indirect query relevant' sentences, as they are very near other high-scoring sentences. At the start of the algorithm, the set of 'direct query relevant' sentences is known, so the random surfer is purposely made to jump to those sentences in every random jump to increase their scores. The set of 'indirect query relevant' sentences, in contrast, is not known; these sentences depend on forward jumps of the random surfer from neighbouring high-scoring sentences to boost their scores.
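The biased random walk of Equation 4 can be sketched as a simple power iteration. In the sketch below, sim is the NumPy sentence-similarity matrix from Equation 3 and rel the query-relevance vector from Equation 2; the iteration and tolerance settings are our illustrative choices, and the assumption that at least one 'direct query relevant' sentence exists guarantees rel.sum() > 0.

import numpy as np

def topic_sensitive_lexrank(sim, rel, d=0.7, iters=100, tol=1e-6):
    """Power iteration for the mixture model in Eq. 4."""
    n = len(rel)
    jump = rel / rel.sum()                 # random-jump distribution (query bias)
    row_sums = sim.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # guard against isolated sentences
    walk = sim / row_sums                  # row-stochastic forward-jump matrix
    p = np.full(n, 1.0 / n)
    for _ in range(iters):
        p_new = d * jump + (1 - d) * (walk.T @ p)
        if np.abs(p_new - p).sum() < tol:
            break
        p = p_new
    return p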

2.1 Topic-sensitive LexRank with look-ahead strategy

In our method we mainly concentrate on detecting 'indirect query relevant' sentences, so that the random surfer chooses both 'direct query relevant' and 'indirect query relevant' sentences during the random jump. For the purpose of analysis, let us concentrate on '1-step indirect query relevant' sentences, i.e. sentences which have at least one 'direct query relevant' sentence as a neighbour. If we know the set of '1-step indirect query relevant' sentences beforehand, just like the 'direct query relevant' sentences, then we can make the random surfer jump to the '1-step indirect query relevant' sentences as well during the random jump. The advantage of this is that the random jump, which happens in every move of the random surfer, now prefers these sentences too. So '1-step indirect query relevant' sentences do not have to wait for forward jumps to boost their scores.

'Direct query relevant' sentences can be easily detected, as they have nonzero similarity with the query. But to detect '1-step indirect query relevant' sentences, we have to make use of the query relevance scores of the neighbours. Specifically, the sum of the query relevance scores of the neighbours is nonzero for '1-step indirect query relevant' sentences. Therefore, to make the random surfer jump to both 'direct query relevant' and '1-step indirect query relevant' sentences in every random jump, we define the modified query relevance function as follows:

rel'(s_i|q) = \alpha \times rel(s_i|q) + \beta \times \sum_{s_j \in Ne(s_i, 1)} rel(s_j|q)    (5)

where Ne(s, k) returns the sentences which are k hops distant from s, and α and β are parameters used to control the probability of the random surfer jumping to 'direct query relevant' and '1-step indirect query relevant' sentences, respectively. If we use the modified query relevance function of Equation 5 in Equation 4, the resulting model can be viewed as topic-sensitive LexRank with a "1-step look-ahead": the random surfer not only knows the query relevance score of the sentence to which he jumps, but also the query relevance scores of its neighbours. Since 'direct query relevant' sentences are more important than '1-step indirect query relevant' sentences, usually α > β. Note that in Equation 5, 'direct query relevant' sentences can gain an additional advantage from the second term, as their neighbouring sentences could be other 'direct query relevant' sentences. So α and β must be chosen carefully.

Generalizing this, for an 'N-step indirect query relevant' sentence, the sum of the query relevance scores of the sentences N hops away from it is nonzero. So if we use the look-ahead information, we can judge a sentence better, in the sense that we can detect whether it is a 1-step, 2-step or, in general, 'N-step indirect query relevant' sentence. With "N-level look-ahead information", we can make the random surfer jump to both 'direct query relevant' sentences and all 'K-step indirect query relevant' sentences, where 1 ≤ K ≤ N, in every random jump. Topic-sensitive LexRank with "N-step look-ahead" thus uses the modified query relevance function defined as follows:

rel'(s_i|q) = \alpha \times rel(s_i|q) + \beta \times \sum_{s_j \in Ne(s_i, 1)} rel(s_j|q) + \ldots + \nu \times \sum_{s_j \in Ne(s_i, N)} rel(s_j|q)    (6)

where α, β, …, ν are the controlling parameters. Note that in Equation 6, a 'direct query relevant' sentence can gain an additional advantage from the remaining N terms if it has other 'direct query relevant' sentences at any of the N future levels. Similarly, a 'K-step indirect query relevant' sentence can gain an additional advantage from the remaining (N − K) terms. Finally, we use the modified query relevance function of Equation 6 in Equation 4 and proceed with the biased random walk to rank the sentences. In theory, our model should work for any value of N. But since sentences at distant levels are not "conceptually" related (we only look at word overlap to add edges), the model breaks down for higher values of N.
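A sketch of how the N-step modified relevance of Equation 6 might be computed follows; Ne(s, k) is realized as breadth-first-search levels over the similarity graph. The names (neighbors as adjacency lists, weights holding α, β, …, ν) are our own.

def hop_levels(neighbors, source, max_hops):
    """BFS levels: levels[k-1] holds the sentences exactly k hops from source, i.e. Ne(source, k)."""
    levels, seen, frontier = [], {source}, [source]
    for _ in range(max_hops):
        nxt = []
        for u in frontier:
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        levels.append(nxt)
        frontier = nxt
    return levels

def modified_relevance(i, rel, neighbors, weights):
    """rel'(s_i|q) of Eq. 6 with a (len(weights) - 1)-step look-ahead; weights = (alpha, beta, ..., nu)."""
    score = weights[0] * rel[i]
    for w_k, level in zip(weights[1:], hop_levels(neighbors, i, len(weights) - 1)):
        score += w_k * sum(rel[j] for j in level)
    return score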

2.2 Redundancy removal

Redundancy is a major problem in multi-document summarization, as sentences from different documents may have similar content. It is therefore essential to improve the diversity of the focused summary in order to increase its information coverage. In our model, we make use of the greedy algorithm proposed in [12] to impose a diversity penalty on the sentences. The algorithm is as follows.

1. Define two sets, A = ∅ and B = {s_i | i = 1, 2, …, N}, and initialize the score of each sentence to its graph-based ranking score computed using Equation 4, i.e. Score(s_i) = p(s_i|q), i = 1, 2, …, N.
2. Sort the sentences in B by their current scores in descending order.
3. Suppose s_i is the highest ranked sentence in B. Move s_i from B to A, and recalculate the scores of the remaining sentences in B by imposing a redundancy penalty as follows. For each sentence s_j ∈ B,

Score(s_j) = Score(s_j) − λ · sim(s_i, s_j) · p(s_i|q)    (7)

where λ is the penalty degree factor, which controls the penalty imposed on sentences, and sim(s_i, s_j) is the similarity function defined in Equation 3.
4. Go to step 2 and iterate until B = ∅ or the iteration count reaches a predefined maximum number.

Finally, the sentences in set A are added to the summary in the order in which they were selected.
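A sketch of this greedy selection in Python follows; p holds the ranking scores from the biased random walk and sim the pairwise similarity matrix, with illustrative names of our choosing.

def remove_redundancy(p, sim, lam=0.2, max_picks=None):
    """Greedy diversity penalty of Eq. 7; returns sentence indices in summary order."""
    n = len(p)
    score = list(p)                                     # Score(s_i) initialized to p(s_i|q)
    remaining, picked = set(range(n)), []
    while remaining and (max_picks is None or len(picked) < max_picks):
        best = max(remaining, key=lambda i: score[i])   # highest current score in B
        picked.append(best)                             # move from B to A
        remaining.discard(best)
        for j in remaining:                             # penalize sentences similar to the pick
            score[j] -= lam * sim[best][j] * p[best]
    return picked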

3 Experiment and Evaluation

3.1 Experiment setup

We conducted experiments on the DUC 2006 and DUC 2007 data sets. Query (topic) focused summarization was the only task in DUC 2006 and the main task in DUC 2007. The DUC 2006 data set has 50 document clusters and the DUC 2007 data set has 45. Each document cluster has 25 documents picked from the AQUAINT corpus, along with a topic and 4 human generated summaries. The task is to synthesize a fluent, well-organized 250-word summary of the documents that answers the need for information expressed in the topic. We combined the title and the topic of each document set and conducted the experiments.

We use the ROUGE toolkit^2 (Recall Oriented Understudy for Gisting Evaluation) for evaluation. ROUGE compares a system generated summary against human generated summaries to measure its quality, and gives a variety of statistics:

- ROUGE-N (1-4): N-gram based co-occurrence statistics
- ROUGE-L: LCS (Longest Common Subsequence) based statistics
- ROUGE-W: Weighted LCS based statistics
- ROUGE-SU: Skip-bigram plus unigram based co-occurrence statistics

In this paper, we show the results in terms of ROUGE-2 and ROUGE-SU4 scores, which were the main criteria for rating summaries under automatic evaluation in DUC 2007. For simplicity, we discuss the results of the proposed method with the "1-step look-ahead" approach, i.e. we use the following equation for the ranking process:

p(s|q) = d \, \frac{\alpha \times rel(s|q) + \beta \times \sum_{s_i \in Ne(s,1)} rel(s_i|q)}{\sum_{z \in C} \left[\alpha \times rel(z|q) + \beta \times \sum_{s_i \in Ne(z,1)} rel(s_i|q)\right]} + (1 - d) \sum_{v \in C} \frac{sim(v, s)}{\sum_{z \in C} sim(v, z)} \, p(v|q)    (8)
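Equation 8 is simply Equation 4 with rel' in place of rel, so the whole ranking can be assembled from the sketches given earlier (assuming rel, sim and neighbors have been built for the cluster; the parameter values are those tuned in Section 3.2):

rel_prime = np.array([modified_relevance(i, rel, neighbors, weights=(0.56, 0.08))
                      for i in range(len(rel))])
p = topic_sensitive_lexrank(sim, rel_prime, d=0.7)       # Eq. 8 ranking
summary_order = remove_redundancy(p, sim, lam=0.2)       # Eq. 7 diversity step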

There are 4 parameters in our algorithm: d is the damping factor; α and β are the weights given to 'direct query relevant' and '1-step indirect query relevant' sentences, respectively, in Equation 8; and λ is the penalty degree factor used to remove redundancy in Equation 7.

^2 ROUGE 1.5.5 is used, with the parameters -n 2 -x -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -l 250

3.2 Experiments on DUC 2006 data set

In our experiments we used the DUC 2006 data set to tune the parameters and then tested the performance of our model on the DUC 2007 data set. From the experiments we picked the value 0.2 for λ, as it produced the best results. We started with d = 0.7, for which topic-sensitive LexRank achieves maximum performance [3]. To estimate the values of α and β, we use a gradient (coordinate) search strategy; a sketch of this search appears below, after Figs. 1 and 2. In the first step, we fix α and vary β within a range to observe the variation in performance. Next, we fix β at the value which gave the best results and repeat the experiment with α varying within a range.

In our experiments we first kept α fixed at 0.5 and varied β from 0 to 1. Fig. 1 shows the influence of β on the performance of the model. Since β > 0 indicates the addition of "look-ahead information", we see an abrupt increase in ROUGE-2, from 0.08291 without it to 0.09529 with it. ROUGE-2 reaches its maximum at β = 0.08 and decreases thereafter. The drop in performance with increasing β can be explained as follows. The second term in Equation 5 is the sum of the query relevance scores of all the neighbours, which is quite large compared to the query relevance score of the current sentence alone. The net effect of increasing β beyond 0.08 is that the second term completely masks the first, i.e. we effectively neglect the query relevance score of the current sentence. The random surfer is then unable to differentiate between 'direct query relevant' sentences and other sentences, and because of this loss of information performance decreases.

To find the appropriate value of α, we set β to 0.08 and repeat the experiment with α varying from 0 to 1. Fig. 2 shows the performance for different values of α. The best ROUGE-2 score (0.09535) is obtained at α = 0.56, which confirms our assumption that α > β.

Fig. 1: ROUGE-2 vs. β.

Fig. 2: ROUGE-2 vs. α.
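The coordinate-style parameter search described above can be sketched as follows; rouge2 is a hypothetical callback that builds the summaries with the given weights and scores them against the reference summaries.

def coordinate_search(rouge2, alpha=0.5, steps=100):
    """Sweep beta with alpha fixed, then sweep alpha with the best beta."""
    grid = [i / steps for i in range(steps + 1)]
    beta = max(grid, key=lambda b: rouge2(alpha, b))
    alpha = max(grid, key=lambda a: rouge2(a, beta))
    return alpha, beta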

In order to test the effect of the damping factor d in the ranking process, we repeated the experiment with d varying from 0 to 1, keeping α = 0.56 and β = 0.08. From Fig. 3, we conclude that the look-ahead version of topic-sensitive LexRank also achieves maximum performance at d = 0.7.

Fig. 3: ROUGE-2 vs. d.

With the setting λ = 0.2, d = 0.7, α = 0.56 and β = 0.08, we tested the performance of our model on the DUC 2007 data set. We obtained a ROUGE-2 score of 0.11983 and a ROUGE-SU4 score of 0.17256.

3.3 Comparison with DUC systems

Tables 1 and 2 show the comparison of our model with the top 5 performing systems in DUC 2006 and DUC 2007, respectively. Our model is denoted "1-step T-LR", which stands for "topic-sensitive LexRank with 1-step look-ahead". In both tables, scores are arranged in decreasing order of ROUGE-2. The last row of each table is the baseline summary, created automatically for each document set by taking the leading sentences of the most recent document until a length of 250 words was reached.

Systems      R-2      R-SU4
1-step T-LR  0.09535  0.15134
S24          0.09505  0.15464
S12          0.08987  0.14755
S23          0.08792  0.14486
S8           0.08707  0.14134
S28          0.08700  0.14522
Baseline     0.04947  0.09788

Table 1: Comparison with DUC 2006 top 5 systems

Systems        R-2      R-SU4
S15 (IIIT-H)   0.12448  0.17711
S29 (PYTHY)    0.12028  0.17074
1-step T-LR    0.11983  0.17256
S4             0.11887  0.16999
S24            0.11793  0.17593
S13            0.11172  0.16446
Baseline       0.06039  0.10507

Table 2: Comparison with DUC 2007 top 5 systems

In DUC 2006, the proposed approach outperforms all the top performing systems and stands in first position by ROUGE-2 score. In DUC 2007, our method stands in third position by ROUGE-2 score. System 15 (IIIT-H) [8] and System 29 (PYTHY) [10], which were positioned at the top of the overall ROUGE evaluations in DUC 2007, are state of the art systems. It should be noted that System 15 (IIIT-H) uses about a hundred manually hand-crafted rules (which are language dependent) to reduce the sentences without losing much information, and System 29 (PYTHY) also uses certain sentence simplification heuristics. Though these techniques increase the information content of the summary, they can affect readability, because the resulting sentences might be grammatically incorrect. Sentence simplification methods usually help increase ROUGE scores, as unimportant words are removed, making room for informative ones, but this comes at the cost of generating ungrammatical sentences which are difficult to understand. In support of this point, the IIIT-H system dropped to 22nd position and PYTHY to 21st position out of 30 submitted systems under "Grammaticality" in the DUC 2007 evaluations. Our method does not use any sentence simplification, so the summaries generated by our system do not suffer from grammaticality issues. Moreover, as the ROUGE scores show, our method produces informative summaries that are as good as those produced by the state of the art systems.

3.4 Comparison with existing methods

In this section, we compare the performance of our model with some recently developed systems, described as follows.

- Wiki [4]: uses Wikipedia as a source of knowledge to expand the query.
- Adasum [13]: employs a mutual boosting process to generate extractive summaries and optimize topic representation.
- SVR [7]: uses Support Vector Regression (SVR) to estimate the importance of sentences through a set of pre-defined features.
- HT [6]: builds a hierarchical tree representation of the words present in the document set; a bottom-up algorithm is used to find the significance of words, and sentences are then picked using a top-down algorithm applied to the tree.
- T-LR [5]: topic-sensitive LexRank.

Table 3 shows that our system performs better than all these recently published methods. Further, our model shows a 20.17% improvement in ROUGE-2 score over topic-sensitive LexRank on the DUC 2007 data set. Fig. 4 shows the per-topic comparison of topic-sensitive LexRank (T-LR) with our model (1-step T-LR) in ROUGE-2 and ROUGE-SU4 scores on the DUC 2007 data set. Our method performs well on almost all the document sets.

Systems      R-2      R-SU4    1-step T-LR improvement (R-2, R-SU4)
1-step T-LR  0.11983  0.17256  —
Adasum       0.11720  0.16920  (2.24%, 1.99%)
SVR          0.11330  0.16520  (5.76%, 4.46%)
HT           0.11100  0.16080  (7.96%, 7.31%)
Wiki         0.11048  0.16479  (8.46%, 4.72%)
T-LR         0.09972  0.15300  (20.17%, 12.78%)

Table 3: Comparison with existing methods on the DUC 2007 data set

Fig. 4: Per-topic comparison of Topic-sensitive LexRank (T-LR) with our system (1step T-LR)

4 Conclusion and Future Work

In this paper, we presented a look-ahead version of topic-sensitive LexRank. Essentially, we use a look-ahead strategy to find 'indirect query relevant' sentences and then bias the random walk towards both 'direct query relevant' and 'indirect query relevant' sentences. Experimental results on the DUC 2006 and DUC 2007 data sets confirm the idea of the proposed work and show that the performance of our model is comparable to state of the art approaches. Further, unlike the state of the art approaches, our model preserves the linguistic quality of the generated summary. Our method does not depend on any language specific features and achieves good results without the help of external resources like WordNet or Wikipedia. We also do not include any time-consuming pre-processing steps such as POS tagging or parsing of sentences.

In future, we plan to extend our model to generic multi-document summarization. In query focused summarization we know that the sentences biased towards the query are potential candidates for the summary, but in generic multi-document summarization we have to exploit the natural topic distribution in the documents to find the important sentences. The main challenge, then, is to figure out how to incorporate the look-ahead strategy in this framework.

5 Acknowledgement

We acknowledge partial support for this work from a project approved by the Department of Science and Technology, Government of India.

References

1. Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: Proceedings of the ACL Workshop on Intelligent Scalable Text Summarization. pp. 10–17 (1997)
2. Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 30(1-7), 107–117 (1998)
3. Erkan, G.: Using biased random walks for focused summarization. In: Proceedings of the DUC 2006 Document Understanding Workshop. Brooklyn, NY, USA (2006)
4. Nastase, V.: Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation. In: EMNLP '08: Proceedings of the Conference on Empirical Methods in Natural Language Processing. pp. 763–772. Association for Computational Linguistics, Morristown, NJ, USA (2008)
5. Otterbacher, J., Erkan, G., Radev, D.R.: Using random walks for question-focused sentence retrieval. In: HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. pp. 915–922. Association for Computational Linguistics, Morristown, NJ, USA (2005)
6. Ouyang, Y., Li, W., Lu, Q.: An integrated multi-document summarization approach based on word hierarchical representation. In: ACL-IJCNLP '09: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. pp. 113–116. Association for Computational Linguistics, Morristown, NJ, USA (2009)
7. Ouyang, Y., Li, W., Li, S., Lu, Q.: Applying regression models to query-focused multi-document summarization. Inf. Process. Manage. (2010)
8. Pingali, P., K, R., Varma, V.: IIIT Hyderabad at DUC 2007. In: Proceedings of the Document Understanding Conference. Rochester, NIST (2007)
9. Radev, D.R.: LexRank: Graph-based lexical centrality as salience in text summarization. Journal of Artificial Intelligence Research 22 (2004)
10. Toutanova, K., Brockett, C., Gamon, M., Jagarlamudi, J., Suzuki, H., Vanderwende, L.: The PYTHY summarization system: Microsoft Research at DUC 2007. In: DUC 2007: Document Understanding Conference. Rochester, NY, USA (2007)
11. Wan, X., Yang, J., Xiao, J.: Using cross-document random walks for topic-focused multi-document summarization. In: WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence. pp. 1012–1018. IEEE Computer Society, Washington, DC, USA (2006)
12. Wan, X., Yang, J., Xiao, J.: Manifold-ranking based topic-focused multi-document summarization. In: IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence. pp. 2903–2908. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (2007)
13. Zhang, J., Cheng, X., Wu, G., Xu, H.: AdaSum: an adaptive model for summarization. In: CIKM '08: Proceedings of the 17th ACM Conference on Information and Knowledge Management. pp. 901–910. ACM, New York, NY, USA (2008)
14. Zhao, L., Wu, L., Huang, X.: Using query expansion in graph-based approach for query-focused multi-document summarization. Inf. Process. Manage. 45(1), 35–41 (2009)
