Multi-Task Text Segmentation and Alignment Based on Weighted Mutual Information

Bingjun Sun*, Ding Zhou*, Hongyuan Zha*, John Yen†
*Department of Computer Science and Engineering, †College of Information Sciences and Technology
The Pennsylvania State University, University Park, PA 16802
{bsun,dzhou,zha}@cse.psu.edu

ABSTRACT
Text segmentation is important for text analysis, while text alignment determines shared sub-topics among similar documents. Multi-task text segmentation and alignment extends single-task segmentation to utilize information from multi-source documents. In this paper we introduce a novel domain-independent unsupervised method for multi-task segmentation and alignment, based on the idea that the optimal segmentation and alignment maximizes weighted mutual information, that is, mutual information with term weights. Experimental results show that our approach outperforms previous approaches on single-task segmentation and that its error rate decreases as the number of tasks increases.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Clustering; I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis
General Terms: Algorithms, Design, Experimentation.
Keywords: Multi-task, text segmentation, text alignment, weighted mutual information.

1. INTRODUCTION

Text segmentation determines the boundaries of sentence sequences in order to capture the latent structure of a document. Some previous approaches model sentence dependence, such as HMM or CRF, while others are based on sentence similarity[3, 7, 8] and suffer from the effect of stop words. Text classification and clustering is a related area that categorizes documents into groups, with methods such as LSA[4], PLSA[6], and approaches based on mutual information (MI)[1, 5]. Traditional text segmentation approaches usually focus on single tasks. Multi-task learning[2] is a potential direction, but most previous multi-task approaches focus on supervised or semi-supervised learning rather than on clustering or segmentation.

In this paper, we extend research from single-task to multi-task segmentation. We view text segmentation as an optimization problem in information theory: given the number of segments, find the boundaries that minimize the loss of MI after segmentation. Text alignment of multi-source documents can be achieved by clustering sentences about the same sub-topic into the same segment. Term weights based on entropy learned from multi-source documents, together with weighted MI (WMI), are used to increase the contribution of cue words and decrease the effect of common stop words, noisy words, and document-dependent stop words, which are removed before segmentation in methods based on sentence similarity.

2. PROBLEM FORMULATION

Let $T = \{t_1, t_2, \ldots, t_l\}$ be the set of terms appearing in the document set $D = \{d_1, d_2, \ldots, d_m\}$. Let $S_d = \{s_1, s_2, \ldots, s_{n_d}\}$ be the sentence set for $d \in D$. The probability distribution $P(D, S_d, T)$ is estimated as $p(t, d, s) = T(t, d, s)/N_D$, where $T(t, d, s)$ is the number of occurrences of $t$ in sentence $s$ of $d$ and $N_D$ is the total term frequency in $D$. $\hat{S}$ represents the segment set $\{\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_p\}$ after segmentation, where the segment number is $|\hat{S}| = p$.

Multi-task text segmentation and alignment with term co-clustering is to find the optimal term clustering mapping $Clu(t): \{t_1, t_2, \ldots, t_l\} \to \{\hat{t}_1, \hat{t}_2, \ldots, \hat{t}_k\}$, where $k \le l$ is the number of clusters, and the optimal segmentation and alignment mappings $Seg_d(s_i): \{s_1, s_2, \ldots, s_{n_d}\} \to \{\hat{s}'_1, \hat{s}'_2, \ldots, \hat{s}'_p\}$ and $Ali_d(\hat{s}'_i): \{\hat{s}'_1, \hat{s}'_2, \ldots, \hat{s}'_p\} \to \{\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_p\}$, $\forall d \in D$, with the constraint that only adjacent sentences can map to the same segment.

MI can measure the amount of information in several random variables[5]. In the segmentation task, our goal is to find the solution that maximizes

$$I(\hat{T}; \hat{S}) = \sum_{\hat{t} \in \hat{T}} \sum_{\hat{s} \in \hat{S}} p(\hat{t}, \hat{s}) \log \frac{p(\hat{t}, \hat{s})}{p(\hat{t})\, p(\hat{s})}$$

after segmentation and alignment.

Similar to tf-idf weighting, we define weights for four types of terms. Common stop words are common along both $D$ and $\hat{S}$. Document-dependent stop words are common only along $\hat{S}$ for some $d$. Cue words are common along $D$ only for some $\hat{s}$. Noisy words are all others. To reinforce the contribution of cue words, we introduce the term weight

$$w_{\hat{t}} = \left(\frac{E_D(\hat{t})}{\max_{\hat{t}' \in \hat{T}} E_D(\hat{t}')}\right)^{a} \left(1 - \frac{E_{\hat{S}}(\hat{t})}{\max_{\hat{t}' \in \hat{T}} E_{\hat{S}}(\hat{t}')}\right)^{b},$$

where $E_D(\hat{t}) = \sum_{d \in D} p(d|\hat{t}) \log_{|D|} \frac{1}{p(d|\hat{t})}$, $E_{\hat{S}}(\hat{t})$ is defined similarly, and usually $a = b = 1$. The weights adjust the joint distribution to

$$p_w(\hat{t}, \hat{s}) = \frac{w_{\hat{t}}\, p(\hat{t}, \hat{s})}{\sum_{\hat{t} \in \hat{T};\, \hat{s} \in \hat{S}} w_{\hat{t}}\, p(\hat{t}, \hat{s})},$$

and the weighted MI is

$$I_w(\hat{T}; \hat{S}) = \sum_{\hat{t} \in \hat{T}} \sum_{\hat{s} \in \hat{S}} p_w(\hat{t}, \hat{s}) \log \frac{p_w(\hat{t}, \hat{s})}{p_w(\hat{t})\, p_w(\hat{s})}.$$
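As a rough illustration of the definitions in Section 2, the sketch below computes the term weights and the weighted MI from raw counts. It is not the authors' implementation. The function names, the nested-list layout `counts[t][d][s]` (term, document, aligned segment), and the toy data are assumptions made for this example. It also assumes at least two documents and two segments, that every term occurs at least once, and that the largest entropies are non-zero.

```python
import math

def normalized_entropy(counts):
    """Entropy of a count vector, using a logarithm whose base is len(counts)."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log(1.0 / p) for p in probs) / math.log(len(counts))

def term_weights(counts, a=1.0, b=1.0):
    """counts[t][d][s]: occurrences of term t in aligned segment s of document d."""
    # E_D: entropy of each term across documents; E_S: entropy across segments.
    e_d = [normalized_entropy([sum(doc) for doc in term]) for term in counts]
    e_s = [normalized_entropy([sum(col) for col in zip(*term)]) for term in counts]
    max_d, max_s = max(e_d), max(e_s)
    return [(ed / max_d) ** a * (1.0 - es / max_s) ** b for ed, es in zip(e_d, e_s)]

def weighted_mi(counts, weights):
    """I_w(T; S) in nats from counts[t][d][s] and one weight per term."""
    # Weighted term-by-segment table; the normalizer N_D cancels out.
    joint = [[w * sum(col) for col in zip(*term)] for term, w in zip(counts, weights)]
    total = sum(map(sum, joint))
    p_t = [sum(row) / total for row in joint]
    p_s = [sum(col) / total for col in zip(*joint)]
    mi = 0.0
    for t, row in enumerate(joint):
        for s, value in enumerate(row):
            if value > 0:
                p = value / total
                mi += p * math.log(p / (p_t[t] * p_s[s]))
    return mi

# Toy data (hypothetical): term 0 is spread evenly everywhere, like a stop word;
# terms 1 and 2 each occur in every document but in only one segment, like cue words.
toy_counts = [
    [[2, 2], [2, 2]],
    [[3, 0], [3, 0]],
    [[0, 3], [0, 3]],
]
toy_weights = term_weights(toy_counts)
print(toy_weights)
print(weighted_mi(toy_counts, toy_weights))
print(weighted_mi(toy_counts, [1.0, 1.0, 1.0]))  # plain MI for comparison
```

On this toy input the stop-word-like term receives weight 0 and the two cue-like terms receive weight 1, so the weighted MI is larger than the unweighted MI of the same segmentation.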

3. METHODOLOGY

Since the term weights, which depend on the text segmentation and alignment, are unknown and the problem is NP-hard, an iterative greedy algorithm is proposed to find a local maximum with simultaneous weight estimation. It can find the global optimum for single tasks without term co-clustering.

For initialization,

$$w_t^{(0)} = \frac{E_D(t)}{\max_{t' \in T} E_D(t')} \left(1 - \frac{E_S(t)}{\max_{t' \in T} E_S(t')}\right),$$

where

$$E_S(t) = \frac{1}{|D_t|} \sum_{d \in D_t} \left(1 - \sum_{s \in S_d} p(s|t) \log_{|S_d|} \frac{1}{p(s|t)}\right)$$

and $D_t$ is the set of documents $d$ that contain $t$. Then, for $Seg_d^{(0)}$, we can simply segment documents equally, or we can find the optimal segmentation for each $d$ separately so that $Seg_d^{(0)} = \arg\max_{\hat{s}} I_w(\hat{T}; \hat{S})$, where the weights $w_{\hat{t}}^{(0)}$ are used. For $Ali_d^{(0)}$, we can first assume that the segment order is the same for each $d$. For $Clu_t^{(0)}$, cluster labels can be set randomly.

Copyright is held by the author/owner(s). CIKM’06, November 5–11, 2006, Arlington, Virginia, USA. ACM 1-59593-433-2/06/0011.

Table 1: Error Rates of Single-task Segmentation
Range of n   C99[8]   U00[3]   ADDP03[7]   MI_l    WMI_l
3-11         12%      10%      6.0%        4.68%   4.94%
3-5          11%      9%       6.8%        5.57%   6.33%
6-8          10%      7%       5.2%        2.59%   2.76%
9-11         9%       5%       4.3%        1.59%   1.62%

There are then three stages. The first is for single tasks without term clustering, while the other two are both iterative: Stage 2 is for term clustering and Stage 3 is for term weight estimation. The algorithm is listed below.

Input: $P(D, S_d, T)$, $p \in \{2, \ldots, \max(s_d)\}$, $k \in \{2, \ldots, l\}$, $w \in \{0, 1\}$.
Output: $Clu$, $Seg$, $Ali$, $w_{\hat{t}}$.
(1) $i = 0$. Initialize $Clu_t^{(0)}$, $Seg_d^{(0)}$, $Ali_d^{(0)}$, and $w_{\hat{t}}^{(0)}$;
(2) If $|D| = 1$, $k = l$, and $w = 0$, check all segmentations of $d$, find the best $Seg_d = \arg\max_{\hat{s}} I(\hat{T}; \hat{S})$, and return;
(3) If $k \neq l$, $\forall t$, find the best $\hat{t}$ so that $Clu_t^{(i+1)} = \arg\max_{\hat{t}} I_w(\hat{T}; \hat{S}^{(i)})$ based on $Seg^{(i)}$ and $Ali^{(i)}$;
(4) $\forall d$, check all segmentations of $d$ with mapping $s_i \to \hat{s}$, $i \in 1, \ldots, n_d$, and find the best $Seg_d^{(i+1)}$ and $Ali_d^{(i+1)} = \arg\max_{\hat{s}} I_w(\hat{T}; \hat{S})$ based on $Clu_t^{(i+1)}$;
(5) If $Clu$, $Seg$, or $Ali$ changed, $i{+}{+}$, go to 3; otherwise, if $w = 1$, go to 6, else return;
(6) Update $w_{\hat{t}}^{(i+1)}$ based on $Seg^{(i)}$, $Ali^{(i)}$, and $Clu$;
(7) $\forall d$, check all segmentations of $d$ with mapping $s_i \to \hat{s}$, $i \in 1, \ldots, n_d$, and find the best $Seg_d^{(i+1)}$ and $Ali_d^{(i+1)} = \arg\max_{\hat{s}} I_w(\hat{T}; \hat{S})$ based on $Clu$ and $w_{\hat{t}}^{(i+1)}$;
(8) If $I_w(\hat{T}; \hat{S})$ has not changed, return; else, $i{+}{+}$, go to 6.

Dynamic programming is used for each step. We only show the procedure for Step 7, where $PI_w$ denotes the partial contribution of one segment to $I_w$. $\forall d$:
(1) Compute $p_w(\hat{t})$, partial $p_w(\hat{t}, \hat{s})$ and $p_w(\hat{s})$, and $PI_w(\hat{T}; \hat{s}_k(s_i, s_{i+1}, \ldots, s_j))$.
(2) Let $M(s_m, 1, k) = PI_w(\hat{T}; \hat{s}_k(s_1, s_2, \ldots, s_m))$, where $k \in \{1, 2, \ldots, p\}$. Then
$$M(s_m, L, k_L) = \max_{i,j}\left[M(s_{i-1}, L-1, k_{L/j}) + PI_w(\hat{T}; \hat{s}_{Ali_d(\hat{s}'_L)=j}(s_i, s_{i+1}, \ldots, s_m))\right],$$
where $0 \le m \le n_d$, $1 < L < p$, $1 \le i \le m+1$, $k_L \in Set(p, L)$, which is the set of all $\frac{p!}{L!(p-L)!}$ combinations of $L$ segments chosen from all $p$ segments, $j \in k_L$, and $k_{L/j}$ is the combination of the $L-1$ segments in $k_L$ except $j$.
(3) Finally,
$$M(s_{n_d}, p, k_p) = \max_{i,j}\left[M(s_{i-1}, p-1, k_{p/j}) + PI_w(\hat{T}; \hat{s}_{Ali_d(\hat{s}'_L)=j}(s_i, s_{i+1}, \ldots, s_{n_d}))\right],$$
where $k_p$ is the combination of all segments and $1 \le i \le n_d + 1$. This value is the optimal $I_w$, and the corresponding segmentation is the best.
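The dynamic program of Section 3 can be sketched in a simplified form. The example below is an assumption-laden illustration, not the authors' algorithm. It covers only the single-document case (Step 2 of the algorithm: no term clustering, no term weights, no alignment across documents), so the subset argument $k_L$ of the recurrence is dropped and segments keep their left-to-right order. It also requires every segment to contain at least one sentence and assumes $p \le n_d$. The names `partial_mi` and `segment_document` are hypothetical.

```python
import math

def partial_mi(seg_counts, term_totals, total):
    """Contribution of one candidate segment to I(T; S), in nats."""
    seg_total = sum(seg_counts)
    score = 0.0
    for count, term_total in zip(seg_counts, term_totals):
        if count > 0:
            # p(t,s) / (p(t) p(s)) simplifies to count * total / (term_total * seg_total).
            score += (count / total) * math.log(count * total / (term_total * seg_total))
    return score

def segment_document(sent_term, p):
    """sent_term[i][t]: count of term t in sentence i; p: number of segments.

    Returns (best MI, sorted start indices of segments 2..p).
    """
    n, n_terms = len(sent_term), len(sent_term[0])
    # prefix[i][t]: count of term t in sentences 0..i-1, so any span is a difference.
    prefix = [[0] * n_terms]
    for row in sent_term:
        prefix.append([acc + c for acc, c in zip(prefix[-1], row)])
    term_totals, total = prefix[n], sum(prefix[n])

    def gain(i, j):
        seg = [hi - lo for lo, hi in zip(prefix[i], prefix[j])]
        return partial_mi(seg, term_totals, total)

    neg_inf = float("-inf")
    # best[m][L]: largest MI when the first m sentences form L non-empty segments.
    best = [[neg_inf] * (p + 1) for _ in range(n + 1)]
    back = [[0] * (p + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for L in range(1, p + 1):
        for m in range(L, n + 1):
            for i in range(L - 1, m):  # last segment covers sentences i..m-1
                cand = best[i][L - 1] + gain(i, m)
                if cand > best[m][L]:
                    best[m][L], back[m][L] = cand, i
    # Walk the back-pointers from the last segment to the first.
    starts, m = [], n
    for L in range(p, 0, -1):
        m = back[m][L]
        starts.append(m)
    return best[n][p], sorted(starts)[1:]  # drop the start of the first segment (0)

# Hypothetical document: sentences 0-2 use terms 0 and 1, sentences 3-4 use terms 2 and 3.
doc = [[2, 1, 0, 0], [1, 2, 0, 0], [1, 1, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
score, boundaries = segment_document(doc, 2)
print(score, boundaries)
```

This decomposition works because, with fixed term probabilities, each segment's contribution to the MI depends only on the sentences it contains. The multi-task recurrence in the text additionally tracks which aligned segments have already been used, which this sketch omits.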

4. EXPERIMENTS

In this section, we refer to the method using $I$ as $MI_k$ and the method using $I_w$ as $WMI_k$, where $k$ is the number of term clusters. If $k = l$, no term clustering is required. The first data set was used in previous research. We use the previous evaluation criterion for comparison and test the case where the segment number is known. Table 1 shows the error rates of single-task segmentation for our methods with different parameters and for previous approaches. For single-task WMI, term weights are computed as

$$w_{\hat{t}} = 1 - \frac{E_{\hat{S}}(\hat{t})}{\max_{\hat{t}' \in \hat{T}} E_{\hat{S}}(\hat{t}')}.$$

Our methods $MI_l$ and $WMI_l$ both outperform the previous approaches in every range of $n$. We found that using term co-clustering is worse.

The data set for multi-task segmentation has 102 samples and 2264 sentences in total. Each sample is the introduction of a report from Biol 240W, Penn. State Univ. Each nominally has two segments, although some have only one segment or have the segments in reverse order. We labelled each sentence manually for evaluation. The criterion is

$$p(err|pred, real) = \frac{\sum_{d \in D,\, s \in S_d} 1(pred_s \neq real_s)}{\sum_{d \in D} n_d}.$$

We compared our method with different parameters on different partitions of the data set. Except for the cases where the task number is 102 or one, we randomly divided the set into partitions, each with 51, ..., or 2 samples, and applied our methods to each partition.

Table 2: Error Rates of Multi-task Segmentation
#Task   MI_l     WMI_l    k     MI_k     WMI_k
102     3.14%    2.78%    300   4.68%    6.58%
51      4.17%    3.63%    300   17.83%   22.84%
34      5.06%    4.12%    300   18.75%   20.95%
20      7.08%    5.42%    250   20.40%   21.83%
10      10.38%   7.89%    250   21.42%   21.91%
5       15.77%   11.64%   250   21.89%   22.59%
2       25.90%   23.18%   50    25.44%   25.49%

[Figure 1 plot: error rate (0 to 0.35) versus task number (1, 2, 5, 10, 20, 34, 51, 102) for MI: a=0, b=0; WMI: a=1, b=1; WMI: a=1, b=0; WMI: a=2, b=1.]
Figure 1: Error rates for different hyperparameters of term weights without term clustering.

Results are shown in Table 2. Error rates generally decrease for all methods as the task number increases. $WMI_l$ is always better than $MI_l$. Using term clustering is generally worse. We also tested $WMI_l$ with different values of $a$ and $b$, shown in Figure 1; $a = 1$, $b = 1$ gave the best results.
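The evaluation criterion $p(err|pred, real)$ used in Section 4 is the fraction of sentences, pooled over all documents, whose predicted segment label differs from the manual label. A minimal sketch follows, assuming labels are stored as one list per document in a dictionary keyed by a document id. This data layout and the function name `error_rate` are assumptions for illustration.

```python
def error_rate(pred, real):
    """pred, real: dicts mapping a document id to its per-sentence segment labels."""
    wrong = total = 0
    for doc_id, real_labels in real.items():
        pred_labels = pred[doc_id]
        if len(pred_labels) != len(real_labels):
            raise ValueError(f"label count mismatch for document {doc_id!r}")
        # Count sentences whose predicted label differs from the manual label.
        wrong += sum(p != r for p, r in zip(pred_labels, real_labels))
        total += len(real_labels)
    return wrong / total

# Hypothetical labels for two short documents (one of seven sentences is wrong).
real = {"d1": [0, 0, 0, 1], "d2": [0, 1, 1]}
pred = {"d1": [0, 0, 1, 1], "d2": [0, 1, 1]}
print(error_rate(pred, real))
```

Because the denominator is the total number of sentences, long documents influence the score more than short ones.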

5. REFERENCES
[1] R. Bekkerman, R. El-Yaniv, and A. McCallum. Multi-way distributional clustering via pairwise interactions. In Proc. ICML, 2005.
[2] R. Caruana. Multitask learning. Machine Learning, 28:41–75, 1997.
[3] F. Choi. Advances in domain independent linear text segmentation. In Proc. NAACL, pages 26–33, 2000.
[4] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
[5] I. Dhillon, S. Mallela, and D. Modha. Information-theoretic co-clustering. In Proc. SIGKDD, pages 89–98, 2003.
[6] T. Hofmann. Probabilistic latent semantic analysis. In Proc. UAI, 1999.
[7] X. Ji and H. Zha. Domain-independent text segmentation using anisotropic diffusion and dynamic programming. In Proc. SIGIR, pages 322–329, 2003.
[8] M. Utiyama and H. Isahara. A statistical model for domain-independent text segmentation. In Proc. ACL, pages 491–498, 1999.
