Mining Vietnamese Comparative Sentences for Sentiment Analysis

Ngo Xuan Bach+*, Nguyen Dinh Tai+, Pham Duc Van+, Tu Minh Phuong+*

+ Department of Computer Science, PTIT, Vietnam
* Machine Learning & Applications Lab, PTIT, Vietnam

KSE 2015, Ho Chi Minh City - Vietnam, October 2015

Sentiment analysis & Opinion mining

Analyzing opinionated texts, such as opinions, emotions, sentiments, evaluations, beliefs, and speculations
  o Help customers choose products and services
  o Provide useful information for companies and vendors in marketing and market studies

Which hotel should I stay at?

Sentiment classification

Most existing work in sentiment analysis and opinion mining focuses on sentiment classification

Classify sentences/documents (e.g. reviews) based on the overall sentiments expressed by authors
  o positive, negative and (possibly) neutral

Examples
  o It was a wonderful trip.
  o That hotel provides very bad services.

Mining comparative sentences

An important task in sentiment analysis and opinion mining

Comparative sentences have specific structures
  o Compare two entities (or sets of entities) in some features or aspects

Several studies have been conducted for English and some other languages

Consists of two subtasks
  o Identifying comparative sentences: identify comparative sentences in documents and classify them into some types
  o Recognition of relations: recognize entities, features, and comparing words in a comparative sentence

An example

The display quality of mobile phone X is better than that of mobile phone Y

Identifying comparative sentences
  o Sentence type: non-equative comparative sentence

Recognition of relations
  o Two entities: mobile phone X and mobile phone Y
  o Features: display quality
  o Comparing words: better than

“mobile phone X” is the preferred entity

This work

Presents a framework for mining Vietnamese comparative sentences
  o Subtask 1: a classification problem
  o Subtask 2: a sequence learning problem

Introduces a corpus for the task
  o The domain of electronic devices

Describes a series of experiments on the task
  o Different learning methods and feature sets

The first work conducted for Vietnamese

Outline

Motivation
Our method
Experiments
Summary

A framework for mining Vietnamese comparative sentences

[Figure: framework diagram; annotation: "The focus of this work"]

Identifying comparative sentences

We consider 3 types of comparative sentences

Equative
  o The camera of mobile phone X is similar to the one of mobile phone Y

Non-equative
  o The camera of mobile phone X is better than the one of mobile phone Y

Superlative
  o iPhone 5S is the most expensive one in the iPhone series

Identifying comparative sentences

We model the subtask as a classification problem
  o Input: a sentence
  o Output: 1 (equative), 2 (non-equative), 3 (superlative), 0 (noncomparative)

Learning methods
  o Maximum Entropy Models (Berger et al., 1996)
  o Support Vector Machines (Vapnik, 1998)

Features
  o Words, syllables, n-grams
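The n-gram feature sets over words or syllables could be extracted roughly as follows. This is a minimal sketch and not the authors' implementation: the function name `ngram_features`, the bag-of-counts encoding, the example sentence, and the convention of joining the syllables of a segmented word with underscores are all assumptions made for illustration.

```python
from collections import Counter


def ngram_features(tokens, max_n=2):
    """Count all n-grams of length 1..max_n in a token sequence.

    `tokens` may be syllables or words. The slides do not specify the
    exact feature encoding, so a bag of n-gram counts is assumed here.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical example sentence ("Phone X is better than phone Y").
# Syllable-based extraction: split on whitespace.
syllables = "điện thoại X tốt hơn điện thoại Y".split()
# Word-based extraction: assumes a word segmenter has already joined the
# syllables of multi-syllable words with underscores.
words = "điện_thoại X tốt hơn điện_thoại Y".split()

syllable_feats = ngram_features(syllables, max_n=2)
word_feats = ngram_features(words, max_n=2)
```

The resulting counts would then be turned into feature vectors for the classifier; setting `max_n` to 1, 2, or 3 corresponds to the three feature sets compared in the experiments.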

Relation recognition

Input: a comparative sentence
Output: entities, features, and comparing words

Relation recognition

We model the subtask as a sequence learning problem
  o A sentence is a sequence of words (or syllables)

Learning method
  o CRF (Lafferty et al., 2001)

Use IOB notation

[Figure: examples of sequence labels in a syllable-based model]
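IOB notation can be illustrated with the English example sentence from earlier in the talk. The sketch below is an illustration only: the label names `ENT`, `FEA`, and `CMP` and both helper functions are hypothetical, since the actual label set used in the corpus is not given in the text.

```python
def spans_to_iob(tokens, spans):
    """Turn labeled spans into one IOB tag per token.

    `spans` is a list of (start, end, label) with `end` exclusive.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def iob_to_spans(tokens, tags):
    """Recover (label, text) pairs from an IOB-tagged sequence."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new span starts here; close the previous one if any.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:
            # An "O" tag closes any open span.
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans


tokens = ("The display quality of mobile phone X is better than "
          "that of mobile phone Y").split()
# Hypothetical gold annotation: feature, entity, comparing word, entity.
gold_spans = [(1, 3, "FEA"), (4, 7, "ENT"), (8, 10, "CMP"), (12, 15, "ENT")]

tags = spans_to_iob(tokens, gold_spans)
decoded = iob_to_spans(tokens, tags)
```

Encoding gives the sequence model one label per token to predict, and decoding turns predicted labels back into the entities, features, and comparing words that the subtask outputs.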

Experiments


Datasets

Collected from newspapers in the domain of electronic devices
  o VnReview (http://vnreview.vn) and TinhTe (https://www.tinhte.vn)

Contains 4000 Vietnamese sentences (1000 sentences for each type of comparative sentence and 1000 noncomparative sentences)
  o 5119 entities
  o 2942 features
  o 1087 comparing words (only in the non-equative type)

Experimental Setups

Subtask 1
  o Using all 4000 sentences
  o 5-fold cross-validation test
  o Tools: LibSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) with RBF kernel
  o Measures: Accuracy, Precision, Recall, and the F1 score

Subtask 2
  o Using 3000 comparative sentences
  o 5-fold cross-validation test
  o Tools: CRF++ (http://taku910.github.io/crfpp/) by Kudo
  o Measures: Precision, Recall, and the F1 score
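The evaluation protocol can be sketched in plain Python. This is a hedged illustration, not the authors' evaluation script: the helper names are hypothetical, the folds are contiguous and unshuffled (the slides do not say how folds were formed), and the toy label lists are invented solely to exercise the functions.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    Assumption: no shuffling or stratification; a real setup would
    normally shuffle so that each fold mixes all sentence types.
    """
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_items))
        yield train_idx, test_idx
        start += size


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    return sum(1 for g, p in zip(gold, predicted) if g == p) / len(gold)


def precision_recall_f1(gold, predicted, target):
    """Precision, recall, and F1 for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# 5 folds over the 4000 sentences used for Subtask 1.
folds = list(k_fold_indices(4000, k=5))

# Toy labels (0 = noncomparative, 1 = equative, 2 = non-equative,
# 3 = superlative); invented for illustration only.
gold = [1, 1, 2, 3, 0, 2]
predicted = [1, 2, 2, 3, 0, 0]
scores_non_equative = precision_recall_f1(gold, predicted, target=2)
overall_accuracy = accuracy(gold, predicted)
```

In a full run, a classifier would be trained on each `train_idx` split, scored on the matching `test_idx` split, and the measures averaged over the five folds.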

Comparative sentence identification

Experimental results using SVM

Feature Extraction Method   Feature Set                   Accuracy (%)
Syllable-based              1-grams                       83.27
Syllable-based              1-grams + 2-grams             86.30
Syllable-based              1-grams + 2-grams + 3-grams   84.31
Word-based                  1-grams                       82.59
Word-based                  1-grams + 2-grams             86.11
Word-based                  1-grams + 2-grams + 3-grams   83.22

The syllable-based method outperformed the word-based method for all three feature sets
Using 1-grams and 2-grams achieved the best results for both methods

Comparative sentence identification

Experimental results using SVM for each sentence type

Sentence type   Precision (%)   Recall (%)   F1 (%)
Equative        86.93           92.00        89.38
Non-equative    82.18           80.51        81.32
Superlative     93.70           89.97        91.79

Superlative sentences had the highest F1 score
  o Usually contain some specific phrases, such as “the best”, “the worst”, and “all others”
  o The structure of superlative sentences is different from other types: equative and non-equative sentences compare two entities, whereas superlative sentences compare an entity with all the others

Comparative sentence identification

[Figure: SVM vs. MEM]

Relation recognition

Experimental results using CRF with different feature sets

Model              Precision (%)   Recall (%)   F1 (%)
Window size = 1    90.00           81.33        85.89
Window size = 2    91.21           81.66        86.17
Window size = 3    91.36           81.73        86.28
Without POS tags   91.71           77.52        84.02

Window size 3 achieved the best F1 score
In general, the window size had little effect on the experimental results
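The feature sets compared here vary the context window and the use of POS tags. The sketch below shows one plausible reading, under stated assumptions: "window size" is taken to mean the number of neighboring tokens considered on each side of the current position, which the slides do not define. The function name, the feature keys, the padding symbol, and the POS tags in the example are all hypothetical, and this is not the CRF++ template the authors used.

```python
def window_features(tokens, pos_tags, i, window=1, use_pos=True):
    """Build context features for position i of a sentence.

    Offsets run from -window to +window; positions outside the
    sentence are filled with a padding symbol.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"tok[{offset}]"] = tokens[j] if inside else "<PAD>"
        if use_pos:
            feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


tokens = ["X", "is", "better", "than", "Y"]
pos_tags = ["NNP", "VBZ", "JJR", "IN", "NNP"]  # illustrative tags only

# Window size 1 with POS tags, at the first token.
feats_w1 = window_features(tokens, pos_tags, i=0, window=1)
# Window size 3 without POS tags, at the middle token.
feats_w3_no_pos = window_features(tokens, pos_tags, i=2, window=3, use_pos=False)
```

Under this reading, widening the window adds token and POS features from more distant neighbors, while dropping POS tags leaves only the surface tokens as context.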

Relation recognition

Experimental results (F1 score, %) of relation recognition in detail

Model              Entity   Feature   Comparing Word
Window size = 1    93.62    76.88     73.06
Window size = 2    93.44    78.04     73.74
Window size = 3    93.33    78.52     73.48
Without POS tags   91.64    75.75     70.79

The first 3 models achieved nearly the same results
POS tags played an important role in relation recognition

Relation recognition

Experimental results on each type of sentence

                Entity                        Feature
Sentence type   Pre (%)   Rec (%)   F1 (%)    Pre (%)   Rec (%)   F1 (%)
Equative        95.78     82.35     88.56     83.33     63.39     72.00
Non-equative    95.10     91.35     93.19     83.80     65.50     73.53
Superlative     95.50     92.79     94.12     88.49     73.00     80.00

Similar to the first subtask, we achieved the highest F1 scores on superlative sentences for both entities and features

Summary


Summary

Presented an empirical study on mining Vietnamese comparative sentences
  o Subtask 1: Identifying comparative sentences
  o Subtask 2: Recognition of relations

Introduced a new corpus for this task
  o 4000 Vietnamese sentences

Our model got promising results
  o Subtask 1: 86.30% accuracy
  o Subtask 2: 93.33%, 78.52%, and 73.48% in the F1 score on recognition of entities, features, and comparing words, respectively

Summary

Future work
  o Study joint models: identify comparative sentences and recognize relations simultaneously
  o Complete the task: identify the overall opinion of comparative sentences

Thank you for your attention!
