A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects
Roque López, Lucas Avanço, Pedro Balage, Alessandro Bokan, Paula Cardoso, Márcio Dias, Fernando Nóbrega, Marco Sobrevilla, Jackson Souza, Andressa Zacarias, Ariani Di Felippo, Eloize Seno, Thiago Pardo
Interinstitutional Center for Computational Linguistics (NILC)
Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo/Brazil
1. Problem and Motivation
• Aspect-based opinion summarization generates summaries of opinions
for the main aspects of an object or entity [1].
• Few corpora are available for aspect-based opinion summarization.
• This scarcity of corpora has been a limiting factor for much of the
research on the task.
• A corpus might help in the identification of errors in automatic methods
and, consequently, in the improvement of their performance.
• It could be used in machine learning methods as training and testing
data.
• It might help in the evaluation of how people generate summaries of
opinions with regard to task difficulty, aspect coverage and sentiment
orientation.
2. Corpus Annotation
• OpiSums-PT: a corpus of opinion summaries based on aspects, written
in Brazilian Portuguese.
• Two domains: reviews of 13 books (ReLi corpus [2]) and of 4 electronic
products (Buscapé website).
• 5 extractive and 5 abstractive summaries for each book or product
(85 summaries of each type in total).
• 14 participants with strong knowledge in Computational Linguistics.
• Each annotator read 10 opinions about books or electronic products.
• Extractive and abstractive summaries: 100 words (±10 words) each.
• Annotators received a training session together with the annotation
manual.
• The list of possible aspects was known in advance.
Features                       Extractive Summaries   Abstractive Summaries
Summaries                      85                     85
Sentences                      534                    430
Tokens                         8435                   8611
Types                          1702                   1833
Average sentences by summary   6.3                    5.1
Average tokens by summary      99.2                   101.3
Average types by summary       71.1                   72.4
Tab 1. OpiSums-PT statistics
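The counts reported in Tab 1 can be reproduced in spirit with a few lines of code. The following is a minimal sketch, not the procedure used to build OpiSums-PT: the function name `summary_stats`, the regex tokenizer and the lowercasing of tokens are all assumptions made for illustration. Each summary is represented as a list of sentence strings, and "types" are counted as distinct lowercased tokens.

```python
import re

# Assumption: a simple regex tokenizer (words and single punctuation marks).
# The tokenizer actually used for OpiSums-PT is not described on the poster.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def summary_stats(summaries):
    """Return Tab 1-style counts for a list of summaries.

    `summaries` is a list of summaries; each summary is a list of sentences.
    """
    if not summaries:
        raise ValueError("at least one summary is required")

    n_sentences = 0
    all_tokens = []
    types_per_summary = []
    for sentences in summaries:
        n_sentences += len(sentences)
        tokens = [t.lower() for s in sentences for t in TOKEN_RE.findall(s)]
        all_tokens.extend(tokens)
        # Types are counted per summary here and over the whole set below.
        types_per_summary.append(len(set(tokens)))

    n = len(summaries)
    return {
        "summaries": n,
        "sentences": n_sentences,
        "tokens": len(all_tokens),
        "types": len(set(all_tokens)),
        "avg_sentences": n_sentences / n,
        "avg_tokens": len(all_tokens) / n,
        "avg_types": sum(types_per_summary) / n,
    }
```

Note that the average number of types is taken over per-summary type counts, which is why it can be much larger than the corpus-level type count divided by the number of summaries.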
Extractive Summaries
• The final summary was composed of complete sentences.
• The lack of cohesion among summary sentences was noticeable.
A Smartphone almost perfect! <D3_S1>
What I liked: Today is the best on the market in relation to its processing. <D2_S3>
The battery lasts a lot and its installed applications are great. <D7_S5>
The camera is wonderful. <D7_S4>
What I did not like: It heats a little at the bottom but not enough to bother, in white color it seems very fragile and the S Voice does not work yet in Portuguese. <D3_S5>
I expected more of Galaxy SIII due to the suspense that Samsung promoted. <D2_S1>
After that, who has the courage to invest around R$ 1,700.00 in Galaxy SIII or try luck with the Galaxy S4? <D6_S9>
Abstractive Summaries
• Annotators wrote the summaries in their own words, rewriting the source
content as much as possible.
• These summaries did not show the lack of cohesion observed in the
extractive ones.
Books/Electronic Products: Capitães da Areia (4); Crepúsculo (6); Ensaio sobre a Cegueira (4); Fala sério, amiga! (2); Fala sério, amor! (4); Fala sério, mãe! (6); Fala sério, pai! (2); Fala sério, professor! (2); O Apanhador nos Campos de Centeio (4); O Outro lado da meia noite (5); O Reverso da Medalha (4); Se houver Amanhã (5); 1984 (5); Galaxy S III (9); Iphone 5 (9); LG Smart TV (7); Samsung Smart TV (5)
The vast majority of readers evaluated the book Twilight negatively because, in general, they argued that it
has an exaggerated romance. Among the main disadvantages of this book, readers mentioned that the
characters are superficial, the writing is bad and the story is boring. Many users were not able to finish
reading the book and would not recommend it to other people. On the other hand, a
small part of the readers think that Twilight is a good book, because they considered it intense, romantic, full of
mysteries and amazing. These readers said that, although Twilight is a fictional book, it shows the
importance of true love.
3. Inter-Annotator Agreement
• Extractive summaries: ROUGE-1 [3], Kappa [4] and the percentage of
common sentences in the summaries.
• Abstractive summaries: ROUGE-1.
• The Kappa value obtained was 0.185.
• Many different sentences express the same meaning, so annotators can
agree on content while selecting different sentences.
• ROUGE-1: in abstractive summaries, annotators are free to use
different words, possibly synonyms and paraphrases, which lowers the
word overlap.
• Table 2 shows that it is difficult to produce similar aspect-based
opinion summaries, even among humans.
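The two agreement measures named above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code behind Tab 2: ROUGE-1 is computed here as unigram recall over lowercased, whitespace-split words (the poster does not say whether recall, precision or F-measure was used), agreement between annotators is taken as the mean over all ordered pairs of summaries, and Kappa is implemented in Fleiss' multi-rater form, with each source sentence treated as an item that every annotator labels as "selected" or "not selected". The function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def rouge1_recall(candidate, reference):
    """Unigram recall of `candidate` against `reference` (clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


def mean_pairwise_rouge1(summaries):
    """Average ROUGE-1 recall over all ordered pairs of summaries.

    Assumption: agreement is measured by scoring each summary against
    every other summary of the same book or product.
    """
    scores = [rouge1_recall(a, b) for a, b in permutations(summaries, 2)]
    return sum(scores) / len(scores)


def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of annotators who put item i in category j.
    Every row must sum to the same number of annotators (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Share of all assignments that fall in each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters)
             for j in range(n_cats)]
    # Observed agreement for each item.
    p_item = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
              for row in counts]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        return 1.0  # every annotator used the same single category
    return (p_observed - p_expected) / (1.0 - p_expected)
```

For an extractive summary written by five annotators, a source sentence selected by two of them would contribute the row `[2, 3]` to `counts`.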
                                    Extractive Summary                                       Abstractive Summary
Books/Electronic Products           Total      Majority   Minority   No         ROUGE-1    ROUGE-1
                                    Agreement  Agreement  Agreement  Agreement
Capitães da Areia                   0.000      0.267      0.200      0.533      0.405      0.218
Crepúsculo                          0.000      0.286      0.357      0.357      0.414      0.239
Ensaio sobre a Cegueira             0.000      0.043      0.217      0.739      0.250      0.251
Fala sério, amiga!                  0.077      0.154      0.154      0.615      0.606      0.299
Fala sério, amor!                   0.118      0.118      0.294      0.471      0.600      0.287
Fala sério, mãe!                    0.000      0.222      0.167      0.611      0.325      0.308
Fala sério, pai!                    0.000      0.143      0.143      0.714      0.418      0.352
Fala sério, professor!              0.000      0.235      0.353      0.412      0.344      0.345
O Apanhador nos Campos de Centeio   0.000      0.091      0.409      0.500      0.360      0.253
O Outro lado da meia noite          0.000      0.136      0.182      0.682      0.392      0.232
O Reverso da Medalha                0.000      0.100      0.250      0.650      0.339      0.305
Se houver Amanhã                    0.000      0.200      0.200      0.600      0.471      0.309
1984                                0.000      0.263      0.316      0.421      0.366      0.238
Iphone 5                            0.000      0.308      0.154      0.538      0.342      0.230
Galaxy S III                        0.000      0.100      0.200      0.700      0.235      0.276
LG Smart TV                         0.000      0.040      0.240      0.720      0.274      0.270
Samsung Smart TV                    0.000      0.238      0.333      0.429      0.451      0.270
Average                             0.011      0.173      0.245      0.570      0.388      0.275
Tab 2. Annotators agreement results
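Books/Electronic Products               Extractive Summary   Abstractive Summary
Capitães da Areia (4)                   0.450                0.700
Crepúsculo (6)                          0.467                0.567
Ensaio sobre a Cegueira (4)             0.300                0.600
Fala sério, amiga! (2)                  1.000                1.000
Fala sério, amor! (4)                   0.550                0.550
Fala sério, mãe! (6)                    0.400                0.767
Fala sério, pai! (2)                    0.800                0.900
Fala sério, professor! (2)              0.700                1.000
O Apanhador nos Campos de Centeio (4)   0.550                0.800
O Outro lado da meia noite (5)          0.800                0.760
O Reverso da Medalha (4)                0.650                0.800
Se houver Amanhã (5)                    0.640                0.680
1984 (5)                                0.600                0.760
Galaxy S III (9)                        0.333                0.400
Iphone 5 (9)                            0.444                0.578
LG Smart TV (7)                         0.514                0.714
Samsung Smart TV (5)                    0.720                0.760
Average                                 0.583                0.726
Tab 3. Coverage of aspects in summaries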
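A coverage score of the kind reported in Tab 3 can be sketched as the share of source aspects that appear in a summary, averaged over the summaries of one book or product. The exact definition used for the poster is not given, so this formula is an assumption, and the function names and the aspect labels in the usage note are hypothetical.

```python
def aspect_coverage(summary_aspects, source_aspects):
    """Fraction of the source aspects that are mentioned in one summary."""
    source = set(source_aspects)
    if not source:
        return 0.0
    return len(set(summary_aspects) & source) / len(source)


def mean_coverage(summaries_aspects, source_aspects):
    """Average coverage over all summaries written for one book or product."""
    scores = [aspect_coverage(a, source_aspects) for a in summaries_aspects]
    return sum(scores) / len(scores)
```

Under this assumed definition, a book with four aspects (say `plot`, `characters`, `writing`, `ending`) whose five summaries mention 2, 2, 2, 2 and 1 of them gets a coverage of 9/20 = 0.45.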
5. Sentiment Orientation
• Summaries must preserve the polarity distribution as much as
possible to reflect the overall sentiment.
• Sentiment in extractive summaries: taken from the ReLi and Buscapé
annotations.
• Sentiment in abstractive summaries: computed with a lexicon-based
method [5].
• In general, the summaries reflect the sentiment distribution of the
source opinions.
• In a few cases the sentiment orientation of the summary is opposite
to that of the source opinions (marked in red).
                                    Actual Polarity        Extractive Summary     Abstractive Summary
Books/Electronic Products           Positive   Negative    Positive   Negative    Positive   Negative
Capitães da Areia                   0.784      0.216       0.978      0.022       0.370      0.630
Crepúsculo                          0.391      0.609       0.075      0.925       0.510      0.490
Ensaio sobre a Cegueira             0.812      0.188       0.880      0.120       0.471      0.529
Fala sério, amiga!                  0.895      0.105       0.960      0.040       0.723      0.277
Fala sério, amor!                   0.968      0.032       0.980      0.020       0.967      0.033
Fala sério, mãe!                    0.510      0.490       0.680      0.320       0.569      0.431
Fala sério, pai!                    0.842      0.158       0.877      0.123       0.950      0.050
Fala sério, professor!              0.621      0.379       0.791      0.209       0.686      0.314
O Apanhador nos Campos de Centeio   0.300      0.700       0.204      0.796       0.283      0.717
O Outro lado da meia noite          0.705      0.295       0.667      0.333       0.633      0.367
O Reverso da Medalha                0.667      0.333       0.521      0.479       0.558      0.442
Se houver Amanhã                    0.867      0.133       0.952      0.048       0.716      0.284
1984                                0.757      0.243       0.877      0.123       0.627      0.573
Galaxy S III                        0.584      0.416       0.272      0.728       0.460      0.540
Iphone 5                            0.975      0.025       0.971      0.029       0.810      0.190
LG Smart TV                         0.622      0.378       0.674      0.326       0.753      0.247
Samsung Smart TV                    0.556      0.444       0.502      0.498       0.536      0.464
Tab 4. Sentiment orientation of summaries
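The comparison behind Tab 4 can be sketched as follows: turn a list of polarity labels into a positive/negative distribution, then check whether the dominant orientation of a summary differs from that of its source opinions. This is a minimal sketch; the label strings, the function names and the tie-breaking rule (a tie counts as positive) are assumptions, and the lexicon-based classifier of [5] is not reproduced here.

```python
def polarity_distribution(labels):
    """Return (positive share, negative share) for a list of polarity labels.

    Labels other than "positive" and "negative" (e.g. neutral) are ignored.
    """
    pos = sum(1 for label in labels if label == "positive")
    neg = sum(1 for label in labels if label == "negative")
    total = pos + neg
    if total == 0:
        return (0.0, 0.0)
    return (pos / total, neg / total)


def orientation(distribution):
    """Dominant polarity of a (positive, negative) pair; ties count as positive."""
    pos, neg = distribution
    return "positive" if pos >= neg else "negative"


def orientation_flipped(source_distribution, summary_distribution):
    """True when the summary's dominant polarity differs from the source's."""
    return orientation(source_distribution) != orientation(summary_distribution)
```

With the Capitães da Areia values reported in Tab 4, `orientation_flipped((0.784, 0.216), (0.370, 0.630))` is true for the abstractive summaries, while `orientation_flipped((0.784, 0.216), (0.978, 0.022))` is false for the extractive ones.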
4. Aspect Coverage
• Aspect coverage indicates how many aspects from the source opinions
are preserved in the summary.
• Extractive summaries: annotators are limited to the content of the
source opinions' sentences.
• Abstractive summaries: wider coverage because annotators are less
restricted when writing the summary.
• People consider only some aspects in the summary (not all).
6. Conclusions
• Human summaries are diverse: people write summaries for only some
aspects, while keeping the overall sentiment orientation with
little variation.
• This corpus could assist future research on opinion summarization.
Research supported by Samsung Eletrônica da Amazônia Ltda/Brazil
References
[1] Jack G. Conrad, Jochen L. Leidner, Frank Schilder, and Ravi Kondadadi. 2009. Query-based Opinion Summarization for Legal Blog Entries. In Proceedings of the 12th International Conference on Artificial Intelligence and Law, pp. 167–176. ACM.
[2] Cláudia Freitas, Eduardo Motta, Ruy Milidiú, and Juliana Cesar. 2013. Sparkle Vampire LoL! Annotating Opinions in a Book Review Corpus. In 11th Corpus Linguistics Conference.
[3] Chin-Yew Lin. 2004. Looking for a Few Good Metrics: Automatic Summarization Evaluation - How Many Samples are Enough? In Proceedings of the NTCIR Workshop, pp. 1–10.
[4] Jean Carletta. 1996. Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22(2):249–254.
[5] Pedro Balage Filho, Thiago Pardo, and Sandra Aluísio. 2013. An Evaluation of the Brazilian Portuguese LIWC Dictionary for Sentiment Analysis. In Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (STIL), pp. 215–219. Sociedade Brasileira de Computação.