A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects

Roque López, Lucas Avanço, Pedro Balage, Alessandro Bokan, Paula Cardoso, Márcio Dias, Fernando Nóbrega, Marco Sobrevilla, Jackson Souza, Andressa Zacarias, Ariani Di Felippo, Eloize Seno, Thiago Pardo
Interinstitutional Center for Computational Linguistics (NILC)
Institute of Mathematical and Computer Sciences (ICMC), University of São Paulo/Brazil

1. Problem and Motivation
• Aspect-based opinion summarization generates summaries of opinions for the main aspects of an object or entity [1].
• Few corpora are available for aspect-based opinion summarization.
• This scarcity of corpora has been a limiting factor for much of the research on the task.
• A corpus might help identify errors in automatic methods and, consequently, improve their performance.
• It could be used as training and testing data for machine learning methods.
• It might help evaluate how people summarize opinions with regard to task difficulty, aspect coverage and sentiment orientation.

2. Corpus Annotation
• OpiSums-PT: a corpus of aspect-based opinion summaries written in Brazilian Portuguese.
• Two domains: reviews of 13 books (ReLi corpus [2]) and of 4 electronic products (Buscapé website).
• 5 extractive summaries and 5 abstractive summaries per book or product.
• 14 annotators with strong knowledge of Computational Linguistics.
• Each annotator read 10 opinions about a book or an electronic product.
• Summary length (extractive and abstractive): 100 words (±10 words).
• Annotators received a training session together with the annotation manual.
• The list of possible aspects was known in advance.

Features                        Extractive Summaries   Abstractive Summaries
Summaries                       85                     85
Sentences                       534                    430
Tokens                          8435                   8611
Types                           1702                   1833
Average sentences per summary   6.3                    5.1
Average tokens per summary      99.2                   101.3
Average types per summary       71.1                   72.4
Tab 1. OpiSums-PT statistics

Extractive Summaries
• Each summary was composed of complete sentences from the source opinions.
• Lack of cohesion among summary sentences was evident.
Example of an extractive summary:
A Smartphone almost perfect! <D3_S1> What I liked: Today is the best on the market in relation to its processing. <D2_S3> The battery lasts a lot and its installed applications are great. <D7_S5> The camera is wonderful. <D7_S4> What I did not like: It heats a little at the bottom but not enough to bother, in white color it seems very fragile and the S Voice does not work yet in Portuguese. <D3_S5> I expected more of Galaxy SIII due to the suspense that Samsung promoted. <D2_S1> After that, who has the courage to invest around R$ 1,700.00 in Galaxy SIII or try luck with the Galaxy S4? <D6_S9>

Abstractive Summaries
• Annotators rewrote the content in their own words as much as possible.
• Summaries did not show the lack of cohesion found in extractive summaries.
Example of an abstractive summary (Crepúsculo):
The vast majority of readers evaluated the book Twilight negatively because, in general, they argued that it has an exaggerated romance. Among the main disadvantages of the book, readers mentioned that the characters are superficial, the writing is bad and the story is boring. Many users were not able to finish reading the book and would not recommend it to other people. On the other hand, a small part of the readers think that Twilight is good, because they considered it intense, romantic, full of mysteries and amazing. These readers said that, although Twilight is a fictional book, it shows the importance of true love.

3. Inter-Annotator Agreement
• Extractive summaries: ROUGE-1 [3], Kappa [4] and the percentage of common sentences in the summaries.
• Abstractive summaries: ROUGE-1.
• The Kappa value obtained was 0.185, as many different sentences express the same meaning.
• ROUGE-1 is lower for abstractive summaries (0.275 vs. 0.388 on average), since annotators are free to use different words, such as synonyms and paraphrases.
• Table 2 shows that it is difficult to generate similar aspect-based opinion summaries, even among humans.

                                   Extractive Summary                                    Abstractive Summary
Books/Electronic Products          Total     Majority  Minority  No Agreement  ROUGE-1   ROUGE-1
Capitães da Areia                  0.000     0.267     0.200     0.533         0.405     0.218
Crepúsculo                         0.000     0.286     0.357     0.357         0.414     0.239
Ensaio sobre a Cegueira            0.000     0.043     0.217     0.739         0.250     0.251
Fala sério, amiga!                 0.077     0.154     0.154     0.615         0.606     0.299
Fala sério, amor!                  0.118     0.118     0.294     0.471         0.600     0.287
Fala sério, mãe!                   0.000     0.222     0.167     0.611         0.325     0.308
Fala sério, pai!                   0.000     0.143     0.143     0.714         0.418     0.352
Fala sério, professor!             0.000     0.235     0.353     0.412         0.344     0.345
O Apanhador nos Campos de Centeio  0.000     0.091     0.409     0.500         0.360     0.253
O Outro lado da meia noite         0.000     0.136     0.182     0.682         0.392     0.232
O Reverso da Medalha               0.000     0.100     0.250     0.650         0.339     0.305
Se houver Amanhã                   0.000     0.200     0.200     0.600         0.471     0.309
1984                               0.000     0.263     0.316     0.421         0.366     0.238
Iphone 5                           0.000     0.308     0.154     0.538         0.342     0.230
Galaxy S III                       0.000     0.100     0.200     0.700         0.235     0.276
LG Smart TV                        0.000     0.040     0.240     0.720         0.274     0.270
Samsung Smart TV                   0.000     0.238     0.333     0.429         0.451     0.270
Average                            0.011     0.173     0.245     0.570         0.388     0.275
Tab 2. Annotator agreement results

4. Aspect Coverage
• Aspect coverage indicates how many aspects from the source opinions are preserved in the summary.
• Extractive summaries: annotators are limited to the content of the sentences in the source opinions.
• Abstractive summaries: wider coverage (0.726 vs. 0.583 on average), because annotators have fewer restrictions when writing the summary.
• People include only some of the aspects in their summaries, not all of them.

Books/Electronic Products (no. of aspects)  Extractive Summary  Abstractive Summary
Capitães da Areia (4)                       0.450               0.700
Crepúsculo (6)                              0.467               0.567
Ensaio sobre a Cegueira (4)                 0.300               0.600
Fala sério, amiga! (2)                      1.000               1.000
Fala sério, amor! (4)                       0.550               0.550
Fala sério, mãe! (6)                        0.400               0.767
Fala sério, pai! (2)                        0.800               0.900
Fala sério, professor! (2)                  0.700               1.000
O Apanhador nos Campos de Centeio (4)       0.550               0.800
O Outro lado da meia noite (5)              0.800               0.760
O Reverso da Medalha (4)                    0.650               0.800
Se houver Amanhã (5)                        0.640               0.680
1984 (5)                                    0.600               0.760
Galaxy S III (9)                            0.333               0.400
Iphone 5 (9)                                0.444               0.578
LG Smart TV (7)                             0.514               0.714
Samsung Smart TV (5)                        0.720               0.760
Average                                     0.583               0.726
Tab 3. Coverage of aspects in summaries

5. Sentiment Orientation
• Summaries should preserve the polarity distribution of the source opinions as much as possible, in order to reflect the overall sentiment.
• Sentiment of extractive summaries: taken from the ReLi and Buscapé annotations.
• Sentiment of abstractive summaries: computed with a lexicon-based method [5].
• In general, annotators preserved in their summaries the sentiment distribution of the source opinions.
• There are few cases in which the sentiment orientation of a summary is opposite to that of the source opinions (marked with * in Tab 4).

                                   Actual Polarity     Extractive Summary  Abstractive Summary
Books/Electronic Products          Positive  Negative  Positive  Negative  Positive  Negative
Capitães da Areia                  0.784     0.216     0.978     0.022     0.370*    0.630
Crepúsculo                         0.391     0.609     0.075     0.925     0.510*    0.490
Ensaio sobre a Cegueira            0.812     0.188     0.880     0.120     0.471*    0.529
Fala sério, amiga!                 0.895     0.105     0.960     0.040     0.723     0.277
Fala sério, amor!                  0.968     0.032     0.980     0.020     0.967     0.033
Fala sério, mãe!                   0.510     0.490     0.680     0.320     0.569     0.431
Fala sério, pai!                   0.842     0.158     0.877     0.123     0.950     0.050
Fala sério, professor!             0.621     0.379     0.791     0.209     0.686     0.314
O Apanhador nos Campos de Centeio  0.300     0.700     0.204     0.796     0.283     0.717
O Outro lado da meia noite         0.705     0.295     0.667     0.333     0.633     0.367
O Reverso da Medalha               0.667     0.333     0.521     0.479     0.558     0.442
Se houver Amanhã                   0.867     0.133     0.952     0.048     0.716     0.284
1984                               0.757     0.243     0.877     0.123     0.627     0.573
Galaxy S III                       0.584     0.416     0.272*    0.728     0.460*    0.540
Iphone 5                           0.975     0.025     0.971     0.029     0.810     0.190
LG Smart TV                        0.622     0.378     0.674     0.326     0.753     0.247
Samsung Smart TV                   0.556     0.444     0.502     0.498     0.536     0.464
Tab 4. Sentiment orientation of summaries
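The agreement, coverage and sentiment measures used in Sections 3 to 5 can be illustrated with a short Python sketch. It is not the authors' implementation, and several choices are assumptions: tokens are obtained by lowercasing and splitting on whitespace; ROUGE-1 is taken as unigram recall (as in Lin's original ROUGE-N) averaged over all ordered pairs of annotators; Kappa is computed as Fleiss' kappa, one multi-annotator form of the statistic discussed in [4], since the poster does not name the variant; and the vote thresholds for Total, Majority, Minority and No Agreement are hypothetical. Polarity labels are taken as given (for example, from corpus annotations or a lexicon such as [5], which is not reproduced here). All function names are illustrative.

```python
from collections import Counter
from itertools import permutations


def rouge1_recall(candidate: str, reference: str) -> float:
    """ROUGE-1 recall: clipped unigram overlap divided by the reference length.

    Assumes lowercased whitespace tokenization (no stemming, no stopword removal).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / max(sum(ref.values()), 1)


def mean_pairwise_rouge1(summaries: list[str]) -> float:
    """Average ROUGE-1 over all ordered pairs of annotators' summaries (assumed scheme)."""
    scores = [rouge1_recall(a, b) for a, b in permutations(summaries, 2)]
    return sum(scores) / len(scores)


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa; counts[i][j] = number of annotators putting item i in category j.

    For sentence selection, items are source sentences and the categories are
    "selected" / "not selected". Every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Observed agreement for each item, averaged over all items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_obs = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each category.
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_exp = sum(p * p for p in p_cats)
    if p_exp == 1:
        return 1.0  # convention: every annotator used one single category
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_categories(votes: list[int], n_annotators: int = 5) -> dict[str, float]:
    """Share of selected sentences at each agreement level.

    votes[i] = number of annotators who selected sentence i. The thresholds
    (all / more than half / at least two / only one) are hypothetical.
    """
    chosen = [v for v in votes if v > 0]
    levels = Counter()
    for v in chosen:
        if v == n_annotators:
            levels["total"] += 1
        elif v > n_annotators / 2:
            levels["majority"] += 1
        elif v > 1:
            levels["minority"] += 1
        else:
            levels["none"] += 1
    return {k: levels[k] / len(chosen) for k in ("total", "majority", "minority", "none")}


def aspect_coverage(summary_aspects: list[set[str]], all_aspects: set[str]) -> float:
    """Fraction of the object's aspects mentioned in each summary, averaged over summaries."""
    per_summary = [len(s & all_aspects) / len(all_aspects) for s in summary_aspects]
    return sum(per_summary) / len(per_summary)


def polarity_distribution(labels: list[str]) -> tuple[float, float]:
    """Proportions of positive and negative units; the two values sum to 1."""
    pos = labels.count("positive")
    neg = labels.count("negative")
    return pos / (pos + neg), neg / (pos + neg)


if __name__ == "__main__":
    # Toy data: 3 source sentences, 5 annotators, categories = [selected, not selected].
    print(fleiss_kappa([[5, 0], [0, 5], [3, 2]]))  # ≈ 0.598 (= 67/112)
    # Sentences chosen by 5, 3, 2, 1 and 0 annotators.
    print(agreement_categories([5, 3, 2, 1, 0]))  # each level: 0.25
```

Kappa corrects for chance agreement, whereas ROUGE-1 and the agreement shares do not, so the measures capture different notions of similarity between human summaries.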
6. Conclusions
• Human summaries are diverse: people summarize only some of the aspects, while keeping the overall sentiment orientation with little variation.
• This corpus could support future research on opinion summarization.

Research supported by Samsung Eletrônica da Amazônia Ltda/Brazil

References
[1] Jack G. Conrad, Jochen L. Leidner, Frank Schilder, and Ravi Kondadadi. 2009. Query-based Opinion Summarization for Legal Blog Entries. In Proceedings of the 12th International Conference on Artificial Intelligence and Law, pp. 167–176. ACM.
[2] Cláudia Freitas, Eduardo Motta, Ruy Milidiú, and Juliana Cesar. 2013. Sparkle Vampire LoL! Annotating Opinions in a Book Review Corpus. In the 11th Corpus Linguistics Conference.
[3] Chin-Yew Lin. 2004. Looking for a Few Good Metrics: Automatic Summarization Evaluation - How Many Samples Are Enough? In Proceedings of the NTCIR Workshop, pp. 1–10.
[4] Jean Carletta. 1996. Assessing Agreement on Classification Tasks: The Kappa Statistic. Computational Linguistics, 22(2):249–254.
[5] Pedro Balage Filho, Thiago Pardo, and Sandra Aluísio. 2013. An Evaluation of the Brazilian Portuguese LIWC Dictionary for Sentiment Analysis. In Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (STIL), pp. 215–219. Sociedade Brasileira de Computação.