Answering Portuguese Questions
Luís Fernando Costa & Luís Miguel Cabral
{Luis.costa, Luis.M.Cabral}@sintef.no
Linguateca / SINTEF ICT
PB 124, Blindern NO-0314 Oslo, Norway
http://www.linguateca.pt
Question Answering
The goal of a question answering (QA) system is to give precise answers to questions formulated in natural language.
- This differs from the more widely known search engines, such as Google, which retrieve documents based on a set of keywords.
- A general-domain QA system is not specially tuned or prepared to answer questions about a particular domain or subject.
Architecture

[Figure: architecture of Esfinge. Labelled components: Question, Anaphor Resolution, Question Reformulation, Search Doc Collections, Search Web, News, Wikipedia, N-Grams, NER, Answer=NIL; the system iterates over all the alternative questions.]
Esfinge
A general-domain QA system for Portuguese:
- Uses information redundancy to retrieve documents (CHAVE collection, Wikipedia and the Web).
- Resolves anaphors using PALAVRAS [Bick, 2000].
- Generates multiple alternative questions and collects multiple answers.
- Experiments with several types of search patterns.
- Uses the named entity recognizer SIEMÊS [Sarmento, 2006] to retrieve candidate answers to questions whose answers are expected to be named entities (NE) of particular types.
- A web interface and the source code of some of the system's modules are available at http://www.linguateca.pt/Esfinge/
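To illustrate how information redundancy can be used to rank candidate answers, here is a minimal Python sketch. It is not Esfinge's scoring algorithm: it assumes, hypothetically, that each search pattern carries a numeric weight (in the spirit of the "/ 20" and "/ 1" values attached to the predefined patterns in the experiments example), that candidate answers are word n-grams taken from the retrieved snippets, and that n-grams starting or ending with a stopword or a question word are discarded. The stopword list, the function names `ngrams` and `score_candidates`, and the sample snippets are invented for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny Portuguese stopword list (illustration only).
STOPWORDS = frozenset({"a", "o", "as", "os", "e", "em", "de", "da", "do"})


def ngrams(tokens, max_n):
    """Yield every word n-gram of length 1..max_n as a tuple of tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def score_candidates(question, snippets_by_pattern, max_n=3):
    """Redundancy-based candidate scoring (illustrative, not Esfinge's exact method).

    snippets_by_pattern: list of (pattern_weight, [snippet, ...]) pairs.
    Every occurrence of a candidate n-gram in a snippet adds the weight of
    the search pattern that retrieved that snippet.
    """
    blocked = STOPWORDS | set(question.lower().rstrip("?").split())
    scores = Counter()
    for weight, snippets in snippets_by_pattern:
        for snippet in snippets:
            for gram in ngrams(snippet.lower().split(), max_n):
                # Skip n-grams that start or end with a stopword or a question word.
                if gram[0] in blocked or gram[-1] in blocked:
                    continue
                scores[" ".join(gram)] += weight
    return scores


# Invented snippets: one retrieved with a weight-20 pattern, two with a weight-1 pattern.
scores = score_candidates(
    "Que país declarou a independência em 1291?",
    [
        (20, ["a suíça declarou a independência em 1291"]),
        (1, ["em 1291 a suíça declarou a independência",
             "a suíça foi fundada em 1291"]),
    ],
)
print(scores.most_common(1))  # [('suíça', 22)]
```

The candidate that recurs across snippets, and especially in snippets found by the higher-weighted pattern, accumulates the highest score; that accumulation is the redundancy effect in its simplest form.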
[Figure: further architecture components: Database of Co-occurrences; Choice of longer answers]

CLEF
The Cross-Language Evaluation Forum (CLEF) promotes R&D
in multilingual information access.
- Esfinge has participated in CLEF since 2004.
- Errors at CLEF 2007:
  - Wrong or incomplete search patterns (63 of 165 wrong answers)
  - Document retrieval failure (33 of 165 wrong answers)
  - Missing patterns to identify the type of answer
Search in Wikipedia

Experiments


[Figure: answer extraction components: Filters, Answer Search, Supporting Documents, Answer Selection, Answer(s)]
Variations tested:
- More complete search patterns (noun phrases added)
- Search patterns without verbs
- Combination of two types of search patterns used simultaneously (predefined text patterns and patterns generated using PALAVRAS)

Example: "Que país declarou a independência em 1291?" ("Which country declared independence in 1291?")
1) Predefined text patterns:
"declarou a independência em 1291" país / 20
país declarou a independência em 1291 / 1
2) Patterns generated using PALAVRAS:
declarou; a independência em 1291; país;
a independência em 1291; país; (without verbs)

Results
Test set 1: 200 questions from QA@CLEF 2007 for PT-PT

Table 1. Results of the experiments (F: 170 factoid questions; D: 30 definition questions)
Description                   | Right answers (All / NIL / F / D) | Unsupported answers | Inexact answers (missing words) | Inexact answers (too many words) | Good supporting snippets
CLEF 2007                     | 35 / 5 / 28 / 7                   | 1                   | 6                               | 1                                | 59
Baseline                      | 34 / 5 / 29 / 5                   | 4                   | 7                               | 1                                | 58
More complete search patterns | 35 / 7 / 31 / 4                   | 4                   | 7                               | 1                                | 60
Without verbs                 | 41 / 4 / 37 / 4                   | 7                   | 7                               | 1                                | 71
Combination                   | 44 / 3 / 39 / 5                   | 8                   | 6                               | 1                                | 76

Table 2. Causes of wrong answers in the best run
Cause                               | CLEF 2007 | Combination
Co-reference resolution             | 25        | 23
Wrong or incomplete search patterns | 63        | 15
Document retrieval failure          | 33        | 12
Answer scoring algorithm            | 24        | 60
Answer support testing              | 7         | 27
Other                               | 6         | 19
Total                               | 165       | 156

Test set 2: 200 questions from QA@CLEF 2008 for PT-PT

Table 3. Results of the experiments (F: 171 factoid questions; D: 29 definition questions)
Description                                   | Right answers | Unsupported answers | Inexact answers (missing words) | Inexact answers (too many words)
a) PALAVRAS patterns                          | 44            | 4                   | 8                               | 1
b) = a) + also testing patterns without verbs | 50            | 6                   | 8                               | 1
Esfinge original patterns + b)                | 49            | 5                   | 12                              | 2
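The shift in error causes shown in Table 2 can be checked with a little arithmetic. The grouping used below is an assumption made for illustration, not a classification given in the poster: co-reference resolution, search patterns and document retrieval are treated as earlier pipeline stages, answer scoring and answer support testing as later stages, and "Other" is left out. The names `CAUSES`, `EARLY`, `LATE` and `stage_totals` are hypothetical.

```python
# Wrong answers per cause, copied from Table 2: (CLEF 2007 run, Combination run).
# Note: the CLEF 2007 per-cause counts sum to 158, while the table's Total row
# gives 165; only the per-cause counts are used here.
CAUSES = {
    "Co-reference resolution": (25, 23),
    "Wrong or incomplete search patterns": (63, 15),
    "Document retrieval failure": (33, 12),
    "Answer scoring algorithm": (24, 60),
    "Answer support testing": (7, 27),
    "Other": (6, 19),
}

# Assumed grouping of causes into pipeline stages (illustrative only).
EARLY = ["Co-reference resolution", "Wrong or incomplete search patterns",
         "Document retrieval failure"]
LATE = ["Answer scoring algorithm", "Answer support testing"]


def stage_totals(column):
    """Return (earlier-stage, later-stage) error totals for column 0 (CLEF 2007) or 1 (Combination)."""
    early = sum(CAUSES[cause][column] for cause in EARLY)
    late = sum(CAUSES[cause][column] for cause in LATE)
    return early, late


for name, col in (("CLEF 2007", 0), ("Combination", 1)):
    early, late = stage_totals(col)
    print(f"{name}: earlier stages = {early}, later stages = {late}")
# CLEF 2007: earlier stages = 121, later stages = 31
# Combination: earlier stages = 50, later stages = 87
```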
Conclusions
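- Using patterns without verbs as a backup strategy yielded better results with both the 2007 and the 2008 QA@CLEF questions, but only for factoid questions.
- The benefits of combining two types of search patterns were not confirmed by the experiment with the 2008 questions.
- Errors moved to a later stage of the system's execution: compared with CLEF 2007, fewer wrong answers were caused by search patterns and document retrieval, and more by answer scoring and answer support testing (Table 2).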
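The backup strategy from the first conclusion can be sketched as a simple control flow: search with the full pattern, and only if that yields no answer, search again with the verbs removed. This is a minimal sketch under stated assumptions: the chunks and their tags are written by hand to mimic the PALAVRAS-generated patterns in the example ("declarou; a independência em 1291; país;" and its verb-less variant), the tag "V" for verbs is an assumption rather than PALAVRAS's real tag set, and `toy_search`, `make_pattern` and `answer_with_backup` are hypothetical names standing in for Esfinge's document search and pattern handling.

```python
from typing import Callable, List, Optional, Tuple

# A chunk is (text, tag). The poster generates such chunks with PALAVRAS; here they
# are written by hand, and the tag "V" for verbs is an assumption.
Chunk = Tuple[str, str]


def make_pattern(chunks: List[Chunk], drop_verbs: bool = False) -> List[str]:
    """Build a search pattern (list of phrases) from chunks, optionally without verbs."""
    return [text for text, tag in chunks if not (drop_verbs and tag == "V")]


def answer_with_backup(chunks: List[Chunk],
                       search: Callable[[List[str]], Optional[str]]) -> Optional[str]:
    """Search with the full pattern; only if that finds nothing, retry without verbs."""
    answer = search(make_pattern(chunks))
    if answer is None:  # None stands for "no answer found" (NIL)
        answer = search(make_pattern(chunks, drop_verbs=True))
    return answer


# Hypothetical stand-in for document search over a one-sentence "corpus":
# it returns an answer only if every phrase of the pattern occurs in the text.
CORPUS = "a suíça conquistou a independência em 1291 e é hoje um país federal"


def toy_search(pattern: List[str]) -> Optional[str]:
    return "Suíça" if all(phrase in CORPUS for phrase in pattern) else None


chunks = [("declarou", "V"), ("a independência em 1291", "NP"), ("país", "N")]
print(make_pattern(chunks))                    # ['declarou', 'a independência em 1291', 'país']
print(make_pattern(chunks, drop_verbs=True))   # ['a independência em 1291', 'país']
print(answer_with_backup(chunks, toy_search))  # Suíça
```

Keeping the full pattern as the first attempt means the more specific query is always preferred; the verb-less pattern only broadens the search when the text uses a different verb (here "conquistou" instead of "declarou") and the first attempt returns nothing.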


This work was done within the scope of Linguateca, contract nº 339/1.3/C/NAC, a project jointly funded by the Portuguese Government and the European Union.
Bick, E.: The Parsing System "Palavras": Automatic Grammatical Analysis of
Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University
Press (2000)
Sarmento, L.: SIEMÊS - a named entity recognizer for Portuguese relying on similarity rules. In: 7th Workshop on Computational Processing of Written and Spoken Language (PROPOR'2006), Itatiaia, RJ, Brazil, 13-17 May 2006. Springer (2006), pp. 90-99.