Answering Portuguese Questions
Luís Fernando Costa & Luís Miguel Cabral
{Luis.costa, Luis.M.Cabral}@sintef.no
Linguateca / SINTEF ICT, PB 124, Blindern, NO-0314 Oslo, Norway
http://www.linguateca.pt

Question Answering
The goal of a question answering (QA) system is to give precise answers to questions formulated in natural language. This differs from the more widely known search engines, such as Google, which retrieve documents based on a set of keywords. A general-domain QA system is not specially tuned or prepared to answer questions in a particular domain or subject.

Architecture
[Figure: Esfinge architecture. Components: Question, Anaphor Resolution, Question Reformulation, Search Doc Collections (News, Wikipedia), Search Web, N-Grams, NER, Filters, Database of Co-occurrences, Search Supporting Documents, Answer Selection, Choice of longer answers, Answer(s) or Answer=NIL. The system iterates over all the alternative questions.]

Esfinge
- General-domain Portuguese QA system
- Use of information redundancy to retrieve documents (CHAVE collection, Wikipedia and the Web)
- Anaphor resolution using PALAVRAS [Bick, 2000]
- Multiple question generation, multiple answers
- Experimenting with several types of search patterns
- Named entity recognizer SIEMÊS [Sarmento, 2006], used to retrieve candidate answers to questions that imply answers of particular NE types
- Web interface and source code of some of the system's modules available at http://www.linguateca.pt/Esfinge/

CLEF
The Cross-Language Evaluation Forum (CLEF) promotes R&D in multilingual information access. Esfinge has participated in CLEF since 2004.
Errors at CLEF 2007:
- Wrong or incomplete search patterns (63/165 wrong answers)
- Document retrieval failure (33/165 wrong answers)
- Missing patterns to identify the type of answer
- Search in Wikipedia

Experiments
- More complete search patterns (added noun phrases)
- Remove the verbs from the search
- Combine two types of search patterns simultaneously (predefined text patterns and patterns generated using PALAVRAS)

Results
Test set 1: 200 questions from QA@CLEF 2007 for PT-PT
Table 1.
Results of the experiments (F: 170 factoid questions; D: 30 definition questions)
Description | Right answers (All) | Right answers (NIL) | Right answers (F) | Right answers (D) | Unsupported answers | Inexact answers (missing words) | Inexact answers (too many words) | Good supporting snippets
CLEF 2007 | 35 | 5 | 28 | 7 | 1 | 6 | 1 | 59
Baseline | 34 | 5 | 29 | 5 | 4 | 7 | 1 | 58
More complete search patterns | 35 | 7 | 31 | 4 | 4 | 7 | 1 | 60
Without verbs | 41 | 4 | 37 | 4 | 7 | 7 | 1 | 71
Combination | 44 | 3 | 39 | 5 | 8 | 6 | 1 | 76

Table 2. Causes of wrong answers in the best run
Cause | CLEF 2007 | Combination
Co-reference resolution | 25 | 23
Wrong or incomplete search patterns | 63 | 15
Document retrieval failure | 33 | 12
Answer scoring algorithm | 24 | 60
Answer support testing | 7 | 27
Other | 6 | 19
Total | 165 | 156

Example: "Que país declarou a independência em 1291?" (Which country declared independence in 1291?)
1) Predefined text patterns:
   "declarou a independência em 1291" país / 20
   país declarou a independência em 1291 / 1
2) Patterns generated using PALAVRAS:
   declarou; a independência em 1291; país
   a independência em 1291; país (without verbs)

Test set 2: 200 questions from QA@CLEF 2008 for PT-PT
Table 3. Results of the experiments (F: 171 factoid questions; D: 29 definition questions)
Description | Right answers | Unsupported answers | Inexact answers (missing words) | Inexact answers (too many words)
a) PALAVRAS patterns | 44 | 4 | 1 | 8
b) = a) + test patterns without verbs | 50 | 6 | 1 | 8
Esfinge original patterns + b) | 49 | 5 | 2 | 12

Conclusions
- Using patterns without verbs as a backup strategy yielded better results with both the 2007 and the 2008 QA@CLEF questions, but only for factoid questions.
- The benefits of combining two types of search patterns were not confirmed by the experiment with the 2008 questions.
- Errors moved to a later stage in the system's execution.

This work was done in the scope of Linguateca, contract nº 339/1.3/C/NAC, a project jointly funded by the Portuguese Government and the European Union.

Bick, E.: The Parsing System "Palavras": Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework.
Aarhus: Aarhus University Press (2000)
Sarmento, L.: SIEMÊS - a named entity recognizer for Portuguese relying on similarity rules. In: 7th Workshop on Computational Processing of Written and Spoken Language (PROPOR'2006) (Itatiaia, RJ, Brasil, 13-17 May 2006), Springer, pp. 90-99.
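
Illustrative sketch
The backup strategy from the experiments (search with the full pattern first, then retry without verbs) could look roughly like the Python sketch below. It is not Esfinge's actual code. The function names (build_pattern, search, search_with_backup) are hypothetical, the chunks are annotated by hand instead of coming from PALAVRAS, the retrieval step is a toy substring match, and the two documents are invented for illustration.

```python
from typing import List, Tuple

# Hand-annotated chunks standing in for parser output: (phrase, is_verb).
# In the real system this segmentation would come from PALAVRAS (assumption).
Chunk = Tuple[str, bool]


def build_pattern(chunks: List[Chunk], keep_verbs: bool) -> List[str]:
    """Return the phrases of a search pattern, optionally dropping verbs."""
    return [text for text, is_verb in chunks if keep_verbs or not is_verb]


def search(pattern: List[str], docs: List[str]) -> List[str]:
    """Toy retrieval: keep documents that contain every phrase (case-insensitive)."""
    return [d for d in docs if all(p.lower() in d.lower() for p in pattern)]


def search_with_backup(chunks: List[Chunk], docs: List[str]) -> List[str]:
    """Try the full pattern first; fall back to the verbless one if nothing is found."""
    hits = search(build_pattern(chunks, keep_verbs=True), docs)
    if hits:
        return hits
    return search(build_pattern(chunks, keep_verbs=False), docs)


# Chunks for "Que país declarou a independência em 1291?"
chunks = [
    ("declarou", True),
    ("a independência em 1291", False),
    ("país", False),
]

# Invented toy collection; the first document uses a different verb.
docs = [
    "A Suíça é um país que proclamou a independência em 1291.",
    "Portugal é um país da Europa.",
]

results = search_with_backup(chunks, docs)
```

Here the full pattern finds nothing because the toy document uses "proclamou" rather than "declarou", so the verbless pattern is tried and retrieves the first document. This is one possible reading of why dropping verbs can help document retrieval.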