Working with COMPARA: an online parallel corpus of English and Portuguese fiction
Ana Frankenberg-Garcia

An online parallel corpus of English and Portuguese fiction ???
An online corpus
Allows you to study Portuguese and English fiction and their translations into English and Portuguese in an automatic way…

COMPARA
Machine Translation
Human Translation

The study of human translation
• Traditionally not a hard science
• Difficult to be systematic
• But with the technology of corpus linguistics, things can change…

What is a corpus?

Advantages of using corpora to study human translation
• An enormous amount of translated texts
• Systematic analyses
• Quantifiable results
Baker (1993), Frankenberg-Garcia (2004), Olohan & Baker (2000), Øverås (1998), Sardinha (2002)

A parallel corpus can also be used in language learning
Barlow (2000), Frankenberg-Garcia (2000, 2004, forthcoming), Pearson (2003), Roussel (1991)

Advantages of using corpora in language learning
• Authentic examples of language use
• Access to information often absent from conventional grammars and dictionaries
• Learner autonomy (don’t have to rely on native speakers)
• Risk-taking

COMPARA team
Ana Frankenberg-Garcia, Diana Santos
Rosário Silva, Susana Inácio, Rosa Pires

Initial support (1999-2000)
• FCT (Portugal)
• ISLA Lisboa
• Oxford University Language Centre

Present funding (2001-2004)
• Linguateca: FCT/POSI (POSI/PLP/43931/2001)

COMPARA structure
• PT source texts and their EN translations
• EN source texts and their PT translations
[Diagram: original Portuguese source texts with translated English; original English source texts with translated Portuguese]

COMPARA users and uses
• Language learners - bilingual dictionary with examples
• Language teachers - exercises and tests
• Translators - language equivalents
• Translation lecturers - exercises & problems
• Translation theorists - test translation hypotheses
• Bilingual lexicographers - bilingual dictionaries
• Computational linguists - machine translation
Since 2001: over 70 000 queries

Before using it…
Remember that the results you get are “only as good as the corpus”
J. Sinclair, Corpus, Concordance, Collocation (1991: 13)

Why can’t I find the Portuguese translation of greenhouse gas in COMPARA?

COMPARA 5.6 varieties
• Portuguese: Portugal, Brazil, Angola, Mozambique
• English: UK, US, South Africa

COMPARA 5.6 publication dates
[Timeline: 1837, 1880, 1914, 1988, 1997, 2000]

COMPARA 5.6 genre
• Published fiction
• Extensible to other genres

COMPARA 5.6 authors
• Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, Jorge de Sena, Mário de Carvalho, Sá Carneiro
• Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
• Angolan writers: José Eduardo Agualusa
• Mozambican writers: Mia Couto
• British writers: David Lodge, Julian Barnes, Joseph Conrad, Joanna Trollope, Lewis Carroll, Oscar Wilde
• American writers: Henry James, Edgar Allan Poe, Richard Zimler
• South African writers: Nadine Gordimer
+ copyright permission to use more

Can any text be included in the corpus?
• Only published source texts and translations
• Only English translated directly from Portuguese, and Portuguese translated directly from English
• Only human translations!

COMPARA 5.6 texts
• 49 translations
• 46 source texts (extracts)

COMPARA 5.6 size
• 973317 words in English
• 893150 words in Portuguese
Largest edited parallel corpus in the world

Now I know why I can’t find greenhouse gas in COMPARA!

COMPARA 5.6
[Diagram contrasting syntax, general language and fiction with technical terms and other genres]

One more thing…
When using corpora, remember:
Language is “constructed out of a finite set of elements”, but it is something that is used creatively!
N. Chomsky, Syntactic Structures (1957: 13)

“As a rule of thumb you need a litre of paint to every 12 square metres of wall”
• “rule”
• “as a rule”
• “rule of thumb”

COMPARA availability
• Free, online
• For research and education

COMPARA access
www.linguateca.pt/COMPARA/
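
Illustration: searching for “rule”, “as a rule” and “rule of thumb”
The point of the “rule of thumb” slide can be tried out on a toy scale. The sketch below is not COMPARA's query interface; it is a minimal, hypothetical concordancer over a handful of aligned sentence pairs. Only the first English sentence comes from the slides; the other sentences and all the Portuguese renderings are invented for illustration, and the names ALIGNED_PAIRS and concordance are assumptions of this sketch.

```python
import re

# Toy aligned sentence pairs (English, Portuguese).
# Only the first English sentence appears in the slides; everything else,
# including every Portuguese rendering, is invented for this illustration
# and is NOT taken from COMPARA.
ALIGNED_PAIRS = [
    ("As a rule of thumb you need a litre of paint to every 12 square metres of wall.",
     "Regra geral, é preciso um litro de tinta para cada 12 metros quadrados de parede."),
    ("He broke the rule.",
     "Ele quebrou a regra."),
    ("As a rule, she gets up early.",
     "Em regra, ela levanta-se cedo."),
]


def concordance(pairs, phrase):
    """Return the aligned pairs whose English side contains `phrase`.

    The match is case-insensitive and anchored on word boundaries,
    so "rule" does not match "ruler".
    """
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [(en, pt) for en, pt in pairs if pattern.search(en)]


if __name__ == "__main__":
    for query in ("rule", "as a rule", "rule of thumb"):
        hits = concordance(ALIGNED_PAIRS, query)
        print(f"{query!r}: {len(hits)} aligned pair(s)")
        for en, pt in hits:
            print("  EN:", en)
            print("  PT:", pt)
```

The same sentence is returned for all three queries, but each query treats a different stretch of it as the unit of meaning, so the translation equivalents a user reads off the Portuguese side depend on how the search is phrased.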