Complex Predicates Annotation
in a corpus of Portuguese
Iris Hendrickx, Amália Mendes, Sílvia Pereira, Anabela
Gonçalves and Inês Duarte
Centro de Linguística e Faculdade de Letras da
Universidade de Lisboa, Lisboa, Portugal
What are complex predicates?
Constructions with more than one lexical unit, each contributing part of
the information normally associated with a single predicate.
A CP behaves like a syntactic unit -> syntactic processes usually
related to one of the elements operate over the whole CP.
• Verb + Noun predicates
take a walk, have a rest
• Verb + Verb predicates
Querer estudar (want to study)
Fazer rir (make laugh)
Overview of the talk
• Introduction
• CP typology
– Annotated subset
• Annotation system
– Special cases
• Annotation results
– Statistics
– Special cases
• Future Work
CP Theory
The first verb can be considered as:
• Light verb (Jespersen, 1949)
• Support verb (Gross, 1981)
• Auxiliary verb (Abeillé et al., 1998)
• We believe that the first verb in a CP has predicative
content: both elements in a CP contribute to its overall
structure and meaning
Duarte, I., M. Miguel & A. Gonçalves (2009). Light verbs as predicates. Paper presented at TABU
Dag 2009, Groningen.
Project PREPLEXOS
The work presented here is based on the following project:
PREPLEXOS: Predicados Complexos,
tipologia e anotação de corpus
(Complex Predicates: typology and corpus annotation)
developed at Centro de Linguística da Universidade de Lisboa
supported by FCT (PTDC/LIN/68241/2006)
Goal: create a corpus-based resource for the linguistic study of
CPs in Portuguese
CP typology
(1) Two verbs in a restructuring construction
não me queres dizer (you do not want to tell me)
(2) Two verbs in a causative construction with clause union
lhe fez espirrar o sangue (made him spit blood)
(3) Verb + Noun construction
terem medo da tuberculose (have fear of tuberculosis)
CP Typology
(4) Verb + Secondary predicate
• Adjective
tornar a história credível (make the story believable)
• Prepositional phrase
fazer x em pedaços (to make x into pieces)
(5) Verb and Verb constructions
O Pedro pegou e despediu-se (Pedro took and said
goodbye)
Annotation System (1)
Due to time limitations, we exclude type (4) (verb+ADJ/PP)
and restrict type (3) to:
• nouns derived from a verb
dar um passeio (to take a walk)
• nouns expressing an emotion, i.e., psych-nouns
ter medo (to be afraid)
Annotation System (2)
We focus on a subset of verbs,
for example:
– verb+noun: ter (have), dar (give), fazer (make)
– causative: mandar (order), deixar (let), fazer (make)
The annotation guidelines follow the results of our CP study
within a generative grammar framework and are
therefore theory-oriented.
We do not annotate idiomatic expressions.
Annotation Tags
• verb + verb constructions (types (1), (2), (5))
tag: CV
• verb + noun constructions (type (3))
tag: CN
verb+verb constructions
Restructuring constructions: [CVR]
• porque nos [CVR] queriam convidar
because [they] us wanted to invite ('because they wanted
to invite us')
Causative constructions: [CVC]
• veio abalar estes alicerces espirituais [CVC] fazendo
traduzir ao rapaz "Pucelle" de Voltaire
(he shook these spiritual foundations by making translate
to the boy "Pucelle" by Voltaire)
verb+verb constructions
Coordinated verbs [CVE]
• e [CVE] vai um e conta ao outro.
and goes one and tells to the other (and he tells the other)
(appears in informal spoken discourse)
verb+noun constructions
Bare nouns [CNB]
• Facto que leva a CGD a considerar que não [CNB] tem
obrigações em relação aos trabalhadores.
(The fact that leads the CGD to believe that it doesn't have
obligations towards the workers)
Nouns with a determiner [CN]
• o erro de [CN] fazer uma interpretação literal
(the error of making a literal interpretation)
Order annotation
Each element of a CP carries two indications:
• CP ordering (position 1, 2, etc.): its position in the canonical form
• Its contextual position in the example:
B=Beginning, I=Intermediate, E=End
depois de um[CN2_B] aviso[CN3_I] dado[CN1_E]
Canonical form: dar um aviso
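The two indications can be read back mechanically. The following is a minimal Python sketch, not the project's own tooling: it assumes the tags are written inline as word[LABELn_X], as in the example above, and the name canonical_order is hypothetical. It recovers the canonical order of the tagged words only; it does not lemmatize, so the participle dado is not turned into the infinitive dar.

```python
import re

# One tagged token: word[LABEL<position>_<B|I|E>], e.g. aviso[CN3_I].
# The inline notation is an assumption taken from the slide example.
TAG = re.compile(r"(\w+)\[([A-Z]+)(\d+)_([BIE])\]")

def canonical_order(text):
    """Return the tagged words sorted by their canonical position."""
    found = [(int(pos), word) for word, _label, pos, _ctx in TAG.findall(text)]
    return [word for _pos, word in sorted(found)]

example = "depois de um[CN2_B] aviso[CN3_I] dado[CN1_E]"
print(" ".join(canonical_order(example)))  # dado um aviso
```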
Special cases
• Ambiguity
• Overlapping CPs
• Coordination inside CP
Special Case: Ambiguity
• Clearly CP:
fazer perceber aos cidadãos em geral, que a fotocópia
corresponde a um acto de pirataria inaceitável
(make understand to all citizens that a photocopy corresponds to an act of
unacceptable piracy)
• Clearly embedded clause:
fazer os cidadãos perceber que a fotocópia corresponde a um acto
de pirataria inaceitável
• Ambiguous: CP or just a verb+embedded infinitive clause!
uma forte vontade de fazer progredir o processo de paz
(a strong will to make progress the process of peace)
Special case: Overlap
Two CPs overlap in one verb -> double tag
• não o queriam[CVR1_B]
deixar[CVR2_E][CVC_VINF1_B] fugir[CVC_VINF2_E]
([they] not him wanted to let escape)
Two CPs: queriam deixar and deixar fugir
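A double tag can be untangled by grouping words per tag label. The sketch below is an illustration in Python under stated assumptions: the inline word[TAG] notation is taken from the example above, a token may carry several bracketed tags in a row, and the name group_cps is hypothetical.

```python
import re
from collections import defaultdict

# A token may carry several tags in a row: deixar[CVR2_E][CVC_VINF1_B].
TOKEN = re.compile(r"(\w+)((?:\[[A-Z_]+\d+_[BIE]\])+)")
TAG = re.compile(r"\[([A-Z_]+)(\d+)_([BIE])\]")

def group_cps(text):
    """Map each tag label to its words, ordered by canonical position."""
    groups = defaultdict(list)
    for word, tags in TOKEN.findall(text):
        for label, pos, _ctx in TAG.findall(tags):
            groups[label].append((int(pos), word))
    return {label: [w for _p, w in sorted(items)]
            for label, items in groups.items()}

example = ("não o queriam[CVR1_B] "
           "deixar[CVR2_E][CVC_VINF1_B] fugir[CVC_VINF2_E]")
print(group_cps(example))
```

With the slide example, the shared verb deixar ends up in both groups, once as the second element of the restructuring CP and once as the first element of the causative CP.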
Special case: coordination
Two CPs share the same verb in a conjunctive clause:
• para quem o quis[CVR1_B] ouvir[CVR2_1_E] e
eventualmente registar[CVR2_2_E]
(to whom wanted to listen and eventually register him)
• nós temos[CN1_B] uma[CN2_1_I] tristeza[CN3_1_E] /
uma[CN2_2_I] frustração[CN3_2_E] muito grande
(we have a sadness / a frustration very deep)
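The coordination tags can be expanded into one CP per conjunct. This is a hedged Python sketch, not the project's code: it assumes the middle number in tags such as CVR2_1_E is a conjunct index (an interpretation based only on the two examples above), that the input holds a single coordinated CP, and the name expand_coordination is hypothetical.

```python
import re

# word[LABEL<position>(_<conjunct>)?_<B|I|E>], e.g. ouvir[CVR2_1_E].
# Reading the middle number as a conjunct index is an assumption.
TAG = re.compile(r"(\w+)\[([A-Z_]+)(\d+)(?:_(\d+))?_([BIE])\]")

def expand_coordination(text):
    """Return one word sequence per coordinated CP, in canonical order."""
    shared, by_conjunct = [], {}
    for word, _label, pos, conjunct, _ctx in TAG.findall(text):
        item = (int(pos), word)
        if conjunct:
            by_conjunct.setdefault(int(conjunct), []).append(item)
        else:
            shared.append(item)  # element common to all conjuncts
    return [
        " ".join(w for _p, w in sorted(shared + items))
        for _c, items in sorted(by_conjunct.items())
    ]

verbs = ("para quem o quis[CVR1_B] ouvir[CVR2_1_E] e "
         "eventualmente registar[CVR2_2_E]")
nouns = ("nós temos[CN1_B] uma[CN2_1_I] tristeza[CN3_1_E] / "
         "uma[CN2_2_I] frustração[CN3_2_E] muito grande")
print(expand_coordination(verbs))
print(expand_coordination(nouns))
```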
Corpus
• The CINTIL corpus contains 1 million tokens of
Portuguese. It was compiled using different existing
resources and contains both spoken (1/3) and written texts
(2/3 of the corpus)
• The CINTIL corpus is available for online queries
(//cintil.ul.pt)
F. Barreto, A. Branco, E. Ferreira, A. Mendes, M. F. P. Bacelar do Nascimento, F. Nunes, and J. Silva. 2006.
Open resources and tools for the shallow processing of Portuguese. LREC 2006
Annotation results
label        written   spoken   total
CV total         470      219     689
  CVR             34       47      81
  CVC             13        3      16
  CVE              0        1       1
  CVR_VINF       300      143     443
  CVC_VINF       123       25     148
CN total         706      586    1292
  CNB            353      213     566
  CN             353      373     726
total           1176      805    1981
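The counts can be cross-checked against their subtotals. The Python sketch below is only an illustration: it assumes the written CVR count is 34, the value implied both by the CVR row total (81 = 34 + 47) and by the written CV total, and the helper name column_sums is hypothetical.

```python
# (written, spoken, total) per label; written CVR = 34 is an assumption
# derived from the row and column totals.
COUNTS = {
    "CVR": (34, 47, 81),
    "CVC": (13, 3, 16),
    "CVE": (0, 1, 1),
    "CVR_VINF": (300, 143, 443),
    "CVC_VINF": (123, 25, 148),
    "CNB": (353, 213, 566),
    "CN": (353, 373, 726),
}
CV_LABELS = ["CVR", "CVC", "CVE", "CVR_VINF", "CVC_VINF"]
CN_LABELS = ["CNB", "CN"]

def column_sums(labels):
    """Sum the written, spoken and total counts over the given labels."""
    return tuple(sum(COUNTS[label][i] for label in labels) for i in range(3))

cv_total = column_sums(CV_LABELS)
cn_total = column_sums(CN_LABELS)
grand_total = tuple(a + b for a, b in zip(cv_total, cn_total))
rows_ok = all(w + s == t for w, s, t in COUNTS.values())
print(cv_total, cn_total, grand_total, rows_ok)
# (470, 219, 689) (706, 586, 1292) (1176, 805, 1981) True
```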
Annotator agreement
A small experiment:
two annotators annotated 50 sentences independently,
yielding a kappa value of .81
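For readers unfamiliar with the measure, here is a minimal Python sketch of Cohen's kappa for two annotators. It is an assumption that this variant of kappa was the one used; the function name cohen_kappa is hypothetical and the labels are toy data invented for illustration, not the 50 sentences of the experiment.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

# Toy labels, invented for illustration only.
annotator_a = ["CN", "CN", "CV", "CV"]
annotator_b = ["CN", "CV", "CV", "CV"]
print(cohen_kappa(annotator_a, annotator_b))  # 0.5
```

Here the annotators agree on 3 of 4 items (0.75) while chance agreement is 0.5, which gives (0.75 - 0.5) / (1 - 0.5) = 0.5.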
Special cases
Zooming in on the frequencies of the special cases in the
CINTIL corpus
label          written   spoken   total
CV ambiguity       423      168     591
coordination        15       13      28
overlap             10        6      16
To what extent do CPs occur in canonical form?
• verb+verb constructions always occur in canonical form.
• verb+noun constructions with a determiner (CN) occur in a
non-canonical order much more often than bare-noun
(CNB) constructions.
label   written   spoken   total   % of occurrences
CN           86       37     123               16.9
CNB           7        2       9                1.6
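The percentage column appears to relate these counts to all occurrences of each label in the corpus (726 for CN and 566 for CNB in the annotation results). A short Python sketch of that arithmetic, under that assumption and with a hypothetical helper name:

```python
# (occurrences counted in this table, all occurrences of the label);
# pairing these two figures is an assumption about how the column was computed.
COUNTS = {"CN": (123, 726), "CNB": (9, 566)}

def share(part, whole):
    """Percentage of `part` in `whole`, rounded to one decimal."""
    return round(100 * part / whole, 1)

for label, (part, whole) in COUNTS.items():
    print(label, share(part, whole))
```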
Conclusion
• We presented the annotation process of complex
predicates in the CINTIL corpus.
• We showed a broad statistical analysis of the results
and zoomed in on some research questions.
• In total, almost 2000 examples were annotated in the
corpus.
• This resource will be used for further investigation.
Future Work
• Further analyze the results of the verb+verb types of CPs.
• The large number of ambiguous cases, and the few contexts
that give us definite clues for categorizing a sequence
as a CP, challenge our concept of complex predicates.
• As to the verb+noun constructions, we want to examine
the contexts with and without determiner to see if the
same CP can occur in both structures.
Future Work (2)
• To look at a broader list of first verbs:
for example, certain contexts of psych-nouns like sentir
medo 'feel fear', experienciar uma profunda emoção
'experience a deep emotion', where the predicative
nature of the verb is unclear.
• To enlarge our description and annotation of CPs to
include idiomatic expressions with light verbs.