Participant Tracking in Text Unfolding:
Insights from Portuguese-Chinese
Translation and Post-Editing Task Logs
AuTema-PostEd Group
Ana Luísa V. Leal1, Márcia Schmaltz1, Derek Wong1, Lidia Chao1,
James TL Wang1, Adriana Pagano2, Fabio Alves2,
Igor da Silva3, Paulo Quaresma4
University of Macau (UM)1
Federal University of Minas Gerais (UFMG)2
Federal University of Uberlândia (UFU)3
University of Evora (UE)4
Outline
• Research Context and Aims
• Research Assumptions and Questions
• Experimental Design
• Methodology of Analysis
• Preliminary Results
• Discussion
• Next Steps
Research Context and Aims
Joint project between the Department of Portuguese, University of
Macau, and LETRA – Laboratory for Experimentation in Translation,
Federal University of Minas Gerais, to investigate the process and
final output of human translation as compared with human post-editing
of texts machine-translated by the Portuguese-Chinese Translation System (PCT).
Research Assumptions and Questions
• Identity cohesive chains responsible for co-reference and participant
tracking are crucial to construing a coherent representation of a text.
(Halliday & Hasan 1976, Halliday 1989, Halliday & Matthiessen 2004)
Do translation process data show evidence of the role of
identity chains as exerting cognitive demands upon task
executors?
• Translating demands more cognitive effort than post-editing (Carl et al.
2011)
Do data in our study confirm the impact of task type on effort?
Translation Process Research
• Eye-Mind Assumption (Just & Carpenter 1980)
• User Activity Data (Carl & Jakobsen 2009, Carl 2012b)
• (Un)challenged translation (Carl & Dragsted 2012)
• More / less time-consuming
• Looking fewer / more words ahead in the ST
• More / less keyboard activity (insertions, deletions)
• Long pauses, regressive saccades and refixation on words already
read (Carl & Jakobsen 2009)
• Integrating quantitative and qualitative analysis to obtain a full picture
of the translation process (Alves & Gonçalves 2013, O’Brien 2006)
Experimental Design
• Materials
• Translog-II (Carl 2012a)
• Eye tracker Tobii T120
• Tobii Studio 3.2.1
• Participants
• 12 translators, L1 Chinese, 23-32 years old, BA Portuguese
Studies, < 1 year experience, glasses or contact lenses
• Setting and conditions
• Eye-tracking lab at University of Macau
• No time pressure
Experimental Design
• Input
• 4 texts: ca. 80 words/word-equivalents – news reports
• 2 Chinese source texts
• 2 Portuguese source texts [Focus on Text 2]
• MT output provided by PCT (Portuguese-Chinese Translator)
(Wong et al 2012)
• Randomized for both subjects and tasks
• Tasks
• 1 L1 translation
• 1 L2 translation
• 1 post-editing into L1
• 1 post-editing into L2
• Recall Protocols
Input
ST Identity Cohesive Chains
Os brasileiros estão em lua-de-mel com o mundo.
The Brazilians are in honeymoon with the world.
A Petrobras e sua presidente estão entre as maiores do mundo.
Petrobras and their/its president are among the greatest of the world.
Conforme uma revista americana, a presidente é uma das 100 maiores lideranças mundiais e a petrolífera estatal é uma das 10 maiores companhias do planeta.
According to an American magazine, the president is one of the 100 greatest world leaders, and the state-owned oil company is one of the 10 greatest companies in the planet.
A revista classifica as empresas com base em diversos indicadores além do lucro.
The magazine classifies the companies building on several indicators besides profit.
É por isso que a brasileira surge em posição de liderança na classificação.
That’s why the Brazilian [company/president] emerges in leadership position in the classification.
[Ellipsis] Está inclusive à frente de grandes como a Apple.
[The company/Brazil/The president] are even beating big [ones/companies/countries/presidents], such as Apple.
Methodology of Analysis
• User Activity Data
• Fixation units (FU) ≤ 400 ms, production units (PU) ≤ 1000 ms
• Source Tokens (ST)
• Target Tokens (TT)
Extracted: gaze time, number of fixations, insertions, deletions,
edits, first pass fixation time (FD), STid, TTid.
• Statistical studies
• Comprehension of ST, TT and Production
• Subset studies with the 6 identity references (Principal,
Secondary, Others)
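The gaze measures listed above can be derived from a chronological fixation log. The following is a minimal sketch, not the Translog-II or Tobii Studio export logic: it assumes each fixation has already been mapped to a token (AOI) and is given as a hypothetical `(aoi_id, duration_ms)` pair, and it takes first pass fixation time to be the summed duration of the consecutive fixations on an AOI from first entry until the gaze first leaves it. The function name, the input format and the sample values are illustrative assumptions.

```python
from collections import defaultdict


def gaze_measures(fixations):
    """Aggregate per-AOI gaze measures from a chronological fixation list.

    fixations: list of (aoi_id, duration_ms) tuples in order of occurrence
               (hypothetical input format, assumed for this sketch).
    Returns {aoi_id: {"total_fixation_time", "fixation_count", "first_pass_time"}}.
    """
    measures = defaultdict(lambda: {"total_fixation_time": 0,
                                    "fixation_count": 0,
                                    "first_pass_time": 0})
    first_pass_closed = set()  # AOIs whose first pass has already ended
    previous_aoi = None
    for aoi, duration in fixations:
        # Moving to a different AOI ends the first pass of the one just left.
        if previous_aoi is not None and aoi != previous_aoi:
            first_pass_closed.add(previous_aoi)
        m = measures[aoi]
        m["total_fixation_time"] += duration
        m["fixation_count"] += 1
        if aoi not in first_pass_closed:
            m["first_pass_time"] += duration
        previous_aoi = aoi
    return dict(measures)


# Invented sample: two fixations on "brasileira", one on "surge",
# then a regression back to "brasileira".
sample = [("brasileira", 210), ("brasileira", 180),
          ("surge", 150), ("brasileira", 300)]
result = gaze_measures(sample)
```

In this sample the regression back to "brasileira" adds to its total fixation time and fixation count but not to its first pass time, which is the distinction the three dependent measures are meant to capture.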
Methodology of Analysis
• Quantitative
• Linear Mixed-Effects Regression Model (LMER)
fitted with lmerTest in R (3.0.2) (Baayen
2008, Balling 2008, 2013, Sjorup 2012)
• Significance threshold: p < 0.05
• Tests: Student’s t-test; Mann-Whitney U
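For readers unfamiliar with the Mann-Whitney U test named above, the sketch below shows how the statistic is obtained from pooled ranks. It is a plain standard-library illustration, not the R procedure used in the study: the two-sided p-value uses a normal approximation without tie or continuity correction, which is only rough for small samples, and the two sample lists are invented values standing in for a per-task gaze measure.

```python
import math


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent samples.

    Ranks are averaged over ties; p comes from a normal approximation
    without tie or continuity correction (an assumption of this sketch).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p


# Invented total fixation times (ms) for two tasks, for illustration only.
translation = [5200, 6100, 4800, 7000]
post_editing = [3900, 4500, 5000, 4100]
u_stat, p_value = mann_whitney_u(translation, post_editing)
```

A rank-based test of this kind makes no normality assumption, which is one reason to report it alongside the regression models when gaze data are skewed.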
Methodology of Analysis
Variables              Comprehension                             Production
Dependent              Total Fixation Time                       Production Time
                       Total Fixation Number
                       First Pass Fixation Time
Independent / Random   AOIs, Participant                         AOIs, Participant
Independent / Fixed    Length                                    Character count
                       Position                                  Position
                       Frequency (corpus Banco de Português)     Rendition (Yes or No)
                       Trigram Probability                       Trigram Probability
                       Task (Translation or Post-editing)        Task (Translation or Post-editing)
                       Frequency (corpus CCL/UPK)
                       Type (Reference or Comparative)           Type (Reference or Comparative)
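The variables listed above suggest a comprehension model of roughly the following shape. This is a sketch under assumptions, not the authors' exact specification: it assumes crossed random intercepts only for participant p and AOI a, an untransformed dependent measure y (for example Total Fixation Time), and a single frequency predictor. The symbols are introduced here for illustration.

```latex
y_{pa} = \beta_0
       + \beta_1\,\mathrm{Length}_a
       + \beta_2\,\mathrm{Position}_a
       + \beta_3\,\mathrm{Frequency}_a
       + \beta_4\,\mathrm{Trigram}_a
       + \beta_5\,\mathrm{Task}_{pa}
       + \beta_6\,\mathrm{Type}_a
       + u_p + w_a + \varepsilon_{pa},
\qquad
u_p \sim \mathcal{N}(0, \sigma_u^2),\quad
w_a \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{pa} \sim \mathcal{N}(0, \sigma^2)
```

Under the same assumptions, the production model would replace Length and Frequency with Character count and Rendition and take Production Time as y.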
Preliminary Results
Source Text Comprehension
Gazing and Fixation (p-values)
Variables      Total Fixation Time    Total Fixation Number    First Pass Fixation Time
(Intercept)    <2e-16                 3.02e-10                 <2e-16
Length         <2e-16                 <2e-16                   0.000478
Frequency      0.0247 and 0.018131 (values given for two of the three measures)
Target Text Comprehension
Gazing and Fixation (p-values)
Variables      Total Fixation Time    Total Fixation Number    First Pass Fixation Time
(Intercept)    <2e-16                 4.81e-08                 <2e-16
Length         6.90e-06               6.90e-06                 8.6e-10
Position       5.62e-09 and 5.62e-09 (values given for two of the three measures)
TypeRef        0.0171 (value given for one of the three measures)
Target Text Production
Keylogging (p-values)
Variables          Production Time
(Intercept)        <2e-16
Character count    2.67e-13
Renditions         0.000586
ST Identity Chains
Gazing and Fixation (p-values)
Variables      Total Fixation Time    Total Fixation Number    First Pass Fixation Time
(Intercept)    <2e-16                 1.99e-09                 <2e-16
Length         0.000319               4.58e-05                 0.000823
Frequency      0.001553 and 0.00483 (values given for two of the three measures)
TT Identity Chains
Gazing and Fixation (p-values)
Variables      Total Fixation Time    Total Fixation Number    First Pass Fixation Time
(Intercept)    <2e-16                 2.99e-11                 0.000245
Length         3.16e-05               1.23e-08                 0.002858
Position       8.20e-10 and 4.27e-10 (values given for two of the three measures)
Frequency      7.35e-05 (value given for one of the three measures)
Production Identity Chains
Keylogging (p-values)
Variables          Production Time
(Intercept)        <2e-16
Character count    1.72e-08
Renditions         0.01672
Position           0.00498
Non-Parametric Tests
• ST Reading:
• Significant differences between reference types for total number of fixations (TNF; 0.27)
and total fixation time (TFT; 0.46), and between subjects for TNF, TFT and first pass fixation time (FIRST)
• No significant differences between tasks
• TT Reading:
• Significant differences between subjects for TNF, TFT and FIRST
• Significant differences between tasks for TNF (0.13) and TFT
(0.025)
• No significant differences between reference types
Progression Graph P23
• Some translators take the wrong path, either by accepting the MT output or because of
their comprehension of the ST.
[Figure: translation progression graph for P23; plotted characters 西, 巴, 裁, 统, 及, 以; axis ticks 10 to 50 and 133000 to 143000]
Progression Graph P23
• Rereading and ... “click”!
[Figure: translation progression graph for P23; plotted characters 统, 总, 西, 其, 巴, 裁; axis ticks 10 to 80 and 579500 to 594500]
Discussion
Research question: Do translation process data show
evidence of the role of identity chains as exerting cognitive
demands upon task executors?
• They do for the source text, but only in the non-parametric tests.
The multiple regression models show no significant results for the
set of variables used in the experiment.
• Cohesive chains seem to have an impact on comprehension
when reading the ST – reading for translation involves
anticipating how chains will have to be dealt with in the TT
(whether explicitation will be needed in the TT)
Discussion
Research question: Do data in our study confirm impact of task type
on effort?
• No. However, four subjects built identity chains different from those in
the ST.
• Their inferential path differed from that of the other subjects.
• The results of the non-parametric tests shed light on relevant aspects of
inferential processing in post-editing and translation:
• Translation is TT-driven, but results from a comprehension process
• Tasks differ with respect to target text production
• Post-editing demands less cognitive effort for lexical rendition
(as confirmed by RVP, i.e. fewer character insertions), but
requires more effort to reorganize structures (as shown in RVP)
Next Steps
• Fine-grained, qualitative analysis of user activity data (UAD) and
translation progression graphs
• Analysis of the impact of the first task on the second task (Ferreira
2010)
• Further studies collecting data from native speakers of Portuguese to
contrast reading effort in cohesive chains.
• Analysing all four texts used in the experiment should yield
more robust results.
References
ALVES, F., GONÇALVES, J. 2013. “Investigating the Conceptual-procedural Distinction in the Translation Process”. Target 25:1. p. 107-124.
BALLING, L., CARL, M. (to appear). “Production Time Across Languages and Tasks: A Large-scale Analysis Using the CRITT Translation Process Database”. In: Schwieter, J., Ferreira, A. (eds.) The Development of Translation Competence: Theories and Methodologies from Psycholinguistics and Cognitive Science. Cambridge: Cambridge Scholars Publishing.
CARL, M. 2012. “Translog-II: a Program for Recording User Activity Data for Empirical Reading and Writing Research”. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation. Istanbul, 21-27 May 2012. Istanbul: European Language Resources Association. p. 4108-4112.
CARL, M., DRAGSTED, B., ELMING, J., HARDT, D. & JAKOBSEN, A. L. 2011. The Process of Post-Editing: a Pilot Study. In: B. Sharp, M. Zock, M. Carl, A. L. Jakobsen (orgs.). Proceedings of the 8th Natural Language Processing and Cognitive Science Workshop. Copenhagen Studies in Language Series 41. p. 131-142.
HALLIDAY, M. A.K., HASAN, R. 1976. Cohesion in English. London: Longman.
HALLIDAY, M. A.K. 1989. Spoken and Written Language. Oxford: Oxford University Press.
HALLIDAY, M. A.K., MATTHIESSEN, C. 2004. An Introduction to Functional Grammar. London: Arnold.
O'BRIEN, S. 2002. Teaching post-editing: A proposal for course content. In: Teaching Machine Translation - the 6th International Workshop of
the European Association of Machine Translation, Centre for Computational Linguistics, UMIST: Manchester. p. 99-106.
O’BRIEN, S. 2004. Machine translatability and post-editing effort: how do they relate? In: Proceedings of Translating and the Computer 26. London: Aslib.
O’BRIEN, S. 2006. Controlled Language and Post-Editing. The Guide From Multilingual. p. 17-19.
O’BRIEN, S. & ALMEIDA, G. 2010. Analysing post-editing performance: correlations with years of translation experience. In: Proceedings of EAMT 2010: the European Association for Machine Translation, St Raphael, France.
PAVLOVIĆ, N. & JENSEN, K. T. H. 2009. Eye tracking translation directionality. In A. Pym and A. Perekrestenko (eds). Translation Research
Projects 2. Tarragona: Universitat Rovira i Virgili. p.101-119.
SJORUP, A. 2013. Cognitive Effort in Metaphor Translation. PhD Thesis. Copenhagen: Copenhagen Business School.
WONG, D., OLIVEIRA, F., LI, YP. 2012. Hybrid Machine Aided Translation System based on Constraint Synchronous Grammar and
Translation Corresponding Tree. Journal of Computers, 7(2): p. 309-316.
Thank you! Obrigada! 谢谢!