Post-édition de TA neuronale à la DGT et qualité des textes finaux : étude de cas

Dublin Core

Description

Neural Machine Translation Post-Editing in DGT and Final Text Quality: A Case Study
This article presents the results of a case study carried out in collaboration with the European Commission's Directorate-General for Translation (DGT). The study analyses the quality of content post-edited from Neural Machine Translation (NMT) proposals (eTranslation NMT engine) by translators with varied levels of translation experience. Two types of participants were recruited: "Blue Book" interns (i.e. recently graduated translators taking part in a five-month paid internship at DGT) and in-house translators. For this analysis, we used an evaluation grid created by the French researchers Toudic et al. (2014), which contains nine error categories as well as four types of effects that guide raters when they attribute severity penalties to errors. The reliability of this tool was verified through an interrater agreement score: 583 revision marks were compared by two investigators in terms of 1) severity penalty, 2) category and 3) raw MT responsibility. As for methodology, for each source text, an NMT proposal from the eTranslation engine was post-edited by a DGT translator (10 participants: 7 in-house translators and 3 "Blue Book" interns) and revised by a DGT colleague. This procedure follows the typical DGT workflow: texts are usually first translated by a translator, then systematically revised by a colleague from the same (or, occasionally, a different) translation unit. The evaluation of post-edited (PE) text quality was thus carried out through the revision marks introduced in the PE texts. Each revision mark was categorised and assigned a penalty score ranging from 1 (minor) to 5 (critical), according to the perceived distortion of the original message and of the intention that the source text is supposed to convey.
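The interrater comparison described above checks whether two investigators annotate the same revision marks the same way along three dimensions. A minimal sketch of such a check, computing raw percent agreement per dimension, might look as follows; the field names, category labels and toy data are illustrative assumptions, not the study's actual annotations or its agreement metric:

```python
# Hedged sketch of an interrater agreement check over revision marks.
# Data and field names are invented for illustration.

def agreement(rater_a, rater_b, key):
    """Raw percent agreement between two raters on one annotation dimension."""
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a[key] == b[key])
    return matches / len(rater_a)

# Toy data: each dict is one revision mark as annotated by one rater,
# with a severity penalty (1-5), an error category, and a flag for
# whether the raw MT output caused the error.
rater_a = [
    {"severity": 2, "category": "terminology", "mt_caused": True},
    {"severity": 5, "category": "fidelity", "mt_caused": True},
    {"severity": 1, "category": "style", "mt_caused": False},
]
rater_b = [
    {"severity": 2, "category": "terminology", "mt_caused": True},
    {"severity": 4, "category": "fidelity", "mt_caused": True},
    {"severity": 1, "category": "style", "mt_caused": False},
]

for dim in ("severity", "category", "mt_caused"):
    print(dim, round(agreement(rater_a, rater_b, dim), 2))
```

In practice a chance-corrected statistic such as Cohen's kappa would likely be preferred over raw agreement, but the abstract does not specify which score was used.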
Severity penalties were then normalised on a 100-word basis so that results would be comparable across participants and texts: a total penalty score was computed for each text and then divided accordingly to yield a per-100-word penalty score. These normalised scores enabled us to compare the perceived quality of the texts provided by our participants. Although our results cannot be generalised, since this is a case study for which no significance score could be computed (not enough data), several conclusions were reached: overall PE text quality is higher for participants with high experience levels (senior translators) than for junior translators; participants with lower experience levels produce PE texts containing more fidelity and terminology problems than their more experienced counterparts; and professional experience does not seem to influence the proportion of errors directly caused by NMT proposals. Several organisational constraints limited the scope of our study. First, the modest number of participants did not yield statistically significant results; a larger study with more volunteers could therefore be carried out to reach more generalisable results. Secondly, each participant provided an uneven number of texts and post-edited words. This is due to the very nature of our study, in which translators supplied texts from their daily translation tasks, which limits the quantity of collected data but increases ecological validity. Furthermore, the authentic context in which this study took place did not allow us to collect process data; further studies could include such data, which would yield more representative results and provide insight into translators' cognitive processes when post-editing.
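The normalisation step above is simple arithmetic: the sum of a text's severity penalties is scaled by its word count to a per-100-word score. A minimal sketch, with invented penalty values and word counts (the study's actual figures are not given in the abstract):

```python
# Hedged sketch of the 100-word penalty normalisation described above.
# Penalty values and word counts are invented for illustration.

def normalised_penalty(penalties, word_count):
    """Total severity penalty scaled to a 100-word basis."""
    return sum(penalties) / word_count * 100

# A hypothetical 250-word post-edited text with four marked errors
# (severity 1 = minor ... 5 = critical):
score = normalised_penalty([1, 2, 5, 3], word_count=250)
print(score)  # (1+2+5+3)/250 * 100 = 4.4
```

This scaling is what makes texts of different lengths, and participants with different output volumes, directly comparable.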
In this context, eye-tracking data could be collected, and methods such as questionnaires and think-aloud protocols could be implemented to link process data to the quality scores obtained in our study. Finally, studying additional language pairs would be relevant, since NMT quality tends to vary across them.
