This is an interesting webinar, held by GALA (the Globalization and Localization Association), on metrics that can be used for evaluating the quality of a translation. The presenters took into consideration only evaluation metrics on which information is publicly available.
I will briefly summarize the evaluation metrics from the webinar; if you are interested in more details, you can watch the 58-minute recording.
First, the manual evaluation metrics and tools for human translation:
- the webinar presenters researched three such models (SAE J2450, the LISA QA Model version 3.1, and the SDL TMS Classic Model)
- they are either error-rate models (based on counting errors and deducting points, so a "perfect" translation starts from 100%) or rubric models (additive in nature: points are added when the translation meets certain quality requirements; these are relatively uncommon) — a minimal sketch of the error-rate calculation follows this list
- according to the presenters, all the quality assessment metrics presented in the webinar rely on bilingual assessors, even though many of the errors they cover are general language errors that do not necessarily require a bilingual reviser; their major issues are validity, reliability, and the subjectivity of the assessor
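To make the error-rate idea concrete, here is a minimal sketch of how such a score can be computed. The error categories, severity weights, and per-1000-words normalization below are my own assumptions for illustration; actual metrics such as SAE J2450 or the LISA QA Model define their own categories, weights, and thresholds.

```python
# Minimal sketch of an error-rate quality score (illustration only).
# Severity weights and categories are hypothetical, not taken from any
# specific metric.

SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def error_rate_score(errors, word_count, words_per_unit=1000):
    """Return a percentage score: start from 100% and deduct weighted
    error points, normalized to a fixed volume of text (per 1000 words)."""
    penalty = sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)
    normalized_penalty = penalty * words_per_unit / word_count
    return max(0.0, 100.0 - normalized_penalty)

# Example: a 2000-word translation with two minor errors and one major error.
errors = [("terminology", "minor"), ("punctuation", "minor"), ("mistranslation", "major")]
print(f"{error_rate_score(errors, word_count=2000):.1f}%")  # 96.5%
```

A rubric model would work the other way around, adding points for each quality requirement the translation meets rather than deducting them for errors.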
Automatic tools for evaluating human translation:
- Acrocheck
- ApSIC XBench
- CheckMate
- LanguageTool
- QA Distiller
- XLIFF:doc
Manual (human) evaluation metrics for MT (machine translation) output:
- two models are presented: the adequacy and fluency model (human judges rate each segment for how much of the source meaning is preserved and how natural the target text reads) and the utility- and task-based evaluation model; a minimal sketch of adequacy/fluency scoring follows
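As a minimal illustration of adequacy/fluency scoring, assuming 1-5 rating scales and made-up judgments (the webinar does not provide sample data), the ratings from several judges are simply averaged per segment:

```python
# Minimal sketch of adequacy/fluency scoring (illustration only).
# The 1-5 scales and the sample judgments are assumptions.
from statistics import mean

# Each judge rates a segment for adequacy (meaning preserved) and
# fluency (naturalness of the target text) on a 1-5 scale.
judgments = [
    {"adequacy": 4, "fluency": 3},
    {"adequacy": 5, "fluency": 4},
    {"adequacy": 4, "fluency": 4},
]

avg_adequacy = mean(j["adequacy"] for j in judgments)
avg_fluency = mean(j["fluency"] for j in judgments)
print(f"adequacy: {avg_adequacy:.2f}, fluency: {avg_fluency:.2f}")
# adequacy: 4.33, fluency: 3.67
```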
Automatic evaluation metrics for MT output:
- BLEU, NIST, METEOR, and error rate-based evaluation (counting the number of changes that need to be made so that the MT output matches the human reference translation exactly); the edit-distance idea is sketched below
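To illustrate the error rate-based approach, here is a minimal sketch (my own illustration, not code from the webinar) of word-level edit distance: the number of substitutions, insertions, and deletions needed to turn the MT output into the human reference translation.

```python
# Minimal sketch of word-level edit distance, the idea behind
# error rate-based MT metrics (illustration only).

def word_edit_distance(hypothesis: str, reference: str) -> int:
    """Count substitutions, insertions and deletions (Levenshtein
    distance over words) needed to turn `hypothesis` into `reference`."""
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edits to turn the first i hypothesis words
    # into the first j reference words.
    dp = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        dp[i][0] = i
    for j in range(len(ref) + 1):
        dp[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[-1][-1]

mt_output = "the cat sat on mat"
human_reference = "the cat sat on the mat"
edits = word_edit_distance(mt_output, human_reference)
print(edits / len(human_reference.split()))  # 1 edit / 6 words ≈ 0.17
```

BLEU, NIST, and METEOR instead compare n-gram overlap between the MT output and one or more reference translations; for BLEU, off-the-shelf implementations exist (for example, nltk.translate.bleu_score.sentence_bleu in NLTK).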
Multidimensional Quality Metrics (MQM)
- an open-source framework that recommends a metric (or metrics) based on several criteria (language, client, audience, field)
- the MQM framework can be found at: www.translate5.net