
GTCOM Wins Five Championships at the 2020 Conference on Machine Translation, Ranking First in the World

News Source: GTCOM | Date: 6 January 2021

The 2020 Conference on Machine Translation (WMT 2020) announced the expert evaluation results (final results) for 38 participating teams from around the world, including Google DeepMind, Microsoft, Facebook, Tencent, OPPO, ByteDance and GTCOM. GTCOM took part in the evaluation of six language pairs and ultimately earned five first-place rankings and one second-place finish, taking the top spot in the number of championships.



The Conference on Machine Translation (WMT), hosted by the Association for Computational Linguistics (ACL), is acknowledged by the global academic community as one of the leading international competitions in machine translation. The event was first held in 2006 and has taken place every year since. Its mission is to evaluate the latest developments in machine translation, disseminate public training and test data, and improve the methodology by which machine translation is evaluated.


For every language pair, the competition provides a parallel corpus as the training data for the participating teams' machine-translation engines. Automatic evaluation is performed first, followed by expert evaluation, whose results are taken as the final outcome of the competition. Experts score translations on a 100-point scale, and these raw scores are standardized and weighted to produce the "Ave. z" manual-evaluation scores and final rankings. Four online machine-translation systems (Online-A, Online-B, Online-G and Online-Z) are also included for comparison. The rigorous, scientific methodology gives the results high credibility.
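As a rough illustration of how such standardized scores can be computed, the sketch below (hypothetical data and simplified logic; the official WMT pipeline is more elaborate) standardizes each annotator's raw 0-100 scores to remove individual scoring bias and then averages the standardized scores per system:

```python
# Minimal sketch of z-standardized human evaluation scoring, in the spirit of WMT's "Ave. z".
# All data and names here are illustrative assumptions, not the official WMT code.
from collections import defaultdict
from statistics import mean, pstdev

# (annotator, system, raw score on a 0-100 scale) -- illustrative values only
ratings = [
    ("a1", "sysA", 85), ("a1", "sysB", 70),
    ("a2", "sysA", 60), ("a2", "sysB", 40),
]

# Per-annotator mean and standard deviation, used to standardize that annotator's scores.
by_annotator = defaultdict(list)
for ann, _, score in ratings:
    by_annotator[ann].append(score)
stats = {ann: (mean(s), pstdev(s) or 1.0) for ann, s in by_annotator.items()}

# Average the standardized scores per system to obtain an "Ave. z"-style ranking.
z_by_system = defaultdict(list)
for ann, system, score in ratings:
    mu, sigma = stats[ann]
    z_by_system[system].append((score - mu) / sigma)

for system, zs in sorted(z_by_system.items(), key=lambda kv: -mean(kv[1])):
    print(f"{system}: Ave. z = {mean(zs):.3f}")
```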


GTCOM's Evaluation Results from the 2020 Conference on Machine Translation

The competition released machine-translation evaluation tasks for 22 translation directions across 11 language pairs: Chinese-English, Czech-English, French-German, German-English, Inuktitut-English, Tamil-English, Japanese-English, Pashto-English, Polish-English, Russian-English and Khmer-English. With its outstanding performance, GTCOM won the championship in five of the six directions in which it took part, namely Khmer-English, English-Khmer, Pashto-English, English-Pashto and Tamil-English, and finished second in English-Tamil, again demonstrating its leading strength in machine translation.


GTCOM's achievements in machine translation are as varied as they are impressive. At the 2019 Conference on Machine Translation, which drew more than 60 formidable teams from around the world, GTCOM, Microsoft Research Asia (MSRA) and Facebook each won three language pairs, ranking first in the number of championships. At the 2018 Conference on Machine Translation, GTCOM participated in the automatic evaluation of the English-Chinese pair and won first place. Additionally, GTCOM won 16 first-place rankings in translation-evaluation tasks for 20 language pairs at the 2017 International Workshop on Spoken Language Translation (IWSLT 2017).

GTCOM's outstanding achievements are the fruit of constant perseverance. The company is among China's earliest enterprises to carry out research on machine translation. Over the years, GTCOM has achieved innovations in field- and language-specific machine-translation models, which support translation in vertical fields such as finance, technology and medicine while covering 60 languages and 3,000 language pairs. Its massive, high-quality corpus and data-cleansing technology underpin its advantages in machine translation: GTCOM has accumulated more than 5.2 billion sentence pairs of high-quality parallel corpus in professional fields. For data cleansing, GTCOM uses language models and translation models to score and filter the corpus, and has designed a cyclic corpus-generation and cleansing method that continuously improves the quality of both the translations and the corpus. For training, it employs pre-trained model tuning, knowledge distillation, multi-model ensemble decoding, joint training, translation reordering and other techniques to enhance training quality.
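To make the dual-model cleansing idea concrete, here is a minimal sketch of filtering a parallel corpus with two scores, one for target-side fluency and one for source-target adequacy. The scoring functions are trivial stand-ins for real language and translation models, and all names and thresholds are illustrative assumptions rather than GTCOM's actual pipeline:

```python
# Minimal sketch of dual-model parallel-corpus filtering.
# fluency_score and adequacy_score are toy proxies standing in for a real
# language model and translation model; only the filtering flow matters here.
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def fluency_score(sentence: str) -> float:
    """Stand-in for a language-model fluency score (higher is better)."""
    words = sentence.split()
    return min(len(words) / 10.0, 1.0)  # toy proxy: prefer non-trivial length

def adequacy_score(src: str, tgt: str) -> float:
    """Stand-in for a translation-model adequacy score (higher is better)."""
    ratio = len(tgt.split()) / max(len(src.split()), 1)
    return 1.0 / (1.0 + abs(1.0 - ratio))  # toy proxy: prefer balanced lengths

def clean_corpus(pairs: Iterable[Pair],
                 min_fluency: float = 0.3,
                 min_adequacy: float = 0.5) -> List[Pair]:
    """Keep only sentence pairs that pass both thresholds."""
    kept = []
    for src, tgt in pairs:
        if fluency_score(tgt) >= min_fluency and adequacy_score(src, tgt) >= min_adequacy:
            kept.append((src, tgt))
    return kept

if __name__ == "__main__":
    corpus = [
        ("the market rallied today", "le marché a progressé aujourd'hui"),
        ("hello", "une phrase beaucoup trop longue pour être un alignement correct"),
    ]
    print(clean_corpus(corpus))  # the badly aligned second pair is filtered out
```

In a cyclic setup of the kind described above, the cleaned corpus would then be used to retrain the scoring models, which in turn re-filter the corpus in the next round.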


Demonstration of GTCOM's machine-translation hardware products

GTCOM has built a complete machine-translation application ecosystem. Its application scope covers multimodal data such as text, webpages, audio and video; its products include folder translation, document translation, a Word translation plug-in, YeeCaption, webpage translation and translation-assistance platforms. Application scenarios have expanded to simultaneous interpretation at international conferences, remote office work and teaching, translation-project management, cross-language semantic retrieval and multilingual data processing. The application modes have also become more diverse: in addition to mature software and hardware products, GTCOM supports customization of industry-specific machine-translation models, private-cloud and on-premises deployment, integrated hardware-software appliances, and cloud API calls.

