Microsoft and Johns Hopkins Unveil Multilingual AI Translation Model for 50 Languages
Introduction
In an October 4, 2024 paper, researchers from Johns Hopkins University and Microsoft introduced X-ALMA, a large language model (LLM)-based multilingual translation model that delivers “top-tier performance” across 50 languages, regardless of resource availability.
While many multilingual LLMs attempt to support hundreds of languages, they often struggle to maintain quality, especially for mid- and low-resource languages, where “their performance […] falls short of practical application expectations,” the researchers explained. They also emphasized that this leads to “imbalanced performance heavily skewed in favor of high-resource languages.”
Even for high-resource languages, quality tends to decline when models are trained on too many languages — a problem known as the ‘curse of multilinguality’. As the researchers pointed out, in current state-of-the-art massively multilingual models, “overall quality decreases as the number of supported languages increases.”
X-ALMA takes a different approach by focusing on a set of 50 diverse languages, rather than attempting to scale to hundreds. “We prioritize quality over scaling the number of languages, with a focus on multilingual machine translation tasks,” the researchers said.
Building on ALMA-R, previously recognized as “one of top-performing translation models built on LLMs, comparable to WMT winners and GPT-4-turbo,” X-ALMA extends support to an additional 44 languages.
A core innovation of X-ALMA is its ‘plug-and-play’ architecture, which minimizes negative language interference through language-specific modules. These modules are tailored to handle specific groups of languages. They can be activated individually — saving memory and computational power — or combined using a mixture-of-experts approach, allowing the model to adapt flexibly to different linguistic needs while maintaining high translation quality.
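To make the plug-and-play idea concrete, here is a minimal PyTorch sketch of a shared layer wrapped with per-language-group modules that can be activated one at a time or blended with a softmax gate. The class names, adapter design, gating scheme, and language groups are illustrative assumptions for this example, not X-ALMA's actual implementation.

```python
# Minimal sketch of a 'plug-and-play' layer: a shared representation plus
# per-language-group modules, usable one at a time or as a gated mixture.
# All names and design choices here are assumptions, not the paper's code.
import torch
import torch.nn as nn

class LanguageGroupAdapter(nn.Module):
    """Small bottleneck module attached to the shared base model."""
    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual adapter: base hidden states plus a learned correction.
        return hidden + self.up(torch.relu(self.down(hidden)))

class PlugAndPlayLayer(nn.Module):
    """Wraps a transformer layer's output with language-group modules."""
    def __init__(self, d_model: int, groups: list[str]):
        super().__init__()
        self.adapters = nn.ModuleDict({g: LanguageGroupAdapter(d_model) for g in groups})
        self.gate = nn.Linear(d_model, len(groups))  # used only in mixture mode
        self.groups = groups

    def forward(self, hidden: torch.Tensor, group: str | None = None) -> torch.Tensor:
        if group is not None:
            # Plug-and-play mode: run only the selected group's module
            # (in practice, only that module's weights would be loaded).
            return self.adapters[group](hidden)
        # Mixture-of-experts mode: softmax-gated blend of all group modules.
        weights = torch.softmax(self.gate(hidden.mean(dim=1)), dim=-1)      # (batch, n_groups)
        outputs = torch.stack([self.adapters[g](hidden) for g in self.groups], dim=-1)
        return (outputs * weights[:, None, None, :]).sum(dim=-1)

layer = PlugAndPlayLayer(d_model=512, groups=["germanic", "slavic", "cjk"])
h = torch.randn(2, 10, 512)           # (batch, seq_len, d_model)
single = layer(h, group="slavic")     # one module active
mixed = layer(h)                      # all modules blended
```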
To ensure top-tier performance, X-ALMA underwent a rigorous training process consisting of three pre-training stages and two post-training stages.
In the pre-training stages, the base model is trained on monolingual data, and language-specific modules are fine-tuned on their respective languages, ensuring they handle both high- and low-resource languages effectively. During the post-training stages, the model is further refined using high-quality translation data, followed by an optimization process called Adaptive-Rejection Preference Optimization (ARPO).
ARPO is an optimization method designed to tackle the ‘over-rejection’ issue common in traditional machine translation models. The researchers describe this as “a phenomenon where the writing style of the translation outputs is forced away from the preferred data distribution”. In simple terms, when two translations are very similar, traditional models tend to reject both options, even when one is clearly better. ARPO adjusts the rejection strength based on how close the translations are, ensuring that the model generates translations closer to the preferred outputs.
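As a rough illustration of that adaptive-rejection idea — not the paper's exact objective — the sketch below starts from a DPO-style preference loss and scales the penalty on the rejected translation by how dissimilar the chosen and rejected outputs are, so near-identical pairs are barely pushed apart. The similarity signal and the weighting scheme are assumptions made for this example.

```python
# Illustrative adaptive-rejection preference loss (assumed formulation, not
# the ARPO objective from the paper): the push-away term on the rejected
# translation is weighted by (1 - similarity), so very similar chosen/rejected
# pairs are only weakly penalized.
import torch
import torch.nn.functional as F

def adaptive_rejection_loss(
    logp_chosen: torch.Tensor,       # policy log-prob of the preferred translation
    logp_rejected: torch.Tensor,     # policy log-prob of the dispreferred translation
    ref_logp_chosen: torch.Tensor,   # frozen reference-model log-probs
    ref_logp_rejected: torch.Tensor,
    similarity: torch.Tensor,        # in [0, 1], e.g. token overlap of the two outputs
    beta: float = 0.1,
) -> torch.Tensor:
    # Standard DPO-style log-ratio terms relative to the reference model.
    chosen_ratio = logp_chosen - ref_logp_chosen
    rejected_ratio = logp_rejected - ref_logp_rejected
    # Adaptive rejection strength: similar pairs get a small rejection weight,
    # so the model is not forced away from a rejected output that is nearly as good.
    rejection_weight = 1.0 - similarity
    margin = beta * (chosen_ratio - rejection_weight * rejected_ratio)
    return -F.logsigmoid(margin).mean()

# Toy usage with made-up log-probabilities and a high similarity score.
loss = adaptive_rejection_loss(
    logp_chosen=torch.tensor([-10.0]),
    logp_rejected=torch.tensor([-10.5]),
    ref_logp_chosen=torch.tensor([-11.0]),
    ref_logp_rejected=torch.tensor([-11.0]),
    similarity=torch.tensor([0.9]),
)
```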
When evaluated on the FLORES-200 and WMT’23 datasets, X-ALMA consistently outperformed other massively multilingual models, including NLLB-3.3B, LLaMAX3-Alpaca-8B, and Aya-101, across all language pairs in both directions (into and from English), as measured by the COMET-22 metric. It also surpassed high-resource language models like Aya-23-8B and Aya-23-35B.
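For readers who want to score their own translations with COMET-22, a small sketch using Unbabel's open-source comet package (installable as unbabel-comet) might look like the following; the example sentences are invented, and the checkpoint name assumes the publicly released wmt22-comet-da model.

```python
# Sketch of segment- and corpus-level COMET-22 scoring with Unbabel's comet
# package; the sentences below are made up for illustration.
from comet import download_model, load_from_checkpoint

model_path = download_model("Unbabel/wmt22-comet-da")  # public COMET-22 checkpoint
comet = load_from_checkpoint(model_path)

data = [
    {
        "src": "Der Bericht wurde gestern veröffentlicht.",  # source sentence
        "mt": "The report was published yesterday.",         # system output
        "ref": "The report was released yesterday.",         # human reference
    }
]
output = comet.predict(data, batch_size=8, gpus=0)  # set gpus=1 if CUDA is available
print(output.system_score)  # corpus-level score
print(output.scores)        # per-segment scores
```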
“We tackled the challenge of achieving high translation quality while scaling to a large number of languages, a limitation seen in many state-of-the-art multilingual models,” the researchers noted.
The researchers have made the code and model checkpoints publicly available, contributing to the broader open-source community. The code is available on GitHub, and the models and datasets can be accessed on Hugging Face.
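As a quick-start sketch, a published checkpoint could be loaded with the standard transformers API as shown below; the repository ID and prompt format are illustrative assumptions, so check the project's GitHub README for the exact checkpoint names and recommended prompting.

```python
# Hedged sketch of loading an X-ALMA checkpoint from Hugging Face; the repo ID
# and prompt template below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/X-ALMA-13B-Group1"  # assumed repository ID for one language-group module
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"  # requires the accelerate package
)

# Assumed ALMA-style translation prompt.
prompt = (
    "Translate this from German to English:\n"
    "German: Der Bericht wurde gestern veröffentlicht.\n"
    "English:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens (the translation).
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```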
Slator is the leading source of news and research for the global translation, localization, and language technology industry. Our Advisory practice is a trusted partner to clients looking for independent analysis. Headquartered in Zurich, Slator has a presence in Asia, Europe, and the US.
Connect With Us
www.slator.com