MONROVIA, Calif., March 19, 2025 /PRNewswire/ -- Nexdata, a leading global provider of AI data services, today announced the start of the Multilingual Conversational Speech LLM (MLC-SLM) Challenge, an officially approved satellite event of Interspeech 2025.

This challenge, hosted by Meta, Google, Samsung, Naver, China Mobile, Northwestern Polytechnical University and Nexdata, aims to advance multilingual conversational speech AI by providing a real-world dataset and encouraging innovation in speech language models.

The challenge consists of two tasks, both of which require participants to explore the development of speech language models (SLMs):

Task I: Multilingual Conversational Speech Recognition

Objective: Develop a multilingual LLM-based ASR model. Participants will be provided with oracle segmentation and speaker labels for each conversation.

Task II: Multilingual Conversational Speech Diarization and Recognition

Objective: Develop a system for both speaker diarization (identifying who is speaking when) and recognition (transcribing speech to text). No prior or oracle information, such as pre-segmented utterances or speaker labels, will be provided during evaluation. Both pipeline-based and end-to-end systems are encouraged, providing flexibility in system design and implementation.
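
To illustrate the difference between the two tasks, the following minimal Python sketch contrasts recognition over given (oracle) segments with a pipeline that must first produce its own segments. Everything in it is hypothetical: the Segment type, the function names, and the toy diarizer and transcriber are stand-ins for illustration, not part of the challenge baseline or any official tooling.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    speaker: str     # speaker label, e.g. "spk1"
    start: float     # segment start, in seconds
    end: float       # segment end, in seconds
    text: str = ""   # transcript, filled in by recognition


def recognize_segments(
    segments: List[Segment],
    transcribe: Callable[[float, float], str],
) -> List[Segment]:
    """Task I style: segment boundaries and speaker labels are given."""
    return [
        Segment(s.speaker, s.start, s.end, transcribe(s.start, s.end))
        for s in segments
    ]


def diarize_then_recognize(
    duration: float,
    diarize: Callable[[float], List[Segment]],
    transcribe: Callable[[float, float], str],
) -> List[Segment]:
    """Task II style pipeline: no oracle input, so diarize first."""
    return recognize_segments(diarize(duration), transcribe)


# Hypothetical stand-ins so the sketch runs without any audio or model.
def toy_diarize(duration: float) -> List[Segment]:
    half = duration / 2
    return [Segment("spk1", 0.0, half), Segment("spk2", half, duration)]


def toy_transcribe(start: float, end: float) -> str:
    return f"<speech {start:.1f}-{end:.1f}s>"


result = diarize_then_recognize(10.0, toy_diarize, toy_transcribe)
for seg in result:
    print(seg.speaker, seg.start, seg.end, seg.text)
```

An end-to-end system would replace the two separate stages with a single model that emits speaker-attributed text directly; the sketch above covers only the pipeline-based option.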

The training set (Train) covers 11 languages: English (en), French (fr), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (jp), Korean (ko), Russian (ru), Thai (th), and Vietnamese (vi). It is designed to provide a rich resource for training and evaluating multilingual conversational speech language models (MLC-SLM), addressing the challenges of linguistic diversity, speaker variability, and contextual understanding.
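
Participants who pass these language tags to other tools may want to normalize them. The sketch below keeps the tags exactly as written in this announcement and maps them to ISO 639-1, where Japanese is "ja" rather than "jp". The dictionary and function names are hypothetical, and whether the released data uses these exact tags is an assumption.

```python
# Language tags exactly as written in the announcement.
CHALLENGE_LANGUAGES = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
    "es": "Spanish",
    "jp": "Japanese",
    "ko": "Korean",
    "ru": "Russian",
    "th": "Thai",
    "vi": "Vietnamese",
}

# Only the Japanese tag differs from its ISO 639-1 code.
ISO_639_1_OVERRIDES = {"jp": "ja"}


def to_iso_639_1(tag: str) -> str:
    """Map an announcement tag to its ISO 639-1 code."""
    tag = tag.lower()
    if tag not in CHALLENGE_LANGUAGES:
        raise KeyError(f"not a challenge language: {tag}")
    return ISO_639_1_OVERRIDES.get(tag, tag)


print(len(CHALLENGE_LANGUAGES))
print(to_iso_639_1("jp"))
```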

Important Dates (AoE, Anywhere on Earth)

March 10, 2025: Registration opens

March 15, 2025: Training data release

March 20, 2025: Development set and baseline system release

May 15, 2025: Evaluation set release and leaderboard opens

May 30, 2025: Leaderboard freezes and paper submission portal opens (CMT system)

June 15, 2025: Paper submission deadline

July 1, 2025: Notification of acceptance

August 18, 2025: Workshop date

We have set a total prize pool of $20,000 for the winners. Based on performance, the top three teams in each task will be awarded:

1st Prize: $5,000

2nd Prize: $3,000

3rd Prize: $2,000

For more details, please check out the challenge website: https://www.nexdata.ai/competition/mlc-slm

Participate here: https://docs.google.com/forms/d/e/1FAIpQLSftZCRQQWvO5NZd-bPo1VT2Xsaieu_ZYCklw6MhW6LqjWnuYQ/viewform?usp=send_form

For inquiries: mlc-slmw@nexdata.ai

Join us in shaping the future of multilingual conversational AI and be part of this groundbreaking challenge!

About Nexdata

Nexdata provides top-notch training data solutions and serves as your reliable partner. With an extensive array of off-the-shelf datasets and flexible data collection and annotation services, our mission revolves around unleashing AI's full potential and expediting the AI industry's growth.