NVIDIA Releases Open-Source Tools for Multilingual Speech AI

The Granary dataset, released Aug. 15, 2025, contains approximately 1 million hours of audio to address data scarcity for many European languages.

Olivia Sharp

The Granary Dataset

NVIDIA on Aug. 15, 2025, released a major open-source dataset and two new models for multilingual speech AI. The release aims to address data scarcity for many European languages and to accelerate the development of more inclusive speech technologies. Its centerpiece is Granary, a dataset containing approximately 1 million hours of audio.

The dataset includes nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation. It was developed in collaboration with researchers from Carnegie Mellon University and Fondazione Bruno Kessler. The dataset and its associated models are now available …
