NVIDIA Launches Granary Dataset to Enhance Multilingual Speech AI




Jessie A Ellis
Aug 15, 2025 09:01

NVIDIA introduces the Granary dataset and models designed to improve speech recognition and translation across 25 European languages, addressing data scarcity in AI language models.





NVIDIA has unveiled a new open dataset and models aimed at advancing multilingual speech AI, addressing the limited language support in existing AI language models. The Granary dataset, alongside the NVIDIA Canary and Parakeet models, seeks to enhance speech recognition and translation capabilities for 25 European languages, including underrepresented ones such as Croatian, Estonian, and Maltese, according to NVIDIA’s blog.

Granary Dataset: A New Resource for AI Developers

The Granary dataset is a comprehensive collection of multilingual speech datasets, encompassing approximately a million hours of audio. This includes nearly 650,000 hours dedicated to speech recognition and over 350,000 hours for speech translation. The dataset is accessible on Hugging Face, providing a valuable resource for developers to scale AI applications globally, facilitating the creation of multilingual chatbots, customer service voice agents, and real-time translation services.

Developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the Granary dataset utilizes NVIDIA’s NeMo Speech Data Processor toolkit to transform unlabeled audio into structured, high-quality data. This innovative processing pipeline allows for enhanced public speech data without the need for extensive human annotation, making it a critical resource for AI training in the European Union’s official languages, plus Russian and Ukrainian.

Introducing NVIDIA Canary and Parakeet Models

The NVIDIA Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models, trained on the Granary dataset, offer powerful tools for transcription and translation. Canary-1b-v2, a billion-parameter model, supports high-quality transcription of European languages and translation between English and 24 other languages. Meanwhile, Parakeet-tdt-0.6b-v3, with 600 million parameters, is optimized for real-time or large-volume transcription tasks.

Both models are designed to provide accurate punctuation, capitalization, and word-level timestamps in their outputs. Canary-1b-v2 is particularly notable for its efficiency, offering transcription and translation quality comparable to models three times its size, while running inference up to ten times faster.

Advancing Speech AI Innovation

By sharing the methodology behind Granary and its associated models, NVIDIA is empowering the global speech AI developer community to adapt similar data processing workflows to other automatic speech recognition (ASR) or automatic speech translation (AST) models, thereby accelerating innovation in the field. The models and dataset are publicly available under a permissive license, encouraging widespread use and adaptation.

The Granary dataset and NVIDIA’s new models represent a significant step forward in addressing the challenges of data scarcity in speech AI, particularly for languages that have been historically underrepresented in AI language models. This initiative not only broadens the scope of multilingual speech recognition and translation but also enhances the inclusivity and effectiveness of AI technologies globally.

The Granary dataset and models are available for exploration on Hugging Face, and further details can be accessed on NVIDIA’s blog.

Image source: Shutterstock




#NVIDIA #Launches #Granary #Dataset #Enhance #Multilingual #Speech

Leave a Reply

Your email address will not be published. Required fields are marked *