Improve AI with people-first data
Built and maintained by contributors, Mozilla Data Collective is a new platform for sharing real-world data. It hosts multilingual, multimodal datasets in more than 300 languages and publishes exclusive, permissively licensed datasets for ASR, TTS, translation, and SLMs, all accessible through the datacollective Python package.
This week's new releases:
Text-to-speech: TTS corpus in Bulgarian
Code-switching: Nahuatl dialogues with code-switching annotations
Youth speech: Indonesian youth speech audio corpus
Find exclusive public datasets on Mozilla Data Collective.