Improve AI with people-first data

Mozilla Data Collective is a new platform for sharing real-world data, built and maintained by contributors. It hosts multilingual, multimodal datasets in more than 300 languages and publishes exclusive, permissively licensed datasets for ASR, TTS, translation, and SLM, all accessible through the datacollective Python package.

This week's new releases:

Text-to-speech: Bulgarian TTS corpus
Code-switching: Nahuatl dialogues with code-switching annotations
Youth speech: Indonesian youth speech audio corpus

Find exclusive public datasets on Mozilla Data Collective.