
LyngualLabs
Research lab developing speech and language technologies for low-resource communities.

About LyngualLabs
LyngualLabs: Speech and Language Tech for Low-Resource Communities
LyngualLabs is a research lab dedicated to building inclusive speech and language technologies for multilingual and low-resource communities. Combining research with ethical data practices, the lab focuses on African languages so that AI systems can handle local dialects and code-switched speech, helping to narrow the global digital divide.
Key Features
- YECS Corpus Dataset: The largest open-access Yoruba-English code-switching speech dataset, with 120 hours of validated, naturalistic conversational speech from 140 speakers.
- Code-Switched Speech Optimization: Develops automatic speech recognition (ASR) and translation models optimized for real-world conversations in which speakers naturally alternate between languages.
- End-to-End Data Curation: Offers scalable curation of voice datasets on dedicated collection infrastructure, producing high-quality, ethically sourced data for multilingual AI.
- Custom Collection Platforms: Offers data collection tools and applications that universities, enterprises, and research labs can license or rent to build their own inclusive datasets.
- Ethical Data Farming: Partners directly with local communities through a custom app, ensuring transparent consent, fair compensation for native speakers, and expert annotation to reduce systemic bias.
Use Cases
- Universities and research labs building inclusive, high-quality AI datasets for underrepresented languages
- Enterprises developing automatic speech recognition and text-to-speech systems for code-switched and multilingual communities
- Native speakers who want to contribute voice data, help preserve their languages, and earn compensation
Getting Started
Website: https://www.lynguallabs.org/
Inquiries and Collaborations: Reach out via the website to license data collection tools, access the YECS Corpus, or join data farming initiatives.
LyngualLabs aims to make language technology more inclusive and accessible for multilingual communities, so that speakers of all languages can benefit from advances in artificial intelligence.