Uplift AI

    Tech
    TTS

    Provides raw audio data for pretraining foundation models in low-resource languages.

    About Uplift AI

    Uplift AI: Raw audio data for pretraining foundation models

    Uplift AI, founded by former engineers from Apple Siri, Amazon Alexa, and AWS Bedrock, is building a large-scale audio dataset for pretraining foundation models. The goal is to unlock new model capabilities across hundreds of low-resource languages. The company's mission is to accelerate the transition to voice interfaces, especially for the 13% of the global population who cannot read, giving them access to digital knowledge and services. Low-resource languages are Uplift AI's sole focus. Its investors include Y Combinator, IVC, and RTP.

    Dataset Characteristics

    The dataset is designed with the following principles:

    • Massive scale, without skew: The dataset is built for massive scale but avoids extreme domain imbalances to help base models become better generalists.

    • Real conversations, real settings: It features audio of real people having conversations in real settings, moving away from 'produced' content found in movies, podcasts, and news.

    • Diverse acoustic environments: The data covers realistic and varied acoustic environments, such as a person driving a tractor, cooking, riding public transit, operating a forklift, hiking, or flying in a plane.

    • Globally diverse, locally diverse: The dataset aims for balanced representation across geography, occupations, age, income, accent, and language.
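
    The "without skew" and "balanced representation" principles above can be made concrete with a simple balance metric. The sketch below is illustrative only and is not Uplift AI's tooling: it assumes a hypothetical clip manifest with made-up field names (language, environment) and scores each field by normalized Shannon entropy, where 1.0 means the observed categories are evenly represented and values near 0 indicate heavy skew.

```python
import math
from collections import Counter


def normalized_entropy(labels):
    """Shannon entropy of the label distribution divided by its maximum.

    Returns 1.0 when all observed categories are equally frequent and
    0.0 when only one category is present.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        return 0.0  # a single category (or no data) carries no diversity
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(len(counts))


def balance_report(clips, fields):
    """Compute the balance score for each metadata field in the manifest."""
    return {f: normalized_entropy([clip[f] for clip in clips]) for f in fields}


# Hypothetical manifest: field names and values are assumptions for illustration.
clips = [
    {"language": "ur", "environment": "kitchen"},
    {"language": "ur", "environment": "tractor"},
    {"language": "sd", "environment": "public_transit"},
    {"language": "sd", "environment": "kitchen"},
]

report = balance_report(clips, ["language", "environment"])
for field, score in report.items():
    print(f"{field}: {score:.3f}")
```

    Note that this score only compares categories that appear in the manifest. A category that is missing entirely is not penalized, so a real audit would also compare against a target list of languages, regions, or environments.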

    Use Cases

    The primary use case for the data is training state-of-the-art (SOTA) voice models and pretraining foundation models for low-resource languages. Uplift AI also partners with research teams to design pretraining data distributions for any use case.

    Getting Started

    Users can acquire data through a four-step process:

    • Step 1, Request a sample: Set up a call to discuss your use case and receive relevant data samples.

    • Step 2, Purchase access: Enter into a data license agreement for the dataset and use cases your team needs.

    • Step 3, Receive data: For existing data, access is granted to your team within two to four days.

    • Step 4, Experiment with us: Partner with the Uplift AI team to design pretraining data distributions for your specific use case.

    To begin, visit the website: https://upliftai.org