A Multimodal Dataset for Synthesizing Rap Vocals and 3D Motion
RapVerse is a large-scale multimodal dataset for training AI models that jointly work with rap vocals, lyrics, and performance motion. It includes two main subsets: Rap-Vocal (108+ hours of rap vocals paired with clean, timestamped lyrics) and Rap-Motion (26+ hours of studio performance videos annotated with 3D body meshes in the SMPL-X format). The dataset supports research into singing voice synthesis, motion generation, and multimodal learning. Collected through a meticulous pipeline of audio separation, transcription, human pose estimation, and manual filtering, RapVerse is a unique resource at the intersection of music, language, and embodied AI.
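To make the vocal–lyric–motion pairing concrete, here is a minimal sketch of how one joint sample could be represented and queried in Python. The field names, shapes, and paths below are illustrative assumptions, not RapVerse's published schema.

```python
# Illustrative sketch only: field names and shapes are assumptions, not RapVerse's real schema.
from dataclasses import dataclass
from typing import List

import numpy as np


@dataclass
class LyricToken:
    text: str        # word or syllable
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds


@dataclass
class RapVerseSample:
    vocal_path: str              # path to the separated vocal track (Rap-Vocal subset)
    lyrics: List[LyricToken]     # timestamped lyrics aligned to the vocals
    smplx_params: np.ndarray     # (T, D) per-frame SMPL-X parameters (Rap-Motion subset)
    motion_fps: float            # frame rate of the motion annotations


def tokens_for_frame(sample: RapVerseSample, frame_idx: int) -> List[LyricToken]:
    """Return the lyric tokens active at a given motion frame.

    Converts the frame index to seconds using the motion frame rate, then
    keeps tokens whose [start_s, end_s] interval covers that moment.
    """
    t = frame_idx / sample.motion_fps
    return [tok for tok in sample.lyrics if tok.start_s <= t <= tok.end_s]


if __name__ == "__main__":
    # Tiny synthetic example: 2 seconds of motion at 30 fps; the parameter
    # dimension (165) is a placeholder, not the dataset's actual SMPL-X layout.
    sample = RapVerseSample(
        vocal_path="rap_vocal/track_0001.wav",
        lyrics=[LyricToken("yo", 0.0, 0.4), LyricToken("check", 0.5, 0.9)],
        smplx_params=np.zeros((60, 165)),
        motion_fps=30.0,
    )
    print(tokens_for_frame(sample, frame_idx=20))  # frame 20 -> 0.67 s -> "check"
```

A per-sample structure like this keeps the three modalities time-aligned, which is what singing-voice-synthesis and motion-generation models would need when training on paired data.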
Source: HackerNoon →