Aug 07, 2025

A Multimodal Dataset for Synthesizing Rap Vocals and 3D Motion

RapVerse is a large-scale multimodal dataset for training AI models on rap music. It has two main subsets: Rap-Vocal, with more than 108 hours of rap vocals paired with clean, timestamped lyrics, and Rap-Motion, with more than 26 hours of studio performance videos annotated with 3D body meshes in the SMPL-X format. The dataset supports research into singing voice synthesis, motion generation, and multimodal learning. It was collected through a pipeline of audio separation, transcription, human pose estimation, and manual filtering, and it sits at the intersection of music, language, and embodied AI.

Source: HackerNoon

