Blog
3 days ago
Rewarding the Rare: How Uniqueness-Aware RL Fixes Exploration Collapse
LLMs aren’t bad at reasoning—they’re bad at exploring. Here’s how uniqueness-aware RL fixes exploration collapse by rewarding rare solutions.
Source: HackerNoon