News

Nov 18, 2025

Future MLLMs: Contribution of MIL-Based Techniques and Enriched Visual Signals

This paper concludes that MIVPG is a general, powerful component for fusing enriched visual representations in MLLMs.

Nov 18, 2025

MIVPG uses hierarchical MIL to outperform patch-concatenation and single-image baselines, indicating that CSA is key to capturing instance correlation.

Nov 18, 2025

MIVPG significantly outperforms baselines by exploiting instance correlation and shows strong domain adaptation as training epochs progress.

Nov 15, 2025

Details MIVPG experiments across single- and multi-image scenarios. The model uses a frozen LLM and visual encoder, updating only the MI...

Nov 15, 2025

MIVPG uses a Correlated Self-Attention (CSA) module to unveil instance correlation, fulfilling all MIL properties while outperform...

Nov 15, 2025

Details MIVPG's hierarchical approach to MIL for multi-image samples. It treats both image patches and whole images as 'instances'...