News
21 hours ago
MIVPG on E-commerce: Multi-Image/Multi-Patch Aggregation for Captioning
MIVPG uses hierarchical multiple instance learning (MIL) to outperform patch-concatenation and single-image baselines, showing that CSA is key to capturing correlations across images and patches.
4 days ago
Evaluating Visual Adapters: MIVPG Performance on Single and Multi-Image Inputs
Details MIVPG experiments in single- and multi-image scenarios. The model keeps the LLM and visual encoder frozen, updating only the MI...
