GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning (EMNLP 2025)

We investigate a critical bottleneck in Multimodal Large Language Models (MLLMs): their limited visual perception, which hinders their ability to solve complex geometric reasoning tasks. We find that even state-of-the-art models struggle to accurately perceive basic geometric concepts, preventing effective reasoning.

📝 Our Solution:

We introduce GeoPQA, a benchmark to measure this gap, and propose a two-stage training framework:

1️⃣ Perception First: We train the MLLM to accurately identify geometric structures.

2️⃣ Reasoning Second: With a solid visual foundation, we then train it on complex, multi-step reasoning (see the sketch below).
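
For intuition, here is a minimal Python sketch of the two-stage recipe. It is our own illustration, not the repository's training code: the function names (`two_stage_training`, `train_stage`, `perception_reward`, `reasoning_reward`) and the exact-match rewards are hypothetical placeholders; the actual datasets, objectives, and optimization procedure are described in the paper.

```python
# Minimal sketch of the two-stage recipe above -- NOT the repository's
# actual training code. All names here are hypothetical placeholders.

def perception_reward(prediction: str, reference: str) -> float:
    # Stage-1 objective: did the model correctly perceive the geometric
    # structure (e.g. which segments are perpendicular, which angles are equal)?
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def reasoning_reward(prediction: str, reference: str) -> float:
    # Stage-2 objective: is the final answer of the multi-step solution correct?
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def train_stage(model, dataset, objective_fn):
    # Placeholder for one fine-tuning stage (details of the optimization
    # loop are omitted and depend on the actual training setup).
    return model

def two_stage_training(model, perception_set, reasoning_set):
    # Stage 1: perception first -- learn to identify geometric structures.
    model = train_stage(model, perception_set, perception_reward)
    # Stage 2: reasoning second -- train on complex, multi-step problems.
    model = train_stage(model, reasoning_set, reasoning_reward)
    return model
```

The only point the sketch is meant to convey is the ordering: the perception objective is optimized before any reasoning supervision is applied.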

📈 The results are exciting! Our two-stage approach boosts geometric problem-solving accuracy by 9.1% over traditional methods. Our work highlights a key principle: for MLLMs to truly reason, they must first learn to see.

✒️ Citation

If you find our work useful, please consider citing our paper:

@misc{chen2025geopqabridgingvisualperception,
      title={GeoPQA: Bridging the Visual Perception Gap in MLLMs for Geometric Reasoning}, 
      author={Guizhen Chen and Weiwen Xu and Hao Zhang and Hou Pong Chan and Deli Zhao and Anh Tuan Luu and Yu Rong},
      year={2025},
      eprint={2509.17437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.17437}, 
}
