GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
Abstract
GeoAgent achieves superior geolocation reasoning performance through a specialized dataset and reward mechanisms that ensure geographic accuracy and reasoning consistency.
This paper presents GeoAgent, a model capable of reasoning in close alignment with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability, but concerns remain because of their reliance on AI-generated chain-of-thought (CoT) data and on training strategies that conflict with geographic characteristics. To address these issues, we first introduce GeoSeek, a new geolocation dataset comprising CoT data annotated by geographic experts and professional players. We then thoroughly explore the inherent characteristics of geographic tasks and propose a geo-similarity reward and a consistency reward, the latter assessed by a consistency agent, to assist training. This encourages the model to converge toward correct answers from a geographic perspective while ensuring the integrity and consistency of its reasoning process. Experimental results show that GeoAgent outperforms existing methods and a series of general VLLMs at multiple granularities, while generating reasoning that closely aligns with human reasoning.
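The abstract does not specify the exact form of the geo-similarity reward. A minimal sketch of one plausible instantiation, assuming the reward decays smoothly with great-circle distance between the predicted and ground-truth coordinates (the `scale_km` decay constant is a hypothetical illustration, not a value from the paper), might look like:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geo_similarity_reward(pred, target, scale_km=25.0):
    # Smooth reward in (0, 1]: 1.0 at the exact location, decaying
    # exponentially with distance, so near-misses are still rewarded
    # more than far-off guesses. `scale_km` is a hypothetical constant.
    d = haversine_km(pred[0], pred[1], target[0], target[1])
    return math.exp(-d / scale_km)
```

Such a distance-shaped reward gives the policy a dense gradient toward geographically closer answers, in contrast to a binary exact-match reward.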
Community
For more details, please refer to our project page and GitHub repo.
The following similar papers were recommended by the Semantic Scholar API:
- Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach (2026)
- LocationAgent: A Hierarchical Agent for Image Geolocation via Decoupling Strategy and Evidence from Parametric Knowledge (2026)
- SpotAgent: Grounding Visual Geo-localization in Large Vision-Language Models through Agentic Reasoning (2026)
- Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization (2026)
- GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction (2026)
- RSGround-R1: Rethinking Remote Sensing Visual Grounding through Spatial Reasoning (2026)
- GeoReason: Aligning Thinking And Answering In Remote Sensing Vision-Language Models Via Logical Consistency Reinforcement Learning (2026)
Models citing this paper: 1
Datasets citing this paper: 1