2026
Fabian Degen*¹, Oishi Deb*¹, Jindong Gu², Junchi Yu¹, Samuele Marro¹, Philip Torr¹, Jialin Yu†¹
Corresponding Author: yu.jialin@outlook.com
We introduce Plan2Map, a 208-case multimodal benchmark for document-grounded geospatial boundary reconstruction from UK planning records. Given only a source planning document, a system must reconstruct a valid geospatial boundary from notice text, schedules, map plates, map labels, and boundary annotations; the reference GeoJSON is held out for scoring.
Planning records often answer a spatial question: which area is subject to a particular rule? For UK Article 4 Directions, the affected area is typically described through a legal notice and an accompanying map rather than provided as machine-readable geometry. Digital planning systems, however, need such geometry to check whether a site falls inside an affected area, compare restrictions, and audit records over time.
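To make the "does a site fall inside an affected area" check concrete, here is a minimal sketch of a point-in-polygon test against a GeoJSON `Polygon`. It is an illustration only, not part of Plan2Map or GeoPlanAgent: the function names and the example coordinates are made up, positions are assumed to be 2D `[longitude, latitude]` pairs with closed rings, and points lying exactly on the boundary are not handled specially.

```python
from typing import Sequence

def point_in_ring(point: Sequence[float], ring: Sequence[Sequence[float]]) -> bool:
    """Ray casting: toggle 'inside' each time a ray going right from the point crosses an edge."""
    x, y = point
    inside = False
    # GeoJSON rings are closed (first position == last), so consecutive pairs cover every edge.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def site_in_polygon(point: Sequence[float], polygon: dict) -> bool:
    """True if the point is inside the exterior ring and outside every hole."""
    rings = polygon["coordinates"]
    exterior, holes = rings[0], rings[1:]
    if not point_in_ring(point, exterior):
        return False
    return not any(point_in_ring(point, hole) for hole in holes)

# Hypothetical affected area: a small rectangle, coordinates invented for illustration.
example_area = {
    "type": "Polygon",
    "coordinates": [[
        [-0.10, 51.50], [-0.09, 51.50], [-0.09, 51.51], [-0.10, 51.51], [-0.10, 51.50],
    ]],
}
site_inside = site_in_polygon([-0.095, 51.505], example_area)
site_outside = site_in_polygon([-0.200, 51.505], example_area)
```

A production system would more likely delegate this to a geospatial library or a spatial database, but the check itself only becomes possible once the boundary exists as geometry.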
The source documents usually provide only indirect spatial evidence: notice text, legal schedules, scanned or embedded map plates, map labels, coloured or hatched regions, and boundary annotations. Recovering the boundary is therefore not simply field extraction, but reconstruction of a structured spatial object from distributed textual, visual, and geographic evidence.
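As a small illustration of what "a structured spatial object" implies at the format level, the sketch below checks the basic ring rules that GeoJSON (RFC 7946) places on polygons: each linear ring has at least four positions and its first and last positions are identical. The function names are hypothetical, and this is not the benchmark's validity criterion; in particular it does not detect self-intersections or check ring orientation.

```python
def ring_is_closed(ring) -> bool:
    """A linear ring needs >= 4 positions, each with >= 2 numbers, and first == last."""
    return (
        len(ring) >= 4
        and all(len(pos) >= 2 for pos in ring)
        and list(ring[0]) == list(ring[-1])
    )

def polygon_rings_ok(rings) -> bool:
    """A polygon needs an exterior ring; any further rings are holes."""
    return len(rings) >= 1 and all(ring_is_closed(r) for r in rings)

def geometry_is_well_formed(geometry: dict) -> bool:
    """Structural check for Polygon and MultiPolygon geometries only."""
    kind = geometry.get("type")
    coords = geometry.get("coordinates", [])
    if kind == "Polygon":
        return polygon_rings_ok(coords)
    if kind == "MultiPolygon":
        return len(coords) >= 1 and all(polygon_rings_ok(p) for p in coords)
    return False

# Invented coordinates, for illustration only.
closed_square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.0, 0.0]]
open_square = closed_square[:-1]
```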
Plan2Map contains 208 manually reviewed UK Article 4 Direction records, spanning 1958–2025 and covering 29 local planning authorities across England. Each released case bundles three artefacts:
Cases span scanned and born-digital documents, varied map quality, and boundaries ranging from simple parcels to irregular or multi-part geometries. Metadata covers local authority, site description, document quality (✅ Good / ⚠️ Bad), document colour (📄 White / 📜 Yellow), boundary shape, and shape complexity (🟢 Easy / 🟡 Medium / 🔴 Hard).
Representative cases across the 3 × 2 × 2 strata (shape complexity × document colour × quality).
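The 3 × 2 × 2 stratification can be enumerated directly, which is handy for checking how many cases fall in each of the 12 cells. The sketch below assumes each case is a dictionary with hypothetical `complexity`, `colour`, and `quality` keys; the released metadata may use different field names, and the two example cases are invented.

```python
from collections import Counter
from itertools import product

COMPLEXITY = ("Easy", "Medium", "Hard")
COLOUR = ("White", "Yellow")
QUALITY = ("Good", "Bad")

# Every (complexity, colour, quality) combination: 3 x 2 x 2 = 12 strata.
STRATA = list(product(COMPLEXITY, COLOUR, QUALITY))

def count_by_stratum(cases):
    """Count cases per stratum, keeping empty strata visible with a count of 0."""
    counts = Counter({stratum: 0 for stratum in STRATA})
    for case in cases:
        counts[(case["complexity"], case["colour"], case["quality"])] += 1
    return counts

# Invented example cases, not taken from the benchmark.
example_cases = [
    {"complexity": "Easy", "colour": "White", "quality": "Good"},
    {"complexity": "Hard", "colour": "Yellow", "quality": "Bad"},
    {"complexity": "Easy", "colour": "White", "quality": "Good"},
]
example_counts = count_by_stratum(example_cases)
```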
We propose GeoPlanAgent, a document-grounded, geospatial-tool-in-the-loop system that decomposes the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification. Rather than asking a multimodal model to generate a geospatial polygon in one pass, GeoPlanAgent mirrors the task structure with specialised components.
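The staged decomposition can be pictured as a sequence of functions that each read and extend a shared state. The skeleton below is purely schematic and is not GeoPlanAgent's code: the stage names come from the description above, but the `State` layout, the `make_placeholder` helper, and the file name are hypothetical, and every stage is a stub that only records that it ran.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Callable[[State], State]

STAGE_NAMES = [
    "evidence_extraction",
    "localisation",
    "map_registration",
    "boundary_segmentation",
    "projection",
    "verification",
]

def make_placeholder(name: str) -> Stage:
    """Build a stub stage that appends its name to the trace (stand-in for real logic)."""
    def stage(state: State) -> State:
        return {**state, "trace": list(state["trace"]) + [name]}
    return stage

def run_pipeline(document_path: str, stages: List[Tuple[str, Stage]]) -> State:
    """Run the stages in order, threading the state from one to the next."""
    state: State = {"document": document_path, "trace": []}
    for _name, stage in stages:
        state = stage(state)
    return state

PLACEHOLDER_PIPELINE = [(name, make_placeholder(name)) for name in STAGE_NAMES]
result = run_pipeline("case_001.pdf", PLACEHOLDER_PIPELINE)  # hypothetical file name
```

One practical consequence of a layout like this is that a single stage can be swapped or removed without touching the others, which is the kind of comparison a component ablation needs.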
On all 208 Plan2Map cases, GeoPlanAgent reaches 0.736 mean IoU and 0.904 median IoU, with 67.8% of cases at IoU ≥ 0.8. Median centroid error is 4.6 m and Acc@0.1D reaches 78.8%. Direct VLM-to-GeoJSON baselines are substantially weaker: Gemini-3.1-Pro run end-to-end reaches only 0.108 mean IoU and 0.000 median IoU.
| Method | %IoU>0 ↑ | Mean IoU ↑ | Median IoU ↑ | %IoU≥0.8 ↑ | Median centroid error (m) ↓ | Acc@0.1D ↑ | Cost ($/doc) ↓ |
|---|---|---|---|---|---|---|---|
| Gemini-3.1-Pro (VLM end-to-end) | 40.4% | 0.108 | 0.000 | 1.4% | 480 | 9.6% | 0.106 |
| GeoPlanAgent (ours) | 89.4% | 0.736 | 0.904 | 67.8% | 4.6 | 78.8% | 0.043 |
| GeoPlanAgent + Critic | 89.9% | 0.740 | 0.906 | 67.8% | 4.6 | 78.8% | 0.045 |
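For readers who want to reproduce the IoU columns from per-case scores, the sketch below shows the arithmetic under stated assumptions: IoU is intersection area divided by union area, and the summary columns are simple aggregates over one IoU value per case. The function names and the toy values are illustrative and are not taken from the benchmark's scoring code.

```python
from statistics import mean, median

def iou_from_areas(area_pred: float, area_ref: float, area_intersection: float) -> float:
    """IoU = |A ∩ B| / |A ∪ B|, with |A ∪ B| = |A| + |B| - |A ∩ B|."""
    union = area_pred + area_ref - area_intersection
    return area_intersection / union if union > 0 else 0.0

def summarise(ious):
    """Aggregate per-case IoUs into the four IoU columns of the results table."""
    n = len(ious)
    return {
        "pct_iou_gt_0": 100.0 * sum(v > 0 for v in ious) / n,
        "mean_iou": mean(ious),
        "median_iou": median(ious),
        "pct_iou_ge_0.8": 100.0 * sum(v >= 0.8 for v in ious) / n,
    }

# Toy per-case IoUs, invented for illustration.
toy_summary = summarise([0.0, 0.5, 0.9, 1.0])
```

This also explains why a method can overlap the reference in 40.4% of cases and still have a median IoU of 0.000: when fewer than half of the cases have any overlap, the middle of the sorted IoU list is zero.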
Component ablations show that (i) direct VLM-to-GeoJSON prediction remains unreliable, (ii) supervised LoRA fine-tuning of SAM 3 lifts boundary segmentation by ≥ 0.30 pixel IoU over the vanilla baseline, and (iii) sliding-window map registration tightens median centroid error from 176 m to about 5 m (4.6 m in the main results), a roughly 38× improvement over the Locate stage alone.
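A centroid error in metres can be obtained from two longitude/latitude centroids with a great-circle distance. The sketch below uses the haversine formula with a mean Earth radius; this is one plausible way to compute such an error and is an assumption here, since the exact distance computation behind the reported numbers (for example, whether projected coordinates are used instead) is not stated above. It also checks the improvement factor using the 176 m and 4.6 m medians reported above.

```python
import math

def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float,
                radius_m: float = 6_371_008.8) -> float:
    """Great-circle distance in metres between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# Invented centroids, for illustration only: one degree of latitude apart.
one_degree_lat_m = haversine_m(-0.10, 51.0, -0.10, 52.0)

# Improvement factor implied by the reported medians (176 m before registration, 4.6 m after).
improvement_factor = 176 / 4.6
```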
@inproceedings{plan2map2026,
title = {Plan2Map: A Multimodal Benchmark for Document-Grounded
Geospatial Boundary Reconstruction from Planning Records},
author = {Degen, Fabian and Deb, Oishi and Gu, Jindong and Yu, Junchi
and Marro, Samuele and Torr, Philip and Yu, Jialin},
year = {2026}
}
Contains OS data © Crown copyright and database right 2026. Contains Royal Mail data © Royal Mail copyright and database right 2026. Contains National Statistics data © Crown copyright and database right 2026. Source planning documents and reference GeoJSON boundaries are reproduced from planning.data.gov.uk under the Open Government Licence v3.0.