Plan2Map
A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records

2026

Fabian Degen*¹, Oishi Deb*¹, Jindong Gu², Junchi Yu¹, Samuele Marro¹, Philip Torr¹, Jialin Yu†¹

* Equal Contribution · † Corresponding Author

¹University of Oxford · ²Google DeepMind

fabian.degen@tuta.com; oishideb@robots.ox.ac.uk; yu.jialin@outlook.com


We introduce Plan2Map, a 208-case multimodal benchmark for document-grounded geospatial boundary reconstruction from UK planning records. Given only a source planning document, a system must reconstruct a valid geospatial boundary from notice text, schedules, map plates, map labels, and boundary annotations; the reference GeoJSON is held out for scoring.

Motivation

Planning records often answer a spatial question: which area is subject to a particular rule? For UK Article 4 Directions, the affected area is typically described through a legal notice and an accompanying map rather than provided as machine-readable geometry. Digital planning systems, however, need such geometry to check whether a site falls inside an affected area, compare restrictions, and audit records over time.

The source documents usually provide only indirect spatial evidence: notice text, legal schedules, scanned or embedded map plates, map labels, coloured or hatched regions, and boundary annotations. Recovering the boundary is therefore not simply field extraction, but reconstruction of a structured spatial object from distributed textual, visual, and geographic evidence.

The Plan2Map Benchmark

Plan2Map contains 208 manually reviewed UK Article 4 Direction records, spanning 1958–2025 and covering 29 local planning authorities across England. Each released case bundles three artefacts:

The source planning PDF — text, schedules, maps, labels, and boundary annotations.
A verified reference GeoJSON (Polygon or MultiPolygon), held out as the evaluation reference.
A rendered location-map PNG overlaying the reference geometry on an OpenStreetMap basemap.
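
As a rough illustration of what "a valid geospatial boundary" can mean in practice, the sketch below checks that a GeoJSON string holds a well-formed Polygon or MultiPolygon. It is not part of the Plan2Map release: the function name load_reference_polygons is hypothetical, and accepting either a bare geometry or a Feature wrapper is an assumption, since the exact file layout is not described here. The ring rules (at least four positions, first equal to last) follow the GeoJSON specification.

```python
import json


def load_reference_polygons(geojson_text):
    """Parse a GeoJSON string and return its polygons as lists of rings.

    Hypothetical helper: accepts a bare geometry or a Feature wrapping one
    (an assumption about the file layout). Raises ValueError for anything
    other than a well-formed Polygon or MultiPolygon.
    """
    obj = json.loads(geojson_text)
    if obj.get("type") == "Feature":
        obj = obj.get("geometry") or {}

    kind = obj.get("type")
    if kind == "Polygon":
        polygons = [obj["coordinates"]]
    elif kind == "MultiPolygon":
        polygons = obj["coordinates"]
    else:
        raise ValueError(f"expected Polygon or MultiPolygon, got {kind!r}")

    for rings in polygons:
        if not rings:
            raise ValueError("polygon has no rings")
        for ring in rings:
            # GeoJSON linear rings are closed and have at least 4 positions.
            if len(ring) < 4 or ring[0] != ring[-1]:
                raise ValueError("ring must be closed with at least 4 positions")
            for position in ring:
                lon, lat = position[0], position[1]
                if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
                    raise ValueError(f"position out of WGS84 range: {position}")
    return polygons
```

A check like this only covers structural validity; it says nothing about whether the boundary is in the right place, which is what the held-out reference is for.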

Cases span scanned and born-digital documents, varied map quality, and boundaries ranging from simple parcels to irregular or multi-part geometries. Metadata covers local authority, site description, document quality (✅ Good / ⚠️ Bad), document colour (📄 White / 📜 Yellow), boundary shape, and shape complexity (🟢 Easy / 🟡 Medium / 🔴 Hard).

Examples from the Dataset

Plan2Map example — Easy complexity, Yellow document, Good quality

🟢 Easy 📜 Yellow ✅ Good

Plan2Map example — Easy complexity, White document, Good quality

🟢 Easy 📄 White ✅ Good

Plan2Map example — Medium complexity, Yellow document, Good quality

🟡 Medium 📜 Yellow ✅ Good

Plan2Map example — Medium complexity, Yellow document, Bad quality

🟡 Medium 📜 Yellow ⚠️ Bad

Plan2Map example — Hard complexity, White document, Good quality

🔴 Hard 📄 White ✅ Good

Plan2Map example — Hard complexity, White document, Bad quality

🔴 Hard 📄 White ⚠️ Bad

Representative cases across the 3 × 2 × 2 strata (shape complexity × document colour × quality).

GeoPlanAgent

We propose GeoPlanAgent, a document-grounded, geospatial-tool-in-the-loop system that decomposes the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification. Rather than asking a multimodal model to generate a geospatial polygon in one pass, GeoPlanAgent mirrors the task structure with specialised components.

Pipeline

Reader — single multimodal LLM call that converts the raw PDF into a typed record of spatial evidence (postcodes, grid references, addresses, road and place names, map labels, printed scale, page-level metadata).
Worker — produces the final GeoJSON via a tool-calling loop:
1. Locate sub-agent — queries an OS Open Names gazetteer to return an approximate map centre with an uncertainty radius σ.
2. Map-tile matching — aligns the planning map against OS Open Zoomstack tiles using MINIMA-LoFTR + RANSAC.
3. Boundary segmentation — SAM 3 with LoRA adapters, fine-tuned on planning maps with style-transfer augmentation.
4. Projection — projects the mask into WGS84 via the recovered affine transform.
Critic (optional) — independent LLM that reviews the top-3 candidates post-commit and may approve, switch, or request re-localisation.
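
The projection step (step 4 of the Worker) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GeoPlanAgent implementation: it assumes the recovered affine transform maps planning-map pixels to global pixel coordinates of a Web Mercator tile pyramid with 256-pixel tiles, and that the segmentation mask has already been traced into a ring of pixel coordinates. The function names and the affine layout ((a, b, c), (d, e, f)) are hypothetical.

```python
import math

TILE_SIZE = 256  # assumed tile size in pixels


def global_pixel_to_wgs84(px, py, zoom):
    """Convert Web Mercator global pixel coordinates to (lon, lat) in degrees."""
    n = TILE_SIZE * 2 ** zoom  # width of the whole world in pixels
    lon = px / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * py / n))))
    return lon, lat


def to_geojson_polygon(ring_px, affine, zoom):
    """Project a pixel-space ring through an affine transform into WGS84.

    affine is ((a, b, c), (d, e, f)), mapping planning-map pixel (x, y) to
    global pixel (a*x + b*y + c, d*x + e*y + f). This layout is an
    assumption made for the sketch.
    """
    (a, b, c), (d, e, f) = affine
    ring = []
    for x, y in ring_px:
        gx = a * x + b * y + c
        gy = d * x + e * y + f
        lon, lat = global_pixel_to_wgs84(gx, gy, zoom)
        ring.append([lon, lat])
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))  # GeoJSON rings must be closed
    return {"type": "Polygon", "coordinates": [ring]}
```

Keeping the affine transform separate from the Mercator conversion means a registration error and a projection error can be inspected independently.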

GeoPlanAgent workflow.

Explore the pipeline

The map registration stage uses MINIMA-LoFTR to refine the initial location returned by the Locate sub-agent, illustrated here for a single case.

Planning map (case 12:00116:ART4)

Planning map of Loddon, Norfolk, resized to match OS tile pixel scale

OS Open Zoomstack tile canvas (z17, 7×9 tiles ≈ 1.3×1.7 km)
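
The stated canvas size can be checked with a short calculation. The sketch assumes standard Web Mercator tiling, where a tile at zoom z covers the equatorial circumference divided by 2^z, scaled by the cosine of the latitude. The latitude of 52.5°N is an approximate value assumed here for Loddon, and the function names are hypothetical.

```python
import math

EARTH_CIRCUMFERENCE_M = 40_075_016.686  # WGS84 equatorial circumference


def tile_width_m(lat_deg, zoom):
    """Approximate ground width of one Web Mercator tile at a latitude."""
    return EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / 2 ** zoom


def canvas_size_km(lat_deg, zoom, tiles_x, tiles_y):
    """Ground size in km of a tiles_x by tiles_y canvas."""
    side = tile_width_m(lat_deg, zoom)
    return tiles_x * side / 1000.0, tiles_y * side / 1000.0


# 7 x 9 tiles at zoom 17 near 52.5 N (assumed latitude)
width_km, height_km = canvas_size_km(52.5, 17, 7, 9)
```

Under these assumptions each tile is about 186 m wide, so the canvas comes to roughly 1.30 by 1.68 km, in line with the caption.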

How the map is matched

Locate gives a rough area; this step finds exactly where the planning map sits on the ground. MINIMA-LoFTR looks for the same distinctive features (road junctions, field edges, building corners) in both the planning map and the Ordnance Survey basemap, and slides the map across the basemap one window at a time. The number on each window is how many of those feature matches actually line up (its inliers). More is better, and the strongest window wins, shown in gold. That alignment is what lets the boundary be read off in real-world coordinates.

This is a simplified picture: in the full pipeline the matcher searches over several zoom levels and re-ranks the strongest candidates with additional metrics before committing to one. See the paper for the full details.
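
To make the idea of "inliers" concrete, here is a minimal pure-Python RANSAC sketch for an affine transform. It is an illustration only and not the GeoPlanAgent code: the feature matcher is not reproduced, the point correspondences are assumed to be given as two lists of (x, y) pairs, and the tolerance, iteration count, and function names are hypothetical choices.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_affine(src3, dst3):
    """Exact affine from three correspondences, or None if they are collinear."""
    m = [[x, y, 1.0] for x, y in src3]
    d = det3(m)
    if abs(d) < 1e-9:
        return None
    rows = []
    for k in (0, 1):  # k=0 solves the x' row, k=1 the y' row
        rhs = [q[k] for q in dst3]
        coeffs = []
        for col in range(3):  # Cramer's rule, one column at a time
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            coeffs.append(det3(mc) / d)
        rows.append(coeffs)
    return rows  # [[a, b, c], [d, e, f]]


def apply_affine(t, pt):
    x, y = pt
    return (t[0][0] * x + t[0][1] * y + t[0][2],
            t[1][0] * x + t[1][1] * y + t[1][2])


def ransac_affine(src, dst, tol=2.0, iters=200, seed=0):
    """Return (transform, inlier indices) for the best sampled affine model."""
    rng = random.Random(seed)
    indices = range(len(src))
    best_t, best_inliers = None, []
    for _ in range(iters):
        sample = rng.sample(indices, 3)
        t = fit_affine([src[i] for i in sample], [dst[i] for i in sample])
        if t is None:
            continue
        inliers = []
        for i in indices:
            u, v = apply_affine(t, src[i])
            if math.hypot(u - dst[i][0], v - dst[i][1]) <= tol:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

In a window search of the kind described above, a routine like this would run once per candidate window, and the length of the returned inlier list would be the number displayed on that window.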

Method	%IoU>0 ↑	Mean IoU ↑	Med IoU ↑	%IoU≥0.8 ↑	Err (m) ↓	Acc@0.1D ↑	$/doc ↓
Gemini-3.1-Pro (VLM end-to-end)	40.4%	0.108	0.000	1.4%	480	9.6%	0.106
GeoPlanAgent (ours)	89.4%	0.736	0.904	67.8%	4.6	78.8%	0.043
GeoPlanAgent + Critic	89.9%	0.740	0.906	67.8%	4.6	78.8%	0.045
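
The four IoU columns in the table can be computed from a list of per-case IoU scores along these lines. This is a sketch with hypothetical names; the thresholds are read directly from the column headers, and the Err (m) and Acc@0.1D columns are left out because their definitions are not given here.

```python
from statistics import mean, median


def summarise_iou(ious, high=0.8):
    """Summarise per-case IoU scores into the table's four IoU columns."""
    if not ious:
        raise ValueError("need at least one IoU score")
    n = len(ious)
    return {
        "pct_iou_gt_0": 100.0 * sum(v > 0 for v in ious) / n,
        "mean_iou": mean(ious),
        "median_iou": median(ious),
        "pct_iou_ge_high": 100.0 * sum(v >= high for v in ious) / n,
    }


# Toy scores for four cases, not benchmark results
summary = summarise_iou([0.0, 0.5, 0.9, 1.0])
```

A median far above the mean, as in the GeoPlanAgent rows, is what this kind of summary shows when most cases score well and a minority fail outright with IoU near zero.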
