Plan2Map
A Multimodal Benchmark for Document-Grounded Geospatial Boundary Reconstruction from Planning Records

2026

Fabian Degen*¹, Oishi Deb*¹, Jindong Gu², Junchi Yu¹, Samuele Marro¹, Philip Torr¹, Jialin Yu†¹

* Joint 1st Author  ·  † Corresponding Author
¹ University of Oxford  ·  ² Google DeepMind

Corresponding Author: yu.jialin@outlook.com

Code: coming soon · Dataset: coming soon
Plan2Map overview

We introduce Plan2Map, a 208-case multimodal benchmark for document-grounded geospatial boundary reconstruction from UK planning records. Given only a source planning document, a system must reconstruct a valid geospatial boundary from notice text, schedules, map plates, map labels, and boundary annotations; the reference GeoJSON is held out for scoring.

Motivation

Planning records often answer a spatial question: which area is subject to a particular rule? For UK Article 4 Directions, the affected area is typically described through a legal notice and an accompanying map rather than provided as machine-readable geometry. Digital planning systems, however, need such geometry to check whether a site falls inside an affected area, compare restrictions, and audit records over time.

The source documents usually provide only indirect spatial evidence: notice text, legal schedules, scanned or embedded map plates, map labels, coloured or hatched regions, and boundary annotations. Recovering the boundary is therefore not simply field extraction, but reconstruction of a structured spatial object from distributed textual, visual, and geographic evidence.

The Plan2Map Benchmark

Plan2Map contains 208 manually reviewed UK Article 4 Direction records, spanning 1958–2025 and covering 29 local planning authorities across England. Each released case bundles three artefacts:

  • The source planning PDF — text, schedules, maps, labels, and boundary annotations.
  • A verified reference GeoJSON (Polygon or MultiPolygon), held out as the evaluation reference.
  • A rendered location-map PNG overlaying the reference geometry on an OpenStreetMap basemap.

Cases span scanned and born-digital documents, varied map quality, and boundaries ranging from simple parcels to irregular or multi-part geometries. Metadata covers local authority, site description, document quality (✅ Good / ⚠️ Bad), document colour (📄 White / 📜 Yellow), boundary shape, and shape complexity (🟢 Easy / 🟡 Medium / 🔴 Hard).
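
As an illustration of how a case bundle might be loaded and sanity-checked, the sketch below reads the three artefacts from a directory and verifies that the reference is a Polygon or MultiPolygon with closed rings. The file names (`source.pdf`, `reference.geojson`, `location_map.png`), the `Case` class, and the helper names are assumptions made for this example; the released layout may differ.

```python
import json
from dataclasses import dataclass
from pathlib import Path

ALLOWED_TYPES = {"Polygon", "MultiPolygon"}


def extract_geometry(obj: dict) -> dict:
    """Return the bare geometry from a Geometry, a Feature, or a one-feature FeatureCollection."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        features = obj.get("features", [])
        if len(features) != 1:
            raise ValueError(f"expected exactly one feature, got {len(features)}")
        obj = features[0]
        kind = obj.get("type")
    if kind == "Feature":
        obj = obj.get("geometry") or {}
        kind = obj.get("type")
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unsupported geometry type: {kind!r}")
    return obj


def validate_boundary(geometry: dict) -> None:
    """Check that every linear ring has at least four positions and is closed."""
    coords = geometry["coordinates"]
    polygons = [coords] if geometry["type"] == "Polygon" else coords
    for polygon in polygons:
        for ring in polygon:
            if len(ring) < 4 or ring[0] != ring[-1]:
                raise ValueError("linear ring must be closed with >= 4 positions")


@dataclass
class Case:
    case_id: str
    pdf_path: Path            # system input
    reference: dict           # held-out geometry, used for scoring only
    location_map_path: Path   # rendered overlay, for human inspection


def load_case(case_dir: Path) -> Case:
    # File names below are hypothetical placeholders for the released layout.
    raw = json.loads((case_dir / "reference.geojson").read_text(encoding="utf-8"))
    reference = extract_geometry(raw)
    validate_boundary(reference)
    return Case(case_dir.name, case_dir / "source.pdf", reference,
                case_dir / "location_map.png")
```

Keeping the reference geometry on the `Case` object but never passing it to the system under test is one simple way to respect the held-out setup.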

Examples from the Dataset

GeoPlanAgent

We propose GeoPlanAgent, a document-grounded, geospatial-tool-in-the-loop system that decomposes the task into evidence extraction, localisation, map registration, boundary segmentation, projection, and verification. Rather than asking a multimodal model to generate a geospatial polygon in one pass, GeoPlanAgent mirrors the task structure with specialised components.

Pipeline

  1. Reader — single multimodal LLM call that converts the raw PDF into a typed record of spatial evidence (postcodes, grid references, addresses, road and place names, map labels, printed scale, page-level metadata).
  2. Worker — produces the final GeoJSON via a tool-calling loop:
    1. Locate sub-agent — queries an OS Open Names gazetteer to return an approximate map centre with an uncertainty radius σ.
    2. Map-tile matching — aligns the planning map against OS Open Zoomstack tiles using MINIMA-LoFTR + RANSAC.
    3. Boundary segmentation — SAM 3 with LoRA adapters, fine-tuned on planning maps with style-transfer augmentation.
    4. Projection — projects the mask into WGS84 via the recovered affine transform.
  3. Critic (optional) — independent LLM that reviews the top-3 candidates post-commit and may approve, switch, or request re-localisation.
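
The Reader's typed record could look roughly like the sketch below. The page lists only the kinds of evidence extracted, not a schema, so the class name, field names, and the `parse_scale` helper are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpatialEvidence:
    """Hypothetical schema for the Reader output; all field names are assumed."""
    postcodes: list[str] = field(default_factory=list)
    grid_references: list[str] = field(default_factory=list)
    addresses: list[str] = field(default_factory=list)
    road_names: list[str] = field(default_factory=list)
    place_names: list[str] = field(default_factory=list)
    map_labels: list[str] = field(default_factory=list)
    scale_denominator: Optional[int] = None             # 1250 for a printed "1:1250"
    map_pages: list[int] = field(default_factory=list)  # assumed form of page-level metadata

    def has_locatable_evidence(self) -> bool:
        """True if at least one field could seed a gazetteer lookup."""
        return any([self.postcodes, self.grid_references, self.addresses,
                    self.road_names, self.place_names])


def parse_scale(text: str) -> Optional[int]:
    """Parse a printed scale such as '1:1250' or '1 : 2,500' into its denominator."""
    match = re.search(r"1\s*:\s*(\d[\d,]*)", text)
    return int(match.group(1).replace(",", "")) if match else None
```

A typed record of this kind lets the Worker check up front whether a document offers anything to localise on, rather than discovering the gap midway through the tool-calling loop.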

GeoPlanAgent workflow.
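
To make the registration and projection steps concrete, here is a minimal pure-Python sketch of RANSAC affine estimation from matched points, followed by projection of a pixel-space boundary ring into a GeoJSON Polygon. It is not the authors' implementation: MINIMA-LoFTR matching is not reproduced, the sketch starts from already-matched point pairs, it skips the usual least-squares refit on inliers, and it assumes the affine maps map pixels directly to (lon, lat). All function names, the tolerance, and the iteration count are hypothetical.

```python
import math
import random


def solve_affine(src, dst):
    """Exact affine from three (x, y) -> (u, v) pairs via Cramer's rule.

    Returns (a, b, c, d, e, f) with u = a*x + b*y + c and v = d*x + e*y + f,
    or None when the three source points are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        return None

    def row(t1, t2, t3):
        a = (t1 * (y2 - y3) - y1 * (t2 - t3) + (t2 * y3 - t3 * y2)) / det
        b = (x1 * (t2 - t3) - t1 * (x2 - x3) + (x2 * t3 - x3 * t2)) / det
        c = (x1 * (y2 * t3 - y3 * t2) - y1 * (x2 * t3 - x3 * t2)
             + t1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    return row(*(p[0] for p in dst)) + row(*(p[1] for p in dst))


def apply_affine(T, pt):
    a, b, c, d, e, f = T
    x, y = pt
    return (a * x + b * y + c, d * x + e * y + f)


def ransac_affine(src, dst, tol, iters=200, seed=0):
    """Keep the 3-point affine hypothesis that explains the most matches within tol."""
    rng = random.Random(seed)
    indices = range(len(src))
    best_T, best_inliers = None, []
    for _ in range(iters):
        i, j, k = rng.sample(indices, 3)
        T = solve_affine([src[i], src[j], src[k]], [dst[i], dst[j], dst[k]])
        if T is None:
            continue
        inliers = [n for n in indices
                   if math.dist(apply_affine(T, src[n]), dst[n]) <= tol]
        if len(inliers) > len(best_inliers):
            best_T, best_inliers = T, inliers
    return best_T, best_inliers


def to_geojson_polygon(T, ring_px):
    """Project a pixel-space ring through T and close it, as GeoJSON requires."""
    ring = [list(apply_affine(T, p)) for p in ring_px]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return {"type": "Polygon", "coordinates": [ring]}
```

In practice the matched tiles would carry their own coordinate reference system, so a real pipeline would likely need an additional CRS conversion before emitting WGS84 coordinates; that step is omitted here.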

Results

On all 208 cases of Plan2Map, GeoPlanAgent reaches 0.736 mean IoU and 0.904 median IoU, with 67.8% of cases at IoU ≥ 0.8. Median centroid error is 4.6 m and Acc@0.1D reaches 78.8%. Direct VLM-to-GeoJSON baselines are substantially weaker.

Median IoU: 0.904 · IoU ≥ 0.8: 67.8% · Median centroid error: 4.6 m · Acc@0.1D: 78.8%

Main results (full 208-case benchmark)

Method                          | %IoU>0 ↑ | Mean IoU ↑ | Median IoU ↑ | %IoU≥0.8 ↑ | Median centroid err (m) ↓ | Acc@0.1D ↑ | Cost ($/doc) ↓
Gemini-3.1-Pro (VLM end-to-end) | 40.4%    | 0.108      | 0.000        | 1.4%       | 480                       | 9.6%       | 0.106
GeoPlanAgent (ours)             | 89.4%    | 0.736      | 0.904        | 67.8%      | 4.6                       | 78.8%      | 0.043
GeoPlanAgent + Critic           | 89.9%    | 0.740      | 0.906        | 67.8%      | 4.6                       | 78.8%      | 0.045
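
The IoU summary columns can be derived from per-case IoU scores, and centroid error from a great-circle distance between predicted and reference centroids. The sketch below shows one way to do so; the function names and the mean Earth radius are choices made for this example, and Acc@0.1D is left out because this page does not define D.

```python
import math
from statistics import mean, median


def summarise_iou(ious, threshold=0.8):
    """Aggregate per-case IoU scores into the table's IoU summary columns."""
    n = len(ious)
    return {
        "pct_iou_gt_0": 100.0 * sum(v > 0 for v in ious) / n,
        "mean_iou": mean(ious),
        "median_iou": median(ious),
        "pct_iou_ge_threshold": 100.0 * sum(v >= threshold for v in ious) / n,
    }


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371008.8):
    """Great-circle distance in metres between two WGS84 lon/lat points (spherical Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(h))
```

Reporting both mean and median is informative here: cases with zero overlap pull the mean down sharply while leaving the median largely unaffected, which is consistent with the gap between 0.736 and 0.904 in the GeoPlanAgent row.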

Component ablations show that (i) direct VLM-to-GeoJSON prediction remains unreliable, (ii) supervised LoRA fine-tuning of SAM 3 lifts boundary segmentation by ≥ 0.30 pixel IoU over the vanilla baseline, and (iii) sliding-window map registration tightens median centroid error from 176 m with the Locate stage alone to about 5 m (4.6 m in the main results), a roughly 38× improvement (176 / 4.6 ≈ 38).
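
For reference, the pixel IoU in ablation (ii) is presumably the standard intersection over union of binary masks. A minimal sketch follows, with masks as nested lists of 0/1 values; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def pixel_iou(pred, ref):
    """Intersection over union of two equally sized binary masks (nested lists)."""
    if len(pred) != len(ref) or any(len(p) != len(r) for p, r in zip(pred, ref)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            p, r = bool(p), bool(r)
            inter += p and r   # pixel set in both masks
            union += p or r    # pixel set in at least one mask
    # Two empty masks agree perfectly; treating that as IoU 1.0 is a convention.
    return inter / union if union else 1.0
```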

BibTeX

@inproceedings{plan2map2026,
  title     = {Plan2Map: A Multimodal Benchmark for Document-Grounded
               Geospatial Boundary Reconstruction from Planning Records},
  author    = {Degen, Fabian and Deb, Oishi and Gu, Jindong and Yu, Junchi
               and Marro, Samuele and Torr, Philip and Yu, Jialin},
  year      = {2026}
}

Data Attribution

Contains OS data © Crown copyright and database right 2026. Contains Royal Mail data © Royal Mail copyright and database right 2026. Contains National Statistics data © Crown copyright and database right 2026. Source planning documents and reference GeoJSON boundaries are reproduced from planning.data.gov.uk under the Open Government Licence v3.0.