Methodology · v2.0.2 · Released 2026-05-14

How OneGoodArea computes a UK area’s signals, scores, and trends.

Signal-first infrastructure. Country-scoped percentiles. Monthly time-series snapshots that cannot be backfilled. Deterministic engine, version stamped on every response.

01 The primitive

Signal is the public primitive.

A signal is one measured, sourced, normalized, percentiled, time-stamped attribute of a UK area. Everything above signals is composition. Reports, scores, peers, insights, and forecasts are surfaces built on top.

value

The raw measurement in its native unit: a number, a string, or null with a reason.

normalized_value

Direction-agnostic position 0–1 within country. Ascending: 0 = lowest, 1 = highest. Read with the signal’s direction field.

percentile

Per-scope rank 0–100 from PERCENT_RANK(). Today: national-within-country. Regional and per-cohort recomputes are on the roadmap.

confidence

0.0–1.0 with confidence_reason. Source-driven (sample size, freshness, fallback path). Honest, not aspirational.

SIGNAL · deprivation.imd_decile · E01000002 · v2.0.2
{
  "signal_key": "deprivation.imd_decile",
  "value": 9,
  "normalized_value": 0.967,
  "percentile": 96.7,
  "confidence": 1.0,
  "confidence_reason": "IMD 2025, England, fresh primary",
  "direction": "higher_is_better",
  "observed_period": "2025",
  "source_snapshot_id": "snap_imd2025_en_…",
  "engine_version": "2.0.2"
}

Every signal_value and timeseries row carries source_snapshot_id and engine_version. Re-running the same query against the same engine version returns the same number. Always.
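A minimal sketch of reading normalized_value together with direction. The helper name and the "lower_is_better" value are assumptions for illustration; only "higher_is_better" appears in the example above.

```python
def favourability(normalized_value, direction):
    """Return a 0-1 position where 1 is always the favourable end."""
    if not 0.0 <= normalized_value <= 1.0:
        raise ValueError("normalized_value must lie in [0, 1]")
    if direction == "higher_is_better":
        return normalized_value
    if direction == "lower_is_better":  # assumed counterpart value
        return 1.0 - normalized_value
    raise ValueError(f"unknown direction: {direction}")


signal = {
    "signal_key": "deprivation.imd_decile",
    "normalized_value": 0.967,
    "direction": "higher_is_better",
}
print(favourability(signal["normalized_value"], signal["direction"]))
```

normalized_value itself stays ascending; the flip happens only at read time, so stored values never depend on interpretation.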

02 Data sources

Seven public-record sources back every signal.

We name every source on this page. Marketing copy elsewhere says “multiple sources”; full provenance lives here and in source_snapshots on every API response.

01 · In production

Deprivation indices

MHCLG (IMD 2025), StatsWales (WIMD 2019), Scottish Gov (SIMD 2020)

Decile, rank, and the seven IMD domain scores per LSOA. Country-specific methodologies; we never compare across the border.

Coverage: England · Wales · Scotland
Cadence: Static per release
Grain: LSOA

02 · In production

HM Land Registry Price Paid

HM Land Registry

Standard residential sales (PPD category A, types D/S/T/F, non-deleted, positive prices). Median price + transaction count per LSOA per month.

Coverage: England & Wales
Cadence: Monthly
Grain: LSOA × month

03 · In production

Police.uk crime archive

Home Office

Bulk street-level archive joined to LSOA codes carried on each record. Aggregated to monthly count per LSOA. No spatial join needed.

Coverage: England · Wales · Scotland
Cadence: Monthly
Grain: LSOA × month

04 · In production

Ofsted inspections

Department for Education

School inspection ratings — Outstanding, Good, Requires Improvement, Inadequate — within 1.5km of postcode. England only.

Coverage: England
Cadence: Live (Ofsted API)
Grain: School (1.5km radius)

05 · In production

OpenStreetMap

OSM contributors via Overpass

Schools, food/shops, transport stations, bus stops, parks, healthcare. Live amenity counts at radius bands around the postcode.

Coverage: United Kingdom
Cadence: Live (Overpass)
Grain: 0.5–2km radius

06 · In production

Environment Agency flood

Defra

Flood risk zones and active warnings around the postcode. Distinguishes river-at-risk from active-warning states.

Coverage: United Kingdom
Cadence: Live (EA API)
Grain: 3–5km radius

07 · In production

Postcodes.io geocoding

ONS / Royal Mail (postcodes.io)

Postcode resolution: lat/long, LSOA, local authority, ward, constituency, region, country, rural-urban classification.

Coverage: United Kingdom
Cadence: Live (postcodes.io)
Grain: Postcode

03 How we store it

Live fetch, persisted store, geographic spine.

Slow-moving signals (deprivation, prices, crime) live in a persisted store. Live signals (Ofsted, OpenStreetMap, flood) call the source on every request. The geographic spine joins them.

Store tables

geo_entities

Canonical area registry: LSOAs, MSOAs, LADs, regions, with boundary version (2011 / 2021).

geo_lookup

ONS NSPL/ONSPD spine: postcode → OA / LSOA / MSOA / LAD / region. 1.8M postcodes loaded.

source_snapshots

One row per refresh run. Captures source name, captured_at, record count, optional sha. The audit anchor.

signals

Catalog of every signal we expose. signal_key, label, direction, unit, source, description.

signal_values

One row per (signal_key, geo_type, geo_code). Current value, normalized_value, source_snapshot_id, engine_version.

signal_percentiles

Per-scope percentile rank, 0 to 100. Computed in-DB via PERCENT_RANK() window function.

signal_timeseries

Append-only history. The primary key includes observed_period, and rows are written with INSERT … ON CONFLICT DO NOTHING, so corrections surface in the next period and never overwrite an existing one.

meta.fetch_mode on every response

live

Every contributing signal was fetched from a live source on this request. This is the fallback path, taken when the store has no row for an area.

store

Every contributing signal was read from the persisted store. No live calls were made. The path that scales.

hybrid

A mix: some signals were store-served and some live-served on this request. Honest about partial coverage during the live-to-store migration.
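The three modes reduce to a small rule over where each contributing signal came from. A sketch, assuming a hypothetical per-signal origin map (the real response may track provenance differently):

```python
def fetch_mode(signal_origins):
    """Collapse per-signal origins ("live" or "store") into one fetch_mode."""
    origins = set(signal_origins.values())
    if not origins:
        raise ValueError("no contributing signals")
    unexpected = origins - {"live", "store"}
    if unexpected:
        raise ValueError(f"unexpected origin(s): {sorted(unexpected)}")
    if origins == {"store"}:
        return "store"
    if origins == {"live"}:
        return "live"
    return "hybrid"


print(fetch_mode({"deprivation.imd_decile": "store",
                  "property.median_price": "store"}))  # store
print(fetch_mode({"deprivation.imd_decile": "store",
                  "flood.risk": "live"}))  # hybrid; "flood.risk" is a made-up key
```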

04 Normalization

Country-scoped percentiles, never cross-border.

Percentiles are computed in the database with PERCENT_RANK() window functions and persisted. Each country (England, Wales, Scotland) normalizes within itself. IMD 2025, WIMD 2019, and SIMD 2020 are different methodologies; we never compare across.
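A plain-Python sketch of what PERCENT_RANK() partitioned by country computes: (rank − 1) / (n − 1), with tied values sharing the lowest rank. The production computation runs in the database; the function name, row shape, and area codes here are illustrative only.

```python
from bisect import bisect_left
from collections import defaultdict


def percent_rank_by_country(rows):
    """rows: (geo_code, country, value) tuples. Returns {geo_code: 0-100}."""
    by_country = defaultdict(list)
    for geo_code, country, value in rows:
        by_country[country].append((geo_code, value))

    percentiles = {}
    for members in by_country.values():
        n = len(members)
        ordered = sorted(value for _, value in members)
        for geo_code, value in members:
            rank = bisect_left(ordered, value) + 1  # ties share the lowest rank
            percentiles[geo_code] = 0.0 if n == 1 else 100.0 * (rank - 1) / (n - 1)
    return percentiles


rows = [
    ("E-a", "E", 10), ("E-b", "E", 20), ("E-c", "E", 30),  # placeholder codes
    ("W-a", "W", 5), ("W-b", "W", 50),
]
print(percent_rank_by_country(rows))
```

W-b tops its own country at 100 even though its raw value is higher than every English row; the two partitions never see each other.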

scope = “national” · Live

Rank within country. The current production scope. England LSOAs ranked against England; Wales against Wales; Scotland against Scotland.

scope = “regional” · Roadmap

Rank within ONS region (North West, South East, …). The geographic spine is loaded; the per-region percentile recompute job is not yet built.

scope = “peer_group” · Roadmap

Rank within a customer-defined Levers cohort. Cohorts themselves ship today (POST /v1/orgs/:id/cohorts); per-cohort percentile recompute is the planned step. The k-NN peer graph already drives the peer-relative z-score derived signals (see § 06).

Direction: Ascending (0 lowest · 1 highest)
England LSOAs: 33,755 (2021 boundaries)
Wales LSOAs: 1,909 (2011 boundaries)
Scotland data zones: 6,976 (2011 boundaries)
05 The moat

Monthly snapshots, immutable per period.

Every month, a CLI job appends one row per signal per area to signal_timeseries, keyed by observed_period. INSERT … ON CONFLICT DO NOTHING. Corrections surface as next period’s value, never overwrite the past. Un-backfillable history that compounds every month.

Append-only

History is immutable per (signal_key, geo_type, geo_code, observed_period). The primary key prevents duplication; the conflict policy prevents overwrite.

Idempotent

The append job is one set-based INSERT … SELECT statement. Re-running a period is a no-op. Safe under partial failure and re-deploy.

Granular

Prices and crime ingest write monthly history directly; the static-source job (deprivation) snapshots at refresh time. Each signal’s observed_period reflects its actual cadence.
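The append-only and idempotent properties can be demonstrated end to end. In this sketch SQLite stands in for the production database, the table columns are a simplified assumption, and the price value is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE signal_values (
    signal_key TEXT, geo_type TEXT, geo_code TEXT, value REAL,
    PRIMARY KEY (signal_key, geo_type, geo_code)
);
CREATE TABLE signal_timeseries (
    signal_key TEXT, geo_type TEXT, geo_code TEXT,
    observed_period TEXT, value REAL,
    PRIMARY KEY (signal_key, geo_type, geo_code, observed_period)
);
""")


def append_period(period):
    """Snapshot current values into history; return rows actually inserted."""
    before = conn.total_changes
    # "WHERE true" is required by SQLite's parser when ON CONFLICT follows a SELECT.
    conn.execute(
        """
        INSERT INTO signal_timeseries
            (signal_key, geo_type, geo_code, observed_period, value)
        SELECT signal_key, geo_type, geo_code, ?, value
        FROM signal_values
        WHERE true
        ON CONFLICT DO NOTHING
        """,
        (period,),
    )
    conn.commit()
    return conn.total_changes - before


conn.execute(
    "INSERT INTO signal_values VALUES "
    "('property.median_price', 'lsoa', 'E01000002', 250000)"  # illustrative value
)
first = append_period("2025-01")   # writes one history row
conn.execute("UPDATE signal_values SET value = 260000")
rerun = append_period("2025-01")   # same period again: no-op, history untouched
later = append_period("2025-02")   # the new value surfaces in the next period
print(first, rerun, later)
```

The primary key rejects the duplicate period and DO NOTHING swallows the conflict, so a retried job can never rewrite a closed month.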

Prices history: 24 months (35,606 E&W LSOAs)
Price history rows: 626k+ (2024-2025 backfill)
Deprivation snapshot: 85,280 rows (IMD 2025 · WIMD · SIMD)
Append cadence: Monthly (GitHub Actions cron)

06 Derived signals

YoY, momentum, trend slope, peer-relative.

Computed in the database from the time-series, persisted alongside raw signals, immediately queryable through the typed query plane. Each derived signal carries a documented window and a sample-size guard.

signal_key | Methodology | Window | Confidence
property.price_change_pct_yoy | Count-weighted calendar-year YoY. Median weighted by transaction count, latest year vs prior year. | calendar | 0.85
crime.total_12m_change_pct_yoy | Rolling 12-month sum vs prior 12-month sum. Like-for-like windows; full-coverage guard. | rolling 12m | 0.85
property.transaction_count_change_pct_yoy | Rolling 12-month transaction-count YoY. Surfaces market liquidity shifts independent of price. | rolling 12m | 0.85
property.median_price_change_pct_6m | Latest 6-month window vs prior 6-month window. Count-weighted; full-window guard each side. | 6m | 0.85
crime.total_6m_change_pct | 6-month crime momentum. Strict full-window guard on both sides. | 6m | 0.85
crime.monthly_count_trend_slope_24m | Postgres regr_slope over a synthetic monthly index. More robust than two-point YoY at LSOA grain. | 24m (min 18) | 0.80
property.transaction_count_trend_slope_24m | 24-month linear-regression slope of transaction count. Multiply by 12 for annualized direction. | 24m (min 18) | 0.80
crime.total_12m_peer_relative_z | Z-score against the area's 20-LSOA peer cohort: (target − peer_avg) / peer_stddev. Min 5 peers. | current | 0.80
property.median_price_peer_relative_z | Peer-relative price z-score. Materialized peer graph; same distance metric as POST /v1/peers. | current | 0.80
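Two of the derivations above, sketched in plain Python rather than SQL: the rolling 12-month YoY with its full-coverage guard, and the 24-month trend slope with its 18-point minimum. Function names, the None-for-missing-month convention, and the zero-base guard are assumptions.

```python
def rolling_12m_change_pct_yoy(monthly):
    """monthly: counts oldest-first; None marks a missing month.

    Returns % change of the latest 12-month sum over the prior 12-month sum,
    or None when either window is incomplete or the prior sum is zero.
    """
    if len(monthly) < 24:
        return None
    prior, latest = monthly[-24:-12], monthly[-12:]
    if any(m is None for m in prior + latest):  # full-coverage guard
        return None
    base = sum(prior)
    if base == 0:  # assumed guard: no meaningful ratio against zero
        return None
    return 100.0 * (sum(latest) - base) / base


def trend_slope(monthly, window=24, min_points=18):
    """Least-squares slope of value against a synthetic 0..window-1 index."""
    recent = monthly[-window:]
    points = [(i, v) for i, v in enumerate(recent) if v is not None]
    if len(points) < min_points:  # sample-size guard
        return None
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


counts = [10] * 12 + [11] * 12
print(rolling_12m_change_pct_yoy(counts))          # 10.0
print(trend_slope([2 * i for i in range(24)]))     # slope per month
```

The slope is per month, so multiplying by 12 gives the annualized direction the table mentions.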
07 Scoring

Deterministic composites, four presets.

POST /v1/score aggregates the signal catalog into a 0–100 composite. Same input, same engine version, same output. Four presets cover the canonical workflows; custom weights and saved org presets layer on top.

Moving

moving

Origination. For someone choosing where to live.

  • Safety
  • Schools
  • Transport
  • Amenities
  • Cost of Living

Business

business

Site selection. For commercial location decisions.

  • Foot Traffic
  • Competition
  • Transport
  • Spending Power
  • Commercial Costs

Investing

investing

Investment underwrite. For acquisitions and portfolio.

  • Price Growth
  • Rental Yield
  • Regeneration
  • Tenant Demand
  • Risk Factors

Research

research

Reference baseline. The default preset.

  • Safety
  • Transport
  • Amenities
  • Demographics
  • Environment

The frozen v2 engine computes every dimension. Custom weights pass { preset, weights }; saved organisation presets pass { preset_id }. Response carries weights_source ("preset" or "custom") and the engine version that produced the number.
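A sketch of how preset and custom weights could combine dimension scores. The weighted mean and the equal default weights are assumptions made for illustration; the page does not publish the engine's actual weights or aggregation formula. The dimension lists are the ones named above.

```python
PRESET_DIMENSIONS = {
    "moving": ["Safety", "Schools", "Transport", "Amenities", "Cost of Living"],
    "business": ["Foot Traffic", "Competition", "Transport",
                 "Spending Power", "Commercial Costs"],
    "investing": ["Price Growth", "Rental Yield", "Regeneration",
                  "Tenant Demand", "Risk Factors"],
    "research": ["Safety", "Transport", "Amenities", "Demographics", "Environment"],
}
ENGINE_VERSION = "2.0.2"


def composite_score(dimension_scores, preset="research", weights=None):
    """Weighted mean of 0-100 dimension scores (assumed aggregation)."""
    dims = PRESET_DIMENSIONS[preset]
    if weights is None:
        weights, source = {d: 1.0 for d in dims}, "preset"  # assumed equal weights
    else:
        if set(weights) != set(dims):
            raise ValueError("weights must cover exactly the preset's dimensions")
        source = "custom"
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    score = sum(weights[d] * dimension_scores[d] for d in dims) / total
    return {"score": round(score, 1), "weights_source": source,
            "engine_version": ENGINE_VERSION}


scores = {"Safety": 80, "Transport": 60, "Amenities": 70,
          "Demographics": 50, "Environment": 90}  # invented dimension scores
print(composite_score(scores))
print(composite_score(scores, weights={"Safety": 3, "Transport": 1, "Amenities": 1,
                                       "Demographics": 1, "Environment": 1}))
```

The function is pure: no clock, no randomness, no I/O, which is what makes "same input, same engine version, same output" checkable.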

08 Derived surfaces

Similarity, anomaly, projection.

Three derived surfaces over the store. Each one is honest about what it is, what it isn’t, and what the defaults assume.

08.1

Peers

POST /v1/peers

k nearest LSOAs to a target. Euclidean distance over normalized values, dimension-mean-squared (AVG, not SUM), default k=20, max k=200, min 3 overlapping signals.

Distance is symmetric and bounded in [0,1]. No per-signal weighting in v1.
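A sketch of the distance, assuming it is the square root of the mean squared difference over the signals both areas share. The page specifies AVG rather than SUM but not whether a root is taken; either reading stays within [0,1] for normalized inputs. Area codes and signal keys are placeholders.

```python
import math


def peer_distance(a, b, min_overlap=3):
    """a, b: {signal_key: normalized_value in [0, 1]}; None if too little overlap."""
    shared = [k for k in a if a[k] is not None and b.get(k) is not None]
    if len(shared) < min_overlap:
        return None
    mean_sq = sum((a[k] - b[k]) ** 2 for k in shared) / len(shared)  # AVG, not SUM
    return math.sqrt(mean_sq)


def nearest_peers(target, candidates, k=20, max_k=200):
    """Return up to k (distance, geo_code) pairs, nearest first."""
    scored = []
    for geo_code, vector in candidates.items():
        d = peer_distance(target, vector)
        if d is not None:
            scored.append((d, geo_code))
    return sorted(scored)[:min(k, max_k)]


target = {"s1": 0.2, "s2": 0.5, "s3": 0.9}
candidates = {
    "AREA-1": {"s1": 0.2, "s2": 0.5, "s3": 0.8},
    "AREA-2": {"s1": 0.9, "s2": 0.1, "s3": 0.1},
    "AREA-3": {"s1": 0.2, "s2": 0.5},  # only 2 shared signals: skipped
}
print(nearest_peers(target, candidates, k=2))
```

Averaging over the shared dimensions keeps distances comparable between pairs that overlap on different numbers of signals, which a plain sum would not.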

08.2

Insights

POST /v1/insights

Anomaly screening: rank LSOAs by |peer-relative z|. Materialized peer graph (~840k assignments). Default k=50, max k=500. Optional |z| threshold.

Peer math is precomputed offline. No request-time recompute.
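The screening step can be sketched as a z-score followed by a sort on |z|. The formula and the 5-peer minimum come from § 06; using the sample standard deviation and skipping zero-spread cohorts are assumptions, as are the function names.

```python
from statistics import mean, stdev


def peer_relative_z(target, peer_values, min_peers=5):
    """(target - peer_avg) / peer_stddev, or None if the cohort is unusable."""
    if len(peer_values) < min_peers:
        return None
    spread = stdev(peer_values)  # sample std dev; population std dev is also plausible
    if spread == 0:
        return None
    return (target - mean(peer_values)) / spread


def rank_insights(z_by_area, k=50, max_k=500, min_abs_z=None):
    """Rank areas by |z| descending, optionally dropping those under a threshold."""
    rows = [
        (code, z) for code, z in z_by_area.items()
        if z is not None and (min_abs_z is None or abs(z) >= min_abs_z)
    ]
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:min(k, max_k)]


print(peer_relative_z(10, [2, 4, 4, 4, 5, 5, 7, 9]))
print(rank_insights({"A": 0.5, "B": -3.0, "C": 2.0, "D": None}, min_abs_z=1.0))
```

Ranking on the absolute value surfaces unusually low areas alongside unusually high ones; the sign is kept in the output so the direction is not lost.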

08.3

Forecast

POST /v1/forecast

Linear regression over signal_timeseries: regr_slope, regr_intercept, regr_r2, regr_syy. Default window 24 months, horizon 12. Constant ±2·residual_stderr confidence band.

This is not a learned model. Not ARIMA, not Holt-Winters, not Prophet. CI does not widen with horizon distance.
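A plain-Python sketch of the projection: an ordinary least-squares line over the window, extended by the horizon, with one fixed band. Taking residual_stderr as sqrt(SSE / (n − 2)) is an assumption; the page names the band but not the exact estimator.

```python
import math


def linear_forecast(series, window=24, horizon=12):
    """Fit y = intercept + slope * month_index; project `horizon` steps ahead."""
    y = series[-window:]
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 points for a residual estimate")
    mean_x = (n - 1) / 2
    mean_y = sum(y) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (yi - mean_y) for i, yi in enumerate(y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = max(syy - slope * sxy, 0.0)        # residual sum of squares
    band = 2 * math.sqrt(sse / (n - 2))      # constant: same width at every step
    points = []
    for step in range(1, horizon + 1):
        value = intercept + slope * (n - 1 + step)
        points.append({"step": step, "value": value,
                       "lower": value - band, "upper": value + band})
    return points


history = [100 + 2 * i + (1 if i % 2 else -1) for i in range(24)]  # invented series
for point in linear_forecast(history, horizon=3):
    print(point)
```

Because the band ignores the extra uncertainty of extrapolating further from the data, it reads as a residual-scatter envelope rather than a true prediction interval.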

09 Query plane

AI emits the plan. The database answers.

POST /v1/query is a typed JSON grammar with six plan ops. Programmatic {plan} never touches the LLM. Natural-language {question} routes through a planner that emits the same typed grammar. Every response echoes the executed plan and plan_source so every result is replayable.

  • rank_areas: Filter + sort LSOAs across signals with AND semantics (eq, lt, lte, gt, gte, between, percentile_*)
  • get_area: Full signal catalog for an area (geo_code, postcode, or area name)
  • score_area: Composite score for an area; preset, custom weights, or saved preset_id
  • find_peers: k-NN similarity search over normalized signal vectors (default k=20)
  • find_insights: Rank LSOAs by |peer-relative z| for a derived signal (anomaly screening)
  • find_forecast: Linear-regression projection of one monthly signal at one LSOA (default 24m window, 12m horizon)
POST /v1/query · plan_source: nl
“England LSOAs with price ≤ £250k AND YoY > 0 AND crime_pct ≤ 50 AND imd_pct ≥ 50, sort by YoY desc, limit 5”
{
  "op": "rank_areas",
  "params": {
    "country": "E",
    "signals": [
      { "key": "property.median_price", "filter": { "lte": 250000 } },
      { "key": "property.price_change_pct_yoy", "filter": { "gt": 0 } },
      { "key": "crime.total_12m", "filter": { "percentile_lte": 50 } },
      { "key": "deprivation.imd_decile", "filter": { "percentile_gte": 50 } }
    ],
    "sort_by": { "signal": "property.price_change_pct_yoy", "direction": "desc" },
    "limit": 5
  }
}
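To make the AND semantics concrete, here is a toy in-memory evaluator for a rank_areas plan. It is a sketch only: the real plan executes in the database, and the inclusive between, the country match on the geo_code prefix, and all area data below are assumptions.

```python
import operator

COMPARATORS = {"eq": operator.eq, "lt": operator.lt, "lte": operator.le,
               "gt": operator.gt, "gte": operator.ge}


def passes(cell, flt):
    """cell: {"value": ..., "percentile": ...}; every op in flt must hold."""
    for op, operand in flt.items():
        field = "value"
        if op.startswith("percentile_"):  # percentile_* ops test the percentile
            field, op = "percentile", op[len("percentile_"):]
        actual = cell.get(field)
        if actual is None:
            return False
        if op == "between":
            low, high = operand
            ok = low <= actual <= high  # assumed inclusive
        else:
            ok = COMPARATORS[op](actual, operand)
        if not ok:
            return False
    return True


def rank_areas(areas, params):
    """areas: {geo_code: {signal_key: cell}}. Returns matching geo_codes, sorted."""
    country = params.get("country")
    kept = []
    for code, signals in areas.items():
        if country and not code.startswith(country):
            continue
        if all(s["key"] in signals and passes(signals[s["key"]], s.get("filter", {}))
               for s in params["signals"]):
            kept.append((code, signals))
    sort = params["sort_by"]
    kept.sort(key=lambda item: item[1][sort["signal"]]["value"],
              reverse=sort["direction"] == "desc")
    return [code for code, _ in kept[:params["limit"]]]


def cell(value, percentile):
    return {"value": value, "percentile": percentile}


PRICE, YOY, CRIME = ("property.median_price", "property.price_change_pct_yoy",
                     "crime.total_12m")
areas = {  # invented areas and values
    "E-A": {PRICE: cell(200000, 30), YOY: cell(4.0, 60), CRIME: cell(120, 40)},
    "E-B": {PRICE: cell(240000, 45), YOY: cell(6.5, 80), CRIME: cell(90, 20)},
    "E-C": {PRICE: cell(300000, 70), YOY: cell(9.0, 95), CRIME: cell(60, 10)},
    "E-D": {PRICE: cell(180000, 20), YOY: cell(-1.0, 15), CRIME: cell(200, 80)},
    "W-A": {PRICE: cell(150000, 10), YOY: cell(8.0, 90), CRIME: cell(50, 10)},
}
params = {
    "country": "E",
    "signals": [
        {"key": PRICE, "filter": {"lte": 250000}},
        {"key": YOY, "filter": {"gt": 0}},
        {"key": CRIME, "filter": {"percentile_lte": 50}},
    ],
    "sort_by": {"signal": YOY, "direction": "desc"},
    "limit": 5,
}
print(rank_areas(areas, params))  # ['E-B', 'E-A']
```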
Planner accuracy: 92.9% (14-case curated corpus)
Plan ops: 6 (rank · get · score · peers · insights · forecast)
Filter ops: 11 (eq · lt · lte · gt · gte · between · percentile_*)
Model under test: claude-sonnet-4 (harness measures the seam, not the model)

10 Confidence

Per-signal, source-driven, honest.

Every signal value and every score dimension carries confidence (0.0–1.0) and a human-readable confidence_reason. The rubric is fixed; the inputs are sample size, freshness, fallback path, and (for property) YoY volatility.

Band | Value | Criteria | Example
HIGH | 1.0 | Fresh primary data, sufficient sample, low volatility. | Crime from Police.uk last 12 months; Prices with ≥50 transactions and ≤15% YoY swing.
MEDIUM | 0.7 | Partial fallback, older dataset, smaller sample, or higher volatility. | WIMD 2019 / SIMD 2020 (older than IMD 2025); Schools in Wales/Scotland; Property with 20-50 transactions or wide YoY swing.
LOW | 0.4 | Full proxy fallback or sparse sample. | Property with fewer than 20 transactions per period.
NONE | 0.2 | No usable data; signal returns null with reason. | Service unavailable, coverage gap, or out-of-region postcode.

Inferred-not-measured dimensions (Foot Traffic, Rental Yield, Regeneration, Tenant Demand) cap at MEDIUM by design. Monitor change detection gates on min_transactions (default 8) so a 2-sale move never fires a webhook.
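The property examples in the rubric translate into a short decision function. A sketch using the thresholds quoted above; the function name, the reason strings, and treating a missing transaction count as NONE are assumptions.

```python
def property_confidence(transactions, yoy_swing_pct):
    """Return (confidence, band, confidence_reason) for a property signal."""
    if transactions is None:
        return 0.2, "NONE", "no usable data"
    if transactions < 20:
        return 0.4, "LOW", f"sparse sample ({transactions} transactions)"
    steady = yoy_swing_pct is not None and abs(yoy_swing_pct) <= 15
    if transactions >= 50 and steady:
        return 1.0, "HIGH", "sufficient sample, low volatility"
    return 0.7, "MEDIUM", "smaller sample or wide YoY swing"


for sample in [(80, 10.0), (80, 22.0), (30, 5.0), (12, 5.0), (None, None)]:
    print(sample, property_confidence(*sample))
```

Returning the reason alongside the number keeps every downgrade explainable to whoever reads the response.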

11 Reproducibility & versioning

Engine version stamped on every response.

Every response carries engine_version in the body and X-Engine-Version in the headers. Pin a request to a specific version with the request header. Pin a whole organisation through Levers methodology pinning.

Bump | Meaning
MAJOR | Breaking change to dimension structure, intent set, or core weight. Anything that would invalidate prior scores.
MINOR | New dimension, new data source, new intent. Additive: old responses still parse.
PATCH | Formula tuning, threshold adjustment, confidence rubric refinement. Score values stay byte-identical.

Current engine: v2.0.2

OpenStreetMap data-source reliability hardening. Transport scoring no longer degrades to NONE confidence for UK city centres. No formula changes.

Released 2026-05-14 · History: 6 versions

Org-level methodology pinning is owner-only. PUT /v1/orgs/:id/methodology sets the pin; every scoring response from that org’s keys stamps the pinned version in X-Engine-Version. Explicit request headers still win over the org pin. The pinned row stays in the database for audit even after a version ages out of support; at runtime the engine falls back gracefully to the latest version rather than returning a 500.
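The precedence described here (request header, then org pin, then latest) can be sketched as a resolver. Applying the same fallback to an unsupported header value is an assumption, since the page only describes it for an aged-out pin; versions other than 2.0.2 are invented.

```python
def resolve_engine_version(request_header, org_pin, supported, latest):
    """Pick the engine version for a request: header > org pin > latest."""
    for candidate in (request_header, org_pin):
        if candidate is not None and candidate in supported:
            return candidate
    return latest  # nothing usable requested: serve latest instead of erroring


supported = {"2.0.1", "2.0.2"}  # hypothetical supported set
print(resolve_engine_version("2.0.1", "2.0.2", supported, "2.0.2"))  # header wins
print(resolve_engine_version(None, "2.0.1", supported, "2.0.2"))     # org pin
print(resolve_engine_version(None, "1.4.0", supported, "2.0.2"))     # aged-out pin
```

Whatever the resolver returns is the value stamped into X-Engine-Version, so a client can always see which version actually answered.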

12 Per-organisation methodology

Four Levers shape how the API behaves for your keys.

Levers are the per-organisation methodology controls. All four are opt-in: no bundle, no preset_id, no pin, no cohort means default behaviour. RBAC, white-label, and IP allowlist live alongside on the operational side.

12.1 · Admin-only

Signal bundles

POST /v1/orgs/:id/bundles

Named whitelist of signal_keys. Scopes /v1/area, /v1/areas, and /v1/query — the API exposes only the bundle's signals to your keys.

A request for an out-of-bundle signal returns 422 bundle_signal_not_allowed. No silent omission.
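A sketch of the enforcement rule: reject the whole request rather than quietly dropping signals. The exception class and function name are hypothetical; the status code and error code are the ones quoted above.

```python
class BundleSignalNotAllowed(Exception):
    """Raised when a request names signals outside the org's bundle."""
    status = 422
    code = "bundle_signal_not_allowed"

    def __init__(self, blocked):
        super().__init__(f"{self.code}: {', '.join(blocked)}")
        self.blocked = blocked


def enforce_bundle(requested, bundle):
    """Return the requested keys unchanged, or raise if any fall outside the bundle."""
    if bundle is None:  # no bundle configured: default behaviour
        return list(requested)
    blocked = [key for key in requested if key not in bundle]
    if blocked:
        raise BundleSignalNotAllowed(blocked)
    return list(requested)


bundle = {"deprivation.imd_decile", "property.median_price"}
print(enforce_bundle(["deprivation.imd_decile"], bundle))
try:
    enforce_bundle(["deprivation.imd_decile", "crime.total_12m"], bundle)
except BundleSignalNotAllowed as err:
    print(err.status, err)
```

Failing loudly means a caller never mistakes a filtered response for a complete one.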

12.2 · Admin-only

Scoring presets

POST /v1/orgs/:id/presets

Save a (base_preset, weights) pair server-side. Call by preset_id from POST /v1/score. Reusable across team members and replayable in audits.

Frozen v2 engine unchanged. weights_source surfaces as "custom" on response.

12.3 · Owner-only

Methodology pinning

PUT /v1/orgs/:id/methodology

Pin the whole org to a specific engine_version. Every scoring response from the org's keys stamps that version in X-Engine-Version.

Owner-only by design: misclicking a pin has high cost for regulator-facing audits. Explicit request header still wins.

12.4 · Admin-only

Peer cohorts

POST /v1/orgs/:id/cohorts

Named list of LSOA codes (max 10,000). Scopes the candidate pool on /v1/peers — "find peers in MY universe."

Cohorts ship today; per-cohort percentile recompute (scope=peer_group) is on the roadmap.

Three-tier RBAC (member / admin / owner), per-org white-label (display_name + brand_url), and per-key IP allowlist (allowed_ip_cidrs) are documented in full on /docs/levers (landing in this workstream).

13 Scope & limitations

What this is, and what it isn’t.

Said up front to save reviewer time. The system is decision-grade screening + analysis; not valuation, not lending, not address-level today.

Not

An automated valuation model

OneGoodArea does not predict the market value of a specific property. Use a dedicated AVM for that.

Not

A credit decisioning model

Not a predictor of individual default, affordability, or creditworthiness. Tier-3 enrichment input only.

Not

Address-level

LSOA grain is the floor today. Address-level scoring via OS AddressBase Premium + UPRN is on the roadmap (AR-134).

MAUP

Modifiable Areal Unit Problem

Scores within 100m of an LSOA boundary deserve a closer look. Postcode and LSOA boundaries are administrative, not behavioural.

Fair lending

Protected-characteristic correlation

Deprivation indices correlate with protected characteristics. Buyers are responsible for FCA / CONC / SS1/23 compliance in regulated workflows.

Build on the data layer underneath UK property workflows.

A typed signal API, four product surfaces, monthly time-series history, and org-level methodology pinning. Same answer, every time you ask.