Expansion Model Documentation
Reference for all ML models, features, scoring logic, and data sources used in the expansion ranking tool.
Demand Forecast Model (Orders/day)
Predicts steady-state daily orders for a new location. Trained on all live stores with at least 90 days of operation. Uses 4 engineered ratio features derived from Targomo geospatial data (footfall counts, POI composition, hourly footfall patterns). Raw counts are avoided — ratios generalise better across city sizes.
| Feature | Formula | What it captures |
|---|---|---|
| footfall_per_capita | daily_footfall_30m / pop_total_10m | Through-traffic intensity relative to resident population. High values = transit/commercial zones where many people pass but few live. The strongest predictor of order volume. |
| gastro_share_10m | poi_gastronomy / total_POI_10m | Share of all POIs within a 10-min walk that are food & drink venues (cafes, restaurants, bars). A high gastronomy share signals an “eating-out culture” neighbourhood where coffee demand is elevated. Counter-intuitively, more nearby competition goes with more demand, not less. |
| morning_evening_ratio | footfall(8–11am) / footfall(3–5pm) | Ratio of morning coffee-hour footfall (8–11am) to afternoon commute footfall (3–5pm). Values > 1 mean more morning than afternoon traffic, which is ideal for a coffee shop. Business districts and residential routes score high. Pure transit hubs score low because afternoon commuters dominate. |
| footfall_concentration | peak_hour_footfall / daily_footfall | Share of daily footfall occurring in the single busiest hour. Captures how “spiky” the foot traffic is. High concentration (e.g. train station at rush hour) can inflate daily totals while actual trading opportunity is compressed into a short window. |
Data source: Targomo geospatial API — 30-min walk isochrone for footfall, 10-min walk isochrone for POI & demographics.
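The four ratios in the table above can be sketched as follows. This is a minimal illustration, not the production pipeline: the function and parameter names are hypothetical, daily footfall is assumed to be the sum of 24 hourly counts, and each time window is assumed to be a plain sum of its hourly counts.

```python
from typing import Dict, Optional, Sequence


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def demand_features(
    hourly_footfall_30m: Sequence[float],  # assumed: 24 values, index = hour of day
    pop_total_10m: float,
    poi_gastronomy: float,
    total_poi_10m: float,
) -> Dict[str, Optional[float]]:
    """Compute the four ratio features from raw catchment counts."""
    daily = sum(hourly_footfall_30m)
    morning = sum(hourly_footfall_30m[8:11])     # hours 8, 9, 10 cover 8-11am
    afternoon = sum(hourly_footfall_30m[15:17])  # hours 15, 16 cover 3-5pm
    return {
        "footfall_per_capita": safe_ratio(daily, pop_total_10m),
        "gastro_share_10m": safe_ratio(poi_gastronomy, total_poi_10m),
        "morning_evening_ratio": safe_ratio(morning, afternoon),
        "footfall_concentration": safe_ratio(max(hourly_footfall_30m), daily),
    }
```

Returning None for a zero denominator keeps an empty catchment (for example, no resident population) visible as missing data rather than as an infinite or zero ratio.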
Net AOV Model
Predicts steady-state Net AOV (average order value after discounts, before VAT) for a new location. Uses 3 features. More accurate than the demand model because AOV varies less across locations.
| Feature | What it captures |
|---|---|
| spending_avg_10m | Average consumer spending in the 10-min catchment. Higher-income areas spend more per order on premium items. |
| city_median_aov | Median AOV across all live stores in the same city. Captures city-level pricing differences (e.g. Munich vs Berlin). |
| tourist_share | Proportion of visitors classified as tourists. Tourists tend to order larger, higher-margin items (food combos, large formats). |
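As an illustration of the city_median_aov feature, the sketch below groups live stores by city and takes the median Net AOV per city. The input shape (pairs of city and steady-state Net AOV) and the function name are assumptions made for the example; the real values would come from the store operations data.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List, Tuple


def city_median_aov(stores: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Median Net AOV per city from (city, net_aov) pairs of live stores."""
    by_city: Dict[str, List[float]] = defaultdict(list)
    for city, net_aov in stores:
        by_city[city].append(net_aov)
    return {city: median(values) for city, values in by_city.items()}
```

A lead would then look up the median for its own city. How a city with no live stores is handled is not covered by this sketch.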
Confidence Tiers
Each prediction is assigned a confidence tier based on the prediction interval width relative to the predicted value. Narrow intervals mean the model is consistent with training data; wide intervals indicate the location is unusual.
| Tier | Interval width | Interpretation |
|---|---|---|
| high | < 40% of predicted | Strong signal, similar to well-understood training locations. |
| medium | 40–80% of predicted | Moderate uncertainty — treat prediction as directional guidance. |
| low | > 80% of predicted | High uncertainty — location profile is unusual; field validation critical. |
Click any confidence badge in the expansion table to see the full SHAP breakdown for that location.
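The tier rule in the table above can be written as a small function. This is a sketch under two assumptions that the table does not state: interval width is taken as upper bound minus lower bound, and the exact boundaries (40% and 80%) fall into the medium tier. The function name is hypothetical.

```python
def confidence_tier(predicted: float, lower: float, upper: float) -> str:
    """Map a prediction interval to a confidence tier by its relative width."""
    if predicted <= 0:
        raise ValueError("predicted must be positive to compute a relative width")
    relative_width = (upper - lower) / predicted
    if relative_width < 0.40:
        return "high"
    if relative_width <= 0.80:  # assumption: boundary values count as medium
        return "medium"
    return "low"
```

For example, a prediction of 200 orders/day with an interval of 150 to 250 has a relative width of 0.5 and falls into the medium tier.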
Location Segmentation (v2)
Each location (live stores + leads) is classified into a primary segment and a set of overlay attributes. Segments are mutually exclusive and derived from unsupervised clustering (k-means, k=7) on Targomo features, then labelled by analysts. Attributes are independent overlays computed with rule-based thresholds.
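To make the clustering step concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It is not the production code: the real segmentation uses k=7 on Targomo features, while the feature scaling, the initialisation, and the way new leads are assigned are not documented here. The first-k-points initialisation and the nearest-centroid assignment below are assumptions chosen to keep the example deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(
    points: Sequence[Point], k: int, iterations: int = 50
) -> Tuple[List[List[float]], List[int]]:
    """Return (centroids, labels) after at most `iterations` Lloyd updates."""
    # Assumption for the sketch: initialise with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels
```

The cluster indices carry no meaning on their own. The segment names in the table below come from analysts inspecting each cluster afterwards.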
Primary Segments
| Segment | Characteristics | Typical performance |
|---|---|---|
| transport hub | Adjacent to train/subway/bus. Highest footfall, lowest residential ratio. Demand comes from commuters passing through. | High orders, lower loyalty, AOV driven by commute items. |
| tourist commercial | Museums, hotels, gift shops. High tourist share, seasonal amplitude, accommodation POIs. | High AOV (tourists order more), seasonal volatility. |
| business district | Office-heavy. People commute in to work but do not live there. High spending, dead on weekends. | Strong Mon–Fri, weak weekend. High wallet share potential. |
| neighborhood hub | Local high street. Supermarket, bank, pharmacy. Mixed residents + nearby workers. | Stable, repeat customers, high loyalty potential. |
| residential | Quiet streets, high pop density, low footfall. Requires strong local marketing. | Lower peak orders, high wallet share if loyalty is built. |
| student quarter | University-adjacent. Young demographic, nightlife, lower income, high gastronomy share. | Good volume, low AOV, price-sensitive. |
| mixed | No single segment has a dominant signal. Baseline classification. | Varies — treat as “unclassified”, investigate on-site. |
Overlay Attributes
Flag Scoring (Traffic / Econ / Lease / RE)
Each location in the expansion table gets four coloured dots computed from rule-based thresholds. Click any dot to see the underlying metrics and explanation.
Traffic: measures demand signal quality from footfall volume, residential population, and ML-predicted orders.
footfall_daily_30m is the daily visitor count passing through the 30-minute walking catchment (Targomo). Values in the low hundreds are common for most urban leads; the threshold of 5,000 corresponds to major transit zones.
Econ: compares predicted revenue potential against rent burden.
Lease: evaluates upfront cash requirements and negotiating outcome.
RE: rates on-site physical quality, scored from field survey data entered in the pipeline.
RE flags require field survey data (corner, seating, brightness, retail density) entered in the pipeline tool. Leads without field visits will score low by default — this does not mean the location is poor.
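Because a low RE score can mean either a poor site or a missing survey, it can help to check which survey fields are absent before reading the flag. The sketch below is hypothetical: the field keys are guesses based on the four survey items named above, not the pipeline tool's actual column names, and a value of None is assumed to mean "not surveyed".

```python
from typing import Any, List, Mapping

# Hypothetical keys for the four survey items; the real column names may differ.
RE_SURVEY_FIELDS = ("corner", "seating", "brightness", "retail_density")


def missing_re_fields(lead: Mapping[str, Any]) -> List[str]:
    """List the RE survey fields that are absent or None for a lead."""
    return [field for field in RE_SURVEY_FIELDS if lead.get(field) is None]


def needs_field_visit(lead: Mapping[str, Any]) -> bool:
    """True when the RE flag would rest on default values, not survey data."""
    return bool(missing_re_fields(lead))
```

A lead for which needs_field_visit returns True should be read as "not yet assessed" rather than "poor location".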
Data Sources
| Source | Provides | BQ table |
|---|---|---|
| Targomo API | Footfall counts by hour, POI composition, demographic layers (population, income, age, tourist share) — isochrone-based (10-min walk, 30-min walk) | Expansion.targomo_pipeline_cleaned_v4 |
| Store operations (BQ) | Actual daily orders, steady-state calculation, Net AOV — used for model training and validation | Intermediate.int_store_steady_state |
| Pipeline tool (SQLite) | Lead-specific survey data: cold rent, sqm, corner location, outdoor seating, key money, broker fees, retail density assessments | Local SQLite DB |
| Expansion leads model output | Scored leads: predicted orders, AOV, confidence tier, SHAP values | Models.expansion_leads_scored |