Expansion Model Documentation

Reference for all ML models, features, scoring logic, and data sources used in the expansion ranking tool.

Demand Forecast Model (Orders/day)

Algorithm: Ridge Regression (v2)
Cross-validation: Leave-One-Out (LOOCV)
Accuracy: ~18% MAPE

Predicts steady-state daily orders for a new location. Trained on all live stores with at least 90 days of operation. Uses 4 engineered ratio features derived from Targomo geospatial data (footfall counts, POI composition, hourly footfall patterns). Raw counts are avoided — ratios generalise better across city sizes.

| Feature | Formula | What it captures |
| --- | --- | --- |
| footfall_per_capita | daily_footfall_30m / pop_total_10m | Through-traffic intensity relative to resident population. High values indicate transit/commercial zones where many people pass but few live. The strongest predictor of order volume. |
| gastro_share_10m | poi_gastronomy / total_POI_10m | Share of all POIs within a 10-min walk that are food & drink venues (cafes, restaurants, bars). A high gastronomy share signals an “eating-out culture” neighbourhood where coffee demand is elevated. Counter-intuitive: more competition = more demand. |
| morning_evening_ratio | footfall(8–11am) / footfall(3–5pm) | Ratio of morning coffee-hour footfall (8–11am) to afternoon commute footfall (3–5pm). Values > 1 mean more morning than afternoon traffic, which is ideal for a coffee shop. Business districts and residential routes score high. Pure transit hubs score low (evening commuters dominate). |
| footfall_concentration | peak_hour_footfall / daily_footfall | Share of daily footfall occurring in the single busiest hour. Captures how “spiky” the foot traffic is. High concentration (e.g. a train station at rush hour) can inflate daily totals while the actual trading opportunity is compressed into a short window. |

Data source: Targomo geospatial API — 30-min walk isochrone for footfall, 10-min walk isochrone for POI & demographics.
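The four ratio formulas above can be sketched as a small helper. This is an illustrative sketch only: the `CatchmentStats` container, its field names, and the `safe_ratio` guard for empty denominators are assumptions made for the example, not the pipeline's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CatchmentStats:
    """Hypothetical container for raw catchment counts (field names are assumptions)."""
    daily_footfall_30m: float   # daily footfall, 30-min walk isochrone
    pop_total_10m: float        # resident population, 10-min walk isochrone
    poi_gastronomy_10m: float   # food & drink POIs, 10-min walk isochrone
    total_poi_10m: float        # all POIs, 10-min walk isochrone
    footfall_8_11: float        # footfall between 8 and 11am
    footfall_15_17: float       # footfall between 3 and 5pm
    peak_hour_footfall: float   # footfall in the single busiest hour


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    if not denominator:
        return None
    return numerator / denominator


def ratio_features(s: CatchmentStats) -> dict:
    """Compute the four engineered ratios from raw counts."""
    return {
        "footfall_per_capita": safe_ratio(s.daily_footfall_30m, s.pop_total_10m),
        "gastro_share_10m": safe_ratio(s.poi_gastronomy_10m, s.total_poi_10m),
        "morning_evening_ratio": safe_ratio(s.footfall_8_11, s.footfall_15_17),
        "footfall_concentration": safe_ratio(s.peak_hour_footfall, s.daily_footfall_30m),
    }


# Made-up numbers, purely for illustration.
example = CatchmentStats(3000, 12000, 30, 120, 600, 400, 450)
features = ratio_features(example)
```

Returning `None` for an empty denominator is one possible design choice; the real pipeline may impute or drop such locations instead.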

Net AOV Model

Algorithm: Ridge Regression
Cross-validation: LOOCV
Accuracy: 5.0% MAPE

Predicts steady-state Net AOV (average order value after discounts, before VAT) for a new location. Uses 3 features. More accurate than the demand model because AOV varies less across locations.

| Feature | What it captures |
| --- | --- |
| spending_avg_10m | Average consumer spending in the 10-min catchment. Higher-income areas spend more per order on premium items. |
| city_median_aov | Median AOV across all live stores in the same city. Captures city-level pricing differences (e.g. Munich vs Berlin). |
| tourist_share | Proportion of visitors classified as tourists. Tourists tend to order larger, higher-margin items (food combos, large formats). |
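Both models report accuracy as leave-one-out MAPE. As a rough sketch of how such a figure is computed, the example below fits a ridge regression on all stores but one, predicts the held-out store, and averages the absolute percentage errors. To stay dependency-free it uses a single feature and the closed-form one-dimensional ridge solution; the real models use several features, and the regularisation strength `alpha` here is an assumed placeholder.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge fit for one feature with an unpenalised intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The penalty alpha shrinks the slope towards zero.
    # (With alpha == 0 and identical xs this would divide by zero.)
    slope = sxy / (sxx + alpha)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def loocv_mape(xs, ys, alpha=1.0):
    """Leave-one-out cross-validated mean absolute percentage error (as a fraction)."""
    errors = []
    for i in range(len(xs)):
        # Train on every store except store i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_ridge_1d(train_x, train_y, alpha)
        predicted = slope * xs[i] + intercept
        errors.append(abs(predicted - ys[i]) / abs(ys[i]))
    return sum(errors) / len(errors)


# Made-up data: one ratio feature and observed daily orders per store.
feature = [0.10, 0.15, 0.22, 0.30, 0.41]
orders = [95.0, 120.0, 150.0, 170.0, 230.0]
mape = loocv_mape(feature, orders, alpha=0.01)
```

Because each store is scored by a model that never saw it, LOOCV gives a less optimistic error estimate than in-sample fit, which matters when the training set is only a few dozen stores.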

Confidence Tiers

Each prediction is assigned a confidence tier based on the prediction interval width relative to the predicted value. Narrow intervals mean the model is consistent with training data; wide intervals indicate the location is unusual.

| Tier | Interval width | Interpretation |
| --- | --- | --- |
| high | < 40% of predicted | Strong signal, similar to well-understood training locations. |
| medium | 40–80% of predicted | Moderate uncertainty: treat the prediction as directional guidance. |
| low | > 80% of predicted | High uncertainty: the location profile is unusual and field validation is critical. |

Click any confidence badge in the expansion table to see the full SHAP breakdown for that location.
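The tier thresholds above can be sketched as a small function. The function name and the `(lower, upper)` interval arguments are assumptions for illustration; only the 40% and 80% cut-offs come from the table.

```python
def confidence_tier(predicted: float, lower: float, upper: float) -> str:
    """Map a prediction interval to a tier using its width relative to the prediction."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    relative_width = (upper - lower) / predicted
    if relative_width < 0.40:
        return "high"
    if relative_width <= 0.80:
        return "medium"
    return "low"


# Made-up intervals around a prediction of 150 orders/day.
narrow = confidence_tier(150, 130, 170)   # width 40 / 150
wide = confidence_tier(150, 60, 240)      # width 180 / 150
```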

Location Segmentation (v2)

Each location (live stores + leads) is classified into a primary segment and a set of overlay attributes. Segments are mutually exclusive and derived from unsupervised clustering (k-means, k=7) on Targomo features, then labelled by analysts. Attributes are independent overlays computed with rule-based thresholds.

Primary Segments

| Segment | Characteristics | Typical performance |
| --- | --- | --- |
| transport hub | Adjacent to train/subway/bus. Highest footfall, lowest residential ratio. Commuter extraction. | High orders, lower loyalty, AOV driven by commute items. |
| tourist commercial | Museums, hotels, gift shops. High tourist share, seasonal amplitude, accommodation POIs. | High AOV (tourists order more), seasonal volatility. |
| business district | Office-heavy. Commuters come here to work, not to live. High spending, dead on weekends. | Strong Mon–Fri, weak weekend. High wallet share potential. |
| neighborhood hub | Local high street. Supermarket, bank, pharmacy. Mixed residents + nearby workers. | Stable, repeat customers, high loyalty potential. |
| residential | Quiet streets, high pop density, low footfall. Requires strong local marketing. | Lower peak orders, high wallet share if loyalty is built. |
| student quarter | University-adjacent. Young demographic, nightlife, lower income, high gastronomy share. | Good volume, low AOV, price-sensitive. |
| mixed | No single segment has a dominant signal. Baseline classification. | Varies. Treat as “unclassified” and investigate on-site. |

Overlay Attributes

wallet_share_index: 0–1. Strength of local/residential character. Higher values → higher probability of wallet payment adoption.
loyalty_potential: 0–1. Likelihood of building a repeat-customer base. Derived from residential density, local services proximity, and daytime worker concentration.
tourist_intensity: none / low / medium / high. Share of visitors classified as tourists by Targomo.
income_tier: low / mid / upper_mid / high. Based on average consumer spending in catchment.
competition_density: low / medium / high. Number of coffee shops within 10-min walk relative to city median.
nightlife_density: low / medium / high. Bars, clubs, late-night venues within catchment.
age_profile: young / balanced / mature. Derived from Targomo demographic layers.
seasonal_pattern: stable / summer_peak / winter_peak. Historical footfall seasonality.
family_area: Boolean. High kindergarten + school + child population density.
daytime_worker_spike: Boolean. Commuter-heavy footfall pattern: footfall drops sharply on weekends.
student_presence: Boolean. University or college within 10-min walk.
retail_density: low / medium / high. Density of retail brand POIs (clothing, accessories, etc.).
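One way to represent a classified location is a single mutually exclusive segment plus a validated set of overlays. The sketch below is hypothetical: the class names and the validation logic are assumptions for illustration, and only the segment names, attribute names, and allowed values are taken from the lists above.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    TRANSPORT_HUB = "transport hub"
    TOURIST_COMMERCIAL = "tourist commercial"
    BUSINESS_DISTRICT = "business district"
    NEIGHBORHOOD_HUB = "neighborhood hub"
    RESIDENTIAL = "residential"
    STUDENT_QUARTER = "student quarter"
    MIXED = "mixed"


# Allowed values for the categorical overlays, copied from the attribute list.
ALLOWED = {
    "tourist_intensity": {"none", "low", "medium", "high"},
    "income_tier": {"low", "mid", "upper_mid", "high"},
    "competition_density": {"low", "medium", "high"},
    "nightlife_density": {"low", "medium", "high"},
    "age_profile": {"young", "balanced", "mature"},
    "seasonal_pattern": {"stable", "summer_peak", "winter_peak"},
    "retail_density": {"low", "medium", "high"},
}


@dataclass(frozen=True)
class Overlays:
    wallet_share_index: float
    loyalty_potential: float
    tourist_intensity: str
    income_tier: str
    competition_density: str
    nightlife_density: str
    age_profile: str
    seasonal_pattern: str
    family_area: bool
    daytime_worker_spike: bool
    student_presence: bool
    retail_density: str

    def __post_init__(self):
        # Index attributes must lie in [0, 1].
        for name in ("wallet_share_index", "loyalty_potential"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1")
        # Categorical attributes must use one of the documented labels.
        for name, allowed in ALLOWED.items():
            if getattr(self, name) not in allowed:
                raise ValueError(f"{name} must be one of {sorted(allowed)}")


@dataclass(frozen=True)
class LocationProfile:
    segment: Segment      # exactly one primary segment
    overlays: Overlays    # independent overlay attributes


# Made-up example values.
profile = LocationProfile(
    Segment.NEIGHBORHOOD_HUB,
    Overlays(0.7, 0.8, "low", "upper_mid", "medium", "low",
             "balanced", "stable", True, False, False, "medium"),
)
```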

Flag Scoring (Traffic / Econ / Lease / RE)

Each location in the expansion table gets four coloured dots computed from rule-based thresholds. Click any dot to see the underlying metrics and explanation.

Traffic

Measures demand signal quality: footfall volume, residential population, and ML-predicted orders.

pred_orders ≥ 200: +2 pts
pred_orders ≥ 130: +1 pt
footfall_daily_30m ≥ 5,000: +2 pts
footfall_daily_30m ≥ 2,000: +1 pt
pop_total_10m ≥ 30,000: +1 pt
Total ≥ 4 → green, ≥ 2 → yellow, < 2 → red

footfall_daily_30m is the daily visitor count passing through the 30-minute walking catchment (Targomo). Values in the low hundreds are common for most urban leads — the threshold of 5,000 corresponds to major transit zones.
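The Traffic rules can be sketched as follows. The sketch assumes that the two thresholds for the same metric are tiers (a location earns the points of the highest tier it reaches, not both); the source does not state this explicitly. Function names are illustrative.

```python
def tiered_points(value, tiers):
    """Return the points of the first threshold met; tiers are sorted high to low."""
    for threshold, points in tiers:
        if value >= threshold:
            return points
    return 0


def traffic_flag(pred_orders, footfall_daily_30m, pop_total_10m):
    """Return (total points, colour) for the Traffic dot."""
    total = (
        tiered_points(pred_orders, [(200, 2), (130, 1)])
        + tiered_points(footfall_daily_30m, [(5000, 2), (2000, 1)])
        + (1 if pop_total_10m >= 30000 else 0)
    )
    if total >= 4:
        colour = "green"
    elif total >= 2:
        colour = "yellow"
    else:
        colour = "red"
    return total, colour


# Made-up lead: decent predicted orders, mid footfall, small population.
result = traffic_flag(pred_orders=140, footfall_daily_30m=2500, pop_total_10m=10000)
```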

Economics

Compares predicted revenue potential against rent burden.

pred_orders ≥ 200: +2 pts
pred_orders ≥ 130: +1 pt
cold_rent / (orders × 30 × €4.50) < 15%: +2 pts (rent is affordable)
ratio < 25%: +1 pt (moderate rent burden)
No rent data: +1 pt (neutral)
Total ≥ 3 → green, ≥ 1 → yellow, 0 → red
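As a worked sketch of the Economics rules: the rent ratio divides rent by an approximate monthly revenue of orders × 30 days × €4.50. The sketch assumes `cold_rent` is a monthly figure, that €4.50 is a fixed per-order value, and that thresholds for the same metric are tiers rather than cumulative; none of these is stated explicitly in the rules. Function names are illustrative.

```python
ORDER_VALUE_EUR = 4.50   # constant from the rule above
DAYS_PER_MONTH = 30


def rent_ratio(cold_rent, pred_orders):
    """Rent as a fraction of approximate monthly revenue."""
    if pred_orders <= 0:
        return float("inf")  # no revenue: treat rent burden as unbounded
    return cold_rent / (pred_orders * DAYS_PER_MONTH * ORDER_VALUE_EUR)


def economics_flag(pred_orders, cold_rent=None):
    """Return (total points, colour) for the Economics dot."""
    points = 2 if pred_orders >= 200 else 1 if pred_orders >= 130 else 0
    if cold_rent is None:
        points += 1  # missing rent data is scored as neutral
    else:
        ratio = rent_ratio(cold_rent, pred_orders)
        if ratio < 0.15:
            points += 2
        elif ratio < 0.25:
            points += 1
    if points >= 3:
        colour = "green"
    elif points >= 1:
        colour = "yellow"
    else:
        colour = "red"
    return points, colour


# Made-up lead: 200 orders/day and 3,500 EUR rent.
# Revenue is 200 * 30 * 4.50 = 27,000 EUR, so the ratio is roughly 13%.
example_ratio = rent_ratio(3500, 200)
example_flag = economics_flag(200, 3500)
```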

Lease Terms

Evaluates upfront cash requirements and the terms negotiated in the lease.

cold_rent / sqm ≤ €30: +2 pts
cold_rent / sqm ≤ €45: +1 pt
key_money = €0: +1 pt
rent_free_months ≥ 3: +1 pt
No key_money data: +1 pt (neutral)
Total ≥ 3 → green, ≥ 1 → yellow, 0 → red
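The Lease Terms rules can be sketched in the same way. The sketch assumes the two rent-per-sqm thresholds are tiers, that `key_money=None` means "no data", and that rent and area are always available; the function name and signature are illustrative.

```python
def lease_flag(cold_rent, sqm, key_money=None, rent_free_months=0):
    """Return (total points, colour) for the Lease dot."""
    rent_per_sqm = cold_rent / sqm
    if rent_per_sqm <= 30:
        points = 2
    elif rent_per_sqm <= 45:
        points = 1
    else:
        points = 0
    # Zero key money and missing key money data both earn one point.
    if key_money is None or key_money == 0:
        points += 1
    if rent_free_months >= 3:
        points += 1
    if points >= 3:
        colour = "green"
    elif points >= 1:
        colour = "yellow"
    else:
        colour = "red"
    return points, colour


# Made-up lead: 3,200 EUR for 80 sqm (40 EUR/sqm), key money unknown.
lease_example = lease_flag(3200, 80)
```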

Real Estate

On-site physical quality scored from field survey data entered in the pipeline.

corner_location = Yes: +1 pt
outdoor_seating includes Yes: +1 pt
interior_brightness = Yes: +1 pt
popular_retail_brand_density = High: +1 pt
popular_retail_brand_density = Moderate: +0.5 pt
competitor_density = Low: +1 pt
competitor_density = Moderate: +0.5 pt
Total ≥ 3 → green, ≥ 1.5 → yellow, < 1.5 → red

RE flags require field survey data (corner, seating, brightness, retail density) entered in the pipeline tool. Leads without field visits will score low by default — this does not mean the location is poor.
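A sketch of the Real Estate scoring, which also shows why leads without a field visit default to red: every missing survey field contributes zero points. The survey is modelled here as a plain dict of strings, and "includes Yes" is interpreted as a substring match; both are assumptions about the pipeline's data format.

```python
RETAIL_POINTS = {"High": 1.0, "Moderate": 0.5}
COMPETITOR_POINTS = {"Low": 1.0, "Moderate": 0.5}


def real_estate_flag(survey):
    """Return (total points, colour) for the RE dot from a survey dict."""
    points = 0.0
    if survey.get("corner_location") == "Yes":
        points += 1
    # "includes Yes" is read as: the free-text answer contains "Yes".
    if "Yes" in str(survey.get("outdoor_seating") or ""):
        points += 1
    if survey.get("interior_brightness") == "Yes":
        points += 1
    points += RETAIL_POINTS.get(survey.get("popular_retail_brand_density"), 0.0)
    points += COMPETITOR_POINTS.get(survey.get("competitor_density"), 0.0)
    if points >= 3:
        colour = "green"
    elif points >= 1.5:
        colour = "yellow"
    else:
        colour = "red"
    return points, colour


# A lead with no field visit scores zero and is flagged red by default.
unvisited = real_estate_flag({})
partial = real_estate_flag({"corner_location": "Yes", "competitor_density": "Moderate"})
```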

Data Sources

| Source | Provides | BQ table / storage |
| --- | --- | --- |
| Targomo API | Footfall counts by hour, POI composition, demographic layers (population, income, age, tourist share). Isochrone-based (10-min walk, 30-min walk). | Expansion.targomo_pipeline_cleaned_v4 |
| Store operations (BQ) | Actual daily orders, steady-state calculation, Net AOV. Used for model training and validation. | Intermediate.int_store_steady_state |
| Pipeline tool (SQLite) | Lead-specific survey data: cold rent, sqm, corner location, outdoor seating, key money, broker fees, retail density assessments | Local SQLite DB |
| Expansion leads model output | Scored leads: predicted orders, AOV, confidence tier, SHAP values | Models.expansion_leads_scored |