Education · Methodology

How This Works — Data, Metrics, Limits

A single reference for anyone who wants to understand how we calculate the numbers on every Education page. Written for Treasury policy advisors, UCM curriculum planners, and anyone asking 'where does that figure come from?'

Data Sources

All analysis derives from publicly available datasets plus live Isle of Man vacancy data. No opinion polls, no hand-weighted scores, no proprietary magic. Every figure should be traceable to a specific row in one of these tables:

UCM Curriculum Master File (CMF) — 1,891 raw unit rows reduced to 694 genuine parent programmes after suppressing 86 module-code pseudo-courses and 36 APEL (Accreditation of Prior Experiential Learning) placeholders.
Isle of Man 2021 Census — occupational headcounts at SOC 2020 four-digit level. Not refreshed until the next census; readers should treat distributions as broadly correct but not current-year-live.
Live IoM job listings — scraped from major employer sites nightly, classified to SOC 2020 via the O*NET crosswalk. The activeVacancies figure on every page is a count of listings live right now, not a year-to-date total.
Anthropic Economic Index (AEI) — task-level automation and augmentation shares per SOC. Released Feb 2025. Task-level, not role-level — a task being "automation-exposed" doesn't mean the whole job vanishes.
Frey-Osborne (2013) computerisation probability — the classical "will this occupation be displaced" score. Older than AEI but still widely cited. Often disagrees with AEI at the task level (e.g. teaching).
ONS Annual Survey of Hours and Earnings (ASHE), Table 14 — UK national median annual pay by SOC 2020. Low-reliability rows (CV ≥ 20%) filtered out.
Live IoM salary medians — computed from active advertised salaries when we have ≥3 live postings with salary data for a SOC.
Azure OpenAI (gpt-4.1) — used to draft narrative paragraphs (Career Outlook per SOC, weekly Insight, Missing Courses gap analysis, Workforce Summary). Prompts force the model to cite specific numbers from the above datasets so outlooks are grounded, not fabricated. Full prompts are in packages/scraper/src/pipelines/*.ts.
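
The live salary rule in the list above (a median is only computed when at least 3 live postings carry salary data for a SOC) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the Posting shape and the function name liveSalaryMedian are hypothetical.

```typescript
// Hypothetical shape for one live posting; annualSalary is null when the
// advert does not state a salary.
interface Posting {
  soc: string;
  annualSalary: number | null;
}

// Threshold stated in the Data Sources list: at least 3 postings with salary data.
const MIN_POSTINGS_WITH_SALARY = 3;

// Returns the median advertised salary for a SOC, or null when too few
// postings carry salary data to publish a live figure.
function liveSalaryMedian(postings: Posting[], soc: string): number | null {
  const salaries = postings
    .filter((p) => p.soc === soc && p.annualSalary !== null)
    .map((p) => p.annualSalary as number)
    .sort((a, b) => a - b);

  if (salaries.length < MIN_POSTINGS_WITH_SALARY) return null;

  const mid = Math.floor(salaries.length / 2);
  return salaries.length % 2 === 1
    ? salaries[mid]
    : (salaries[mid - 1] + salaries[mid]) / 2;
}
```

Returning null rather than a median of one or two adverts lets the caller fall back to a less local but more stable source.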

Metric Definitions

AI Resilience Score (0-100)

Composite indicator of how AI-proof a specific course's career outcomes are. Formula: (1 - avgAutomation) × 40 + avgAugmentation × 30 + (1 - avgFO/100) × 30. Higher = safer.
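
A minimal sketch of the formula above. The input scales are read off the formula itself and should be treated as an assumption: avgAutomation and avgAugmentation as shares in 0..1, avgFO as a Frey-Osborne probability in 0..100. The interface and function names are hypothetical.

```typescript
// Inputs averaged over the SOCs linked to one course.
interface ResilienceInputs {
  avgAutomation: number;   // AEI automation share, assumed 0..1
  avgAugmentation: number; // AEI augmentation share, assumed 0..1
  avgFO: number;           // Frey-Osborne probability, assumed 0..100
}

// (1 - avgAutomation) × 40 + avgAugmentation × 30 + (1 - avgFO/100) × 30
function aiResilienceScore(inputs: ResilienceInputs): number {
  const { avgAutomation, avgAugmentation, avgFO } = inputs;
  return (1 - avgAutomation) * 40 + avgAugmentation * 30 + (1 - avgFO / 100) * 30;
}
```

With illustrative inputs of 0.25 automation, 0.5 augmentation and an FO probability of 20, the three terms are 30, 15 and 24, for a score of 69.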

AI Risk Score (0-100)

The inverse lens, used for quadrant placement on Demand vs AI Risk and Skills Shortages. Formula: classic × 0.55 + automation × 0.25 + (1 - augmentation) × 0.20, where classic = Frey-Osborne computerisation probability if available, else AEI automation share. Rebalanced April 2026 so roles with low FO don't get flagged high-risk purely on AEI task exposure (primary teachers had this problem).
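
A sketch of the weighting above, including the fallback for SOCs without a Frey-Osborne value. Two things here are assumptions rather than documented behaviour: all inputs are taken as fractions in 0..1, and the weighted sum is multiplied by 100 to reach the stated 0-100 range. The names are hypothetical.

```typescript
interface RiskInputs {
  automation: number;   // AEI automation share, assumed 0..1
  augmentation: number; // AEI augmentation share, assumed 0..1
  freyOsborne?: number; // FO probability, assumed 0..1; omitted when unavailable
}

// classic × 0.55 + automation × 0.25 + (1 - augmentation) × 0.20,
// where classic falls back to the AEI automation share when FO is missing.
function aiRiskScore(inputs: RiskInputs): number {
  const { automation, augmentation, freyOsborne } = inputs;
  const classic = freyOsborne ?? automation;
  const weighted = classic * 0.55 + automation * 0.25 + (1 - augmentation) * 0.2;
  return weighted * 100; // assumed scaling to 0-100
}
```

With illustrative values of 0.4 automation and 0.5 augmentation, a role with an FO probability of 0 scores 20, while the same role with no FO value scores 42. That gap is the effect the rebalancing is aiming for: a low FO score pulls the composite down even when AEI task exposure is substantial.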

Training Lag

Minimum time to qualify a brand-new worker for a SOC, in months. For each SOC we pick the shortest UCM course that (a) is in an entry-level pathway category (Apprenticeship, FE, HE, Adult Learning; short CPD and school links are excluded), (b) has duration ≥ 3 months, and (c) clears the SOC's minimum professional entry level (SOC major group 2 professionals need Level 5+, 5xxx trades need Level 3, etc.). If no course meets all three, the SOC has no UCM entry pathway and the lag is reported as null, not fabricated.
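
The three-part filter can be sketched as below. The Course shape, the category labels and the function name are hypothetical, the courses passed in are assumed to be already matched to the SOC, and the SOC's minimum entry level is supplied by the caller rather than looked up, since the full level table is not given here.

```typescript
type Category =
  | "Apprenticeship" | "FE" | "HE" | "Adult Learning" // entry-level pathways
  | "CPD" | "School Links";                            // excluded categories

interface Course {
  title: string;
  category: Category;
  durationMonths: number;
  level: number; // qualification level
}

const ENTRY_CATEGORIES: ReadonlySet<Category> = new Set<Category>([
  "Apprenticeship", "FE", "HE", "Adult Learning",
]);
const MIN_DURATION_MONTHS = 3;

// Shortest qualifying course duration in months, or null when the SOC has
// no UCM entry pathway.
function trainingLagMonths(courses: Course[], minEntryLevel: number): number | null {
  const eligible = courses.filter(
    (c) =>
      ENTRY_CATEGORIES.has(c.category) &&        // (a) entry-level pathway
      c.durationMonths >= MIN_DURATION_MONTHS && // (b) at least 3 months
      c.level >= minEntryLevel                   // (c) clears the entry level
  );
  if (eligible.length === 0) return null;
  return Math.min(...eligible.map((c) => c.durationMonths));
}
```

Keeping null as a distinct result means a missing pathway cannot be mistaken for a zero-month lag downstream.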

Supply : Demand Ratio (capacity page)

UCM annual graduate supply ÷ (census workforce × 4% churn + 0.5 × live vacancies). Supply is capped at 40 grads/yr per SOC, since no single IoM niche occupation realistically produces more. Trust undersupply more than oversupply (keyword matcher spread inflates the latter).
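
A sketch of the ratio, applying the 40 grads/yr cap to the supply side. Returning null when the denominator is zero is an assumption made here to avoid dividing by zero; the function and constant names are hypothetical.

```typescript
const CHURN_RATE = 0.04;    // 4% annual churn applied to the census workforce
const VACANCY_WEIGHT = 0.5; // weight on live vacancies
const SUPPLY_CAP = 40;      // maximum graduates per year credited to one SOC

// supply ÷ (census workforce × 4% churn + 0.5 × live vacancies)
function supplyDemandRatio(
  annualGraduates: number,
  censusWorkforce: number,
  liveVacancies: number
): number | null {
  const supply = Math.min(annualGraduates, SUPPLY_CAP);
  const demand = censusWorkforce * CHURN_RATE + VACANCY_WEIGHT * liveVacancies;
  // Assumption: no measurable demand means the ratio is undefined, not infinite.
  return demand > 0 ? supply / demand : null;
}
```

For an illustrative SOC with 12 graduates a year, 200 census workers and 4 live vacancies, demand is 8 + 2 = 10 and the ratio is 1.2.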

Demand Score (0-100)

Log-scaled: 0.75 × vacancies + 0.25 × census, both normalised against the busiest SOC in the catalogue. Blunt by design — a bricklayer with 8 live openings beats a civil engineer with 90 census workers but no current vacancies.
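
One possible reading of "log-scaled and normalised", sketched under assumptions: each input is transformed with log(1 + x) and divided by the same transform of the catalogue maximum, then the two shares are weighted 0.75 and 0.25 and scaled to 0-100. The exact transform used by the pipeline is not stated here, so the helper logShare and the signature of demandScore are hypothetical.

```typescript
// Assumed transform: log(1 + value) as a share of log(1 + max), in 0..1.
function logShare(value: number, max: number): number {
  return max > 0 ? Math.log1p(value) / Math.log1p(max) : 0;
}

// 0.75 × vacancies + 0.25 × census, each log-scaled against the busiest SOC.
function demandScore(
  vacancies: number,
  census: number,
  maxVacancies: number,
  maxCensus: number
): number {
  const weighted =
    0.75 * logShare(vacancies, maxVacancies) + 0.25 * logShare(census, maxCensus);
  return 100 * weighted;
}
```

Under this reading, and with hypothetical catalogue maxima of 50 vacancies and 2,000 census workers, an occupation with 8 live openings and no census headcount still outscores one with 90 census workers and no vacancies, because the vacancy term carries three times the weight.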

Pressure Score (workforce summary)

Used to pick the top stories for the AI-drafted strategic summary. Formula: demand × 40 + lagMonths/36 × 30 + (highRisk ? 30 : 0), with the total then multiplied by 1.5 if high-risk. Higher = more policy-urgent.
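
A literal sketch of the pressure formula. The scale of demand is not stated alongside it, so this assumes a fraction in 0..1 (for example the Demand Score divided by 100). It also assumes the caller supplies a numeric lag; how SOCs with a null training lag are handled is not covered here. The function name is hypothetical.

```typescript
// demand × 40 + lagMonths/36 × 30 + (highRisk ? 30 : 0), then × 1.5 if high-risk.
function pressureScore(demand: number, lagMonths: number, highRisk: boolean): number {
  const base = demand * 40 + (lagMonths / 36) * 30 + (highRisk ? 30 : 0);
  return highRisk ? base * 1.5 : base;
}
```

With an illustrative demand of 0.5 and an 18-month lag, a SOC that is not high-risk scores 20 + 15 = 35, while a high-risk one scores (20 + 15 + 30) × 1.5 = 97.5. The high-risk flag therefore counts twice, once as an additive bonus and once as a multiplier.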

Pipeline Supply (capacity page)

For each course: cohort = totalPlaces ÷ unitCount (CMF sums places across every unit, so a 7-unit course with 20 students shows as 140 raw "places"). Then annual supply ≈ cohort (most courses run yearly intakes). Cohort capped at 50 as a defensive ceiling.
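
The cohort correction can be sketched as follows. Returning 0 for a course with no units is an assumption added here to avoid dividing by zero; the names are hypothetical.

```typescript
const COHORT_CAP = 50; // defensive ceiling on a single cohort

// cohort = totalPlaces ÷ unitCount, capped at 50; annual supply ≈ cohort.
function annualSupply(totalPlaces: number, unitCount: number): number {
  if (unitCount <= 0) return 0; // assumption: no units means no measurable cohort
  return Math.min(totalPlaces / unitCount, COHORT_CAP);
}
```

For the 7-unit course in the text, annualSupply(140, 7) recovers the real cohort of 20 rather than the 140 raw places.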

AI-Adjusted Vacancies (workforce → skills gap)

When the "Adjust for AI" toggle is on, each SOC's vacancies are multiplied by 1 − automation × 0.7 before summing into the field gap. A field with 50% AEI automation loses 35% of its displayed demand. Rough 5-year forward view, not a forecast.
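
A sketch of the toggle's arithmetic for one field. The SocDemand shape and the function name are hypothetical; automation is taken as the AEI automation share in 0..1.

```typescript
interface SocDemand {
  vacancies: number;  // live vacancies for the SOC
  automation: number; // AEI automation share, 0..1
}

const AUTOMATION_DISCOUNT = 0.7;

// Sums vacancies across a field's SOCs, discounting each by
// 1 − automation × 0.7 when the toggle is on.
function fieldVacancies(socs: SocDemand[], adjustForAi: boolean): number {
  return socs.reduce((sum, soc) => {
    const factor = adjustForAi ? 1 - soc.automation * AUTOMATION_DISCOUNT : 1;
    return sum + soc.vacancies * factor;
  }, 0);
}
```

A single SOC with 100 vacancies and 50% automation contributes 65 with the toggle on, matching the 35% reduction described above.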

Page Glossary — Which One For Which Question

Workforce Report

Printable single-page briefing pulling every signal together. Start here for a Minister or committee meeting.

Demand vs AI Risk

Strategic 2×2 placing every occupation by live demand × composite AI risk. For 'where should we pay attention?'

Skills Shortages

Prioritised policy action list — capability voids, long training lags, tight supply. Sorted by severity.

Missing Courses

AI gap analysis — concrete new UCM courses that would fill demand gaps. Drafted by Azure OpenAI.

Over/Under-Training

Is UCM producing too many or too few graduates? Supply:demand ratio per occupation, with honest caveats.

Skills & Workforce

Field-level supply vs demand bars, with the glowing ⚡ 'Adjust for AI' toggle that recomputes demand.

AI-Era Skills

Every course classified as Explicitly AI-Native / AI-Augmented / Distinctly Human / AI-Vulnerable.

Courses Quadrant

Bubble chart — every course on automation × augmentation, sized by linked vacancies. Click a bubble to open.

AI Risk & Readiness

Per-field resilience scorecard, future-proof career rankings, What-If AI scenario simulator.

Career Pathways

Browse by field. Shows what UCM teaches, matched SOCs, salary ranges, AI resilience per field.

Education ROI

Will this course pay for itself? Course fee vs target salary, breakeven period, 5-year net return.

Full Stats & Directory

Dense stats: counts by category, qualification level, location, mode. Plus every course, filterable.

Known Limitations (Read Before Quoting)

4% churn is a UK average. Healthcare and hospitality run higher; long-tenure professions run lower. The capacity page applies the same 4% to every SOC, so be sceptical of any ratio you're about to quote in isolation.
Graduates don't all stay or enter the matched occupation. Some emigrate, some career-pivot, some choose not to work. The analysis can't see beyond the award date.
Census is 2021. The headcounts are several years old: directionally correct for most SOCs, but they can't capture post-pandemic shifts.
The SOC keyword matcher is broad by design. Trust undersupply over oversupply, because a keyword-rich umbrella rule will over-credit courses to popular SOCs. The matcher was rewritten in April 2026 with specific-before-general ordering, but it is still imperfect.
AI outlook paragraphs are drafted by a language model. Prompts force citation of data numbers, but the model can still produce awkward phrasing or stretch an analogy. Every outlook shows the SOC it was generated for; if that SOC doesn't match the page you are viewing, the cache is stale (run education-career-ai --force).
Not every SOC is scored. A handful of SOCs (Advertising/PR at time of writing, SOC 2473) have no Anthropic Economic Index data; they're placed in a separate "Data Unavailable" quadrant rather than being falsely classified.
Salaries are mixed provenance. Three tiers: live IoM-advertised (best) → UK ONS ASHE (good, clearly labelled) → US BLS via Anthropic EI (weakest, clearly labelled). The chip on each Target Salary tells you which tier applies.
Training lag is an optimistic minimum. Real-world completion rates are under 100%. A "2-year training lag" is the fastest possible response, not the expected response.
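
The three-tier salary provenance described in the list above amounts to a fallback chain. A minimal sketch follows; the tier labels, field names and function name are hypothetical, and any currency conversion of the US figure is assumed to have happened upstream.

```typescript
type SalaryTier = "live-iom" | "uk-ashe" | "us-bls";

// Candidate medians for one SOC; null where a source has no usable figure.
interface SalarySources {
  liveIomMedian: number | null; // live IoM-advertised (best)
  ukAsheMedian: number | null;  // UK ONS ASHE (good)
  usBlsMedian: number | null;   // US BLS via Anthropic EI (weakest)
}

interface TargetSalary {
  amount: number;
  tier: SalaryTier; // drives the provenance chip shown to the reader
}

// Picks the best available source, in the order live IoM → UK ASHE → US BLS.
function targetSalary(sources: SalarySources): TargetSalary | null {
  if (sources.liveIomMedian !== null) {
    return { amount: sources.liveIomMedian, tier: "live-iom" };
  }
  if (sources.ukAsheMedian !== null) {
    return { amount: sources.ukAsheMedian, tier: "uk-ashe" };
  }
  if (sources.usBlsMedian !== null) {
    return { amount: sources.usBlsMedian, tier: "us-bls" };
  }
  return null;
}
```

Carrying the tier alongside the amount is what allows each figure to be labelled with its source instead of presenting all salaries as equally reliable.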

How To Cite

In a policy document or report:

"Smart Island Education Analysis, Manx Technology Group (smartisland.im/education), accessed [date]."

For reproducibility, note the data sources listed under Data Sources rather than the Smart Island page alone. If you're contesting a specific number, the pipeline code is open: see packages/scraper/src/pipelines/*.ts on GitHub.