TRF Benchmark Dashboard
A daily-refreshed view of Total Reach & Frequency campaign benchmarks for the MSCI team. This is the operating model you're inheriting as the dashboard's owner.
1. What this is
The TRF Benchmark Dashboard answers questions like "how does this campaign's reach compare to other Auto / 30-day / Medium-sized campaigns?" for the MSCI measurement constituency.
Two versions exist in the world right now:
| Version | URL | Data freshness | Notes |
|---|---|---|---|
| v1.5 (March snapshot) | trf-benchmark.pages.dev | 2026-03-31, frozen | Static. Honest header text. What MSCI sees today. |
| v2 (dev pipeline) | trf-benchmark-dev.pages.dev | 2026-05-07, 19-campaign sample | Live pipeline. Daily refresh built but first full-universe run blocked (see §8). |
2. Data lifecycle
The whole loop runs once per day at 03:00 UTC, fully unattended. You don't trigger it manually under normal conditions.
Each stage has its own guardrails. If anything fails, the dashboard stays at its previous state and the failure is logged in the audit trail (see §6).
3. Data sources — what's an oracle, what's a job
| Thing | What it is | Where it lives |
|---|---|---|
| Oracle | A single TRF campaign (e.g. "Choice Hotels Comp Analysis") | Currency API — msci-mcp side |
| Job | One execution of a campaign's analytics. Each oracle has many jobs over time; we use the latest COMPLETE one. | Currency API |
| Metrics tables | 16 per-job tables: overall, daily, hml, bucket_freq, on_target, exp_overlap, universe, plus 9 platform_by_* cuts (state, dma, gender, age, property, device, creative, tactic, lineitem) | Pulled by the daily job |
| Industry | A campaign's industry (Auto, Retail, Pharma…). Hand-classified — no live source. Currently covers ~12% of campaigns; the rest fall to "Uncategorized." | Joined from v1 BQ historical table at clean time |
lastJobStatus=COMPLETE — 8× larger than the ~857 in the v1 xlsx-derived dataset. The xlsx was a hand-curated subset; the live API exposes the full universe.
4. Where everything is stored
| Location | Purpose | Retention |
|---|---|---|
gs://ai-workflows-trf-audit-dev/pulls/YYYY-MM-DD/ | Raw API responses, one JSON file per oracle per day | 30 days, auto-deleted |
ai-workflows-459123.trf_benchmarks_dev.fact_campaigns_raw | Everything that came in, before quality filtering | Indefinite, day-partitioned |
ai-workflows-459123.trf_benchmarks_dev.fact_campaigns | The clean published set after 7 quality rules | Indefinite, day-partitioned |
ai-workflows-459123.trf_benchmarks_dev.audit_log | One row per daily run — when it ran, how many in/out, did validation pass | Indefinite |
Cloudflare Pages trf-benchmark-dev | The actual HTML dashboard | Last few deploys kept; rollback in CF dashboard |
5. What happens daily, automatically
| Time (UTC) | What runs | Who triggers it |
|---|---|---|
| 03:00 | Cloud Scheduler fires trf-ingestion-daily | automatic |
| 03:00–04:30 | Cloud Run pulls ~6,874 campaigns, archives raw JSON to GCS, normalizes, writes clean BQ rows, regenerates HTML, deploys to Cloudflare Pages | automatic |
| ~04:30 | One audit_log row written: status=success or status=failed with reason | automatic |
| 30 days later | GCS lifecycle deletes that day's pulls/ folder | automatic |
If the job fails, Cloud Run retries once automatically. If both attempts fail, the dashboard stays at its previous state and the audit_log records the failure for triage.
6. How you know it's healthy
Three checks, in increasing depth. The first one is sufficient 99% of the time.
Check 1 — Dashboard renders fresh data (10 seconds)
Visit trf-benchmark-dev.pages.dev and look at the header. It should say something like:
"N campaigns · Refreshed YYYY-MM-DD · v2 daily ingestion"
The date should be today or yesterday. If it's more than 2 days old, something stopped.
Check 2 — The audit log shows success (30 seconds)
In the Google Cloud console → BigQuery, run:
You want to see the top row with status="success", validation_passed=true, failure_reason=null, and started_at being this morning.
Check 3 — The job execution is green (1 minute)
Google Cloud console → Cloud Run → Jobs → trf-ingestion. The Executions tab shows every run. The most recent one should be green ✅. Click in to see logs if red ❌.
7. The 7 quality rules + 5-check validation
Not every campaign that comes back from the API makes it to the dashboard. Two layers of guardrails:
Quality rules (per-campaign filtering)
- is_no_data — API returned
{"message": "Not enough data to generate this report"}→ exclude from published set - is_test — campaign name matches test heuristics ("test", "demo", internal-test patterns) → exclude
- is_unclassified — industry is null + name doesn't auto-resolve → flag
- is_pending — always false (no live source for pending state)
- industry_canonicalization — map raw industry strings to canonical taxonomy ("Pharma" → "Pharma & Healthcare")
- has_digital — non-zero digital reach metric
- with_industry — has a non-null canonical industry
5-check validation gate (per-run sanity)
- gcs_archives_exist — every published campaign has a raw JSON in GCS
- dashboard_matches_bq — rendered HTML campaign count matches
fact_campaignsrow count - no_duplicate_oracle_job_in_raw — no duplicate (oracle_id, job_id) pairs in
fact_campaigns_raw - no_null_last_run_in_fact_campaigns — every published campaign has a real last-run timestamp
- no_oracle_drift_vs_api — the set of oracles in BQ matches the set the API said it had today
If any check fails, the deploy is aborted and the failure is logged in audit_log.failure_reason. The dashboard stays at the previous version.
8. Things that can go wrong
| Symptom | What it means | First-aid |
|---|---|---|
| Dashboard date is >2 days old | Cron didn't run successfully | Check audit_log → Cloud Run Executions → root-cause from logs |
audit_log.failure_reason mentions "Auth0 token request failed" | Currency API credentials rotated | Ping Sid to refresh AUTH0_* secrets |
audit_log.failure_reason mentions "validation check X failed" | Data quality changed upstream | Capture run_id, ping Sid — needs investigation |
audit_log.failure_reason mentions "cloudflare deploy failed: 401" | CF API token rotated/revoked | Ping Sid to regenerate cloudflare-api-token secret |
| Container terminated on signal 9 | Out-of-memory kill | This is the current open issue — see callout below |
9. What you own going forward
| Operation | Who | How often |
|---|---|---|
| Verify daily run succeeded | you (any team member) | Once weekly; ad-hoc when MSCI flags stale data |
| Decide what changes to v2 architecture | you | As needed |
| Approve feature additions (Edits, Downloadables, etc.) | you | As needed |
| Triage failed runs with engineering | you + Sid initially, then your engineering partner long-term | When audit_log shows failed |
| Rotate secrets (Auth0, Cloudflare) | Sid for now; engineering team long-term | Yearly or when rotated |
| Add new industry to canonicalization map | Engineering (small PR to normalize.py) | When MSCI taxonomy adds a category |
| Stakeholder comms / MSCI questions | you | As they arise |
10. Escalation
- Needs code change / GCP config / secret rotation
- Sid Dani — sid.dani@samba.tv
- Data correctness ("the metric is wrong")
- Sid first, who'll route to MSCI (Alyson Sprague / Ankitha Ananda) or Currency API team if upstream
- Dashboard access / SSO
- Sid (Cloudflare Access policy pending — ATF-458)
- Architecture decisions about the dashboard itself
- you — that's the handoff