Scope discussion
Three asks on the table: Data Retention, Edits, Downloadables. Below — what each one means, current state, options on the menu, and what I'd recommend as P0 / P1 / P2 today. Push back wherever you disagree.
For: Landis Current
From: Sid Dani
Meeting: Wed 2026-05-20 · 10:15 PT · 45 min
Companion doc: 01-operating-model.html
Read me first. Each ask below has a "My take" callout with a tentative priority. These are anchors for the conversation, not commitments — the goal of the meeting is for you to walk out with a P0/P1/P2 ranking that matches your roadmap as the dashboard's owner. The scoring table at the bottom is meant to be filled in live.
Ask 1
Data Retention
How long do we keep raw + processed campaign data, and where? Mostly already in place; small policy decisions remain.
Ask 2
Edits
Curation control — the ability to override what shows up in the dashboard. Largest architectural impact. Needs definition first.
Ask 3
Downloadables
Export views to Excel / CSV / PDF so MSCI can pull data into their own tools. Moderate work, well-scoped, achievable.
What we keep, where we keep it, and for how long. The good news: most of this is already wired and the policies are sensible. Decisions remain on (a) raw archive duration and (b) whether to set a BigQuery expiry policy.
Current state
| Layer | Retention today | Why |
| Raw API JSON (GCS) | 30 days, auto-deleted | Audit trail for debugging recent runs; quickly outgrows usefulness |
| fact_campaigns_raw (BQ) | Indefinite, day-partitioned | Lossless record of everything we pulled, ever |
| fact_campaigns (BQ) | Indefinite, day-partitioned | Time-series of clean published campaign metrics |
| audit_log (BQ) | Indefinite | One row per run, kept forever; small, cheap, important for forensics |
| HTML dashboard | Last ~10 deploys | Cloudflare default; rollback available |
Options on the menu
| Option | What changes | Effort | Cost impact |
| 1a. Keep as-is | Nothing. Current policy is fine for now. | 0 | baseline |
| 1b. Extend GCS to 90d | 3× more raw JSON kept; lets you re-investigate older anomalies without re-running pipelines. | 5 min config | ~$0.50/mo extra at current volume |
| 1c. Add BQ partition expiry (e.g., 2 years) | Old daily partitions auto-expire; storage shrinks and unfiltered scans read less data, but historical comparisons get harder. | 5 min config | slightly lower BQ storage cost |
| 1d. Snapshot BQ to GCS monthly for cold storage | Cheap long-term backup; recoverable but not queryable directly. | 30 min one-off | ~$0.10/mo per snapshot |
My take. Defaults are sensible. The only question worth deciding today is whether compliance, legal, or contracts dictate a specific retention floor or ceiling. If MSCI's data contract with measurement clients requires deleting campaign data after N months, we need 1c sized to that. Otherwise, leave it.
Recommendation: P2 for the meeting. Confirm there's no compliance constraint we're missing, then revisit in Q3.
Architectural implication: Essentially none — these are configuration changes, not code changes. The pipeline keeps running the same way; only the storage policies shift.
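To make "configuration changes, not code changes" concrete, here is a minimal sketch of what options 1b and 1c amount to. It only builds the config artifacts; it does not call any cloud API. The dataset name `campaigns` and the helper names are assumptions for illustration, not names from the pipeline. The lifecycle JSON follows the standard GCS lifecycle-rule shape, and the SQL uses BigQuery's `partition_expiration_days` table option.

```python
import json


def gcs_lifecycle_config(age_days: int) -> dict:
    """Lifecycle policy that deletes objects once they are older than age_days."""
    return {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": age_days}}
        ]
    }


def bq_partition_expiry_sql(table: str, days: int) -> str:
    """DDL that makes partitions of `table` expire `days` after their partition date."""
    return (
        f"ALTER TABLE `{table}` "
        f"SET OPTIONS (partition_expiration_days = {days});"
    )


if __name__ == "__main__":
    # Option 1b: extend the raw JSON archive from 30 to 90 days.
    print(json.dumps(gcs_lifecycle_config(90), indent=2))
    # Option 1c: two-year expiry; "campaigns" is a hypothetical dataset name.
    print(bq_partition_expiry_sql("campaigns.fact_campaigns", 730))
```

The JSON would be applied to the bucket as its lifecycle file and the DDL run once against each table; both are reversible by re-applying the previous values, although partitions that have already expired are not recovered.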
Open questions for Landis
- Are there compliance / legal / MSCI-contract reasons we need a specific retention policy?
- Has anyone ever asked to query data older than 6 months? If not, partition expiry is safe.
- Do you want the v1 March snapshot (still serving at trf-benchmark.pages.dev) preserved permanently as a baseline?
"Edits to the data and how it is curated." This is the biggest architectural ask of the three — it changes the dashboard from a one-way view of upstream data into a system of record with mutation rights. Before scoping, we need to land what "edit" actually means.
Three things "edit" could mean
| Flavor | What you'd be able to do | Blast radius |
| 2a. Annotations | Add notes / tags / comments to campaigns as an overlay UI. The metrics themselves stay read-only. ("This campaign had a measurement anomaly, see notes.") | Small: new table, joined at view time, no upstream contact |
| 2b. Industry / taxonomy overrides | Manually classify campaigns that fall to "Uncategorized" (currently 88% of the universe). Override the v1-BQ industry join. | Medium: new override table, modify normalize.py to consult it |
| 2c. Data corrections | Override actual metric values (reach, frequency, on-target). Change which campaigns are included or excluded from the published set. | Large: requires audit trail, "who edited what when why" system, breaks data lineage from msci-mcp |
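As a rough illustration of the 2b path ("new override table, modify normalize.py to consult it"), the lookup could be as small as the sketch below. The function name, the `campaign_id` / `industry` / `industry_source` fields, and the in-memory dict standing in for the override table are all assumptions; the real normalize.py may be structured quite differently.

```python
from typing import Dict, List


def apply_industry_overrides(
    campaigns: List[dict], overrides: Dict[str, str]
) -> List[dict]:
    """Return copies of campaign rows with manual industry overrides applied.

    `overrides` maps campaign_id -> industry (a stand-in for the override table).
    Each output row records where its industry came from, so the upstream
    classification is never silently replaced.
    """
    result = []
    for row in campaigns:
        new_row = dict(row)  # do not mutate the caller's rows
        override = overrides.get(row["campaign_id"])
        if override is not None:
            new_row["industry"] = override
            new_row["industry_source"] = "override"
        else:
            new_row["industry_source"] = "join"
        result.append(new_row)
    return result


if __name__ == "__main__":
    rows = [
        {"campaign_id": "c1", "industry": "Uncategorized"},
        {"campaign_id": "c2", "industry": "Automotive"},
    ]
    print(apply_industry_overrides(rows, {"c1": "Retail"}))
```

Keeping an `industry_source` column is a design choice rather than a requirement: it lets the dashboard show which classifications were hand-set without a second lookup.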
My take. What you described — "edits to the data and how it is curated" — sounds most like 2b + 2c combined. That's the largest scope. Before committing to it, two things are worth getting on the table:
(1) If we go to 2c, the dashboard stops being a transparent view of msci-mcp data and becomes its own source of truth — MSCI consumers need to know that. We'd want an "edited" badge on changed values + a side-by-side "API value vs override value" view, otherwise the same chart on two screens gives different answers.
(2) A lot of what people ask "edits" for can actually be solved by fixing the upstream classification (2b) — and that's a much smaller, safer change. If 80% of the "edit" need is "uncategorized campaigns need to be categorized," let's do 2b and see if that absorbs the demand before going to 2c.
Recommendation: disambiguate live. Then likely P1 for 2b (industry overrides) and P2 for 2c (data corrections), with a design pass before any build.
Architectural implication: 2a is additive (~1 week). 2b is moderate (~2 weeks; touches normalize.py + needs a small admin UI). 2c is its own project (~1–2 months; needs auth, audit log, conflict resolution, MSCI comms). Whichever flavor we pick should be on a separate roadmap, not bundled into the v2 ship.
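To show why 2c is heavier than 2b, here is a minimal sketch of a correction record that carries the "who edited what when why" fields and keeps the API value next to the override, which is what an "edited" badge and a side-by-side view would need. Every name here (`Correction`, `resolve_metric`, the field names) is hypothetical, and the sketch deliberately leaves out auth and conflict resolution beyond "latest edit wins".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class Correction:
    campaign_id: str
    metric: str            # e.g. "reach", "frequency", "on_target"
    override_value: float
    edited_by: str         # who
    reason: str            # why
    edited_at: datetime    # when


def resolve_metric(
    campaign_id: str,
    metric: str,
    api_value: float,
    corrections: Iterable[Correction],
) -> dict:
    """Pick the most recent correction, if any, without discarding the API value."""
    latest: Optional[Correction] = None
    for c in corrections:
        if c.campaign_id == campaign_id and c.metric == metric:
            if latest is None or c.edited_at > latest.edited_at:
                latest = c
    return {
        "api_value": api_value,
        "override_value": latest.override_value if latest else None,
        "displayed_value": latest.override_value if latest else api_value,
        "edited": latest is not None,  # drives the "edited" badge
        "edited_by": latest.edited_by if latest else None,
        "reason": latest.reason if latest else None,
    }


if __name__ == "__main__":
    fix = Correction(
        "c1", "reach", 120.0, "analyst@example.com", "duplicate feed",
        datetime(2026, 5, 1, tzinfo=timezone.utc),
    )
    print(resolve_metric("c1", "reach", 100.0, [fix]))
```

Because corrections are append-only records rather than in-place updates, the history doubles as the audit log, and the original msci-mcp value stays queryable.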
Open questions for Landis
- Walk me through a specific case where you wanted to edit something — what would you have changed, and why?
- Who else on the MSCI team needs edit rights? Just you? Multiple analysts? Should engineering be read-only, with you as approver?
- If a campaign's reach number is edited, should the API value remain queryable (audit) or be replaced entirely (cleaner)?
- Would you trade some 2c scope (full mutation) for faster 2b (taxonomy fixes) shipping in 3 weeks?
Let MSCI consumers pull dashboard data into their own tools — Excel, Tableau, internal reports. The data exists; we just don't surface a download path today. Moderate work, well-scoped, no architectural risk.
Options on the menu
| Option | What it gives MSCI | Effort | Format |
| 3a. CSV download button per view | "Download the current filter set as CSV" — same data as on screen, in their spreadsheet | ~2-3 days | CSV |
| 3b. Excel export with formatting | Multi-sheet workbook matching v1 xlsx shape (Master + per-window + per-industry tabs) — what they're used to | ~1 week | XLSX |
| 3c. Pre-built PDF report (monthly) | Auto-generated executive summary PDF emailed monthly with hero charts + commentary | ~2 weeks | PDF |
| 3d. Direct BQ access for power users | MSCI analysts query fact_campaigns directly with their own tools (Looker, Data Studio, Sheets connector) | ~1 day (IAM only) | BQ-native |
My take. The honest framing: MSCI lived in the v1 xlsx world for months — they will instinctively want 3b. But 3a + 3d together is faster to ship, more flexible for power users, and doesn't lock us into matching the xlsx shape forever.
Recommendation: P1 = 3a + 3d (CSV button + BQ access for analysts). P2 = 3b (Excel export later, only if 3a doesn't satisfy demand). P3 / strike = 3c (PDF report — overkill until someone specifically asks).
Architectural implication: Zero risk. The pipeline doesn't change. CSV download is a small frontend addition; BQ access is an IAM policy. Both can ship without touching the daily ingestion job.
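For a sense of scale on 3a, the serialization step behind a CSV button is a few lines. The sketch below is in Python for consistency with normalize.py; the actual button would live in whatever language the dashboard frontend uses. The function name and column names are assumptions.

```python
import csv
import io
from typing import Iterable, List


def rows_to_csv(rows: Iterable[dict], columns: List[str]) -> str:
    """Serialize the currently filtered rows to CSV text.

    Only `columns` are written, in that order; extra keys on a row are
    ignored and missing keys become empty cells.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    filtered = [
        {"campaign_id": "c1", "reach": 100, "internal_flag": True},
        {"campaign_id": "c2", "reach": 250},
    ]
    print(rows_to_csv(filtered, ["campaign_id", "reach"]))
```

Passing an explicit column list keeps the export aligned with what is on screen and avoids leaking internal fields into downloads.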
Open questions for Landis
- Who specifically asked for downloads? Single-analyst use case, or whole-team workflow?
- Do they use Sheets / Looker / Tableau today? That answers 3d feasibility.
- For the v1 xlsx users: do they edit it after download, or just consume? (If edit → 3b is necessary; if consume → 3a is enough.)
- Should downloads be gated (audit log who downloaded what when) or fully open within @samba.com?
Live scoring — fill this in together
Walk through each ask, decide priority, capture owner + rough timing. Anything that doesn't get a P0/P1 commit defaults to P2.
| Ask | Priority | Owner | Target window |
| 1. Data Retention (pick 1a/1b/1c/1d or combo) | P0 / P1 / P2 / - | | |
| 2a. Edits: annotations | P0 / P1 / P2 / - | | |
| 2b. Edits: industry overrides | P0 / P1 / P2 / - | | |
| 2c. Edits: data corrections | P0 / P1 / P2 / - | | |
| 3a. Downloads: CSV per view | P0 / P1 / P2 / - | | |
| 3b. Downloads: Excel multi-sheet | P0 / P1 / P2 / - | | |
| 3c. Downloads: monthly PDF report | P0 / P1 / P2 / - | | |
| 3d. Downloads: direct BQ access | P0 / P1 / P2 / - | | |