Scope discussion

Three asks on the table: Data Retention, Edits, Downloadables. Below — what each one means, current state, options on the menu, and what I'd recommend as P0 / P1 / P2 today. Push back wherever you disagree.

For: Landis Current
From: Sid Dani
Meeting: Wed 2026-05-20 · 10:15 PT · 45 min
Companion doc: 01-operating-model.html
Read me first. Each ask below has a "My take" callout with a tentative priority. These are anchors for the conversation, not commitments — the goal of the meeting is for you to walk out with a P0/P1/P2 ranking that matches your roadmap as the dashboard's owner. The scoring table at the bottom is meant to be filled in live.
Ask 1
Data Retention
How long do we keep raw + processed campaign data, and where? Mostly already in place; small policy decisions remain.
Ask 2
Edits
Curation control — the ability to override what shows up in the dashboard. Largest architectural impact. Needs definition first.
Ask 3
Downloadables
Export views to Excel / CSV / PDF so MSCI can pull data into their own tools. Moderate work, well-scoped, achievable.
Ask 1

Data Retention

My take: P2 · already mostly solved

What we keep, where we keep it, and for how long. The good news: most of this is already wired and the policies are sensible. Decisions remain on (a) raw archive duration and (b) whether to set a BigQuery expiry policy.

Current state

Layer | Retention today | Why
Raw API JSON (GCS) | 30 days, auto-deleted | Audit trail for debugging recent runs; quickly outgrows usefulness
fact_campaigns_raw (BQ) | Indefinite, day-partitioned | Lossless record of everything we pulled, ever
fact_campaigns (BQ) | Indefinite, day-partitioned | Time-series of clean published campaign metrics
audit_log (BQ) | Indefinite | One row per run forever: small, cheap, important for forensics
HTML dashboard | Last ~10 deploys | Cloudflare default; rollback available

Options on the menu

Option | What changes | Effort | Cost impact
1a. Keep as-is | Nothing. Current policy is fine for now. | 0 | baseline
1b. Extend GCS to 90d | 3× more raw JSON kept; lets you re-investigate older anomalies without re-running pipelines. | 5 min config | ~$0.50/mo extra at current volume
1c. Add BQ partition expiry (e.g., 2 years) | Old daily snapshots auto-expire; query cost shrinks, but historical comparisons get harder. | 5 min config | slightly lower BQ storage cost
1d. Snapshot BQ to GCS monthly for cold storage | Cheap long-term backup; recoverable but not queryable directly. | 30 min one-off | ~$0.10/mo per snapshot
My take. Defaults are sensible. The only question worth deciding today is whether compliance, legal, or contracts dictate a specific retention floor or ceiling. If MSCI's data contract with measurement clients requires deleting campaign data after N months, we need 1c sized to that. Otherwise, leave it.

Recommendation: P2 for the meeting. Confirm there's no compliance constraint we're missing, then revisit in Q3.
Architectural implication: Essentially none — these are configuration changes, not code changes. The pipeline keeps running the same way; only the storage policies shift.
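
For reference, options 1b and 1c could look roughly like the following. This is a minimal sketch, not the deployed config: the `trf.fact_campaigns` table path is a placeholder, and it assumes the raw JSON bucket is governed by a standard GCS lifecycle rule and that the BQ tables accept the `partition_expiration_days` table option.

```python
import json


def gcs_lifecycle_config(age_days: int) -> dict:
    """Build a GCS lifecycle config that deletes objects older than age_days."""
    return {
        "rule": [
            {"action": {"type": "Delete"}, "condition": {"age": age_days}}
        ]
    }


def bq_partition_expiry_ddl(table: str, days: int) -> str:
    """Build BigQuery DDL that sets partition expiry on a partitioned table."""
    return (
        f"ALTER TABLE `{table}` "
        f"SET OPTIONS (partition_expiration_days = {days})"
    )


# Option 1b: keep raw API JSON for 90 days instead of 30.
print(json.dumps(gcs_lifecycle_config(90), indent=2))

# Option 1c: expire daily partitions after roughly 2 years.
# "trf.fact_campaigns" is a hypothetical dataset.table path.
print(bq_partition_expiry_ddl("trf.fact_campaigns", 730))
```

The JSON can be saved to a file and applied with `gsutil lifecycle set` against the raw-JSON bucket; the DDL would be run once per table in the BQ console.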

Open questions for Landis

  • Are there compliance / legal / MSCI-contract reasons we need a specific retention policy?
  • Has anyone ever asked to query data older than 6 months? If not, partition expiry is safe.
  • Do you want the v1 March snapshot (still serving at trf-benchmark.pages.dev) preserved permanently as a baseline?
Ask 2

Edits

My take: needs definition first

"Edits to the data and how it is curated." This is the biggest architectural ask of the three — it changes the dashboard from a one-way view of upstream data into a system of record with mutation rights. Before scoping, we need to land what "edit" actually means.

Three things "edit" could mean

Flavor | What you'd be able to do | Blast radius
2a. Annotations | Add notes / tags / comments to campaigns as an overlay UI. The metrics themselves stay read-only. ("This campaign had a measurement anomaly, see notes.") | Small: new table, joined at view time, no upstream contact
2b. Industry / taxonomy overrides | Manually classify campaigns that fall to "Uncategorized" (currently 88% of the universe). Override the v1-BQ industry join. | Medium: new override table, modify normalize.py to consult it
2c. Data corrections | Override actual metric values (reach, frequency, on-target). Change which campaigns are included or excluded from the published set. | Large: requires audit trail, "who edited what when why" system, breaks data lineage from msci-mcp
My take. What you described — "edits to the data and how it is curated" — sounds most like 2b + 2c combined. That's the largest scope. Before committing to it, two things are worth getting on the table:

(1) If we go to 2c, the dashboard stops being a transparent view of msci-mcp data and becomes its own source of truth — MSCI consumers need to know that. We'd want an "edited" badge on changed values + a side-by-side "API value vs override value" view, otherwise the same chart on two screens gives different answers.

(2) A lot of what people ask "edits" for can actually be solved by fixing the upstream classification (2b) — and that's a much smaller, safer change. If 80% of the "edit" need is "uncategorized campaigns need to be categorized," let's do 2b and see if that absorbs the demand before going to 2c.
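
Point (1) can be made concrete with a small sketch. It assumes a hypothetical `MetricValue` record that keeps the msci-mcp value alongside any override, so the "edited" badge and the side-by-side view both come from the same data. None of these names exist in the pipeline today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricValue:
    """One metric for one campaign, with an optional manual override."""

    api_value: float                        # value as pulled from msci-mcp
    override_value: Optional[float] = None  # set only when someone edits
    edited_by: Optional[str] = None         # who made the edit
    reason: Optional[str] = None            # why the edit was made

    @property
    def is_edited(self) -> bool:
        return self.override_value is not None

    @property
    def display_value(self) -> float:
        # The dashboard shows the override when present, else the API value.
        if self.override_value is not None:
            return self.override_value
        return self.api_value

    def side_by_side(self) -> dict:
        # Both values stay visible, so two screens never silently disagree.
        return {
            "api": self.api_value,
            "override": self.override_value,
            "badge": "edited" if self.is_edited else "",
        }


# Hypothetical values for illustration only.
reach = MetricValue(api_value=1200.0)
corrected = MetricValue(1200.0, override_value=1350.0,
                        edited_by="analyst@example.com", reason="dedupe fix")
print(reach.side_by_side())
print(corrected.side_by_side())
```

Keeping `api_value` immutable is the design choice that matters here: the override is layered on top, so the upstream number is never lost.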

Recommendation: disambiguate live. Then likely P1 for 2b (industry overrides) and P2 with design first for 2c (data corrections — let's not just build it, let's design it).
Architectural implication: 2a is additive (~1 week). 2b is moderate (~2 weeks; touches normalize.py + needs a small admin UI). 2c is its own project (~1–2 months; needs auth, audit log, conflict resolution, MSCI comms). Whichever flavor we pick should be on a separate roadmap, not bundled into the v2 ship.
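
As a rough illustration of 2b, the override lookup could be a single extra step during normalization. This is a sketch under assumptions: `classify_industry`, the campaign ID keys, and the dict-shaped override table are hypothetical, and the real normalize.py may be structured differently.

```python
from typing import Mapping, Optional

UNCATEGORIZED = "Uncategorized"


def classify_industry(
    campaign_id: str,
    joined_industry: Optional[str],
    overrides: Mapping[str, str],
) -> str:
    """Pick an industry: manual override first, then the upstream join."""
    if campaign_id in overrides:
        return overrides[campaign_id]
    # Fall back to the existing join result, or the catch-all bucket.
    return joined_industry or UNCATEGORIZED


# Hypothetical override table, e.g. loaded from a small BQ table.
overrides = {"c-102": "Automotive"}

print(classify_industry("c-101", "Retail", overrides))  # Retail
print(classify_industry("c-102", None, overrides))      # Automotive
print(classify_industry("c-103", None, overrides))      # Uncategorized
```

Because the override only adds a lookup ahead of the existing join, removing a row from the override table restores the upstream classification.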

Open questions for Landis

  • Walk me through a specific case where you wanted to edit something — what would you have changed, and why?
  • Who else on the MSCI team needs edit rights? Just you? Multiple analysts? Or does engineering stay read-only, with you as approver?
  • If a campaign's reach number is edited, should the API value remain queryable (audit) or be replaced entirely (cleaner)?
  • Would you trade some 2c scope (full mutation) for faster 2b (taxonomy fixes) shipping in 3 weeks?
Ask 3

Downloadables

My take: P1 · ship soon

Let MSCI consumers pull dashboard data into their own tools — Excel, Tableau, internal reports. The data exists; we just don't surface a download path today. Moderate work, well-scoped, no architectural risk.

Options on the menu

Option | What it gives MSCI | Effort | Format
3a. CSV download button per view | "Download the current filter set as CSV": same data as on screen, in their spreadsheet | ~2-3 days | CSV
3b. Excel export with formatting | Multi-sheet workbook matching v1 xlsx shape (Master + per-window + per-industry tabs), which is what they're used to | ~1 week | XLSX
3c. Pre-built PDF report (monthly) | Auto-generated executive summary PDF emailed monthly with hero charts + commentary | ~2 weeks | PDF
3d. Direct BQ access for power users | MSCI analysts query fact_campaigns directly with their own tools (Looker, Data Studio, Sheets connector) | ~1 day (IAM only) | BQ-native
My take. The honest framing: MSCI lived in the v1 xlsx world for months — they will instinctively want 3b. But 3a + 3d together is faster to ship, more flexible for power users, and doesn't lock us into matching the xlsx shape forever.

Recommendation: P1 = 3a + 3d (CSV button + BQ access for analysts). P2 = 3b (Excel export later, only if 3a doesn't satisfy demand). P3 / strike = 3c (PDF report — overkill until someone specifically asks).
Architectural implication: Zero risk. The pipeline doesn't change. CSV download is a small frontend addition; BQ access is an IAM policy. Both can ship without touching the daily ingestion job.
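
To show how little logic 3a needs, here is a sketch of the export step in Python. It is illustrative only: the real button would live in the dashboard frontend, and the row shape, column names, and `rows_to_csv` helper are assumptions.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence


def rows_to_csv(rows: Iterable[Mapping[str, object]],
                columns: Sequence[str]) -> str:
    """Serialize the currently filtered rows to CSV text, header first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, standing in for whatever the current view shows.
campaigns = [
    {"campaign": "A", "industry": "Retail", "reach": 1200},
    {"campaign": "B", "industry": "Automotive", "reach": 900},
]

# Apply the same filter the user has on screen, then export.
filtered = [c for c in campaigns if c["industry"] == "Retail"]
print(rows_to_csv(filtered, ["campaign", "industry", "reach"]))
```

The export reuses the on-screen filter rather than re-querying, which is what keeps the download consistent with the view.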

Open questions for Landis

  • Who specifically asked for downloads? Single-analyst use case, or whole-team workflow?
  • Do they use Sheets / Looker / Tableau today? That answers 3d feasibility.
  • For the v1 xlsx users: do they edit it after download, or just consume? (If edit → 3b is necessary; if consume → 3a is enough.)
  • Should downloads be gated (with an audit log of who downloaded what, and when) or fully open within @samba.com?

Live scoring — fill this in together

Walk through each ask, decide priority, capture owner + rough timing. Anything that doesn't get a P0/P1 commit ships as P2 by default.

Ask | Priority | Owner | Target window
1. Data Retention (pick 1a/1b/1c/1d or combo) | P0 / P1 / P2 |  |
2a. Edits: annotations | P0 / P1 / P2 |  |
2b. Edits: industry overrides | P0 / P1 / P2 |  |
2c. Edits: data corrections | P0 / P1 / P2 |  |
3a. Downloads: CSV per view | P0 / P1 / P2 |  |
3b. Downloads: Excel multi-sheet | P0 / P1 / P2 |  |
3c. Downloads: monthly PDF report | P0 / P1 / P2 |  |
3d. Downloads: direct BQ access | P0 / P1 / P2 |  |