We simulate the electorate from real Census data, weight it by who actually votes, anchor it to real exit-poll results, and benchmark it on hundreds of real counted elections — instead of guessing with one model or copying the market price.
Prediction markets are smart — and reflexive. Most “models” secretly anchor to the market price, so their “edge” is circular.
election-oracle forecasts independently, and uses the market only as the baseline to beat.
Each layer is a discipline borrowed from the forecasting literature — stacked so errors cancel instead of compound.
Point-in-time news context (GDELT)
Median of N independent samples
Tetlock discipline: reference class → base rate → adjust
Isotonic recalibration + Brier / ECE, scored out-of-sample
A temporal firewall against lookahead leakage
An LLM proposes structure, a transparent engine does the math
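Two of the layers above, median aggregation and isotonic recalibration, can be sketched in a few lines of plain Python. This is a minimal illustration, not election-oracle's implementation: the function names and the toy numbers are hypothetical, and the isotonic fit uses the standard pool-adjacent-violators procedure.

```python
from statistics import median

def aggregate(samples):
    """Median of N independent probability samples; robust to a stray outlier."""
    return median(samples)

def isotonic_fit(scores, outcomes):
    """Pool-adjacent-violators: fit a non-decreasing step function.

    Returns a list of (upper_score, calibrated_probability) blocks.
    """
    blocks = []  # each block is [outcome_sum, count, upper_score]
    for score, outcome in sorted(zip(scores, outcomes)):
        blocks.append([outcome, 1, score])
        # Merge backwards while the previous block's mean exceeds the latest one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            outcome_sum, count, upper = blocks.pop()
            blocks[-1][0] += outcome_sum
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, outcome_sum / count) for outcome_sum, count, upper in blocks]

def recalibrate(score, fitted):
    """Map a raw score to the calibrated value of the first block that covers it."""
    for upper, calibrated in fitted:
        if score <= upper:
            return calibrated
    return fitted[-1][1]

# Hypothetical numbers, for illustration only.
raw = aggregate([0.58, 0.61, 0.95, 0.60, 0.59])  # one outlier sample
fitted = isotonic_fit([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
calibrated = recalibrate(raw, fitted)
```

For the calibration to count as out-of-sample, the pairs passed to `isotonic_fit` would have to come from races other than the one being recalibrated.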
FlockVote — a simulated electorate that votes
We don’t ask one model to guess a state. We simulate the people who decide it — Census-real voters, weighted by who actually turns out, anchored to how each group really voted.
Drawn from real US Census ACS across 9 demographic dimensions, with party and religion conditioned on race and education — so each agent is internally coherent, not a random mix of traits.
Elections are decided by voters, not residents — so we weight by real CPS turnout and anchor each demographic cell to its real exit-poll vote share. The simulation matches how each group actually voted.
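A small sketch shows why weighting by turnout matters. The `Cell` type, its field names, and the two example cells are hypothetical; the only idea taken from the text is that each cell's vote share is weighted by how many of its residents actually vote.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    residents: int       # adult residents in the demographic cell (Census ACS)
    turnout_rate: float  # share of the cell that votes (CPS)
    dem_share: float     # two-party Democratic share in the cell (exit poll)

def resident_share(cells):
    """Democratic share if every resident voted (ignores turnout)."""
    total = sum(c.residents for c in cells)
    return sum(c.residents * c.dem_share for c in cells) / total

def voter_share(cells):
    """Democratic share among the people who actually turn out."""
    voters = sum(c.residents * c.turnout_rate for c in cells)
    dem_votes = sum(c.residents * c.turnout_rate * c.dem_share for c in cells)
    return dem_votes / voters

# Two made-up cells of equal size but unequal turnout.
cells = [Cell(1000, 0.50, 0.60), Cell(1000, 0.25, 0.20)]
by_residents = resident_share(cells)
by_voters = voter_share(cells)
```

In this toy case the resident-weighted share is 40% while the voter-weighted share is about 47%, because the higher-turnout cell counts for more of the electorate.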
We elicit a full distribution of vote intent per cell rather than a single point answer. This counters the LLM tendency to collapse to one safe mode and wash out real demographic spread.
The measured accuracy is carried by a transparent, calibrated engine (Census × exit-poll × turnout × Cook PVI) that runs without any LLM. Because the score comes from this structure rather than from LLM output, the number is reproducible.
Per-state forecasts roll up through Monte Carlo into P(party controls the Senate) — so national “who controls the chamber” markets can be forecast coherently from the same simulated electorate.
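The roll-up can be sketched as a simple Monte Carlo loop. This version assumes state outcomes are independent, which is a simplification (real state-level errors are correlated), and the function name, parameters, and seat counts are hypothetical rather than taken from the project.

```python
import random

def p_control(win_probs, holdover_seats, seats_needed, n_sims=20_000, seed=0):
    """Estimate P(party holds at least `seats_needed` seats).

    win_probs: mapping of contested state -> P(party wins that seat).
    holdover_seats: seats the party already holds that are not up this cycle.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw each contested seat independently (simplifying assumption).
        won = sum(rng.random() < p for p in win_probs.values())
        if holdover_seats + won >= seats_needed:
            hits += 1
    return hits / n_sims

# Made-up example: two toss-up seats, one more seat needed for control.
estimate = p_control({"State A": 0.5, "State B": 0.5},
                     holdover_seats=49, seats_needed=50)
```

With two independent coin-flip seats and one seat needed, the estimate should land near 0.75.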
Voters influence their neighbours on a small-world graph — some lead, some follow the crowd — letting late-breaking shifts propagate the way real opinion does.
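One way to picture this is a Watts-Strogatz style graph with a weighted-averaging update. The sketch below is an assumption about how such a layer could work, not the project's code: `small_world`, `step`, and the `stubbornness` weights (high for leaders, low for followers) are hypothetical names and choices.

```python
import random

def small_world(n, k, p, rng):
    """Ring lattice (k neighbours on each side), each edge rewired with prob. p."""
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for offset in range(1, k + 1):
            j = (i + offset) % n
            if rng.random() < p:
                # Rewire to a random node that is not i and not already linked.
                choices = [c for c in range(n) if c != i and c not in neighbours[i]]
                j = rng.choice(choices)
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours

def step(opinions, neighbours, stubbornness):
    """Each voter blends their own opinion with the mean of their neighbours'."""
    updated = []
    for i, own in enumerate(opinions):
        peers = [opinions[j] for j in neighbours[i]]
        peer_mean = sum(peers) / len(peers)
        updated.append(stubbornness[i] * own + (1 - stubbornness[i]) * peer_mean)
    return updated

rng = random.Random(42)
graph = small_world(n=50, k=2, p=0.1, rng=rng)
opinions = [rng.random() for _ in range(50)]  # 0 = one party, 1 = the other
# Every tenth voter is a "leader" who mostly keeps their own view.
stubbornness = [0.9 if i % 10 == 0 else 0.3 for i in range(50)]

after = opinions
for _ in range(10):
    after = step(after, graph, stubbornness)
```

Because every update is a weighted average, the spread of opinions can only narrow over time; the few rewired long-range edges are what let a local shift reach distant parts of the graph quickly.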
Backtests only see evidence that existed before each race resolved — no lookahead leakage, so the benchmark number is real, not inflated by hindsight.
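A temporal firewall of this kind reduces to one rule: drop anything timestamped at or after the forecast time. The sketch below assumes each piece of evidence carries a timezone-aware publication timestamp; the `Evidence` type and `visible_evidence` helper are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Evidence:
    published_at: datetime
    text: str

def visible_evidence(items, as_of):
    """Keep only items published strictly before the forecast timestamp."""
    return [item for item in items if item.published_at < as_of]

# Made-up items around a hypothetical forecast time.
as_of = datetime(2020, 11, 1, tzinfo=timezone.utc)
items = [
    Evidence(datetime(2020, 10, 20, tzinfo=timezone.utc), "pre-election poll story"),
    Evidence(datetime(2020, 11, 4, tzinfo=timezone.utc), "post-election result story"),
]
allowed = visible_evidence(items, as_of)
```

In a backtest, `as_of` would be set before the race resolved, so the result story above never reaches the forecaster.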
The forecast reasons from evidence, never from the price it’s trying to beat — so its edge isn’t a circular echo of the market. The market is the baseline, scored only at the end.
The structural core — real Census demographics × exit-poll voting patterns × CPS turnout weighting × Cook PVI — earns +0.36 Brier skill over the base rate across 546 real U.S. Senate & presidential races (MIT Election Lab), scored leave-one-out, out-of-sample.
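For readers unfamiliar with the metric, Brier skill compares a forecast's Brier score with that of a reference forecast: 1 is perfect, 0 matches the reference, and negative is worse. The sketch below uses the base rate as the reference and computes it leave-one-out, which is an assumption about how the scoring is set up; the function names and toy data are hypothetical.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

def loo_base_rates(outcomes):
    """For each race, the share of wins among all the other races."""
    total, n = sum(outcomes), len(outcomes)
    return [(total - o) / (n - 1) for o in outcomes]

def brier_skill(forecasts, outcomes):
    """1 - BS_model / BS_reference, with the base rate as the reference."""
    reference = brier(loo_base_rates(outcomes), outcomes)
    return 1 - brier(forecasts, outcomes) / reference

# Made-up forecasts and results for four races.
outcomes = [1, 0, 1, 0]
skill = brier_skill([0.8, 0.2, 0.7, 0.3], outcomes)
```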
The hard part holds up: the near-50/50 Senate races score as well as the easy presidential ones — accuracy where forecasts usually break.
Limitation: this result is not yet fully stress-tested. We’re tightening it with PVI-off and pre-2016 ablations and a pre-registered forward test.
Built on open source