Methods & evidence

Most synthetic-research tools handle honesty with disclaimers. SpareBrain handles it with mechanisms — enforced in code, measured where possible, and listed here with their receipts. Everything below is reproducible from the seeds and scripts in the repository.

1. Variance comes from dice, not the model

Each cohort member’s conditions — mood, money pressure, expertise, attention, age, place — are sampled by externally seeded randomness across ~48 dimensions before the model is ever called. The model embodies the hand it’s dealt; it never invents its own variance.
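As a rough illustration of dealing conditions before any model call, here is a minimal Python sketch. The dimension table, value lists, age range, and the name `deal_cohort` are hypothetical stand-ins (the real substrate spans ~48 dimensions); only the principle of externally seeded sampling comes from the text above.

```python
import random

# Hypothetical dimension table for illustration only.
DIMENSIONS = {
    "mood": ["calm", "irritable", "upbeat", "tired"],
    "money_pressure": ["none", "mild", "acute"],
    "expertise": ["novice", "practitioner", "expert"],
    "attention": ["skimming", "focused"],
    "place": ["city", "suburb", "rural"],
}


def deal_cohort(seed: int, size: int = 15) -> list[dict]:
    """Sample every member's conditions from a seeded generator."""
    rng = random.Random(seed)  # isolated generator; global random state is untouched
    cohort = []
    for member_id in range(size):
        substrate = {name: rng.choice(options) for name, options in DIMENSIONS.items()}
        substrate["age"] = rng.randint(18, 80)  # assumed age range
        substrate["member_id"] = member_id
        cohort.append(substrate)
    return cohort


cohort = deal_cohort(seed=2024)
# Each substrate would then be rendered into the prompt; the model is
# never asked to choose these values itself.
```

Because all variation is drawn here, the model's only job is to respond in character for the hand it receives.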

Receipt: ablation against a conventional persona-only prompt, same stimulus, same model: 9 of 15 baseline responses opened with the identical sentence (5/15 distinct openers); the substrate cohort produced 15/15 distinct openers and roughly 2.6× the relative spread in response length (coefficient of variation 0.23 → 0.60).
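The two figures in this receipt could be computed along the following lines. This is a sketch under assumptions: the opener is taken to be the text up to the first full stop, length is counted in whitespace-separated words, and the population standard deviation is used; the repository's scripts may define any of these differently.

```python
import statistics


def first_sentence(text: str) -> str:
    """Normalised opener: text before the first full stop, lowercased."""
    head = text.strip().split(".", 1)[0]
    return " ".join(head.lower().split())


def distinct_openers(responses: list[str]) -> int:
    """Number of different opening sentences in a cohort."""
    return len({first_sentence(r) for r in responses})


def length_cv(responses: list[str]) -> float:
    """Coefficient of variation of response length (std dev / mean)."""
    lengths = [len(r.split()) for r in responses]
    return statistics.pstdev(lengths) / statistics.mean(lengths)


sample = [
    "I like it. It seems quick to set up.",
    "I like it. Not sure about the price though, to be honest.",
    "No thanks.",
]
openers = distinct_openers(sample)
spread = length_cv(sample)
```

The coefficient of variation is unitless, so it compares spread between cohorts even when their average response lengths differ.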

2. Reproducible cohorts

Randomness enters once, at study creation, as a recorded seed. The same seed re-deals the identical cohort — same substrates, ages, locations — so duplicating a study and changing one variable is a controlled comparison.

Receipt: paired same-seed studies verified byte-identical substrates across a model swap, a template swap, and a demographic change — the basis for every robustness claim below.
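One way to check that two same-seed studies share byte-identical substrates is to hash a canonical serialisation of each cohort. The sketch below assumes substrates are JSON-serialisable; `deal`, `fingerprint`, the study dictionaries, and the model names are hypothetical.

```python
import hashlib
import json
import random


def deal(seed: int, size: int = 15) -> list[dict]:
    """Toy dealer: the recorded seed is the only source of randomness."""
    rng = random.Random(seed)
    return [
        {"age": rng.randint(18, 80), "place": rng.choice(["city", "suburb", "rural"])}
        for _ in range(size)
    ]


def fingerprint(cohort: list[dict]) -> str:
    """SHA-256 over a canonical JSON encoding of the cohort."""
    payload = json.dumps(cohort, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Duplicate a study and change one variable (here, the model name).
study_a = {"seed": 1234, "model": "model-a", "cohort": deal(1234)}
study_b = {"seed": 1234, "model": "model-b", "cohort": deal(1234)}

same_cohort = fingerprint(study_a["cohort"]) == fingerprint(study_b["cohort"])
```

If the fingerprints match, any difference between the two studies' outputs can be attributed to the changed variable rather than to a different cohort.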

3. Anti-smoothing: splits and verbatim dissent, or regenerate

A deterministic post-check rejects any collation that generalises without a numeric split (“9 of 15”) or whose dissenting voice is not a character-exact quote of a real cohort response. Failures trigger targeted regeneration; persistent failure ships with a visible quality warning, never silently.
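A post-check of this kind might look like the sketch below. It assumes a collation arrives as a dictionary with `summary` and `dissent_quote` fields and that a split is written as "N of M"; those field names, the retry limit, and the `generate` callback are illustrative assumptions, not the tool's actual interface.

```python
import re

SPLIT_PATTERN = re.compile(r"\b\d+\s+of\s+\d+\b")


def check_collation(summary: str, dissent_quote: str, responses: list[str]) -> list[str]:
    """Return the list of failed checks; an empty list means the collation passes."""
    failures = []
    if not SPLIT_PATTERN.search(summary):
        failures.append("missing_numeric_split")
    # Character-exact: the quote must appear verbatim inside a real response.
    if not dissent_quote or not any(dissent_quote in r for r in responses):
        failures.append("dissent_not_verbatim")
    return failures


def collate_with_checks(generate, responses: list[str], max_attempts: int = 3) -> dict:
    """Regenerate on failure; ship with a visible warning if failures persist."""
    failures: list[str] = []
    draft: dict = {}
    for _ in range(max_attempts):
        draft = generate(failures)  # failures are passed back so the retry can be targeted
        failures = check_collation(draft["summary"], draft["dissent_quote"], responses)
        if not failures:
            return {**draft, "quality_warning": None}
    return {**draft, "quality_warning": failures}


responses = ["Too expensive for what it does.", "Looks useful to me."]
passing = check_collation("9 of 15 liked it.", "Too expensive for what it does.", responses)
failing = check_collation("Most liked it.", "Too pricey", responses)
```

Both checks are plain string operations, so the same collation always receives the same verdict.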

4. Intent tallies are stripped — even from legitimate studies

The tool refuses purchase-intent studies outright (until calibrated). But the guard runs deeper: a deterministic check inside every collation strips intent tallies that try to sneak into legitimate work.

Receipt: in production use, a collation reported “11 of 15 say they will subscribe”; the guard was built, and the same study, re-run, reported the reaction and its reasons instead. A later evasion via spelled-out numbers (“one member states they will not subscribe”) was caught and closed the same day.
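A guard of this shape can be sketched with two patterns: one for counts (digits or spelled-out numbers) and one for intent phrases. The word lists, the sentence-level granularity, and the name `strip_intent_tallies` are assumptions made for illustration; the production guard is likely broader.

```python
import re

NUMBER_WORDS = (
    "one|two|three|four|five|six|seven|eight|nine|ten|"
    "eleven|twelve|thirteen|fourteen|fifteen|none"
)
COUNT = re.compile(rf"\b(?:\d+|{NUMBER_WORDS})\b", re.IGNORECASE)
INTENT = re.compile(
    r"\b(?:will|would|won['’]t|wouldn['’]t)\b(?:\s+\w+){0,2}?\s+"
    r"(?:subscribe|buy|purchase|pay|sign\s+up)\b",
    re.IGNORECASE,
)


def strip_intent_tallies(collation: str) -> tuple[str, list[str]]:
    """Drop sentences that pair a count with a purchase-intent phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", collation.strip())
    kept, removed = [], []
    for sentence in sentences:
        if COUNT.search(sentence) and INTENT.search(sentence):
            removed.append(sentence)
        else:
            kept.append(sentence)
    return " ".join(kept), removed


digits_text = "11 of 15 say they will subscribe. 9 of 15 found the pricing page confusing."
words_text = "One member states they will not subscribe. Several praised the onboarding."
cleaned_digits, _ = strip_intent_tallies(digits_text)
cleaned_words, _ = strip_intent_tallies(words_text)
```

Requiring both a count and an intent phrase in the same sentence leaves ordinary splits about reactions untouched while catching tallies written in words as well as digits.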

5. Hypotheses must cite their evidence

Every study ends with 2–4 falsifiable hypotheses about real users. Each must cite the numeric cohort split that motivates it — deterministically checked — and propose one concrete, cheap next test in the real world. Hypotheses are questions raised by synthetic data, never findings.
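The citation requirement lends itself to a mechanical check. The sketch below assumes each hypothesis carries `statement`, `evidence`, and `next_test` fields and that splits are written as "N of M"; the `Hypothesis` class and `validate_hypotheses` are hypothetical names.

```python
import re
from dataclasses import dataclass

SPLIT = re.compile(r"\b(\d+)\s+of\s+(\d+)\b")


@dataclass
class Hypothesis:
    statement: str  # falsifiable claim about real users
    evidence: str   # must cite the cohort split that motivates it
    next_test: str  # one concrete, cheap real-world test


def validate_hypotheses(hypotheses: list[Hypothesis], cohort_size: int = 15) -> list[str]:
    """Return a list of rule violations; empty means the set is acceptable."""
    errors = []
    if not 2 <= len(hypotheses) <= 4:
        errors.append(f"expected 2-4 hypotheses, got {len(hypotheses)}")
    for i, h in enumerate(hypotheses, 1):
        match = SPLIT.search(h.evidence)
        if match is None:
            errors.append(f"hypothesis {i}: no numeric split cited")
        elif int(match.group(2)) != cohort_size or int(match.group(1)) > cohort_size:
            errors.append(f"hypothesis {i}: split does not fit cohort size")
        if not h.next_test.strip():
            errors.append(f"hypothesis {i}: no next test")
    return errors


good = [
    Hypothesis("Newcomers misread the tier table", "9 of 15 misread the tiers",
               "Hallway test of the pricing page with five users"),
    Hypothesis("Experts want an export option", "12 of 15 asked about export",
               "Add a fake-door export button and count clicks"),
]
good_errors = validate_hypotheses(good)
```

A stricter version could also confirm that the cited split matches one actually reported in the collation; that cross-check is omitted here.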

6. Known noise, measured and disclosed

Same-seed replication testing produced the tool’s own reading rules, shown with every study: strong consensus (12+ of 15) replicates reliably; split magnitudes wobble by ±2 at N=15; counts of spontaneously-mentioned themes are the noisiest number produced and should be read as “this objection exists,” not “this many hold it.”
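The reading rules can be expressed as a small lookup. This sketch hard-codes N=15 because that is the only cohort size the rules above are stated for; the function name, the `kind` labels, and the wording for a count of zero are assumptions.

```python
N = 15  # the rules above are only stated for cohorts of 15


def read_count(count: int, kind: str = "prompted") -> str:
    """Translate a raw count into the reading the rules above support."""
    if not 0 <= count <= N:
        raise ValueError(f"count must be between 0 and {N}")
    if kind == "spontaneous_theme":
        # Noisiest number produced: treat as existence, not prevalence.
        return "this objection exists" if count > 0 else "not mentioned"
    if count >= 12:
        return "strong consensus; replicates reliably"
    low, high = max(0, count - 2), min(N, count + 2)
    return f"split; read as {low}-{high} of {N}"


consensus = read_count(13)
split = read_count(9)
theme = read_count(4, kind="spontaneous_theme")
```

Under these rules, a 9 of 15 split and a 10 of 15 split should not be treated as different results.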

Receipt: headline findings on the test stimulus replicated across two models, two prompt templates, and within-persona demographic variance — four internal robustness checks, all on identical seeds.

7. Uncalibrated means uncalibrated

Every output is labelled synthetic and exploratory-only until a use case has been validated against real research — trust is earned per use case, not asserted per model. The internal checks above make the tool consistent; only calibration against real users can make it credible. That work is the roadmap’s centre.