
Subtle Beacon

Learning and experimentation for a connected world.

Subtle Beacon is a unified Bayesian framework for preference learning, adaptive experimentation, and decision support. It combines hierarchical choice models, closed-form Bayesian A/B testing, and Thompson Sampling into a single coherent stack. All operations propagate uncertainty. Every decision rule is calibrated to posterior distributions — not p-values.

Gold-standard choice science, rebuilt for teams that ship every week

The best preference research — hierarchical Bayesian conjoint, adaptive choice-based designs, MaxDiff, Bayesian optimal experimental design — has been locked inside enterprise research suites designed for quarterly board presentations. The math is excellent. The workflow assumes you have a dedicated analyst, a six-week timeline, and a static product.

Subtle Beacon takes the same math and rebuilds it for continuous products. Posteriors update as data arrives. Thompson Sampling shifts traffic toward winning arms in real time. Experiments stop when expected regret is low enough to act — not when an arbitrary calendar date is reached. And because every operation propagates uncertainty through posterior distributions, decisions are calibrated to what the evidence actually supports, not to a p-value threshold that was never designed for product work.
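The allocation and stopping logic described above can be sketched in a few lines. This is a minimal illustration with NumPy, assuming a two-arm test with Beta posteriors; the counts and the 0.005 regret threshold are placeholders, not the system's real defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta posteriors over conversion rate for two arms, assuming Beta(1, 1)
# priors updated with illustrative success/failure counts.
post = {"A": (40 + 1, 160 + 1), "B": (55 + 1, 145 + 1)}  # (alpha, beta)

def thompson_pick(post, rng):
    """Sample one draw per arm from its posterior; route the next visitor
    to the arm with the highest draw."""
    draws = {arm: rng.beta(a, b) for arm, (a, b) in post.items()}
    return max(draws, key=draws.get)

def expected_regret(post, rng, n=100_000):
    """Monte Carlo estimate of the expected regret of committing now to the
    arm with the best posterior mean, versus the (unknown) true best arm."""
    samples = np.column_stack([rng.beta(a, b, n) for (a, b) in post.values()])
    best = np.argmax([a / (a + b) for (a, b) in post.values()])
    return float(np.mean(samples.max(axis=1) - samples[:, best]))

# Stop when the expected regret of acting now is below a tolerance.
if expected_regret(post, rng) < 0.005:
    print("decision-ready")
```

The point of the regret-based rule is that it answers the product question directly: "how much conversion do I expect to lose if I ship the current leader today?", rather than "is the difference statistically significant?"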

The closed loop nobody else runs

Here is what Subtle Beacon does that no other experimentation system does:

  1. Drift detection triggers the question. Bayesian change-point models and portfolio KPI monitoring catch when a metric shifts — not just that it moved, but when the shift started and how confident the system is that it’s real.
  2. An LLM generates the hypothesis, grounded in BOED. The system produces ranked, testable hypotheses with confidence scores, suggested metrics, and expected information gain — then designs the actual experiment (conjoint, ACBC, MaxDiff, or A/B) using Bayesian optimal experimental design to maximize what you learn per respondent.
  3. The experiment runs itself. Adaptive screening narrows the design space. Thompson Sampling allocates traffic. Decision rules evaluate posteriors after every batch. The experiment stops when the evidence is decision-ready, not when someone remembers to check.
  4. Posteriors flow back into planning. Experiment results update tactical outcome distributions in Patient Cartographer, replacing assumptions with evidence. Anchored Horizon refines forecasts from the same posteriors. The planning layer gets sharper without anyone copying numbers between tools.
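To make step 1 concrete, here is a textbook Bayesian change-point sketch: a uniform prior over the change location and conjugate Beta-Bernoulli marginal likelihoods for the segments before and after it. This is a simplified stand-in for the production models, run on synthetic data:

```python
import numpy as np
from math import lgamma

def log_beta(a, b):
    """log of the Beta function via log-gamma (numerically stable)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def seg_logml(x):
    """Log marginal likelihood of a Bernoulli segment under a Beta(1,1) prior."""
    s = int(np.sum(x))
    return log_beta(1 + s, 1 + len(x) - s)

def changepoint_posterior(x):
    """Posterior over the change location tau, uniform prior on 1..n-1."""
    n = len(x)
    logp = np.array([seg_logml(x[:t]) + seg_logml(x[t:]) for t in range(1, n)])
    logp -= logp.max()          # normalize in log space before exponentiating
    p = np.exp(logp)
    return p / p.sum()

rng = np.random.default_rng(1)
# Synthetic stream: conversion rate drops from 0.30 to 0.15 at t = 120.
x = np.concatenate([rng.random(120) < 0.30, rng.random(80) < 0.15]).astype(int)
post = changepoint_posterior(x)
tau_hat = 1 + int(np.argmax(post))   # most probable change point
```

The full posterior over tau is what gives the system both pieces it needs: the most probable onset time, and how concentrated the probability mass is around it, which is what "how confident the system is that it's real" means in practice.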

Every step is explainable: you can trace any recommendation back to the posterior that produced it, the data that shaped the posterior, and the design that chose what to measure. But no step requires an analyst to run it.

What it actually does

  • Conjugate A/B testing. Beta-Binomial (conversion) and Gamma-Poisson (count) models with closed-form posterior updates. Sub-millisecond decisions as data arrives. 2–100 arms.
  • Hierarchical conjoint (HB-MNL). Individual-level part-worths with partial pooling. Interaction terms with horseshoe priors. Sign and monotonicity constraints. GPU-accelerated NUTS sampling via JAX.
  • ACBC and MaxDiff. Adaptive screening with Beta–Bernoulli viability models, round-robin choice tournaments, and best–worst scaling via BTL decomposition — the same methodologies used in enterprise choice modeling, wired into a continuous pipeline.
  • Thompson Sampling. Posterior-driven adaptive traffic allocation with safety rails, exponential smoothing, and full allocation history.
  • Market simulation. Segment-level share breakouts, named scenario comparison, attribute sensitivity curves, and constrained bundle optimization — all under uncertainty.
  • Bayesian optimal experimental design. Expected information gain computation, Thompson and UCB batch policies, iterative multi-wave workflows that stop when learning plateaus.
  • Behavioral preference learning. Implicit feedback (clicks, dwell, purchases) mapped to weighted utility targets via Bayesian linear models, aligned to the same conjoint attribute structure. Stated choice and revealed preference in one framework.
  • Drift detection. Bayesian change-point models, CUSUM control charts, and baseline comparison. Staleness detection triggers re-evaluation or retraining automatically.
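Several of the pieces above share one mechanic: posterior draws are carried through every downstream computation instead of being collapsed to point estimates. As one sketch of the market-simulation bullet, assuming synthetic normal draws in place of real HB-MNL part-worth output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative posterior draws of total utility for three product
# configurations (rows: posterior draws, cols: options). In practice these
# come from the hierarchical sampler; here they are synthetic normals.
n_draws = 5000
draws = rng.normal([0.8, 0.5, 0.1], 0.3, size=(n_draws, 3))

def choice_shares(u):
    """Multinomial-logit share of preference per option (row-wise softmax)."""
    e = np.exp(u - u.max(axis=1, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=1, keepdims=True)

shares = choice_shares(draws)                     # one share vector per draw
mean_share = shares.mean(axis=0)                  # posterior-mean share
lo, hi = np.percentile(shares, [5, 95], axis=0)   # 90% credible interval
```

Because the softmax is applied per posterior draw, the resulting share estimates come with credible intervals for free, which is what "all under uncertainty" amounts to for scenario comparison and bundle optimization.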

How it connects

Subtle Beacon consumes from Latent Spark and feeds into Patient Cartographer.

From Latent Spark, it receives the canonical mechanics vocabulary that defines attributes and levels for conjoint designs, plus embedding-based context for experiment segmentation and prior calibration. Experiments on early-stage or hypothetical games use Latent Spark’s distributional representations to propagate input uncertainty through the preference posterior — so the system knows what it doesn’t know about concepts that barely exist yet.

To Patient Cartographer, it exports experiment posteriors that replace assumed tactic parameters with measured ones. When Patient Cartographer identifies a high-uncertainty parameter, it can trigger a Subtle Beacon experiment directly. When that experiment resolves, the posterior flows back and the plan updates. No manual handoff.

What a studio sees

Patient Cartographer flags D7 retention as an outcome risk after a content drop. Subtle Beacon picks it up: the portfolio KPI watcher confirms the shift is real, not noise, and generates three ranked hypotheses — onboarding friction for new players, reward pacing for the mid-game cohort, session length compression — each with a confidence score and a suggested experiment design. It kicks off an adaptive conjoint study targeting the strongest hypothesis, screens attributes down to the ones that matter, runs a choice tournament, and delivers part-worths by player segment. The build is instrumented, so this never feels like a test to the team; it runs as part of the routine. The result: you know that mid-game reward pacing is the lever, you know which segments care most, and Patient Cartographer now has the evidence attached to the priorities it already flagged. Elapsed time: days, not quarters. Analyst involvement: zero.

Why this is hard

Bayesian experimentation in games is harder than in most domains. Player behavior is non-stationary: a patch, a content drop, or a viral moment can shift the entire distribution mid-experiment. Sample sizes are often small and skewed. Preferences interact, so testing one variable in isolation misses effects that only appear in combination. And closing the loop — from drift detection through hypothesis generation through experiment execution through posterior integration back into planning — requires every component to speak the same probabilistic language end to end. Building a system that handles these realities while remaining fast enough to matter during a sprint is a genuine engineering challenge.

We're hiring

If you’ve worked on adaptive experimentation, hierarchical choice models, Bayesian optimal design, or closed-loop ML systems in production, we’d like to hear from you.