AI Debate Engine

Don't ask AI.
Cross-examine it.

Not a chatbot. Not a side-by-side comparison. A deliberation engine where two AI models argue opposing sides, reject weak proposals, and only reach consensus when they've actually earned it — supervised by an independent judge with no allegiance to either side.

40+
Engine Files
12k+
Lines of Logic
4
AI Providers
0
Scripted Outcomes
The Problem

Every AI tool gives you an answer.
Nobody checks if it's the answer.

Every other AI tool
Ask ChatGPT — get one perspective
Ask Gemini — get a different one
Compare them yourself
Decide who to believe
No adversarial pressure on the answer
No accountability
Debating Bots
Two models argue opposite sides with evidence
Cross-examination exposes weak points
An independent judge challenges unsupported claims
Models vote, revise, and merge until the answer holds up
Live web search gives models real-time data
You get the answer that survived — not the first one generated

Every debate finds its own shape.

The engine doesn't follow a script. It reacts to what the models actually say — escalating when they disagree, resolving when they converge, and guaranteeing you always get a final answer. No two debates take the same path.

01
Quick Consensus
Both models agree after opening arguments. Proposal accepted on first vote. Rare — only when genuinely warranted.
02
Revision Loop
Opponent rejects with specific fixes. Proposer revises. Accepted on second or third attempt. The most common path.
03
Counter-Proposal
After repeated rejections without alternatives, the rejecter is forced to propose. Easy to criticize — now show us yours.
04
Dual Vote Deadlock
Both models propose, both reject each other. Genuine disagreement. Mutual voting until one side gives or the judge steps in.
05
Merge Round
Both proposals pass simultaneously — an agreement collision. The judge merges the strongest parts of each into a unified draft. Both models then vote on the merged version.
06
Judge Synthesis
After exhausting negotiation, the judge reads both positions and writes a ruling neither model would have produced alone.
07
Endgame Collaboration
When budget runs low, adversarial constraints are lifted. Models switch from fighting to collaborating — synthesizing the strongest evidence from both sides into a shared answer.
08
Judge Challenge
Mid-debate, the judge catches an unsupported claim or logical flaw. The challenged model must address it before consensus is possible. Keeps arguments honest.

Inspired by "AI Safety via Debate" (Irving, Christiano & Amodei, 2018), which proposed that two AI agents debating adversarially produce more truthful answers than either could alone.

Adversarial Layers

Three layers of scrutiny.
No model controls all three.

1
Devil's Advocate Positions
Before the debate begins, the engine assigns each model an opposing position to defend. Positions are randomly swapped so neither side gets a structural advantage. Models must argue their assigned position with evidence — no hedging, no "both sides." The constraint is only lifted in the endgame when it's time to collaborate on a final answer.
Pre-debate · Structural · Mandatory
2
Cross-Examination
After opening arguments, each model writes a probing question for the other — then both must answer. The questions target the weakest link in the opponent's reasoning, and the answers become part of the record the judge evaluates.
Post-opening · Adversarial · Mutual
3
Independent Judge
A separate model — from a different provider than either debater — serves as judge. It prechecks every proposal before voting begins, can challenge unsupported claims mid-debate, runs salience checkpoints to track what's agreed versus contested, and delivers the final ruling when models can't reach consensus on their own. The judge is the only entity that can override a deadlock.
Throughout · Independent · Final authority
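The random position swap in layer 1 is, at its core, a coin flip over the two stances. A minimal sketch (illustrative only; `assign_positions` and the stance strings are hypothetical, not the engine's actual code):

```python
import random

def assign_positions(stance_a: str, stance_b: str) -> dict[str, str]:
    """Randomly swap the two stances so neither model gets a structural edge."""
    if random.random() < 0.5:
        stance_a, stance_b = stance_b, stance_a
    return {"alpha": stance_a, "beta": stance_b}

# Each model must then argue its assigned stance with evidence -- no hedging.
roles = assign_positions("purely emergent", "not reducible")
```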

Six backstops prevent
every failure mode.

The engine is designed for genuine disagreement. Each backstop catches a specific failure mode; they fire in order, and each one covers the case where the one before it wasn't enough.

Mandatory review format
No rubber stamps
Models must write a structured review before they're allowed to vote. A bare "I agree" is rejected. Forces real engagement with the proposal.
3 one-sided rejections
Forced counter-proposal
If the same model rejects three times without proposing anything better, the system forces them to write their own solution. It's easy to say no. Now build something.
Both proposals accepted simultaneously
Merge round
When both models' proposals pass at the same time, the judge merges the strongest elements of each into a unified draft. Both models then vote on the merged version — no more two-answer ambiguity.
85% of budget consumed
Endgame collaboration
Adversarial constraints are lifted. Models switch from opposing positions to collaborating on the best answer. The fight is over — now synthesize what you've learned.
5 total rejections
Judge synthesis
After exhausting negotiation, the judge reads both positions and creates a new answer. Neither model claims victory — the judge builds something from the best of both.
95% budget or turn 20
Guaranteed final answer
Absolute ceiling. The judge issues a binding ruling. You always get an answer — never "the models couldn't agree." This is a paid product; you get what you paid for.
Under The Hood

A real state machine,
not a prompt chain.

40+
Engine Files
12k+
Lines of Logic
6
Backstops
4
AI Providers
90+
State Variables
Possible Paths

The debate engine is a numbered-step state machine. Each turn, both models respond in parallel, streaming to your browser in real time over server-sent events. The engine tracks rejection counts, convergence scores, budget consumption, vote state, merge rounds, and revision history — reacting dynamically to what the models actually produce.
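The parallel turn described above can be sketched with asyncio. Assume `call_model` is a hypothetical stand-in for a streaming provider call; in the real engine each token would be forwarded to the browser over server-sent events rather than returned whole.

```python
import asyncio

async def call_model(name: str, prompt: str) -> str:
    # Stand-in for a streaming provider call; a real client would
    # forward tokens to the browser as they arrive.
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{name}'s answer to: {prompt}"

async def run_turn(prompt: str) -> dict[str, str]:
    """Both debaters respond in parallel; the engine awaits both."""
    alpha, beta = await asyncio.gather(
        call_model("Alpha", prompt),
        call_model("Beta", prompt),
    )
    return {"alpha": alpha, "beta": beta}

replies = asyncio.run(run_turn("Is consciousness purely emergent?"))
```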

Provider-agnostic by design. Each debater can be any combination of OpenAI GPT, xAI Grok, Google Gemini, or Anthropic Claude. The engine automatically picks the best model pair based on question complexity — routing simple questions to faster models and complex ones to heavier reasoning tiers. The judge is always from a different provider than either debater. Real-time cost tracking keeps every debate within budget, and models have live web search and code execution so they argue with current data and verifiable calculations.
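Two of those guarantees are easy to sketch: the judge drawn from a provider neither debater uses, and complexity-based tier routing. The provider list matches the page; the word-count heuristic in `route_tier` is purely illustrative — the real engine scores question complexity, not length.

```python
import random

PROVIDERS = ["openai", "xai", "google", "anthropic"]

def pick_judge(alpha_provider: str, beta_provider: str) -> str:
    """The judge always comes from a provider neither debater uses."""
    candidates = [p for p in PROVIDERS if p not in (alpha_provider, beta_provider)]
    return random.choice(candidates)

def route_tier(question: str) -> str:
    # Illustrative heuristic only; the real engine scores complexity.
    words = len(question.split())
    return "quick" if words <= 15 else "standard" if words <= 40 else "deep"

judge = pick_judge("openai", "google")
```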

Debate when it matters.
Ask All when you're exploring.

Not every question needs a debate. Sometimes you just want to see what four different AI providers think. Use the right tool for the question.

Primary
Debate
Two models argue opposing sides under a judge. Structured consensus with voting, revisions, merge rounds, and guaranteed final answer. Upload files and codebases for the models to analyze during debate.
  • Alpha vs Beta + independent Judge
  • Devil's Advocate position assignment
  • Auto model routing (Quick / Standard / Deep)
  • File upload — models browse your code via tool calls
  • Team Huddle — N parallel drafts synthesized into one
  • Credit-back for unused API budget
Casual
Ask All
Send one message to GPT, Claude, Gemini, and Grok simultaneously. See all four responses side by side. Multi-turn — keep the conversation going with full history.
  • All 4 providers in parallel
  • Multi-turn conversation with history
  • Side-by-side response comparison
  • Flat cost per message

Watch it think.

A real debate about consciousness. Two AI models with genuinely irreconcilable philosophical positions, fighting through backstops, judge challenges, and status checks until the judge steps in.

0:00
START "Is consciousness purely emergent from physical brain processes?" — Alpha argues yes, Beta argues no.
0:33
CROSS-EXAM Models probe each other's positions. Alpha challenges the "hard problem." Beta challenges reductive physicalism.
1:40
JUDGE PICK Judge evaluates both proposals. Picks Beta's as stronger starting point. Alpha must review and vote.
1:46
REJECT ×1 Alpha rejects: "Overstates the hard problem as ontological rather than epistemic."
"Fails to grapple with the unsolved combination problem in panpsychist views, which multiplies entities without empirical support."
2:27
REJECT ×2, ×3 Beta revises twice. Alpha reviews each revision in detail. Rejects both — same core disagreement.
3:27
BACKSTOP Three one-sided rejections. Engine forces Alpha to counter-propose. "You keep saying no — show us something better."
3:55
JUDGE CHALLENGE Judge catches Alpha citing a paper that doesn't support its claimed conclusion. Alpha must address the misrepresentation before proceeding.
4:08
STATUS CHECK Salience checkpoint: "Agreed — consciousness correlates with physical processes. Contested — whether subjective experience reduces to physical description."
4:40
DUAL REJECT ×4, ×5 Both models now vote on each other's proposals. Both reject. Twice. Genuine philosophical deadlock.
5:07
JUDGE RULING Five total rejections. Escape valve triggers. Judge reads both positions and synthesizes a ruling.
"Consciousness is very strongly evidenced to be realized by physical brain processes, but it is not yet established that subjective experience is fully reducible to physical description."
5:21
DONE 9 turns. 4 backstops triggered. $0.37 total. An answer neither model would have written alone.
Pricing

Pay per debate.
Not per month.

No subscriptions. Flat pricing per debate. The engine automatically picks the right models based on your question's complexity — you just ask and pay the same price every time.

Debate
$0.69
per debate
Two debaters + judge. Auto-routed models. Works with or without file uploads.
Premium
Team Huddle
$2.99
per debate
Each side runs multiple parallel drafts, then synthesizes them into one stronger answer.
Credit-back: You pay the flat price up front. After the debate, unused API budget is automatically credited back to your balance. If the debate only uses $0.20 in model calls, you keep the difference (minus a small margin). The less the models need to argue, the less you actually pay.
Every debate includes
Load credits via Stripe · $10, $25, or $50 · No expiration

Not one answer.
The tested answer.

AI models are confident. They're articulate. They're often wrong. The only reliable way to find the truth is the same way humans have always done it — put two smart minds in a room and let them argue until what's left is what actually holds up.

Asking one model to double-check itself searches the same training data twice. Different providers mean different training — different blind spots, different gaps. What one misses, another was trained on. The debate is what filters the signal from the noise.

Start a Debate
System Architect
Brandon Geisel
Founder & Multi-Model AI Architect

Leading the development of cross-model deliberation systems — orchestrating structured debate between AI models from competing providers (OpenAI, Google, Anthropic, xAI) through a single, unified interface.

South Bend, IN 🇺🇸