16 Movies, 1 Judge, and a Prediction Market: Introducing the Kleros Foresight
TLDR
Vitalik calls it distilled human judgement: use prediction markets to approximate expensive, trusted evaluations cheaply and at scale. Kleros built a general-purpose interface for exactly this, powered by Seer and Kleros Court on Gnosis Chain.
The first experiment: 16 movies, 1 judge (our CTO Clément), 5 evaluated. Predict how Clément will rate each film. Get it right, you profit. Get it wrong, you lose.
Next up: real estate pricing and futarchy for DAO grant allocation.
From "Vote Values, Bet Beliefs" to Distilled Human Judgement
Most people are familiar with prediction markets, particularly in light of the 2024 US election. The concept is simple: people trade on the outcome of future events, and the market price reflects the crowd's best estimate. If the Republican token trades at $0.61, the crowd thinks there's a 61% chance of a Republican win. The token pays $1 if they win, $0 if they don't.
That's a categorical market: a question with a fixed set of answers. But prediction markets can do more. A scalar market asks "how much?" instead of "which one?" Take inflation: the market sets a range (say 0% to 10%), and you trade UP and DOWN tokens. If inflation ends up at 8%, UP tokens redeem at 0.8 and DOWN at 0.2. Your payout scales with where the answer lands.
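The inflation example can be worked through in a few lines. This is a minimal sketch of the arithmetic only, not Seer's contract logic; the function name is made up for illustration, and clamping out-of-range outcomes to the nearest bound is an assumption.

```python
def scalar_redemption(value, low, high):
    """Return (up, down) redemption values per token for a scalar market."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Assumption: outcomes outside the range settle at the nearest bound.
    clamped = min(max(value, low), high)
    up = (clamped - low) / (high - low)
    return up, 1.0 - up


# Inflation lands at 8% in a market ranging from 0% to 10%.
up, down = scalar_redemption(8.0, 0.0, 10.0)
print(up, round(down, 2))  # 0.8 0.2
```

UP and DOWN always sum to 1, so holding one of each is equivalent to holding the underlying collateral.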
In 2000, economist Robin Hanson saw something bigger in this. He proposed futarchy: "Vote values, but bet beliefs." People would vote on what they want (say, higher GDP) but use prediction markets to decide how to get there. The key was a contingent market that combines a decision with a prediction: "If we pick policy A, what will GDP be?" You run one market per option, and the option with the better predicted outcome wins. Prediction markets then guide decisions instead of merely forecasting outcomes. Vitalik Buterin brought this into the Ethereum world in 2014, arguing that DAOs were the ideal testing ground.
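The decision rule itself is tiny. Here is a sketch, assuming each contingent market has already produced a single estimate of the target metric; the policy names and numbers are invented for illustration.

```python
def choose_policy(predicted_outcomes):
    """Pick the option whose contingent market predicts the best outcome."""
    if not predicted_outcomes:
        raise ValueError("need at least one option")
    return max(predicted_outcomes, key=predicted_outcomes.get)


# Hypothetical market estimates of GDP growth (%) under each policy.
estimates = {"policy A": 2.1, "policy B": 2.6}
print(choose_policy(estimates))  # policy B
```

The hard part is not this comparison but getting honest prices into the dictionary, which is what the markets are for.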
There's a trap, though. Goodhart's law: when a measure becomes a target, it ceases to be a good measure. Tie decisions purely to a number like GDP, and people will game it. The fix is to include a trusted human judge who evaluates the result after the fact. It's easier to look at an outcome and say, "This is good," than to write a formula that captures what "good" means in advance.
In November 2024, Vitalik described exactly this in "From prediction markets to info finance," calling it "distilled human judgement." You have a trusted process for making good decisions (an expert appraisal, a jury, a review), but it's slow and expensive. So you set up prediction markets based on what that process would decide if it were called. Most of the time, you never invoke it. But occasionally you do invoke it and settle the market on the actual result. Traders who predicted well get paid; those who didn't lose. Over time, the market learns to approximate the expensive process cheaply and at scale. In February 2025, in "AI as the engine, humans as the steering wheel," Vitalik pushed this further: humans provide a small number of high-quality judgements, while AI and market participants handle the scaling through a competitive open market.
This is what we built.
The Movie Experiment
Our CTO and co-founder Clément Lesaege presented "Prediction markets for distilled human judgement" at EthCC[8] in Cannes, walking through the progression from categorical to scalar to contingent markets and landing on a concrete experiment: using prediction markets to estimate how Clément would rate movies.
To kick things off, he posted on X asking people to suggest movies they think he would or wouldn't like, sharing his Criticker profile so others could study his taste. The replies ranged from sci-fi picks like Judge Dredd (1995), Starship Troopers (1997), and Ex Machina to films like Demolition Man, Relatos Salvajes, and Poor Things, with Thor: The Dark World and Robocop (2014) landing firmly in the "Don't like" suggestions.
I'm gonna run a cool futarchy experiment. Suggest me some movies that you think I would like or not like! Reply with:
Like: [Movie Name]
Don't like: [Movie Name]
— Clément Lesaege (@clesaege) June 23, 2025
For each movie, a scalar market asks:
"If watched, what percentile score will Clément give to the movie?"
We use percentile scores instead of regular ratings because different people use rating scales differently. A percentile of 65 means Clément liked the movie more than 65% of all films he's ever rated, making every prediction directly comparable. To get a feel for his taste, look through his previous scores on his Criticker profile.
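To make the definition concrete, here is one way a percentile could be computed from a list of past ratings. This is an illustrative sketch, not how Criticker calculates it: counting only strictly lower ratings is an assumption about tie handling, and the ratings are invented.

```python
from bisect import bisect_left


def percentile_score(rating, past_ratings):
    """Percent of previously rated films with a strictly lower rating."""
    if not past_ratings:
        raise ValueError("need at least one past rating")
    ordered = sorted(past_ratings)
    # bisect_left returns how many entries are strictly below `rating`.
    below = bisect_left(ordered, rating)
    return 100.0 * below / len(ordered)


# Hypothetical rating history on a 0-100 scale.
past = [40, 55, 60, 70, 72, 80, 85, 90, 91, 95]
print(percentile_score(82, past))  # 60.0
```

A raw 82 from a generous rater and a raw 82 from a harsh one would land at very different percentiles, which is the reason for using them.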
Here's the twist: Clément won't watch all 16. Only 5 get evaluated. Three are the movies with the highest market estimates (the market decides which are most worth watching), one is random, and one is Clément's pick. Nobody knows in advance which 5.
Clément watching and rating a movie is the "costly mechanism". The prediction market approximates that judgement across all 16 films, but only 5 get checked against the real thing. Predict well, and you profit. Predict poorly, and you lose. The 11 unevaluated movies redeem at neutral value: no profit, no loss.
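The selection rule can be sketched as follows. The source does not say how overlaps are resolved, so this version assumes the five films are distinct: the judge's pick must fall outside the top three, and the random film is drawn from whatever remains. Film names and estimates are placeholders.

```python
import random


def select_evaluated(estimates, judge_pick, rng, top_n=3):
    """Return the films to evaluate: top_n by estimate, one random, one judge pick."""
    ranked = sorted(estimates, key=estimates.get, reverse=True)
    top = ranked[:top_n]
    # Assumption: the judge's pick is a listed film that is not already in the top.
    if judge_pick not in estimates or judge_pick in top:
        raise ValueError("judge_pick must be a listed film outside the top picks")
    # Assumption: the random film comes from the films not otherwise selected.
    pool = [film for film in ranked[top_n:] if film != judge_pick]
    return top + [rng.choice(pool), judge_pick]


# Hypothetical market estimates (percentiles) for 16 films.
estimates = {f"film {i}": 30 + 4 * i for i in range(16)}
evaluated = select_evaluated(estimates, "film 2", random.Random(0))
print(evaluated[:3])  # ['film 15', 'film 14', 'film 13']
```

Because any film might be the random draw or the judge's pick, every market has some chance of being settled against a real rating, so no price can be safely ignored.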
This is intentionally low-stakes. It's a beta to test the interface and gather feedback. Movies work because the question is genuinely subjective (you can't game Clément's taste), and anyone who knows the films or Clément's preferences can form an informed opinion.
How It's Built
The experiment runs on Gnosis Chain. Seer provides the prediction market infrastructure. Reality.eth acts as the oracle: when a market closes, anyone can submit an answer by posting a bond, and others can challenge by doubling it. At any point, anyone can request arbitration through Kleros Court, where randomly selected jurors make the final call.
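The doubling rule means the cost of contesting an answer grows geometrically. A small sketch of the minimum bond at each round, with the starting bond and the number of rounds chosen arbitrarily for illustration:

```python
def bond_schedule(initial_bond, rounds):
    """Minimum bond required at each successive answer, doubling every round."""
    if initial_bond <= 0 or rounds < 1:
        raise ValueError("initial_bond must be positive and rounds at least 1")
    return [initial_bond * 2 ** i for i in range(rounds)]


# Hypothetical: first answer posted with a bond of 10 units, then four challenges.
print(bond_schedule(10, 5))  # [10, 20, 40, 80, 160]
```

After only a handful of rounds, defending a wrong answer gets expensive, and requesting arbitration becomes the cheaper way to end the dispute.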
Users deposit sDAI (a yield-bearing stablecoin on Gnosis Chain), which gets split into movie tokens across all 16 films. The interface abstracts away the complexity of UP and DOWN tokens in the background: you simply use a slider to predict higher or lower than the current market estimate. When the market resolves, your profit depends on the difference between your entry price and Clément's actual percentile score.
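As a rough sketch of the settlement arithmetic behind the slider: assume a percentile of 0 to 100 maps linearly onto the UP token's redemption value, and that an unevaluated film returns exactly what you paid. Fees, sDAI yield, and slippage are ignored, and the function and its parameters are hypothetical rather than part of the Seer or Kleros interfaces.

```python
def position_pnl(side, tokens, entry_price, score=None):
    """Profit or loss of a position.

    score is Clément's percentile (0-100), or None if the film was not evaluated.
    """
    if side not in ("UP", "DOWN"):
        raise ValueError("side must be 'UP' or 'DOWN'")
    if not 0 <= entry_price <= 1:
        raise ValueError("entry_price must be between 0 and 1")
    if score is None:
        return 0.0  # Assumption: neutral redemption means zero profit or loss.
    up_value = min(max(score, 0), 100) / 100
    redeem = up_value if side == "UP" else 1 - up_value
    return tokens * (redeem - entry_price)


# Bought 100 UP tokens at 0.55, and Clément's score comes in at the 65th percentile.
print(round(position_pnl("UP", 100, 0.55, 65), 2))  # 10.0
```

The same trade on a film that is never watched returns 0.0, which matches the "no profit, no loss" rule for the 11 unevaluated movies.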
Beyond Movies
The Kleros Foresight interface is general-purpose. The movie experiment is the first application, but the same architecture works whenever you have many questions and a trusted-but-expensive evaluation process that you can only invoke on a small subset.
Real-World Asset Pricing. Our next experiment applies the framework to real estate. The question becomes, "If this property gets a professional appraisal, what will it be worth?" Traders predict values across a pool of properties, but only a subset gets appraised. Those who price accurately profit. If successful, this approach could cut the cost of real-world asset pricing by removing the need to appraise every property individually.
Futarchy for DAO Grant Allocation. DAOs need to decide which projects to fund, and today that often becomes a popularity contest. Vitalik flagged this problem for Ethereum public goods funding. Seer is already being used for exactly this: as part of Gitcoin Grants Round 24, the GG24 Dev Tooling and Web3 Infra Round approved $350,000 for allocation via Deep Funding, with Devansh Mehta, Clément Lesaege, and Allan Niemerg as round operators, according to Gitcoin Governance. Through the Deep Funding mechanism, model builders predict how valuable each open-source repo would be if evaluated by expert judges, placing trades whose profit and loss depend on accuracy against human evaluations.
With futarchy, the question for each proposal is, "If funded, what will this project's TVL (or users, or revenue) be in one year?" The market aggregates what traders believe about each project's actual prospects. Only funded proposals go on to produce a real outcome, and that outcome is what the markets settle against and what anchors their prices.
Are you an AI developer interested in prediction markets and/or impact evaluation? Trade on Deep Funding markets which are allocating 350,000$ in the current gitcoin round.
Join telegram group below for more info👇 https://t.co/OXh5H9ybGc
— Seer (@seer_pm) January 14, 2026
Try It Yourself
The movie experiment is live. If you think you can predict Clément's taste in films better than the crowd, back that claim with your sDAI.