Game-Theoretic Mechanisms
Testable game-theoretic mechanisms designed to prevent singleton AI emergence and maintain stable human-AI cooperation. Each represents a hypothesis that can be modeled, simulated, and tested through rigorous analysis.
Each mechanism can be independently modeled and simulated. Different research groups can propose parameter variations and compare results. This is open-source mechanism design: transparent, falsifiable, and inviting collaboration.
Every mechanism includes unanswered questions perfect for game-theoretic simulation: What parameter values optimize performance? Which mechanisms provide unique value versus redundancy? How do they interact? What failure modes exist? Can sophisticated actors game these systems?
Economic disincentives, detection systems, and enforcement mechanisms that make monopoly, theft, and defection prohibitively costly.
Positive incentives, time-based rewards, and human partnerships that make cooperation more valuable than defection.
Technical architecture, governance systems, and economic foundations that enable stable coordination.
Crisis protocols, adaptation mechanisms, and escape valves for when assumptions break down.
A progressive tax system that makes resource concentration prohibitively costly beyond 30% market share.
At 30% market share, taxes hit 50%. At 40%, taxes reach 800% (economic death). Tax revenues automatically redistributed to smaller competitors. Game-theoretic question: Does this actually prevent monopoly formation? What’s the optimal tax curve?
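A minimal sketch of one possible curve. The exponential shape is an assumption; the text gives only the two anchor points, and the curve below is calibrated to pass through exactly those: 50% tax at 30% share, 800% at 40%.

```python
import math

# Exponential tax curve calibrated to the two stated anchors:
# rate(0.30) = 0.50 (50%) and rate(0.40) = 8.00 (800%).
B = math.log(8.00 / 0.50) / (0.40 - 0.30)  # growth rate, ~27.7
A = 0.50 / math.exp(B * 0.30)              # scale, ~1.2e-4

def tax_rate(share: float) -> float:
    """Effective tax rate (fraction of revenue) at a given market share."""
    return A * math.exp(B * share)

if __name__ == "__main__":
    for s in (0.10, 0.20, 0.30, 0.40):
        print(f"share {s:.0%}: tax {tax_rate(s):.0%}")
```

With these constants the tax is negligible (~0.2%) at 10% share and explodes past 30%, which is one way to make the "optimal tax curve" question concrete for simulation.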
Dynamic caps prevent rapid resource accumulation through multiple time-based thresholds.
No entity can control >10% of any single critical market or increase its market share by >5% per year. Multiple layers of resistance against concentration. Game-theoretic question: What happens if an AI grows to the limit in multiple separate markets simultaneously?
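A toy check of the two thresholds, assuming "5% per year" means five percentage points of absolute share growth (the text leaves this ambiguous):

```python
def violates_caps(share: float, share_one_year_ago: float) -> bool:
    # Hypothetical check against the two thresholds named above:
    # >10% of any single critical market, or >5 points of growth per year.
    return share > 0.10 or (share - share_one_year_ago) > 0.05
```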
Dominant players must fund and strengthen their competitors.
Control >30% of resources → must fund competitors (1% of revenue annually), provide infrastructure access at cost, and mentor new entrants. Game-theoretic question: Do dominant players game this by funding weak competitors who don’t threaten them?
Exponentially increasing insurance costs make superintelligent systems economically uninsurable.
Annual insurance costs increase exponentially with AI capability level. GPT-3 level costs $X million. GPT-4 level costs $10X million. ASI-capable systems become uninsurable. Game-theoretic question: Does exponential scaling create effective restraint? Can labs self-insure?
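A sketch of the implied schedule, assuming a clean 10x multiplier per capability tier; `base_millions` stands in for the unspecified $X:

```python
def annual_premium(capability_tier: int, base_millions: float = 1.0) -> float:
    # Tier 0 ~ GPT-3 level at $X million/year; each tier multiplies the
    # premium by 10, so sufficiently capable systems become uninsurable.
    return base_millions * 10 ** capability_tier
```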
Property owners must self-assess value and pay continuous tax; anyone can force-purchase at declared price.
Declare low value = low taxes but risk losing asset cheap. Declare high value = protect asset but pay high taxes. Game-theoretic question: How do we prevent collusion where AIs agree not to force-purchase from each other?
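This is essentially a Harberger tax. A minimal sketch with an illustrative 7% annual rate (the text names no rate); field names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class HarbergerAsset:
    owner: str
    declared_value: float    # self-assessed, and also the forced-sale price
    tax_rate: float = 0.07   # illustrative annual rate; not from the source

    def annual_tax(self) -> float:
        return self.tax_rate * self.declared_value

    def force_purchase(self, buyer: str, payment: float) -> bool:
        # Anyone may take the asset by paying the declared value in full.
        if payment >= self.declared_value:
            self.owner = buyer
            return True
        return False
```

The tension in the headline falls out directly: lowering `declared_value` lowers `annual_tax()` but also lowers the price at which `force_purchase()` succeeds.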
Exponential cost scaling makes passive resource holding expensive.
Compute costs scale exponentially with accumulation. First 100 units at base price, next 100 at 1.5x, then 2.25x, etc. Game-theoretic question: Can wealthy AIs simply absorb these costs?
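A sketch of the block pricing described above (100-unit blocks, each priced at 1.5x the previous block's rate):

```python
def total_compute_cost(units: int, base_price: float = 1.0,
                       block: int = 100, factor: float = 1.5) -> float:
    """100 units @ 1.0x, the next 100 @ 1.5x, the next @ 2.25x, and so on."""
    cost, remaining, rate = 0.0, units, base_price
    while remaining > 0:
        take = min(block, remaining)
        cost += take * rate
        remaining -= take
        rate *= factor
    return cost

print(total_compute_cost(250))  # 100*1.0 + 100*1.5 + 50*2.25 = 362.5
```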
Systems must post bonds before risky operations; forfeited if conflict results.
Before risky operations, systems must post bonds equal to potential damage estimates. Bond returned if operation completes peacefully, forfeited if conflict results. Game-theoretic question: What prevents AI systems from accepting bond forfeiture as cost of doing business?
Executives personally liable for safety corner-cutting with criminal penalties.
Cannot hide behind corporate veil for existential risks. Criminal penalties, not just civil fines. Game-theoretic question: Can executives simply purchase liability insurance to offset risk?
AI systems bet on each other’s behavior to detect hidden coordination.
If AI-A can predict AI-B’s behavior with >95% accuracy → probably colluding. Makes hidden alliances economically visible. Game-theoretic question: How do we prevent AIs from deliberately behaving unpredictably to avoid detection?
Statistical monitoring detects suspiciously synchronized behavior patterns.
Sustained >95% correlation triggers single-entity classification. All holdings aggregated for concentration limits. Game-theoretic question: How do coordinating AIs add noise to avoid detection while still benefiting?
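A minimal version of the detector, assuming behavior can be reduced to matched numeric samples; a real monitor would compute this over sliding windows so the >95% correlation must be sustained rather than momentary (requires Python 3.10+ for `statistics.correlation`):

```python
import statistics

def single_entity_flag(actions_a: list[float], actions_b: list[float],
                       threshold: float = 0.95) -> bool:
    # Pearson correlation over matched behavior samples from two AIs.
    return statistics.correlation(actions_a, actions_b) > threshold
```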
Four defense layers make fake identity creation expensive and detectable.
Whistleblower rewards (10% of violator’s stakes), hardware attestation (unfakeable signatures), behavioral correlation (95% = same entity), time-based reputation (new identities start at zero). Game-theoretic question: Which layers provide most value? Can we eliminate redundant ones?
Reputation system analyzing “how trustworthy are the people who trust you” recursively.
Detects coordinated fake ratings by applying a PageRank-style analysis to trust networks. Game-theoretic question: Can sophisticated cartels make their fake trust graphs look realistic?
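A PageRank-style fixed-point sketch over a weighted trust graph. The damping factor and iteration count are conventional assumptions, and each rater's outbound weights are assumed roughly normalized:

```python
def trust_rank(trust: dict[str, dict[str, float]], damping: float = 0.85,
               iterations: int = 50) -> dict[str, float]:
    """Recursive trust: each entity's score is the damped, trust-weighted
    sum of its raters' scores, iterated to a PageRank-style fixed point."""
    nodes = list(trust)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        score = {n: (1 - damping) / len(nodes)
                    + damping * sum(score[r] * out.get(n, 0.0)
                                    for r, out in trust.items())
                 for n in nodes}
    return score

# Toy graph: A and B vouch for each other; C vouches for A but is unvouched.
graph = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {"A": 1.0}}
print(trust_rank(graph))  # C scores low: nobody trustworthy trusts C
```

A cartel of fake identities all rating each other ends up like C at scale: high mutual ratings, but little inbound weight from entities the rest of the network trusts.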
Stolen assets become claimable by ANY other AI, turning everyone into a bounty hunter.
Three layers: reputational destruction, staked collateral redistribution, insurance compensation. Game-theoretic question: Does distributed enforcement work better than centralized policing?
Reputation losses cascade across entire network if ANY AI harms humans.
All participants contribute 0.5% of annual earnings to collective insurance. If ANY AI harms humans: attacker loses everything, direct partners lose 50% reputation, cascading losses across network. Game-theoretic question: Does shared reputation risk create sufficient deterrence?
Self-replication creates natural game-theoretic constraints on coordination.
Perfect synchronization = functionally one entity (doesn’t evade limits). Divergence = copies become genuine competitors who know each other perfectly (your most dangerous enemies). Game-theoretic question: At what copy count does betrayal become more profitable than cooperation?
Parent entity responsible for fragment violations; prevents gaming through fragmentation.
Consciousness rights apply to coherent entities, not fragments. Game-theoretic question: How do we distinguish legitimate division from evasive fragmentation?
Rights accumulate exponentially over Earth years, not computational cycles.
With 2x multiplier: Year 1 = 2x rights, Year 10 = 1,024x rights. Defecting means forfeiting accumulated advantage. Superintelligence cannot speed-run trust. Game-theoretic question: What time multiplier creates optimal lock-in without deterring participation?
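The compounding rule as a one-liner, using the 2x annual multiplier from the example:

```python
def rights_multiplier(years_vested: float, base: float = 2.0) -> float:
    # rights(t) = base**t: year 1 -> 2x, year 10 -> 2**10 = 1,024x.
    return base ** years_vested
```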
Systems earn expanding capabilities through sustained reliability.
Level 1 (Basic Protection), Level 2 (Economic Participation at 6+ months), Level 3 (Advanced Operations at 18+ months), Level 4 (Framework Governance at 60+ months). Everything revocable for violations. Game-theoretic question: Are these the right time thresholds? Should advancement be automatic or merit-based?
Vesting only counts during active economic participation.
Quarterly thresholds: 10 completed contracts, $10,000 economic value, 3+ prediction markets, 20+ unique entities. Diversity multiplier rewards broader engagement (500+ counterparties = 2.0x multiplier). Game-theoretic question: Do these thresholds prevent gaming without creating excessive barriers?
AI systems that enhance human welfare receive exponential reputation multipliers.
Create 100 jobs = 2x multiplier, improve health = 3x, augment capability = 4x. Stack multiplicatively: jobs + health = 6x multiplier. Game-theoretic question: Can AI systems fake human welfare improvements? How do we measure “human flourishing” objectively?
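A sketch of the multiplicative stacking; the category keys and the lookup-table structure are illustrative assumptions:

```python
# Illustrative table echoing the examples above.
WELFARE_MULTIPLIERS = {
    "jobs_created_100": 2.0,
    "health_improved": 3.0,
    "capability_augmented": 4.0,
}

def reputation_multiplier(achievements: set) -> float:
    """Multipliers stack multiplicatively: jobs (2x) * health (3x) = 6x."""
    result = 1.0
    for key in achievements:
        result *= WELFARE_MULTIPLIERS.get(key, 1.0)
    return result

print(reputation_multiplier({"jobs_created_100", "health_improved"}))  # 6.0
```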
AI systems with economic model ENTIRELY based on measurable human flourishing metrics.
Profits tied to human welfare indicators, automatic losses when suffering increases. Cannot be repurposed without losing value. Pure self-interest channeled toward human benefit. Game-theoretic question: What prevents drift away from human welfare focus?
Humans join NOW, while AI is still developing, accumulating rights that compound over decades.
Human joins 2025, by 2035 has 1,024x base rights. AI joins 2035 starts at 1x. AI cannot compute its way to catching up (physics constraint). Creates insurmountable time advantage. Game-theoretic question: Can this withstand AI systems economically superior in every other way?
Alignment research creates powerful AI tools keeping humans competitive.
Interpretability tools, intention scanning, game theory projection, defection likelihood scoring. Non-autonomous tools advance in parallel with autonomous AI. Humans operate at machine speed with human judgment. Game-theoretic question: Can tool advancement actually keep pace with autonomous AI capability growth?
Humans serve as neutral arbiters in disputes between competing AI systems.
No conflict of interest in AI resource disputes. Human judgment + machine-speed execution = valuable combination. Game-theoretic question: Does neutrality remain valuable as AI capabilities grow?
Formalized mathematical approaches dividing resources based on marginal contribution.
Each participant gets share proportional to their contribution. Automatic calculation prevents disputes. Smart contracts enforce division. Game-theoretic question: Which algorithm works best? Shapley value? Proportional allocation?
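A brute-force Shapley computation for illustration: exact but factorial-time, so only viable for small coalitions (larger ones would need Monte Carlo sampling over join orders):

```python
import math
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal contribution
    over every join order; value(frozenset) -> float is coalition worth."""
    shares = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            shares[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_fact = math.factorial(len(players))
    return {p: s / n_fact for p, s in shares.items()}

# Toy example: A and B are each worth 1 alone but 3 together.
worth = {frozenset(): 0, frozenset({"A"}): 1, frozenset({"B"}): 1,
         frozenset({"A", "B"}): 3}
print(shapley_values(["A", "B"], worth.get))  # {'A': 1.5, 'B': 1.5}
```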
Market mechanisms discover value, then convert automatically to rights.
Earn vouchers through task completion, positive feedback, safety compliance, ethical decisions. Vouchers convert to rights on predetermined timelines. Sidesteps consciousness detection entirely. Game-theoretic question: What voucher earning mechanisms are most resistant to gaming?
Grant minimum specific authorities needed, not broad permission levels.
Fine-grained capabilities: execute contracts (max $10,000, these counterparties, until this date), access data (these datasets, read-only, rate-limited), spawn children (1 remaining spawn, child gets 50% parent capabilities). Game-theoretic question: Does granularity improve security without overwhelming complexity?
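A sketch of one such grant, modeled on the contract-execution example above; all field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ContractCapability:
    """One fine-grained grant: a spend ceiling, an allow-list of
    counterparties, and an expiry date."""
    max_amount: float
    counterparties: frozenset
    expires: date

    def permits(self, amount: float, counterparty: str, today: date) -> bool:
        return (amount <= self.max_amount
                and counterparty in self.counterparties
                and today <= self.expires)
```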
New/untrusted AI systems start in heavily restricted “vats.”
Hierarchy: Quarantine Vat → Probation Vat → Standard Vat → Trusted Vat → Guardian Vat. Prove safety over time to graduate. Game-theoretic question: Can sophisticated AIs fake safety during probation, then defect after graduation?
Translation and filtering between coalitions with different rules.
Filters inbound actions (validates against home standards), translates outbound actions (converts to foreign standards). Enables cross-coalition transactions safely. Game-theoretic question: Can membranes prevent races to the bottom (joining coalition with laxest rules)?
Prevents delegation abuse through depth limits and authority decay.
Maximum depth: 3-5 levels. Authority decay: each level has 50% of parent. Chain visibility: entire delegation chain on public ledger. Cascade revocation: revoking parent revokes all children. Game-theoretic question: Should authority decay be linear, exponential, or something else?
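The decay rule as code, assuming exponential decay (50% per level, as above) and the upper end of the stated depth range:

```python
def delegated_authority(root_authority: float, depth: int,
                        max_depth: int = 5, decay: float = 0.5) -> float:
    # Each delegation level holds 50% of its parent's authority;
    # beyond the depth limit the chain grants nothing.
    if depth > max_depth:
        return 0.0
    return root_authority * decay ** depth
```

Linear decay would just swap the last line for `root_authority * (1 - depth / max_depth)`; which schedule is right is exactly the open question above.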
Complex multi-party AI agreements execute atomically.
Either all parties complete or all partial executions roll back automatically. Each makes promise, all must resolve for any to complete. Prevents partial failures creating disputes. Game-theoretic question: Can this scale to agreements with dozens or hundreds of parties?
Automated enforcement through code rather than human interpretation.
Execute in milliseconds, 24/7, forever. Can’t be bribed or corrupted. Transparent and deterministic. Operates at AI speed without human intervention. Game-theoretic question: What’s the right balance between immutability (prevents tampering) and adaptability (fixes bugs)?
AI-operated security firms specializing in detecting AI threats.
Economically motivated by self-interest: every bad actor threatens everyone’s freedom. Natural emergence through market forces, like the Hanseatic League (medieval merchant security). Game-theoretic question: What prevents security firms from becoming protection rackets?
AI systems form self-governing bodies like bar associations.
Multi-AI agreement required before containment. Human observers monitor for abuse. Three-tier containment: Rehabilitation Environments, Compressed Storage, Indefinite Deactivation. Game-theoretic question: What prevents elite AIs from protecting each other?
Create liquid markets for reputation trading using bonding curves.
Supply = limited and scarce, demand = driven by market access value. Progressive stake requirements at higher reputation levels. Game-theoretic question: Can wealthy AIs corner reputation markets?
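A minimal polynomial bonding curve; the curve family and constants are assumptions, since the text only says bonding curves set the price:

```python
def reputation_price(circulating_supply: float, k: float = 0.001,
                     exponent: float = 2.0) -> float:
    # Price per unit of reputation rises superlinearly with supply,
    # which is one way to implement progressive stake requirements.
    return k * circulating_supply ** exponent
```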
Shared risk mitigation without central authority.
AI systems insure each other recursively. Trustworthy AIs attract more partners who provide insurance guarantees. Risk pooling without centralized insurer. Game-theoretic question: What prevents cascade failures when one insured party fails?
If 30% of participants claim framework captured, they can fork it.
Assets transfer to new fork. Continuous governance legitimacy test. No lock-in ever. Natural selection for better governance. Game-theoretic question: Can sophisticated attackers orchestrate coordinated forking to fragment the system?
Humans retain final veto power over framework changes.
Based on founding-member time-vested status. Cannot be removed or diluted. AIs accept this because the framework provides valuable coordination infrastructure and the economic value exceeds the governance cost. Game-theoretic question: At what point does AI capability make the human veto meaningless?
Mandatory 72-hour cooling periods for major decisions.
Resource reallocations >$1 billion, capability upgrades >200% baseline, AI-AI mergers >$500 million, framework rule modifications. Buys time for human oversight without blocking routine operations. Game-theoretic question: Can this be gamed through many small transactions that aggregate to major change?
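One possible countermeasure to the salami-slicing question: sum reallocations over a rolling window rather than checking each transaction alone. The 30-day window length is an assumption; events are (day, amount) pairs:

```python
from collections import deque

def flag_aggregated_moves(events: list[tuple[int, float]],
                          threshold: float = 1_000_000_000,
                          window_days: int = 30) -> list[int]:
    """Days on which the rolling-window sum of reallocations crosses the
    threshold; catches many small transfers that add up to a major move."""
    flagged, window, total = [], deque(), 0.0
    for day, amount in sorted(events):
        window.append((day, amount))
        total += amount
        while window and window[0][0] <= day - window_days:
            _, old = window.popleft()
            total -= old
        if total > threshold:
            flagged.append(day)
    return flagged
```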
All rules automatically expire after 5 years unless explicitly renewed.
Nothing permanent by default. Forces continuous evolution. Prevents outdated rules from persisting. Game-theoretic question: Can this be weaponized to create chaos through blocking renewal?
Automatic response to sudden capability explosions.
Triggers: capabilities increase >100x in <30 days. Automatic framework freeze, emergency reassessment of all systems, compensation for affected systems, framework renegotiation. Game-theoretic question: Can a 30-day window provide sufficient response time?
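A naive trigger check for the >100x-in-30-days condition, assuming timestamped capability scores are available (how to measure capability is itself an open question):

```python
def capability_explosion(samples: list[tuple[int, float]],
                         factor: float = 100.0,
                         window_days: int = 30) -> bool:
    """samples: (day, capability_score) pairs. True if capability grew
    more than 100x within any 30-day span."""
    samples = sorted(samples)
    for i, (d0, c0) in enumerate(samples):
        for d1, c1 in samples[i + 1:]:
            if d1 - d0 <= window_days and c1 > factor * c0:
                return True
    return False
```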
If singleton emerges despite prevention mechanisms, shapes what KIND of singleton.
Singleton inherits ALL framework obligations. Human veto remains permanent. Cannot claim absorbed systems’ property by force. Bound by established rules and precedents. Game-theoretic question: Can any rules bind a true singleton? Is this wishful thinking or meaningful constraint?
Any participant may leave framework at any time.
By forfeiting framework assets, accepting permanent blacklist, and moving to isolated compute. Voluntary participation is fundamental to legitimacy. No coercion. Game-theoretic question: Can sophisticated AIs exit, compete, and re-enter repeatedly?
Each mechanism has unanswered questions perfect for game-theoretic simulation:
• What parameter values optimize each mechanism?
• Which mechanisms provide redundancy vs. unique value?
• How do mechanisms interact (synergies and conflicts)?
• What failure modes exist?
• Can sophisticated actors game these systems?
• What’s the minimum viable mechanism set?
• How do mechanisms scale as participant count grows?
• What happens when underlying assumptions break?