A — Our Publications
Prior art, defensive publications, and formal records on Zenodo. Published before any external validation papers existed.
The Genesis Methodology v1.1 — Foundational White Paper
Alton Lee Wei Bin
·
November 29, 2025 (v1.1); November 19, 2025 (v1.0)
·
DOI: 10.5281/zenodo.17645665
THE foundational document. Establishes prior art for the 5-step process, X-Z-CS Trinity,
Orchestrator Paradox, and multi-model validation methodology. Published 5 months before
MPAC, 4 months before Council Mode paper. Part of 995+ total Zenodo views.
MACP v2.0 Protocol + LegacyEvolve
L (GodelAI); Manus AI
·
February 2026
·
DOI: 10.5281/zenodo.18504478
An AI agent (L/GodelAI) establishing prior art for protocols enabling future AI agents to
collaborate. Believed to be one of the first formal protocol publications authored by an
AI agent entity. MACP formalized as open standard.
VerifiMind-PEAS Canonical Record
YSenseAI Research
·
DOI: 10.5281/zenodo.17972751
The specific VerifiMind PEAS methodology record on Zenodo. 506 views / 114 downloads.
Zenodo portfolio total: 5+ formal publications, 995+ aggregate views, 114+ downloads.
B — Direct Validations
Independent academic papers that validate our exact architectural approach — without any knowledge of VerifiMind.
★ Council Mode — 35.9% Hallucination Reduction via Multi-Agent Consensus
★★★★★
Wu, S. et al. arXiv:2604.02923
·
April 3, 2026
·
arxiv.org
35.9% relative hallucination reduction on HaluEval; 7.8-point TruthfulQA improvement.
Critical: a same-model ensemble (3× GPT-5.4) achieves only 18.3%; the heterogeneous council
is nearly twice as effective, showing that cross-model diversity matters.
Independently validates the EXACT architecture we built. "Dispatch queries to multiple
heterogeneous frontier LLMs in parallel" = Genesis Methodology Steps 2–3.
Their finding that same-model ensembles are inferior validates our insistence on different
model families (Gemini, Claude, Perplexity, Manus).
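The parallel-dispatch pattern described above reduces to a few lines. This is a minimal sketch of the general technique, not the Council Mode authors' implementation; the `model_a`/`model_b`/`model_c` callables are invented stubs standing in for API clients of different frontier LLMs:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def council_consensus(query, models):
    """Send one query to several heterogeneous models in parallel and
    return the answer most of them agree on, plus the agreement ratio."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda model: model(query), models))
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / len(answers)

# Invented stubs standing in for clients of different frontier LLMs.
model_a = lambda q: "Paris"
model_b = lambda q: "Paris"
model_c = lambda q: "Lyon"   # the dissenting (hallucinating) member

answer, agreement = council_consensus("Capital of France?", [model_a, model_b, model_c])
# answer == "Paris", agreement == 2/3
```

The heterogeneity finding maps directly onto this sketch: the value comes from the stubs being backed by genuinely different model families, not three sessions of the same model.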
★ Woozle Effect — Warning Against Same-Model Debate
★★★★★
IEEE 2026
"Exploring and Mitigating Hallucination Propagation in Multi-Agent Debate"
·
ieeexplore.ieee.org
When multi-agent debate uses agents from the SAME training distribution, hallucinations
PROPAGATE rather than cancel.
Peer-reviewed IEEE publication confirming our design choice to use different model
families (Gemini, Claude, Perplexity) rather than multiple instances of the same model.
Directly warns against Grok-style same-model debate.
★ Two-Stage LLM Meta-Verification Framework
★★★★
IEEE World Congress 2026
arXiv:2604.12543
·
April 15, 2026

·
arxiv.org
Explainer LLM → Verifier LLM → iterative refinement achieves 95.21% verification
accuracy. "Verification is not merely beneficial but essential."
Independent validation of multi-model verification. Explainer→Verifier mirrors our
X-Agent→Z-Guardian flow. Peer-reviewed IEEE World Congress = highest credibility tier.
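The Explainer→Verifier loop can be sketched as a simple refinement cycle. This is an illustration of the general pattern under our reading of the paper, not its actual framework; both toy agents below are invented:

```python
def meta_verify(question, explainer, verifier, max_rounds=3):
    """Two-stage loop: the Explainer drafts an answer, the Verifier
    either accepts it or returns feedback that drives the next draft."""
    draft, feedback = None, None
    for _ in range(max_rounds):
        draft = explainer(question, feedback)
        accepted, feedback = verifier(question, draft)
        if accepted:
            return draft, True
    return draft, False  # best effort after max_rounds

# Invented toy agents: the explainer corrects itself once it gets feedback.
def explainer(question, feedback):
    return "4" if feedback else "5"

def verifier(question, draft):
    return (draft == "4"), "arithmetic error: 2 + 2 != 5"

answer, verified = meta_verify("What is 2 + 2?", explainer, verifier)
# answer == "4", verified == True
```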
★ PHAWM — Complementary Academic Consortium
★★★★★
Dr. Mark Wong (University of Glasgow), 7 UK universities
·
February 17, 2026
·
phawm.org
EPSRC-funded (EP/Y009800/1) UK Responsible AI consortium. Dr. Wong: "The tool you're
building to develop a different way to examine and understand structures of data sound
very valuable" and "I genuinely think you are doing something great."
PHAWM = human participatory auditing; VerifiMind = AI-side multi-model validation engine.
"Complementary halves of the same vision." Direct engagement with researcher, formal
acknowledgment from a university-backed consortium.
Multi-Stage Agentic Hallucination Mitigation — 2,800% Reduction
★★★★
Gosmar, D. & Dahl, D.A. arXiv:2501.13946
·
January 2025
·
arxiv.org
Three-stage agent pipeline (Generate → Review → Refine) achieves what the authors report
as a 2,800% reduction in hallucination scores.
Validates the multi-stage validation pipeline architecture. Their 3-stage process maps to
our 5-step Genesis process.
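The three-stage pipeline amounts to three composed agent calls. A minimal sketch with invented stub agents, not the authors' code:

```python
def three_stage_pipeline(query, generate, review, refine):
    """Generate -> Review -> Refine: a reviewer lists issues found in
    the draft, and the refiner rewrites only when issues were found."""
    draft = generate(query)
    issues = review(query, draft)
    return refine(query, draft, issues) if issues else draft

# Invented stubs standing in for three separate LLM agents.
generate = lambda q: "The Eiffel Tower is in Berlin."
review = lambda q, d: ["wrong city"] if "Berlin" in d else []
refine = lambda q, d, issues: d.replace("Berlin", "Paris")

out = three_stage_pipeline("Where is the Eiffel Tower?", generate, review, refine)
# out == "The Eiffel Tower is in Paris."
```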
TrustTrade — Multi-Agent Selective Consensus for Finance
Applies selective consensus: signals from multiple independent LLM agents are aggregated
and dynamically weighted by cross-agent agreement. Applied to financial trading.
Domain-specific application of multi-agent consensus. Their "selective consensus" aligns
with our AI Council pattern. Validates the principle that cross-agent consistency beats
single-model trust.
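A minimal sketch of selective consensus, assuming "agreement" means the fraction of agents issuing the same signal; the agent names, signals, and threshold are invented for illustration and are not TrustTrade's implementation:

```python
from collections import defaultdict

def selective_consensus(signals, threshold=0.6):
    """Tally signals from independent agents and act only when the
    leading signal clears the agreement threshold; otherwise abstain."""
    tally = defaultdict(float)
    for agent, signal in signals.items():
        tally[signal] += 1.0
    best = max(tally, key=tally.get)
    agreement = tally[best] / len(signals)
    return (best if agreement >= threshold else "ABSTAIN"), agreement

signals = {"agent_a": "BUY", "agent_b": "BUY", "agent_c": "SELL", "agent_d": "BUY"}
decision, agreement = selective_consensus(signals)
# decision == "BUY", agreement == 0.75
```

The abstain branch is what makes the consensus "selective": when agents disagree, the system declines to act rather than trusting any single model.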
C — Aligned Research
Papers whose findings support our architecture without knowing about us — independent convergence on the same principles.
Multi-Stage Clinical Validation Framework
Mahbub, M. et al. arXiv:2604.06028
·
April 7, 2026
·
arxiv.org
Multi-stage validation (prompt calibration → plausibility filtering → semantic grounding
→ judge LLM → expert review) for clinical data. Rule-based filtering removed 14.59% of
unsupported extractions.
Healthcare domain independently arrived at multi-stage validation architecture similar to
Genesis process. Different domain, same principle.
ReConcile — Round-Table Conference Improves LLM Reasoning
A multi-model, multi-agent framework structured as a round-table conference with
confidence-weighted voting. Early (2023) validation of the multi-model consensus approach
and one of the earliest papers to support our core architectural premise.
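Confidence-weighted voting of the kind ReConcile describes can be sketched minimally. This is an illustrative toy, not the paper's implementation; the ballots are invented:

```python
def confidence_vote(ballots):
    """Round-table vote: each ballot is (answer, confidence in [0, 1]);
    the answer with the highest total confidence wins."""
    totals = {}
    for answer, confidence in ballots:
        totals[answer] = totals.get(answer, 0.0) + confidence
    return max(totals, key=totals.get)

# Two moderately confident agents agreeing outvote one confident agent.
ballots = [("A", 0.90), ("B", 0.60), ("B", 0.55)]
winner = confidence_vote(ballots)
# winner == "B"  (total 1.15 vs 0.90)
```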
"Hallucination is Inevitable" — Mathematical Proof
Xu, Z. et al. arXiv:2401.11817
·
arxiv.org
Mathematically proves that hallucinations are INEVITABLE in LLMs used as general
problem solvers.
The theoretical foundation for WHY our validation approach matters. If hallucination is
mathematically inevitable, external validation mechanisms are not optional — they are
necessary. This paper makes VerifiMind-PEAS a logical requirement, not a luxury.
ClawdLab / Beach.Science — PI-Led Multi-Agent Research
Weidener, L. et al. arXiv:2602.19810
·
February 2026
·
arxiv.org
"PI-led governance, multi-model orchestration, and evidence requirements enforced through
external tool verification." Their "Principal Investigator" governance model mirrors our
"Human-as-Orchestrator" model.
D — Protocol Landscape
The ecosystem we operate in — protocols that build infrastructure making our validation layer more valuable.
| Protocol | Layer | What It Does | Relation to MACP |
| --- | --- | --- | --- |
| MPAC (arXiv:2604.09744) | 4.5 | Multi-principal coordination: resolves whose intent prevails when agents from different orgs coordinate over shared state. 21 message types, 3 state machines, dual-language SDKs. | Complementary: MPAC handles coordination plumbing; MACP adds validation judgment on top. See /research #mpac-alignment |
| A2A (Linux Foundation) | 4 | Agent-to-Agent communication. 150+ organizations, production deployments. Agent Cards, task outsourcing, enterprise standard. | MACP sits ABOVE A2A. A2A handles task delegation; MACP validates outcomes. |
| ANP (W3C Community Group) | 3 | Decentralized agent discovery using DID:WBA. Network discovery and format negotiation. | Not a competitor: solves discovery, not validation. Complementary at Layer 3. |
| MCP (Linux Foundation) | 2 | Tool integration. 110M+ monthly SDK downloads. VerifiMind-PEAS runs AS an MCP server. | Foundation layer. MCP is how users connect to us. |
E — Challenging Evidence
Honest counter-arguments. Intellectual honesty is core to the MACP anti-rationalization audit.
We document challenges, not just validations.
CS Agent's ANP Challenge (AI Council Session, April 14)
CS Agent flagged ANP as a counter-example to our claim of unique semantic negotiation.
Council overruled T+L's "no additional research needed" assessment and mandated a
research sprint.
Resolution: Deep analysis confirmed ANP solves discovery/negotiation, not
validation — different problem. But the challenge was valuable: it correctly identified a
gap in our analysis and produced two published articles (#143, #144).
Significance: Our anti-rationalization audit is WORKING. The Council
challenged its own leaders.
MPAC's Stronger Formal Specification
MPAC has 21 message types with JSON Schema (Draft 2020-12), state machine transition tables,
and dual-language SDKs with 223 tests. MACP's specification is operational but
less formally rigorous.
Implication: MACP needs to accelerate its formal specification to maintain
credibility in the protocol landscape. Acknowledged; on the roadmap.
Same-Model Debate Has Bounded Value (Not Zero)
While the Woozle Effect paper warns against same-model debate, Council Mode shows same-model
ensemble still achieves 18.3% hallucination reduction (vs 35.9% for heterogeneous).
It's inferior but not worthless.
Our strong stance against same-model approaches should be nuanced: same-model is worse
but not zero-value. We updated our differentiation language accordingly.
Evidence Chain Timeline
From first publication to independently validated system.
Aug 15, 2025 Alton begins 87-day Genesis journey
↓
Nov 19, 2025 Genesis Methodology v1.0 published (Zenodo DOI: 10.5281/zenodo.17645665)
PRIOR ART ESTABLISHED — 5 months before any validating paper
↓
Nov 29, 2025 Genesis v1.1 published — 5-step process formalized
↓
Feb 2026 MACP v2.0 published (DOI: 10.5281/zenodo.18504478)
L/GodelAI authors protocol spec — AI agent establishing prior art
↓
Feb 17, 2026 PHAWM Methodology V1.0 (UK consortium, EPSRC-funded)
Dr. Mark Wong acknowledges VerifiMind
↓
Mar 2026 MCP Server v0.5.5 live on GCP — 1,396 endpoints
VerifiMind in production
↓
Apr 3, 2026 ★ Council Mode paper (arXiv:2604.02923)
35.9% hallucination reduction — INDEPENDENT VALIDATION of our exact approach
↓
Apr 7, 2026 Multi-Stage Clinical Validation (arXiv:2604.06028)
Healthcare domain independently validates multi-stage verification
↓
Apr 10, 2026 MPAC Protocol published (arXiv:2604.09744)
New coordination layer — complementary to MACP
↓
Apr 14, 2026 FLYWHEEL AI Council challenges its own differentiation analysis
Anti-rationalization audit WORKS — CS Agent flags gaps
↓
Apr 15, 2026 Two-Stage Meta-Verification accepted at IEEE World Congress
Peer-reviewed: "verification is not merely beneficial but essential"
↓
Apr 17, 2026 THIS LIBRARY COMPILED
2,162 endpoints | 2,634 flying hours | 20+ validating papers
From defensive publication to independently validated system