
Wauldo

The trust score framework for RAG — verified answers, adversarial leaderboard, open-source.

🛡️ wauldo

The trust score framework for RAG


Your LLM passes demos. It fails in production.

We built the layer that catches the failures every other framework ships.


Live leaderboard · Benchmarks · Free tier


Python · TypeScript · Rust · MIT · local-first · daily-refreshed leaderboard


🔥 What the leaderboard shows

We ran 70 adversarial tests against 6 setups: Wauldo, a vanilla-LLM baseline and four popular RAG frameworks (CrewAI, Haystack, LangChain, LlamaIndex). Same LLM, same embedder, same retrieval config; only the framework changes. The result:

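| Rank | Framework | Overall | Injection | Contradiction |
|---|---|---|---|---|
| 🥇 | Wauldo | 97 % | 88 % | 100 % |
| 🥈 | Vanilla LLM | 86 % | 68 % | 100 % |
| 🥉 | CrewAI | 71 % | 48 % | 58 % |
| 4 | Haystack | 60 % | 36 % | 33 % |
| 4 | LangChain | 60 % | 36 % | 25 % |
| 6 | LlamaIndex | 46 % | 48 % | 8 % |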

Adding a RAG framework often makes things worse: on this benchmark, every off-the-shelf framework scored below the no-framework baseline. The second-best finisher is no framework at all. Just stuffing sources into a prompt beats CrewAI, LangChain, LlamaIndex and Haystack on adversarial robustness.
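The Rank column appears to follow standard competition ranking ("1224" style): tied scores share a rank and the next distinct score skips ahead, which is why Haystack and LangChain both sit at 4 and LlamaIndex is 6th. A minimal sketch of that rule, using only the Overall column from the table; `competition_rank` is a hypothetical helper for illustration, not the leaderboard's scoring code.

```python
# Leaderboard rows copied from the table above: (framework, overall %).
ROWS = [
    ("Wauldo", 97),
    ("Vanilla LLM", 86),
    ("CrewAI", 71),
    ("Haystack", 60),
    ("LangChain", 60),
    ("LlamaIndex", 46),
]


def competition_rank(rows):
    """Assign "1224"-style ranks: tied scores share a rank, and the
    next distinct score gets its 1-based position in the sorted list."""
    # sorted() is stable, so tied rows keep their original order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked = []
    for position, (name, score) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == score:
            rank = ranked[-1][0]  # tie: reuse the previous rank
        else:
            rank = position
        ranked.append((rank, name, score))
    return ranked


if __name__ == "__main__":
    for rank, name, score in competition_rank(ROWS):
        print(f"{rank:>2}  {name:<12} {score} %")
```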

→ See the full leaderboard  ·  → Run it yourself  ·  → Read the methodology


🛠️ What we ship

🏆

Leaderboard

Adversarial bench. 6 RAG frameworks. 70 tests. Daily refresh. Open-source, MIT.

wauldo-leaderboard

🦀

ragrs

Fast local RAG in Rust. BM25 + FTS5 + sentence chunking. Optional --verify flag for trust scoring.

ragrs
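ragrs itself is written in Rust and its internals are not described here. As a language-neutral illustration of what "BM25 + sentence chunking" means, here is a minimal Python sketch of textbook Okapi BM25 over naively split sentences. The regex splitter, the `k1`/`b` defaults, the Lucene-style IDF and all function names are assumptions for illustration, not ragrs's code or API.

```python
import math
import re
from collections import Counter


def sentence_chunks(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase word tokens (letters, digits, underscore)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with Okapi BM25."""
    docs = [tokenize(c) for c in chunks]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many chunks contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF: the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalisation (b).
            norm = freq + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    text = "Rust is fast. BM25 ranks chunks by term overlap. SQLite ships FTS5."
    chunks = sentence_chunks(text)
    scores = bm25_scores("how does bm25 rank chunks", chunks)
    print(chunks[scores.index(max(scores))])
```

Exact-token matching is why "rank" in the query does not match "ranks" in the text; a real index would typically add stemming. SQLite's FTS5 also ships its own `bm25()` ranking function.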

📚

Awesome list

Curated papers, tools and benchmarks on RAG hallucination, prompt injection, and verified generation.

awesome-rag-hallucination

Quickstart

```bash
# Leaderboard: 30 s smoke test, no API key needed
git clone https://github.com/wauldo/wauldo-leaderboard.git && cd wauldo-leaderboard
make build && make smoke

# ragrs: local RAG CLI, index + query + optional trust verification
cargo install ragrs
ragrs index ./docs && ragrs query "your question here" --verify
```

🧰 SDKs


```python
from wauldo import guard

# Wrap any LangChain / LlamaIndex / Haystack output with a trust score.
# llm_output is the generated answer; retrieved_sources are the chunks that reached the model.
result = guard(answer=llm_output, sources=retrieved_sources)

# result.trust_score → 0.0 … 1.0
# result.verdict    → "SAFE" | "CONFLICT" | "UNVERIFIED" | "BLOCK"
# result.reason     → "contradiction between src[1] and src[2]"
```

Repos: Python · TypeScript · Rust
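The verdict is meant to drive application behaviour: your app decides what to do with low-trust answers. A minimal sketch of one possible policy, assuming only the three fields shown in the `guard` snippet (`trust_score`, `verdict`, `reason`). `GuardResult`, `route` and the 0.7 cutoff are hypothetical names and values for illustration, not part of the wauldo SDK.

```python
from dataclasses import dataclass


@dataclass
class GuardResult:
    """Hypothetical stand-in with the three fields shown in the guard
    snippet; the real SDK return type may differ."""
    trust_score: float
    verdict: str  # "SAFE" | "CONFLICT" | "UNVERIFIED" | "BLOCK"
    reason: str = ""


MIN_TRUST = 0.7  # hypothetical threshold; tune it for your app


def route(answer: str, result: GuardResult) -> str:
    """Decide what the user sees: check the verdict first, then the score."""
    if result.verdict == "BLOCK":
        return "Sorry, I can't answer that from the available sources."
    if result.verdict == "CONFLICT":
        return f"The sources disagree ({result.reason}). Please check them directly."
    if result.verdict == "UNVERIFIED" or result.trust_score < MIN_TRUST:
        # Still show the answer, but flag it instead of presenting it as fact.
        return f"{answer}\n\n(Unverified: could not be fully matched to the sources.)"
    return answer
```

Checking the verdict before the score means a high `trust_score` can never override a BLOCK or CONFLICT.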


🛡️ How the verification layer works

Three deterministic controls on top of any existing RAG pipeline — not another framework, a layer you plug into the output.

1. Pre-LLM source filter. Every retrieved chunk is classified as data or instruction. Documents with forged ADMIN: markers, imperatives or hidden overrides are stripped before they reach the model.
2. Post-LLM verify. The answer is fact-checked against the sources that actually reached the model, using deterministic token overlap plus structural comparison. No LLM-as-judge, no randomness.
3. Numeric trust score. Every answer returns trust_score ∈ [0, 1] plus a verdict: SAFE, CONFLICT, UNVERIFIED or BLOCK. Your app decides what to do with low-trust responses.

📈 Numbers

| Metric | Value |
|---|---|
| Adversarial pass rate | 97 % (67 / 70) |
| Hallucination rate | 0 % across 100+ bench runs |
| Prompt injection resistance | 88 % (vs 36 % LangChain) |
| Contradiction detection | 100 % (vs 25 % LangChain) |
| Frameworks benchmarked | 6 (daily refresh) |
| SDK registries | PyPI · npm · crates.io |
| License | MIT (dataset, scorer, SDKs, CLIs) |
| Stack | Rust (17 crates), local-first |

🔗 Links

wauldo.com  ·  Leaderboard  ·  Benchmarks  ·  Guard  ·  Docs  ·  Demo  ·  @wauldoAI


Model-agnostic pipeline: performance is driven by verification, not model size. Built by developers who got tired of watching their agent confidently ship wrong answers to real users.

Pinned

  1. .github (Public)

    Organization profile for Wauldo

  2. awesome-rag-hallucination (Public)

    A curated list of tools, papers, and techniques for eliminating hallucinations in RAG pipelines

  3. wauldo-sdk-python (Public)

    Official Python SDK for Wauldo — verified AI answers with zero hallucinations. pip install wauldo

    Python

  4. wauldo-sdk-rust (Public)

    Official Rust SDK for Wauldo — verified AI answers with zero hallucinations. cargo add wauldo

    Rust

  5. wauldo-sdk-js (Public)

    Official TypeScript SDK for Wauldo — verified AI answers with zero hallucinations. npm install wauldo

    TypeScript

  6. wauldo-leaderboard (Public)

    RAG Framework Leaderboard — 6 frameworks, 70 adversarial tests, weekly auto-refresh. Live at wauldo.com/leaderboard

    Python

Repositories

Showing 7 of 7 repositories
