Everything you need to know about the Torah Citation Knowledge Graph – what's in it, how to use it, and what it means for Torah study.
A knowledge graph of 1,896,325 cross-references extracted from Torah literature. When Rashi cites Berakhot 2a, or the Arukh HaShulchan references Rambam's Mishneh Torah, those connections are captured, typed, and scored for confidence. The graph is exposed through an MCP server with 8 tools that let any AI client query it.
Anyone who studies Torah and uses AI tools – learners, educators, rabbis, developers. If you've ever wondered "who else cites this gemara?" or "how does Rashi's citation pattern differ from the Meiri's?", this is for you. It's also a research tool for computational approaches to Torah scholarship.
Add the MCP server to your Claude Desktop, Claude Code, or any MCP-compatible client. Then ask natural questions:
The AI translates your question into the right tool call automatically.
Yes. The MCP server, data, and dashboard are all open source. The underlying data is stored in Supabase (free tier). You need your own MCP client (Claude Desktop is free).
Every tool response ends with 3 contextual next-action suggestions, a feature named after the Hebrew word שלט (sign/menu). Instead of leaving you with raw data, it guides your next move: "Explore this result", "Compare citation types", "Find the path between them". It makes exploration natural.
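To make the idea concrete, here is a minimal sketch of what a response ending in three suggestions might look like. The field names (`results`, `suggestions`) are assumptions, not the server's actual schema; the suggestion strings are the examples quoted above.

```python
# Illustrative shape of a tool response that ends with three
# next-action suggestions. Field names are assumptions.
response = {
    "results": [
        {"source": "Rashi on Berakhot 2a:1", "target": "Berakhot 2a"},
    ],
    "suggestions": [
        "Explore this result",
        "Compare citation types",
        "Find the path between them",
    ],
}
print(len(response["suggestions"]))  # 3
```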
The graph contains 30 source works whose texts have been processed to extract citations, spanning Tannaitic literature (Mishnah, Tosefta, Sifrei), Talmud Yerushalmi, Bavli commentaries (Rashi, Tosafot, Rashba, Ritva, Ran, Meiri, Rosh, Rif), halachic codes (Mishneh Torah, Tur, Shulchan Arukh), and Acharonim (Arukh HaShulchan, Mishnah Berurah, Magen Avraham, and more). Together they form a network of 1.9M+ connections.
23 Source Works (text processed, citations extracted):
Key Target Works (cited by others – not yet processed as sources):
15 citation types, each with different meanings:
Each citation has a confidence score from 0 to 1 indicating how certain the extraction is:
Average across the graph: 0.873. Tools default to a minimum of 0.7, but you can adjust the threshold.
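Client-side, a confidence filter is just a threshold over the scored records. A minimal sketch, assuming a `confidence` field per citation (the field names and sample data here are illustrative, not the server's actual schema):

```python
# Sketch of confidence-based filtering. The 0.7 default matches the
# tools' documented default; field names are assumptions.
def filter_by_confidence(citations, min_confidence=0.7):
    """Keep only citations at or above the confidence threshold."""
    return [c for c in citations if c["confidence"] >= min_confidence]

citations = [
    {"source": "Rashi on Berakhot 2a:1", "target": "Berakhot 2a", "confidence": 0.95},
    {"source": "Tur OC 1", "target": "Mishneh Torah", "confidence": 0.62},
]
high = filter_by_confidence(citations)
print(len(high))  # 1 -- the 0.62 citation falls below the 0.7 default
```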
An LLM-powered extraction pipeline processes the full text of each sefer. Each segment is analyzed for all references to other texts: verses, Talmud, Mishnah, other commentators, legal codes. The pipeline identifies the citation type and assigns a confidence score. Results are stored in DuckDB locally and in Supabase for the MCP server.
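The stored records boil down to a simple edge table. A hedged sketch of what that storage step might look like: the real pipeline writes to DuckDB and Supabase, but `sqlite3` keeps this example self-contained, and the column names are assumptions rather than the project's actual schema.

```python
import sqlite3

# Hypothetical edge table for extracted citations (column names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE citations (
        source_ref    TEXT,  -- the citing segment, e.g. a Rashi comment
        target_ref    TEXT,  -- the cited text
        citation_type TEXT,  -- one of the 15 typed relations
        confidence    REAL   -- extraction confidence, 0 to 1
    )
""")
conn.execute(
    "INSERT INTO citations VALUES (?, ?, ?, ?)",
    ("Rashi on Berakhot 2a:1", "Berakhot 2a", "explicit", 0.95),
)
row = conn.execute(
    "SELECT target_ref FROM citations WHERE confidence >= 0.7"
).fetchone()
print(row[0])  # Berakhot 2a
```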
The current pipeline has processed 30 major works (spanning ~200+ individual masekhtot/volumes): 3 Tannaitic (Mishnah, Tosefta, Sifrei), Talmud Yerushalmi, 8 Bavli commentaries (Rashi across 36 masekhtot, Tosafot across 35, Rashba, Ritva, Ran, Meiri, Rosh, Rif), 3 Rishonim halacha (Mishneh Torah, Tur, Sefer HaChinukh), Shulchan Arukh with 4 commentaries (Beit Yosef, Bach, Prisha, SA HaRav), and 7 Acharonim (Arukh HaShulchan, Mishnah Berurah, Magen Avraham, Biur Halacha, Taz, Kitzur SA, Chayei Adam). The Rambam (Mishneh Torah) is both a major source (88,800 citations extracted) and the most-cited target in the graph.
search_citations – Find what cites a text or what a text cites
top_cited – Most frequently referenced texts in the graph
citation_path – Find how two texts connect through citation chains
graph_stats – Overview statistics of the entire graph
citation_types – Distribution of citation types, optionally filtered
co_cited – What texts appear together most often
compare_sources – Compare citation patterns between two authors
rare_finds – Surface unique, one-of-a-kind citations

compare_sources is arguably the most novel. You can compare the "citation DNA" of any two commentators – Rashi vs Meiri, Tosafot vs Rashba. It reveals that Rashi is 49% back-references while Meiri is 50% explicit Talmud citations. Their overlap is only 54.2%. These are patterns no human could compute manually across hundreds of thousands of citations.
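Your MCP client issues these calls for you, but under the hood each question becomes a JSON-RPC `tools/call` request. A minimal sketch: the tool name comes from the list above, while the argument names (`query`, `min_confidence`) are illustrative assumptions, not the server's documented parameters.

```python
import json

# Hypothetical MCP tools/call request for search_citations.
# Argument names are assumptions, not the server's documented schema.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_citations",
        "arguments": {"query": "Berakhot 2a", "min_confidence": 0.7},
    },
}
print(json.dumps(request, indent=2))
```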
Median (p50) latency is about 4 seconds per query. The bottleneck is the Supabase management API, not the database itself. graph_stats is the slowest at ~14 s because it runs 3 aggregate queries on 1.9M rows. Direct Postgres access would be 10–50× faster; this is an alpha trade-off for simplicity.
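For reference, p50 is just the median of the logged latencies. A quick sketch with invented sample values (the article reports ~4 s median with graph_stats as the ~14 s outlier; these exact numbers are made up for illustration):

```python
from statistics import median

# Invented per-query latencies in seconds; the 14.2 s value stands in
# for a slow graph_stats call.
latencies_s = [3.1, 4.0, 3.8, 14.2, 4.5, 2.9, 5.1]
p50 = median(latencies_s)
print(round(p50, 1))  # 4.0
```

Note that the median is robust to the graph_stats outlier, which is why p50 is the more honest headline number here than the mean.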
Add this to your Claude Desktop config (claude_desktop_config.json):
```json
"sefer-graph": {
  "command": "python3",
  "args": ["/path/to/mcp_server.py"],
  "env": { "SUPABASE_PAT": "your_token" }
}
```
Or clone the repo and follow the README.
The data and tools work, but there's more to do: more source corpora (Rambam, Ramban, Ran), better reference normalization (the same text appearing under different names), deduplication of citations, and performance optimization. Every query you make is logged (anonymously) to help improve the system.
Tool name, parameters, result count, latency, and a user ID (which you set; it defaults to "anonymous"). No personal data. The logs help us understand which tools are most useful, which queries fail, and where to focus improvement. You can see all logged queries on the live dashboard.
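Put together, a logged query is a small structured record. A sketch of its plausible shape, built from the fields listed above; the key names and values here are assumptions, not the dashboard's actual format:

```python
import json

# Illustrative log record (field names assumed) covering the fields
# the article lists: tool, parameters, result count, latency, user id.
log_entry = {
    "tool": "top_cited",
    "params": {"limit": 10},
    "result_count": 10,
    "latency_ms": 4100,
    "user_id": "anonymous",  # defaults to "anonymous" unless you set it
}
print(json.dumps(log_entry))
```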