FAQ

Everything you need to know about the Torah Citation Knowledge Graph: what's in it, how to use it, and what it means for Torah study.

📚 General (5 questions)
What is Sefer-Graph?

A knowledge graph of 1,896,325 cross-references extracted from Torah literature. When Rashi cites Berakhot 2a, or the Arukh HaShulchan references Rambam's Mishneh Torah, those connections are captured, typed, and scored for confidence. It's an MCP server with 8 tools that let any AI client query the graph.

Who is this for?

Anyone who studies Torah and uses AI tools: learners, educators, rabbis, developers. If you've ever wondered "who else cites this gemara?" or "how does Rashi's citation pattern differ from the Meiri's?", this is for you. It's also a research tool for computational approaches to Torah scholarship.

How do I use it?

Add the MCP server to your Claude Desktop, Claude Code, or any MCP-compatible client. Then ask natural questions:

  • "What texts cite Berakhot 2a?"
  • "Compare Rashi and Meiri's citation patterns"
  • "Find rare citations in the Ritva"
  • "What's most co-cited with Shabbat 108b?"

The AI translates your question into the right tool call automatically.
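
Under the hood, the first question above becomes an MCP tools/call request. Here is a sketch of what a client might emit; the argument names ("ref", "direction") are illustrative guesses, not the server's documented schema:

  # Sketch of the MCP tools/call request an AI client could emit for
  # "What texts cite Berakhot 2a?". Argument names are assumptions.
  request = {
      "method": "tools/call",
      "params": {
          "name": "search_citations",
          "arguments": {"ref": "Berakhot 2a", "direction": "inbound"},
      },
  }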

Is this free?

Yes. The MCP server, data, and dashboard are all open source. The underlying data is stored in Supabase (free tier). You need your own MCP client (Claude Desktop is free).

What's the SHELET protocol?

Every tool response ends with 3 contextual next-action suggestions, named after the Hebrew word שלט (sign/menu). Instead of leaving you with raw data, it guides your next move: "Explore this result", "Compare citation types", "Find the path between them". It makes exploration natural.
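
Concretely, a tool response might end with a block shaped like this; a sketch only, since the actual field name and wording are up to the server:

  # Illustrative response shape; "shelet" as a field name is an assumption.
  response = {
      "data": [...],  # the query results
      "shelet": [
          "Explore this result",
          "Compare citation types",
          "Find the path between them",
      ],
  }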

๐Ÿ—„๏ธ Data 5 questions
What sefarim are included?

The graph contains 23 source works whose texts have been processed to extract citations, spanning Tannaitic literature (Mishnah, Tosefta, Sifrei), the Talmud Yerushalmi, Bavli commentaries (Rashi, Tosafot, Rashba, Ritva, Meiri, Rosh, Rif), halachic codes (Tur, Shulchan Arukh), and Acharonim (Arukh HaShulchan, Magen Avraham, Biur Halacha, and more). Together they form a network of 1.9M+ connections.

23 Source Works (text processed, citations extracted):

  • Arukh HaShulchan – 343K
  • Rashi – 214K
  • Tosafot – 182K
  • Beit Yosef – 120K
  • Meiri – 118K
  • Ritva – 105K
  • Jerusalem Talmud – 102K
  • Bach – 96K
  • Rashba – 88K
  • Magen Avraham – 84K
  • Biur Halacha – 84K
  • Shulchan Arukh – 58K
  • Rif – 54K
  • Tur – 52K
  • Shulchan Arukh HaRav – 34K
  • Taz – 32K
  • Sefer HaChinukh – 31K
  • Kitzur Shulchan Arukh – 29K
  • Tosefta – 25K
  • Rosh – 18K
  • Chayei Adam – 14K
  • Sifrei – 13K
  • Mishnah – 313

Key Target Works (cited by others; not yet processed as sources):

  • Talmud Bavli – 578K citations pointing to it (the most cited work in the graph)
  • Torah / Tanakh – 216K citations
  • Mishnah – 129K citations
  • Mishneh Torah (Rambam) – 72K citations

What types of citations are tracked?

15 citation types, each with different meanings:

  • explicit_talmud (30.5%) – direct Talmud references
  • back_reference (17.8%) – self-references within a work
  • named_position (17.0%) – citing an authority's opinion
  • explicit_verse (10.3%) – biblical verse citations
  • explicit_mishnah (9.9%) – Mishnah references
  • conceptual_dependency (6.7%) – building on an idea without explicit citation
  • allusion, paraphrased_verse, legal_principle, explicit_braita, gezeira_shava, and more

What does the confidence score mean?

Each citation has a confidence score from 0 to 1 indicating how certain the extraction is:

  • ≥0.9 (36.3%) – High confidence. Explicit, unambiguous citation.
  • 0.7–0.9 (47.4%) – Medium. Strong textual evidence but some ambiguity.
  • <0.7 (16.3%) – Lower confidence. Allusions, conceptual links, or ambiguous references.

Average across the graph: 0.873. Tools default to ≥0.7, but you can adjust the threshold.
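
For example, to pull lower-confidence allusions into a search, you could lower the floor. A sketch assuming a min_confidence parameter (the name is a guess, not the documented schema):

  # Hypothetical arguments: widen a search to include allusions and
  # conceptual links by lowering the assumed "min_confidence" floor.
  arguments = {"ref": "Shabbat 108b", "min_confidence": 0.5}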

How was the data extracted?

An LLM-powered extraction pipeline processes the full text of each sefer. Each segment is analyzed for all references to other texts: verses, Talmud, Mishnah, other commentators, legal codes. The pipeline identifies the citation type and assigns a confidence score. Results are stored locally in DuckDB and in Supabase for the MCP server.
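
In outline, the per-segment loop looks something like the sketch below; the loader and LLM step (load_segments, extract_refs) are hypothetical placeholders, while the DuckDB calls are the library's standard API:

  import duckdb

  con = duckdb.connect("citations.duckdb")
  con.execute("""
      CREATE TABLE IF NOT EXISTS citations (
          source_ref TEXT, target_ref TEXT,
          citation_type TEXT, confidence DOUBLE
      )
  """)

  for segment in load_segments("arukh_hashulchan"):  # hypothetical loader
      # extract_refs() stands in for the LLM extraction step: it returns
      # every reference found in the segment, typed and confidence-scored.
      for ref in extract_refs(segment.text):
          con.execute(
              "INSERT INTO citations VALUES (?, ?, ?, ?)",
              [segment.ref, ref.target, ref.type, ref.confidence],
          )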

Why don't I see Rambam as a source?

The extraction pipeline has so far processed the 23 works listed above, spanning ~200+ individual masekhtot/volumes: 3 Tannaitic collections (Mishnah, Tosefta, Sifrei), the Talmud Yerushalmi, 7 Bavli commentaries (Rashi across 36 masekhtot, Tosafot across 35, plus Rashba, Ritva, Meiri, Rosh, Rif), the halachic codes and their commentaries (Tur, Shulchan Arukh, Beit Yosef, Bach, Taz, Magen Avraham, Shulchan Arukh HaRav), Sefer HaChinukh, and later Acharonim (Arukh HaShulchan, Biur Halacha, Kitzur Shulchan Arukh, Chayei Adam). Rambam's Mishneh Torah hasn't yet been processed as a source, but it is already one of the most-cited targets in the graph (72K citations pointing to it), and it sits at the top of the roadmap for new source corpora.

🔧 Tools (4 questions)
What are the 8 tools?
  • search_citations – Find what cites a text or what a text cites
  • top_cited – Most frequently referenced texts in the graph
  • citation_path – Find how two texts connect through citation chains
  • graph_stats – Overview statistics of the entire graph
  • citation_types – Distribution of citation types, optionally filtered
  • co_cited – What texts appear together most often
  • compare_sources – Compare citation patterns between two authors
  • rare_finds – Surface unique, one-of-a-kind citations
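
As a sketch, a citation_path call could look like this; the argument names ("from_ref", "to_ref") are illustrative, not the documented schema:

  # Hypothetical: trace how two texts connect through citation chains.
  request = {
      "method": "tools/call",
      "params": {
          "name": "citation_path",
          "arguments": {"from_ref": "Arukh HaShulchan, Orach Chaim 1",
                        "to_ref": "Berakhot 2a"},
      },
  }
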
What's the most interesting tool?

compare_sources is arguably the most novel. You can compare the "citation DNA" of any two commentators: Rashi vs. Meiri, Tosafot vs. Rashba. It reveals that Rashi is 49% back-references while Meiri is 50% explicit Talmud citations. Their overlap is only 54.2%. These are patterns no human could compute manually across hundreds of thousands of citations.
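
One plausible reading of the 54.2% figure is a similarity score between the two authors' citation-type distributions. Under that reading (an assumption about Sefer-Graph's exact definition), it could be computed by summing the per-type minimum of the two shares:

  # Made-up shares for illustration; real distributions have 15 types.
  rashi = {"back_reference": 0.49, "explicit_talmud": 0.30, "other": 0.21}
  meiri = {"explicit_talmud": 0.50, "back_reference": 0.20, "other": 0.30}

  # Overlap = sum of per-type minimum shares: identical distributions
  # score 1.0, fully disjoint ones 0.0.
  overlap = sum(min(rashi.get(t, 0.0), meiri.get(t, 0.0))
                for t in set(rashi) | set(meiri))
  print(f"{overlap:.1%}")  # 71.0% with these made-up numbers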

How fast is it?

Median (p50) latency is about 4 seconds per query. The bottleneck is the Supabase management API, not the database itself. graph_stats is the slowest at ~14s because it runs 3 aggregate queries on 1.9M rows. Direct Postgres would be 10-50× faster; this is an alpha trade-off for simplicity.

How do I install the MCP server?

Add this to your Claude Desktop config (claude_desktop_config.json), nested under the top-level "mcpServers" key:

  {
    "mcpServers": {
      "sefer-graph": {
        "command": "python3",
        "args": ["/path/to/mcp_server.py"],
        "env": { "SUPABASE_PAT": "your_token" }
      }
    }
  }

Or clone the repo and follow the README.

๐Ÿ—๏ธ Alpha Status 3 questions
Why is this alpha?

The data and tools work, but there's more to do: more source corpora (Rambam, Ramban, Ran), better reference normalization (same text appearing under different names), deduplication of citations, and performance optimization. Every query you make is logged (anonymously) to help improve the system.

What's logged?

Tool name, parameters, result count, latency, and a user ID (which you set โ€” defaults to "anonymous"). No personal data. The logs help us understand which tools are most useful, which queries fail, and where to focus improvement. You can see all logged queries on the live dashboard.
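
Schematically, a logged row is just query metadata. The field names below are illustrative, not the actual schema:

  # Illustrative log row (assumed field names, made-up values).
  log_row = {
      "tool": "search_citations",
      "params": {"ref": "Berakhot 2a"},
      "result_count": 112,
      "latency_ms": 4100,
      "user_id": "anonymous",
  }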

What's coming next?
  • More source corpora – Rambam, Ramban, Ran, Rabbenu Chananel
  • Reference normalization – "Berakhot 2a" and "ברכות ב." treated as the same text
  • Citation deduplication
  • Direct Postgres for faster queries
  • Semantic search – find citations by concept, not just reference
  • Interactive graph visualization