FAQ

Everything you need to know about the Torah Citation Knowledge Graph: what's in it, how to use it, and what it means for Torah study.

📚 General (5 questions)
What is Sefer-Graph?

A knowledge graph of 1,896,325 cross-references extracted from Torah literature. When Rashi cites Berakhot 2a, or the Arukh HaShulchan references Rambam's Mishneh Torah, those connections are captured, typed, and scored for confidence. It's an MCP server with 8 tools that let any AI client query the graph.

Who is this for?

Anyone who studies Torah and uses AI tools: learners, educators, rabbis, developers. If you've ever wondered "who else cites this gemara?" or "how does Rashi's citation pattern differ from the Meiri's?", this is for you. It's also a research tool for computational approaches to Torah scholarship.

How do I use it?

Add the MCP server to your Claude Desktop, Claude Code, or any MCP-compatible client. Then ask natural questions:

  • "What texts cite Berakhot 2a?"
  • "Compare Rashi and Meiri's citation patterns"
  • "Find rare citations in the Ritva"
  • "What's most co-cited with Shabbat 108b?"

The AI translates your question into the right tool call automatically.
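
Under the hood, the first question above becomes an MCP tools/call request. Here is a sketch of what a client might emit; the argument names ("ref", "direction") are illustrative guesses, not the server's documented schema:

  # Sketch of the MCP tools/call request an AI client could emit for
  # "What texts cite Berakhot 2a?". Argument names are assumptions.
  request = {
      "method": "tools/call",
      "params": {
          "name": "search_citations",
          "arguments": {"ref": "Berakhot 2a", "direction": "inbound"},
      },
  }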

Is this free?

Yes. The MCP server, data, and dashboard are all open source. The underlying data is stored in Supabase (free tier). You need your own MCP client (Claude Desktop is free).

What's the SHELET protocol?

Every tool response ends with 3 contextual next-action suggestions, named after the Hebrew word שלט (sign/menu). Instead of leaving you with raw data, it guides your next move: "Explore this result", "Compare citation types", "Find the path between them". It makes exploration natural.
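
Concretely, a tool response might end with a block shaped like this; a sketch only, since the actual field name and wording are up to the server:

  # Illustrative response shape; "shelet" as a field name is an assumption.
  response = {
      "data": [...],  # the query results
      "shelet": [
          "Explore this result",
          "Compare citation types",
          "Find the path between them",
      ],
  }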

๐Ÿ—„๏ธ Data 5 questions
What sefarim are included?

The graph contains 23 source works whose texts have been processed to extract citations, spanning Tannaitic literature (Mishnah, Tosefta, Sifrei), the Talmud Yerushalmi, Bavli commentaries (Rashi, Tosafot, Rashba, Ritva, Meiri, Rosh, Rif), halachic codes (Tur, Shulchan Arukh), and Acharonim (Arukh HaShulchan, Magen Avraham, Biur Halacha, and more). Together they form a network of 1.9M+ connections.

23 Source Works (text processed, citations extracted):

  • Arukh HaShulchan – 343K
  • Rashi – 214K
  • Tosafot – 182K
  • Beit Yosef – 120K
  • Meiri – 118K
  • Ritva – 105K
  • Jerusalem Talmud – 102K
  • Bach – 96K
  • Rashba – 88K
  • Magen Avraham – 84K
  • Biur Halacha – 84K
  • Shulchan Arukh – 58K
  • Rif – 54K
  • Tur – 52K
  • Shulchan Arukh HaRav – 34K
  • Taz – 32K
  • Sefer HaChinukh – 31K
  • Kitzur Shulchan Arukh – 29K
  • Tosefta – 25K
  • Rosh – 18K
  • Chayei Adam – 14K
  • Sifrei – 13K
  • Mishnah – 313

Key Target Works (cited by others; not yet processed as sources):

  • Talmud Bavli – 578K citations pointing to it (the most cited work in the graph)
  • Torah / Tanakh – 216K citations
  • Mishnah – 129K citations
  • Mishneh Torah (Rambam) – 72K citations

What types of citations are tracked?

15 citation types, each with different meanings:

  • explicit_talmud (30.5%) – direct Talmud references
  • back_reference (17.8%) – self-references within a work
  • named_position (17.0%) – citing an authority's opinion
  • explicit_verse (10.3%) – biblical verse citations
  • explicit_mishnah (9.9%) – Mishnah references
  • conceptual_dependency (6.7%) – building on an idea without explicit citation
  • allusion, paraphrased_verse, legal_principle, explicit_braita, gezeira_shava, and more

What does the confidence score mean?

Each citation has a confidence score from 0 to 1 indicating how certain the extraction is:

  • ≥0.9 (36.3%) – High confidence. Explicit, unambiguous citation.
  • 0.7–0.9 (47.4%) – Medium. Strong textual evidence but some ambiguity.
  • <0.7 (16.3%) – Lower confidence. Allusions, conceptual links, or ambiguous references.

Average across the graph: 0.873. Tools default to ≥0.7, but you can adjust the threshold.
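
For example, to pull lower-confidence allusions into a search, you could lower the floor. A sketch assuming a min_confidence parameter (the name is a guess, not the documented schema):

  # Hypothetical arguments: widen a search to include allusions and
  # conceptual links by lowering the assumed "min_confidence" floor.
  arguments = {"ref": "Shabbat 108b", "min_confidence": 0.5}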

How was the data extracted?

An LLM-powered extraction pipeline processes the full text of each sefer. Each segment is analyzed for all references to other texts: verses, Talmud, Mishnah, other commentators, legal codes. The pipeline identifies the citation type and assigns a confidence score. Results are stored locally in DuckDB and in Supabase for the MCP server.
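
In outline, the per-segment loop looks something like the sketch below; the loader and LLM step (load_segments, extract_refs) are hypothetical placeholders, while the DuckDB calls are the library's standard API:

  import duckdb

  con = duckdb.connect("citations.duckdb")
  con.execute("""
      CREATE TABLE IF NOT EXISTS citations (
          source_ref TEXT, target_ref TEXT,
          citation_type TEXT, confidence DOUBLE
      )
  """)

  for segment in load_segments("arukh_hashulchan"):  # hypothetical loader
      # extract_refs() stands in for the LLM extraction step: it returns
      # every reference found in the segment, typed and confidence-scored.
      for ref in extract_refs(segment.text):
          con.execute(
              "INSERT INTO citations VALUES (?, ?, ?, ?)",
              [segment.ref, ref.target, ref.type, ref.confidence],
          )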

Why don't I see Rambam as a source?

The extraction pipeline has so far processed the 23 works listed above, spanning ~200+ individual masekhtot/volumes: 3 Tannaitic collections (Mishnah, Tosefta, Sifrei), the Talmud Yerushalmi, 7 Bavli commentaries (Rashi across 36 masekhtot, Tosafot across 35, plus Rashba, Ritva, Meiri, Rosh, Rif), the halachic codes and their commentaries (Tur, Shulchan Arukh, Beit Yosef, Bach, Taz, Magen Avraham, Shulchan Arukh HaRav), Sefer HaChinukh, and later Acharonim (Arukh HaShulchan, Biur Halacha, Kitzur Shulchan Arukh, Chayei Adam). Rambam's Mishneh Torah hasn't yet been processed as a source, but it is already one of the most-cited targets in the graph (72K citations pointing to it), and it sits at the top of the roadmap for new source corpora.

🔧 Tools (4 questions)
What are the 8 tools?
  • search_citations – Find what cites a text or what a text cites
  • top_cited – Most frequently referenced texts in the graph
  • citation_path – Find how two texts connect through citation chains
  • graph_stats – Overview statistics of the entire graph
  • citation_types – Distribution of citation types, optionally filtered
  • co_cited – What texts appear together most often
  • compare_sources – Compare citation patterns between two authors
  • rare_finds – Surface unique, one-of-a-kind citations
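
As a sketch, a citation_path call could look like this; the argument names ("from_ref", "to_ref") are illustrative, not the documented schema:

  # Hypothetical: trace how two texts connect through citation chains.
  request = {
      "method": "tools/call",
      "params": {
          "name": "citation_path",
          "arguments": {"from_ref": "Arukh HaShulchan, Orach Chaim 1",
                        "to_ref": "Berakhot 2a"},
      },
  }
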
What's the most interesting tool?

compare_sources is arguably the most novel. You can compare the "citation DNA" of any two commentators: Rashi vs. Meiri, Tosafot vs. Rashba. It reveals that Rashi is 49% back-references while Meiri is 50% explicit Talmud citations. Their overlap is only 54.2%. These are patterns no human could compute manually across hundreds of thousands of citations.
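
One plausible reading of the 54.2% figure is a similarity score between the two authors' citation-type distributions. Under that reading (an assumption about Sefer-Graph's exact definition), it could be computed by summing the per-type minimum of the two shares:

  # Made-up shares for illustration; real distributions have 15 types.
  rashi = {"back_reference": 0.49, "explicit_talmud": 0.30, "other": 0.21}
  meiri = {"explicit_talmud": 0.50, "back_reference": 0.20, "other": 0.30}

  # Overlap = sum of per-type minimum shares: identical distributions
  # score 1.0, fully disjoint ones 0.0.
  overlap = sum(min(rashi.get(t, 0.0), meiri.get(t, 0.0))
                for t in set(rashi) | set(meiri))
  print(f"{overlap:.1%}")  # 71.0% with these made-up numbers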

How fast is it?

Median (p50) latency is about 4 seconds per query. The bottleneck is the Supabase management API, not the database itself. graph_stats is the slowest at ~14s because it runs 3 aggregate queries on 1.9M rows. Direct Postgres would be 10-50× faster; this is an alpha trade-off for simplicity.

How do I install the MCP server?

Add this to your Claude Desktop config (claude_desktop_config.json), nested under the top-level "mcpServers" key:

  {
    "mcpServers": {
      "sefer-graph": {
        "command": "python3",
        "args": ["/path/to/mcp_server.py"],
        "env": { "SUPABASE_PAT": "your_token" }
      }
    }
  }

Or clone the repo and follow the README.

๐Ÿ—๏ธ Alpha Status 3 questions
Why is this alpha?

The data and tools work, but there's more to do: more source corpora (Rambam, Ramban, Ran), better reference normalization (same text appearing under different names), deduplication of citations, and performance optimization. Every query you make is logged (anonymously) to help improve the system.

What's logged?

Tool name, parameters, result count, latency, and a user ID (which you set โ€” defaults to "anonymous"). No personal data. The logs help us understand which tools are most useful, which queries fail, and where to focus improvement. You can see all logged queries on the live dashboard.
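
Schematically, a logged row is just query metadata. The field names below are illustrative, not the actual schema:

  # Illustrative log row (assumed field names, made-up values).
  log_row = {
      "tool": "search_citations",
      "params": {"ref": "Berakhot 2a"},
      "result_count": 112,
      "latency_ms": 4100,
      "user_id": "anonymous",
  }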

What's coming next?
  • More source corpora – Rambam, Ramban, Ran, Rabbenu Chananel
  • Reference normalization – "Berakhot 2a" and "ברכות ב." treated as the same text
  • Citation deduplication
  • Direct Postgres for faster queries
  • Semantic search – find citations by concept, not just reference
  • Interactive graph visualization