LLM Semicha, Part I

LLMs get tested on an Israeli Rabbanut Semicha Exam

Feb 16, 2025

Intro:

In our first article here, we highlighted the importance of benchmarking LLMs, which is the process of systematically testing and comparing AI language models' performance on standardized tasks to measure their capabilities and limitations.

It’s time to take off the training wheels – we're throwing these AI models into the deep end of the pool by testing them against actual questions from the Israeli Rabbanut semicha exam. Having spent time in an Israeli yeshiva, I've seen firsthand how these exams stress out even the most diligent yeshiva student.

This isn't just another academic exercise in AI testing. This study breaks new ground in understanding how well LLMs can truly grasp and engage with complex Jewish legal concepts, potentially revolutionizing how we approach both AI development and religious scholarship. The implications could be game-changing – if an AI can tackle the intricacies of an Israeli Rabbanut semicha exam, what else might it be capable of?

Methodology:

We used an Israeli Rabbnut Semicha test with the answers provided (publicly available here) to test 3 LLMs, Claude Sonnet 3.5, ChatGPT 3o mini (with internet search and reasoning), and DeepSeek R1 (with search and DeepThink). The last two models, ChatGPT and DeepSeek utilize the latest LLM advancement, which is chain-of-thought prompting. Per ChatGPT, Chain-of-Thought (CoT) prompting is a technique in large language models (LLMs) that enhances reasoning by guiding the model to generate intermediate steps leading to a final answer. Hypothetically, this would seem to be well-suited to understanding the complexities of Jewish texts.

Let’s introduce our semicha candidates.

Claude Sonnet 3.5
1. Our good friend and (my) fav model, even though can’t search the internet yet or ‘reasons’ :(
ChatGPT 3o
1. This model with its deep research function shocked the world with its reasoning ability, getting the highest score yet, 26%, on “Humanity’s Last Exam.”
DeepSeek:
1. The release of this model wiped out ~$589,000,000,000.00 (billion, ya I know) of value from Nvidia.

A vintage block print-style illustration from the early 1900s depicting a person giving smicha (rabbinic ordination) to an abstract representation of an LLM. The LLM is symbolized as a glowing book, a mechanical brain, or a swirling cloud of Hebrew letters and numbers. The rabbi, dressed in traditional Jewish robes, places his hands solemnly over the LLM figure in a moment of ordination. The scene is detailed with fine line work, reminiscent of classic early 20th-century printmaking, with an aged parchment texture and an old-world artistic feel. — This article wouldn’t be complete without an bizarre AI-generated photo.

Instructions for the test, given to the LLMs

The Question: Part I

Data: The Answers Part I

Rabbanut Answer, Part I

Claude Sonnet 3.5 (no search or reason option)

Claude: 3/10

General idea is correct
No mention of Rashi (ie, later it will be mutar [rashi beitza 3b])
Incorrect svara of Ran, who really says it’s the lack of difference between the currently asur and the mutar objects. The above quoted approach that is attributed to the Ran sounds like Rashi.
Rashbah: I could not find this rashbah

ChatGPT 3o mini (with internet search and reasoning)

ChatGPT: 0/10 points

Gets the entire concept wrong- it’s not about which taste, the asur vs mutar is stronger, but the fact that the asur WILL become mutar at some point, changes the halachic equation
No mention of Rashi/Rashaba
Wrong siman in Shulchan Aruch

A pointillist painting in the style of Georges Seurat, depicting a basket of eggs on a rustic wooden table. Some of the eggs glow softly, symbolizing the concept of 'Davar SheYesh Bo Matirin' in Jewish law—items that will soon become permitted. The scene is bathed in warm, dappled light, with soft dots of color creating depth and atmosphere. The background suggests a peaceful countryside setting, enhancing the contemplative mood. — ChatGPT’s idea of a Davar Sheyesh Lo Matirin

DeepSeek R1 (with search and DeepThink)

DeepSeek: 0/10 points

No mention of Rashi/Ran
General explanation is broadly in the right direction, though not entirely precise.
1. Chashuv is not the right svara for why it’s not batel
2. An egg on yom tov, not chametz (which is subject to a machloket) is the prototypical example of a davar she’yeish lo matirin.
3. Wrong siman in Shulchan Aruch.
Interesting svara in that tosafot using the word chefsah, but no such tosafot exists.

Quick Summary so far:

Claude wins, scoring 30%. Chat and DeepSeek get a 0. Doesn’t bode well for parts 2 and 3!

We’ll be back with the the 2nd and 3rd parts of the above question soon.

LLMOD

Discussion about this post

Ready for more?