New AI technology hopes to change everything we know about Jewish texts

Prof. Moshe Koppel finds a new way to read ancient Hebrew texts through a groundbreaking AI technology that will be launched this week.

By ZVIKA KLEIN
Thousands of Jewish texts and books have been printed over hundreds of years, but many Jews – even Hebrew speakers – have difficulty reading them. A groundbreaking AI technology that will be launched this week aims to enable anyone who speaks Hebrew to read and learn these valuable and important texts.

The new technology is called Dicta Maivin (dictation expert). It is a new app from the Dicta organization that makes rabbinic literature accessible by automatically vocalizing and punctuating it, expanding abbreviations and identifying source texts. It will soon be available to anyone interested in Jewish texts.

"You can choose a book from the Dicta library and see it in processed form or upload any rabbinic text and Maivin will process it automatically in real-time," said its founder Prof. Moshe Koppel, a fascinating Israeli-American computer scientist and Talmud scholar – and an activist promoting conservative views in Israel.

Dicta applies cutting-edge machine learning and natural language processing tools to the analysis of Hebrew texts. "Our objective is to remove the drudgery from the study of classical and modern Hebrew texts to allow researchers to focus on the deeper questions," its website states.

[Photo: World's smallest Torah (credit: NATIONAL LIBRARY OF ISRAEL)]

Dicta may sound like a startup tech company, but it is actually a nonprofit organization that provides its products at no charge for the benefit of the public.

How it's used

"The idea is to use AI (artificial intelligence) with cutting edge technology for processing Jewish or Hebrew texts," Koppel told The Jerusalem Post during an interview at his home in Efrat.
He shared, for the first time with any media outlet, that the flagship product he has been working on for the last five years is "just about ready," and that it will get its "unveiling" at the 18th World Congress of Jewish Studies, which will take place at The Hebrew University of Jerusalem next week.

Koppel gave the example of a book that can be scanned by Dicta Maivin, whose many features make the text accessible to a larger audience.

"Let's say you have a book that is written in this old Rashi script" – a typeface for Hebrew letters based on 15th-century Sephardic handwriting, very popular in Jewish books published over the past several hundred years – "it doesn't have any nekudot (diacritical signs used to represent vowels or distinguish between alternative pronunciations of letters in Hebrew), it doesn't have punctuation and it probably even has mistakes, because the printers back then were a bit choppy," he explained.

Regarding references, "it has a million of [them], but it doesn't tell you where the references are." Koppel said that in Jewish texts, one of the rabbis will often write, "as the Ramban said," but won't say exactly where he said or wrote it. "Rabbis could be quoting Talmud in their books without even telling you where it is; there's no attribution."

Koppel picked up his cell phone and demonstrated how the technology works. "What we've done is make it so that you can take your phone and take a picture of the page, and you'll get the page back with the text that has already been digitized," he said enthusiastically.

"It's not a picture anymore; it's gone through optical character recognition [OCR]; the text has been corrected for mistakes and it has become more accessible in so many ways. You could punctuate this text; you could put in the nekudot. Any one of the rashei tevot [Hebrew abbreviations] can be explained. You just put your cursor on top of it and it'll just show you what it stands for."
Both ancient and modern Jewish texts have many Hebrew abbreviations, which can make learning them very difficult for someone who doesn't have a broad enough education or understanding. Koppel said that many people give up on learning such texts because, even though they speak Hebrew, they don't know what all of these abbreviations mean.

The app developer then showed how the technology connects a newly scanned text to other ones. For a sentence that says "As the sages said" without any attribution, pressing the text online shows which of the sages said or wrote it.

"Maivin creates footnotes for all of the texts it scans," Koppel kvelled, explaining that "the app basically recreates a scientific edition of this old Jewish book."

A delightful addition to Dicta's Maivin app is that Rashi script is converted to a new font whose Hebrew letters resemble the old script but can be read by anyone who knows how to read Hebrew.

"My daughter-in-law is a font inventor," Koppel said, smiling. "We commissioned a font from her: for people who like the Rashi font, but have difficulty understanding it. It's an adjusted Rashi font that anybody can read.

"Right now we have zillions of books in the library," he said. "I'm not waiting for you to take a picture of these books – we're doing it on our own. In less than an hour, we can scan a whole book and do all the processing."

How it works

Asked how much of Maivin is automatic and how much of it needs human intervention, Koppel responded that it is almost completely automatic. "There's a very minor part of the work that at the moment needs intervention," he revealed.

Basically, if the technology scans a word that doesn't exist in Hebrew or is rarely used in all of the other Jewish texts it has already scanned, the word goes through a second round. For that round, the team uses a new AI technology called BERT, Koppel told the Post.
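For readers who want to see the shape of that second-round check, here is a minimal Python sketch. It is not Dicta's code and it does not use BERT: a simple count of which words appear between the same two neighbors stands in for the language model, and the function names, the `min_count` threshold and the transliterated sample words are all hypothetical.

```python
from collections import Counter


def flag_suspect_words(page_words, corpus_counts, min_count=2):
    """Return positions of words that are unseen or rare in the reference corpus."""
    return [i for i, word in enumerate(page_words) if corpus_counts[word] < min_count]


def rank_candidates(corpus_sentences, left, right):
    """Rank words seen between `left` and `right` in the corpus, most frequent first.

    A toy stand-in for a masked language model: it only looks at the
    immediate neighbors, whereas BERT uses the whole surrounding context.
    """
    fills = Counter()
    for sentence in corpus_sentences:
        for a, middle, b in zip(sentence, sentence[1:], sentence[2:]):
            if a == left and b == right:
                fills[middle] += 1
    return [word for word, _ in fills.most_common()]


def suggest_corrections(page_words, corpus_sentences, min_count=2):
    """Map each suspect position to a ranked list of replacement candidates."""
    counts = Counter(word for sentence in corpus_sentences for word in sentence)
    suggestions = {}
    for i in flag_suspect_words(page_words, counts, min_count):
        # The toy model needs one neighbor on each side of the blanked word.
        if 0 < i < len(page_words) - 1:
            suggestions[i] = rank_candidates(
                corpus_sentences, page_words[i - 1], page_words[i + 1]
            )
    return suggestions


# Hypothetical, transliterated sample data (not real Dicta output).
corpus = [
    ["kach", "amru", "chachamim"],
    ["kach", "amru", "chachamim"],
    ["kach", "shanu", "chachamim"],
]
scanned_page = ["kach", "amtu", "chachamim"]  # "amtu" plays the OCR error

print(suggest_corrections(scanned_page, corpus))
# {1: ['amru', 'shanu']}
```

In this sketch, a position with a single clear candidate could be corrected automatically, while a position with several plausible candidates would be passed to a human reader.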
"The way BERT works is you give it a context, block out one word and then tell it: Guess what this word should be! It then gives you the order of the probabilities of the words that should go there."

"When we're not sure about a word, we use BERT – and we also train it, since it is a machine learning technology, so you give it material and it learns during this training," he said. "We gave BERT all of the rabbinic literature we already scanned, and said 'okay, guess what this word should be in a rabbinic text.'"

Human intervention is needed, according to Koppel, in only 1% of cases, when BERT returns a list of probable words. "We give the text to basically anyone who knows how to read a book, preferably someone with a yeshiva background. They don't have to be a world expert."

"Everything we do is 100% free," Koppel shared, revealing that Dicta is totally donor-supported.

Who is Moshe Koppel?

Koppel grew up in New York, and after high school studied at Yeshivat Har Etzion in Alon Shvut. He then went back to complete his doctorate in mathematics at New York University. Before moving back to Israel, he did his post-doctorate at Princeton.

For decades, Koppel was a member of the Department of Computer Science at Bar-Ilan University. But most Israelis would probably know of him or of his work as founder of the Kohelet Policy Forum, a Jerusalem-based conservative-libertarian think tank, funded by a US donor whose name is kept secret. He has been mentioned many times by Israeli news outlets, both liberal and conservative, as one of the most powerful and influential people in the country.

According to Kohelet's site, the forum "strives to secure Israel's future as the nation-state of the Jewish people, to strengthen representative democracy and to broaden individual liberty and free-market principles in Israel."

Maivin is ready to launch, Koppel said. "It's in our lab and we're refining some of the tools that are better than others.
Most of the functions are working in the best way possible. The punctuation aspect of the app at this point is a hack; it's not the final product that we would want. The abbreviation feature, I would say, is 90% accurate, which is not good enough. But we know how to improve it and it just takes time."

Most of the features are already online, as is a new app, and Koppel stressed that some of the new features will be uploaded to the site in the next few days.

So for people who have been looking for a better Jewish text reading experience, Dicta Maivin is here.