[Image: screenshot from the Nougat research paper]
Recent breakthroughs in Artificial Intelligence and its sub-fields, such as Natural Language Processing, Natural Language Generation, and Computer Vision, have quickly gained popularity because of their wide range of applications. Optical character recognition (OCR) is a well-established and heavily researched area of computer vision, with applications that include document digitization, handwriting recognition, and scene text recognition. One OCR task that has received particular attention in academic research is the recognition of mathematical expressions.
Scientific knowledge, whether kept in books or published in scholarly journals, is most often stored in the Portable Document Format (PDF). PDF is also widely used for document delivery in general: according to the Nougat paper, it is the second most prominent data format on the internet after HTML, accounting for 2.4% of Common Crawl. Despite this ubiquity, extracting information from PDF files can be challenging, especially for specialized content such as scientific research papers. In particular, the semantic information of mathematical expressions is typically lost when these papers are converted to PDF.
To address these issues, a group of researchers from Meta AI has developed Nougat, which stands for "Neural Optical Understanding for Academic Documents." Nougat is a Visual Transformer model that performs Optical Character Recognition (OCR) on scientific documents. Its objective is to convert these documents into a markup language so that they become machine-readable and more accessible.
The team has also created a new dataset of scholarly publications to demonstrate the effectiveness of the method. The approach offers a workable way to improve access to scientific knowledge in the digital era by bridging the gap between documents that humans can read and text that machines can process. Nougat makes it easier for researchers, teachers, and anyone else interested in scientific material to access and work with scientific documents. In essence, Nougat is a transformer-based model that turns images of document pages, whether rendered from PDFs or scanned, into structured markup text.
The group has listed their major contributions as follows:
Release of a Pre-trained Model: The group has released a pre-trained model that can convert PDFs into a lightweight markup language. The research community and anyone else can obtain this pre-trained model and the associated code on GitHub.
Pipeline for Dataset Creation: The study describes a method for creating datasets that pair PDF documents with their corresponding source code. This pipeline is what makes it possible to train and evaluate the Nougat model, and it may also be valuable for future document analysis studies and applications.
Dependency on the Page's Image Only: One of Nougat's distinguishing features is that it operates on the image of a page alone. Because the original papers are not always available as digital text, this makes it a versatile tool for extracting content from a range of sources, including scanned papers and books.
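To make the idea of pairing page images with markup more concrete, here is a minimal sketch in Python of what one training sample might look like. It is not the pipeline from the paper: the `PageSample` class, the `pair_pages` function, the file-naming scheme, and the choice to skip empty pages are all hypothetical and are used only to illustrate the shape of the data, where the page image is the model input and the markup is the target output.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class PageSample:
    """One hypothetical training sample: a page image and its target markup."""
    image_path: Path  # rasterized page image (what the model sees)
    markup: str       # markup text for that page (what the model should produce)


def pair_pages(doc_id: str, page_markups: List[str], image_dir: Path) -> List[PageSample]:
    """Pair each page's markup with the path of its rendered image.

    Assumption for this sketch: pages are numbered from 1 and images are
    named "<doc_id>_page<NNN>.png". Pages with no markup are skipped.
    """
    samples = []
    for page_number, markup in enumerate(page_markups, start=1):
        text = markup.strip()
        if not text:
            continue  # nothing to learn from an empty page
        image_path = image_dir / f"{doc_id}_page{page_number:03d}.png"
        samples.append(PageSample(image_path=image_path, markup=text))
    return samples


if __name__ == "__main__":
    pages = ["# A Title\n", "   ", "The energy is $E = mc^2$."]
    for sample in pair_pages("paper", pages, Path("pages")):
        print(sample.image_path, "->", sample.markup)
```

Because each sample needs only an image and a string, the same structure would work for scanned books as well as for pages rendered from PDFs, which is the practical benefit of a model that depends on the page image alone.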
See the paper and the GitHub repository for more details.
All the credit for this research belongs to the researchers who worked on this project.