
“Hey machine, what’s my invoice total?”

Introducing DocQuery, an open source tool that lets you ask questions about documents.

Introducing DocQuery


Today, we're bringing you closer to a future where AI can answer questions about documents. We're excited to announce a new open source query engine.

Imagine you could say something like:

     “Hey Siri, what is the offering amount for that terms sheet I just received?”
     “Hey Google, what is the invoice total for this email attachment?”

You can’t do that just yet, but DocQuery brings us closer to a future where you can.

DocQuery lets you ask a wide variety of questions about semi-structured (like invoices) and unstructured documents (like contracts) using large language models (LLMs).

Dear non-nerds, we built in a Super Nerd Translator for the more technical bits.
Dear nerds, please feel free to ignore these parts.
Some documents are structured, like a form you fill out at the doctor’s office. Some are less structured, like a home purchasing contract (usually containing paragraphs of text). 

We’ve used a bunch of examples to make it possible to ask questions about both types.

Playing with the DocQuery demo on Hugging Face 🤗

Hugging Face is a website where people use artificial intelligence to solve cool problems and share their work with the world.

We’ve provided some easy examples to test it out. Or you can get started right away with your own examples.

  1. Upload your document.
  2. Ask a question.
  3. Get an answer.

If you don’t like the answers you get, you can try to tweak the question. For example, use more specific pronouns (“who” to refer to a person) or provide clues (“options include…“). You can also try using the alternate model we’ve included, Donut.

Using DocQuery in Terminal

DocQuery is also an MIT-licensed library and command-line tool. Check out our repo and learn how to use DocQuery right in your terminal. You can also use them to try one of the dozens of other pre-trained question answering models published on Hugging Face.

Some people will want to do more than just play with our demo on Hugging Face. They can look behind the scenes, play with it on their own computer, and even use it in their own code if they like.
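For the nerds who want to jump straight to code: here is a minimal sketch of what querying a document with the library might look like in Python. It follows the patterns in the repo's README, but treat the exact names (the document.load_document helper, the "document-question-answering" pipeline name, and the sample file path) as assumptions and check the repo for the current API.

      # Minimal sketch of using the DocQuery library (assumes `pip install docquery`).
      # The module/function names follow the repo's examples but may have changed;
      # "invoice.pdf" is a placeholder path.
      from docquery import document, pipeline

      # Load a pre-trained document question answering pipeline
      p = pipeline("document-question-answering")

      # Load a document; text and layout are extracted under the hood
      doc = document.load_document("invoice.pdf")

      # Encode the fields you care about as plain-English questions
      for question in ["What is the invoice number?", "What is the total due?"]:
          print(question, p(question=question, **doc.context))

The command-line tool wraps the same pipeline, so you can run the equivalent query over a file or a whole folder without writing any Python; the README shows the exact invocation.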

How does DocQuery work?

Under the hood, DocQuery uses a pre-trained zero-shot language model that has been fine-tuned for question/answering. In the world of document processing, most models have to be pre-trained on a certain document type with hundreds or thousands of examples, but question/answering disrupts this constraint, allowing you to work with any document by simply encoding its schema into a set of questions.

The way that machine learning usually works is that you give a machine a bunch of examples and then it eventually says, “Ah, I think I understand.” At this point, it has been “trained,” and it can then make “predictions” based on what it has learned.

When we think about applying this to documents, “training” is a person saying, “Hey, machine, here is the Total Due value on this document.” And then going to the next document and saying, “This is where Total Due is on this one.” And eventually, the machine figures out, “Total Due is usually found on this part of the page, near these words.”

What's cool about DocQuery is that you don't need to do any prep work to start getting answers. This is called "zero-shot learning." So you can find info like "Total Due" without ever pointing out where that info is in your document.

We’ve already trained the machine on those big datasets for you, so it comes “pre-trained.” This way, you can get your answers right away after asking a question without having to tell the machine anything at all about the document.

The model is trained using a combination of SQuAD 2.0 and DocVQA, which makes it particularly well suited for complex visual question answering tasks on a wide variety of documents. The underlying model is also published on Hugging Face as impira/layoutlm-document-qa, which you can access directly.

So, someone did the hard work for you. Actually, a lot of people did the hard work for you. A team at Stanford started a project called SQuAD, which took 500+ articles from Wikipedia and crowdsourced humans to write the answers for 100,000+ questions about the articles. For example:

      Question: In what country is Normandy located?
      Answer: France.

This provided a dataset.
DocVQA is a similar project, but for documents. We used both datasets to acquire “a bunch of examples.”
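
For the nerds: because the underlying checkpoint is published on Hugging Face, you can also load it without DocQuery at all. Here is a hedged sketch using the transformers pipeline API; it assumes a recent version of transformers plus pytesseract and Pillow for OCR, and the image path is a placeholder.

      # Sketch: loading impira/layoutlm-document-qa directly via Hugging Face transformers.
      # LayoutLM needs OCR, so pytesseract (and the Tesseract binary) must be installed.
      from transformers import pipeline

      qa = pipeline(
          "document-question-answering",
          model="impira/layoutlm-document-qa",
      )

      # Ask a question about a scanned page (or a PDF page rendered to an image)
      result = qa(image="scanned_invoice.png", question="What is the total due?")
      print(result)  # e.g. [{"answer": "$12,345.00", "score": 0.98, ...}]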

The model is based on LayoutLM. We’ve done extensive testing with LayoutLM and consistently seen state-of-the-art results from LayoutLMv1 for semi-structured and unstructured documents. It works particularly well on PDF documents with embedded text and scanned images with high quality OCR results. Due to its simple architecture, runtime performance is also exceptional.

We had the machine look at all these examples: the articles/documents and their corresponding questions and answers. Then, we gave the machine a big digital “brain” called “LayoutLM” that knows how to interpret language and the visual relationships of words on a page. We then asked the machine to use its big digital “brain” to learn by looking at (1) the relationship of these words to each other, and (2) where these words appear on the page.

Bring in the models

We encourage you to try other models. Vision Encoder/Decoder models like Donut circumvent the OCR process altogether, which makes them better at visual reasoning and more resilient to OCR errors. Our hope is that DocQuery makes it incredibly easy for you to try various models, and ultimately pick the right tool for the problem you are solving.

We made it so you can try out other digital “brains” like Donut. Donut doesn’t use Optical Character Recognition (OCR), which converts images of words into text, a task that is sometimes a little tricky and can make mistakes.
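
For the nerds: trying a different “brain” is mostly a matter of pointing at a different checkpoint. The sketch below uses the publicly available Donut model fine-tuned on DocVQA (naver-clova-ix/donut-base-finetuned-docvqa) through the same transformers pipeline as above; the image path is again a placeholder.

      # Sketch: swapping in Donut, a Vision Encoder/Decoder model that skips OCR entirely.
      # No pytesseract needed; the model reads the page image directly.
      from transformers import pipeline

      donut_qa = pipeline(
          "document-question-answering",
          model="naver-clova-ix/donut-base-finetuned-docvqa",
      )

      result = donut_qa(image="scanned_invoice.png", question="What is the total due?")
      print(result)

Because the question is just text, you can keep your questions exactly the same while swapping models, which makes side-by-side comparisons easy.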

If you find yourself wondering how to achieve higher accuracy, work with more file types, teach the model with your own data, have a human-in-the-loop workflow, or query the data you're extracting, then do not fear — you’re running into the same challenges that almost every organization does while putting document AI into production. 

But wait, there's more!

The Impira platform is designed to solve these problems in an easy and intuitive way. Impira comes with question answering models that are additionally trained on proprietary datasets and can achieve 95%+ accuracy for most use cases out-of-the-box, like invoice processing. It also has an intuitive UI that enables subject matter experts to label and improve the models, as well as an API that makes integration a breeze. Try Impira for free or reach out to us for more details.

Impira trained the model further on actual documents, creating a proprietary dataset. We can’t provide that on an open source platform like Hugging Face, so create a free Impira account to try it out. It’s a bit smarter, and it also lets you provide more examples in case it isn’t totally correct the first time. We hope “zero shots” is all it takes, but Impira easily lets you take more shots to help train the machine if you need to.

Freeing data frees people

We believe that freeing the data in documents unlocks new levels of human productivity. We’re working hard to create a future where machines release us from the burden of scanning, parsing, and extracting data. To leap forward, we need to embrace an open future where more engineers and scientists can contribute to this problem.

That’s why we’re releasing DocQuery and why we plan to continue innovating and releasing these technologies in the open. Next up, we’re planning to support document classification and tabular data extraction through DocQuery. If you're interested in getting involved in these efforts, please reach out to us! We welcome feedback, requests, and contributions to help achieve this vision.

We really hope that a bunch of nerds and non-nerds alike will join us in these efforts, so that you don’t have to take time extracting data from documents, and can spend your time doing bigger and better things.
