Last updated

October 28, 2021

Machine learning confidence

Impira’s automated machine learning (AutoML) technology learns from your interactions to automate your data entry workflow. Every value you manually extract is another example from which Impira’s AutoML learns. This kind of machine learning far outperforms rigid template-based approaches for the kinds of variations and imperfections in documents that we see in the real world.

However, in some cases the right prediction is hard to get, for example with a new kind of document that you’ve never uploaded before or a document that is heavily rotated or wrinkled. In addition to producing the best possible predictions, we also aim to communicate our best estimate of the uncertainty around those predictions.

How is confidence represented in Impira?

We show our estimate of confidence for each value in a machine learning field through the following visual representations:

Manual confidence: If a user manually extracts or corrects a value, we assign 100% confidence to that value. In the table view, these manually extracted values are denoted by a thick black border on the left side of the cell.

High confidence: If a machine learning model is highly confident in its prediction, it is denoted by a dashed green border on the left side of the cell.

Review Recommended: If a machine learning model is moderately confident in its prediction, it is denoted by a dashed red border along with a red triangle on the left side of the cell.

Blank prediction: If the machine learning model is not able to identify the value in the record, the cell will be blank and will have either the high confidence or the review recommended indicator. A blank prediction is not necessarily incorrect: some records simply do not contain the value in question.

In addition to the visual representation of confidence, you can also access the numeric confidence score itself by opening a machine learning field in the JSON view or via the API. These scores always range between 0 and 1, and manually extracted values always have a score of 1.

The machine learning confidence score available via the API and JSON view is currently quantized, meaning that it takes only the values 0.0, 0.25, 0.5, 0.75, and 1.0. Predictions with a confidence score of 0.75 or higher receive the “high confidence” designation, while predictions with a confidence score below 0.75 receive the “review recommended” designation. In a future release, users will be able to access more granular confidence scores and manually set the thresholds that determine the visual confidence indicators.

The confidence score for a text extraction field visualized in the JSON view
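
The threshold rule described above can be sketched in a few lines of Python. This is an illustration only, not Impira’s implementation: the function and constant names are hypothetical, and the only facts taken from this page are the five quantized levels and the 0.75 cutoff.

```python
# Illustrative sketch: names are hypothetical; the levels and the 0.75
# cutoff are the ones stated in the text above.
QUANTIZED_LEVELS = (0.0, 0.25, 0.5, 0.75, 1.0)
HIGH_CONFIDENCE_THRESHOLD = 0.75


def designation(score: float) -> str:
    """Map a numeric confidence score to its visual designation."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence score must be between 0 and 1, got {score}")
    if score >= HIGH_CONFIDENCE_THRESHOLD:
        return "high confidence"
    return "review recommended"


# Show the designation for each quantized level.
for level in QUANTIZED_LEVELS:
    print(level, designation(level))
```

Because manually extracted values always have a score of 1, this sketch would label them “high confidence” as well; it does not try to tell manual values apart from model predictions.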

What goes into confidence?

Confidence represents the machine learning model’s estimated probability that the extracted value is correct given the documents and labels you’ve provided. Impira’s different machine learning models take into account different factors when calculating uncertainty. 

Text extraction

The confidence score for text extraction represents the probability that the model has extracted the correct set of text in the document. However, it doesn’t measure the confidence that the OCR algorithm has correctly read the text. That OCR confidence score is accessible using IQL. For more details on how to access that data via IQL, contact

Checkbox extraction

The confidence score for the checkbox model represents the probability that the checkbox is in its predicted state (e.g., checked, unchecked, or not present). The score does not currently estimate how likely it is that the model has identified the correct checkbox. For more details, contact

How can you increase confidence?

Impira’s AutoML learns from every manual extraction that you make. The more extractions you make, the more accurate and confident the machine learning becomes. Users get the best results when they consistently verify or correct all of their review recommended predictions. Each verification improves the model and decreases the number of files you need to review in the future.

How can you query for confidence and processing status through IQL?

Impira exposes a few fields per record in a collection that allow you to inspect and query by the processing status and confidence of fields in a record:

  • `File.IsPreprocessed` is true when a file has been fully preprocessed, including loading and saving the file, analyzing its contents using OCR, and producing image thumbnails.
  • `__system.IsProcessed` is true when all of the machine learning fields have completed processing for a record.
  • `__system.IsConfident` is true when all of the machine learning fields are high confidence for a record.

For example, to query for all fully processed files, you can run the following IQL query:

File.IsPreprocessed=true and __system.IsProcessed=true

and to query for all confident files, you can run the following IQL query:

__system.IsConfident=true

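If you build these filters programmatically, a small helper can assemble them from field names. This is a minimal sketch: `iql_all_true` is a hypothetical helper, not part of any Impira client library, and it assumes only the `field=true` and `and` syntax shown in the fully processed query above.

```python
def iql_all_true(fields):
    """Join field names into an IQL filter that requires each field to be true."""
    if not fields:
        raise ValueError("at least one field name is required")
    return " and ".join(f"{name}=true" for name in fields)


# Rebuild the "fully processed" filter from the two status fields.
processed = iql_all_true(["File.IsPreprocessed", "__system.IsProcessed"])
print(processed)  # File.IsPreprocessed=true and __system.IsProcessed=true
```

The other boolean status fields listed above can be passed in the same way. How the resulting string is submitted to Impira is outside the scope of this sketch.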