Impira’s automated machine learning (AutoML) technology learns from your interactions to automate your data entry workflow. Every value you manually extract is another example from which Impira’s AutoML learns. This kind of machine learning far outperforms rigid template-based approaches for the kinds of variations and imperfections in documents that we see in the real world.
However, some cases make an accurate prediction difficult, such as a new type of document you’ve never uploaded before or a document that is heavily rotated or wrinkled. In addition to producing the best possible predictions, we also seek to ensure that we properly communicate our best estimate of any uncertainty around those predictions.
How is confidence represented in Impira?
We show our estimate of confidence for each value in a machine learning field through the following visual representations:
Manual confidence: If a user manually extracts or corrects a value, we assign 100% confidence to that value. In the table view, these manually extracted values are denoted by a thick black border on the left side of the cell.
High confidence: If a machine learning model is highly confident in its prediction, it is denoted by a dashed green border on the left side of the cell.
Review recommended: If a machine learning model is only moderately confident in its prediction, it is denoted by a dashed red border along with a red triangle on the left side of the cell.
Blank prediction: If the machine learning model is not able to identify the value in the record, the cell will be blank and carry either the high confidence or review recommended indicator. Not all blank predictions are incorrect; some records simply may not contain the value in question.
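The tiers above can be sketched as a simple mapping from score to indicator. This is only an illustration: the 0.9 cutoff between the two model-confidence tiers is an assumption, as Impira does not publish the exact threshold.

```python
def confidence_indicator(score: float, is_manual: bool = False) -> str:
    """Map a confidence score in [0, 1] to its display tier.

    The 0.9 cutoff is an illustrative assumption, not Impira's
    actual internal threshold.
    """
    if is_manual:
        return "manual"           # thick black border; score is always 1.0
    if score >= 0.9:              # assumed cutoff
        return "high_confidence"  # dashed green border
    return "review_recommended"   # dashed red border plus red triangle
```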
In addition to the visual representation of confidence, you can also access the numeric confidence score itself by opening a machine learning field in the JSON view or via the API. These scores always range between 0 and 1, and manually extracted values always have a score of 1.
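As a sketch of how you might consume those numeric scores programmatically, the snippet below parses a hypothetical JSON payload and flags low-confidence fields for review. The field names, response shape, and 0.9 review threshold are all assumptions; consult the Impira API documentation for the actual schema.

```python
import json

# Hypothetical response shape for a machine learning field's values;
# the real API schema may differ.
payload = json.loads("""
{
  "Invoice Number": {"value": "INV-1042", "confidence": 0.97},
  "Due Date": {"value": null, "confidence": 0.55}
}
""")

# Collect fields whose confidence falls below a chosen review threshold.
needs_review = [name for name, field in payload.items()
                if field["confidence"] < 0.9]
print(needs_review)  # ['Due Date']
```

A filter like this makes it easy to route only uncertain values to a human reviewer while letting high-confidence extractions flow straight through.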
What goes into confidence?
Confidence represents the machine learning model’s estimated probability that the extracted value is correct given the documents and labels you’ve provided. Impira’s different machine learning models take into account different factors when calculating uncertainty.
The confidence score for text extraction represents the probability that the model has extracted the correct set of text in the document. However, it doesn’t measure the confidence that the OCR algorithm has correctly read the text. That OCR confidence score is accessible using IQL. For more details on how to access that data via IQL, contact email@example.com.
The confidence score for the checkbox model represents the probability that the checkbox is in its predicted state (e.g., checked, unchecked, or not present). The score does not currently estimate how likely it is that the model has identified the correct checkbox. For more details, contact firstname.lastname@example.org.
How can you increase confidence?
Impira’s AutoML learns from every manual extraction you perform. The more extractions you do, the more accurate and confident the machine learning will become. Users get the best results when they consistently verify or correct all of their review recommended predictions. Each verification improves the model and decreases the number of files you need to review in the future.