
Last updated

September 30, 2021


Getting started: Data extraction

Start extracting data from your documents in four steps. 

Step 1: Create a collection

A collection is like a folder that holds a group of documents with similar layouts and data. Each collection is custom trained to extract the same set of structured data you want from each of those files.

  1. Go to the left sidebar and click the + next to Collections.
  2. Name your collection (e.g., Invoices or Purchase orders).
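
As a rough mental model only (this is not Impira's API or storage format), a collection can be pictured as a named group of files that all share one set of extraction fields. The class, attribute, and file names in this sketch are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical model: a named group of files sharing one set of fields."""
    name: str
    files: list[str] = field(default_factory=list)
    # Maps field name -> field type ("Text", "Number", "Date", or "Checkbox").
    fields: dict[str, str] = field(default_factory=dict)

    def empty_table(self) -> list[dict[str, object]]:
        """Return one row per file, with one (still empty) cell per field."""
        return [
            {"file": f, **{name: None for name in self.fields}}
            for f in self.files
        ]


# Illustrative usage with made-up file and field names.
invoices = Collection(name="Invoices")
invoices.files += ["invoice_001.pdf", "invoice_002.pdf"]
invoices.fields["Invoice date"] = "Date"
invoices.fields["Total"] = "Number"
table = invoices.empty_table()
```

The point of the sketch is that every file in a collection ends up with the same columns, which is why files with similar layouts belong together.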


Step 2: Upload files

  1. Open your new collection by choosing it from the left sidebar. 
  2. Upload your files into Impira with these options:

Your uploaded files will be accessible in this collection and in All Files, located in the left sidebar.

Step 3: Select the data you want to extract

With Impira, you can extract:

  • Text, numbers, and dates
  • Checkboxes
  • Data from tables

Extracting text, numbers, and dates

  1. While in your collection, hover over any file name and select Open to see File view.
  2. Click Add field in the upper right corner.
  3. Use your mouse to highlight the data value you want on your document.
  4. Name this field (e.g., “First name”) and choose “Text,” “Number,” or “Date.”
  5. Choose Create field.

By creating an extraction field with a highlighted data value, you’ve given Impira’s AutoML an example to learn from. Impira will immediately start to extract matching values from the other documents within your collection.


Extracting checkboxes
Choose individual checkbox-style items to extract.

  1. Open a file in your collection to enter File view.
  2. Click Add field in the upper right corner.
  3. Choose Checkbox and a bounding box will be automatically placed on your document.
  4. Drag the bounding box to fit over your desired checkbox. 
  5. Name this field (e.g., Bronze) and choose Create field.


By creating an extraction field with a highlighted checkbox, you’ve given Impira’s AutoML an example to learn from. Impira will immediately start to extract the matching checkboxes from the other documents within your collection.

Extracting data from tables
Two types of tables within a document.


In addition to text, numbers, dates, and checkboxes, Impira can extract data from tables within your document.

See our extended table extraction documentation for details.

Extracting data from tables is a beta feature. We’d love for you to join our table extraction beta program, put it through its paces, and send us your feedback. Contact feedback@impira.com to access this new feature.

Step 4: Review our work

Jump to Review Workflow article. 

After you’ve created all the fields you need, close File view. You’ll see that Impira has extracted the same fields from the rest of the files in that collection and placed them in a table.

Let’s review Impira’s work to make sure your machine learning models are trained up and in tip-top shape. Reviewing predictions helps boost Impira’s confidence for each extracted value. These confidence scores are marked by red (review recommended), green (high confidence), and black markers (manual input by user) on each cell, as depicted in the graphic below. Read more about machine learning confidence at Impira.
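
The three marker colors can be read as a simple decision rule. The sketch below is an illustration of that rule, not Impira's implementation: the `Cell` class, the `marker_color` function, and especially the `HIGH_CONFIDENCE` cutoff are assumptions, since the actual threshold is not stated here.

```python
from dataclasses import dataclass

# Assumed cutoff for illustration only; the real threshold is not documented here.
HIGH_CONFIDENCE = 0.9


@dataclass
class Cell:
    """Hypothetical extracted cell."""
    value: str
    confidence: float     # model confidence in [0, 1]
    manual: bool = False  # True if a user entered the value by hand


def marker_color(cell: Cell) -> str:
    """Map a cell to one of the three marker colors described above."""
    if cell.manual:
        return "black"  # manual input by user
    if cell.confidence >= HIGH_CONFIDENCE:
        return "green"  # high confidence
    return "red"        # review recommended
```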

  1. Click the Review X predictions button in the top right corner.
  2. Go down the queue and ensure the bounding box for each prediction is over the correct value. Correct any errors by dragging the box to the correct value, then check that the extracted value is right.
  3. Choose Confirm value and highlighted area for each confirmed or corrected value.

Impira keeps learning and reprocessing predictions as you work through your Review queue. Each confirmation or correction is applied to the remaining values in the queue in real time, and any value whose new prediction has high confidence is cleared from the queue automatically.
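
The queue behavior described above can be sketched as a small loop. This is a toy model under stated assumptions, not Impira's code: `Prediction`, `review`, the `confirm` and `rescore` callbacks, and the `HIGH_CONFIDENCE` cutoff are all hypothetical names, and `rescore` merely stands in for whatever retraining happens behind the scenes.

```python
from dataclasses import dataclass

HIGH_CONFIDENCE = 0.9  # assumed cutoff, not a documented Impira value


@dataclass
class Prediction:
    file: str
    value: str
    confidence: float


def review(queue, confirm, rescore):
    """Walk the queue; after each confirmation, re-score the remaining
    predictions and clear those that have become high confidence.

    confirm(prediction) returns the confirmed or corrected value (the user's step).
    rescore(prediction, confirmed) returns a new confidence (stand-in for retraining).
    Returns (confirmed values, auto-cleared predictions).
    """
    pending = list(queue)
    confirmed, auto_cleared = [], []
    while pending:
        current = pending.pop(0)
        confirmed.append((current.file, confirm(current)))
        still_pending = []
        for p in pending:
            p.confidence = rescore(p, confirmed)
            if p.confidence >= HIGH_CONFIDENCE:
                auto_cleared.append(p)   # no longer needs manual review
            else:
                still_pending.append(p)
        pending = still_pending
    return confirmed, auto_cleared


# Toy run: every confirmation raises the confidence of remaining items by 0.25.
queue = [
    Prediction("a.pdf", "$10", 0.5),
    Prediction("b.pdf", "$20", 0.75),
    Prediction("c.pdf", "$30", 0.25),
]
confirmed, cleared = review(
    queue,
    confirm=lambda p: p.value,
    rescore=lambda p, done: min(1.0, p.confidence + 0.25),
)
```

In this toy run the user only has to confirm two of the three predictions, because the third crosses the assumed cutoff after the first confirmation and leaves the queue on its own.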

