

How to automate insurance claims processing using OCR and Impira AutoML

Extracting the relevant data from insurance claims is a costly, time-consuming, and painstaking process for claims operations teams. This guide describes how to set up the Impira AutoML solution for your insurance claims so you can automate claims processing and save time, in no time.


Using automated machine learning (AutoML) for insurance claims processing

In an insurance company, claims operations is a critical function. It serves as the central hub for intaking, processing, and paying out claims. For the insured customer, the claims department is often a frequent point of contact with an insurance company and is, therefore, a huge driver for overall customer satisfaction. According to an Accenture study, more than 30% of customers who endured a bad claims experience switched insurers within a year of the incident. 

Today, the two most common insurance claim forms are the CMS-1500 and the UB-04. The two are quite similar: the UB-04 is based on the CMS-1500 and is effectively a variant of it. CMS-1500 forms are used by non-institutional healthcare providers (e.g., private practices), while UB-04 forms are generally used by institutional healthcare facilities, such as hospitals.

For many insurance companies, processing insurance claims is still a largely manual process in which outsourced staff review and re-key data from faxed, scanned, or emailed forms. Optical character recognition (OCR) technology is often used to speed up that process. However, the results are error-prone and still require correction by a human.

Impira AutoML helps reduce OCR errors by allowing users to train their own models directly, and give them feedback, through an easy-to-use interface. In just a few clicks, Impira AutoML can be trained to learn your CMS-1500 or UB-04 form, and securely too, on Impira’s HIPAA-compliant platform. With Impira AutoML, you can automate your insurance claims processing in less than five minutes and keep your insured customers happy from start to finish.

Business ROI 

Compared with OCR technology alone, Impira AutoML reduces the time and effort that claims operations teams need to process insurance claims. Because teams work more efficiently, complex and high-priority claims can be addressed with greater care, driving higher customer satisfaction, greater retention, and increased revenue.

Impira AutoML can be applied to a wide variety of insurance use cases across the enterprise. Follow the steps below to see how easy it is to use Impira's technology to make claims processing simpler and faster.

Step 1: Sign up for a free Impira account

Head over to Impira's sign-up page to create your free account with an email address. Our free plan includes 200 file units (pages) with no time limit.


Step 2: Create a new collection

Once you create an account in Impira, you’ll want to organize similar files into collections. A collection groups together files that you intend to use in the same way. For example, this could mean grouping files from which you want to extract a common set of data, files that you need to keep together for easy retrieval and sharing, or files that you wish to combine with other files or data.

To create a collection, click on the ‘plus’ symbol next to the word “collections” in the left-hand sidebar.

Screenshot of the Impira platform highlighting the plus sign for creating a new collection.

In the dialogue box that appears, give your collection a meaningful name of your choice, such as “Insurance Claims.”

Screenshot of naming your collection

Step 3: Upload your insurance claims 

Impira offers several ways to ingest files, including manual upload via the web interface, programmatic ingestion using RESTful write APIs, or via integration with a storage system such as Amazon S3 or Dropbox.  Let’s start with manually uploading some forms to the “Insurance Claims” collection.

  1. First, click on the name of your newly created collection.
Screenshot of the Impira platform with your named collection highlighted.

  2. Then, click on ‘Upload Files’ and choose to upload files from your computer. You can download the sample documents to follow along in this guide, or you can upload your own insurance claims.
  3. If you are using the sample files, download and unzip them, navigate to the folder that contains them, select the files, and click “Open”.

  4. Your files will immediately begin processing. When processing is complete, you will see that each file has its own row in the table, as shown below.
Screenshot of the Impira platform with your files uploaded.

Step 4: Train an AutoML model on your insurance claims

Extracting initial data from your claims

Let’s begin extracting the relevant data from your claims. 

  1. First, double-click on the first file to open the Impira mark-up interface. 
  2. Let’s start by extracting the “Patient’s Name” field. Click on the patient’s full name and highlight the entire name. 
  3. On the right-hand sidebar, let’s call this field “Patient’s Name” in the “Name” field. Select “Text extraction” for “Type” and “Text” for “Data”. In this mark-up interface:
  • “Type” refers to the action you want to take with your data. We can extract text from a field, extract a checkbox, manually input data, create our own function (expression), or join this field to another field from another document. The latter three types are ways to manipulate or connect data, whereas the first two types are ways to extract data using Impira’s AutoML.
  • “Data” refers to the data type you want to extract, such as Text, Number, or Date. 
  4. Click the “Add Field” button. 
  5. Close the mark-up interface by clicking “X” in the top-right corner, and you’ll see that we now have a new column in our table for “Patient’s Name”.
animation of the steps to extract data from a pdf

Notice the spinner in the column header. This indicates that Impira AutoML is updating the model and applying what it has learned to the remaining documents in your collection. Within a few seconds, the “Patient’s Name” column will be updated with the name extracted from every CMS-1500 form.

Revising your AutoML models

Notice also that some of the cells may be blank and that there is a colored indicator on the left of the cell. This is a quick visual indication of the underlying numerical confidence score for the prediction. 

Screenshot of your predictions with one missing.
  • A dotted red indicator with a flag means that the confidence score is low and the prediction needs to be reviewed.
  • A dashed green bar indicator means that the confidence score for the prediction is high.
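
The red and green indicators amount to a threshold on the confidence score. The Python sketch below illustrates that idea only: the 0.8 cutoff, the `needs_review` helper, and the sample predictions are all assumptions made up for this example, not Impira's actual threshold or data.

```python
from typing import Optional

# Assumed cutoff for illustration; the platform's real threshold is not stated in this guide.
REVIEW_THRESHOLD = 0.8


def needs_review(value: Optional[str], confidence: float,
                 threshold: float = REVIEW_THRESHOLD) -> bool:
    """Return True when a prediction is blank or its confidence is below the cutoff."""
    if value is None or not value.strip():
        return True  # a blank cell always needs a human to look at it
    return confidence < threshold


# Made-up sample predictions: (file name, extracted value, confidence score).
predictions = [
    ("claim_01.pdf", "Jane Doe", 0.97),
    ("claim_02.pdf", "", 0.10),
    ("claim_03.pdf", "John Q. Public", 0.55),
]

# Files whose prediction would show the red "review" indicator in this sketch.
review_queue = [name for name, value, conf in predictions if needs_review(value, conf)]
print(review_queue)
```

In this sketch, the blank prediction and the low-confidence prediction land in the review queue, while the high-confidence one does not.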

Confirming and correcting values

We need to provide additional training to the model to boost the confidence levels. 

  1. Double-click on a red cell to reopen the mark-up interface. In the mark-up interface, you’ll see a box around the predicted text on the claim. 
  2. If the value is correct, simply click the “Confirm value” button on the right-hand sidebar. 
  3. If the value is incorrect or blank, redraw a box around the correct value.

Each time you accept or correct a predicted value, that information is fed back into the AutoML model for immediate retraining. As you validate predictions, you will start to see the colored confidence indicators change from red to green. 

Extracting additional data from your claims

Now, we can extract additional information from your claims by adding more fields. We can do this by repeating the above steps to add fields such as “Insured’s ID Number”, “Patient’s Address”, and so forth. 

As you add these additional fields, the number of columns in your table will grow.

Screenshot of your full table with multiple predictions

Step 5: Export your data

Download a CSV

Once you are satisfied with your extraction results (i.e., all the cells display the green, high-confidence indicator), you might choose to download your data as a CSV file. To do that, expand the drop-down menu in the top-right corner and choose “Download all files records (CSV)”.
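
Once you have the CSV, a short script can double-check it before the data moves downstream. The sketch below uses Python's standard `csv` module to list rows where a required field is empty. It assumes the column headers in the export match the field names you created (such as “Patient’s Name”); the “File name” column, the `rows_with_missing_fields` helper, and the sample rows are hypothetical and only there to make the example self-contained.

```python
import csv
import io


def rows_with_missing_fields(csv_file, required_fields):
    """Return (line_number, missing_fields) for each row with an empty required field."""
    reader = csv.DictReader(csv_file)
    incomplete = []
    # Data rows start on line 2 because line 1 holds the column headers.
    for line_number, row in enumerate(reader, start=2):
        missing = [f for f in required_fields if not (row.get(f) or "").strip()]
        if missing:
            incomplete.append((line_number, missing))
    return incomplete


# Made-up stand-in for an exported file; header names are assumptions.
sample_export = io.StringIO(
    "File name,Patient's Name,Insured's ID Number\n"
    "claim_01.pdf,Jane Doe,A123456789\n"
    "claim_02.pdf,,B987654321\n"
)

incomplete = rows_with_missing_fields(
    sample_export, ["Patient's Name", "Insured's ID Number"]
)
print(incomplete)
```

To run this against a real export, pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.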


Programmatically access your data

Alternatively, you may want to programmatically access your extracted data.  Impira provides a robust set of RESTful APIs for both reading and writing data. Now that your AutoML model has been trained, any new CMS-1500 forms that are sent to the Impira platform will be automatically categorized and extracted. For more information about our APIs, read our support document or connect with our sales engineers.
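
As a rough idea of what programmatic access might look like, the sketch below prepares an authenticated HTTP GET request with Python's standard library. Everything specific to the API is an assumption: the base URL is a placeholder, and the `/collections/<id>` path, bearer-token header, and `build_collection_request` helper are hypothetical. Take the real endpoint and authentication scheme from Impira's API documentation.

```python
import urllib.parse
import urllib.request

# Placeholder, NOT a real Impira URL; replace with the endpoint from the API docs.
API_BASE_URL = "https://example.com/api"


def build_collection_request(collection_id: str, api_token: str,
                             base_url: str = API_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for one collection's records.

    The URL layout and the bearer-token header are assumptions for illustration.
    """
    safe_id = urllib.parse.quote(collection_id, safe="")
    url = f"{base_url}/collections/{safe_id}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


request = build_collection_request("insurance-claims", "YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

Sending the request would then be a matter of calling `urllib.request.urlopen(request)` and parsing the response body, for example with `json.load`, once the real endpoint and token are in place.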
