How to automate medical form processing using OCR and Impira AutoML

This guide walks through how to set up an AutoML solution for medical forms processing, including OCR data extraction and export, software, people, and processes.

Download sample files

Medical forms and automated machine learning (AutoML)

Inefficiencies in processing medical forms will, at best, frustrate patients and, at worst, hurt your bottom line. A healthy dose of AutoML might be just what the doctor ordered. In this article, we discuss how to use Impira to automate the processing of your medical forms, a setup that can be implemented in less than five minutes.

Factors to consider with an AI/ML-assisted medical form processing solution include:

  • Training (and retraining) your ML models. Whether you take a DIY approach, use pre-trained software, or use a system such as Impira, your ML models need to be trained to know what to look for. You should consider not only the initial training effort, but also how you will refine the model over time as your forms change or new forms are encountered.
  • Keeping a human in the loop. AI can make humans much more efficient at form processing, but your patients’ health is too important to delegate to software alone.
  • Obtaining actionable data. It’s not enough to extract data from your forms. You need to be able to extract (and possibly store) that data in a manner that advances your business processes.
  • HIPAA compliance. Your solution needs to adhere to regulations such as HIPAA.

For your initial training, you might be using a system with a template that matches your forms, in which case the training of the ML model was done for you in advance. While such an approach can save time up front, it might prove difficult to update should your form layout change or the information you need to extract change. Assuming you need to train the ML model, a traditional approach might require providing a data science team with a large number of training documents (hundreds, or perhaps even thousands) to train your model, and a similar number each time you want to re-train or even tweak it. Impira takes a different approach, which not only lets you start seeing results right away but also affords the flexibility needed to adapt to changing business requirements.

Keeping your bottom line healthy

You’re in the business of keeping your patients healthy, but inefficiencies in processing medical records can affect both the care you can provide and your profitability. Impira offers you flexibility in how you ingest your files, direct access to train the AutoML extraction models, real-time application of updated models across all of your documents, and several ways to extract your data to incorporate into your medical form processing workflow. Incorporating Impira into your medical record processing workflow lets you process more forms in less time and eliminates redundant data entry, freeing your staff to better care for your patients.

Follow the steps below to walk through Impira’s HIPAA-compliant software-as-a-service (SaaS) technology.

Step 1: Ingesting your forms 

Once you create an account in Impira, start by creating a new collection. A collection is the main feature within Impira for organizing and grouping files together. Collections are a lot like folders on your computer. To create one, click on the "+" symbol next to the word “collections” in the left-hand sidebar.

This image shows how to create a new collection. 

In the dialog box that appears, give your collection a meaningful name of your choice, such as “Medical Forms.”

Impira offers several ways to ingest files, including manual upload via the web interface, programmatic ingestion using RESTful write APIs, or via integration with a storage system such as Amazon S3 or Dropbox. Let’s start with manually uploading some forms to the ‘Medical Forms’ collection. Details on other ingestion methods are covered in other how-to guides.

First, click on ‘Upload Files’ then select whether you want to upload files from your computer, or from a Dropbox or Amazon S3 storage account.

This image shows the options for manually uploading files from your computer, Dropbox, or Amazon S3.

If your files are in Dropbox or Amazon S3, you’ll be prompted to provide credentials for the selected storage service. Here is a set of sample documents which you can use to follow along in this example, or you can use this example as a guide and upload your own documents. Once you download and unzip the sample files, navigate to the folder with the files, select them, and click upload.

This image shows the selection of files from a local computer for uploading into a collection.

With your files uploaded into the collection, you're ready to start extracting data from your forms.

This image shows forms uploaded and organized into a collection.

Step 2: Initial training of your ML models 

Impira takes a unique approach to training which doesn’t require large numbers of training documents and allows you to train your model directly. Impira AutoML technology, coupled with an intuitive user interface, allows your business users to train and update the models directly. Within your collection, you can double-click into one of your medical forms to begin extracting data.

This image shows the initial state of your collection, before any training of the ML models.

1. Add your first field

Let’s open the records-0.pdf file, and then, in the document viewer panel on the left, highlight the value in the box under “Account #.”

This image shows the Account # value highlighted in preparation for adding a new field.

2. Name and define the field

Let’s name this field "Account Number," and leave it as the default type of "Text Extraction" (we’ll discuss the other types later), and for Data, we’ll leave this as "Text." Even though the Account Number value contains only numerals, you're unlikely to perform mathematical functions on the account number, and as a text value, any leading zeros will be preserved.
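To illustrate why a text type suits identifiers like this, here is a small Python sketch comparing the same value kept as text versus converted to a number. The account number shown is a made-up value, not one from the sample forms.

```python
# Hypothetical account number with leading zeros (not taken from the sample forms).
raw_value = "00482913"

as_text = raw_value          # kept as text: every character is preserved
as_number = int(raw_value)   # converted to a number: leading zeros are dropped

print(as_text)     # 00482913
print(as_number)   # 482913

# Converting the number back to a string does not restore the original value.
round_trip_matches = str(as_number) == raw_value
print(round_trip_matches)  # False
```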

This image shows the Add Field dialog box with all data complete for the creation of a new Account Number field.

3. That’s all it takes

Once you’ve added the Account Number field, you’ve just trained your first ML model — congratulations. You’ll see the panel on the right has been updated to show all fields associated with this collection (at this point there’s only one), and the specific values for the open document. 

This image shows what you’ll see after adding your first field.

4. Rinse and repeat for more fields

From here, you can go on and add more fields or you can close this window and see that Impira has applied your model against all files in the collection, and has extracted the Account Number from each of them. Since you’ll need more than just an account number, let’s extract a few more fields and then check our other documents.

Highlight ‘Roberts’ in the Patient Name box, and create a field called ‘Patient Last Name’.

This image shows the Add Field dialog box with all data complete for the creation of a new Patient Last Name field.

Now do the same for the fields you’ll name Patient First Name (highlighting Barry), SSN (highlighting 864 - 37 - 5912), Primary Insurer (highlighting Pacific Care), and Secondary Insurer (highlighting Eastern Care). In practice, you would continue this for each value you want to extract (policy numbers, dates, address values, diagnoses, etc.), but for the purpose of this instruction, let’s stop here.

Image showing all extracted values for records-0.pdf

Step 3: Correcting data and revising ML models 

In order to improve the AutoML models’ training, or to correct any inaccurate values, you can double-click on any document and then edit or confirm the values Impira has predicted just as we did in the previous step. 

1. Edit any incorrect values

Review the values for records-0.pdf, in the right hand section. If any are incorrect, you can edit them by hovering over the value, which will bring up a dotted line showing where the value is located on your form. Click on the pencil icon at the far right to edit the value.

Image showing how to edit a field’s value when examining the form.

You can also extract check boxes, in which case you’ll want to create a new field for each possible option (e.g. one for a ‘yes’ box and one for a ‘no’ box). These can be easily combined into one field within your collection or as you export your data. 
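As a rough illustration of combining a pair of check box fields after export, the Python sketch below collapses a ‘yes’ box and a ‘no’ box into a single answer. It assumes each check box field arrives as a boolean, which is an assumption about the exported data rather than something this guide specifies, and the function name is hypothetical.

```python
from typing import Optional


def combine_checkboxes(yes_checked: bool, no_checked: bool) -> Optional[str]:
    """Collapse a pair of check box fields into one answer.

    Returns "Yes" or "No" when exactly one box is ticked, and None when
    neither or both are ticked so the record can be reviewed by a person.
    """
    if yes_checked and not no_checked:
        return "Yes"
    if no_checked and not yes_checked:
        return "No"
    return None


print(combine_checkboxes(True, False))   # Yes
print(combine_checkboxes(False, True))   # No
print(combine_checkboxes(True, True))    # None
```

Returning None for the ambiguous cases keeps a human in the loop instead of guessing which box the patient meant.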

As you continue highlighting values and creating fields, or correcting any values, Impira is creating micro-models for each field behind the scenes, taking into account not only its absolute position on the form, but also proximity to certain anchor text values. Additionally, Impira applies these models in real time against all files in the collection, and provides a visual representation of the confidence we have in the accuracy of the extracted data. With all of the correct values in place for this initial document, it’s time to see what the AutoML models you’ve just trained have extracted from the other documents. 

2. Check on the results from other documents

Once you’ve finished adding fields as described above, click on the ‘X’ in the upper right to return to the table view of your documents. You’ll note that there are now columns for each of the fields you created, populated with extracted values for each document you uploaded.

Values entered manually, such as those in records-0.pdf, are shown with a solid black bar to the left of the value. Values that Impira AutoML has predicted are shown with a dashed green bar when the accuracy confidence is high, and with a dotted orange bar and a flag when they should be reviewed manually.

This composite image shows how data extraction confidence is presented to users.

If you want to edit or confirm any value, just double-click on it and the file will open for you to make the edits in the same manner as previously described.

With each interaction, whether confirming or editing, you are re-training the models, and your confirmed values will be highlighted in black, just as those you initially extracted are. With just two documents manually reviewed, you’ll already start seeing improvements in the confidence of Impira’s predictions. 

Image showing improved results after manual review of only two documents.

Your medical records are central to the care and well-being of your patients. As such, ensuring you’ve captured the data from your forms accurately is paramount. Today you might be doing this by having humans re-key information from forms into your business systems. It’s critical that you retain a ‘human in the loop’ to ensure the ML-extracted data is accurate. Using Impira, not only is the number of forms that an individual can process or review greatly multiplied, but with each validation or correction of data, Impira’s unique feedback loop updates the ML models and applies the updates to all files in your collection in real time. While meaningful results can be extracted after training on just a document or two, Impira’s results will become increasingly accurate over time with a small team reviewing and revising the extracted data.

Step 4: Getting your data out in the form you need 

Once you’ve extracted the data from your forms and feel confident in the results obtained, you may need to get the data into one or more of your downstream systems so that you can act on the extracted data. This can be as simple as a few clicks to get a CSV file of your data, or can be customized to your needs either via Impira’s API or by embedding some logic directly in Impira. Let’s take a look at the options:

Option 1: Download your data in two clicks

Click 1: Click the Download button in the top right corner of your screen.

Click 2: Click CSV. 

Image showing the Download options.

That’s all there is to it. You’ve just downloaded all of the data for each of the records in your collection, and you can open the .csv file in Excel or Google Sheets, or ingest it into one of your downstream business systems.
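If a script is the downstream consumer, the downloaded file can be read with Python’s standard csv module. The sketch below uses an in-memory stand-in for the download. The header names are assumed to match the field names created earlier, and the data rows are invented for illustration.

```python
import csv
import io

# Stand-in for the downloaded file. The header names are assumed to match the
# field names created earlier; the rows themselves are invented.
sample_csv = io.StringIO(
    "Account Number,Patient First Name,Patient Last Name,Primary Insurer\n"
    "00482913,Barry,Roberts,Pacific Care\n"
    "00017345,Dana,Lee,Eastern Care\n"
)


def load_records(handle):
    """Read exported rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


records = load_records(sample_csv)
for record in records:
    print(record["Account Number"], record["Patient Last Name"])
```

With a real download, you would pass a file opened with `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. `csv.DictReader` returns every value as a string, so leading zeros in account numbers survive the read.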

While it is possible that the extracted data already conforms to your needs, most likely some "massaging" will be needed. In most systems, this would require you to export your data and then augment or modify it so that it conforms to the requirements of your downstream systems. Impira makes it easy for you to adapt your data to the needs of downstream systems prior to exporting, rather than necessitating changes to those systems. This can be done using Impira’s Read API or within the application itself.

Option 2: Exporting via API

First, you’ll need to create an API token to authenticate your API requests. Impira’s use of token-authenticated API requests protects your data from unauthenticated access. You should use unique tokens for each integration you set up so that, should you need to block access at some later date, you can remove a single token, leaving all other integrations intact. To create a token, click on the gear icon in the top menu and then click the plus icon to the right of your token list. 

Image showing how to create a new API token.

Give your token a name, and then you’ll be presented with the token, which you can copy to your clipboard.

Image showing newly created API token.

Now you’ll need to construct an IQL query, which you’ll pass, along with your token, as URL parameters to an HTTP GET request using the endpoint <your_org_name>/api/v2/iql?query=foo&token=bar, where query= will be your URL-encoded IQL query and token= will be the API token you created from within Impira.
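The request URL described above can also be assembled in code. The following Python sketch URL-encodes a query and token with the standard library. The base URL and token are hypothetical placeholders, since the part of the endpoint that precedes /api/v2/iql depends on your organization, and the query string is only an example.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_iql_url(org_base_url: str, iql_query: str, token: str) -> str:
    """Return the GET URL for the iql endpoint with URL-encoded parameters.

    org_base_url is a hypothetical placeholder for everything that precedes
    /api/v2/iql in your organization's endpoint.
    """
    # quote_via=quote encodes spaces as %20 rather than '+'.
    params = urlencode({"query": iql_query, "token": token}, quote_via=quote)
    return f"{org_base_url}/api/v2/iql?{params}"


# Example values only: the base URL and token are placeholders.
iql_query = "@`file_collections::b5dd1ef4170b62fb` [`SSN`, `Primary Insurer`]"
url = build_iql_url("https://example.invalid/my-org", iql_query, "my-token")
print(url)

# Decoding the URL again recovers the original query and token.
decoded = parse_qs(urlsplit(url).query)
```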

The easiest way to create a URL encoded IQL query is by using Impira’s IQL playground. To get there, click on the gear icon, and then select API.

Image showing links to your account’s unique API documentation and Impira’s IQL playground.

In the IQL playground, you can construct a query to extract just the Patient Name, Primary Insurer and SSN. To do so, click on the ‘IQL Playground’ button and enter the following query in the search bar at the top:

@`file_collections::b5dd1ef4170b62fb` [`Patient Name`, `SSN`, `Primary Insurer`]

Image showing results of an IQL query in the IQL Playground.

From here, you could download the results as a CSV file or view the API response, but you’ll probably want to make this API call directly, rather than from the IQL Playground. In that case, you’ll construct the HTTP GET request discussed above as follows: <your_org_name>/api/v2/data/%60file_collections%3A%3Ab5dd1ef4170b62fb%60?fields=SSN,Patient%20Name,Primary%20Insurer&token=foo

Here, replace <your_org_name> with your Impira organization name, file_collections%3A%3Ab5dd1ef4170b62fb with your collection ID (obtained from your API documentation), and foo with the API token you created earlier.
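As a sketch of how the pieces of this URL fit together, the Python function below rebuilds the example request from its parts: a collection ID, a list of field names, and a token. The base URL is a hypothetical placeholder for your organization’s endpoint, and the function name is illustrative, not part of Impira’s tooling.

```python
from urllib.parse import quote


def build_data_url(org_base_url, collection_id, fields, token):
    """Assemble the data endpoint URL in the same shape as the example above.

    org_base_url is a hypothetical placeholder for the part of the endpoint
    that precedes /api/v2/data.
    """
    # The collection ID is wrapped in backticks, then percent-encoded.
    collection = quote(f"`{collection_id}`", safe="")
    # Each field name is encoded on its own; the separating commas stay literal.
    field_list = ",".join(quote(name, safe="") for name in fields)
    encoded_token = quote(token, safe="")
    return (
        f"{org_base_url}/api/v2/data/{collection}"
        f"?fields={field_list}&token={encoded_token}"
    )


data_url = build_data_url(
    "https://example.invalid/my-org",
    "file_collections::b5dd1ef4170b62fb",
    ["SSN", "Patient Name", "Primary Insurer"],
    "foo",
)
print(data_url)
```

Keeping the token as a separate argument makes it easier to load it from configuration instead of hard-coding it, which fits the one-token-per-integration advice above.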

Image showing partial results of API query for specific fields.

Option 3: Modifying your data within Impira prior to exporting

In the first option, the CSV file contained separate fields for ‘Patient First Name’ and ‘Patient Last Name’, as that’s how they were extracted from your forms. However, you might need to export a single ‘Patient Name’ field. To do so, you would create a new field in your collection by clicking the ‘plus icon’ at the right of your extracted fields. You need to name the new field, select ‘function’ as the field type, and then enter a function, such as:

 concat(`Patient First Name`," ",`Patient Last Name`)

A full list of available functions for manipulating strings or performing calculations on numbers can be found in Impira’s support documentation. 
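For readers who want to check the expected result outside Impira, the concat call above does the equivalent of the following Python sketch. The helper name is hypothetical, and the values are the ones highlighted in records-0.pdf.

```python
def patient_name(first_name: str, last_name: str) -> str:
    """Join first and last name with a single space between them."""
    return first_name + " " + last_name


full_name = patient_name("Barry", "Roberts")
print(full_name)  # Barry Roberts
```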

This image shows how to add a computed field.

This image shows the resulting field, computed from the function entered in the step above.

Now you can export the data as you did in Option 1 to get a CSV file that includes ‘Patient Name.’ You can also specify which fields to export. Since you’ve just created a ‘Patient Name’ field, you can de-select the ‘Patient First Name’ and ‘Patient Last Name’ fields, if those are not needed, by clicking the Fields button and selecting the fields to export in the resulting dropdown menu.

This image shows the selection of fields to export.

Altering the values extracted from your forms might not be enough to get your data exported exactly as you need it. Perhaps you need an internal code for the insurer, which your patient would not know. The sample forms list insurers that include Eastern Care, Northern Care, Pacific Care, Southern Care, and Western Care. A simple .csv file with the insurer name and insurer code can be uploaded and opened as a data set, which creates a new collection, by following the steps below.

Example .csv file containing data to be joined with the Medical Forms collection.

First, upload the Insurer Codes.csv file to Impira.

Within the list of All files, double click on this .csv file.

Next, click on the ‘Open file as Dataset’ button, which will create a new collection with the data from the .csv file.

An uploaded .csv file can be opened as a Dataset, creating a new collection.

To connect this data to your medical forms, navigate back to your Medical Forms collection, where we’ll create a join field.

  1. Add a new field by clicking the ‘plus’ icon to the right of your data columns.
  2. Name your new field and select ‘Join’ as the field type.
  3. Select the field in the Medical Forms collection which is common to both collections, such as ‘Primary Insurer’ or ‘Secondary Insurer.’
  4. Next, select the collection that was created from the .csv file above.
  5. Select the field in this Dataset collection which is in common with your Medical Forms field.
  6. Upon clicking ‘Create’ you’ll see a new field with the same value as in the linked column from the Medical Forms collection, but including a linked symbol.


Diagram showing how to create a Join field.

Diagram showing the results of creating a Join field.

With your join field in place, you can double click on any value to see the related values connected by the join field. In this example, the ‘Inse Code’ is now available for the Primary Insurer. You could add fields to get the code for the secondary and tertiary insurers, or add more data sets to get other related data from your Patient Management System, or full descriptions of diagnoses from diagnostic codes, etc.
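Conceptually, a join field behaves like a lookup from one collection into another on a shared value. The Python sketch below imitates that with a dictionary. The .csv header names, the insurer codes, and the account numbers are all invented for illustration and do not come from the sample files.

```python
import csv
import io

# Hypothetical contents of Insurer Codes.csv; header names and codes are invented.
insurer_codes_csv = io.StringIO(
    "Insurer,Code\n"
    "Eastern Care,EC-01\n"
    "Pacific Care,PC-02\n"
)

# Stand-in rows for the Medical Forms collection (invented values).
forms = [
    {"Account Number": "00482913", "Primary Insurer": "Pacific Care"},
    {"Account Number": "00017345", "Primary Insurer": "Northern Care"},
]


def join_codes(records, codes_handle, key="Primary Insurer"):
    """Attach an 'Insurer Code' to each record by matching on the insurer name.

    Records whose insurer is missing from the lookup get None, so gaps in the
    code list are visible rather than silently dropped.
    """
    lookup = {row["Insurer"]: row["Code"] for row in csv.DictReader(codes_handle)}
    return [{**record, "Insurer Code": lookup.get(record[key])} for record in records]


joined = join_codes(forms, insurer_codes_csv)
for row in joined:
    print(row["Primary Insurer"], row["Insurer Code"])
```

Passing `key="Secondary Insurer"` would look up the code for the secondary insurer in the same way, provided the records carry that field.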

If you click on the icon at the far right of the Impira interface, you can change from a table view to the JSON view where you can see the structure of your data.

Image showing how to change from table view to JSON view.

Image showing Impira’s JSON view and the structure of a linked Dataset.
