March 26, 2021
This guide walks through how to set up an AutoML solution for medical forms processing, including OCR data extraction and export, software, people, and processes.
Inefficiencies in processing medical forms will, at best, frustrate patients and, at worst, hurt your bottom line. A healthy dose of AutoML might be just what the doctor ordered. In this article, we discuss how to use Impira to automate the processing of your medical forms, a setup you can complete in less than five minutes.
Factors to consider with an AI/ML-assisted medical form processing solution include:
For your initial training you might be using a system with a template that matches your forms, in which case the ML model was trained for you in advance. While such an approach can save time up front, it might prove difficult to update if your form layout changes or you need to extract different information. If you do need to train the ML model, a traditional approach might require providing a data science team with hundreds, or perhaps even thousands, of training documents, and a similar number each time you want to re-train or even tweak the model. Impira takes a different approach, which not only lets you start seeing results right away but also affords the flexibility needed to adapt to changing business requirements.
You’re in the business of keeping your patients healthy, but inefficiencies in processing medical records can affect both the care you can provide and your profitability. Impira offers you flexibility in how you ingest your files, direct access to train the AutoML extraction models, real-time application of updated models across all of your documents, and several ways to extract your data to incorporate into your medical form processing workflow. Incorporating Impira into your medical record processing workflow lets you process more forms in less time and eliminates redundant data entry, freeing your staff to better care for your patients.
Follow the steps below to walk through Impira’s HIPAA-compliant software-as-a-service (SaaS) technology.
Once you create an account in Impira, start by creating a new collection. A collection is the main feature within Impira for organizing and grouping files together. Collections are a lot like folders on your computer. Click on the "+" symbol next to the word “collections” in the left-hand sidebar.
In the dialogue box that appears, give your collection a meaningful name of your choice, such as “Medical Forms.”
Impira offers several ways to ingest files, including manual upload via the web interface, programmatic ingestion using RESTful write APIs, or via integration with a storage system such as Amazon S3 or Dropbox. Let’s start with manually uploading some forms to the ‘Medical Forms’ collection. Details on other ingestion methods are covered in other how-to guides.
First, click on ‘Upload Files’ then select whether you want to upload files from your computer, or from a Dropbox or Amazon S3 storage account.
If your files are in Dropbox or Amazon S3, you’ll be prompted to provide credentials to the selected storage services. Here is a set of sample documents which you can use to follow along in this example, or you can use this example as a guide and upload your own documents. Once you download and unzip the sample files, navigate to the folder with the files, select them and click upload.
With your files uploaded into the collection, you're ready to start extracting data from your forms.
Impira takes a unique approach to training which doesn’t require large numbers of training documents and allows you to train your model directly. Impira AutoML technology, coupled with an intuitive user interface, allows your business users to train and update the models directly. Within your collection, you can double-click into one of your medical forms to begin extracting data.
Let’s open the records-0.pdf file and then, in the document viewer panel on the left, highlight the value in the box under “Account #.”
Let’s name this field "Account Number," and leave it as the default type of "Text Extraction" (we’ll discuss the other types later), and for Data, we’ll leave this as "Text." Even though the Account Number value contains only numerals, you're unlikely to perform mathematical functions on the account number, and as a text value, any leading zeros will be preserved.
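To illustrate why the "Text" data type suits an account number, here is a minimal Python sketch. The account number shown is a made-up value, not one taken from the sample forms, and the snippet says nothing about how Impira stores values internally; it only shows the general behavior of text versus numeric types.

```python
# Hypothetical account number, invented for illustration only.
raw_value = "00123456"

as_text = raw_value         # kept as text: every character is preserved
as_number = int(raw_value)  # converted to a number: leading zeros are dropped

print(as_text)         # 00123456
print(str(as_number))  # 123456
```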
Once you’ve added the Account Number field, you’ve just trained your first ML model — congratulations. You’ll see the panel on the right has been updated to show all fields associated with this collection (at this point there’s only one), and the specific values for the open document.
From here, you can go on and add more fields or you can close this window and see that Impira has applied your model against all files in the collection, and has extracted the Account Number from each of them. Since you’ll need more than just an account number, let’s extract a few more fields and then check our other documents.
Highlight ‘Roberts’ in the Patient Name box, and create a field called ‘Patient Last Name’.
Now do the same for the fields you’ll name Patient First Name (highlighting Barry), SSN (highlighting 864 - 37 - 5912), Primary Insurer (highlighting Pacific Care) and Secondary Insurer (highlighting Eastern Care). In practice, you would continue this for each value you want to extract (policy numbers, dates, address values, diagnoses, etc.), but for the purpose of this walkthrough, let’s stop here.
In order to improve the AutoML models’ training, or to correct any inaccurate values, you can double-click on any document and then edit or confirm the values Impira has predicted just as we did in the previous step.
Review the values for records-0.pdf in the right-hand section. If any are incorrect, hover over the value, which will bring up a dotted line showing where the value is located on your form, and then click on the pencil icon at the far right to edit it.
You can also extract check boxes, in which case you’ll want to create a new field for each possible option (e.g. one for a ‘yes’ box and one for a ‘no’ box). These can be easily combined into one field within your collection or as you export your data.
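If you prefer to combine a pair of checkbox fields after export rather than inside the collection, the logic could look like the sketch below. It assumes each checkbox field arrives as a boolean; that is an assumption about the exported data, and the function name `combine_checkboxes` is hypothetical.

```python
from typing import Optional


def combine_checkboxes(yes_checked: bool, no_checked: bool) -> Optional[str]:
    """Collapse a yes/no checkbox pair into a single value.

    Returns "Yes" or "No" when exactly one box is checked, and None when
    neither or both are checked so the form can be reviewed by a person.
    """
    if yes_checked and not no_checked:
        return "Yes"
    if no_checked and not yes_checked:
        return "No"
    return None  # ambiguous or empty: leave for manual review


print(combine_checkboxes(True, False))  # Yes
print(combine_checkboxes(False, True))  # No
print(combine_checkboxes(True, True))   # None
```

Returning None for the ambiguous cases keeps a human in the loop instead of silently picking one of the two options.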
As you continue highlighting values and creating fields, or correcting any values, Impira is creating micro-models for each field behind the scenes, taking into account not only its absolute position on the form, but also proximity to certain anchor text values. Additionally, Impira applies these models in real time against all files in the collection, and provides a visual representation of the confidence we have in the accuracy of the extracted data. With all of the correct values in place for this initial document, it’s time to see what the AutoML models you’ve just trained have extracted from the other documents.
Once you’ve finished adding fields as described above, click on the ‘X’ in the upper right to return to the table view of your documents. You’ll note that there are now columns for each of the fields you created, populated with extracted values for each document you uploaded.
Values entered manually, such as those in records-0.pdf, are shown with a solid black bar to the left of the value. Values that Impira AutoML has predicted are shown with a dashed green bar when the accuracy confidence is high, and with a dotted orange bar and a flag when they should be reviewed manually.
If you want to edit or confirm any value, just double-click on it and the file will open for you to make the edits in the same manner as previously described.
With each interaction, whether confirming or editing, you are re-training the models, and your confirmed values will be highlighted in black, just as those you initially extracted are. With just two documents manually reviewed, you’ll already start seeing improvements in the confidence of Impira’s predictions.
Your medical records are central to the care and well-being of your patients. As such, ensuring you’ve captured the data from your forms accurately is paramount. Today you might be doing this by having humans re-key information from forms into your business systems. It’s critical that you retain a ‘human in the loop’ to ensure the ML-extracted data is accurate. With Impira, not only is the number of forms that an individual can process and review greatly multiplied, but with each validation or correction of data, Impira’s feedback loop updates the ML models and applies the updates to all files in your collection in real time. While meaningful results can be extracted after training on just a document or two, Impira’s results will become increasingly accurate over time with a small team reviewing and revising the extracted data.
Once you’ve extracted the data from your forms and feel confident in the results obtained, you may need to get the data into one or more of your downstream systems so that you can act on the extracted data. This can be as simple as a few clicks to get a CSV file of your data, or can be customized to your needs either via Impira’s API or by embedding some logic directly in Impira. Let’s take a look at the options:
Click 1: Click the Download button in the top right corner of your screen.
Click 2: Click CSV.
That’s all there is to it. You’ve just downloaded all of the data for each of the records in your collection, and you can open the .csv file in Excel or Google Sheets, or ingest it into one of your downstream business systems.
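If a downstream system reads the downloaded file with a script, a minimal Python sketch might look like the following. The column headers mirror the field names created earlier, but the exact layout of the exported file is an assumption, and the account number is a made-up value. An in-memory string stands in for the downloaded file so the example runs on its own.

```python
import csv
import io

# Stand-in for the downloaded .csv file. The header names and the account
# number are assumptions for illustration, not the real export.
sample_csv = io.StringIO(
    "Account Number,Patient Last Name,Patient First Name\n"
    "00123456,Roberts,Barry\n"
)


def load_rows(handle):
    """Read every row as a dict of strings, which keeps leading zeros intact."""
    return list(csv.DictReader(handle))


rows = load_rows(sample_csv)
for row in rows:
    print(row["Account Number"], row["Patient Last Name"])
```

To read a real download, replace the in-memory string with `open(path, newline="")`, where `path` points to your file.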
While it is possible that the extracted data already conforms to your needs, some "massaging" will most likely be needed. In most systems, this would require you to export your data and then augment or modify it so that it conforms to the requirements of your downstream systems. Impira makes it easy to adapt your data to the needs of downstream systems before exporting, rather than necessitating changes to those systems. This can be done using Impira’s Read API or within the application itself.
First, you’ll need to create an API token to authenticate your API requests. Impira’s use of token-authenticated API requests protects your data from unauthenticated access. You should use unique tokens for each integration you set up so that, should you need to block access at some later date, you can remove a single token, leaving all other integrations intact. To create a token, click on the gear icon in the top menu and then click the plus icon to the right of your token list.
Give your token a name, and then you’ll be presented with the token, which you can copy to your clipboard.
Now you’ll need to construct an IQL query, which you’ll pass, along with your token, as URL parameters to an HTTP GET request using the endpoint https://app.impira.com/o/<your_org_name>/api/v2/iql?query=foo&token=bar where query= will be your URL-encoded IQL query, and token= will be the API token you created from within Impira.
The easiest way to create a URL encoded IQL query is by using Impira’s IQL playground. To get there, click on the gear icon, and then select API.
In the IQL playground, you can construct a query to extract just the Patient Name, Primary Insurer and SSN. To do so, click on the ‘IQL Playground’ button and enter the following query in the search bar at the top:
@`file_collections::b5dd1ef4170b62fb` [`Patient Name`, `SSN`, `Primary Insurer`]
From here, you could download the results as a CSV file or view the API response, but you’ll probably want to make this API call directly, rather than from the IQL Playground. In that case, you’ll construct the HTTP GET request discussed above as follows:
Where you would replace <your_org_name> with your Impira organization name, file_collections%3A%3Ab5dd1ef4170b62fb with your collection ID (obtained from your API documentation) and foo with your API token created earlier.
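As a rough illustration, the request URL could also be assembled in Python from the endpoint pattern given earlier. The helper name `build_iql_url` and the placeholder organization name and token are hypothetical, and the query is the playground example written with backticks around each field name. The sketch only builds the URL; it does not send the request.

```python
from urllib.parse import quote, urlencode


def build_iql_url(org_name: str, iql_query: str, api_token: str) -> str:
    """Return the GET URL for the IQL endpoint pattern described above."""
    base = f"https://app.impira.com/o/{quote(org_name, safe='')}/api/v2/iql"
    # quote_via=quote percent-encodes spaces as %20 and "::" as %3A%3A,
    # instead of the "+" that urlencode would otherwise use for spaces.
    params = urlencode({"query": iql_query, "token": api_token}, quote_via=quote)
    return f"{base}?{params}"


# Placeholder organization name and token: replace with your own values.
query = "@`file_collections::b5dd1ef4170b62fb` [`Patient Name`, `SSN`, `Primary Insurer`]"
url = build_iql_url("your_org_name", query, "YOUR_API_TOKEN")
print(url)
```

Because the token travels in the URL, it is safer to load it from an environment variable or a secrets store than to hard-code it in a script.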
In the first option, the CSV file contained separate fields for ‘Patient First Name’ and ‘Patient Last Name’, as that’s how they were extracted from your forms. However, you might need to export a single ‘Patient Name’ field. To do so, you would create a new field in your collection by clicking the ‘plus icon’ at the right of your extracted fields. You need to name the new field, select ‘function’ as the field type, and then enter a function, such as:
concat(`Patient First Name`," ",`Patient Last Name`).
A full list of available functions for manipulating strings or performing calculations on numbers can be found in Impira’s support documentation.
Now you can extract the data as you did in Option 1 to get a CSV file which includes ‘Patient Name.’ You can also specify which fields to export. Since you’ve just created a ‘Patient Name’ field, you can de-select the ‘Patient First Name’ and ‘Patient Last Name’ fields, if those are not needed, by clicking on the Fields button and selecting the fields to export in the resulting dropdown menu.
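For comparison, the same result (a combined name plus a chosen subset of fields) can be reproduced on already-exported rows in Python. This is only a sketch of the equivalent logic outside Impira; the helper names `add_patient_name` and `select_fields` are hypothetical, and the row uses the sample values from the walkthrough.

```python
def add_patient_name(row: dict) -> dict:
    """Return a copy of the row with a combined 'Patient Name' value."""
    combined = dict(row)
    combined["Patient Name"] = f"{row['Patient First Name']} {row['Patient Last Name']}"
    return combined


def select_fields(row: dict, fields: list) -> dict:
    """Keep only the named fields, in the order given."""
    return {name: row[name] for name in fields}


row = {
    "Patient First Name": "Barry",
    "Patient Last Name": "Roberts",
    "SSN": "864 - 37 - 5912",
}
exported = select_fields(add_patient_name(row), ["Patient Name", "SSN"])
print(exported)  # {'Patient Name': 'Barry Roberts', 'SSN': '864 - 37 - 5912'}
```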
Altering the values extracted from your forms might not be enough to get your data exported exactly as you need it. Perhaps you need an internal code for the insurer, which your patient would not know. The sample forms include the insurers Eastern Care, Northern Care, Pacific Care, Southern Care, and Western Care. A simple .csv file with the insurer name and insurer code can be uploaded and opened as a data set, creating a new collection, by following the steps below.
First, upload the Insurer Codes.csv file to Impira.
Within the list of All files, double click on this .csv file.
Next, click on the ‘Open file as Dataset’ button, which will create a new collection with the data from the .csv file.
To connect this data to your medical forms, navigate back to your Medical Forms collection, where we’ll create a join field.
With your join field in place, you can double-click on any value to see the related values connected by the join field. In this example, the ‘Insurer Code’ is now available for the Primary Insurer. You could add fields to get the code for the secondary and tertiary insurers, or add more data sets to get other related data from your Patient Management System, full descriptions of diagnoses from diagnostic codes, etc.
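Conceptually, a join field behaves like a lookup from one data set into another. The Python sketch below mimics that idea on plain dictionaries; it is not how Impira implements joins. The insurer codes, the function name `add_insurer_code`, and the output field name are all hypothetical.

```python
def add_insurer_code(records, codes_by_name, name_field, code_field):
    """Return copies of the records with the matching insurer code attached.

    Records whose insurer is missing from the lookup get None, so gaps in
    the code list are easy to spot.
    """
    joined = []
    for record in records:
        enriched = dict(record)
        enriched[code_field] = codes_by_name.get(record.get(name_field))
        joined.append(enriched)
    return joined


# Hypothetical codes: the real values would come from Insurer Codes.csv.
codes = {"Pacific Care": "PC-001", "Eastern Care": "EC-002"}
forms = [
    {"Patient Last Name": "Roberts", "Primary Insurer": "Pacific Care"},
    {"Patient Last Name": "Example", "Primary Insurer": "Unlisted Care"},
]

for joined_row in add_insurer_code(forms, codes, "Primary Insurer", "Primary Insurer Code"):
    print(joined_row)
```

Calling the function again with "Secondary Insurer" as the name field would attach the code for the secondary insurer in the same way.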
If you click on the icon at the far right of the Impira interface, you can change from a table view to the JSON view where you can see the structure of your data.