We’ve got new integrations, UI redesigns, and nifty features that’ll ease your workflow.
Table extraction: Beta
You asked and we’re answering: Currently in beta, our new table extraction capability allows you to pull data from tables within documents like purchase orders and invoices. We know that tables are rarely straightforward and don’t always follow convention. This feature allows users to extract standard grid-like tables as well as more complex tables that may not conform to a rigid grid format.
We’d love for you to join our table extraction beta program, put it through its paces, and give us your feedback. Contact firstname.lastname@example.org to access this new feature.
Redesign: The “Add field” feature
The redesigned “Add field” feature in our UI makes it easier for users to understand the order of steps needed to add new fields for data extraction. The redesign displays important information that makes the process more intuitive. This feature will be rolled out throughout the month of August.
Dynamic bounding box
The bounding boxes (used to select values to extract) will dynamically expand or contract to correspond with what a user is actively typing.
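Curious how a box might follow your typing? Here’s a minimal sketch of the idea, not Impira’s actual implementation. It assumes the page text is available as a list of words with coordinates; the `Box` class and the `box_for_typed_text` helper are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in page coordinates (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def union(boxes: List[Box]) -> Box:
    """Smallest box that covers every box in the list."""
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


def box_for_typed_text(words: List[Tuple[str, Box]], typed: str) -> Optional[Box]:
    """Return a box around the first run of consecutive words whose
    space-joined text starts with what the user has typed so far."""
    typed = typed.strip().lower()
    if not typed:
        return None
    for start in range(len(words)):
        for end in range(start, len(words)):
            joined = " ".join(text for text, _ in words[start:end + 1]).lower()
            if joined.startswith(typed):
                # Enough words to cover the typed text: box them together.
                return union([box for _, box in words[start:end + 1]])
            if not typed.startswith(joined):
                # This run can no longer match; try the next start word.
                break
    return None


# Example page: three words on one line.
words = [
    ("Invoice", Box(0, 0, 50, 10)),
    ("Acme", Box(60, 0, 90, 10)),
    ("Corp", Box(95, 0, 125, 10)),
]
short_box = box_for_typed_text(words, "Acme")
long_box = box_for_typed_text(words, "Acme Co")
```

In this sketch, typing “Acme” yields a box around one word, and continuing to “Acme Co” widens the box to include the next word. Deleting characters shrinks it again.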
Want to see what the field you just extracted looks like on other files? When you’re in file view, check out the lower right corner to see a preview of what Impira’s doing in real time.
Text preprocessing improvements
We’ve updated the method of preprocessing incoming files to improve the quality of the extracted text. These changes include using a different parser for digital PDFs, improving the quality of the images we run optical character recognition (OCR) on, ignoring embedded PDF text with font issues, and more. Together, these changes contribute to higher-quality text extraction for your documents.
We’ve been hard at work improving the performance and speed of Impira across the board. Some notable examples: confirming and editing the values for machine learning (ML) predictions is now 10x faster, and file uploading speeds are more than 2x faster than before.
Lots of new entities
In our last release notes, we announced that we have started using Named Entity Recognition (NER) to improve the accuracy of the text extraction models. Since then, we’ve dramatically expanded the set of entities we detect, including addresses (and their constituent parts), email addresses, phone numbers, currency, and more. The ML models can use this semantic information to improve the accuracy of your extractions.
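To give a feel for what “detecting entities” means, here’s a toy sketch that tags a few entity types with regular expressions. It is a simple rule-based stand-in, not the NER models Impira uses; the patterns, labels, and the `find_entities` helper are assumptions made up for this example, and the patterns only cover simple US-style formats.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately simple patterns for illustration only.
ENTITY_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b"),
    "currency": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),
}


def find_entities(text: str) -> List[Tuple[str, str, int, int]]:
    """Return (label, matched_text, start, end) tuples in reading order."""
    found = []
    for label, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start(), match.end()))
    # Sort by position so downstream code sees entities as they appear.
    return sorted(found, key=lambda entity: entity[2])


sample = "Contact billing@example.com or 415-555-0100 about the $1,250.00 balance."
entities = find_entities(sample)
```

Each tagged span carries a label such as `email` or `currency`. An extraction model could treat labels like these as extra signals about what a piece of text represents.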