Impira’s CEO, Ankur Goyal, had the opportunity to sit down for a Founders Fireside Chat with CMU Tech & Entrepreneurship on Wednesday, July 14, to talk about his entrepreneurial journey, Impira’s evolution, his thoughts on trends in AI/ML startups, and other insights.
Watch the video below for the full Fireside Chat, or scroll down for the conversation highlights (edited for readability):
On the evolution of Impira
I got a few folks to join Impira around September of 2017 to help figure out what to build. We found a great use case: lots of customers were having trouble sorting through mountains of unstructured data, specifically image and video content, and could use our technology. Back then, the product combined computer vision, to understand the content of images, with database-like infrastructure that let you search and query that information very efficiently.
We found our first few customers and started working with them the following summer. They’re still customers today, and we’ve had the pleasure of growing with them and working with them for a long time while evolving the software.
The next year, as we were growing and selling and learning about the market, some of our customers started loading PDF documents into the product and asking whether we could read structured data out of them. We realized that we could do that with our technology and that this was an even larger opportunity and bigger use case [than working with images and video].
We also realized that, unlike searching through lots of images and videos, this use case could be self-service, because an individual user could take advantage of it. We started retooling some of our technology to work for this use case.
Then COVID came along. Like every startup, including well-funded ones like us (we had raised another round by this point), we had to ask ourselves what we should really focus on, because we couldn’t do too many things at once. We were so excited by processing PDFs and reading structured data out of text that we decided to put all our resources into that use case.
We spent a good chunk of last year focused on advancing the product and shipped a self-service version later that year. That has been the fun part of what we’ve been doing since then. We’re now acquiring all of our new customers through people signing up online and playing with the product, sometimes even putting Impira into production before talking to us. Hopefully some of them come along and pay us as well. That has been the evolution of Impira over the past couple of years.
As a CEO, do you still code?
That’s a good question! Everyone is different. My working style is to have my hands involved in a lot of different things at once, but go really deep into one or two things at a time. Sometimes that means going really deep into product, and sometimes that means writing code that is on the critical path for the company.
So, I do still write code. Sometimes I write code when it’s really important for the company in an area related to databases or programming languages, which is my sweet spot and area of expertise. Separately from that, I do still code for a few hours a day sometimes, mostly because I love coding. I’ve been trying to follow the general advice of stopping coding once you become a manager or founder, but I can’t stop because I really love doing it.
On trends in AI/ML startups
It’s really, really, really hard to build a machine learning (ML) startup because machine learning software doesn’t behave like traditional software. If you look at public companies, the ones that successfully use machine learning today are consumer companies. That’s because the UI that consumer companies expose is really simple and easy to use, with a very small surface area that machine learning can accelerate. Google Search is a great example: machine learning ranks which search results come in what order. If the results come in an order that’s not exactly what you wanted, you may not even realize it, and it’s not the end of the world. Similarly, TikTok is a really cool example. The machine learning model tries to predict which video a user would like to see next, but if it picks a video the user didn’t have in mind, that’s not something the user would necessarily be upset about.
But in the business world, especially in the data world, users are trained to expect exact things. For example, in business intelligence and analytics, when users are aggregating sales results for the past quarter, they want to know the exact number, not a guess. Actually, surprisingly (or unsurprisingly), if you try to forecast what sales will be in the next quarter, people get pretty upset if it’s not accurate.
Another thing (I’m going to get really technical for a second): in consumer applications, the “schema” for every consumer is the same. It’s really hard to write code or build software that works on multiple schemas at once, but if the schema is the same, then you can train a model that works across a number of different use cases and users. In the enterprise world, everyone’s schema is a bit different. One person might be forecasting sales, and another person might be forecasting clicks. Another person might be trying to calculate how many in-person sales will happen, and a different person might be trying to calculate how many online sales will happen. You can imagine the differences in the data, and therefore the differences in the models you have to produce.
The challenge in machine learning becomes: If you produce a product that requires hand-holding (and it’s very easy to fall into that trap), then your engineers will be training and improving the ML model for every customer that you acquire. That is a recipe for building a consulting company, not a software company.
That’s one of the things we’ve seen a number of our competitors and peers who are building ML companies struggle with. When we doubled down on document processing and data extraction, we drew a line in the sand and said that we are 100% committed to making our product self-service, which forces us to avoid that problem. Making an ML company as scalable as any other software company is a very hard technical problem, but I think it can be done.
Being laser-focused on topics and problems
What you want to avoid is a scenario where, to gain more users, your software engineers have to train or update an ML model to support each new customer. Let’s imagine you sell software to a company that has a certain financial model and drivers for its business, so you inhale all that data and build a model that tells them what their sales will be. You do a great job and sell that solution to the next customer, but the data that tells you how their sales will perform is different, so you have to train a new model for them. If your engineers have to train a model for every customer, you’re toast, because you’ve ended up building a consulting company, not a software company.
So, one part of that is narrowing the problem enough so that you don’t need to train a new model every time. The difference between, let’s say, a direct sales company and a consumer e-commerce company is probably really big. But the difference between two e-commerce companies is a lot smaller. So, it could be the case that if you narrow the problem down to just looking at e-commerce companies, the model you trained for the first set of customers might generalize better to the next set of customers.
At Impira, we treat the ML problem more like a compiler problem. Under the hood, Impira compiles a new model for every customer, and the models we develop can train with very small amounts of data. That’s a different approach that happens to work very well with extracting data from documents, but it may not work with every problem.
At the end of the day, you have to figure out a way to generalize the work you do for some customers to the next.
On the evolution of UI/UX for machine learning solutions
I wrote a blog post a few years ago about how all the UIs we’re used to for manipulating data have to change in the world of machine learning. Tools like Excel, Tableau, Power BI, and Salesforce are all deterministic: all the information you see on the screen is exact, and that’s what people are used to. But in the machine learning world, all the information carries some uncertainty. If you’re showing a bar graph, how do you show uncertainty in a way that someone without a PhD can understand? It’s not an easy problem to solve.
For example, in Impira, unlike Google or TikTok, we’re giving people a very exact piece of information. If they upload an invoice into Impira and want to know exactly what the total is, what the invoice number is, when it’s due, and so on, the way we communicate uncertainty is very simple. If we’re very certain, we show a little green bar next to the data, and if we’re uncertain, we show an orange bar. That helps users sift through what they need to review. A number of other pieces of the user experience also help people understand uncertainty.
The cool thing (and this is kind of a hard technical problem, though doable) is that in Impira, when you change something, like finding an orange cell and correcting the value or confirming that it’s already correct, you actually teach the model as you do it. That makes it fun for users to put in the time and energy to review what they see and improve the model in the process.
Design is a very, very big part of what we do at Impira. From our website to every little piece of the product experience, we think (or strive to think) about design first, because it’s so important: users of ML software like Impira are learning a lot of these concepts for the first time.
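To make the green/orange idea concrete, here is a minimal sketch, assuming each extracted field comes with a model confidence score between 0 and 1 and that a single cutoff separates “certain” from “uncertain.” The 0.9 threshold and all names (`ExtractedField`, `indicator`, `review_queue`) are hypothetical illustrations, not Impira’s actual implementation.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the conversation only distinguishes "very certain"
# from "uncertain"; 0.9 is an illustrative assumption.
CONFIDENCE_THRESHOLD = 0.9


@dataclass
class ExtractedField:
    name: str          # e.g. "total", "invoice_number", "due_date"
    value: str
    confidence: float  # model score in [0, 1] (assumed)


def indicator(field: ExtractedField, threshold: float = CONFIDENCE_THRESHOLD) -> str:
    """Color of the bar shown next to a field: green if confident, else orange."""
    return "green" if field.confidence >= threshold else "orange"


def review_queue(fields: list[ExtractedField],
                 threshold: float = CONFIDENCE_THRESHOLD) -> list[ExtractedField]:
    """Orange fields only, least confident first, so users review those first."""
    uncertain = [f for f in fields if indicator(f, threshold) == "orange"]
    return sorted(uncertain, key=lambda f: f.confidence)


if __name__ == "__main__":
    invoice = [
        ExtractedField("total", "1,240.00", 0.97),
        ExtractedField("invoice_number", "INV-0042", 0.62),
        ExtractedField("due_date", "2021-08-01", 0.88),
    ]
    for f in invoice:
        print(f.name, indicator(f))
    print([f.name for f in review_queue(invoice)])
```

A single threshold keeps the signal simple enough for non-experts; sorting the orange fields by confidence is one plausible way to help users “sift through what they need to review.”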
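One way to picture “teaching the model as you review” is to treat every confirmation or correction as a labeled training example. The sketch below assumes exactly that; the `ReviewLoop` class and its `retrain` hook are hypothetical stand-ins for whatever training procedure a real system uses, and this is not a description of how Impira does it.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LabeledExample:
    document_id: str
    field_name: str
    label: str            # the value the user confirmed or typed in
    was_correction: bool  # True if the user changed the predicted value


@dataclass
class ReviewLoop:
    # Hypothetical hook: any function that refits a model from labeled examples.
    retrain: Callable[[list[LabeledExample]], None]
    examples: list[LabeledExample] = field(default_factory=list)

    def review(self, document_id: str, field_name: str, predicted: str,
               corrected: Optional[str] = None) -> LabeledExample:
        """Record a confirmation (corrected is None) or a correction, then retrain."""
        label = predicted if corrected is None else corrected
        example = LabeledExample(
            document_id, field_name, label,
            was_correction=corrected is not None and corrected != predicted,
        )
        self.examples.append(example)
        # Retraining after every review keeps the sketch simple; a real system
        # would more likely batch updates or train incrementally.
        self.retrain(self.examples)
        return example


if __name__ == "__main__":
    loop = ReviewLoop(retrain=lambda exs: print(f"retraining on {len(exs)} example(s)"))
    loop.review("inv-1", "invoice_number", "INV-0042")                # confirm
    loop.review("inv-1", "total", "1,204.00", corrected="1,240.00")   # correct
```

Confirmations matter as much as corrections here: both give the model a verified label, which is why reviewing an orange cell is useful even when the value turns out to be right.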
Who were your first customers and how did you source them?
Our first two customers were Goop, Gwyneth Paltrow’s e-commerce company, and Stitch Fix, another e-commerce company. Right after starting Impira, I asked all of my friends and our investors (and anyone who would talk to me) to connect me with anyone who might be interested in what we were doing. I talked to a bunch of people in a bunch of different industries. Some people kept talking to us and some people told me to go away. Imagine just grinding and doing that for several months at a time. The people at Goop and Stitch Fix just kept talking to us and had some really exciting use cases for our technology. They were visionaries who saw past the fact that we didn’t have a product, a team, or any customers, and they had enough pain that it was worth talking to us about how we could help them. I met them both in February of 2018, and they became customers in May and June of that year.
Sometimes they ignored me for weeks, other times we were talking every day. Throughout the whole experience, I just kept trying to talk to as many people as I could and kept talking to these people over and over again. Ultimately, that led us to opportunities to demo and convince them that taking a bet on us was worth it.
What was Impira’s traction when you raised your first round? Did you have any paying customers?
Actually, we had zero traction. It was just me. I made a demo and, on my way back from a trip to Kenya, showed it to some customers I had a preexisting relationship with from my time at MemSQL. You can imagine that getting from Nairobi to San Francisco requires stopping in a few places. I stopped in Munich and New York, showed the demo to some customers I had worked with before, and got their feedback. That was enough for me to raise funds.
I don’t think everyone has that experience. I was fortunate: I had been an executive at another venture-backed company, had the time to build relationships with people, had a lot of things de-risked (like my ability to recruit people), and had customer relationships on top of the ability to build software.
The formula worked out. Some great investors like Lightspeed and General Catalyst were willing to take a bet on me with conviction based on my background, the demo, and the initial feedback I had. They didn’t need a bunch of paying customers to build the same amount of conviction.
Do you have any advice for founders from non-technical or business backgrounds who want to start a company in the ML space?
Actually, I do! We wrote a blog post about three kinds of machine learning companies.
The first one is called a “powered by ML” company. These are companies that require a combination of a very deep understanding of a particular problem and machine learning. If you are a non-technical or business person, what you can bring to the table for a machine learning company is a deep understanding of a problem that could benefit from machine learning. You’re going to have to partner with someone who has a deep understanding of ML technology; I don’t think you can reasonably start an ML company without an ML co-founder. You can be the person who provides focus, excitement, and interest in a problem that an ML founder can apply their technical insight to and drive a lot of value.
Gong is a good example of a company like this. Some of the founders are ML folks, and the CEO was a sales leader for a long time, so he understands the inefficiencies in a sales organization and how they can be improved. In their case, that means watching videos and automatically coaching salespeople based on what happens in them.
CMU Tech & Entrepreneurship is a Carnegie Mellon University alumni-led community that empowers entrepreneurs, industry experts, technologists, and startups by providing resources, opportunities, and events.