
When big data weighs you down instead of setting you free

“Big data” has been the buzzword of the 21st century. What are you missing if you don’t get on the bandwagon? Is it possible to unlock the power of big data without building complex and costly data infrastructure? We’ll explore the costs and risks of big data, and discuss some promising alternatives.

The mystique of big data

The term “big data” brings to mind a series of powerful images: vast troves of data, giant server farms in massive warehouses, mathematical equations of dizzying complexity, robots, and rocket ships.

If a company touts big data as a key element of their business model, it feels like they are part of an elite community. It’s as if they’re one of the brilliant few who have access to the magic of the future to conjure up the world’s most powerful insights.

In other words, big data is not exactly accessible to the common startup, individual, or reasonably-sized operation. But what does “big data” really mean? Is it as powerful as it seems, and at what point could you have access? What does being a big data-enabled organization entail? 

Breaking down the mystery of big data: The three Vs

To differentiate big data from normal, run-of-the-mill, old-fashioned data, most practitioners use “the three Vs.” The first “V” is volume, as in large companies using terabytes and petabytes of data on servers and storage farms. The second “V” is velocity, as in the speed at which data arrives, sometimes in large batches and sometimes in a constant stream. The third “V” is variety, as in data collected from a variety of sources: Text, PDFs, videos, wearable devices, social media, CSVs, analytics tools, CRMs, and beyond.

Big data isn’t just “a lot of data.” It implies novel ways of utilizing and understanding data. Ultimately, big data is used to inform business decisions.

So, at what stage can your business unlock the power of big data? You may not need as much data as you think in order to carve out accurate, actionable insights.

While “big data,” in the strictest sense of the phrase, may be out of reach, that’s not necessarily a bad thing. Letting go of the “big” component can free up your organization. Given the choice between an unwieldy big data program and a more focused approach, we suggest going with the latter.

With a more intentional, targeted data strategy, you’ll likely find that your business can scale faster. On top of that, you can avoid the high costs and risks of big data management.

More data, more problems 

In many cases, big data can hinder progress. That’s why implementing a big data framework is not necessarily the silver bullet that most businesses need. Its advantages come with heavy costs, large investments of time and money, and the need to sift through a massive amount of information.

The costs of big data

Not all data is quality data, and the ease with which you can extract the highest quality data is more important than the quantity of data you can amass. Let’s look at some of the costs and risks associated with a big data framework. 

  1. Infrastructure — The physical hardware required to store and process data at this scale can take up immense amounts of space. The cost of a data platform like Hadoop scales with the amount of storage, computing, and processing power used. Some estimates put the cost of an average petabyte Hadoop cluster at around $1 million.
  2. Maintenance — The costs don’t end once the big data infrastructure is in place. Management and maintenance are ongoing and costly. As data collection scales, so does the cost of the operation. Handling data across thousands or hundreds of thousands of nodes requires space, processing power, more infrastructure, and more people to manage it.
  3. Integration and migration risks — For data to have value, it usually needs to be integrated and analyzed. If your migration and integration strategies require a lot of resources and manual efforts, implementing them will come with high risk. You could experience major redundancies and be hit with hidden costs, like wasted time and improperly-used infrastructure. Data integration is risky, because when data isn’t correctly streamlined, it can cause huge problems, seen and unseen. Legacy technology and migration costs can be a large one-time investment, or an ongoing financial drain. 
  4. Backup and networking costs — While it’s important to back up your data, redundant backups can take up unnecessary space and burn through cash. Transferring terabytes of information is not a simple procedure, and the bandwidth costs add up quickly.
  5. Human resources — From implementation to maintenance to security to engineering to compliance, these large data systems are complex operations with lots of headcount attached. While these are primarily technology-related frameworks, there are people attached to the process at every step of the way. 
  6. Data quality — Poor-quality data costs a lot, sometimes more than having no data at all. From storage to integration to analysis, if your data is leading you toward the wrong business decisions, the extent and cost of the damage are incredibly difficult to identify or calculate.

One-shot learning and the case for targeted data

When organizations need to solve specific business problems, we at Impira recommend automation that uses one-shot learning. With one-shot learning, users can give feedback to a machine learning model and see updated predictions in real time. It requires smaller, more targeted sets of data, and less complexity in the setup.

Thanks to no-code innovations, it’s possible to improve the models behind machine learning predictions with even one additional piece of feedback. So, while one-shot learning isn’t a “big data” solution, it’s a data-driven solution with similar outcomes, such as real-time feedback and impactful analysis.
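To make the idea of "one piece of feedback updates the predictions" concrete, here is a minimal toy sketch in Python. It is not Impira's implementation. It assumes a simple nearest-example approach, where a new line of text gets the label of the most similar stored example, and the names `OneShotLabeler`, `give_feedback`, and `predict` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of whitespace-separated tokens."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class OneShotLabeler:
    """Hypothetical nearest-example labeler (an assumption, not a real product API)."""
    examples: List[Tuple[Set[str], str]] = field(default_factory=list)

    def give_feedback(self, text: str, label: str) -> None:
        """Store one labeled example; there is no separate retraining step."""
        self.examples.append((tokens(text), label))

    def predict(self, text: str) -> Optional[str]:
        """Return the label of the most similar stored example, or None if nothing overlaps."""
        query = tokens(text)
        best_label, best_score = None, 0.0
        for example_tokens, label in self.examples:
            score = jaccard(query, example_tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


labeler = OneShotLabeler()
labeler.give_feedback("Invoice number: 1042", "invoice_number")
labeler.give_feedback("Total amount due: $310.00", "total")

# An unfamiliar layout has no overlap with the stored examples yet.
before = labeler.predict("Balance payable: $87.50")

# A single correction from the user is enough to change the next prediction.
labeler.give_feedback("Balance payable: $12.00", "total")
after = labeler.predict("Balance payable: $87.50")

print(before, after)
```

In this sketch the first prediction for the unfamiliar "Balance payable" layout comes back empty, and the second returns `total` after one correction. A production system would use a far richer notion of similarity than token overlap, but the feedback loop has the same shape.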

Let’s look at two use cases where one-shot learning plays a significant role. In these cases, one-shot learning takes the impact and analytical power of big data, and combines it with the fleet-footedness of an agile operation that isn’t bogged down by heavy costs and infrastructure. 

  1. Invoice data — 80% of finance departments are overwhelmed by the significant number of invoices they are expected to process. Finance departments suffer from bottlenecks and errors when manually processing all invoice data. One-shot learning solves these bottlenecks through automation that can quickly, accurately, and automatically pull data from multi-layout invoices and purchase orders at scale. The system gets smarter as more data is input, and the entire invoice and purchase order processing cycle gets quicker. 
  2. Medical and insurance forms - Claims professionals spend nearly 50% of their time on activities that don’t impact the outcome of the claim. For both insurers and clients, processing claims can be frustrating. It requires manual input, lots of waiting and missed communications, and backlogs. When claims professionals have to spend time chasing down missing data, it can delay claims payouts. With one-shot learning, automation can quickly and accurately pull data from claims forms. Easy error-checking and triggers are introduced, making the claims processing cycle much faster. 

One-shot learning in action: Stitch Fix “does more with less” by remotely managing workflows and prioritizing access to shared data

Stitch Fix is a personal styling platform that ships curated clothing choices directly to customers. Faced with a business continuity crisis during COVID-19, the company sought to reconfigure workflows with less reliance on on-prem servers and on-site offices. 

To do so, the company set up a content repository application that enabled the continual processing of new content. It also ensured that all content was enriched with relevant data, like product information and tags. As a result, even offsite team members could easily access this information and use it to inform the end product. 

Remote team members could quickly access unused content; understand its usage rights, seasonality, and performance; and (most importantly) have fresh content at a time when there were limited options for creating new content in-studio.

More than needing access to vast amounts of data, the team needed access to the right data. They also needed enhanced transparency across remote work locations. One-shot learning provided them with these things, and catalyzed the continual improvement of the end product. 


Big data is just a big problem if it’s not managed properly. Targeted data is more personal and it offers knowledge and actionable insights relevant to an organization’s needs and consumers.

It’s one thing to collect as much data as possible. It takes a different level of strategy and sophistication to be discerning about the data you do collect. For example, the right targeted, focused data can be managed by one person. That person can conduct sophisticated analyses with relatively few data points. No giant server farms required.

In summary, the components of one-shot learning and quality data involve:

  • Simplicity — Paring back the number of tools, integrations, and resources you’re willing to incorporate.
  • Analysis — Deciding which issues can be aided with data. What are your KPIs and end goals? 
  • Stakeholders and timelines — Designating parties for each stage of data collection and analysis, plus estimating the time it will take to go through that process. 
  • Tools — The right tools will be your best friend in one-shot learning implementation. Which software do you need, and what third-party partners can enhance your data strategy? 

Remember that not all data is created equal, and more does not always mean better. So next time you feel FOMO about not being part of the big data club, remember that you have something powerful in your back pocket: smart, targeted data waiting to be unlocked and put to work.
