Beyond Layered Metadata

Most businesses have had to adapt their metadata to match the capabilities of the product information management (PIM) systems they employ, rather than employ systems that adapt to their specific business needs. Today, emerging technologies like Artificial Intelligence promise to enhance the utility of such systems.

I’ve been obsessing over metadata in recent weeks. If metadata is defined as “data about data,” just where do we draw the line? Having spent a career in Online Video Platforms (OVP), I’m well-versed in metadata specific to video. As a new hire at Impira, I’ve needed to expand my knowledge, and in doing so I’ve realized that, because of the way current systems were designed to manage metadata, Customers have had to adapt their business needs to their system of choice rather than use systems modeled around those needs.

The Beginnings of Metadata Flexibility

This was certainly true for many OVPs. The first I encountered was The Feedroom, the granddaddy of OVPs, founded in 1999 as a centralized online newsroom. Built during the pre-dotcom bust era, its core product (called D.A.M.S.) sat on heavy iron in an NJ data center, with an Artesia database under the hood. There were many limitations to the architecture, and as a pivot was made from a centralized newsroom to what would become an OVP, The Feedroom faced stiff competition from upstart cloud-based alternatives such as Delve and Ooyala, which offered greater flexibility in terms of metadata and other features. But the roots of The Feedroom allowed it to survive: the legacy newsroom use case included countless metadata fields (such as cameraman, location scout, editor, on-screen reporter, etc.) that were mostly ignored by Customers who employed D.A.M.S. as an OVP. Whenever a Customer needed an additional field, perhaps to house product data, sports team rosters or links to press releases, there always seemed to be some field that could be repurposed. The price was that Customers were forced to remember that the “Sound Engineer” field was where the Original Language value was stored, or to attach PDF files of game schedules in the “Closed Caption” field, as few fields allowed attached files.

As I moved on to work for various cloud-based OVPs, I found it was the norm to support “custom metadata” in one form or another. It was either key-value pairs that could be associated with any video, or fields that could be added at the account level and applied to all videos. The ability to mark fields as required, or to restrict values to specific data types (integers, text blobs, dates, etc.), varied from OVP to OVP, but for most Customers, especially those with an AVOD (ad-supported) or SVOD (subscription video) business model, these constraints were deemed acceptable. I would sometimes come across Customers with more complex metadata needs, such as a pick-list of possible values for one field that depended on the value of some other field, or deep hierarchies beyond a simple Show->Season->Episode model. In most of those cases, the OVP was not well-suited to be the system of record, and deployments became quite complex, with dependencies on various in-house or third-party systems.
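
To make the idea concrete, here is a minimal sketch of account-level custom fields with required flags, data types and a dependent pick-list. Everything in it (the FieldSpec class, the validate function, the field names and the sample values) is hypothetical and invented for illustration; it does not describe how any particular OVP implemented custom metadata.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class FieldSpec:
    """One account-level custom field (hypothetical structure)."""
    name: str
    type_: type
    required: bool = False
    # Optional function returning the allowed values for this field,
    # given the whole record. This is what lets one field's pick-list
    # depend on the value of another field.
    choices: Optional[Callable[[dict], set]] = None


def validate(record: dict, schema: list) -> list:
    """Return a list of human-readable problems found in the record."""
    errors = []
    for spec in schema:
        if spec.name not in record:
            if spec.required:
                errors.append(f"{spec.name}: required field is missing")
            continue
        value = record[spec.name]
        if not isinstance(value, spec.type_):
            errors.append(f"{spec.name}: expected {spec.type_.__name__}")
            continue
        if spec.choices is not None and value not in spec.choices(record):
            errors.append(f"{spec.name}: {value!r} not allowed here")
    return errors


# Illustrative data only: the "team" pick-list depends on "league".
TEAMS_BY_LEAGUE = {"NBA": {"Knicks", "Nets"}, "NHL": {"Rangers", "Devils"}}

SCHEMA = [
    FieldSpec("title", str, required=True),
    FieldSpec("air_date", date),
    FieldSpec("league", str, choices=lambda rec: set(TEAMS_BY_LEAGUE)),
    FieldSpec(
        "team",
        str,
        choices=lambda rec: TEAMS_BY_LEAGUE.get(rec.get("league"), set()),
    ),
]

video = {"title": "Game recap", "league": "NBA", "team": "Rangers"}
print(validate(video, SCHEMA))  # ["team: 'Rangers' not allowed here"]
```

Even this small sketch shows why such needs strained the simpler platforms: a flat list of key-value pairs has no place to express that one field constrains another.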

Many Layers, Restricted Schemas

I had exposure to various Digital Asset Management (DAM) systems throughout my career as well, including The Feedroom’s acquisition of Clerestory and Ooyala’s acquisition of Nativ, which offered a media logistics platform with some DAM capabilities and third-party integrations. The DAM space was better-suited for non-video assets (no more PDF files uploaded to caption fields!), but Customers were still stuck adapting their business to the capabilities of the DAM rather than modeling the platform to their unique business needs.

This reflection on Customers having to adapt their businesses to their systems brings me back to my dive into metadata. There are numerous blogs and whitepapers on the various layers of metadata, such as this one by The Getty Research Institute. Some layers seem obvious and are easily overlooked, e.g. file type (.jpg vs. .pdf vs. .mov). Others, such as embedded data on the technical details of the device or software used to create the file, might offer limited business value. As I’ve been reading up on DAMs, I’ve found that there are still limits on the business value of the metadata they manage.

Categories of Metadata

Most metadata falls into one of three broad categories: descriptive, structural or administrative. For visual elements, such as images, examples and potential uses of such metadata might be:

  • Descriptive metadata, such as filenames and keywords, which is generally useful for search.
  • Structural metadata, which is often more technical in nature, describing the containers of data such as file type, and which includes versioning. This can be useful for distinguishing web-ready or mobile-friendly images from raw images.
  • Administrative metadata, which is useful for managing resources, covering rights management, date created, access permissions, etc.
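
As a rough illustration of these three categories, the sketch below sorts a flat set of metadata keys into descriptive, structural and administrative groups. The key names and the CATEGORY_OF mapping are assumptions made up for this example; real systems draw these lines in different places.

```python
# Hypothetical mapping from metadata key to category, following the
# three-way split described above. The key names are illustrative only.
CATEGORY_OF = {
    "filename": "descriptive",
    "keywords": "descriptive",
    "file_type": "structural",
    "version": "structural",
    "rights": "administrative",
    "date_created": "administrative",
    "access": "administrative",
}


def group_by_category(flat: dict) -> dict:
    """Group a flat metadata dict into per-category dicts.

    Keys that are not in CATEGORY_OF land under "uncategorized".
    """
    grouped = {}
    for key, value in flat.items():
        category = CATEGORY_OF.get(key, "uncategorized")
        grouped.setdefault(category, {})[key] = value
    return grouped


image_metadata = {
    "filename": "white-sneaker-front.jpg",
    "keywords": ["sneakers", "white"],
    "file_type": ".jpg",
    "version": 3,
    "rights": "licensed until 2025",
    "date_created": "2019-06-01",
    "inventory_level": 0,  # business data with no home in the three layers
}

for category, fields in group_by_category(image_metadata).items():
    print(category, sorted(fields))
```

Note that a value such as inventory level falls outside all three groups in this sketch.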

The ability of a system to extract and maintain all of these metadata types is valuable, and many companies, Impira included, leverage artificial intelligence and machine learning to assist in the data gathering and input tasks. Automatically adding EXIF data from image files or extracting text identified using OCR can reduce the cost and time involved in fleshing out metadata. More advanced AI algorithms can help classify visual data, by identifying recognized objects or people, and in many cases can also add context, such as location, whether a product should show the package open or closed, or even the sentiment of people in the image based on body posture and facial analysis. Systems that employ AI/ML to aid in metadata creation can unlock great value, especially for companies with large volumes of content.

Value of Metadata

Whether entered manually, extracted from files or generated using AI, metadata still affords limited value in terms of the questions that companies ask daily to drive growth and profitability. Take one common metric, ROAS, or Return On Ad Spend. Consider an example of A/B testing of two campaigns, using identical copy but an image of white sneakers vs. red sneakers. BI tools would be used to determine which ad received more clicks and, furthermore, which color sneaker sold better. If the white sneakers were shown to outperform the red sneakers, then an eCommerce channel manager might be inclined to feature the white sneaker in the next push to an eCommerce partner. But what if the white sneakers had no inventory left? Perhaps the red sneaker would be the better one to feature on the eCommerce site.

The current inventory of a product is an example of metadata that might be associated with a product image, but would generally not be available within a DAM. Users might use a DAM to find a desired image (e.g. search for “sneakers”), a second system to get ad performance data (e.g. white sneakers get more clicks than red sneakers) and a third to get inventory levels (e.g. white sneakers are out of stock) before deciding what to include in the feed to their eCommerce partner. If a DAM could enable a user to search for images of sneakers with a high ROAS and sufficient inventory levels to meet a channel partner’s potential demand, that would provide real business value.
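
The kind of combined search described here can be sketched in a few lines. The example below assumes ROAS is computed as revenue attributed to an ad divided by the spend on that ad, and it joins three in-memory data sets standing in for the DAM, the ad performance tool and the inventory system. All names, IDs and numbers are hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """A DAM image record (hypothetical shape)."""
    asset_id: str
    sku: str
    keywords: tuple


def roas(revenue: float, ad_spend: float) -> float:
    """Return On Ad Spend: attributed revenue divided by ad spend."""
    return revenue / ad_spend if ad_spend else 0.0


def find_assets(assets, ad_stats, inventory, keyword, min_roas, min_units):
    """Find assets matching a keyword with enough ROAS and stock.

    ad_stats maps asset_id -> {"revenue": ..., "ad_spend": ...}
    inventory maps sku -> units on hand
    """
    hits = []
    for asset in assets:
        if keyword not in asset.keywords:
            continue
        stats = ad_stats.get(asset.asset_id)
        if stats is None:
            continue
        value = roas(stats["revenue"], stats["ad_spend"])
        if value >= min_roas and inventory.get(asset.sku, 0) >= min_units:
            hits.append((asset.asset_id, round(value, 2)))
    # Best-performing assets first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up sample data mirroring the sneaker example.
assets = [
    Asset("img-white", "SNK-WHT", ("sneakers", "white")),
    Asset("img-red", "SNK-RED", ("sneakers", "red")),
]
ad_stats = {
    "img-white": {"revenue": 5000.0, "ad_spend": 1000.0},  # ROAS 5.0
    "img-red": {"revenue": 3600.0, "ad_spend": 1000.0},    # ROAS 3.6
}
inventory = {"SNK-WHT": 0, "SNK-RED": 250}

print(find_assets(assets, ad_stats, inventory, "sneakers", 3.0, 100))
# [('img-red', 3.6)]
```

In this made-up data the white sneaker has the higher ROAS, but only the red sneaker survives the inventory filter, which is the decision the channel manager would otherwise have to piece together from three separate tools.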

A DAM system should be able to handle all of the standard metadata layers, and should employ AI/ML where possible as a force multiplier for the knowledge workers using the DAM. But when a DAM can handle metadata beyond the standard layers, it moves beyond a system of record and becomes what venture capitalist Jerry Chen of Greylock Partners has deemed a System of Intelligence.