Tivo Research Analytics Mines Big TV Data

May 06, 2013 (07:05 AM EDT)

Twentieth-century department store magnate John Wanamaker famously said, "Half the money I spend on advertising is wasted; the trouble is I don't know which half."

With big-data analysis, it's now possible to know which half is which. Tivo Research and Analytics (TRA) correlates television viewing data with third-party purchasing data to show advertisers which ads are driving more sales, whether the product is consumer packaged goods, automobiles or even prescription drugs. TRA is also exploring correlations between television advertising and online advertising so it can help marketers allocate cross-channel campaign spending.

How does TRA do it? It's all in the data, and in TRA's case it's "naturally occurring" data, meaning it's not based on surveys, diaries or logs collected from a small sample of TV viewers. The company's Media TRAnalytics service, launched four and a half years ago, relies on data from cable company set-top boxes to compile actual -- not estimated -- data on what shows are being watched and which ads are being seen in roughly 4.4 million households. Nielsen, by contrast, uses TV viewing data from a panel of some 20,000 households to extrapolate what share of approximately 116 million households are tuning in to particular shows.

Nielsen and TRA are in fundamentally different businesses. Nielsen's ratings tell you how many households are tuning in to mainstream broadcast and cable television shows. TRA goes after "the long tail" of smaller networks and programs, and it licenses Nielsen data so it can offer both sources, combined, to customers who already subscribe to Nielsen ratings data. More to the point, TRA answers a fundamentally different question: Was my advertising effective in driving increased sales?

"Just knowing what people are watching on television doesn't answer that question," says Mark Lieberman, TRA's CEO. "To do that, we went out and licensed purchase data on what people are buying in places like supermarkets, what cars they're buying and what prescriptions they're filling."

The essence of Media TRAnalytics is correlating TV viewing data with these third-party data sets, and doing so in a way that doesn't raise privacy concerns.

In the case of supermarket purchasing, TRA partners with loyalty card data aggregators Dunnhumby and GFK. More than 80% of supermarket purchases are tied to loyalty card records, and using double-blind matching processes, TRA can correlate TV viewing with consumer packaged goods (CPG) purchases across 40 million households. TRA knows that set-top box "123" belongs to the same household that holds loyalty card "ABC," but no personally identifiable information is held in TRA servers or tied to that insight, according to Lieberman.
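The article does not describe how the double-blind match is implemented. As a rough illustration only, the Python sketch below assumes a hypothetical third-party matcher that is the only party able to see who owns each set-top box and loyalty card. It hands each side a random household token, and the analyst joins viewing and purchase records on that token alone. All names here (`build_match_keys`, `join_by_token`, the sample IDs) are invented for the example.

```python
import secrets

def build_match_keys(settop_owners, loyalty_owners):
    """Hypothetical third-party step: the only code that sees household identities.

    settop_owners:  {box_id: household_identity}
    loyalty_owners: {card_id: household_identity}
    Returns two lookup tables that map IDs to anonymous tokens.
    """
    token_by_household = {}
    settop_tokens = {}
    loyalty_tokens = {}
    for box_id, household in settop_owners.items():
        if household not in token_by_household:
            # Random token: carries no information about the household.
            token_by_household[household] = secrets.token_hex(8)
        settop_tokens[box_id] = token_by_household[household]
    for card_id, household in loyalty_owners.items():
        # Only cards whose household also has a set-top box get a token.
        if household in token_by_household:
            loyalty_tokens[card_id] = token_by_household[household]
    return settop_tokens, loyalty_tokens

def join_by_token(viewing, purchases, settop_tokens, loyalty_tokens):
    """Analyst step: join (box_id, show) with (card_id, product) via tokens only."""
    shows_by_token = {}
    for box_id, show in viewing:
        token = settop_tokens.get(box_id)
        if token is not None:
            shows_by_token.setdefault(token, set()).add(show)
    joined = []
    for card_id, product in purchases:
        token = loyalty_tokens.get(card_id)
        if token in shows_by_token:
            for show in sorted(shows_by_token[token]):
                joined.append((show, product))
    return joined

# Illustrative data, not real records.
settop_tokens, loyalty_tokens = build_match_keys(
    {"123": "household-A", "456": "household-B"},
    {"ABC": "household-A", "XYZ": "household-C"},
)
pairs = join_by_token(
    [("123", "CSI"), ("456", "Evening News")],
    [("ABC", "Wheaties"), ("XYZ", "Oatmeal")],
    settop_tokens,
    loyalty_tokens,
)
print(pairs)
```

In this sketch the analyst learns that box "123" and card "ABC" share a household, as in the article's example, but never handles a name or address.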

With this combination, TRA can tell advertisers which and how many households in which zip codes are heavy purchasers of, say, breakfast cereals, and which brands they're buying. Further, it can tell them which shows these buyers watch and whether ad campaigns run on these programs are stimulating higher sales.

Retailers and consumer packaged goods companies use these insights in any number of ways. If they're simply trying to increase sales, they can run ads on shows watched by loyal customers. If they're introducing new brands or trying to gain market share, they can run campaigns on shows watched by buyers of rival brands.

The real win -- and the part that addresses Wanamaker's lament -- is that advertisers can track campaign results over time and quickly discover whether their ad investments are paying off.

"What we're doing is telling advertisers which programming is rich in households that have particular purchasing preferences," explains Brian Canning, TRA's chief technology officer. "You can identify which households are known to buy Wheaties, for example, and that are heavy purchasers of cereal in general. Then you can see the rating for every program on television against that universe of households."

Ratings are expressed as an index, so, for example, the show "CSI" might have an index of 120 for Wheaties purchasers -- 20% higher than the average index of 100. Without TRA insight, ad buyers typically purchase based on the gross rating points (GRP) and demographics of any given show. But it could be that when considering two shows with identical GRPs and similar demographics, one show over-indexes for buyers of particular products whereas the other show under-indexes for those buyers.
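The article does not spell out the index formula. A common way to compute an audience index, assumed here, is to divide a show's rating among the target households by its rating among all households and multiply by 100. The Python sketch below uses made-up household counts chosen to reproduce the article's index of 120. The function names are illustrative.

```python
def rating(viewing_households, total_households):
    """Percentage of a household universe that watched the show."""
    return 100.0 * viewing_households / total_households

def purchaser_index(target_viewers, target_households, all_viewers, all_households):
    """Index of a show for a purchaser group; 100 means the group watches at the average rate."""
    target_rating = rating(target_viewers, target_households)
    overall_rating = rating(all_viewers, all_households)
    return 100.0 * target_rating / overall_rating

# Made-up numbers: 5% of all households watch the show,
# but 6% of Wheaties-buying households do.
index = purchaser_index(
    target_viewers=6_000,
    target_households=100_000,
    all_viewers=50_000,
    all_households=1_000_000,
)
print(round(index))  # 120
```

Two shows with the same overall rating would look identical on gross rating points, yet could return quite different values from `purchaser_index` for the same purchaser group.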

Beyond consumer packaged goods, TRA correlates viewing habits and ad results with automotive data from Experian, so it knows the make, model and year of vehicles purchased over the last 18 years across 100 million households. Correlations to data on some 1.5 billion prescriptions filled each year can also be made, but this data stays on premises at data provider IMS, which does the correlation work using one-way hashing of TRA viewing data (but not household information) to ensure HIPAA-compliant data practices, according to Canning.
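The article gives no detail on which hash or which fields are involved in the prescription-data match. As a general illustration of one-way hashing, the Python sketch below turns an identifier into a keyed SHA-256 token using the standard library's `hmac` and `hashlib` modules. The key, the sample identifiers and the `one_way_token` helper are assumptions made for the example, not a description of the IMS process.

```python
import hashlib
import hmac

def one_way_token(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of the identifier as a hex string.

    The same identifier and key always give the same token, so records
    can be matched on the token, but the token cannot be reversed to
    recover the identifier.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical key and identifiers, for illustration only.
key = b"example-shared-secret"
token_a = one_way_token("viewing-record-123", key)
token_b = one_way_token("viewing-record-123", key)
token_c = one_way_token("viewing-record-456", key)

print(token_a == token_b)  # True: the same input always maps to the same token
print(token_a == token_c)  # False: different inputs map to different tokens
```

A keyed hash is used here rather than a bare hash so that someone without the key cannot rebuild the tokens by hashing guessed identifiers.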

TRA gathers its data on TV viewing every day. TRA (formerly known as The Right Audience) was acquired by Tivo last summer, so it now has data from more than 1 million Tivo boxes as well as from more than 3 million conventional set-top cable boxes. With the Tivo data, TRA has data on not only every click on the remote but also time-shifting, ad-skipping, rewind and playback behavior.

TRA keeps 15 months' worth of daily viewing data so it can track campaign results over time, and that viewing data is by far the largest chunk of the 15 terabytes the company manages. Because the supermarket data is refreshed only once per week and the auto ownership data is revised quarterly, these sources account for less than 5% of TRA's data.

Media TRAnalytics is far from a petabyte-league big data deployment, but it quickly grew beyond the capabilities of the MySQL database the company started with four and a half years ago. The TRAnalytics service is exposed through an online portal, and the idea is to let media buyers and planners explore as many variables as possible to analyze programming and purchasing habits. But in those early days, complex, multi-dimensional reports were taking as long as 20 minutes. That was unacceptable, and with more data, more variables and more analyses on the way, the company knew it needed a more robust platform.

After reviewing alternatives, TRA switched from MySQL to the Kognitio database, and "all the problems went away," said Canning. Like Netezza, Greenplum and many of the other database options available four years ago, Kognitio offered the power of distributed, massively parallel processing on commodity x86 hardware, but it stood out (as it still does today) for its ability to exploit large amounts of memory.

"Use of memory is huge because it's the only way to get reasonable, Internet-query speed response times," says Canning. "We can pin up to 5 terabytes of data into memory, and we need that to generate ratings for all of the shows that people might be watching at any given time."

Holding as much as a third of all data in memory is unusual for a data warehousing deployment (although some vendors, like SAP, are now touting all-in-memory warehouses). In-memory access speeds have become increasingly important because TRA now has roughly 10 times more data than it did four years ago. The complexity of reports has also grown, but even the most complex, multi-dimensional reports (with as many as 20,000 lines of data when extracted to Excel) take about one minute. The system has about 400 users from ad agencies, advertisers and TV networks. Last year some 12,000 reports were run on more than 50,000 ad campaigns.

TRA is getting into cross-media measurement through cookie-based information Experian has on the Web browsing habits of 70 million households. That will expand the scale of analysis and the number of correlations available yet again.

"The Internet has obviously become an increasingly important element to advertisers, but they want to know where they'll get the best bang for the buck," Lieberman said. "Should they spend 70% TV and 30% Internet or 40% TV and 60% Internet? We can go deep on TV and Internet ad impacts on purchasing habits to help them determine the right mix." Internet data, too, is privacy protected through double-blind correlation approaches, Lieberman said.

The data that TRA analyzes is all highly structured, so it doesn't fit the classic notion of big data variety, and it doesn't call for Hadoop or a NoSQL database as a platform for variable data. But it's no less a big data deal for TRA customers.

With all the information that's available -- on what's being watched on TV, which ads are actually seen, what cars are owned, what drugs are being prescribed, what websites are visited and what's being purchased the day after seeing particular ads on TV or on the Internet -- it seems like there's very little that can't be known through the power of query, correlation and analysis.