Shopzilla boosts e-commerce with Big Data
Wednesday August 27 2014
The online marketplace is using enterprise analytic data management to gain in-depth retail insight
Online shopping comparison engine Shopzilla has added a Cloudera enterprise data hub alongside its Oracle enterprise data warehouse (EDW) to boost its e-commerce capabilities.
Shopzilla is one of the world’s largest shopping and marketing e-commerce platforms. Its consumer division operates a premier portfolio of online shopping brands in the U.S. and Europe, consisting of Bizrate, Beso, Retrevo, Shopzilla, Tada!, PrixMoinsCher, and SparDeinGeld.
Shopzilla reaches a global audience of over 40 million shoppers each month through both its destination websites and affiliate network. With 500 terabytes of data growing by five terabytes a day, the company had outgrown the capacity of its legacy data warehouse, which hampered its ability to provide business analytics in a timely and effective manner.
Shopzilla operates sites and business services in the United States, the United Kingdom, France and Germany, with plans to roll out further across the European e-commerce market.
Combining consumer insights and media buying
Shopzilla is now processing 15,000 feeds and 100 million products daily from retailers using the enterprise data hub technology, which provides analytic data management powered by Apache Hadoop.
Cloudera enables Shopzilla to combine consumer insights and media buying within its existing programmatic platform, helping marketers to learn more about their customers, discover valuable audiences and activate new consumers at scale.
In this hybrid Big Data environment, Shopzilla can now process and deliver new insights on millions of page views and ten billion ad requests daily, reaching over 100 million unique visitors and gaining valuable insights in hours or minutes instead of days.
From hours to minutes
“Our legacy system delivers great performance for analytics and reporting, but didn’t have the bandwidth for the intensive data transformations we needed,” explained Paramjit Singh, director of data for Shopzilla. “It would take hours to process 100 million products per day.
“We needed enormous processing capabilities, scalability, full redundancy, and extensive storage–at a cost-effective price. Our Cloudera platform provides all that and more, while complementing our current data warehouse system. We were able to reduce latency from days to hours and soon minutes.”
Singh explained that Cloudera provides an exploration environment for Shopzilla's data scientists that reveals tremendous insights, which would be virtually impossible to obtain otherwise. “We’re able to answer complex questions on multi-structured data, such as how a user is behaving on a particular site and what ads would be most effective, as well as execute other sophisticated data mining queries.
“It improves Shopzilla’s ability to provide relevant results to users–a core tenet of our business. Many of the things we do as a business would not be possible without this platform running alongside our Oracle data warehouse.”
This improved processing performance also benefits Shopzilla’s search engine marketing (SEM) activities, allowing the company to score and bid on ten million keywords each day.