The business of Experian, a global leader in credit reporting and marketing services with annual revenues exceeding US$4.3 billion (for 2017), is all about data.
Experian has four main business units: Credit Information Services, Decision Analytics, Business Information Services, and Marketing Services. Experian Marketing Services (EMS) helps marketers connect with customers through relevant communications across a variety of channels, driven by advanced analytics on an extensive database of geographic, demographic, and lifestyle data. EMS has built its business on the effective collection, analysis, and use of data.
Quadrillions of records
The company has always handled large amounts of data, from billions to quadrillions of records, on who consumers are, how they’re connected and how they interact. With today’s proliferation of digital channels, and of information such as social media likes, web interactions and email responses, older systems no longer have the capacity to deal with the data volumes.
In the past, there was no requirement to provide data in real-time. Experian sent customer database updates to clients once a month for campaign adjustments, allowing Experian to process large volumes of data through a number of diverse platforms, which were mostly mainframe-based.
That’s changing. Today’s consumers leave a digital trail of behaviours and preferences that marketers can use to enhance the customer experience. Experian’s clients, which include many of the top retail companies in the world, are asking for more frequent updates on consumers’ latest purchasing behaviours, online browsing patterns and social media activity so they can respond in real time. They are increasingly looking for a single, integrated view of their customer.
Technology infrastructure for real-time reporting
Meeting the need for immediacy of information and customisation of data in real time for clients would require a technological infrastructure that can accommodate rapid processing, large-scale storage, and flexible analysis of multi-structured data. Experian’s mainframes were hitting their limits in terms of performance, flexibility and scalability.
EMS set an internal goal to process more than 100 million records of data per hour, which translates to roughly 28,000 records per second.
The team decided to look for new architectures that could handle the new volumes of data. About 30 criteria were identified for the new platform, ranging from depth and breadth of offering to support capabilities, price and unique distribution features. Two criteria were prioritised: both batch and real-time data processing capabilities, and scalability to accommodate large and growing data volumes.
The North America Experian Marketing Services group led the evaluation of NoSQL technologies within Experian. Hadoop and HBase quickly surfaced as a natural fit for Experian’s needs. EMS engineers downloaded raw Apache Hadoop.
They saw certain gaps that could be filled by a commercial distribution. EMS evaluated several distributions and selected Cloudera to meet EMS’ enterprise-level Hadoop needs, such as meeting client SLAs (service level agreements) and having 24×7 reliability.
Experian invested in Cloudera Enterprise, which comprises three components: Cloudera’s open source Hadoop stack (CDH), a management toolkit (Cloudera Manager), and expert technical support.
A production version of Experian’s Cross-Channel Identity Resolution (CCIR) engine was launched. CCIR is a linkage engine that is used to keep a persistent repository of client touch points. CCIR runs on HBase, a high-performance, distributed data store that integrates with Cloudera’s platform to deliver a secure and easy-to-manage NoSQL database.
EMS’ HBase system spanned five billion rows of data, as of 2017, and the number is expected to grow tenfold in the near future. HBase offers a shared architecture that is distributed, fault tolerant, and optimised for storage. In addition, HBase enables both batch and real-time data processing.
Experian feeds data into the CDH-powered CCIR engine using custom extract, transform, load (ETL) scripts from in-house mainframes and relational databases including IBM DB2, Oracle, SQL Server, and Sybase IQ.
Processing performance accelerated by 50x
The new platform is delivering operational efficiency to Experian by accelerating processing performance by 50x, at a fraction of the cost of the legacy environment. The new system can process 100 million records per hour, compared with 50 million matches per day on the legacy environment.
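The throughput figures quoted in this article can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the new platform sustains its hourly rate around the clock, and that a "record" on the new system is comparable to a "match" on the legacy one. Neither assumption is stated in the source, and the function and variable names are made up for the example.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_DAY = 24


def per_second(records_per_hour: float) -> float:
    """Convert an hourly processing rate to a per-second rate."""
    return records_per_hour / SECONDS_PER_HOUR


def per_day(records_per_hour: float) -> float:
    """Convert an hourly rate to a daily rate (assumes 24-hour operation)."""
    return records_per_hour * HOURS_PER_DAY


# Figures quoted in the article.
new_rate_per_hour = 100_000_000    # new platform: records per hour
legacy_rate_per_day = 50_000_000   # legacy environment: matches per day

new_rate_per_second = per_second(new_rate_per_hour)
new_rate_per_day = per_day(new_rate_per_hour)

# Assumption: records and matches are treated as the same unit of work.
speedup = new_rate_per_day / legacy_rate_per_day

print(f"{new_rate_per_second:,.0f} records per second")
print(f"{new_rate_per_day:,.0f} records per day")
print(f"{speedup:.0f}x the legacy daily rate")
```

Under these assumptions the hourly goal works out to just under 28,000 records per second, and the daily rate is 48 times the legacy figure, which is consistent with the rounded 50x claim.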
Cloudera Enterprise allows Experian to get maximum operational efficiency out of its Hadoop clusters. Because customer use cases vary widely, the team had to do a lot of tuning on the platform to get the performance it needed. Cloudera Enterprise provides the ability to store different configuration settings and to version those settings.
McCullough said, “Not only has Cloudera Manager simplified our process, but it’s made it possible at all. Without a Linux background, I would not have been able to deploy Hadoop across a cluster and configure it and have anything up and running in nearly the timeframe that we had.”
Beyond deployment and configuration, Cloudera Manager monitors the services running on the cluster and reports when servers are unhealthy, services have stopped, or nodes have failed. It automates distribution across the cluster, monitors CPU usage across applications as well as data storage availability, and provides a single portal for viewing all cluster details.
The deployment allowed Experian to process orders of magnitude more information through its systems. Experian’s platform is the first data management platform of its kind that accepts data, links information together across an entire marketing ecosystem, and puts it into a usable format for an enhanced customer experience. These data processing capabilities combined with Experian’s expertise in bringing together data assets provided new insights into tomorrow’s marketing environments.
In January 2017, it was announced that Experian was integrating Cloudera Enterprise into its cloud environment for its Credit Information Services, Decision Analytics and Business Information Services business lines, with the aim of improving credit data processing speeds for clients. Thus, Cloudera continues to transform the way Experian provides consumer and business credit data to its clients.