Data Warehouse, Data Mining and CRM

 

Dr. S. Chandrasekhar

Chair Professor, Chairman, I T and Operations Management Group and MCA and

Director Software Development Center, Fore Schools of Management, New Delhi

& R. Das

Information Officer & Assistant Professor
National Insurance Academy

 

 

CRM, or Customer Relationship Management, is a company's business strategy for identifying and managing its customers in a way that reduces the company's costs and increases profitability. CRM redesigns the basic functional activities of a company, which in turn demands re-engineering of work processes. For CRM to succeed, it is necessary to bring together information about the customer from all possible sources (within and outside the organization) and process it in such a manner that the organization gets a real picture of customer behaviour. This helps management, sales, marketing and service staff to respond to the customer accordingly and keep pace with the competition. It is practically impossible to consider these processes without addressing technology. CRM applications can enable effective Customer Relationship Management, provided that the organization has the right business strategy. Three factors are responsible for the successful implementation of CRM: People, Process and Technology.

 

People: Employees at every level of the organization, from the top layer to the bottom, should support and promote CRM.

 

Process: Business Process should be re-engineered.

 

Technology: The right technology should be chosen to drive these new processes and to provide real-time customer analysis to employees.

 

If the business strategy is clearly defined, it is the technology that can drive the business and customer services more efficiently with the full co-operation of people. 

 

Data Mining plays an important role in the implementation of Analytical CRM. Analytical CRM comprises all programming that analyzes data about an enterprise's customers and presents it, using online analytical processing (OLAP) and Data Mining tools, so that better and quicker business decisions can be made.

 

Before getting into the details of Data Mining, let us discuss how a data warehouse is built.

 

DATA WAREHOUSE

 

Data warehousing means maintaining a central repository of all critical data, which helps managers to take decisions based on authentic information. Building a data warehouse is not an easy task. Deciding what type of data to keep in the data warehouse is a pivotal issue for the organisation, since the exercise involves a lengthy and tedious process of consolidating back-end data from different databases.

 

A Data Warehouse supports business analysis and decision-making process by creating an integrated database of consistent, subject-oriented and historical information. A Data warehouse is a database, populated from the existing data source system and used for reporting, analysis & business intelligence purposes. It integrates data from any kind of data structure including external data into one consolidated database. By transforming and integrating data into meaningful information, a data warehouse helps business analysts in taking business decisions. Business trends can be forecast by analyzing the historical data.

 

According to W. H. Inmon, the ‘father’ of data warehousing, and other authorities in the industry, a data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.

 

 

 

 

 

Data Warehouse Benefits

 

Implementing a Data Warehouse provides significant benefits -- many tangible, some intangible.  They are:

 

Structure of a Data Warehouse

 

Data Warehouses have a distinct structure. There are different levels of summarization and detail that describe the Data Warehouse. The different components of the Data Warehouse are:

 

 

 

·         Meta Data: Meta Data is an important component of the data warehouse that lets the user know what is contained in the warehouse and, in fact, guides the user through the warehouse. To handle the links between stored data we have to define relationships; these relationships are the Metadata, whereas the data itself is the Facts. Without metadata, data is likely to be useful only to the persons who collected it in the first place, as they are the only ones who understand the frame of reference that goes with the data. So, Metadata is "data about data": it is not the actual dataset, but it answers the "who, what, where, when, why and how" questions about the dataset.

 

Meta Data can be classified into two categories:

·         Technical Meta Data

                  ·         Information about data sources

                  ·         Transformation description

                  ·         Warehouse object and data structure definitions

                  ·         Rules to perform data clean up and data enhancement

                  ·         Data mapping

                  ·         Access authorization and history

 

·         Business Meta Data

                  ·         Subject areas and information object type

                  ·         Internet Homepages

                  ·         Other information to support all data warehousing components

                  ·         Data warehouse operational information
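As a rough illustration of the two categories above, the sketch below models one catalogue entry that carries both technical and business Meta Data. The class names, field names and the sample source table are hypothetical and chosen only for this example; they do not come from any particular metadata product.

```python
from dataclasses import dataclass, field


@dataclass
class TechnicalMetadata:
    """Where the data came from and how it was changed on the way in."""
    source: str              # information about the data source
    transformation: str      # transformation description
    target_table: str        # warehouse object definition
    column_mapping: dict     # data mapping: source column -> warehouse column
    authorised_roles: set = field(default_factory=set)  # access authorization


@dataclass
class BusinessMetadata:
    """What the data means to the people who use it."""
    subject_area: str
    description: str


@dataclass
class CatalogueEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata

    def can_read(self, role):
        """Check access authorization for a user role."""
        return role in self.technical.authorised_roles

    def source_column(self, warehouse_column):
        """Trace a warehouse column back to the source column it was mapped from."""
        for src, dst in self.technical.column_mapping.items():
            if dst == warehouse_column:
                return src
        return None


# Hypothetical entry for an insurance premium table (all names are assumptions).
ENTRY = CatalogueEntry(
    technical=TechnicalMetadata(
        source="policy_admin.policies",
        transformation="trim names, convert premium to Rs",
        target_table="fact_premium",
        column_mapping={"POL_NO": "policy_id", "PREM_AMT": "premium_rs"},
        authorised_roles={"analyst", "actuary"},
    ),
    business=BusinessMetadata(
        subject_area="Premium income",
        description="One row per policy per month with the premium collected.",
    ),
)
```

With such an entry, a front-end tool could call ENTRY.can_read("analyst") before running a query, or ENTRY.source_column("premium_rs") to show the user where a figure originated.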

Data Mart

 

A data mart is a repository of data gathered from operational data and other sources that is designed to serve a particular community of knowledge workers. In scope, the data may derive from an enterprise-wide database or data warehouse or be more specialized. The emphasis of a data mart is on meeting the specific demands of a particular group of knowledge users in terms of analysis, content, presentation, and ease-of-use. Users of a data mart can expect to have data presented in terms that are familiar.

 

In practice, the terms data mart and data warehouse each tend to imply the presence of the other in some form. However, most writers using the term seem to agree that the design of a data mart tends to start from an analysis of user needs and that a data warehouse tends to start from an analysis of what data already exists and how it can be collected in such a way that the data can later be used. A data warehouse is a central aggregation of data (which can be distributed physically); a data mart is a data repository that may derive from a data warehouse or not and that emphasizes ease of access and usability for a particular designed purpose. In general, a data warehouse tends to be a strategic but somewhat unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.

 

Building a Data Warehouse

 

In general, building any data warehouse requires the following steps:

·        Data Warehouse Design: The organisation has to choose an appropriate design approach for the Data Warehouse, such as Star, Snowflake or Entity-Relationship, and identify the summarised data to be pulled from the transactional databases into the data warehouse.

·        Extracting Transactional Data: Data is extracted from internal and external sources, transformed as per business rules into a structured and standardised format, and then loaded into the Data Warehouse.

·        Data Availability: Data has to be available to analytic and decision-making applications.

·        Data Warehouse Strategy: The strategy should help in achieving business goals, and the data warehouse should be flexible enough to conform to business needs as and when the business strategy changes. The strategy should also specify how long data is to be archived.

·        Data Quality: The most important aspect of a Data Warehouse is data quality and its maintenance. The data pumped into the data warehouse should comply with the business rules, and its quality should be measured. If the required quality is not achieved, the data transformation process must be changed to ensure that good-quality data is pushed into the data warehouse.
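The extraction, transformation, loading and data-quality steps above can be sketched in a few lines. This is a minimal illustration only, assuming a tiny star schema (one fact table and two dimension tables) held in an in-memory SQLite database; the sample rows, table names and the single quality rule are hypothetical.

```python
import sqlite3

# Hypothetical transactional rows as they might arrive from a source system:
# (customer name, product, sale date, amount in Rs as text)
SOURCE_ROWS = [
    (" asha ", "Motor Policy", "2004-01-15", "1200.50"),
    ("Ravi", "Health Policy", "2004-01-20", "800"),
    ("ASHA", "Health Policy", "2004-02-03", "950.25"),
    ("Ravi", "Motor Policy", "2004-02-10", "not available"),  # fails the quality rule
]


def transform(row):
    """Standardise one source row; return None if it breaks the business rule."""
    customer, product, sale_date, amount = row
    try:
        value = float(amount)
    except ValueError:
        return None  # reject rows whose amount is not numeric
    # Normalise the name and keep only the year-month part of the date.
    return customer.strip().title(), product, sale_date[:7], value


def build_warehouse(rows):
    """Load cleaned rows into a small star schema and count the rejected rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(id),
            product_id  INTEGER REFERENCES dim_product(id),
            month       TEXT,
            amount      REAL
        );
        """
    )
    rejected = 0
    for row in rows:
        clean = transform(row)
        if clean is None:
            rejected += 1
            continue
        customer, product, month, amount = clean
        db.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (customer,))
        db.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (product,))
        db.execute(
            """
            INSERT INTO fact_sales
            SELECT c.id, p.id, ?, ?
            FROM dim_customer c, dim_product p
            WHERE c.name = ? AND p.name = ?
            """,
            (month, amount, customer, product),
        )
    db.commit()
    return db, rejected


def sales_by_customer(db):
    """Summarised view for analysis: total sales per customer."""
    query = """
        SELECT c.name, SUM(f.amount)
        FROM fact_sales f JOIN dim_customer c ON c.id = f.customer_id
        GROUP BY c.name ORDER BY c.name
    """
    return db.execute(query).fetchall()
```

Calling build_warehouse(SOURCE_ROWS) returns the loaded database together with the number of rejected rows, which serves as a simple data-quality measure. A rising rejection count would be the signal to revisit the transformation step.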

Figure: DATA WAREHOUSE ARCHITECTURE. Components shown: data sources (ERP Systems, Purchased Data, Relational Databases, Legacy Data); Extraction, Transformation, Transportation; Optimized Loader; Data Warehouse Engine; Metadata Repository; Query Analyser; Web tool; Client/Server Tools; Applications.

 

 

 

 

 

Data Access Tools

 

The following are the front-end tools for user interaction.  They support both dynamic and preplanned analysis. They use Meta Data for accessing the warehouse. They can be classified into five main groups:

 

·         Data query and Reporting Tools

·         Application Development Tools

·         Executive Information Systems Tools

·         On-line Analytical Processing tools

·         Data Mining tools

 

On-line Analytical Processing (OLAP)

 

Since a Data Warehouse has to cater to ad hoc and complex queries, it uses a special type of architecture called “Multi-Dimensional Data Bases (MDDB)”, which can be implemented using relational technology with a star schema.

 

Multi-dimensional data may reside in spreadsheets, relational databases, or legacy data. The data-access and analysis tools must be able to take enterprise data from a variety of sources and give work groups the accessibility, power, and flexibility they need to view it in every conceivable way. Only multidimensional analysis provides a clear picture of the business at any given time. OLAP is the answer.
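To make the idea of viewing data along several dimensions concrete, here is a small sketch of two common multidimensional operations: rolling up (aggregating away dimensions) and slicing (fixing one dimension to a value). The fact rows, the three dimension names and the function names are assumptions made for this example, and the code is plain Python rather than any OLAP product's interface.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales amount)
FACTS = [
    ("North", "Motor", "Q1", 100),
    ("North", "Health", "Q1", 60),
    ("North", "Motor", "Q2", 120),
    ("South", "Motor", "Q1", 80),
    ("South", "Health", "Q2", 90),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, keep):
    """Aggregate the sales measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_cube(facts, dimension, value):
    """Keep only the facts where one dimension has a fixed value."""
    position = DIMENSIONS.index(dimension)
    return [row for row in facts if row[position] == value]
```

For instance, roll_up(FACTS, ["region"]) gives total sales per region, while roll_up(slice_cube(FACTS, "quarter", "Q1"), ["region"]) gives the same view restricted to the first quarter. In a relational star schema the same operations would typically be expressed as GROUP BY and WHERE clauses over the fact table.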

 

Characteristics of OLAP

 

·         The ability to scale to large volumes of data and large numbers of concurrent users.

·         Fast, interactive response time.

·         Support for analyzing time series.

·         Support for ‘What if’ analysis and planning in a multi-user read/write environment.

·         Robust data access security and user management.

·         Availability of a wide variety of viewing and analysis tools, and support for different user communities.

 

Data Mining

Data Mining is the process of extracting hidden information from databases. It also helps in predicting future trends and behaviour, allowing businesses to make pro-active, knowledge-driven decisions.

Data Mining is often viewed as a corollary to Data Warehousing because of the necessity to integrate and derive new information that transactional systems do not provide.

 

Data Warehousing builds the mountain of data. Data Mining sifts the mountain down to the essential information that is useful to the business. The metaphor is that nuggets of gold lie hidden in the mountain of data, and Data Mining can find the gold, which would otherwise be too costly or too difficult to find.

 

Data Mining Techniques

 

·         Classification: This technique is used to classify database records into a number of predefined classes based on certain criteria. For example, a bank wants to classify its customer records as good, medium, or poor risk based on the attributes income and age. A generated rule could be that a customer aged between 50 and 60 with an income greater than Rs. 50,000 is a good credit risk.

 

·         Association Rules: These techniques are often used for market basket analysis and discover rules that are hidden between the attributes. For example, 60% of all customers who buy coke also buy a particular second product on Friday afternoon. The percentage of occurrence is the confidence factor.

 

·         Sequencing: This technique helps identify patterns in time series. This is useful for stock market predictions or for catalogue companies. For example, a catalogue company might discover that buyers of toys buy learning software for children five years later.

 

·         Decision Tree: It is a predictive model that can be perceived as a tree. Each branch of the tree is a classification question, and the leaves are the partitions with their classifications.

 

·         Neural Networks: Neural networks try to simulate the human brain. The nodes of a neural network are connected, and every connection has a weight. The difference between the network's output and the training output is called the error and is “Back-Propagated” through the net. The weights are adjusted according to the error. Neural networks can handle noise in the data and are good for most problems, but the knowledge they hold cannot be analysed as it can for decision trees. Therefore, neural networks can be treated only as a black box.

·         Genetic Algorithms: Genetic algorithms simulate biological evolution. The attributes are coded like DNA. A large number of individuals are generated, and from generation to generation they change their DNA with operators like mutation and crossover. The survival-of-the-fittest principle selects only individuals that are better than the generation before.

 

·         Rule Induction (Association Rules): All possible patterns in the database are systematically pulled out and then the accuracy (confidence) and the coverage (support) are calculated.  The rules are easy to understand.
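The accuracy (confidence) and coverage (support) mentioned for association rules and rule induction can be computed directly from a set of transactions. The sketch below is a brute-force illustration under stated assumptions: the baskets are invented, only single-item rules of the form A -> B are considered, and the function names are hypothetical. Practical tools use more efficient search strategies.

```python
from itertools import permutations

# Hypothetical market baskets; each set is one customer transaction.
BASKETS = [
    {"coke", "chips"},
    {"coke", "chips", "bread"},
    {"coke", "bread"},
    {"chips", "bread"},
    {"coke", "chips"},
]


def count(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Coverage: fraction of all baskets that contain the items."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Accuracy: of the baskets with the antecedent, the fraction that also hold the consequent."""
    with_antecedent = count(baskets, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never applies, so it has no confidence
    both = count(baskets, set(antecedent) | set(consequent))
    return both / with_antecedent


def induce_rules(baskets, min_support, min_confidence):
    """Enumerate single-item rules A -> B that pass both thresholds."""
    products = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(products, 2):
        if (support(baskets, {a, b}) >= min_support
                and confidence(baskets, {a}, {b}) >= min_confidence):
            rules.append((a, b))
    return rules
```

In this invented data, coke and chips appear together in three of five baskets (support 0.6), and three of the four coke baskets also contain chips (confidence 0.75). Raising either threshold in induce_rules reduces the number of rules that survive.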

 

Data Mining Technology & CRM

 

Data mining plays a leading role in every facet of CRM. Only through the application of Data Mining techniques, which are part of Analytical CRM, can one turn the large volumes of data collected from front-end systems such as operational CRM/CIC into meaningful customer knowledge.

 

At this stage it is pertinent to understand the difference between (i) Data, (ii) Information and (iii) Knowledge.

 

For example, a number like 3000 is data. Adding context to data converts it into information. Let us add a context, i.e. Rs, to the number 3000, making it Rs 3000. Rs 3000 conveys much more meaning than the number 3000.

 

A rule contains more context, in a relative sense, than information. For example:

If the salary of a person is Rs 3000 p.m., he is poorly paid.

 

A generalized rule represents knowledge. For example:

If the salary of a person is between Rs 2000 and Rs 5000 p.m., he is poorly paid.

 

So, in essence, data mining technology converts the data contained in a database into rules and then into generalized rules, which are knowledge.

 

In other words, it summarizes the information contained in thousands of records into a few rules, say 20 or 30, which are more actionable and meaningful.
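The progression from data to information to knowledge can be illustrated with the salary example used above. This is only a sketch: the salary figures and function names are invented, and the rule is written by hand here, whereas a data mining tool would derive such a rule from the records.

```python
from collections import Counter

# Hypothetical monthly salaries in Rs; on their own these numbers are just data.
SALARIES = [3000, 4500, 12000, 2500, 8000, 5000]


def to_information(amount):
    """Attach context (currency and period) to a bare number."""
    return f"Rs {amount} p.m."


def poorly_paid(amount, low=2000, high=5000):
    """Generalized rule from the text: a salary within [low, high] is 'poorly paid'."""
    return low <= amount <= high


def summarise(salaries):
    """Collapse many records into counts per rule outcome."""
    return Counter("poorly paid" if poorly_paid(s) else "other" for s in salaries)
```

Here summarise(SALARIES) reduces six records to two counts, which mirrors on a very small scale how a handful of rules can stand in for thousands of records.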

 

Major techniques for converting Data into information/knowledge are:

 

 

Basic Steps in Data Mining