Data Warehouse, Data Mining and CRM
Dr. S. Chandrasekhar
Chair Professor, Chairman, IT and Operations Management Group and MCA, and
Director, Software Development Center, Fore School of Management, New Delhi
& R. Das
CRM, or Customer Relationship Management, is a business strategy by which a
company identifies and manages its customers while reducing the company's costs
and increasing profitability. CRM redesigns the basic functional activities of a
company, which in turn demands re-engineering of work processes. For CRM to
succeed, it is necessary to bring together all the information about the
customer from all possible sources (within and outside the organization) and
process it in such a manner that the organization gets a real picture of
customer behaviour. This helps management, sales, marketing and service staff
to react to the customer accordingly and keep pace with the competition. It is
practically impossible to consider these processes without addressing
technology. CRM applications can enable effective Customer Relationship
Management, provided that an organization has the right business strategy.
Three factors are responsible for the successful implementation of CRM: People,
Process and Technology.
People: Everyone in the organization, from the top layer of employees to the
bottom layer, should support and promote CRM.
Process: Business processes should be re-engineered.
Technology: The right technology should be chosen to drive these new processes
and to provide real-time customer analysis to the employees.
If the business strategy is clearly defined, it is
the technology that can drive the business and customer services more
efficiently with the full co-operation of people.
Data Mining plays an important role in the
implementation of Analytical CRM. Analytical CRM comprises all programming that
analyzes data about an enterprise's customers and presents it so that better
and quicker business decisions can be made by employing online analytical
processing (OLAP) and Data Mining tools.
Before getting into the details of Data Mining, let us discuss how a data warehouse is built.
Data warehousing means maintaining a central repository of all critical data,
which helps managers to take decisions based on authentic information. Building
a data warehouse is not an easy task. The type of data to be kept in a data
warehouse is a pivotal issue for the organisation, since this exercise involves
a lengthy and tedious process of consolidating all back-end data from different
databases.
A Data Warehouse supports business analysis and
decision-making process by creating an integrated database of consistent,
subject-oriented and historical information. A Data warehouse is a database,
populated from the existing data source system and used for reporting, analysis &
business intelligence purposes. It integrates data from any kind of data
structure including external data into one consolidated database. By
transforming and integrating data into meaningful information, a data warehouse
helps business analysts in taking business decisions. Business trends can be
forecast by analyzing the historical data.
According to W. H. Inmon, the ‘father’ of data warehousing, and other
authorities in the industry, a data warehouse is a subject-oriented,
integrated, time-variant, non-volatile collection of data in support of
management decisions.
Implementing a Data Warehouse provides significant
benefits -- many tangible, some intangible.
They are:
Data Warehouses have a
distinct structure. There are different levels of summarization and detail that
describe the Data Warehouse. The different components of the Data Warehouse
are:
· Meta Data: Meta Data is an important component of the data warehouse that
lets the user know what is contained in the warehouse and, in fact, guides the
user through the warehouse. To handle the links between stored data, the
relationships have to be defined; these definitions are the Metadata, whereas
the data itself is the Facts. Without metadata, data is likely to be useful
only to the persons who collected it in the first place, as they are the only
ones who understand the frame of reference that goes with the data. So,
Metadata is "data about data": it is not the actual dataset, but it answers the
"who, what, where, when, why and how" questions about the dataset.
Meta Data can be classified into two categories:
· Technical Meta Data
    · Information about data sources
    · Transformation description
    · Warehouse object and data structure definitions
    · Rules to perform data clean up and data enhancement
    · Data mapping
    · Access authorization and history
· Business Meta Data
    · Subject areas and information object type
    · Internet Homepages
    · Other information to support all data warehousing components
    · Data warehouse operational information
A data mart is a repository of data gathered from operational data and other
sources that is designed to serve a particular community of knowledge workers.
In scope, the data may derive from an enterprise-wide database or data
warehouse or be more specialized. The emphasis of a data mart is on meeting the
specific demands of a particular group of knowledge users in terms of analysis,
content, presentation, and ease-of-use. Users of a data mart can expect to have
data presented in terms that are familiar.
In practice, the terms data mart and data
warehouse each tend to imply the presence of the other in some form.
However, most writers using the term seem to agree that the design of a data
mart tends to start from an analysis of user needs and that a data warehouse
tends to start from an analysis of what data already exists and how it can be
collected in such a way that the data can later be used. A data warehouse is a
central aggregation of data (which can be distributed physically); a data mart
is a data repository that may derive from a data warehouse or not and that
emphasizes ease of access and usability for a particular designed purpose. In
general, a data warehouse tends to be a strategic but somewhat unfinished concept;
a data mart tends to be tactical and aimed at meeting an immediate need.
In general, building any data warehouse requires the following steps:
· Data Warehouse Design: The organisation has to choose an appropriate
architecture for designing the Data Warehouse; the options are Star, Snowflake,
and Entity-Relationship. It also has to identify the summarised data to be
pulled from the transactional database into the data warehouse.
· Extracting Transactional Data: Data is extracted from internal and external
sources, transformed as per business rules into a structured and standardised
format, and then loaded into the Data Warehouse.
· Data Availability: Data has to be available for analytic and decision-making
applications.
· Data Warehouse Strategy: The strategy should help in achieving business
goals, and the data warehouse should be flexible enough to conform to business
needs as and when the business strategy changes. The strategy should also state
how long data is to be archived.
· Data Quality: The most important aspect of a data warehouse is data quality
and maintenance. The data pumped into the data warehouse should match the
business rules, and its quality should be measured. If the required quality is
not achieved, the data transformation process must be changed to ensure that
good-quality data is pushed into the data warehouse.
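The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not a prescribed implementation: the table names (`dim_customer`, `dim_product`, `fact_sales`), the sample rows and the single data-quality rule are all assumptions made for the example. It uses an in-memory SQLite database to build a tiny star schema, extracts and transforms some hypothetical transactional rows, rejects rows that fail the quality rule, and loads the rest.

```python
import sqlite3

# Hypothetical source rows, as they might arrive from a transactional system.
raw_sales = [
    {"customer": " alice ", "product": "Toy", "date": "2004-01-05", "amount": "120.50"},
    {"customer": "Bob", "product": "toy", "date": "2004-01-05", "amount": "80"},
    {"customer": "Alice", "product": "Software", "date": "2004-02-10", "amount": "300"},
    {"customer": "Bob", "product": "Software", "date": "2004-02-11", "amount": "bad-value"},
]


def transform(row):
    """Apply simple (assumed) business rules: normalise names, parse amounts."""
    try:
        amount = float(row["amount"])
    except ValueError:
        return None  # data-quality rule: reject rows with unparseable amounts
    return {
        "customer": row["customer"].strip().title(),
        "product": row["product"].strip().title(),
        "date": row["date"],
        "amount": amount,
    }


def build_warehouse(rows):
    """Create a small star schema and load the transformed rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_sales (
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id INTEGER REFERENCES dim_product(product_id),
            sale_date TEXT,
            amount REAL
        );
    """)
    rejected = 0
    for raw in rows:
        row = transform(raw)
        if row is None:
            rejected += 1
            continue
        # Dimension rows are inserted once; duplicates are ignored.
        conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (row["customer"],))
        conn.execute("INSERT OR IGNORE INTO dim_product (name) VALUES (?)", (row["product"],))
        # The fact row stores only keys into the dimensions plus the measure.
        conn.execute(
            """INSERT INTO fact_sales
               SELECT c.customer_id, p.product_id, ?, ?
               FROM dim_customer c, dim_product p
               WHERE c.name = ? AND p.name = ?""",
            (row["date"], row["amount"], row["customer"], row["product"]),
        )
    conn.commit()
    return conn, rejected


conn, rejected = build_warehouse(raw_sales)
totals = conn.execute(
    """SELECT p.name, SUM(f.amount)
       FROM fact_sales f JOIN dim_product p USING (product_id)
       GROUP BY p.name ORDER BY p.name"""
).fetchall()
print(totals, "rejected rows:", rejected)
```

Counting the rejected rows is one simple way to measure data quality; a real warehouse would likely track many more rules and log the rejected records for correction.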
DATA WAREHOUSE ARCHITECTURE
Data Access Tools
The following are the front-end tools for user interaction. They support both
dynamic and preplanned analysis, and they use Meta Data for accessing the
warehouse. They can be classified into five main groups:
· Data query and Reporting Tools
· Application Development Tools
· Executive Information Systems Tools
· On-line Analytical Processing tools
· Data Mining tools
On-line Analytical Processing (OLAP)
Since a data warehouse has to cater to ad-hoc and complex queries, it uses a
special type of architecture called “Multi-Dimensional Data Bases (MDDB)”,
which can be implemented using relational technology with a star schema.
Multi-dimensional data may reside in spreadsheets, relational databases, or
legacy data. The data-access and analysis tools must be able to take enterprise
data from a variety of sources and give work groups the accessibility, power,
and flexibility they need to view it in every conceivable way. Only
multidimensional analysis provides a clear picture of the business at any given
time. OLAP is the answer.
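To make "viewing data in every conceivable way" more concrete, the following minimal sketch shows two common multidimensional operations, roll-up and slice, over a handful of records. The dimension names (`region`, `product`, `quarter`), the sample figures and the helper functions `roll_up` and `slice_cube` are assumptions for illustration only; a real OLAP tool would run such operations against the warehouse rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales amount).
facts = [
    ("North", "Toys", "Q1", 100),
    ("North", "Toys", "Q2", 150),
    ("North", "Software", "Q1", 200),
    ("South", "Toys", "Q1", 50),
    ("South", "Software", "Q2", 300),
]
DIMENSIONS = ("region", "product", "quarter")


def roll_up(records, by):
    """Sum the sales measure, keeping only the dimensions listed in `by`."""
    positions = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[i] for i in positions)
        totals[key] += rec[-1]
    return dict(totals)


def slice_cube(records, dimension, value):
    """Keep only the records where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in records if r[i] == value]


by_region = roll_up(facts, by=("region",))
by_region_product = roll_up(facts, by=("region", "product"))
q1_by_product = roll_up(slice_cube(facts, "quarter", "Q1"), by=("product",))

print(by_region)
print(by_region_product)
print(q1_by_product)
```

The same records answer three different questions (sales per region, per region and product, and per product within one quarter) without changing the underlying data, which is the essence of multidimensional analysis.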
Characteristics of OLAP
· The ability to scale to large volumes of data and large numbers of concurrent users.
· Provides fast, interactive response time.
· Provides for analyzing time series.
· Supports ‘What if’ analysis and planning in a multi-user read/write environment.
· Robust data access security and user management.
· Availability of a wide variety of viewing and analysis tools, and support for different user communities.
Data Mining is the process of extracting hidden information from databases.
Data Mining also helps in predicting future trends and behaviour, allowing
businesses to make pro-active, knowledge-driven decisions.
Data Mining is often viewed as a corollary to Data Warehousing because of the
necessity to integrate and derive new information that transactional systems do
not provide.
Data Warehousing allows building the Data Mountain. Data Mining allows sifting
the mountain down to the level of essential information that is useful to the
business. The metaphor here is that some nuggets of gold are hidden in the
mountain of data, and Data Mining can find the gold, which would otherwise be
too costly or too difficult to find.
Data Mining Techniques
· Classification: This technique is used to classify database records into a
number of predefined classes based on certain criteria. For example, a bank
wants to classify its customer records as good, medium, or poor risk based on
the attributes income and age. A generated rule could be that a customer in the
age group between 50 and 60 with an income greater than Rs.50,000 is a good
credit risk.
· Association Rules: These techniques are often used for market basket analysis
and discover rules that are hidden between the attributes. For example, 60% of
all customers who buy coke also buy a particular second item on Friday
afternoon. The percentage of occurrence is the confidence factor.
· Sequencing: This technique helps identify patterns in time series. This is
useful for stock market predictions or for catalogue companies. For example,
they might discover that buyers of toys buy learning software for children five
years later.
· Decision Tree: It is a predictive model that can be perceived as a tree. Each
branch of the tree is a classification question, and the leaves are the
partitions with their classifications.
· Neural Networks: Neural networks try to simulate the human brain. The nodes
of the neural network are connected, and every connection has a weight. The
difference between the output and the training output is called the error and
is “Back-Propagated” through the net. The weights are adjusted according to the
error. Neural networks can handle noise in the data and are good for most
problems, but the knowledge they hold cannot be analysed in the way it can for
decision trees. Therefore, neural networks can be treated as a black box only.
· Genetic Algorithms: Genetic algorithms simulate biological evolution. The
attributes are coded like DNA. A lot of individuals are generated, and from
generation to generation they change their DNA with operators like mutation and
crossover. The survival-of-the-fittest principle selects only individuals that
are better than the generation before.
· Rule Induction (Association Rules): All possible patterns in the database are
systematically pulled out, and then the accuracy (confidence) and the coverage
(support) are calculated. The rules are easy to understand.
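The coverage (support) and accuracy (confidence) of an association rule can be computed directly from transaction counts. The sketch below assumes a small, made-up list of market baskets; the item names and the helper functions `count`, `support` and `confidence` are hypothetical and only illustrate the arithmetic, not any particular mining tool.

```python
# Hypothetical market baskets; each set is one customer transaction.
baskets = [
    {"coke", "chips"},
    {"coke", "chips", "bread"},
    {"coke", "bread"},
    {"chips", "bread"},
    {"coke", "chips"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Coverage: fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Accuracy of the rule antecedent -> consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0  # the rule never fires, so it has no evidence
    both = count(set(antecedent) | set(consequent), transactions)
    return both / n_antecedent


rule_support = support({"coke", "chips"}, baskets)
rule_confidence = confidence({"coke"}, {"chips"}, baskets)
print(f"coke -> chips: support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

In this made-up data, three of the five baskets contain both items (support 0.60), and three of the four baskets containing coke also contain chips (confidence 0.75).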
Data mining plays a leading role in every facet of CRM. Only through the
application of Data Mining techniques, which are a part of Analytical CRM, can
one turn the large volumes of data collected from various front-end systems,
like operational CRM/CIC, into meaningful customer knowledge.
At this stage it is pertinent to understand the difference between (i) Data,
(ii) Information and (iii) Knowledge.
For example, a number like 3000 is data. Adding a context to data converts data
into information. Let us add context, i.e. Rs, to the number 3000, making it
Rs 3000. Rs 3000 conveys much more meaning than the number 3000.
A rule contains more context, in a relative sense, than information:
If the salary of a person is Rs 3000 p.m., he is poorly paid.
A generalized rule, for example
If the salary of a person is between Rs 2000 and Rs 5000 p.m., he is poorly paid,
represents Knowledge.
So in essence data mining technology converts the data contained in a database
into rules and generalized rules, which are knowledge.
In other words, it summarizes the information contained in thousands of records
into a few rules, say 20 or 30, which are more actionable and meaningful.
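The progression from data to information to knowledge can be illustrated with a short sketch. The `Information` class and the `poorly_paid` function are hypothetical names chosen for this example; the salary band is the one used in the generalized rule above, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Information:
    """A bare number plus the context that gives it meaning."""
    value: float
    unit: str    # e.g. "Rs"
    period: str  # e.g. "p.m."


def poorly_paid(salary: Information) -> bool:
    """Generalized rule from the text: Rs 2000 to Rs 5000 p.m. is poorly paid."""
    return (
        salary.unit == "Rs"
        and salary.period == "p.m."
        and 2000 <= salary.value <= 5000
    )


data = [3000, 4500, 12000]                                   # data: bare numbers
information = [Information(v, "Rs", "p.m.") for v in data]   # data plus context
labels = [poorly_paid(i) for i in information]               # knowledge applied as a rule
print(labels)
```

Here the rule is written by hand; a data mining tool would instead aim to derive such rules from the records themselves.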
Major techniques for converting Data into information/knowledge are: