
Date: May 2019 | Marks available: 2 | Reference code: 19M.2.HL.TZ0.4
Level: HL | Paper: 2 | Time zone: no time zone
Command term: Outline | Question number: 4 | Adapted from: N/A

Question

ZCC has a chain of offices that sell different types of paper to customers all over the world. They have data stored in their data warehouses that will help them make important marketing decisions for the future, as they have plans to diversify into other products like gift-wrappers, scribble-pads, stationery, books and calculators.

ZCC is going to use data mining techniques to discover patterns in their data.

The company has customers who have missed the payment deadline for their purchases from ZCC.

Outline why data warehousing is time dependent.

[2]
a.i.

Outline one reason why ZCC uses a data warehouse.

[2]
a.ii.

Outline why transformation of the data is necessary prior to it being loaded into the data warehouse.

[2]
b.

Compare cluster analysis and classification as techniques for discovering patterns in ZCC's data.

[6]
c.

Describe how the process of deviation detection can be applied to identify customers who are likely to miss the payment deadline for their purchases from ZCC.

[3]
d.

ZCC is aware that other data mining and detection techniques will allow more informed marketing decisions to be made.

Explain how database segmentation and link analysis can be used by ZCC to improve their marketing strategies.

[5]
e.

Markscheme

Award [2 max].
Data warehouses contain both historical and current data;
Timestamps are required to compare data from different times;
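
A minimal sketch of the timestamp point, assuming a hypothetical list of sales rows (the field layout and figures are illustrative, not ZCC data). Because every row carries a date, totals from different periods can be compared:

```python
from collections import defaultdict
from datetime import date

# Hypothetical warehouse rows: (sale date, product, amount). Illustrative only.
SALES = [
    (date(2017, 3, 1), "A4 paper", 120.0),
    (date(2017, 9, 15), "A4 paper", 80.0),
    (date(2018, 4, 2), "A4 paper", 150.0),
    (date(2018, 11, 20), "A4 paper", 100.0),
]


def total_by_year(rows):
    """Sum sale amounts per calendar year using each row's timestamp."""
    totals = defaultdict(float)
    for sold_on, _product, amount in rows:
        totals[sold_on.year] += amount
    return dict(totals)


totals = total_by_year(SALES)
change = totals[2018] - totals[2017]  # year-on-year comparison
```

Without the date on each row, the historical and current figures could not be separated, so no trend could be calculated.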

a.i.

Award [2 max].
Increased Query and System Performance;
A data warehouse is built for analysis and retrieval of data rather than efficient upkeep of individual records (i.e. transactions);

Timely Access to Data;
ETL (extract, transform, load) routines are used within a data warehouse environment. These routines consolidate data from multiple source systems and transform the data into a useful format that enables quick querying;

Enhanced Data Quality and Consistency;
Data from the various business units and departments is standardized and the inconsistent nature of data from the different sources is removed;
Individual business units will start to utilize the same data repository as the source system for their individual queries and reports;

Historical Intelligence;
A data warehouse stores large amounts of historical data, which supports time-period analysis, trend analysis and trend prediction, thus allowing for advanced reporting and analysis across multiple time periods;

a.ii.

Award [2 max].
When data is collected from different sources (for example, two different data sources A and B), each source will have its own standards;
Selection of the data that is going to be useful in analysis: the different offices will hold data relevant only to them, such as staff names, etc.;
Standardization of the data: the company may have imposed a standard on all its offices, but this is not always the case, and the data will certainly have to be checked, e.g. date formats are different in different countries;
Other transformation techniques that students may give as examples are more like:

b.
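
The selection and standardization points for part b can be sketched as below. This is only an illustration: the office codes, date formats and field names are assumptions, not part of the question.

```python
from datetime import datetime

# Assumed per-office date formats; a real warehouse would document these per source.
OFFICE_DATE_FORMATS = {
    "UK": "%d/%m/%Y",
    "US": "%m/%d/%Y",
    "JP": "%Y/%m/%d",
}


def to_iso_date(raw, office):
    """Convert an office-specific date string to ISO 8601 (YYYY-MM-DD)."""
    fmt = OFFICE_DATE_FORMATS[office]
    return datetime.strptime(raw, fmt).date().isoformat()


def transform(record):
    """Keep only the fields useful for analysis and standardize the date."""
    return {
        "customer_id": record["customer_id"],
        "order_date": to_iso_date(record["order_date"], record["office"]),
        "amount": float(record["amount"]),
    }


# Hypothetical source row; "staff_name" is office-local data that is not selected.
raw = {"customer_id": 7, "office": "US", "order_date": "03/04/2019",
       "amount": "19.50", "staff_name": "local only"}
clean = transform(raw)
```

The same string "03/04/2019" means 4 March in the assumed US format and 3 April in the assumed UK format, which is why the data cannot be loaded unchanged.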

Award [6 max].
Award [1] for a description of cluster analysis
Award [1] for description of cluster analysis being used to find patterns
Award [1] for a description of classification
Award [1] for a description of classification being used to find patterns
Award up to [2] for comparison between the two techniques

Cluster Analysis
Cluster analysis groups customers by age / according to different factors, such as region/location;
Therefore, it enables comparison between groups;

Classification
A classifier or model is developed using training sets of data;
New data is then added to the model and compared against the predicted outcomes / new classifiers may be developed;

Discussion
Classification requires prior knowledge of the customer base, cluster analysis does not;
New samples can be assigned to a class using classification, whereas cluster analysis only suggests groups based upon patterns in the data;
Labelled samples from a set of classes are required for classification, whereas for cluster analysis unlabelled samples will do;
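
The contrast can be illustrated with a minimal sketch. It assumes one numeric attribute per customer (annual spend, with made-up values) and swaps in two very simple techniques, one-dimensional k-means for clustering and a nearest-mean model for classification, purely for illustration:

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster unlabelled numbers: assign each to its nearest centroid, then recompute."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups


def train_classifier(labelled):
    """Build a nearest-mean model from labelled (value, label) training pairs."""
    by_label = {}
    for value, label in labelled:
        by_label.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in by_label.items()}


def classify(model, value):
    """Assign a new sample to the label whose training mean is closest."""
    return min(model, key=lambda label: abs(value - model[label]))


# Hypothetical annual spend per customer (illustrative values, no labels).
spend = [100, 120, 110, 900, 950, 1000]
centroids, groups = k_means_1d(spend, centroids=[0.0, 500.0])

# Classification needs labelled training data before a new sample can be classed.
training = [(100, "small"), (120, "small"), (900, "large"), (1000, "large")]
model = train_classifier(training)
label = classify(model, 870)
```

Clustering receives only the numbers and suggests two groups; classification is given the class names in advance and uses them to label a customer it has not seen before.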

c.

Award [3 max].
Access the customer payment details for purchases of fairly large orders from the data warehouse;
For a given group of customers (and for a particular period / a given range of timestamps);
Run the deviation detection algorithm (multiple if-then-else statements) to identify the outliers, i.e. customers who have defaulted on payment more than a fixed number of times;
List the names of those customers;
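
The steps above can be sketched as follows, assuming hypothetical payment records and an assumed threshold `max_late`; the customer names and dates are invented for illustration:

```python
from collections import Counter
from datetime import date

# Hypothetical payment records: (customer, due date, paid date). Illustrative only.
PAYMENTS = [
    ("Acme", date(2019, 1, 10), date(2019, 1, 9)),
    ("Acme", date(2019, 2, 10), date(2019, 2, 10)),
    ("Birch", date(2019, 1, 15), date(2019, 2, 1)),
    ("Birch", date(2019, 2, 15), date(2019, 3, 3)),
    ("Birch", date(2019, 3, 15), date(2019, 3, 14)),
    ("Cedar", date(2019, 1, 20), date(2019, 1, 25)),
    ("Birch", date(2018, 6, 15), date(2018, 7, 1)),  # outside the chosen period
]


def late_payers(payments, start, end, max_late):
    """Names of customers with more than max_late late payments due in [start, end]."""
    late = Counter()
    for customer, due, paid in payments:
        if start <= due <= end and paid > due:
            late[customer] += 1
    return sorted(name for name, count in late.items() if count > max_late)


flagged = late_payers(PAYMENTS, date(2019, 1, 1), date(2019, 12, 31), max_late=1)
```

Customers returned in `flagged` deviate from the norm of paying on time and are the ones most likely to miss a future deadline.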

d.

Award [5 max].
Award [1] for an outline of customer segmentation
Award [1] for an outline of link analysis
Award up to [3 max] for use by ZCC to improve marketing strategies

Customer segmentation
Divides a customer base into groups of individuals that are similar in specific ways relevant to marketing, such as age, gender, type of paper ordered, frequency of orders placed and the size of the orders normally placed;
Assign each customer to one of the segments;
A typical segmentation makes each segment distinct from other segments (different segments have different needs) and homogeneous within the segment (members exhibit common needs);
Segmenting uses borders to form groups;
Segmentation groups objects into similar groups;
The resulting groups contain members that are more similar to each other than they are to other groups;
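
A minimal sketch of segmenting with borders, assuming a single attribute (orders per year) and made-up border values; real borders would come from ZCC's own analysis:

```python
from collections import defaultdict


def segment(orders_per_year):
    """Assign a customer to exactly one segment using fixed (assumed) borders."""
    if orders_per_year >= 12:
        return "frequent"
    if orders_per_year >= 4:
        return "regular"
    return "occasional"


# Hypothetical customers: name -> orders placed per year.
CUSTOMERS = {"Acme": 24, "Birch": 6, "Cedar": 1, "Dune": 12, "Elm": 3}

segments = defaultdict(list)
for name, orders in CUSTOMERS.items():
    segments[segment(orders)].append(name)
```

Each segment could then receive its own marketing, for example bulk discounts for the "frequent" group and reminder offers for the "occasional" group.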

Link Analysis
Load a claim and run a query back to the database to find all other claims sharing any similar attributes;
Show matches on the address of a claimant in the original case being investigated;
Combine the matches by merging identical nodes, so that unusual connections are easier to see;
Once a suspicious link has been seen, accept or escalate it;
Representing data as a network offers an engaging way for analysts to rapidly understand events;
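
The matching step can be sketched as a small network of records linked by shared attribute values. The record ids, attributes and values below are hypothetical; for ZCC the same idea could link customers or products instead of claims:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records; ids and attribute values are illustrative.
RECORDS = {
    "R1": {"address": "1 Mill Lane", "phone": "555-0101"},
    "R2": {"address": "1 Mill Lane", "phone": "555-0199"},
    "R3": {"address": "9 Oak Road", "phone": "555-0101"},
    "R4": {"address": "4 Elm Street", "phone": "555-0142"},
}


def find_links(records):
    """Return {(id_a, id_b): [shared attributes]} for records sharing any attribute value."""
    by_value = defaultdict(list)
    for rec_id, attrs in records.items():
        for attr, value in attrs.items():
            by_value[(attr, value)].append(rec_id)
    links = defaultdict(list)
    for (attr, _value), ids in by_value.items():
        # Every pair of records sharing this value becomes an edge in the network.
        for a, b in combinations(sorted(ids), 2):
            links[(a, b)].append(attr)
    return dict(links)


links = find_links(RECORDS)
```

Here R1 is connected to R2 by address and to R3 by phone, while R4 has no links; an analyst would inspect such connections rather than scan the raw rows.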

e.

Examiners report

Most candidates were able to identify why data warehouses required timed data.

a.i.

Many candidates were unable to clearly explain why data warehouses are advantageous for a company.

a.ii.

Most candidates were able to describe why the data from different data sources needs to be standardized before being loaded into a data warehouse.

b.

Many candidates were able to give generic descriptions of cluster analysis and classification, but they were unable to discuss the differences between the two in any detail.

c.

Most candidates demonstrated a good understanding of data deviation in this situation.

d.

The majority of candidates were only able to discuss data segmentation and link analysis at an abstract level.

e.

Syllabus sections

Option A: Databases » A.4 Further database models and database analysis