Date | May 2018 | Marks available | 2 | Reference code | 18M.2.HL.TZ0.4 |
Level | HL | Paper | 2 | Time zone | no time zone |
Command term | Outline | Question number | 4 | Adapted from | N/A |
Question
Basking Coats is a business that sells textile products such as shirts, coats and trousers. The company was formed in 1970 and has numerous shops in Europe and South-East Asia. To ensure their marketing is targeted at appropriate customers, Basking Coats has asked Singalytics, a data analytics company, to assist them in improving their marketing strategy.
Extract, Transform, Load (ETL) processes can be used to clean up data for use in a data warehouse. When ETL is carried out, certain precautions should be taken.
Data in the Singalytics data warehouse is stored with a timestamp.
Describe how deviation detection can be used to analyse this data.
Outline why data warehouses tend to use unnormalized data sets.
Identify three precautions to be taken before extraction is carried out on the database.
Outline why data warehousing is time dependent.
Explain why Basking Coats could use association analysis to improve the marketing of its products.
Basking Coats has decided to use an object-oriented database rather than a relational database to store its data.
Explain why Basking Coats would use an object-oriented database rather than a relational database to store its data.
Markscheme
Award up to [2 max].
Deviation detection is a statistical technique;
Which is used to detect outlying data that does not fit the assumed model;
Therefore it can be used to predict the trends and patterns of demand for certain consumer goods in the future;
Allowing appropriate marketing of the product;
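As an illustration of the marking points above, the following is a minimal sketch of deviation detection on timestamped data. It assumes a very simple model (daily sales stay close to the mean) and flags any value more than two standard deviations away. The sales figures, the `find_deviations` name and the threshold are invented for illustration and are not part of the markscheme.

```python
from statistics import mean, stdev

# Hypothetical timestamped daily sales figures (invented for illustration).
daily_sales = [
    ("2018-05-01", 120),
    ("2018-05-02", 115),
    ("2018-05-03", 130),
    ("2018-05-04", 125),
    ("2018-05-05", 410),  # unusually high day
    ("2018-05-06", 118),
    ("2018-05-07", 122),
]


def find_deviations(records, threshold=2.0):
    """Return the records whose value lies more than `threshold`
    sample standard deviations away from the mean of all values."""
    values = [value for _, value in records]
    mu = mean(values)
    sigma = stdev(values)
    return [(ts, value) for ts, value in records
            if abs(value - mu) > threshold * sigma]


outliers = find_deviations(daily_sales)
print(outliers)
```

Because every record keeps its timestamp, an outlier can be traced back to a specific date, for example a promotion or a seasonal event.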
Award up to [3 max].
To speed query execution, which can be especially important in data warehouses used by Singalytics;
Unnormalized data sets may effectively be used in data warehouses as they contain pre-joined tables that package data for common uses;
If the data is all present in a single table, there will be no need for joins, hence the selects can be done very quickly;
A single table with all the required data allows much more efficient index usage;
Suitable when there is a heavy read load, i.e. when the application is read intensive;
If the columns are indexed properly, then results can be filtered and sorted by utilizing the same index;
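The points about pre-joined tables can be sketched with Python's built-in `sqlite3` module. The table names, columns and rows below are hypothetical. The sketch only shows that a single unnormalized table answers the same question without a join; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized form: product details are held in a separate table.
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
cur.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, product_id INTEGER, shop TEXT, qty INTEGER)")
cur.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(1, "Cotton shirt", "Shirts"), (2, "Wool coat", "Coats")])
cur.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)",
                [(1, 1, "Paris", 3), (2, 2, "Paris", 1), (3, 1, "Hanoi", 5)])

# Unnormalized (pre-joined) form: one wide table, as a warehouse might hold it.
cur.execute("""
    CREATE TABLE sale_flat AS
    SELECT s.id AS sale_id, s.shop, s.qty, p.name AS product_name, p.category
    FROM sale s JOIN product p ON p.id = s.product_id
""")
# One index on the flat table can serve both filtering and sorting by category.
cur.execute("CREATE INDEX idx_flat_category ON sale_flat (category)")

# The normalized query needs a join ...
joined = cur.execute("""
    SELECT p.category, SUM(s.qty)
    FROM sale s JOIN product p ON p.id = s.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# ... while the flat table is read directly.
flat = cur.execute("""
    SELECT category, SUM(qty) FROM sale_flat
    GROUP BY category ORDER BY category
""").fetchall()

print(joined, flat)
conn.close()
```

The trade-off is redundancy: the product name and category are repeated on every row, which is acceptable in a warehouse that is read far more often than it is updated.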
Award up to [3 max].
To retrieve all the required data from the source system with as few resources as possible;
The extraction should be designed in a way that it does not affect the source system;
In terms of performance/response time/locking;
Makes the data accessible for further processing;
Ensuring that historical data being extracted can be read by the current systems;
Ensuring the different data formats being extracted can all be converted or scrubbed to become readable by the system and able to be formatted;
Ensuring that the data is relevant to what the user wishes to extract and utilise;
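The precaution about differing data formats can be illustrated with a small check run before extraction. This is only a sketch: the list of date formats in `KNOWN_FORMATS` and the `normalise_date` helper are assumptions made for the example, not part of any real ETL tool.

```python
from datetime import datetime

# Hypothetical date formats found in historical source records.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(raw):
    """Return the date as an ISO 8601 string, or None when the value
    matches none of the known formats and needs manual scrubbing."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


samples = ["1998-03-07", "07/03/1998", "07.03.1998", "unknown"]
cleaned = [normalise_date(s) for s in samples]
print(cleaned)
```

Values that come back as `None` can be reported before the extraction starts, rather than failing part-way through the load.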
Award up to [2 max].
The content in the data warehouse is only valid for a time period;
Because the data undergoes changes dynamically;
A data warehouse's focus on change over time makes it time variant;
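One way to picture time dependence is a history of timestamped values, where each value is valid only until the next one takes effect. The sketch below assumes a hypothetical price history for a single product; `price_as_of` is an illustrative helper name.

```python
from datetime import date

# Hypothetical price history: each price is valid from its date
# until the date of the next entry.
price_history = [
    (date(2018, 1, 1), 40.0),
    (date(2018, 3, 1), 45.0),
    (date(2018, 5, 1), 35.0),
]


def price_as_of(history, when):
    """Return the most recent price that took effect on or before `when`,
    or None if no price was recorded by that date."""
    current = None
    for valid_from, price in sorted(history):
        if valid_from <= when:
            current = price
        else:
            break
    return current


print(price_as_of(price_history, date(2018, 4, 15)))
```

The same query gives different answers for different dates, which is why each stored value has to carry its timestamp.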
Award up to [4 max].
Award [2 max] for the explanation and [2 max] for the example(s).
Associations:
Correlate the presence;
of a set of items with another range of values for another set of variables;
Breaks up data sets by variables such as gender, location, age;
It may be used to detect patterns independently from the geographic region,
for example, females buy more dark colour trousers than males;
Examples:
When a female retail shopper buys a cotton shirt, she is likely to buy a stole;
Associations of the type Full arm formal shirts => Dark colour trousers; Full arm formal shirts => cufflinks may produce enough confidence and support to be valid association rules of interest;
If the application area has a natural classification of the item sets into hierarchies,
discovering associations within the hierarchies is of no particular interest;
It is the associations across hierarchies that are of specific interest;
Note: For generic responses award [2 max].
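The terms support and confidence used in the examples above can be made concrete with a short sketch. The baskets are invented for illustration. Support is taken here as the fraction of transactions containing an item set, and confidence as the support of the combined item set divided by the support of the left-hand side.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"full arm formal shirt", "dark colour trousers", "cufflinks"},
    {"full arm formal shirt", "dark colour trousers"},
    {"full arm formal shirt", "cufflinks"},
    {"cotton shirt", "stole"},
    {"dark colour trousers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent` (0.0 if the antecedent never occurs)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / base


rule_support = support({"full arm formal shirt", "dark colour trousers"}, baskets)
rule_confidence = confidence({"full arm formal shirt"},
                             {"dark colour trousers"}, baskets)
print(rule_support, rule_confidence)
```

A rule such as Full arm formal shirts => Dark colour trousers would be kept only if both values exceed thresholds chosen by the analyst.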
Award up to [6 max].
Award [3 max] for the feature(s) and [3 max] for supporting explanation/examples.
Features:
Enhanced modelling capabilities;
Extensibility/support new data types;
Object DBMS stores more complex data and relationships;
Improved performance;
Reusability;
Eliminates need for user defined keys;
Eliminates need for Joins;
Examples:
Through the inheritance property, attributes (size, fabric) and functionality can be re-used;
It reduces the cost of maintaining the same data multiple times;
All this information is encapsulated, so there is no fear of it being misused by other objects. If a new feature is needed, a new class that inherits from the parent class can easily be added to provide it;
Reduces the overhead and maintenance costs;
So the database is more flexible when changes are needed, as the apparel business changes with fashion;
Code is re-used because of the inheritance feature;
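The inheritance and re-use points can be sketched in a few lines of Python. The class names and the `hooded` and `sleeve` attributes are hypothetical; only `size` and `fabric` come from the markscheme. The sketch shows the object model only, not how an object-oriented database would persist it.

```python
class Garment:
    """Parent class holding the attributes shared by all products."""

    def __init__(self, size, fabric):
        self.size = size
        self.fabric = fabric

    def describe(self):
        # Inherited unchanged by every subclass.
        return f"{type(self).__name__}: size {self.size}, {self.fabric}"


class Coat(Garment):
    """Re-uses size and fabric and adds one coat-specific attribute."""

    def __init__(self, size, fabric, hooded):
        super().__init__(size, fabric)
        self.hooded = hooded


class Shirt(Garment):
    """A new product type is added without changing Garment or Coat."""

    def __init__(self, size, fabric, sleeve):
        super().__init__(size, fabric)
        self.sleeve = sleeve


coat = Coat("M", "wool", hooded=True)
shirt = Shirt("L", "cotton", sleeve="full arm")
print(coat.describe())
print(shirt.describe())
```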