Date | November 2018 | Marks available | 2 | Reference code | 18N.2.HL.TZ0.4 |
Level | HL | Paper | 2 | Time zone | no time zone |
Command term | Outline | Question number | 4 | Adapted from | N/A |
Question
The Ministry of Tourism intends to include data from a number of sources in a data warehouse in an attempt to improve the services the tourism companies offer to their customers.
The Ministry of Tourism will use data mining to provide the Government with information that it can use to promote and support tourism in the country.
Identify four sources of information that could be incorporated in the data warehouse.
Outline two differences between a data warehouse and a database.
Explain how the Extract, Transform, Load (ETL) processes can be used to address the problems related to data migration.
Outline one ethical problem that may result from data mining.
Explain how cluster analysis can be used to improve the advertising strategy of the tourism companies.
Explain the importance of link analysis in exploring patterns in data mining.
Markscheme
Award up to [4 max].
Award [1] for each source of information up to [4 max].
For example,
Sources of travel information such as airports, ports, bus/train;
Sources of personal data about tourists (such as age, gender, etc.);
Sources of social data about tourists (such as family status, occupation, economic circumstances, etc.);
Sources of information about the tourism products and services these tourists booked such as:
- accommodation (hotels, camps, apartments);
- attractions (natural parks, palaces, museums);
- activities (yachting, biking, diving, fishing, surfing);
Award up to [4 max].
Award [1] for identifying a difference between a data warehouse and a database and [1] for an expansion, up to [2 max] per difference.
Mark as [2] and [2].
Database is based on operational processing;
Whilst data warehouse is based on informational processing;
The operations on databases consist of transactions;
Whilst operations on data warehouse consist of queries;
Database is used for everyday transactions;
Whilst data warehouse is used for decision support;
Database mainly stores current data;
Whilst data warehouse stores historical data;
Data in the database is always up to date/accurate;
Whilst the data in data warehouse is maintained over time;
Data in the database is simple/primitive/detailed;
Whilst data warehouse holds summarized/consolidated data;
The type of access to data in database is read/write;
Whilst mostly read only access for the data stored in data warehouse;
Database size is smaller (for example, up to 10 GB);
Than the size of a data warehouse (for example, 100 GB to terabytes);
View of data in database is flat/relational;
Whilst the view of data in data warehouse is multidimensional;
Users (end-users) of the database are common users (for example customers, tourists, clerks, etc.);
Whilst end-users of data warehouse are knowledge users (for example, analysts, managers, executives);
The number of database users is greater;
Whilst the data in data warehouse is used by smaller number of people;
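The contrast between transactional (read/write) work and read-only summary queries can be illustrated with a minimal sketch. It assumes a single in-memory SQLite table standing in for both systems, which a real deployment would keep separate; the table, hotel names, and figures are hypothetical.

```python
import sqlite3

# One in-memory table stands in for both systems (a simplifying assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, hotel TEXT, year INTEGER, nights INTEGER)"
)

def record_booking(hotel, year, nights):
    """Operational (database-style) work: one small read/write transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO bookings (hotel, year, nights) VALUES (?, ?, ?)",
            (hotel, year, nights),
        )

def nights_per_year():
    """Informational (warehouse-style) work: a read-only summary of historical data."""
    return conn.execute(
        "SELECT year, SUM(nights) FROM bookings GROUP BY year ORDER BY year"
    ).fetchall()

# Hypothetical everyday transactions.
record_booking("Sea View", 2017, 3)
record_booking("Sea View", 2018, 7)
record_booking("Old Town Inn", 2018, 2)
```

Here `record_booking` handles current, detailed rows one at a time, while `nights_per_year` returns consolidated figures of the kind used for decision support.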
Award up to [3 max].
Extract data
Award [1 max] for an explanation of how the extraction process can be used to address the problems related to data migration.
Pulling data from different source databases / from various source systems (for example, MS Access, SQL Server, Oracle);
(Cleanse and) transform data (so that data is clean and useful for a purpose)
Award [1 max] for an explanation of how the transformation process can be used to address the problems related to data migration.
Trimming white space / converting to the proper data type / performing validation, etc.;
Load data
Award [1 max] for an explanation of how the load process can be used to address the problems related to data migration.
Transfer data into data warehouse/ data mart/operational data store so it can be used for the mining processes within the data warehouse;
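The three ETL stages above can be sketched as follows. This is an illustrative example only: the two source lists, their field names, and the `visits` table are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records from two source systems with inconsistent formats.
HOTEL_BOOKINGS = [
    {"tourist": "  Ana Silva ", "nights": "7", "country": "pt"},
    {"tourist": "Ben Okoro", "nights": "three", "country": "NG"},  # invalid nights
]
AIRPORT_ARRIVALS = [
    {"name": "Chen Wei  ", "stay_days": "4", "origin": " cn"},
]

def extract():
    """Extract: pull rows from each source and map them onto one field layout."""
    rows = [(r["tourist"], r["nights"], r["country"]) for r in HOTEL_BOOKINGS]
    rows += [(r["name"], r["stay_days"], r["origin"]) for r in AIRPORT_ARRIVALS]
    return rows

def transform(rows):
    """Transform: trim white space, convert types, drop rows that fail validation."""
    clean = []
    for name, nights, country in rows:
        name, nights, country = name.strip(), nights.strip(), country.strip().upper()
        if not name or not nights.isdigit():
            continue  # validation: skip an empty name or a non-numeric stay
        clean.append((name, int(nights), country))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (name TEXT, nights INTEGER, country TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl():
    conn = sqlite3.connect(":memory:")
    load(transform(extract()), conn)
    return conn.execute(
        "SELECT name, nights, country FROM visits ORDER BY name"
    ).fetchall()
```

Calling `run_etl()` loads only the two valid rows; the record with a non-numeric stay is rejected during transformation rather than being migrated into the warehouse.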
Award up to [2 max].
Award [1] for identifying an ethical problem that may result from data mining and [1] for an expansion, up to [2 max].
Data mining raises privacy concerns when people are traced;
Their actions are analyzed without their knowledge;
Data mining creates people/tourist/customer files with a tendency of judging and treating people on the basis of group characteristics;
Instead of on their own individual characteristics;
Data mining can provide information that could be used unethically;
For example, increasing profits by selling this information to others / by targeting return tourists with information obtained on their previous visits;
Award up to [4 max].
Award [2 max] for evidence that candidates understand how cluster analysis is used and [2 max] for an explanation related to tourism products/services.
Cluster analysis:
Subdivide tourists into distinct subsets;
Where any subset may be selected as a market target;
Explanation related to tourism products/services:
Subsets could be “young tourists with limited budget”;
For this subset, music festivals and sports events could be advertised to ensure the hotel can maximize its revenue;
Subset of tourists traveling with young children;
Children's activities / sports / games available / “kids hotels” could be advertised to ensure the hotel can maximize its revenue;
Subset of older tourists influenced by cost–value considerations and/or secure/or quiet environment;
So these hotels can be configured to meet their needs and promoted only to these tourists, for example by advertising “stay 7 days and pay for only 6” / advertising a hotel located in a quiet place / secure place / at some distance from the center;
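The subdivision of tourists into subsets can be sketched with a small k-means example. The markscheme does not name a clustering algorithm, so k-means is an assumption here, as are the tourist records (age, daily budget), the starting centroids, and the thresholds used to pick an advert.

```python
# Hypothetical tourist records: (age in years, daily budget in an arbitrary currency).
TOURISTS = [(19, 40), (22, 35), (24, 50), (63, 120), (68, 110), (71, 130)]

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from this point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of its cluster; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return clusters, centroids

def suggest_advert(centroid):
    """Map a cluster centre to an advert; the thresholds are illustrative assumptions."""
    age, budget = centroid
    if age < 30 and budget < 60:
        return "music festivals and sports events"
    return "quiet hotel: stay 7 days, pay for 6"

clusters, centroids = kmeans(TOURISTS, centroids=[(20, 40), (70, 120)])
adverts = [suggest_advert(c) for c in centroids]
```

Each resulting cluster can then be treated as a market target and shown only the advert chosen for it.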
Award up to [3 max].
Award [1] for identifying the importance of link analysis in exploring patterns in data mining and [1] for each subsequent expansion, up to [3 max].
The link analysis technique is used to analyse connections/links between small instances of relational databases / only on the required datasets of a database;
(But) it can also be applied to a large number of databases to show the relationships between these databases;
Link analysis is important because it discovers new relations in relational databases / new patterns of interest / checks the similarity between the datasets / finds anomalies where old/known patterns are violated;
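One simple form of link analysis counts how often two items occur together and treats the count as the strength of the link between them. The sketch below takes that approach; the booking data and the function name `link_strengths` are hypothetical, and real link analysis tools use richer graph techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: the set of attractions/activities each tourist booked.
BOOKINGS = {
    "tourist_1": {"museum", "palace", "diving"},
    "tourist_2": {"museum", "palace"},
    "tourist_3": {"diving", "surfing"},
    "tourist_4": {"museum", "palace", "surfing"},
}

def link_strengths(bookings):
    """Count how many tourists booked each pair of items together."""
    links = Counter()
    for items in bookings.values():
        # Sorting gives every pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(items), 2):
            links[pair] += 1
    return links

links = link_strengths(BOOKINGS)
strongest_pair, strength = links.most_common(1)[0]
```

With this data the strongest link is between the museum and the palace, a pattern that a tourism company might use to offer a combined ticket.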