Date | November 2019 | Marks available | 4 | Reference code | 19N.2.HL.TZ0.4 |
Level | HL | Paper | 2 | Time zone | no time zone |
Command term | Explain | Question number | 4 | Adapted from | N/A |
Question
The collection, storage and sharing of data is becoming increasingly important for organizations who have a choice about which type of database to use to store their data. Two examples of database types are relational and object-oriented.
The 2016 US presidential election was seen to be a victory for data analytics. Companies that specialize in analytics use data warehouses.
Explain two advantages of using a relational database rather than an object-oriented database.
State two characteristics of a data warehouse.
Outline why data needs to be transformed before it can be loaded into the data warehouse.
Outline why opinion poll data and other election data are timestamped when added to the data warehouse.
Outline why analytics companies use link analysis.
Outline why analytics companies use deviation detection.
Once data has been loaded into a data warehouse it can be mined. The use of data analytics is believed to have been important to the outcome of the US election campaign.
Discuss whether the advantages of data mining techniques in this scenario outweigh the disadvantages.
Markscheme
Award [4 max].
Standards and support are available for RDB…
…make it more stable / easier to resolve issues / easier to recruit staff;
More user tools exist for RDBs…
…such as report generators / mail merge / security level permissions / concurrent access.
Easier to visualise data and relationships…
…so the database is more likely to be correctly modelled, i.e. has no redundancy / improved integrity;
RDB tables and relationships are simple to implement…
…an OODB requires an understanding of concepts of OOP;
Mark as [2] and [2].
Award [2 max].
Data stored in the repository is historical / time-variant;
Data is collected from different sources;
OLAP systems for reporting and data analysis (e.g. data mining) / provides businesses with information for informed decisions;
Provides tools so that data can be validated, reformatted, reorganized, summarized, and restructured;
Optimised for data retrieval;
Award [2 max].
Data is from many different external sources and (therefore) in many different formats;
For example, dates may be dd/mm/yy or mm/dd/yy or yy/mm/dd (allow any valid example);
To allow meaningful analysis, it must be in the same format/standardised;
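The date example above can be sketched in a few lines of Python. This is only an illustration of the transformation step, not part of the markscheme: the source names (`pollster_a` etc.), the `SOURCE_FORMATS` mapping and the sample records are all invented for the example.

```python
from datetime import date, datetime

# Hypothetical mapping from each external source to the date format it uses.
SOURCE_FORMATS = {
    "pollster_a": "%d/%m/%y",  # dd/mm/yy
    "pollster_b": "%m/%d/%y",  # mm/dd/yy
    "pollster_c": "%y/%m/%d",  # yy/mm/dd
}


def to_standard_date(raw: str, source: str) -> date:
    """Parse a source-specific date string into one standard date type."""
    return datetime.strptime(raw, SOURCE_FORMATS[source]).date()


# Invented sample records: the same day written in three different formats.
records = [
    ("pollster_a", "08/11/16"),
    ("pollster_b", "11/08/16"),
    ("pollster_c", "16/11/08"),
]

# After transformation every record uses the same ISO 8601 representation.
standardised = [to_standard_date(raw, src).isoformat() for src, raw in records]
print(standardised)
```

Without knowing the source, "08/11/16" is ambiguous, which is why the format has to be resolved before the data is loaded and compared.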
Award [2 max].
Notes: maximum marks only if reference is made to the scenario
Do not award marks just for a description of time-stamping
The usefulness of information is often time dependent;
Example relating to the US presidential election, such as
Electoral opinions before a public debate may have less value than those after the debate;
Award [2 max].
Notes: don’t accept the word “link” on its own as a descriptor.
maximum marks only if reference is made to the scenario.
They use link analysis in order to establish relationships / associations between different data sets / different entities in the same data set;
Examples relating to the US presidential election, such as:
How people voted in relation to some other factor, e.g. the level of use of social media / where they took their vacations / size of family …;
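As a very simplified sketch of the idea in this answer, the following Python links two separate data sets through a shared key and counts how often values occur together. The data, the `voter_id` key and the function name `link_counts` are invented for illustration; real link analysis tools are considerably more sophisticated.

```python
from collections import defaultdict

# Invented illustrative data: two data sets keyed by a hypothetical voter_id.
social_media_use = {1: "high", 2: "high", 3: "low", 4: "low", 5: "high"}
voting_intention = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}


def link_counts(left: dict, right: dict) -> dict:
    """Count co-occurrences of values from two data sets that share a key."""
    counts = defaultdict(int)
    # Only keys present in both data sets can be linked.
    for key in left.keys() & right.keys():
        counts[(left[key], right[key])] += 1
    return dict(counts)


links = link_counts(social_media_use, voting_intention)
for (usage, candidate), n in sorted(links.items()):
    print(f"social media use={usage}, intends to vote {candidate}: {n}")
```

A strong imbalance in the counts would suggest an association worth investigating further; it does not by itself show a causal relationship.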
Award [2 max].
Note: maximum marks only if reference is made to the scenario
They look for any unusual activity (anomaly pattern) in transactions;
Examples relating to the US presidential election, such as
Unusual switch in pre-electoral voting opinions;
Sudden pro-candidate or anti-candidate sentiment in a particular state;
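One simple way to illustrate deviation detection is a z-score check, which flags values that lie unusually far from the mean. This is a minimal sketch under stated assumptions: the daily support figures are invented, and the threshold of 2 standard deviations is an arbitrary choice for the example, not something the markscheme specifies.

```python
from statistics import mean, stdev


def find_deviations(values, threshold=2.0):
    """Return the indices of values more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Invented daily support (%) for a candidate in one state.
daily_support = [48, 49, 47, 48, 50, 49, 48, 62, 49, 48]

unusual_days = find_deviations(daily_support)
print(unusual_days)
```

Here the sudden jump to 62 is the only value flagged, which corresponds to the "sudden pro-candidate sentiment" example above.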
Award [6 max].
Mark as follows:
Award [1] for a generic advantage of data mining;
Award [1] for expanding on this advantage;
Award [1] for linking this to the scenario;
Similarly for a disadvantage;
Award [1] for a valid conclusion;
Advantages of data mining [3 max].
Clustering / cluster analysis allows objects to be treated as one group enabling the uncovering of previously hidden patterns;
For example, cluster analysis may search groups by race or gender to discover if a candidate is unpopular with a demographic;
Classification methods (e.g. genetic, rough set, fuzzy set) can be used to recognize patterns that describe the groups to which an item belongs;
For example, classifying voters by income may provide useful information that can affect future publicity strategies;
Association analysis allows a series of statistical relationships to be further explored or tested;
Association analysis looks for if-then rules, for example a rule predicting that a particular stance on a controversial topic (e.g. abortion) may influence religious voters;
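The if-then rules used in association analysis are usually judged by their support and confidence. The sketch below computes both for one rule; the records, the attribute names (`religious`, `supports_policy_x`) and the function `rule_metrics` are invented for illustration only.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule: if antecedent then consequent.

    support    = share of all records containing antecedent and consequent
    confidence = share of antecedent records that also contain the consequent
    """
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Invented survey records: each set holds the attributes of one respondent.
records = [
    {"religious", "supports_policy_x"},
    {"religious", "supports_policy_x"},
    {"religious"},
    {"supports_policy_x"},
    {"secular"},
]

support, confidence = rule_metrics(records, {"religious"}, {"supports_policy_x"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

In this invented sample the rule holds for two of the three religious respondents, so a campaign would treat it as a tendency to test further rather than a firm prediction.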
Disadvantages of data mining [3 max].
Data mining is based on the data collected from individuals;
This data may be sensitive personal information that the individual concerned may not want to be shared;
This personal data may be reaggregated to compromise the privacy and/or anonymity of the data subjects;
Conclusions [1 max].
The development of more sophisticated processing algorithms is inevitable, so although there are potential concerns about the invasive nature of data mining, providing sufficient safeguards are put in place, there is nothing inherently wrong with this;
Data mining is the start of the slippery slope of the state or multinational companies holding inappropriate quantities of personal data about citizens that is of limited value. Therefore, unless the privacy and/or anonymity of the data subjects can be guaranteed, this is an unethical practice;
Examiners report
Most candidates were able to identify at least one advantage but were unable to explain it clearly.
Most candidates were able to identify the characteristics of a data warehouse.
Not well answered as answers were very generic. Candidates were aware that data is from different sources but were not aware of it having many different formats which need to be standardized to allow for a meaningful analysis.
Not well answered. Candidates were not aware of why data is timestamped when added to the data warehouse, and therefore very few candidates were able to connect the answer to the given scenario.
Not well answered. A few candidates mentioned the use of link analysis to establish relationships between data sets but were unable to make a good reference to the given scenario.
Many candidates were aware of the definition of deviation detection, and answered the question from this perspective, but failed to give actual examples related to the scenario and make appropriate connections.