Date: May 2019 | Marks available: 1 | Reference code: 19M.2.HL.TZ0.12
Level: HL | Paper: 2 | Time zone: no time zone
Command term: Identify | Question number: 12 | Adapted from: N/A

Question

Figure 2 below shows a web graph that is a simplified representation of the World Wide Web.

Figure 2: a simplified representation of the World Wide Web

Web crawlers move through the web, indexing pages to provide information for search engines. When a web crawler arrives at a page it uses several criteria to decide whether to index that page or not.

Identify the nodes that represent web pages in the strongly connected core (SCC).

[1]
a.i.

Identify the nodes that represent web pages connected by a tube.

[1]
a.ii.

Outline why web page E would be given a higher ranking than web page C using the PageRank algorithm.

[2]
b.

Identify three criteria that may be used by a web crawler to decide whether to index a web page or not.

[3]
c.

The following information shows the number of active users (in millions) on different social media sites.

Discuss whether the application of power laws is appropriate to predict the future number of active users on these social media sites.

[5]
d.

Markscheme

Award [1 max].
D, E, F, G;

a.i.

Award [1 max].
H & K;
Accept H & L & K;

a.ii.

Award [2 max].
Both pages have the same number of in-links (i.e. 2);
However, the links to page E come from pages that have a greater number of in-links than the pages that link to C / E is connected with the SCC while C has two in-nodes / in-links;
The PageRank algorithm counts links to pages recursively;
A PageRank algorithm will give a greater weighting to the pages that link to E;
Therefore, the algorithm will place E higher up the ranking than C;
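
The recursive weighting described in the points above can be illustrated with a minimal sketch of a simplified PageRank iteration. The graph below is hypothetical and is not the graph in Figure 2; it is only constructed so that pages C and E each have two in-links, while the pages linking to E have more in-links themselves. The function name pagerank, the damping factor of 0.85 and the fixed iteration count are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of rank along out-links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split equally between its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph (NOT Figure 2): A and B link to C; D and F link to E.
# A and B have no in-links, whereas D and F sit in a cycle with E and G.
hypothetical_links = {
    "A": ["C"],
    "B": ["C"],
    "C": ["D"],
    "D": ["E"],
    "E": ["G"],
    "F": ["E"],
    "G": ["D", "F"],
}

ranks = pagerank(hypothetical_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this hypothetical graph E ends up ranked above C even though both have two in-links, because the rank passed to E comes from pages that themselves receive rank, while A and B only hold the baseline share.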

b.

Award [3 max].
Whether there are any meta-tags present that restrict / guide indexing (e.g. a “robots exclusion protocol”);
Whether there is a robots.txt file linked to the page that gives instructions to the web crawler;
Whether the page has broken / dead links;
Whether the page content / meta information matches any specialism/type sought by the web crawler (e.g. some crawlers specifically target academic content);
Whether the page has a header with meta-tags;
Whether the page has ever been indexed before;
Whether the page has changed since it was last indexed;
How long ago / how frequently the page has been indexed (web crawlers will tend not to index pages too frequently as this increases load on the web server);
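
Several of the criteria above can be combined into a single decision, sketched below under stated assumptions. The function should_index, the user agent name ExampleBot, the one-day minimum interval and the example.com URLs are all hypothetical; only RobotFileParser comes from the Python standard library. The robots.txt content is passed in as text so that the sketch makes no network requests.

```python
from datetime import datetime, timedelta
from urllib.robotparser import RobotFileParser


def should_index(url, robots_txt, meta_robots, last_indexed, last_modified, now,
                 user_agent="ExampleBot", min_interval=timedelta(days=1)):
    """Hypothetical decision rule combining a few common indexing criteria."""
    # Criterion: does robots.txt allow this crawler to fetch the page?
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        return False
    # Criterion: does a robots meta-tag on the page forbid indexing?
    if "noindex" in meta_robots.lower():
        return False
    # Criterion: a page that has never been indexed is always worth indexing.
    if last_indexed is None:
        return True
    # Criterion: avoid re-indexing too frequently (reduces server load).
    if now - last_indexed < min_interval:
        return False
    # Criterion: re-index only if the page changed since the last visit.
    return last_modified > last_indexed


robots = "User-agent: *\nDisallow: /private/\n"
now = datetime(2019, 5, 1, 12, 0)
print(should_index("https://example.com/public/page.html", robots,
                   "index, follow", None, now - timedelta(days=3), now))
```

A real crawler would apply many more checks (for example broken links or topical relevance); this sketch only shows how independent criteria can be chained so that any single failed check stops the page from being indexed.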

c.

Award [5 max].

Reasons why power laws may be appropriate
The number of users shown for each site appears to follow the general pattern of a power law distribution / is consistent with the general principles of power laws (the “rich get richer”);
It's likely/reasonable to assume that sites with large numbers of users will tend to attract more new users than sites with fewer users;
This may be particularly true of social media sites where a high number of existing users may equate to a more diverse/engaging/attractive experience for new users;

Reasons why power laws may not be appropriate
However, correlation does not equal causation. / Just because the sites appear to exhibit a power law distribution, it doesn't mean that their growth is governed by a power law;
Other factors may be more significant (e.g. the demographic / region a social media site attracts, changing fashion, the policies of the sites themselves);
Sites that were very popular in the past but diminished/died-out may suggest that power laws are not the only / main factor influencing future development (e.g. MySpace);
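
One way to check whether a snapshot of user numbers is consistent with a power law is to fit a straight line to log(users) against log(rank), as in the minimal sketch below. The user counts are hypothetical and are not the figures given in the question; the function name fit_power_law is likewise an assumption. The fit describes the distribution at one moment only and, as the points above note, a good fit does not show that future growth is governed by a power law.

```python
import math


def fit_power_law(counts):
    """Least-squares fit of count = scale * rank ** (-alpha) on log-log axes."""
    ranked = sorted(counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(count) for count in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope on log-log axes is -alpha; the intercept is log(scale).
    return -slope, math.exp(intercept)


# Hypothetical active-user counts in millions (NOT the data from the question).
hypothetical_users = [2000, 1000, 667, 500, 400]
alpha, scale = fit_power_law(hypothetical_users)
print("alpha:", round(alpha, 2), "scale:", round(scale))
```

Points that lie close to a straight line on log-log axes are consistent with a power-law distribution, but the fitted exponent says nothing about demographics, fashion or site policy, which may dominate how user numbers change over time.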

d.

Examiners report

Most candidates answered this correctly.

a.i.

Most candidates answered this correctly.

a.ii.

Most candidates answered this correctly, but some did not refer clearly to the actual links shown in the question.

b.

Most candidates answered this correctly, though a few did not address all the specifics of the question.

c.

Most candidates did not structure their answers and so could not meet the specific requirements of this question.

d.

Syllabus sections

Option C: Web science » C.5 Analysing the web
Option C: Web science
