Web surveys
Advantages and disadvantages of web surveys
A traditional survey or a web survey?
Demonstration: an opinion poll
The biologist and the fish

Advantages and disadvantages of web surveys
A web survey is a special type of survey in which respondents complete questionnaire forms on the Internet. Web surveys have grown enormously in popularity because, at first sight, they have a number of attractive advantages:
  • A web survey provides simple access to a large group of potential respondents. There are examples of web surveys in The Netherlands where 30,000 or 40,000 people participated spontaneously in a short period of time;

  • Setting up a web survey is cheap. No interviewers are required to visit or call respondents. There are no printing and mailing costs;

  • A web survey can be carried out very fast. After the questionnaire has been uploaded, data collection can start straight away. There are examples of web surveys that have been carried out in one day;

  • Questionnaires on the Internet offer interesting extra possibilities, like use of pictures, sound, animation, video, etc.
Web surveys have not only advantages but also disadvantages. These disadvantages affect the reliability of the survey results:
  • Under-coverage. This is the phenomenon that certain groups of people cannot be selected in the sample. This is typically the case for web surveys, where people without Internet access will never be able to participate. In particular, people with a low level of education, the elderly and people with a different ethnic background will be under-represented. Therefore, survey results will not apply to the population as a whole, but only to the Internet-population.

  • Self-selection. No probability sampling is used to select the sample. Instead, the questionnaire is put on the Internet, and the researcher just waits and sees who participates. There is no control over the selection mechanism.

Example: the 2005 Book of the Year Award

Self-selection surveys run the risk that certain groups attempt to manipulate the outcomes. An example of this effect could be observed in the election of the 2005 Book of the Year Award (Dutch: NS Publieksprijs), a high-profile literary prize. The winning book was determined by means of a poll on a website. People could vote for one of the nominated books or mention another book of their choice.

More than 90,000 people participated in the survey. The winner turned out to be the new interconfessional Bible translation launched by the Netherlands and Flanders Bible Societies. This book was not nominated, but an overwhelming majority (72%) nevertheless voted for it. This was due to a campaign run by (among others) Bible societies, a Christian broadcaster and a Christian newspaper.

The campaign was completely within the rules of the contest, but the group of voters was clearly not representative of the Dutch population.

A traditional survey or a web survey?

The differences between probability samples and self-selection can be illustrated using an example related to the general elections in the Netherlands in 2006. Various organisations attempted to use opinion polls to predict the outcome of these elections (seats in parliament).

The results of these polls are summarised in the table below. Politieke Barometer, Peil.nl and De Stemming are opinion polls carried out by market research agencies. They are all based on samples from web panels. The polls were carried out one day before the election.

              Election result   Politieke Barometer   Peil.nl   De Stemming   DPES
Sample size                     1000                  2500      2000          2600
CDA           41                41                    42        41            41
PvdA          33                37                    38        31            32
VVD           22                23                    22        21            22
SP            25                23                    23        32            26
GroenLinks    7                 7                     8         5             7
D66           3                 3                     2         1             3
ChristenUnie  6                 6                     6         8             6
SGP           2                 2                     2         1             2
PvdD          2                 2                     1         2             2
PvdV          9                 4                     5         6             8
Other         0                 2                     1         2             1
MAD                             1.27                  1.45      2.00          0.36

The Mean Absolute Difference (MAD) indicates how large the differences between the poll and the election results are on average. The differences are particularly large for the more volatile parties, such as PvdA, SP and PvdV.
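The MAD values in the table can be checked with a few lines of Python. This is only a sketch: it assumes that the first column of numbers is the election result and that the four remaining columns are, in order, Politieke Barometer, Peil.nl, De Stemming and DPES. The names `seats`, `POLLS` and `mean_absolute_difference` are chosen here for illustration.

```python
# Seats per party, copied from the table.
# Assumed column order: election result, then the four polls.
seats = {
    "CDA":          (41, 41, 42, 41, 41),
    "PvdA":         (33, 37, 38, 31, 32),
    "VVD":          (22, 23, 22, 21, 22),
    "SP":           (25, 23, 23, 32, 26),
    "GroenLinks":   (7, 7, 8, 5, 7),
    "D66":          (3, 3, 2, 1, 3),
    "ChristenUnie": (6, 6, 6, 8, 6),
    "SGP":          (2, 2, 2, 1, 2),
    "PvdD":         (2, 2, 1, 2, 2),
    "PvdV":         (9, 4, 5, 6, 8),
    "Other":        (0, 2, 1, 2, 1),
}

# Assumed poll names for columns 1 to 4 (column 0 is the election result).
POLLS = ("Politieke Barometer", "Peil.nl", "De Stemming", "DPES")


def mean_absolute_difference(table, poll_index):
    """Average absolute gap in seats between one poll and the election result."""
    diffs = [abs(row[poll_index] - row[0]) for row in table.values()]
    return sum(diffs) / len(diffs)


mad = {
    name: mean_absolute_difference(seats, index)
    for index, name in enumerate(POLLS, start=1)
}

for name, value in mad.items():
    print(f"{name}: MAD = {value:.2f}")
```

Under this column assignment, the computed values agree with the MAD row of the table.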

DPES is the Dutch Parliamentary Election Study. Statistics Netherlands carried out the fieldwork in the few weeks just before the elections. The probability sampling principle was followed here: a true (two-stage) probability sample was drawn, and respondents were interviewed face-to-face (using CAPI). The predictions of this survey are much better than those based on the online opinion polls.

Demonstration: an opinion poll

Opinion polls are often based on online data collection. This is not surprising, as it is a fast and cheap way of data collection. So, these polls can be carried out with a high frequency. The demonstration below shows the risk of drawing wrong conclusions.

There will be general elections in the country of Samplonia. The National Elderly Party (NEP) seems to do well in the campaign. An opinion poll is carried out to estimate the percentage of voters this party will attract. To determine how accurate the estimator is, sample selection is repeated a large number of times. The percentage of voters is computed for each sample. The distribution of all these estimates is shown in a histogram.

To carry out a simulation, you first set the sample size. You do that by clicking on the green square adjacent to Sample size. There are three possible sample sizes: 200, 400 or 800.

You can choose to select the sample from the whole population or just from those having Internet. You do that by clicking on the green square under Survey. Start the simulation by clicking on Start.

If you select samples from the complete population, you will see that the estimates are nicely concentrated around the true value (25.4%). However, if you select samples from just the Internet-population, estimates will be systematically too low.

Why are the estimates too low? The reason is that the elderly are under-represented in the Internet-population, and the elderly typically vote NEP. So the estimated percentage of NEP voters will be too low.
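The same effect can be reproduced with a small simulation. The sketch below is not the demonstration program itself: the population composition (the share of elderly, their Internet access and their voting behaviour) is hypothetical and was chosen only so that the true NEP percentage equals 25.4%. The names `CELLS`, `build_population` and `simulate` are illustrative.

```python
import random
from statistics import mean

# Hypothetical population of 10,000 voters, split into cells:
# (elderly, votes_nep, has_internet, number of people).
# The counts are assumptions; only the overall 25.4% comes from the text.
CELLS = [
    (True,  True,  True,   540),
    (True,  True,  False, 1260),
    (True,  False, True,   360),
    (True,  False, False,  840),
    (False, True,  True,   590),
    (False, True,  False,  150),
    (False, False, True,  5010),
    (False, False, False, 1250),
]


def build_population():
    """Expand the cell counts into one (votes_nep, has_internet) record per person."""
    population = []
    for _elderly, votes_nep, has_internet, count in CELLS:
        population.extend([(votes_nep, has_internet)] * count)
    return population


def simulate(population, sample_size, internet_only, runs=1000, seed=1):
    """Draw repeated simple random samples and return the NEP estimate (%) of each."""
    rng = random.Random(seed)
    if internet_only:
        # Under-coverage: people without Internet can never be selected.
        frame = [person for person in population if person[1]]
    else:
        frame = population
    estimates = []
    for _ in range(runs):
        sample = rng.sample(frame, sample_size)
        nep_voters = sum(votes_nep for votes_nep, _ in sample)
        estimates.append(100 * nep_voters / sample_size)
    return estimates


population = build_population()
true_pct = 100 * sum(votes_nep for votes_nep, _ in population) / len(population)

full = simulate(population, sample_size=400, internet_only=False)
web = simulate(population, sample_size=400, internet_only=True)

print(f"True percentage of NEP voters: {true_pct:.1f}")
print(f"Mean estimate, whole population: {mean(full):.1f}")
print(f"Mean estimate, Internet-population only: {mean(web):.1f}")
```

In this hypothetical population, NEP voters make up about 17.4% of the Internet-population (1,130 out of 6,500), so samples drawn only from that group centre on a value well below 25.4%, whatever the sample size. Increasing the sample size narrows the histogram but does not remove the bias.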

The biologist and the fish

The problems of web surveys can be illustrated once more by a simple story. Suppose you are a biologist and you have to determine the state of health of fish in two lakes. You want to do that by investigating a sample of fish. The problem is that you cannot reach the lake on the left. Therefore, you cannot investigate any fish there. For the lake on the right, you can only investigate fish that spontaneously jump out of the water.

The question is: what conclusion can you draw about the health of all fish in the two lakes, if you can only investigate fish in the lake on the right that spontaneously jump out of the water? What about fish that stay quietly on the bottom? What about the fish in the other lake?