Surveys
What is a survey?
Selecting a sample
Estimation
Demonstration: Sample size and precision
Margin of error
Demonstration: confidence intervals

What is a survey?

Survey and poll are different names for the same thing. It is a type of research in which questions are asked of a large number of people. The questions measure opinions, attitudes and behavior of those people. The answers are used to describe characteristics of this group of people.

Surveys are not about describing the behavior of individual persons, but about the behavior of the group as a whole. This group is called the population. The behavior of the population is summarized in quantities like totals, means and percentages.

Example: a radio listening survey
A radio listening survey measures the radio listening habits of people in a certain town. The population consists of all inhabitants from a certain age (for example 13 years and older). Questions could ask whether people listen to the radio station, and how many hours they listen (if they listen). The answers allow you to compute the percentage of people in the town that listen to the radio station, and also how long (on average) people listen.

Selecting a sample

If you want to describe the behavior of a group, it seems inevitable at first sight to investigate each member of the group. This is how it was done in the old days, say before the year 1895. It is still the principle of the censuses that are carried out every 10 years in many countries. However, asking questions of millions of people is costly and time-consuming. Moreover, many people do not like to be bothered with answering questions.

These problems can be avoided or reduced by investigating just a sample of members of the group. It is possible to draw conclusions about the group as a whole using only sample data. However, this can only be done in a reliable way if the sample is selected properly. Two conditions have to be fulfilled:

  • The sample must be drawn using a random selection mechanism (a randomizer).
  • Each member of the group must have the same probability of being selected.

A sample that has been drawn using some kind of random selection mechanism is called a probability sample. A sample that has been drawn with equal probabilities is called a simple random sample.
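
As an illustration, a simple random sample can be drawn with a few lines of Python. This is only a sketch: the list of inhabitants, the helper name draw_simple_random_sample and the seed are assumptions made for the example, not part of any real survey.

```python
import random


def draw_simple_random_sample(population, sample_size, seed=None):
    """Draw a simple random sample without replacement.

    random.Random.sample gives every member of the population the same
    probability of being selected, which is the second condition above.
    """
    rng = random.Random(seed)  # the "randomizer"; a seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: 10,000 inhabitants identified by number.
inhabitants = list(range(1, 10_001))
sample = draw_simple_random_sample(inhabitants, 1000, seed=42)

print(len(sample))       # number of selected persons
print(len(set(sample)))  # same number: nobody is selected twice
```

In practice the hard part is not the random draw but obtaining a complete list of the population to draw from.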

Note that it is also possible to select a sample with unequal probabilities. The computation of estimates for population means and percentages becomes somewhat more complex, because they have to be corrected for these unequal probabilities. This is outside the scope of this website.


Estimation

You use the sample data to compute estimates of population characteristics. If you have drawn a simple random sample, the sample percentage is a good estimator of the population percentage. And the sample mean is a good estimator of the population mean.

What is a good estimator? The sample mean and percentage have, in case of simple random sampling, the following important properties:

  • These estimators are unbiased. If you would repeat the survey a large number of times, the average value of the estimator will be equal to the population value to be estimated. There is no systematic under- or over-estimation.
  • The sample size determines the precision of the estimator. A larger sample size will result in more precise estimates.
The precision of the estimators is not related to the size of the population. A larger population does not require a larger sample to obtain the same precision.
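
The following sketch makes the last point concrete. It uses the standard textbook formula for the standard error of a sample percentage under simple random sampling without replacement, including the finite population correction. The function name and the population sizes are assumptions chosen for illustration. The result holds as long as the sample is a small fraction of the population.

```python
import math


def standard_error_of_percentage(p, sample_size, population_size):
    """Standard error, in percentage points, of a sample percentage.

    p is the population proportion (between 0 and 1). The factor
    (N - n) / (N - 1) is the finite population correction.
    """
    correction = (population_size - sample_size) / (population_size - 1)
    return 100 * math.sqrt(p * (1 - p) / sample_size * correction)


# Same sample size (1000), very different (hypothetical) population sizes.
errors_by_population = {
    size: standard_error_of_percentage(0.30, 1000, size)
    for size in (100_000, 1_000_000, 100_000_000)
}
for size, error in errors_by_population.items():
    print(f"population {size:>11,}: standard error {error:.3f} percentage points")

# Same population, larger sample: this is what improves the precision.
print(standard_error_of_percentage(0.30, 4000, 1_000_000))
```

The three standard errors differ only in the second decimal, while quadrupling the sample size roughly halves the standard error.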


Demonstration: Sample size and precision

The demonstration below shows the relation between sample size and precision of estimators.

There will be general elections in the country of Samplonia. The National Elderly Party (NEP) seems to do well in the campaign. An opinion poll is carried out to estimate the percentage of voters this party will attract. To determine how precise the estimator is, sample selection is repeated a large number of times. The percentage of voters is computed for each sample. The distribution of all these estimates is shown in a histogram.

The average of all estimates is computed. The estimator is unbiased if this average is (approximately) equal to the true population percentage (25.4%).

You can observe that estimates are closer to the true value if a larger sample size is selected.

To carry out a simulation, you first set the sample size. You do that by clicking on the green square adjacent to Sample size. There are three possible sample sizes: 200, 400 or 800. You start the simulation by clicking on Start.
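
If the interactive demonstration is not available, the same experiment can be sketched in Python. This is not the code behind the demonstration. The population size of 100,000 voters, the number of repetitions and the seed are assumptions; only the true percentage (25.4%) and the three sample sizes come from the text.

```python
import random
import statistics

TRUE_PERCENTAGE = 25.4     # population value quoted in the text
POPULATION_SIZE = 100_000  # assumed; the size of Samplonia is not given
NEP_VOTERS = round(POPULATION_SIZE * TRUE_PERCENTAGE / 100)

# 1 = votes for the NEP, 0 = votes for another party.
population = [1] * NEP_VOTERS + [0] * (POPULATION_SIZE - NEP_VOTERS)


def simulate_estimates(sample_size, repetitions=1000, seed=1):
    """Repeat the poll many times and return the estimated percentages."""
    rng = random.Random(seed)
    return [
        100 * sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(repetitions)
    ]


results = {}
for sample_size in (200, 400, 800):
    estimates = simulate_estimates(sample_size)
    results[sample_size] = (statistics.mean(estimates), statistics.stdev(estimates))
    mean, spread = results[sample_size]
    print(f"n = {sample_size}: average estimate {mean:.1f}%, spread {spread:.2f}")
```

The average estimate should stay close to 25.4% for every sample size (unbiasedness), while the spread of the estimates shrinks as the sample size grows (precision).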


Margin of error

You use sample data to estimate population parameters. Estimates will never be exactly equal to the value of the population parameter. However, estimates can be close to the true value. But what is close?

Because you draw a probability sample, you can apply probability theory. It can be shown that random quantities like the sample mean and the sample percentage have approximately a normal distribution. This allows you to compute how far away the estimate can be from the true value. This results in a confidence interval.

The confidence interval is characterized by a lower and an upper bound. The true value is very likely to be contained in this interval. Usually a confidence level of 95% is chosen. This leads to a 95% confidence interval. This means that if you repeated the survey a large number of times, about 95% of the intervals computed in this way would contain the true value.

Example: a radio listening survey

To get some idea of how many people are listening to a local radio station, a sample of 1000 inhabitants of the town is selected. It turns out that 30% of the sample regularly listen to this radio station. The corresponding 95% confidence interval has a lower bound of 27% and an upper bound of 33%. This means that you can be 95% confident that the true population percentage will be between 27% and 33%.
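
The bounds in this example can be reproduced with the usual normal approximation: the estimate plus or minus 1.96 times its standard error. The sketch below assumes that approximation and ignores the finite population correction; the function name is chosen for illustration.

```python
import math

Z_95 = 1.96  # standard normal value that belongs to a 95% confidence level


def confidence_interval(sample_percentage, sample_size, z=Z_95):
    """Approximate confidence interval for a percentage, in percent."""
    p = sample_percentage / 100
    margin = 100 * z * math.sqrt(p * (1 - p) / sample_size)
    return sample_percentage - margin, sample_percentage + margin


lower, upper = confidence_interval(30, 1000)
print(f"{lower:.1f}% to {upper:.1f}%")  # prints "27.2% to 32.8%"
```

Rounded to whole percentages, this gives the interval from 27% to 33% quoted above.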

There is a direct relationship between the width of the confidence interval and the size of the sample. A larger sample will result in a narrower confidence interval. So, estimators will be more precise.
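
To see how quickly the interval narrows, you can compute the margin of error (half the width of the interval) for several sample sizes. The sketch assumes the normal approximation with a sample percentage of 30%; the sample sizes are chosen for illustration.

```python
import math


def margin_of_error(sample_percentage, sample_size, z=1.96):
    """Half the width of the approximate 95% confidence interval, in percentage points."""
    p = sample_percentage / 100
    return 100 * z * math.sqrt(p * (1 - p) / sample_size)


margins = {n: margin_of_error(30, n) for n in (250, 1000, 4000)}
for n, margin in margins.items():
    print(f"n = {n:>4}: margin of error {margin:.1f} percentage points")
```

The margin is proportional to one over the square root of the sample size, so making the sample four times as large only halves the margin of error.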

When publishing the results of your survey, it is important that you give not just estimates but also margins of error. This provides the users of the figures with a clear indication of the precision of the results.

Example: an opinion poll in the U.S.

An opinion poll was carried out on 17 March 2003. A small sample of Americans was asked (by telephone) whether they approved of Bush's decision to go to war against Iraq. In the sample 66% approved, while 30% disapproved.

Not only were these estimates published, but also the margin of error. This was equal to 4.5 percentage points. For the 66% that approved, this means the confidence interval runs from 61.5% to 70.5%. You can conclude that the percentage in the population approving the decision will very likely be between 61.5% and 70.5%.


Demonstration: confidence intervals

The demonstration below shows how the sample size affects the width of the confidence interval.

There will be general elections in the country of Samplonia. The National Elderly Party (NEP) seems to do well in the campaign. An opinion poll is carried out to estimate the percentage of voters this party will attract. To determine how precise the estimator is, sample selection is repeated a large number of times. The percentage of voters and the 95% confidence interval are computed for each sample. These confidence intervals are shown below as horizontal blue line segments.

Two aspects of confidence intervals can be explored in this simulation experiment:

  • The width of the intervals. A larger sample size will produce narrower intervals. Therefore, estimators will be more precise;

  • The confidence level. 95% confidence intervals are computed in this experiment. This implies that, on average, 95% of the confidence intervals should contain the true population value (25.4%).

To carry out a simulation experiment, you first have to set the sample size. You do that by clicking on the green square under Sample size. You can choose a sample size of 200, 400 or 800. You start the simulation by clicking on the green square adjacent to Start.

The simulation keeps on going until you click on the red square adjacent to Stop. The figure in the upper right corner shows the percentage of intervals containing the population value.

You can draw the conclusion that the confidence level does not depend on the sample size, but the precision does.
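
This conclusion can also be checked outside the interactive demonstration. The sketch below is not the code behind the demonstration. It assumes a population of 100,000 voters, 2000 repetitions per sample size, a fixed seed and the normal approximation for the interval; only the true percentage (25.4%) and the sample sizes come from the text.

```python
import math
import random

TRUE_PERCENTAGE = 25.4     # population value quoted in the text
POPULATION_SIZE = 100_000  # assumed; the size of Samplonia is not given
NEP_VOTERS = round(POPULATION_SIZE * TRUE_PERCENTAGE / 100)

# 1 = votes for the NEP, 0 = votes for another party.
population = [1] * NEP_VOTERS + [0] * (POPULATION_SIZE - NEP_VOTERS)


def simulate_intervals(sample_size, repetitions=2000, seed=7):
    """Return (coverage in percent, average interval width in percentage points)."""
    rng = random.Random(seed)
    hits = 0
    total_width = 0.0
    for _ in range(repetitions):
        estimate = 100 * sum(rng.sample(population, sample_size)) / sample_size
        p = estimate / 100
        margin = 100 * 1.96 * math.sqrt(p * (1 - p) / sample_size)
        total_width += 2 * margin
        if estimate - margin <= TRUE_PERCENTAGE <= estimate + margin:
            hits += 1  # this interval contains the true value
    return 100 * hits / repetitions, total_width / repetitions


outcomes = {n: simulate_intervals(n) for n in (200, 400, 800)}
for n, (coverage, width) in outcomes.items():
    print(f"n = {n}: {coverage:.1f}% of intervals contain 25.4%, average width {width:.1f}")
```

For every sample size the percentage of intervals containing the true value should be near 95%, while the average width of the intervals decreases as the sample size increases.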