Representativity
What is representative?
A better definition

What is representative?

"The conclusions are based on a representative sample of 1,000 persons". Such statements are not uncommon in stories in the media about survey research. But what does it mean? What is a "representative sample"? And if a sample is representative, does it mean the survey outcomes are reliable? All this is usually not explained in the media.


Two statisticians, William Kruskal and Frederick Mosteller, carried out an extensive investigation of the use of the word "representative" in the scientific and non-scientific literature. It turned out that "representative" can mean many different things. They found the following meanings for "representative sampling":

  • General acclaim for data. It means not much more than a general assurance, without evidence, that the data are OK. This meaning of ‘representative’ is typically used by the media, without explaining what it exactly means;

  • Absence of selective forces. No elements or groups of elements were favored in the selection process, either consciously or unconsciously;

  • Miniature of the population. The sample can be seen as a scale model of the population. The sample has the same characteristics as the population. The sample proportions are in all respects similar to population proportions;

  • Typical or ideal case(s). The sample consists of elements that are ‘typical’ of the population. These are ‘representative elements’. This meaning probably goes back to the idea of "l’homme moyen" (the average man) that was introduced by the Dutch/Belgian statistician Quetelet in 1835;

  • Coverage of the population’s heterogeneity. Variation that exists in the population must also be encountered in the sample. So, the sample should also contain atypical elements;

  • A vague term, to be made precise. Initially the term is simply used without describing what it is. Later on, it is explained what is meant by it;

  • A specific sampling method has been applied. A form of probability sampling must have been used giving equal selection probabilities to each element in the population;

  • As permitting good estimation. All characteristics of the population, including its variability, must be reflected in the sample, so that it is possible to compute reliable estimates of population parameters;

  • Good enough for a particular purpose. Any sample that shows that a phenomenon thought to be very rare or absent occurs with some frequency will do.

A better definition

Due to the many different meanings the term "representative" can have, it is better not to use it in practice unless it is made clear what is meant by it. It is proposed here to use the word "representative" only in one way: a sample is said to be representative with respect to a variable if its relative distribution in the sample is equal to its relative distribution in the population.

Example: a radio listening survey
An online survey asks respondents for their gender and age. Since the distributions of gender and age in the population are known, it is possible to compare the sample distribution of each variable with the corresponding population distribution.

             Gender            Age
             Male   Female     Young  Middle  Old
Population   50%    50%        30%    40%     30%
Sample       51%    49%        60%    30%     10%

The sample is 51% male and 49% female. These percentages are close to the corresponding population percentages. So we can say the sample is representative with respect to gender.

A look at age shows there is something wrong. The sample contains too many young persons and too few elderly. Therefore, the sample is not representative with respect to age.
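The comparison in this example can be sketched in a few lines of Python. This is only an illustration: the percentages are those from the table, but the function names and the tolerance of 5 percentage points are assumptions made here, not something the definition prescribes. The definition itself asks for equal distributions, and in practice one has to decide how close is close enough.

```python
# Percentages taken from the table in the example.
population = {
    "gender": {"Male": 50, "Female": 50},
    "age": {"Young": 30, "Middle": 40, "Old": 30},
}
sample = {
    "gender": {"Male": 51, "Female": 49},
    "age": {"Young": 60, "Middle": 30, "Old": 10},
}


def max_deviation(sample_dist, population_dist):
    """Largest absolute difference, in percentage points, over all categories."""
    return max(abs(sample_dist[c] - population_dist[c]) for c in population_dist)


def is_representative(sample_dist, population_dist, tolerance=5):
    """True if no category deviates by more than `tolerance` percentage points.

    The default tolerance is an illustrative assumption, not a value from the text.
    """
    return max_deviation(sample_dist, population_dist) <= tolerance


for variable in population:
    deviation = max_deviation(sample[variable], population[variable])
    verdict = is_representative(sample[variable], population[variable])
    print(f"{variable}: max deviation {deviation} points, representative: {verdict}")
```

With these figures the largest deviation is 1 percentage point for gender and 30 percentage points for age (the young group), which matches the conclusions drawn in the text.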

The foundations of survey sampling teach us that samples have to be selected with some kind of probability mechanism. Intuitively, it seems a good idea to select a probability sample in which each element has the same probability of being selected. Such a procedure produces samples that are ‘on average’ representative with respect to all variables. Indeed, this is a scientifically sound way of selecting a sample, and probably also the one most frequently applied in practice.
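What ‘on average’ means can be illustrated with a small simulation. The sketch below is hypothetical: the population of 10,000 persons, the sample size of 200, the number of repetitions and the seed are all assumptions chosen for illustration, with the age distribution borrowed from the example above. It draws many equal-probability samples without replacement and averages the sample age distributions.

```python
import random

AGE_GROUPS = ("Young", "Middle", "Old")


def age_shares(persons):
    """Relative distribution of the age groups in a list of persons."""
    n = len(persons)
    return {g: sum(1 for p in persons if p == g) / n for g in AGE_GROUPS}


# Hypothetical population of 10,000 persons: 30% young, 40% middle, 30% old.
population = ["Young"] * 3000 + ["Middle"] * 4000 + ["Old"] * 3000

rng = random.Random(12345)  # fixed seed so the run is reproducible
n_samples = 500             # number of repeated samples (assumption)
sample_size = 200           # size of each sample (assumption)

totals = {g: 0.0 for g in AGE_GROUPS}
for _ in range(n_samples):
    # rng.sample draws without replacement; every person is equally likely.
    drawn = rng.sample(population, sample_size)
    shares = age_shares(drawn)
    for g in AGE_GROUPS:
        totals[g] += shares[g]

average_shares = {g: totals[g] / n_samples for g in AGE_GROUPS}

for g in AGE_GROUPS:
    print(f"{g}: population {age_shares(population)[g]:.3f}, "
          f"average over samples {average_shares[g]:.3f}")
```

A single sample will typically deviate somewhat from the population distribution, but the average over many samples should lie very close to it.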