Why weighting?

Ideally, a selected sample is a miniature of the population it came from. This means that the sample should be representative with respect to all variables measured in the survey. Unfortunately, this is usually not the case. One problem is non-response, which may cause some groups to be over- or under-represented. Another problem is self-selection (in an online survey). If such problems occur, no reliable conclusions can be drawn from the observed survey data unless something is done to correct for the lack of representativity. A commonly applied correction technique is weighting adjustment. It assigns an adjustment weight to each survey respondent. Persons in under-represented groups get a weight larger than 1, and those in over-represented groups get a weight smaller than 1. Means, totals and percentages are then computed from the weighted values of the variables instead of the raw values.
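The weighted computation mentioned above can be sketched in Python. This is a minimal illustration, assuming every respondent already carries an adjustment weight; the function name `weighted_mean` and the data are hypothetical. For a 0/1 variable the weighted mean is a weighted proportion, so multiplying it by 100 gives a weighted percentage.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w) over all respondents."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * y for y, w in zip(values, weights)) / total_weight


# HYPOTHETICAL data: 1 = respondent owns a car, 0 = does not.
car_owner = [1, 1, 0, 1, 0]
# HYPOTHETICAL adjustment weights: < 1 for over-represented groups,
# > 1 for under-represented groups.
weights = [0.5, 0.5, 3.0, 1.0, 1.0]

# With all weights equal to 1 the weighted mean is the ordinary mean.
print(f"unweighted: {100 * weighted_mean(car_owner, [1.0] * 5):.1f}%")
print(f"weighted:   {100 * weighted_mean(car_owner, weights):.1f}%")
```

Here the weighted percentage is lower than the unweighted one because the non-owner with weight 3.0 counts for three persons.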

Required: auxiliary variables

A weighting adjustment technique can only be carried out if proper auxiliary variables are available. Such variables must have been measured in the survey, and their population distribution must be available. Typical auxiliary variables are gender, age, marital status and region of the country. The population distribution of such variables can usually be obtained from national statistical institutes. By comparing the observed frequency distribution of a variable in the response with its population distribution, you can establish whether the survey response is representative with respect to this variable. If there is a substantial difference between the response distribution and the population distribution, you can conclude that the response is not representative with respect to this variable.

Weighting adjustment with one auxiliary variable

Here is a simple example of weighting adjustment with one auxiliary variable. Suppose an online survey has been carried out. Among the variables measured is the age of the respondents. Because the population distribution of age is available, we can compare the response distribution of age with the population distribution.

Age group     Young   Middle   Old
Population     30%     40%     30%
Sample         60%     30%     10%

The response consists of 60% young persons, 30% middle-age persons and 10% elderly persons. These percentages differ from those in the population. For example, only 30% of the population is young. Clearly, the young are over-represented in the response, so you can conclude that the response is not representative with respect to age.
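The comparison of the two distributions can be sketched as follows, using the age percentages from the table. The helper name `representation_gaps` is hypothetical, and the sketch only reports the signed difference per category; what counts as a "substantial" difference is not defined here and is left to the analyst.

```python
# Age distributions from the table (in %).
population = {"young": 30.0, "middle": 40.0, "old": 30.0}
response = {"young": 60.0, "middle": 30.0, "old": 10.0}


def representation_gaps(response, population):
    """Response percentage minus population percentage, per category."""
    return {cat: response[cat] - population[cat] for cat in population}


gaps = representation_gaps(response, population)
for cat, gap in gaps.items():
    if gap > 0:
        status = "over-represented"
    elif gap < 0:
        status = "under-represented"
    else:
        status = "as in the population"
    print(f"{cat:>6}: {gap:+.1f} percentage points ({status})")
```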

We can make the response representative with respect to age by assigning to the young a weight equal to

30.0 / 60.0 = 0.500.

This weight is obtained by dividing the population percentage by the corresponding response percentage. The weight for middle-age persons becomes

40.0 / 30.0 = 1.333.

The weight for the elderly becomes

30.0 / 10.0 = 3.000.
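The three divisions above follow one rule: weight = population percentage / response percentage. A minimal Python sketch of that rule (the function name `adjustment_weights` is hypothetical) could look like this. It refuses categories without respondents, since no weight can make an empty group count for anything.

```python
population = {"young": 30.0, "middle": 40.0, "old": 30.0}  # population %
response = {"young": 60.0, "middle": 30.0, "old": 10.0}    # response %


def adjustment_weights(population, response):
    """Weight per category = population percentage / response percentage."""
    weights = {}
    for cat, pop_pct in population.items():
        resp_pct = response.get(cat, 0.0)
        if resp_pct == 0:
            # A category without respondents cannot be weighted up.
            raise ValueError(f"no respondents in category {cat!r}")
        weights[cat] = pop_pct / resp_pct
    return weights


weights = adjustment_weights(population, response)
for cat, w in weights.items():
    print(f"{cat:>6}: {w:.3f}")
```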

The weight assigned to young people is smaller than 1. This is not surprising, as they are over-represented in the survey. After weighting, each young person no longer counts as 1 person but as only 0.5 person.

The elderly are under-represented in the survey. Therefore, their weight is larger than 1. After weighting, each elderly person counts as 3 persons.

Suppose you use the weighted response to estimate the percentage of young people. The weighted percentage is equal to

0.500 x 60% = 30%

This is exactly equal to the percentage of young people in the population. The percentages for the other age categories are also estimated exactly: (40.0 / 30.0) x 30% = 40% for middle-age persons and 3.000 x 10% = 30% for the elderly. So, the weighted response is representative with respect to age.
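The same result can be checked at the level of individual respondents. The sketch below assumes a HYPOTHETICAL response of 100 persons that reproduces the sample percentages from the table (60 young, 30 middle-age, 10 elderly); each respondent counts for its category weight, and the weighted percentages come out equal to the population percentages.

```python
from collections import Counter

# HYPOTHETICAL respondent-level data matching the sample distribution.
respondents = ["young"] * 60 + ["middle"] * 30 + ["old"] * 10

population = {"young": 30.0, "middle": 40.0, "old": 30.0}  # population %

# Response percentages and weights (population % / response %).
counts = Counter(respondents)
n = len(respondents)
response = {cat: 100.0 * counts[cat] / n for cat in population}
weights = {cat: population[cat] / response[cat] for cat in population}


def weighted_percentages(respondents, weights):
    """Percentage per category, where each respondent counts for its weight."""
    weighted_counts = Counter()
    for category in respondents:
        weighted_counts[category] += weights[category]
    total = sum(weighted_counts.values())
    return {cat: 100.0 * w / total for cat, w in weighted_counts.items()}


estimate = weighted_percentages(respondents, weights)
for cat, pct in estimate.items():
    print(f"{cat:>6}: unweighted {response[cat]:.1f}%, weighted {pct:.1f}%")
```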

Weighting adjustment with two auxiliary variables

What to do if more auxiliary variables are available? Again, the response is divided into groups. In the case of one auxiliary variable, there are as many groups as the variable has categories. For example, there are two groups for the variable gender: males and females. In the case of more variables, the number of groups is equal to the product of the numbers of categories of the variables. Suppose you have the auxiliary variables gender (two categories) and age (three categories: young, middle-age and elderly). Combining all possibilities of gender and age leads to 2 x 3 = 6 different groups: young men, middle-age men, elderly men, young women, middle-age women and elderly women. If you know the population distribution over these six groups (the population percentage for each combination of gender and age), a weight can be computed for each group. If you weight your response by gender and age as described above, the weighted response will be representative with respect to gender and age. Moreover, it is also representative with respect to age within each gender category, and with respect to gender within each age category.
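A sketch of weighting by gender and age follows. All population percentages and respondent counts below are HYPOTHETICAL, chosen only so that the age margins match the earlier table; the helper `weighted_margin` is likewise hypothetical. The groups are built as the cross-product of the categories, one weight is computed per group, and the weighted margins are then checked against the population.

```python
from itertools import product

genders = ["male", "female"]
ages = ["young", "middle", "old"]
groups = list(product(genders, ages))  # 2 x 3 = 6 groups

# HYPOTHETICAL population percentages per (gender, age) group; they sum to 100.
population = {
    ("male", "young"): 15.0, ("male", "middle"): 20.0, ("male", "old"): 14.0,
    ("female", "young"): 15.0, ("female", "middle"): 20.0, ("female", "old"): 16.0,
}
# HYPOTHETICAL respondent counts per group (200 respondents in total).
counts = {
    ("male", "young"): 70, ("male", "middle"): 30, ("male", "old"): 12,
    ("female", "young"): 50, ("female", "middle"): 30, ("female", "old"): 8,
}

n = sum(counts.values())
# One weight per group: population % / response %.
weights = {g: population[g] / (100.0 * counts[g] / n) for g in groups}


def weighted_margin(index):
    """Weighted % distribution of one variable (index 0 = gender, 1 = age)."""
    totals = {}
    for g in groups:
        key = g[index]
        totals[key] = totals.get(key, 0.0) + counts[g] * weights[g]
    grand = sum(totals.values())
    return {k: round(100.0 * v / grand, 1) for k, v in totals.items()}


print("number of groups:", len(groups))
print("gender:", weighted_margin(0))
print("age:   ", weighted_margin(1))
```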

Weighting adjustment with more auxiliary variables

It is important to use as many auxiliary variables as possible in a weighting adjustment technique. The idea behind this is the following: if you make the response representative with respect to as many auxiliary variables as possible, it is not unlikely that the response also becomes representative with respect to the other survey variables. It should be stressed that weighting adjustment is only effective if the auxiliary variables used are correlated with important survey variables and/or with response behaviour.