Non-sampling error

Characteristics
Coverage errors
Response errors
Non-response errors
Processing errors
Estimation errors
Analysis errors

Aside from the sampling error associated with the process of selecting a sample, a survey is subject to a wide variety of other errors. These errors are commonly referred to as non-sampling errors.

Non-sampling errors can be defined as errors arising during the course of all survey activities other than sampling. Unlike sampling errors, they can be present in both sample surveys and censuses.

Non-sampling errors can be classified into two groups: random errors and systematic errors.

Random errors are unpredictable errors that vary by chance from one unit to the next. They generally cancel out if a large enough sample is used. They do, however, increase the variability of the characteristic of interest, and the greater the variability among population units, the larger the sample size required to achieve a given level of reliability.
Systematic errors are errors that tend to accumulate over the entire sample. For example, an error in the questionnaire design could cause problems with respondents' answers, which, in turn, could create processing errors, and so on. These types of errors often lead to bias in the final results.

Non-sampling errors are extremely difficult, if not impossible, to measure. Since random errors have the tendency to be cancelled out, systematic errors are the principal cause for concern. Unlike sampling variance, bias caused by systematic errors cannot be reduced by increasing the sample size.
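A small simulation can make the contrast between random and systematic errors concrete. The sketch below is illustrative only: the true value, the spread of the random errors and the constant shift standing in for a systematic error are hypothetical numbers chosen for the example, and real non-sampling errors are rarely this tidy.

```python
import random
from statistics import mean

TRUE_VALUE = 50.0       # hypothetical true value of the characteristic
NOISE_SD = 5.0          # hypothetical spread of random (non-sampling) errors
SYSTEMATIC_SHIFT = 2.0  # hypothetical constant bias, e.g. from a leading question

def simulate(n, seed=1):
    """Return (mean with random error only, mean with random + systematic error)."""
    rng = random.Random(seed)
    random_only = [TRUE_VALUE + rng.gauss(0.0, NOISE_SD) for _ in range(n)]
    # Same random errors, plus a constant shift applied to every unit.
    with_bias = [x + SYSTEMATIC_SHIFT for x in random_only]
    return mean(random_only), mean(with_bias)

for n in (10, 1_000, 100_000):
    m_random, m_biased = simulate(n)
    print(f"n={n:>7}: random-only error {m_random - TRUE_VALUE:+.3f}, "
          f"with systematic error {m_biased - TRUE_VALUE:+.3f}")
```

As n grows, the random-only error shrinks toward zero, while the error that includes the systematic component settles near the hypothetical shift of +2.0: adding more units does nothing to remove it.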

Characteristics

Non-sampling errors:

can occur in all aspects of the survey process other than sampling
exist in both sample surveys and censuses
are difficult to measure

Non-sampling errors can occur because of problems in coverage, response, non-response, data processing, estimation and analysis. Each of these types of errors is explained below.

Coverage errors

An error in coverage occurs when there is an omission, duplication or wrongful inclusion of the units in the population or sample. Omissions are referred to as undercoverage, while duplication and wrongful inclusions are called overcoverage. These errors are caused by defects in the survey frame: inaccuracy, incompleteness, duplication, inadequacy and obsolescence. Coverage errors may also occur in field procedures (e.g., a survey is conducted, but the interviewer misses several households or persons).
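The distinction between undercoverage and overcoverage can be sketched as a comparison between a survey frame and the population it is meant to represent. This assumes, unrealistically, that the true population list is known; the household identifiers and the function name `coverage_errors` are hypothetical.

```python
from collections import Counter

def coverage_errors(frame_ids, population_ids):
    """Count coverage errors by comparing a survey frame with the target population.

    Both arguments are lists of unit identifiers (hypothetical household IDs).
    """
    frame_counts = Counter(frame_ids)
    population = set(population_ids)

    # Undercoverage: population units missing from the frame.
    omissions = population - set(frame_counts)
    # Overcoverage, part 1: extra listings of units that belong to the population.
    duplicates = sum(c - 1 for unit, c in frame_counts.items() if unit in population)
    # Overcoverage, part 2: listings of units outside the population
    # (e.g. dwellings that no longer exist on an obsolete frame).
    wrongful = sum(c for unit, c in frame_counts.items() if unit not in population)

    return {"undercoverage": len(omissions),
            "overcoverage": duplicates + wrongful}

population = ["H1", "H2", "H3", "H4", "H5"]
frame = ["H1", "H2", "H2", "H4", "H9"]  # H3, H5 missing; H2 listed twice; H9 out of scope
print(coverage_errors(frame, population))
```

For these hypothetical lists, the frame misses two households (undercoverage) and contains two surplus listings, one duplicate and one out-of-scope unit (overcoverage).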

Response errors

Response errors result from data that have been requested, provided, received or recorded incorrectly. Response errors may occur because of deficiencies in the questionnaire, the interviewer, the respondent or the survey process.

Poor questionnaire design
It is essential that sample survey or census questions are worded carefully in order to avoid introducing bias. If questions are misleading or confusing, then the responses may end up being distorted.

For more information, refer to the section on Questionnaire design.

Interviewer bias
An interviewer can influence how a respondent answers the survey questions. This may occur when the interviewer is too friendly or too aloof, or prompts the respondent. To prevent this, interviewers must be trained to remain neutral throughout the interview. They must also pay close attention to the way they ask each question: if an interviewer changes the wording of a question, it may affect the respondent's answer.

Respondent errors
Respondents can also provide incorrect answers. Faulty recollections, tendencies to exaggerate or underplay events, and inclinations to give answers that appear more 'socially desirable' are several reasons why a respondent may provide a false answer.

Problems with the survey process
Errors can also occur because of problems with the survey process itself. Using proxy responses (taking answers from someone other than the respondent) or lacking control over the survey procedures are just a few ways of increasing the likelihood of response errors.

Non-response errors

Non-response errors are the result of not having obtained sufficient answers to survey questions. There are two types of non-response errors: complete and partial.

Complete non-response errors
These errors can occur when the survey fails to measure some of the units in the selected sample. Reasons for this type of error may be that the respondent is unavailable or temporarily absent, the respondent is unable or refuses to participate in the survey, or the dwelling is vacant. If a significant number of people do not respond to a survey, then the results may be biased since the characteristics of the non-respondents may differ from those who have participated.
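The point that non-respondents may differ from participants can be made precise with a standard decomposition. Under a simple deterministic view, if a fraction W_m of the population would not respond, the population mean is a blend of the respondent mean and the non-respondent mean, so the bias of the respondent mean is W_m × (respondent mean − non-respondent mean). The bias therefore grows with both the non-response rate and the difference between the two groups. The sketch below checks this identity on hypothetical data; the function name and values are illustrative.

```python
from statistics import mean

def nonresponse_bias(respondent_values, nonrespondent_values):
    """Bias of the respondent mean under a deterministic view of non-response.

    Returns (direct bias, bias from the formula W_m * (mean_r - mean_m)).
    """
    everyone = respondent_values + nonrespondent_values
    direct = mean(respondent_values) - mean(everyone)
    w_m = len(nonrespondent_values) / len(everyone)  # non-response rate
    formula = w_m * (mean(respondent_values) - mean(nonrespondent_values))
    return direct, formula

# Hypothetical weekly hours worked: the two non-respondents work longer hours.
respondents = [10, 12, 14, 16]
nonrespondents = [20, 22]
print(nonresponse_bias(respondents, nonrespondents))
```

Both values agree: the respondents alone understate the average by about 2.67 hours, because one third of the population did not respond and those people differ systematically from the rest.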

Partial non-response errors
This type of error deals with incomplete information obtained from the respondent. For certain people, some questions may be difficult to understand. To reduce this form of bias, care should be taken in designing and testing questionnaires. Appropriate edit and imputation strategies will also help minimize this bias.
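One simple imputation strategy, shown here only as an illustration and not as the method any particular agency uses, is to fill a missing answer with the mean of the reported answers from similar respondents (an "imputation class"). The field names, the class variable and the function `impute_class_means` are hypothetical. Mean imputation preserves class averages but understates variability, which is one reason more elaborate methods are often preferred.

```python
from collections import defaultdict
from statistics import mean

def impute_class_means(records, item, imputation_class):
    """Fill missing (None) answers to `item` with the mean of reported answers
    in the same imputation class. Returns new records with an 'imputed' flag
    so that imputed values can be identified later."""
    reported = defaultdict(list)
    for rec in records:
        if rec[item] is not None:
            reported[rec[imputation_class]].append(rec[item])
    class_means = {cls: mean(vals) for cls, vals in reported.items()}

    result = []
    for rec in records:
        new = dict(rec, imputed=False)  # copy, so the input is left unchanged
        if new[item] is None and new[imputation_class] in class_means:
            new[item] = class_means[new[imputation_class]]
            new["imputed"] = True
        result.append(new)
    return result

# Hypothetical records: one respondent in each region skipped the income question.
records = [
    {"region": "East", "income": 40_000},
    {"region": "East", "income": 50_000},
    {"region": "East", "income": None},
    {"region": "West", "income": 60_000},
    {"region": "West", "income": None},
]
for rec in impute_class_means(records, "income", "region"):
    print(rec)
```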

More information on editing and imputation can be found in the chapter entitled Data processing.

Processing errors

Processing errors sometimes emerge during the preparation of the final data files. For example, errors can occur while data are being coded, captured, edited or imputed. Coder bias is usually the result of poor training or incomplete instructions, variation in coder performance (e.g., tiredness, illness), data entry errors, or machine malfunction (some processing errors are caused by errors in computer programs). The same can be said of capture errors. Sometimes, errors are incorrectly identified during the editing phase. Even when errors are correctly detected, they can be corrected improperly because of poor imputation procedures.
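Editing is typically driven by explicit rules that flag records for review or imputation. The sketch below shows two hypothetical validity edits of the kind that can catch capture errors (an age of 52 keyed as 520, for instance); the field names, limits and the function `failed_edits` are assumptions for illustration. A rule set too strict would flag valid records, and one too loose would let errors through, which is how errors come to be misidentified at this stage.

```python
def failed_edits(record):
    """Return the names of the (hypothetical) edit rules a record fails."""
    failures = []
    age = record.get("age")
    # Validity edit: a missing or implausible age is probably a capture error.
    if age is None:
        failures.append("age_missing")
    elif not 0 <= age <= 120:
        failures.append("age_out_of_range")
    # Validity edit: hours worked last week cannot exceed the hours in a week.
    hours = record.get("hours_worked")
    if hours is not None and not 0 <= hours <= 168:
        failures.append("hours_out_of_range")
    return failures

for rec in ({"age": 34, "hours_worked": 40},
            {"age": 520, "hours_worked": 40},
            {"age": 29, "hours_worked": 400}):
    print(rec, failed_edits(rec))
```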

Estimation errors

Statistics Canada and other data-collecting agencies devote much effort to designing and monitoring surveys in order to make them as error-free as possible. Even so, if an inappropriate estimation method is used, bias can still be introduced, however error-free the survey was before estimation.

Here is an example of a potentially inappropriate estimation method. Global warming is the subject of much debate. To measure it accurately, one needs an acceptable way of calculating an "average global temperature". Figure 1 features a common portrayal of climate change data. It shows an increase in average global temperature of between 0.3 °C and 0.6 °C over nearly 140 years.

Figure 1. Global climate change, 1860 to 2000

The measurements that make up the data set were taken at various weather stations around the world. In this case, the population consists of weather measurements, from which a sample can be drawn.

Some scientists question the accuracy of a graph like Figure 1 because they believe that the estimates from the sample survey are biased.

These scientists argue that temperature measurements should reflect the ratio of the earth's land area to its water area. For example, if land covered half as much of the surface as water (seas and oceans), then twice as many measurements should come from locations over water as over land. In fact, for the data in Figure 1, few measurements were taken from locations over water, whereas the great majority were taken from weather stations on land.

Why might this bias the estimates from the sample survey?

Land-based readings may run higher than readings taken over water, in part because many weather stations are located in or near cities, which tend to be warmer than their surroundings owing to the phenomenon known as the 'urban heat island effect.' If the sample is too heavily weighted in favour of land-based temperatures, and the estimates do not take this into account (as some scientists claim), then the results may not truly reflect a global average.
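One way to "take this into account" is to weight each surface type by its share of the earth's surface rather than by its share of the sample, which amounts to a stratified (or post-stratified) mean. The sketch below is purely illustrative: the readings are invented, and the surface shares follow the hypothetical ratio used above (land covering half as much surface as water), not measured values.

```python
from statistics import mean

# Hypothetical readings (°C); land stations are heavily over-represented.
land = [14.0, 15.0, 16.0, 15.0, 14.5, 15.5, 15.0, 15.0]
water = [11.5, 12.5]

# Hypothetical surface shares: land area equal to half the water area.
shares = {"land": 1 / 3, "water": 2 / 3}

# Unweighted: each reading counts equally, so land dominates (8 of 10 readings).
unweighted = mean(land + water)
# Weighted: each stratum mean counts in proportion to its share of the surface.
weighted = shares["land"] * mean(land) + shares["water"] * mean(water)
print(f"unweighted: {unweighted:.2f} °C, area-weighted: {weighted:.2f} °C")
```

With these numbers the unweighted mean is 14.40 °C and the area-weighted mean is 13.00 °C: the over-sampled, warmer land readings pull the unweighted estimate upward.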

For more information on estimation, refer to the Sampling methods chapter.

Analysis errors

Analysis errors include any errors that occur when the wrong analytical tools are used or when preliminary results are used instead of final ones. Errors introduced when the results are published are also considered analysis errors.