Public opinion research interviews a representative sample rather than the entire population, so some sampling error is unavoidable. Sampling error can be thought of as the degree of uncertainty about whether a figure measured in the survey (for example, the share of supporters of a particular political party) exactly reflects reality in the population as a whole or differs from it to some degree.
How certain we can be about the accuracy of our measurement depends, among other things, on the size of the sample: the smaller the sample, the greater the sampling error, and the larger the sample, the smaller the sampling error. People often think that a sample of one thousand respondents is small, but in fact it is large enough to produce relatively accurate findings about the population.
Let’s imagine a hypothetical situation in which we have no idea what the ratio of men to women in society is: it could be 30% men and 70% women, 20/80, 40/60, or the other way around. Suppose we conduct a representative survey of 1000 respondents and measure 51% men and 49% women. Because we are working with a sample and not the total population, we have to take the sampling error into account. In this case the sampling error is about three percentage points, which we add to and subtract from the measured estimates of 51% and 49% (the sampling error is largest for estimates of 50% and shrinks symmetrically as the estimate moves away from 50% in either direction, so if we measure 20% or 80%, for example, the sampling error is only about 2.5 percentage points). In our example, then, surveying 1000 people tells us that the population is made up of 48–54% men and 46–52% women. Compared with complete ignorance about the distribution, this is a very precise finding: we now know that the ratio is not 70/30, 80/20, 60/40, and so on. Because we chose an example in which the real population figures are known, we can also see that our intervals match reality and therefore that our research is accurate. For the majority of questions asked in a survey, of course, we do not know the true results beforehand, so even an approximate figure with a sampling error of +/- 3 percentage points can be important information.
(For readers from the professional community it is important to note that this is not a hundred-percent confidence interval, and depending on some of the other criteria it may represent, for instance, a confidence of only 95%, which means that there is still a 5% risk that the real figure lies outside the interval we indicated here.)
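The figures in the example above can be roughly reproduced with the standard normal-approximation formula for the margin of error of a proportion. The following is a minimal sketch, assuming simple random sampling and a 95% confidence level (z = 1.96); the function names are illustrative and real surveys may use other methods, such as design-effect adjustments.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion.

    p: measured share (0..1), n: sample size,
    z: critical value (1.96 is assumed here, i.e. 95% confidence).
    """
    return z * math.sqrt(p * (1 - p) / n)

def interval(p, n, z=1.96):
    """Lower and upper bounds: the estimate minus and plus the margin."""
    moe = margin_of_error(p, n, z)
    return p - moe, p + moe

n = 1000
for p in (0.51, 0.49, 0.20, 0.80):
    moe = margin_of_error(p, n)
    low, high = interval(p, n)
    print(f"measured {p:.0%}: +/- {moe:.1%}, interval {low:.1%} to {high:.1%}")
```

The margin is largest at 50% and identical for 20% and 80%, which is the symmetry described in the text.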
And now to the question, ‘How can 1000 respondents be enough if there are 10 million people in the Czech population?’ Statistical theory tells us that, as long as the sample is only a small fraction of the population, it makes practically no difference how large the population from which we select our sample is. The reason is that an estimate’s accuracy (the width of the interval) and its reliability (the confidence that the real value lies within that interval), which together describe the sampling error, depend almost exclusively on the size of the sample (and to some extent on the variability of the target population). So when we calculate the sampling error, the size of the population about which we draw our conclusions is not a factor we need to consider; what matters is the size of the sample and the variability of the responses.
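One way to see why population size barely matters is the finite population correction, a textbook adjustment for simple random samples drawn without replacement. The sketch below is an illustration under that assumption, not the method of any particular survey; the population sizes other than 10 million are hypothetical.

```python
import math

def margin_of_error_fpc(p, n, population, z=1.96):
    """Margin of error with the finite population correction applied.

    Assumes a simple random sample of size n drawn without replacement
    from `population` units, at 95% confidence (z = 1.96).
    """
    base = z * math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return base * fpc

# Same sample of 1000, very different (partly hypothetical) population sizes.
for population in (100_000, 10_000_000, 300_000_000):
    moe = margin_of_error_fpc(0.49, 1000, population)
    print(f"population {population:>11,}: +/- {moe:.2%}")
```

For a sample of 1000 the correction factor stays very close to 1 across all three populations, so the margin of error is essentially the same.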
Let’s look at how the accuracy of our estimates would improve if we increased the size of our sample. With a sample of five thousand respondents, the sampling error for our estimate of 49% women in the population would be 1.4 percentage points, so we would know with 95% confidence that the share of women in the population is between 47.6% and 50.4%. This is indeed a more precise finding, but the gain is modest when we consider that a survey of five thousand people would cost approximately five times as much. And what if we had ten thousand respondents? The sampling error for an estimated 49% women would then be 1 percentage point, and the share of women in the population would lie between 48% and 50%. This further refinement of the estimate is again small compared with the very substantial increase in cost.
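The diminishing returns follow from the fact that the margin of error shrinks with the square root of the sample size. The sketch below tabulates this and inverts the formula to get the sample size needed for a target margin. It assumes the same normal approximation at 95% confidence, and treats the number of interviews as a rough stand-in for cost; the helper names are illustrative.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(p, target_moe, z=1.96):
    """Smallest n whose margin of error does not exceed target_moe."""
    return math.ceil(z ** 2 * p * (1 - p) / target_moe ** 2)

baseline = 1000
for n in (1000, 5000, 10000):
    moe = margin_of_error(0.49, n)
    print(f"n = {n:>6}: +/- {moe:.1%} ({n / baseline:.0f}x the interviews)")

# Halving the margin requires roughly four times as many respondents.
print(required_sample_size(0.49, 0.03))
print(required_sample_size(0.49, 0.015))
```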
We keep talking about 1000 respondents. Does a sample have to have at least 1000 respondents for it to be of informational value? How do you decide how many people to survey?
A sample does not have to have at least 1000 respondents to be of informational value. The number comes from the standard practice researchers use to get as accurate an estimate as possible at a reasonable cost, as described above. Samples of fewer than 1000 respondents have a larger sampling error, while increasing the number of respondents beyond 1000 does not reduce the sampling error substantially. Nor is there any fundamental difference in sampling error between, for instance, 1000 respondents and 950 respondents. Approximately one thousand respondents has become the norm mainly for reasons of efficiency: a sample of that size produces results that are sufficiently accurate and reliable relative to the cost of conducting the research.
In reality, in order to get an idea of the strength of an opinion in the population as a whole even a sample of 500 respondents will do, as long as there are no great ambitions in the research to look for differences between individual subgroups. In the case of 500 respondents, the sampling error for an estimate of 49% is 4.4%, which is not much greater than the 3% error for a sample of 1000 respondents.
Nevertheless, a larger sample is practical for a reason other than reducing sampling error. The larger the sample, the better we are able to observe information about subgroups – for example, how opinions differ between men and women, between people with less and more education, or between people in different age groups. A larger sample means that in every subsample there will be more units, and consequently the sampling error for the individual subsamples will be smaller and the conclusions about these subgroups will be more accurate and more reliable. Therefore, when deciding what size of sample to use researchers also take into account the depth of the analysis they want to conduct and the desired degree of accuracy of their estimates. And in the majority of situations 1000 respondents is the optimal solution.
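The effect on subgroups can be illustrated by applying the same margin-of-error formula to each subsample on its own. This is a rough sketch: the subgroup shares are hypothetical, the estimate is fixed at 50% (the worst case), and simple random sampling at 95% confidence is assumed.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical subgroup shares of the sample, chosen only for illustration.
subgroup_shares = {
    "whole sample": 1.0,
    "half (e.g. women)": 0.5,
    "one quarter": 0.25,
    "one tenth": 0.10,
}

for total in (500, 1000):
    print(f"total sample: {total}")
    for label, share in subgroup_shares.items():
        n_sub = round(total * share)  # respondents falling in the subgroup
        moe = margin_of_error(0.5, n_sub)
        print(f"  {label:<18} n = {n_sub:>4}  +/- {moe:.1%}")
```

The smaller the subgroup, the wider its interval, which is why analyses that compare many subgroups call for a larger total sample.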