By Yinghao Li
Consider the following problem: you work at a food company that produces beef jerky in several yummy flavors. It is summer, and unfortunately some of the beef used to make the jerky might have gone bad. Your boss asks you to test whether this batch of jerky, in all flavors, is fit to go to market, given that some of it was probably made with bad ingredients.
What should you do? Are you going to test, or rather eat, every piece of jerky in this batch? Probably not. That would be ridiculous, because there would be nothing left of the batch to sell. Or are you going to pick some jerky at random and eat it, so that you can estimate the proportion of the batch that has gone bad and check whether it exceeds a certain threshold, say 1%, beyond which most customers would find it unacceptable? Great, that sounds smarter, but don’t forget that the bad ingredients are not necessarily just bad beef; the jerky comes in different flavors, and you were asked to test all of them. “OK,” you might think, getting smarter and smarter, “I will randomly pick some samples from each flavor and test whether the ‘go-bad’ rate exceeds a certain threshold.” Right, you got it: this is essentially how most food companies test the quality of their products.
The method we used above, in which we test only a certain number of pieces of jerky rather than all of them, is called sampling. According to Wikipedia, sampling is concerned “with the selection of a subset of individuals from within a statistical population to estimate characteristics of the whole population.” Of course, we do not want to settle for such a technical definition, so let me explain these jargon-heavy words using the beef jerky example above. Here is the Wikipedia definition again with my own explanation added: sampling is concerned with the selection of a subset of individuals (in our case, picking a certain number of pieces of jerky) from within a statistical population (all the jerky in this batch) to estimate characteristics of the whole population (the rate at which this batch has gone bad). Now I hope you have grasped the key idea of sampling. It is nothing more than picking samples from the whole to check whether something has gone wrong (or, more generally, to measure whatever we want to know about the whole).
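If you like to see ideas in code, here is a minimal Python sketch of the jerky test described above. Everything in it is my own assumption for illustration: the function name `estimate_bad_rate_by_flavor`, the idea that a batch is a dictionary mapping each flavor to a list of True/False values (True meaning "gone bad"), and the 1% threshold borrowed from the example. It is not how any particular company does it, only one way to picture "pick some at random from each flavor and compare the rate to a threshold".

```python
import random


def estimate_bad_rate_by_flavor(batch, sample_size, threshold=0.01, seed=None):
    """Estimate the 'go-bad' rate of each flavor from a random sample.

    batch: dict mapping flavor name -> non-empty list of booleans,
           where True means that piece has gone bad (hypothetical layout).
    Returns a dict mapping flavor -> (estimated_rate, passes_threshold).
    """
    rng = random.Random(seed)
    report = {}
    for flavor, pieces in batch.items():
        # Never ask for more pieces than the flavor actually has.
        n = min(sample_size, len(pieces))
        # Draw n pieces at random, without replacement.
        sample = rng.sample(pieces, n)
        # True counts as 1, so the sum is the number of bad pieces.
        bad_rate = sum(sample) / n
        report[flavor] = (bad_rate, bad_rate <= threshold)
    return report


# Made-up batch: one flavor is entirely fine, the other entirely spoiled.
batch = {
    "original": [False] * 1000,
    "spicy": [True] * 1000,
}
report = estimate_bad_rate_by_flavor(batch, sample_size=50, seed=0)
for flavor, (rate, ok) in report.items():
    print(flavor, rate, "pass" if ok else "fail")
```

Notice that we only ever look at 50 pieces per flavor, not all 1000. The estimate is only as good as the sample, which is why picking at random, and from every flavor, matters.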
Let me make this clearer with one more example from daily life. Say you bought a box of strawberries, put it in the refrigerator, and forgot about it until it hit you a week later. You think the strawberries have probably gone bad, but you are a college student and don’t want to waste your money, so you decide to taste some to see whether they are still edible. You pick a few at random and, if none of them is bad, congratulations, you saved your money. In most cases, though, some have gone bad and some have not. Then it is up to you whether to keep the box or throw it all away. Maybe you are smarter: you don’t want to pick purely at random, so you first group the strawberries by a characteristic, such as whether they look like they have gone bad, and then pick some from each group. This way of sampling has a name: “stratified sampling”. Or maybe you are more of a scientist: you care less about saving money than about studying how strawberries go bad. Then you may want to use systematic sampling, in which samples are chosen according to a fixed plan, typically at regular intervals. For example, you might choose three samples every day to track how the strawberries go bad.
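To make "according to a plan" a bit more concrete, here is a small Python sketch of the textbook form of systematic sampling: line the items up and take every k-th one. The function name `systematic_sample`, the `step` and `start` parameters, and the strawberry labels are all hypothetical names I made up for this illustration, and I am assuming the strawberries can be put in some fixed order.

```python
import random


def systematic_sample(items, step, start=None, seed=None):
    """Return every `step`-th item of `items`, beginning at index `start`.

    If `start` is not given, a random starting index in [0, step) is used,
    which is the usual way to keep the plan from favoring any one item.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    if start is None:
        start = random.Random(seed).randrange(step)
    # Slicing with a step picks items at regular intervals.
    return items[start::step]


# A made-up box of 12 strawberries, labeled by position in the box.
strawberries = [f"strawberry_{i}" for i in range(12)]
picked = systematic_sample(strawberries, step=4, start=0)
print(picked)
```

With `step=4` and `start=0`, the plan picks positions 0, 4 and 8. The same idea works over time instead of position: "three samples every day" is just a plan with a regular interval of one day.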
Sampling is useful, of course, not only in daily life but in scientific studies too. It has drawn particular attention from psychologists and cognitive scientists recently, because most participants in psychological studies, according to a 2010 study by Canadian researchers from the University of British Columbia, are WEIRD. According to the study, “people from Western, educated, industrialized, rich and democratic (WEIRD) societies represent as much as 80 percent of study participants, but only 12 percent of the world’s population”. The over-sampling of American college students may be skewing our understanding of human behavior, which has raised concerns among psychologists and cognitive scientists. Some believe it is a critical issue for psychology and cognitive science, while others do not think it is necessarily a bad thing. One such person is Greg Downey, who wrote in a blog post that “Although WEIRD is terribly catchy and quite manageable, it may not even focus us on the most important distinctions, nor may it reflect a good starting point for a truly trans-cultural psychology, carting our own self-conceptions and obsessions, surreptitiously, into the cross-cultural comparisons”. Nevertheless, the sampling problem does exist, and we should try our best to avoid non-representative samples, both in daily life and in future scientific studies.