Super basic statistics about populations and samples

#1
Hi! First, please excuse my level of English. I'll do my best to explain my doubts.
I've been reading the OpenIntro Statistics book (4th ed.) and I'm a little confused about the criteria used in some exercises to determine the population and the sample of a given study:

1.3 Air pollution and birth outcomes, study components.
Researchers collected data to examine the relationship between air pollutants and preterm births in Southern California. During the study air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM10) in µg/m³. Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM10 and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births.
1.4 Buteyko method, study components.
The Buteyko method is a shallow breathing technique developed by Konstantin Buteyko, a Russian doctor, in 1952. Anecdotal evidence suggests that the Buteyko method can reduce asthma symptoms and improve quality of life. In a scientific study to determine the effectiveness of this method, researchers recruited 600 asthma patients aged 18-69 who relied on medication for asthma treatment. These patients were randomly split into two research groups: one practiced the Buteyko method and the other did not. Patients were scored on quality of life, activity, asthma symptoms, and medication reduction on a scale from 0 to 10. On average, the participants in the Buteyko group experienced a significant reduction in asthma symptoms and an improvement in quality of life.
1.25 Haters are gonna hate, study confirms.
A study published in the Journal of Personality and Social Psychology asked a group of 200 randomly sampled men and women to evaluate how they felt about various subjects, such as camping, health care, architecture, taxidermy, crossword puzzles, and Japan in order to measure their attitude towards mostly independent stimuli. Then, they presented the participants with information about a new product: a microwave oven. This microwave oven does not exist, but the participants didn't know this, and were given three positive and three negative fake reviews. People who reacted positively to the subjects on the dispositional attitude measurement also tended to react positively to the microwave oven, and those who reacted negatively tended to react negatively to it. Researchers concluded that some people tend to like things, whereas others tend to dislike things, and a more thorough understanding of this tendency will lead to a more thorough understanding of the psychology of attitudes.

Question 1. For exercise 1.3 the book states that the population is "all births", just like that. But for exercise 1.4 it says that the population is "all asthma patients aged 18-69 who rely on medication for asthma treatment". Why is that? Wouldn't the population simply be asthma patients in exercise 1.4, just as it is all births in exercise 1.3? What's the difference?

Question 2. For exercise 1.3 the book says that "if births in this time span at the geography can be considered to be representative of all births, then the results are generalizable to the population of Southern California", but for exercise 1.25 it says that "the results of the study can be generalized to the population at large since the sample is random". What's the difference? Why does it consider the time span and the geography in exercise 1.3 and make generalization depend on whether the sample is representative (when it doesn't even state whether the births were randomly selected, and if they were, why does generalization depend on the sample being representative?), while for exercise 1.25 the results can be generalized to the population at large just because it was a random sample, without considering whether it is representative?
I just want to add that I understand that generalization depends on random sampling and that causal relations depend on random assignment to treatment or control group.
I hope I made myself clear. Thank you for your help :)
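
The distinction mentioned above between random sampling and random assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 patient IDs, the function name `sample_and_assign`, and the seed are all hypothetical and do not come from the book or from the studies in the exercises.

```python
import random

def sample_and_assign(population_ids, n, seed=0):
    """Draw a random sample, then randomly split it into two groups."""
    rng = random.Random(seed)
    # Random sampling: decides WHO enters the study.
    # This step is what supports generalizing to the population.
    sample = rng.sample(population_ids, n)
    # Random assignment: decides WHICH GROUP each participant joins.
    # This step is what supports causal conclusions.
    shuffled = sample[:]
    rng.shuffle(shuffled)
    half = n // 2
    return shuffled[:half], shuffled[half:]

population = list(range(10_000))  # hypothetical patient IDs
treatment, control = sample_and_assign(population, 600)
print(len(treatment), len(control))
```

A study can have either step without the other. Exercise 1.4, for instance, describes random assignment of recruited patients but does not say the patients were randomly sampled.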
 

obh

Active Member
#2
Please try to have a shorter question ...:)

1. I assume you define the relevant population for your research yourself. In one study it may be all asthma patients, in another all asthma patients who take medication, in another all male asthma patients aged 6-10.

2. I assume that if you take only a specific age span, there is always the risk that this age span does not represent other age spans. But if you take a fully random sample over all age spans, it should be representative.
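
The point about age spans can be checked with a small simulation. The sketch below is a toy example, not data from any of the studies: it assumes a hypothetical population aged 18-69 in which some outcome rises with age, then compares a sample restricted to ages 18-30 with a simple random sample over all ages.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: the outcome increases with age, plus noise.
population = []
for _ in range(100_000):
    age = rng.randint(18, 69)
    outcome = 0.1 * age + rng.gauss(0, 1)
    population.append((age, outcome))

true_mean = statistics.fmean([o for _, o in population])

# Sample drawn only from a narrow age span (18-30).
young_only = [p for p in population if p[0] <= 30]
restricted_sample = rng.sample(young_only, 500)

# Simple random sample over all age spans.
random_sample = rng.sample(population, 500)

restricted_mean = statistics.fmean([o for _, o in restricted_sample])
random_mean = statistics.fmean([o for _, o in random_sample])

print(f"population mean:        {true_mean:.2f}")
print(f"restricted sample mean: {restricted_mean:.2f}")
print(f"random sample mean:     {random_mean:.2f}")
```

Under these assumptions the restricted sample lands far from the population mean, while the random sample lands close to it. If the outcome did not depend on age at all, both samples would do about equally well, which is the "if births in this time span can be considered representative" condition in the book's answer.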
 

Karabiner

TS Contributor
#3
But for example 1.4 it says that the population are "all asthma patients aged 18-69 who rely on medication for asthma treatment". Why is that? Wouldn't the population simply be asthma patients in example 1.4,
There could be several different reasons. One could be that when you perform a controlled study on the
effectiveness of a health intervention, you often want to treat those who are seriously
affected, because effects are easier to demonstrate there than in non-serious cases. So
medication could be a proxy for the seriousness of the condition.
for example 1.25 the results can be generalizable to the population at large just because it was a random sample, without considering it being representative or not?
"Generalizability to the population at large" is an exaggeration, to say the least.
Details of the recruitment procedure, inclusion and exclusion criteria, and rate of refusal
to participate would reveal to what degree the sample is not representative of the
general population. The representativeness claim should be based on some comparisons
with some basic population data (e.g. gender, age, educational level, marital status/children,
employment).
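
A comparison with basic population data, as described above, could look roughly like the following sketch. All numbers are made up for illustration, and `composition_gaps` is a hypothetical helper, not something from the book or from the study in exercise 1.25.

```python
def composition_gaps(sample_counts, population_shares):
    """Return sample share minus population share for each category."""
    total = sum(sample_counts.values())
    return {
        category: sample_counts.get(category, 0) / total - share
        for category, share in population_shares.items()
    }

# Hypothetical sample of 200 participants and hypothetical census shares.
sample_gender = {"female": 130, "male": 70}
census_gender = {"female": 0.51, "male": 0.49}

gaps = composition_gaps(sample_gender, census_gender)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.2f}")
```

Large gaps on variables such as gender, age, or education suggest that the sample differs from the general population, even if the participants were selected at random from some narrower group.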

With kind regards

Karabiner
 
#4
I know, it's just way too long, I just didn't really know how to express my doubts without considering the examples. But thanks a lot for answering. I think the book is just missing a tiny bit of explanation in the answers to the exercises.
 
#5
I see. I guess the book isn't always clear, either in its explanation of a given topic or in the answers it provides to some of the exercises, because I find them contradictory at times, and not just for these examples. And yes, I thought that the results being generalizable to the population at large was a bit off, too. Thank you for taking the time to read.
 

obh

Active Member
#6
Of course, just a general comment: if you write too much, people may not read the question.
But in your case, you got two answers, so it is probably not too long :)