 |
... Q&A: Statistics (cont'd)
...
Measurement errors (questions 16-21)
| 16. What is sampling error? |
Every survey contains some form of error. Even a complete
census of all known members of a population is subject to random error
or potential measurement error. There are two major forms of sampling
error that might be encountered in a survey: random error and systematic
error.
|
| 17. What is random sampling error? |
Random error occurs when a particular sample is not
representative of the population of interest due to random variation.
It can be expressed as the difference between the sample results and the
true results. Even if all aspects of the sample are executed properly,
the results are still subject to a certain amount of error because of
random, chance variation. |
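The chance variation described above can be seen in a small simulation. This is only a sketch: the population share (65%), the sample size (500), and the function name `sampling_errors` are illustrative assumptions, not values or methods taken from a real survey.

```python
import random


def sampling_errors(true_share, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return each sample's error.

    The error of one sample is its observed proportion minus the true
    population proportion (assumed known here, which it never is in practice).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_samples):
        # Each respondent answers "yes" with probability true_share.
        hits = sum(rng.random() < true_share for _ in range(sample_size))
        errors.append(hits / sample_size - true_share)
    return errors


errors = sampling_errors(true_share=0.65, sample_size=500, n_samples=200)
print(f"largest error in any one sample: {max(abs(e) for e in errors):.3f}")
print(f"average error over all samples:  {sum(errors) / len(errors):+.4f}")
```

Every sample is drawn correctly, yet individual samples still miss the true value by a point or two, while the errors roughly cancel on average.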
| 18. What is a systematic error? |
A systematic error occurs when something is wrong with
the technique being used or when an instrument is not calibrated correctly.
This results in an error throughout the sample. |
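A minimal sketch of why a systematic error behaves differently from a random one: a miscalibrated instrument is modelled here as a constant offset added to every reading. The true value, offset, and noise level are hypothetical numbers chosen for illustration.

```python
import random


def mean_reading(true_value, offset, noise_sd, n, seed=0):
    """Average n readings from an instrument with a constant calibration offset.

    Each reading = true value + systematic offset + random Gaussian noise.
    """
    rng = random.Random(seed)
    readings = [true_value + offset + rng.gauss(0, noise_sd) for _ in range(n)]
    return sum(readings) / n


# Hypothetical instrument that reads 0.5 units too high.
for n in (10, 1000, 100000):
    error = mean_reading(true_value=20.0, offset=0.5, noise_sd=1.0, n=n) - 20.0
    print(f"n = {n:>6}: error of the mean = {error:+.3f}")
```

As n grows, the random noise averages out, but the error of the mean settles near the offset rather than near zero. Taking more readings does not remove a systematic error.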
| 19. How is the sampling error or standard error determined? |
Calculation of sampling error (also called standard
error) is based on the standard deviation of the sample: the greater the
sample standard deviation, the greater the sampling error. The sampling
error is also related to the sample size: the greater the sample size,
the smaller the sampling error. This error cannot be avoided, only reduced
by increasing the sample size.
It is possible to estimate the range of random error at a particular level
of confidence. Suppose we surveyed 500 people and found that 65% of them
said that vanilla is their favorite ice cream. For a sample of 500, sampling
error is about 4 percentage points at the 95% confidence level. This means
that we can expect our sample results to be within 4 percentage points of
the actual figure for the population; in other words, the actual figure
could be as high as 69% or as low as 61%. As sample size increases, sampling
error decreases. Sampling error is about 10 percentage points for a sample
of 100 and about 3 percentage points for a sample of 1000. |
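The figures quoted above for samples of 100, 500 and 1000 are consistent with the usual margin-of-error formula for a proportion, z * sqrt(p(1 - p) / n), with z = 1.96 (95% confidence) and the worst case p = 0.5. That choice of formula and of z is an assumption made for this sketch; the source does not say how its figures were computed.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion p from a sample of size n."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval; z = 1.96 assumes 95% confidence."""
    return z * standard_error(p, n)


for n in (100, 500, 1000):
    print(f"n = {n:>4}: {margin_of_error(0.5, n):.1%}")
# n =  100: 9.8%
# n =  500: 4.4%
# n = 1000: 3.1%
```

Because n sits under a square root, halving the sampling error requires roughly four times the sample size.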
| 20. How does standard error relate to a normal distribution? |
When the area of the standard normal curve is divided
into sections by standard error above and below the mean, the area in
each section is a known quantity. The areas above and below the mean can
be added together to get the probability of obtaining a value within (plus
or minus) a given number of standard errors. There is about a 68% chance of
a value falling within one standard error of the mean, a 95% chance within
two standard errors, and a 99.7% chance that it will be within three. Suppose
a normal distribution has a mean of 3.75 (highest point on graph below)
and a standard deviation of 0.25. Then about 68% of the values will fall
between 3.5 and 4.0 as shown below.
(Graphic taken from http://trochim.human.cornell.edu/kb/sampstat.htm)
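The areas under the normal curve can be checked directly with the error function from Python's standard library. This is a small sketch; the helper name `coverage` is made up for the example.

```python
import math


def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973

# The example distribution: mean 3.75, standard deviation 0.25.
mean, sd = 3.75, 0.25
low, high = mean - sd, mean + sd
print(f"{coverage(1):.0%} of values fall between {low} and {high}")
```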
|
| 21. What is meant by a level of confidence, or confidence level? |
Confidence levels are used when two sets of data are
being compared. The significance level, which is the complement of the
confidence level, is the likelihood of obtaining a particular result by
chance rather than due to a truly significant difference in the two sets
of data. The smaller the significance level, the more stringent the test,
and the less likely it is that a chance result is mistaken for a real
difference. Common significance levels are 0.05 (1 chance in 20), 0.01
(1 chance in 100) and 0.001 (1 chance in 1000), which correspond to
confidence levels of 95%, 99% and 99.9%.
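As an illustration of comparing two sets of data against these thresholds, here is a sketch of a two-proportion z-test. The test is one common choice and is not named in the source; the survey counts below are invented for the example.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value for the difference between two sample proportions.

    x1 of n1 respondents said "yes" in the first sample, x2 of n2 in the second.
    Uses the pooled standard error and the normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Probability of a difference at least this large arising by chance alone.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical data: 325 of 500 prefer vanilla in one survey, 290 of 500 in another.
p = two_proportion_p_value(325, 500, 290, 500)
for level in (0.05, 0.01, 0.001):
    verdict = "significant" if p < level else "not significant"
    print(f"at the {level} level: {verdict}")
```

With these made-up counts the difference passes the 0.05 threshold but not the stricter 0.01 and 0.001 thresholds.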
|
Statistics and astronomy (questions 22, 23)
| 22. Why do astronomers use statistics? |
|
Astronomers use statistics because they can't manipulate the universe
in a laboratory the way a chemist can manipulate a compound or a biologist
can manipulate a specimen. Since it is impossible to perturb some part
of the population in order to observe the effect, astronomers rely on
standard sampling designs and estimation methods to draw conclusions
about the universe.
Also, processes in the universe take place over a very large time scale
so noticeable changes are rare and tend to be studied in detail. As
an example, consider stellar evolution. No one has ever observed a star
go through its life cycle since the shortest cycles are about 10 million
years long, but astronomers can observe many stars at different stages
in their life cycles and make predictions.
|
| 23. How do astronomers use sampling statistic techniques in their research? |
Astronomers use two different sampling designs depending
on the population being studied. If the population is finite in size,
such as a cluster of stars or the Hubble Deep Fields, simple random sampling
is chosen. If the population is very large and considered infinite, then
more complex designs are used depending on the characteristics of the
population and the property being studied. Active galactic nuclei and
halo stars are two populations that are considered infinite.
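For the finite case, simple random sampling means every member of the population has the same chance of being chosen. A minimal sketch, assuming a hypothetical catalogue of 2000 star identifiers for one cluster (the identifiers and the sample size are invented):

```python
import random


def simple_random_sample(population, k, seed=None):
    """Pick k distinct members, each equally likely, without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical catalogue of a finite population: 2000 stars in one cluster.
cluster = [f"star-{i:04d}" for i in range(1, 2001)]
chosen = simple_random_sample(cluster, 50, seed=42)
print(len(chosen), "stars selected for follow-up, for example:", chosen[:3])
```

This only works when the whole population can be listed in advance, which is why it suits a catalogued cluster and not a population treated as infinite.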
|
|
 |