Statistical inference traditionally consists of two branches, hypothesis testing and estimation. Hypothesis testing addresses the question "Is the value of this parameter (say, a population mean) equal to some specific value (0, for example)?" In this process, we have a hypothesis concerning the value of a parameter, and we seek to determine whether the evidence from a sample supports or does not support that hypothesis. We discuss hypothesis testing in detail in the chapter on hypothesis testing.
The second branch of statistical inference, and the focus of this chapter, is estimation. Estimation seeks an answer to the question "What is this parameter's (for example, the population mean's) value?" In estimating, unlike in hypothesis testing, we do not start with a hypothesis about a parameter's value and seek to test it. Rather, we try to make the best use of the information in a sample to form one of several types of estimates of the parameter's value. With estimation, we are interested in arriving at a rule for best calculating a single number to estimate the unknown population parameter (a point estimate).
Together with calculating a point estimate, we may also be interested in calculating a range of values that brackets the unknown population parameter with some specified level of probability (a confidence interval). In Section 4.1 we discuss point estimates of parameters and then, in Section 4.2, the formulation of confidence intervals for the population mean.
4.1 POINT ESTIMATORS

An important concept introduced in this chapter is that sample statistics viewed as formulas involving random outcomes are random variables. The formulas that we use to compute the sample mean and all the other sample statistics are examples of estimation formulas or estimators. The particular value that we calculate from sample observations using an estimator is called an estimate. An estimator has a sampling distribution; an estimate is a fixed number pertaining to a given sample and thus has no sampling distribution. To take the example of the mean, the calculated value of the sample mean in a given sample, used as an estimate of the population mean, is called a point estimate of the population mean. As Example 6-3 illustrated, the formula for the sample mean can and will yield different results in repeated samples as different samples are drawn from the population.
In many applications, we have a choice among a number of possible estimators for estimating a given parameter. How do we make our choice? We often select estimators because they have one or more desirable statistical properties. Following is a brief description of three desirable properties of estimators: unbiasedness (lack of bias), efficiency, and consistency.9
• Definition of Unbiasedness. An unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate.
For example, the expected value of the sample mean, X̄, equals μ, the population mean, so we say that the sample mean is an unbiased estimator (of the population mean). The sample variance, s², which is calculated using a divisor of n - 1 (Equation 6-3), is an unbiased estimator of the population variance, σ². If we were to calculate the sample variance using a divisor of n, the estimator would be biased: Its expected value would be smaller than the population variance. We would say that sample variance calculated with a divisor of n is a biased estimator of the population variance.
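The effect of the divisor can be checked by simulation. The following sketch (with an arbitrary normal population whose true variance is 4, and a made-up seed) averages both variance estimates over many repeated samples; the n - 1 version centers on the true variance, while the n version comes in low:

```python
import random

random.seed(42)

# Hypothetical population: normal with mean 0 and standard deviation 2,
# so the true variance is 4.0. We average the two variance estimators
# over many repeated samples of size n.
TRUE_VAR = 4.0
n = 10
trials = 20000

sum_unbiased = 0.0  # divisor n - 1
sum_biased = 0.0    # divisor n

for _ in range(trials):
    sample = [random.gauss(0, 2) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n

avg_unbiased = sum_unbiased / trials  # close to 4.0
avg_biased = sum_biased / trials      # close to 4.0 * (n - 1)/n = 3.6
print(avg_unbiased, avg_biased)
```

The biased estimate is smaller by exactly the factor (n - 1)/n, matching the text's claim that dividing by n understates the population variance.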
Whenever one unbiased estimator of a parameter can be found, we can usually find a large number of other unbiased estimators. How do we choose among alternative unbiased estimators? The criterion of efficiency provides a way to select from among unbiased estimators of a parameter.
• Definition of Efficiency. An unbiased estimator is efficient if no other unbiased estimator of the same parameter has a sampling distribution with smaller variance.
To explain the definition, in repeated samples we expect the estimates from an efficient estimator to be more tightly grouped around the mean than estimates from other unbiased estimators. Efficiency is an important property of an estimator.10 The sample mean X̄ is an efficient estimator of the population mean; the sample variance s² is an efficient estimator of σ².
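A quick simulation can make efficiency concrete. For a symmetric population, the sample median is also an unbiased estimator of the population mean, but it is less efficient than the sample mean: its estimates scatter more widely. This sketch (standard normal population, arbitrary seed) compares the sampling variance of the two estimators:

```python
import random
import statistics

random.seed(1)

# Draw many samples from a standard normal population (mean 0) and record
# both the sample mean and the sample median as estimates of the mean.
n, trials = 25, 10000
means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)      # near 1/25 = 0.04
var_median = statistics.pvariance(medians)  # noticeably larger
print(var_mean, var_median)
```

Both sets of estimates center on 0, but the mean's estimates are more tightly grouped, which is exactly what the efficiency criterion rewards.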
Recall that a statistic's sampling distribution is defined for a given sample size. Different sample sizes define different sampling distributions. For example, the variance of the sampling distribution of the sample mean is smaller for larger sample sizes. Unbiasedness and efficiency are properties of an estimator's sampling distribution that hold for a sample of any size. An unbiased estimator is equally unbiased in a sample of size 10 and in a sample of size 1,000. In some problems, however, we cannot find estimators that have such desirable properties as unbiasedness in small samples.11 In such cases, statisticians may justify the choice of an estimator based on the properties of the estimator's sampling distribution in extremely large samples, the estimator's so-called asymptotic properties. Among such properties, the most important is consistency.
• Definition of Consistency. A consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as sample size increases.
Somewhat more technically, we can define a consistent estimator as an estimator whose sampling distribution becomes concentrated on the value of the parameter it is intended to estimate as the sample size approaches infinity. The sample mean, in addition to being an efficient estimator, is also a consistent estimator of the population mean: As sample size n
9 See Daniel and Terrell (1995) or Greene (2003) for a thorough treatment of the properties of estimators.
10 An efficient estimator is sometimes referred to as the best unbiased estimator.
11 Such problems frequently arise in regression and time-series analyses, which we discuss in later chapters.
goes to infinity, its standard error, σ/√n, goes to 0, and its sampling distribution becomes concentrated right over the value of the population mean, μ. To summarize, we can think of a consistent estimator as one that tends to produce more and more accurate estimates of the population parameter as we increase the sample's size. If an estimator is consistent, we may attempt to increase the accuracy of estimates of a population parameter by calculating estimates using a larger sample. For an inconsistent estimator, however, increasing sample size does not help to increase the probability of accurate estimates.
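The consistency of the sample mean can be seen numerically. In the sketch below (a hypothetical population with mean 100 and standard deviation 20, arbitrary seed), the fraction of sample means landing within ±1 of the true mean rises toward 1 as n grows and the standard error σ/√n shrinks:

```python
import random

random.seed(7)

# For each sample size n, estimate the probability that the sample mean
# lands within +/- EPS of the true population mean MU.
MU, SIGMA, EPS, trials = 100.0, 20.0, 1.0, 2000

fractions = []
for n in (25, 100, 400, 1600):
    hits = sum(
        abs(sum(random.gauss(MU, SIGMA) for _ in range(n)) / n - MU) < EPS
        for _ in range(trials)
    )
    fractions.append(hits / trials)
    print(n, SIGMA / n ** 0.5, hits / trials)  # n, standard error, hit rate
```

As n quadruples, the standard error halves (4, 2, 1, 0.5 here), and the hit rate climbs accordingly.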
When we need a single number as an estimate of a population parameter, we make use of a point estimate. However, because of sampling error, the point estimate is not likely to equal the population parameter in any given sample. Often, a more useful approach than finding a point estimate is to find a range of values that we expect to bracket the parameter with a specified level of probability—an interval estimate of the parameter. A confidence interval fulfills this role.
• Definition of Confidence Interval. A confidence interval is a range for which one can assert with a given probability 1 - α, called the degree of confidence, that it will contain the parameter it is intended to estimate. This interval is often referred to as the 100(1 - α)% confidence interval for the parameter.
The endpoints of a confidence interval are referred to as the lower and upper confidence limits. In this chapter, we are concerned only with two-sided confidence intervals: confidence intervals for which we calculate both lower and upper limits.12
Confidence intervals are frequently given either a probabilistic interpretation or a practical interpretation. In the probabilistic interpretation, we interpret a 95 percent confidence interval for the population mean as follows. In repeated sampling, 95 percent of such confidence intervals will, in the long run, include or bracket the population mean. For example, suppose we sample from the population 1,000 times, and based on each sample, we construct a 95 percent confidence interval using the calculated sample mean. Because of random chance, these confidence intervals will vary from each other, but we expect 95 percent, or 950, of these intervals to include the unknown value of the population mean. In practice, we generally do not carry out such repeated sampling. Therefore, in the practical interpretation, we assert that we are 95 percent confident that a single 95 percent confidence interval contains the population mean. We are justified in making this statement because we know that 95 percent of all possible confidence intervals constructed in the same manner will contain the population mean. The confidence intervals that we discuss in this chapter have structures similar to the following basic structure.
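The probabilistic interpretation described above can be verified directly by simulation. This sketch (a hypothetical population with mean 50 and known standard deviation 10, arbitrary seed) constructs many 95 percent confidence intervals and counts how often they bracket the true mean:

```python
import random

random.seed(3)

# Known-variance case: each interval is xbar +/- 1.96 * sigma/sqrt(n).
MU, SIGMA, n, trials = 50.0, 10.0, 36, 2000
se = SIGMA / n ** 0.5

covered = 0
for _ in range(trials):
    xbar = sum(random.gauss(MU, SIGMA) for _ in range(n)) / n
    lo, hi = xbar - 1.96 * se, xbar + 1.96 * se
    if lo <= MU <= hi:
        covered += 1

coverage = covered / trials
print(coverage)  # close to 0.95
```

Each interval either does or does not contain the population mean; what the 95 percent figure describes is the long-run fraction of intervals, constructed this way, that do.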
• Construction of Confidence Intervals. A 100(1 - α)% confidence interval for a parameter has the following structure.
Point estimate ± Reliability factor × Standard error
12 It is also possible to define two types of one-sided confidence intervals for a population parameter. A lower one-sided confidence interval establishes a lower limit only. Associated with such an interval is an assertion that with a specified degree of confidence the population parameter equals or exceeds the lower limit. An upper one-sided confidence interval establishes an upper limit only; the related assertion is that the population parameter is less than or equal to that upper limit, with a specified degree of confidence. Investment researchers rarely present one-sided confidence intervals, however.
Point estimate = a point estimate of the parameter (a value of a sample statistic)
Reliability factor = a number based on the assumed distribution of the point estimate and the degree of confidence (1 - α) for the confidence interval
Standard error = the standard error of the sample statistic providing the point estimate13
The most basic confidence interval for the population mean arises when we are sampling from a normal distribution with known variance. The reliability factor in this case is based on the standard normal distribution, which has a mean of 0 and a variance of 1. A standard normal random variable is conventionally denoted by Z. The notation zα denotes the point of the standard normal distribution such that α of the probability remains in the right tail. For example, 0.05 or 5 percent of the possible values of a standard normal random variable are larger than z0.05 = 1.65.
Suppose we want to construct a 95 percent confidence interval for the population mean and, for this purpose, we have taken a sample of size 100 from a normally distributed population with known variance of σ² = 400 (so, σ = 20). We calculate a sample mean of X̄ = 25. Our point estimate of the population mean is, therefore, 25. If we move 1.96 standard deviations above the mean of a normal distribution, 0.025 or 2.5 percent of the probability remains in the right tail; by symmetry of the normal distribution, if we move 1.96 standard deviations below the mean, 0.025 or 2.5 percent of the probability remains in the left tail. In total, 0.05 or 5 percent of the probability is in the two tails and 0.95 or 95 percent lies in between. So, z0.025 = 1.96 is the reliability factor for this 95 percent confidence interval. Note the relationship between 100(1 - α)% for the confidence interval and zα/2 for the reliability factor. The standard error of the sample mean, given by Equation 6-1, is σX̄ = 20/√100 = 2. The confidence interval, therefore, has a lower limit of X̄ - 1.96σX̄ = 25 - 1.96(2) = 25 - 3.92 = 21.08. The upper limit of the confidence interval is X̄ + 1.96σX̄ = 25 + 1.96(2) = 25 + 3.92 = 28.92. The 95 percent confidence interval for the population mean spans 21.08 to 28.92.
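The arithmetic of the worked example above can be reproduced in a few lines:

```python
# Worked example from the text: n = 100, known sigma = 20, sample mean 25,
# 95 percent confidence, so the reliability factor is z(0.025) = 1.96.
n = 100
sigma = 20.0
xbar = 25.0

std_error = sigma / n ** 0.5          # 20 / 10 = 2
reliability = 1.96
lower = xbar - reliability * std_error
upper = xbar + reliability * std_error
print(lower, upper)  # 21.08 and 28.92, up to floating-point rounding
```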
• Confidence Intervals for the Population Mean (Normally Distributed Population with Known Variance). A 100(1 - α)% confidence interval for population mean μ when we are sampling from a normal distribution with known variance σ² is given by

X̄ ± zα/2 (σ/√n)     (Equation 6-4)
The reliability factors for the most frequently used confidence intervals are as follows.
13 The quantity (Reliability factor) × (Standard error) is sometimes called the precision of the estimator; larger values of the product imply lower precision in estimating the population parameter.
• Reliability Factors for Confidence Intervals Based on the Standard Normal Distribution. We use the following reliability factors when we construct confidence intervals based on the standard normal distribution:14
• 90 percent confidence intervals: Use z0.05 = 1.65
• 95 percent confidence intervals: Use z0.025 = 1.96
• 99 percent confidence intervals: Use z0.005 = 2.58
These reliability factors highlight an important fact about all confidence intervals. As we increase the degree of confidence, the confidence interval becomes wider and gives us less precise information about the quantity we want to estimate. "The surer we want to be, the less we have to be sure of."15
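The widening of intervals with the degree of confidence is easy to see numerically. Holding the standard error fixed at 2 (as in the worked example earlier), the interval width is twice the reliability factor times the standard error:

```python
# Interval width = 2 * reliability factor * standard error.
# Higher confidence -> larger z -> wider (less precise) interval.
std_error = 2.0
factors = {"90%": 1.65, "95%": 1.96, "99%": 2.58}

widths = {level: 2 * z * std_error for level, z in factors.items()}
print(widths)  # widths grow with the degree of confidence
```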
In practice, the assumption that the sampling distribution of the sample mean is at least approximately normal is frequently reasonable, either because the underlying distribution is approximately normal or because we have a large sample and the central limit theorem applies. However, rarely do we know the population variance in practice. When the population variance is unknown but the sample mean is at least approximately normally distributed, we have two acceptable ways to calculate the confidence interval for the population mean. We will soon discuss the more conservative approach, which is based on Student's t-distribution (the t-distribution, for short).16 In investment literature, it is the most frequently used approach in both estimation and hypothesis tests concerning the mean when the population variance is not known, whether sample size is small or large.
A second approach to confidence intervals for the population mean, based on the standard normal distribution, is the z-alternative. It can be used only when sample size is large. (In general, a sample size of 30 or larger may be considered large.) In contrast to the confidence interval given in Equation 6-4, this confidence interval uses the sample standard deviation, s, in computing the standard error of the sample mean (Equation 6-2).
• Confidence Intervals for the Population Mean—The z-Alternative (Large Sample, Population Variance Unknown). A 100(1 - α)% confidence interval for population mean μ when sampling from any distribution with unknown variance and when sample size is large is given by

X̄ ± zα/2 (s/√n)
Because this type of confidence interval appears quite often, we illustrate its calculation in Example 6-4.
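As a sketch of the z-alternative (with hypothetical generated data, not the book's Example 6-4), the only change from the known-variance case is that the sample standard deviation s replaces σ in the standard error:

```python
import random
import statistics

random.seed(11)

# Hypothetical large sample (n = 200 >= 30) from a population whose
# variance we pretend not to know.
sample = [random.gauss(8.0, 3.0) for _ in range(200)]

n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)   # sample standard deviation, divisor n - 1
se = s / n ** 0.5              # standard error using s (Equation 6-2)

lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
print(f"95% confidence interval for the mean: [{lower:.3f}, {upper:.3f}]")
```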
14 Most practitioners use values for z0.05 and z0.005 that are carried to two decimal places. For reference, more exact values for z0.05 and z0.005 are 1.645 and 2.575, respectively. For a quick calculation of a 95 percent confidence interval, z0.025 is sometimes rounded from 1.96 to 2.
15 Freund and Williams (1977), p. 266.
16 The distribution of the statistic t is called Student's t-distribution after the pen name "Student" used by W. S. Gosset, who published his work in 1908.