ASSIGNMENT No. 2
1 (a) Describe the hypothesis testing process for a population when the samples are dependent.
The techniques for hypothesis testing depend on:
- the type of outcome variable being analyzed (continuous, dichotomous, discrete);
- the number of comparison groups in the investigation;
- whether the comparison groups are independent (i.e., physically separate, such as men versus women) or dependent (i.e., matched or paired, such as pre- and post-assessments on the same participants).
In our discussion of estimation, we focused explicitly on techniques for one and two samples and discussed estimation for a specific parameter (e.g., the mean or proportion of a population), for differences (e.g., difference in means, the risk difference) and for ratios (e.g., the relative risk and odds ratio). Here we will focus on procedures for one and two samples when the outcome is either continuous (and we focus on means) or dichotomous (and we focus on proportions).
General Approach: A Simple Example
The Centers for Disease Control (CDC) reported on trends in weight, height and body mass index from the 1960s through 2002.¹ The general trend was that Americans were much heavier and slightly taller in 2002 than in 1960; both men and women gained approximately 24 pounds, on average, between 1960 and 2002. In 2002, the mean weight for men was reported at 191 pounds. Suppose that an investigator hypothesizes that weights are even higher in 2006 (i.e., that the trend continued over the subsequent 4 years). The research hypothesis is that the mean weight in men in 2006 is more than 191 pounds. The null hypothesis is that there is no change in weight, and therefore the mean weight is still 191 pounds in 2006.
Null hypothesis: H_{0}: μ = 191 (no change)
Research hypothesis: H_{1}: μ > 191 (investigator’s belief)
In order to test the hypotheses, we select a random sample of American males in 2006 and measure their weights. Suppose we have resources available to recruit n=100 men into our sample. We weigh each participant and compute summary statistics on the sample data. Suppose in the sample we determine the following:
- x̄ = 197.1
- n = 100
- s = 25.6
Do the sample data support the null or the research hypothesis? The sample mean of 197.1 is numerically higher than 191. However, is this difference more than would be expected by chance? In hypothesis testing, we assume that the null hypothesis holds until proven otherwise. We therefore need to determine the probability of observing a sample mean of 197.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true, or "under the null hypothesis"). We can compute this probability using the Central Limit Theorem. Specifically,
$$P(\bar{x} > 197.1) = P\left(Z > \frac{197.1 - 191}{25.6/\sqrt{100}}\right) = P(Z > 2.38) = 1 - 0.9913 = 0.0087$$
(Notice that we use the sample standard deviation in computing the Z score. This is generally an appropriate substitution as long as the sample size is large, n > 30.) Thus, there is less than a 1% probability of observing a sample mean as large as 197.1 when the true population mean is 191. Do you think that the null hypothesis is likely true? Given how unlikely it is to observe a sample mean of 197.1 under the null hypothesis (probability < 1%), we might infer from our data that the null hypothesis is probably not true.
Suppose that the sample data had turned out differently. Suppose that we instead observed the following in 2006:
- x̄ = 192.1
- n = 100
- s = 25.6
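How likely is it to observe a sample mean of 192.1 or higher when the true population mean is 191 (i.e., if the null hypothesis is true)? We can again compute this probability using the Central Limit Theorem. Specifically,
$$P(\bar{x} > 192.1) = P\left(Z > \frac{192.1 - 191}{25.6/\sqrt{100}}\right) = P(Z > 0.43) = 1 - 0.6664 = 0.3336$$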
There is a 33.4% probability of observing a sample mean as large as 192.1 when the true population mean is 191. Do you think that the null hypothesis is likely true?
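Both probabilities above can be reproduced with a few lines of code. The following is a minimal sketch, assuming the same large-sample normal approximation used in the text (s substituted for σ); it relies only on Python's standard-library `statistics.NormalDist`, and the function name `upper_tail_z_test` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(xbar, mu0, s, n):
    """Return the Z statistic and P(Z >= z) for H1: mu > mu0 (large-sample test)."""
    se = s / sqrt(n)  # standard error of the mean, with s standing in for sigma
    z = (xbar - mu0) / se
    p = 1 - NormalDist().cdf(z)  # upper-tail probability under H0
    return z, p


for xbar in (197.1, 192.1):
    z, p = upper_tail_z_test(xbar, mu0=191, s=25.6, n=100)
    print(f"xbar = {xbar}: z = {z:.2f}, P(Z >= z) = {p:.4f}")
```

Because z is not rounded to two decimals before the probability is computed, the printed values can differ from the table-based figures in the last digit.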
Neither of the sample means that we obtained allows us to know with certainty whether the null hypothesis is true. However, our computations suggest that, if the null hypothesis were true, the probability of observing a sample mean of 197.1 or more is less than 1%. In contrast, if the null hypothesis were true, the probability of observing a sample mean of 192.1 or more is about 33%. We can’t know whether the null hypothesis is true, but the sample that provided a mean value of 197.1 provides much stronger evidence in favor of rejecting the null hypothesis than the sample that provided a mean value of 192.1. Note that this does not mean that a sample mean of 192.1 indicates that the null hypothesis is true; it just doesn’t provide compelling evidence to reject it.
In essence, hypothesis testing is a procedure to compute a probability that reflects the strength of the evidence (based on a given sample) for rejecting the null hypothesis. In hypothesis testing, we determine a threshold or cutoff point (called the critical value) to decide when to believe the null hypothesis and when to believe the research hypothesis. It is important to note that any sample mean can be observed when the null hypothesis is true (in this example, when the population mean equals 191), but some sample means are very unlikely. Based on the two samples above, it would seem reasonable to believe the research hypothesis when x̄ = 197.1, but to believe the null hypothesis when x̄ = 192.1. What we need is a threshold value such that if x̄ is above that threshold, then we believe that H_{1} is true, and if x̄ is below that threshold, then we believe that H_{0} is true. The difficulty in determining a threshold for x̄ is that it depends on the scale of measurement. In this example, the threshold, sometimes called the critical value, might be 195 (i.e., if the sample mean is 195 or more, then we believe that H_{1} is true, and if the sample mean is less than 195, then we believe that H_{0} is true). If we were instead assessing an increase in blood pressure over time, the critical value would be different, because blood pressure is measured in millimeters of mercury (mmHg) rather than in pounds. Below, we explain how the critical value is determined and how we handle the issue of scale.
First, to address the issue of scale in determining the critical value, we convert our sample data (in particular, the sample mean) into a Z score. We know from the module on probability that the center of the Z distribution is zero and that extreme values are those that exceed 2 or fall below −2. Z scores above 2 or below −2 represent approximately 5% of all Z values. If the observed sample mean is close to the mean specified in H_{0} (here μ = 191), then Z will be close to zero. If the observed sample mean is much larger than the mean specified in H_{0}, then Z will be large.
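The statement that roughly 5% of Z values lie beyond ±2 can be checked directly. A minimal sketch using Python's standard-library `statistics.NormalDist`; it also reports 1.96, the cutoff that leaves exactly 5% in the two tails combined, which is why ±2 is described as giving only "approximately" 5%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-sided tail area beyond +/-2: P(Z > 2) + P(Z < -2)
tail_beyond_2 = 2 * (1 - std_normal.cdf(2))

# Cutoff that leaves exactly 5% in the two tails combined
cutoff_5pct = std_normal.inv_cdf(0.975)

print(f"P(|Z| > 2) = {tail_beyond_2:.4f}")
print(f"Exact two-sided 5% cutoff = {cutoff_5pct:.3f}")
```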
In hypothesis testing, we select a critical value from the Z distribution. This is done by first determining what is called the level of significance, denoted α (“alpha”). What we are doing here is drawing a line at extreme values. The level of significance is the probability that we reject the null hypothesis (in favor of the alternative) when it is actually true and is also called the Type I error rate.
α = level of significance = P(Type I error) = P(reject H_{0} | H_{0} is true).
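The definition of α can also be illustrated by simulation. The sketch below is hypothetical: it assumes a normally distributed population of weights with mean 191 (so H_{0} is true) and standard deviation 25.6, draws many samples of n = 100, and counts how often an upper-tailed test at α = 0.05 rejects H_{0}. The long-run rejection rate estimates P(reject H_{0} | H_{0} is true); it may run slightly above 0.05 because s is estimated from each sample rather than known.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(2006)  # fixed seed so the run is reproducible

MU0, SIGMA, N = 191, 25.6, 100  # hypothetical population in which H0 is true
ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA)  # upper-tail critical value, about 1.645

reps = 10_000
rejections = 0
for _ in range(reps):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    s = sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))  # sample SD
    z = (xbar - MU0) / (s / sqrt(N))
    if z > z_crit:  # rejecting a true H0 is a Type I error
        rejections += 1

type_i_rate = rejections / reps
print(f"Estimated P(reject H0 | H0 true) = {type_i_rate:.3f}")
```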
Because α is a probability, it ranges between 0 and 1. The most commonly used value for α in the medical literature is 0.05, or 5%. Thus, if an investigator selects α = 0.05, they are allowing a 5% probability of incorrectly rejecting the null hypothesis in favor of the alternative when the null is in fact true. Depending on the circumstances, one might choose a level of significance of 1% or 10%. For example, if an investigator wanted to reject the null only if there were even stronger evidence than that required with α = 0.05, they could choose α = 0.01 as their level of significance. The typical values for α are 0.01, 0.05 and 0.10, with α = 0.05 the most commonly used value.
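To connect α to a critical value, the sketch below takes the upper-tailed test from the weight example (H_{1}: μ > 191, s = 25.6, n = 100) and, for each common α, finds the Z cutoff and converts it back to pounds. It is an illustration under the same large-sample assumption as before, not a prescribed procedure.

```python
from math import sqrt
from statistics import NormalDist

MU0, S, N = 191, 25.6, 100   # weight example: H0 mean, sample SD, sample size
se = S / sqrt(N)             # standard error of the mean (2.56 lb)

cutoffs = {}
for alpha in (0.10, 0.05, 0.01):
    z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical Z
    xbar_crit = MU0 + z_crit * se              # the same cutoff in pounds
    cutoffs[alpha] = (z_crit, xbar_crit)
    print(f"alpha = {alpha:.2f}: reject H0 if Z > {z_crit:.3f}, i.e. xbar > {xbar_crit:.1f} lb")
```

For α = 0.05 the cutoff works out to roughly 195.2 lb, close to the illustrative threshold of 195 used earlier; converting to Z is what lets the same α apply regardless of the measurement scale.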
(b) An innovator felt that its new electric motor drive would capture 48% of the regional market within 1 year. There are 5,000 users of motor drives in the region. After sampling 10% of these users a year later, the company found that 43% of them were using the new drives. At the 1% level of significance, should we conclude that the company failed to reach its market-share goal?
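A minimal sketch of a lower-tailed one-proportion Z test for this question, assuming H_{0}: p = 0.48 versus H_{1}: p < 0.48 and a sample of n = 500 (10% of N = 5,000). Because the sample is a sizable fraction of a finite population, the sketch reports the statistic both with and without the finite population correction (FPC); whether to apply the correction is an assumption to check against the course's convention.

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.48       # claimed market share under H0
N = 5000        # users in the region (finite population)
n = N // 10     # 10% sampled -> 500 users
p_hat = 0.43    # observed share using the new drive
alpha = 0.01

se = sqrt(p0 * (1 - p0) / n)          # standard error of p_hat under H0
fpc = sqrt((N - n) / (N - 1))         # finite population correction factor
z_plain = (p_hat - p0) / se
z_fpc = (p_hat - p0) / (se * fpc)
z_crit = NormalDist().inv_cdf(alpha)  # lower-tail cutoff, about -2.326

for label, z in (("without FPC", z_plain), ("with FPC", z_fpc)):
    decision = "reject H0" if z < z_crit else "do not reject H0"
    print(f"{label}: z = {z:.3f} vs {z_crit:.3f} -> {decision}")
```

With these numbers the corrected statistic (about −2.36) falls below the 1% cutoff while the uncorrected one (about −2.24) does not, so the conclusion hinges on the FPC choice.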
2 The University Bookstore is facing significant competition from off-campus bookstores, and it is considering targeting a specific class in order to retain student business. The bookstore randomly selected 150 freshmen and 175 sophomores. It found that 46% of the freshmen and 40% of the sophomores purchase all of their textbooks at the University Bookstore.
(a) At the 1% level of significance, is there a significant difference in the proportions of freshmen and sophomores who purchase entirely at the University Bookstore?
(b) At the 5% level of significance, is there a significant difference in the proportions of freshmen and sophomores who purchase entirely at the University Bookstore?
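A minimal sketch of a two-tailed, two-proportion Z test for parts (a) and (b), assuming H_{0}: p₁ = p₂ versus H_{1}: p₁ ≠ p₂ and the pooled standard error that is common in textbook treatments (an assumption; an unpooled standard error is also used in practice).

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 150, 0.46   # freshmen
n2, p2 = 175, 0.40   # sophomores

# Pooled proportion under H0 (p1 == p2)
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    decision = "significant" if abs(z) > z_crit else "not significant"
    print(f"alpha = {alpha}: |z| = {abs(z):.3f} vs {z_crit:.3f} -> {decision}")
```

With these figures z is about 1.09, which lies inside both the ±2.576 and ±1.96 cutoffs.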
3 A coal-fired power plant is considering two different systems for pollution abatement. The first system has reduced the emission of pollutants to acceptable levels 68% of the time, as determined from 200 air samples. The second, more expensive system has reduced the emission of pollutants to acceptable levels 76% of the time, as determined from 250 air samples. If the expensive system is significantly more effective than the inexpensive system in reducing pollutants to acceptable levels, then the management of the power plant will install the expensive system.
(a) Which system will be installed if management uses a significance level of 1% in making its decision?
(b) Which system will be installed if management uses a significance level of 5% in making its decision?
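Because management acts only if the expensive system is significantly *more* effective, a one-tailed test fits here. A minimal sketch, assuming H_{0}: p₂ = p₁ versus H_{1}: p₂ > p₁ (system 2 being the expensive one) and a pooled standard error:

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 200, 0.68   # inexpensive system
n2, p2 = 250, 0.76   # expensive system

p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p2 - p1) / se   # positive when the expensive system does better

for alpha in (0.01, 0.05):
    z_crit = NormalDist().inv_cdf(1 - alpha)  # one-tailed (upper) cutoff
    choice = "expensive" if z > z_crit else "inexpensive"
    print(f"alpha = {alpha}: z = {z:.3f} vs {z_crit:.3f} -> install the {choice} system")
```

Here z is about 1.89, which lies between the 5% cutoff (1.645) and the 1% cutoff (2.326), so the decision differs between parts (a) and (b).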
 4 Zippy Cola is studying the effect of its latest advertising campaign. People chosen at random were called and asked how many cans of Zippy Cola they had bought in the past week and how many Zippy Cola advertisements they had either read or seen in the past week.
X (number of ads) | 3 | 7 | 4 | 2 | 0 | 4 | 1 | 2
Y (cans purchased) | 11 | 18 | 9 | 4 | 7 | 6 | 3 | 8
(a) Develop the estimating equation.
(b) Calculate the sample coefficient of determination and the sample coefficient of correlation.
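A minimal least-squares sketch for both parts, using the eight observations from the table and deviation-from-the-mean sums (equivalent to the textbook shortcut formulas). Only plain Python is used; variable names are illustrative.

```python
from math import sqrt

x = [3, 7, 4, 2, 0, 4, 1, 2]      # number of ads seen or read
y = [11, 18, 9, 4, 7, 6, 3, 8]    # cans purchased
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b = s_xy / s_xx                   # slope
a = y_bar - b * x_bar             # intercept
r2 = s_xy ** 2 / (s_xx * s_yy)    # coefficient of determination
r = s_xy / sqrt(s_xx * s_yy)      # correlation (sign follows the slope)

print(f"Estimating equation: Y_hat = {a:.3f} + {b:.3f} X")
print(f"r^2 = {r2:.4f}, r = {r:.4f}")
```

With the data as printed this gives approximately Ŷ = 3.33 + 1.71X, r² ≈ 0.62 and r ≈ 0.79.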
 5 Yamaha Motorcycles began producing three models of mopeds in 1993. For the three years 1993 through 1995, sales (in dollars) are as follows:
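Model | Avg. annual price 1993 ($) | Avg. annual price 1994 ($) | Avg. annual price 1995 ($) | Units sold 1993 (0000) | Units sold 1994 (0000) | Units sold 1995 (0000)
I | 139 | 155 | 149 | 3.7 | 4.1 | 7.6
II | 169 | 189 | 189 | 2.3 | 4.6 | 8.1
III | 199 | 2.5 | 219 | 1.6 | 2.1 | 3.4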
(a) Calculate the weighted average of relatives price index, using the prices and quantities from 1995 as the base and weights.
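Model | P (1993) | P (1994) | P0 (1995) | Q (1993) | Q (1994) | Q0 (1995) | (P1993/P0) × 100 | (P1994/P0) × 100 | Base value P0Q0 | (P1993/P0 × 100)(P0Q0) | (P1994/P0 × 100)(P0Q0)
I | 139 | 155 | 149 | 3.7 | 4.1 | 7.6 | 93.29 | 104.03 | 1132.4 | 105,640 | 117,800
II | 169 | 189 | 189 | 2.3 | 4.6 | 8.1 | 89.42 | 100.00 | 1530.9 | 136,890 | 153,090
III | 199 | 2.5 | 219 | 1.6 | 2.1 | 3.4 | 90.87 | 1.14 | 744.6 | 67,660 | 850
Sum | | | | | | | | | 3407.9 | 310,190 | 271,740

(The weighted products are computed from the unrounded relatives.)

$$\text{WAR}_{1993} = \frac{\sum (P_{1993}/P_0 \times 100)(P_0 Q_0)}{\sum P_0 Q_0} = \frac{310{,}190}{3407.9} = 91.02$$
$$\text{WAR}_{1994} = \frac{\sum (P_{1994}/P_0 \times 100)(P_0 Q_0)}{\sum P_0 Q_0} = \frac{271{,}740}{3407.9} = 79.74$$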
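A minimal sketch of the weighted-average-of-relatives calculation with base-period (1995) values P0Q0 as weights. The data are taken exactly as printed in the question, including the 1994 price of 2.5 for Model III; the dictionary layout and function name are illustrative choices.

```python
YEARS = (1993, 1994, 1995)
BASE = YEARS.index(1995)  # 1995 is the base period

# Average price ($) and units sold (0000) per model, in YEARS order, as printed
price = {"I": (139, 155, 149), "II": (169, 189, 189), "III": (199, 2.5, 219)}
qty = {"I": (3.7, 4.1, 7.6), "II": (2.3, 4.6, 8.1), "III": (1.6, 2.1, 3.4)}


def war_base_weights(t):
    """Weighted average of relatives for year index t, weighted by base value P0*Q0."""
    num = den = 0.0
    for m in price:
        relative = price[m][t] / price[m][BASE] * 100   # price relative (1995 = 100)
        weight = price[m][BASE] * qty[m][BASE]           # 1995 dollar value
        num += relative * weight
        den += weight
    return num / den


for t, year in enumerate(YEARS):
    print(year, round(war_base_weights(t), 2))
```

Since every relative is weighted by the same base-year values, the base year itself always comes out at 100, which is a quick sanity check on the arithmetic.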
(b) Calculate the weighted average of relatives price index, using the total dollar value (P × Q) for each year as the weights and 1995 as the base period.
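Model | P (1993) | P (1994) | P0 (1995) | Q (1993) | Q (1994) | (P1993/P0) × 100 | Value P1993·Q1993 | (P1993/P0 × 100)(P1993Q1993) | (P1994/P0) × 100 | Value P1994·Q1994 | (P1994/P0 × 100)(P1994Q1994)
I | 139 | 155 | 149 | 3.7 | 4.1 | 93.29 | 514.3 | 47,978.3 | 104.03 | 635.5 | 66,109.1
II | 169 | 189 | 189 | 2.3 | 4.6 | 89.42 | 388.7 | 34,756.8 | 100.00 | 869.4 | 86,940.0
III | 199 | 2.5 | 219 | 1.6 | 2.1 | 90.87 | 318.4 | 28,932.2 | 1.14 | 5.25 | 6.0
Sum | | | | | | | 1221.4 | 111,667.3 | | 1510.15 | 153,055.1

$$\text{WAR}_{1993} = \frac{\sum (P_{1993}/P_0 \times 100)(P_{1993} Q_{1993})}{\sum P_{1993} Q_{1993}} = \frac{111{,}667.3}{1221.4} = 91.43$$
$$\text{WAR}_{1994} = \frac{\sum (P_{1994}/P_0 \times 100)(P_{1994} Q_{1994})}{\sum P_{1994} Q_{1994}} = \frac{153{,}055.1}{1510.15} = 101.35$$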
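A minimal sketch of the same index with each year's own dollar value as the weight. It assumes that "total dollar values for each year" means each model's P × Q in that same year (an interpretation, not something the question states explicitly), and it uses the data exactly as printed, including the 1994 Model III price of 2.5.

```python
YEARS = (1993, 1994, 1995)
BASE = YEARS.index(1995)  # 1995 is the base period

# Average price ($) and units sold (0000) per model, in YEARS order, as printed
price = {"I": (139, 155, 149), "II": (169, 189, 189), "III": (199, 2.5, 219)}
qty = {"I": (3.7, 4.1, 7.6), "II": (2.3, 4.6, 8.1), "III": (1.6, 2.1, 3.4)}


def war_current_weights(t):
    """Weighted average of relatives for year index t, weighted by that year's value Pt*Qt."""
    num = den = 0.0
    for m in price:
        relative = price[m][t] / price[m][BASE] * 100   # price relative (1995 = 100)
        weight = price[m][t] * qty[m][t]                 # dollar value in year t (assumed meaning)
        num += relative * weight
        den += weight
    return num / den


for t, year in enumerate(YEARS):
    print(year, round(war_current_weights(t), 2))
```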