## ASSIGNMENT No. 1

**1 Describe different methods of data presentation and arrangement.**

Data presentation is one of the important aspects of statistics. Presenting data well helps readers study and explain the statistics thoroughly. This answer discusses how data can be presented and how information is laid out methodically. Statistics is all about data, and presenting data effectively and efficiently is an art. Research may uncover many findings that are complex and would need long explanations in writing. This is where the presentation of data becomes important: findings have to be presented so that readers can go through them quickly and understand each point being made. As research grew more complex over time, people realized how much the presentation of data matters for making sense of findings. Data presentation is defined as the process of using various graphical formats to visually represent the relationship between two or more data sets so that an informed decision can be made based on them.

### Types of Data Presentation

Broadly speaking, there are three methods of data presentation:

- Textual
- Tabular
- Diagrammatic

### Textual Ways of Presenting Data

Out of the different methods of data presentation, this is the simplest one. You just write your findings in a coherent manner and your job is done. The demerit of this method is that one has to read the whole text to get a clear picture. An introduction, summary, and conclusion can help condense the information, however.

### Tabular Ways of Data Presentation and Analysis

To avoid the complexities involved in the textual way of data presentation, people use tables and charts to present data. In this method, data is presented in rows and columns, just like a cricket scorecard showing who made how many runs. Each row and column has an attribute (name, year, sex, age, and other things like these). It is against these attributes that data is written within a cell.

### Diagrammatic Presentation: Graphical Presentation of Data in Statistics

This kind of data presentation and analysis method conveys a lot in a dramatically short amount of time.

Diagrammatic Presentation has been divided into further categories:

### Geometric Diagram

When a Diagrammatic presentation involves shapes like a bar or circle, we call that a Geometric Diagram. Examples of Geometric Diagrams follow.

### Bar Diagram

### Simple Bar Diagram

Simple Bar Diagram is composed of rectangular bars. All of these bars have the same width and are placed at an equal distance from each other. The bars are placed on the X-axis. The height or length of the bars is used as the means of measurement. So, on the Y-axis, you have the measurement relevant to the data.

Suppose you want to present the runs scored by each batsman in a game in the form of a bar chart. Mark the runs on the Y-axis, in ascending order from the bottom. The lowest scorer will then be represented by the shortest bar and the highest scorer by the longest bar.

### Multiple Bar Diagram

In many states of India, electric bills have bar diagrams showing the consumption in the last 5 months. Along with these bars, they also have bars that show the consumption in the same months of the previous year. This kind of Bar Diagram is called a Multiple Bar Diagram.

### Component Bar Diagram

Sometimes, a bar is divided into two or more parts. For example, suppose a Bar Diagram shows the percentage of male voters who voted and who did not, and the percentage of female voters who voted and who did not. Instead of creating separate bars for those who voted and those who did not, you can divide a single bar for each group into the two parts.

### Pie Chart

A pie chart is a chart where you divide a pie (a circle) into different parts based on the data. Each data value is first converted into a percentage of the total, and that percentage is then multiplied by 3.6 degrees (since 100 percent corresponds to the full 360 degrees). The result is the angle of the slice for that value. So if, for example, the result is 30 degrees, you draw a slice with that angle at the center of the pie chart.
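The percentage-to-angle conversion described above can be sketched in Python. This is only an illustration: the function name `pie_angles` and the category totals are hypothetical, chosen so that the first slice comes out at the 30 degrees mentioned in the text.

```python
def pie_angles(values):
    """Return the central angle (in degrees) of each pie slice."""
    total = sum(values)
    angles = []
    for value in values:
        percentage = value / total * 100   # share of the whole, in percent
        angles.append(percentage * 3.6)    # 1 percent of a circle is 3.6 degrees
    return angles


# Hypothetical category totals, for illustration only.
values = [10, 20, 30, 60]
angles = pie_angles(values)
print([round(a, 1) for a in angles])
```

Whatever the input values are, the angles should add up to 360 degrees, which is a quick way to check the arithmetic by hand.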

### Frequency Diagram

Suppose you want to present data that shows how many students have 1 to 2 pens, how many have 3 to 5 pens, and how many have 6 to 10 pens (grouped frequency). You do that with the help of a Frequency Diagram. A Frequency Diagram can be of many kinds:

### Histogram

In a histogram, the groups of pens (from the above example) are marked on the X-axis and the numbers of students are marked on the Y-axis. The data is presented in the form of bars.

### Frequency Polygon

When you join the midpoints of the upper side of the rectangles in a histogram, you get a Frequency Polygon.

### Frequency Curve

When you draw a freehand line that passes through the points of the Frequency Polygon, you get a Frequency Curve.

### Measures of Dispersion

(a) The Range:

The range is by far the simplest measure of dispersion. It is defined as the difference between the highest and lowest figures in a given sample. If we have grouped data, the range is taken as the difference between the midpoints of the extreme categories.

(b) The Mean Deviation:

It is the average of the deviations from the arithmetic mean.

(c) The Standard Deviation:

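It is the most frequently used measure of dispersion. In simple terms, it is defined as the "Root-Mean-Square Deviation". It is denoted by the Greek letter σ.

It is calculated by the formula:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

where x is each observation, x̄ is the arithmetic mean, and n is the number of observations.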

When the sample size is more than 30, the above basic formula may be used without modification. For smaller samples, the above formula tends to underestimate the standard deviation, and therefore needs correction i.e., use n-1 instead of n.

The meaning of standard deviation can only be appreciated fully when we study it with reference to “normal curve”. The larger the standard deviation the greater the dispersion of values about the mean.
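The three measures described above (range, mean deviation, and standard deviation) can be sketched in a few lines of Python. The function names and the small data set are hypothetical and serve only as an illustration; the `sample` flag switches to the n - 1 divisor mentioned for small samples.

```python
import math


def value_range(data):
    """Difference between the highest and lowest figures."""
    return max(data) - min(data)


def mean_deviation(data):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


def standard_deviation(data, sample=False):
    """Root-mean-square deviation; divides by n - 1 when sample=True."""
    mean = sum(data) / len(data)
    squared_deviations = sum((x - mean) ** 2 for x in data)
    divisor = len(data) - 1 if sample else len(data)
    return math.sqrt(squared_deviations / divisor)


# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(data))
print(mean_deviation(data))
print(standard_deviation(data))
print(standard_deviation(data, sample=True))
```

Absolute values are used in the mean deviation because the signed deviations from the mean always sum to zero.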

(d) Normal Distribution:

The normal distribution or normal curve is an important concept in statistical theory. The shape of the curve will depend upon the mean and standard deviation which in turn will depend upon the number and nature of observation.

**2 From the following data construct a frequency distribution and draw the histogram of the resulting frequency distribution.**

**83 51 66 61 82 65 54 56 92 60 65 87**

**68 64 51 70 75 66 74 68 44 55 78 69**

**98 67 82 77 79 62 38 88 76 99 84 47**

**60 42 66 74 91 71 83 80 68 65 51 56**

Frequency Distribution Table

| Class | Count | Percentage |
|---|---|---|
| 38 – 48 | 4 | 8.3 |
| 49 – 59 | 7 | 14.6 |
| 60 – 70 | 17 | 35.4 |
| 71 – 81 | 9 | 18.8 |
| 82 – 92 | 9 | 18.8 |
| 93 – 103 | 2 | 4.2 |
| Total | 48 | 100.1 |
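The class counts in the table can be reproduced with a short Python sketch. The function name `frequency_distribution` and its parameters are hypothetical. The class limits (starting at 38, each class covering 11 integer values) are taken from the table. The row of `#` marks is only a rough text stand-in for the histogram bars.

```python
scores = [
    83, 51, 66, 61, 82, 65, 54, 56, 92, 60, 65, 87,
    68, 64, 51, 70, 75, 66, 74, 68, 44, 55, 78, 69,
    98, 67, 82, 77, 79, 62, 38, 88, 76, 99, 84, 47,
    60, 42, 66, 74, 91, 71, 83, 80, 68, 65, 51, 56,
]


def frequency_distribution(data, lowest, class_size, n_classes):
    """Count the values falling in each inclusive class lower..upper."""
    table = []
    for i in range(n_classes):
        lower = lowest + i * class_size
        upper = lower + class_size - 1
        count = sum(1 for x in data if lower <= x <= upper)
        percentage = round(100 * count / len(data), 1)
        table.append((lower, upper, count, percentage))
    return table


table = frequency_distribution(scores, lowest=38, class_size=11, n_classes=6)
for lower, upper, count, percentage in table:
    # One '#' per observation gives a rough text histogram.
    print(f"{lower:>3} - {upper:<3} {count:>2} {percentage:>5}  {'#' * count}")
```

The percentages are rounded to one decimal place, which is why the total in the table comes to 100.1 rather than exactly 100.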

**3 (a) Describe the calculation of percentiles providing suitable formulae.**

In statistics, percentiles are used to understand and interpret data. The *n*th percentile of a set of data is the value at which *n* percent of the data is below it. In everyday life, percentiles are used to understand values such as test scores, health indicators, and other measurements. For example, an 18-year-old male who is six and a half feet tall is in the 99th percentile for his height. This means that of all the 18-year-old males, 99 percent have a height that is equal to or less than six and a half feet. An 18-year-old male who is only five and a half feet tall, on the other hand, is in the 16th percentile for his height, meaning only 16 percent of males his age are the same height or shorter.

- Percentiles are used to understand and interpret data. They indicate the values below which a certain percentage of the data in a data set is found.
- Percentiles can be calculated using the formula n = (P/100) x N, where P = percentile, N = number of values in a data set (sorted from smallest to largest), and n = ordinal rank of a given value.
- Percentiles are frequently used to understand test scores and biometric measurements.

Percentiles should not be confused with percentages. The latter is used to express fractions of a whole, while percentiles are the values below which a certain percentage of the data in a data set is found. In practical terms, there is a significant difference between the two. For example, a student taking a difficult exam might earn a score of 75 percent. This means that he correctly answered every three out of four questions. A student who scores in the 75th percentile, however, has obtained a different result. This percentile means that the student earned a higher score than 75 percent of the other students who took the exam. In other words, the percentage score reflects how well the student did on the exam itself; the percentile score reflects how well he did in comparison to other students.

Percentiles for the values in a given data set can be calculated using the formula:

n = (P/100) x N

where N = number of values in the data set, P = percentile, and n = ordinal rank of a given value (with the values in the data set sorted from smallest to largest). For example, take a class of 20 students that earned the following scores on their most recent test: 75, 77, 78, 78, 80, 81, 81, 82, 83, 84, 84, 84, 85, 87, 87, 88, 88, 88, 89, 90. These scores can be represented as a data set with 20 values: {75, 77, 78, 78, 80, 81, 81, 82, 83, 84, 84, 84, 85, 87, 87, 88, 88, 88, 89, 90}.

We can find the score that marks the 20th percentile by plugging in known values into the formula and solving for *n*:

n = (20/100) x 20

n = 4

The fourth value in the data set is the score 78. This means that 78 marks the 20th percentile; of the students in the class, 20 percent earned a score of 78 or lower.
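The ordinal-rank calculation above can be sketched in Python. The function name `percentile_by_rank` is hypothetical. The text does not say what to do when n = (P/100) x N is not a whole number, so rounding up to the next rank is an assumption made here.

```python
import math


def percentile_by_rank(sorted_values, p):
    """Return the value at ordinal rank n = (P/100) x N in sorted data."""
    # Assumption: a fractional rank is rounded up to the next whole rank.
    n = math.ceil(p * len(sorted_values) / 100)
    n = max(n, 1)                 # ranks start at 1
    return sorted_values[n - 1]   # Python lists are 0-indexed


scores = [75, 77, 78, 78, 80, 81, 81, 82, 83, 84,
          84, 84, 85, 87, 87, 88, 88, 88, 89, 90]
p20 = percentile_by_rank(scores, 20)
print(p20)
```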

Given a data set that has been ordered in increasing magnitude, the median, first quartile, and third quartile can be used to split the data into four pieces. The first quartile is the point at which one-fourth of the data lies below it. The median is located exactly in the middle of the data set, with half of all the data below it. The third quartile is the place where three-fourths of the data lies below it.

The median, first quartile, and third quartile can all be stated in terms of percentiles. Since half of the data is less than the median, and one-half is equal to 50 percent, the median marks the 50th percentile. One-fourth is equal to 25 percent, so the first quartile marks the 25th percentile. The third quartile marks the 75th percentile.

Besides quartiles, a fairly common way to arrange a set of data is by deciles. Each decile includes 10 percent of the data set. This means that the first decile is the 10th percentile, the second decile is the 20th percentile, etc. Deciles provide a way to split a data set into more pieces than quartiles without splitting the set into 100 pieces as with percentiles.

Percentile scores have a variety of uses. Anytime that a set of data needs to be broken into digestible chunks, percentiles are helpful. They are often used to interpret test scores—such as SAT scores—so that test-takers can compare their performance to that of other students. For example, a student might earn a score of 90 percent on an exam. That sounds pretty impressive; however, it becomes less so when a score of 90 percent corresponds to the 20th percentile, meaning only 20 percent of the class earned a score of 90 percent or lower.

Another example of percentiles is in children’s growth charts. In addition to giving a physical height or weight measurement, pediatricians typically state this information in terms of a percentile score. A percentile is used in order to compare the height or weight of a child to other children of the same age. This allows for an effective means of comparison so that parents can know if their child’s growth is typical or unusual.

**(b) For the data in Question No. 2, compute the 9th, 29th, 49th, 69th and 89th percentiles.**

The 9th percentile value is 48.64.

The 29th percentile value is 61.21.

The 49th percentile value is 68.

The 69th percentile value is 76.81.

The 89th percentile value is 87.61.
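The answer does not state which method produced these values. They are consistent with taking the position (P/100) x (N + 1) in the sorted data and interpolating linearly between the two neighbouring values. The sketch below uses that method as an assumption, and the function name `percentile_interpolated` is hypothetical.

```python
scores = [
    83, 51, 66, 61, 82, 65, 54, 56, 92, 60, 65, 87,
    68, 64, 51, 70, 75, 66, 74, 68, 44, 55, 78, 69,
    98, 67, 82, 77, 79, 62, 38, 88, 76, 99, 84, 47,
    60, 42, 66, 74, 91, 71, 83, 80, 68, 65, 51, 56,
]


def percentile_interpolated(data, p):
    """Percentile at position (P/100) x (N + 1), linearly interpolated."""
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1) / 100            # 1-based, may be fractional
    position = min(max(position, 1), n)     # keep the position inside the data
    lower_rank = int(position)
    fraction = position - lower_rank
    if lower_rank == n:                     # already at the largest value
        return ordered[-1]
    lower = ordered[lower_rank - 1]
    upper = ordered[lower_rank]
    return lower + fraction * (upper - lower)


for p in (9, 29, 49, 69, 89):
    print(p, round(percentile_interpolated(scores, p), 2))
```

Other interpolation conventions exist and can give slightly different values for the same data.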

**4 (a) Define population and sample standard deviations providing different methods of calculation.**

When considering standard deviations, it may come as a surprise that there are actually two that can be considered. There is a population standard deviation and there is a sample standard deviation. We will distinguish between the two of these and highlight their differences.

## Qualitative Differences

Although both standard deviations measure variability, there are differences between a population and a sample standard deviation. The first has to do with the distinction between statistics and parameters. The population standard deviation is a parameter, which is a fixed value calculated from every individual in the population.

A sample standard deviation is a statistic. This means that it is calculated from only some of the individuals in a population. Because its value depends on which individuals happen to be sampled, it varies from sample to sample, whereas the population standard deviation is a single fixed value.

## Quantitative Difference

We will see how these two types of standard deviations are different from one another numerically. To do this we consider the formulas for both the sample standard deviation and the population standard deviation.

The formulas to calculate both of these standard deviations are nearly identical:

- Calculate the mean.
- Subtract the mean from each value to obtain deviations from the mean.
- Square each of the deviations.
- Add together all of these squared deviations.

Now the calculation of these standard deviations differs:

- If we are calculating the population standard deviation, then we divide by *n*, the number of data values.
- If we are calculating the sample standard deviation, then we divide by *n* - 1, one less than the number of data values.

The final step, in either of the two cases that we are considering, is to take the square root of the quotient from the previous step.

The larger the value of *n* is, the closer the population and sample standard deviations will be.
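As a rough illustration of the two divisors, Python's standard `statistics` module provides `pstdev` (divides by n) and `stdev` (divides by n - 1). The sketch below applies both to the fiberglass boat data from part (b). For the same set of values the two results differ by the factor √(n / (n - 1)), which approaches 1 as n grows.

```python
import math
import statistics

# Daily production rates from part (b) of this question.
production = [17, 21, 18, 27, 17, 21, 20, 22, 18, 23]
n = len(production)

population_sd = statistics.pstdev(production)  # divides by n
sample_sd = statistics.stdev(production)       # divides by n - 1

print(round(population_sd, 3))
print(round(sample_sd, 3))

# The ratio of the two depends only on n.
ratio = sample_sd / population_sd
print(math.isclose(ratio, math.sqrt(n / (n - 1))))
```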

**(b) These data are a sample from the daily production rate of fiberglass boats:**

**17 21 18 27 17 21 20 22 18 23**

**Calculate the coefficient of variation for this data set.**

## Coefficient of Variation Calculation

- N = 10
- M = 20.4
- SS = 88.4
- s² = SS / (N - 1) = 88.4 / (10 - 1) = 9.82
- s = √s² = √9.82 = 3.13
- CV = (s / M) × 100 = (3.13 / 20.4) × 100 = 15.36

Coefficient of Variation = 15.36295%.
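The same calculation can be checked with a short Python sketch that follows the steps above. The variable names mirror the symbols N, M, SS, and s used in the working; they are otherwise arbitrary.

```python
import math

production = [17, 21, 18, 27, 17, 21, 20, 22, 18, 23]

n = len(production)                              # N
mean = sum(production) / n                       # M
ss = sum((x - mean) ** 2 for x in production)    # SS, sum of squared deviations
variance = ss / (n - 1)                          # s squared (sample variance)
s = math.sqrt(variance)                          # sample standard deviation
cv = s / mean * 100                              # coefficient of variation, percent

print(n, round(mean, 1), round(ss, 1))
print(round(variance, 2), round(s, 2), round(cv, 2))
```

Using the unrounded value of s rather than 3.13 gives the more precise 15.36295% figure.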

**5 The Statistics Department installed energy-efficient lights, heaters and air conditioners last year. Now they want to determine that the average monthly energy usage has decreased.**

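**(a) Should they perform a one- or two-tailed test? Explain.**

They should perform a one-tailed (left-tailed) test, because they are only interested in whether energy usage has decreased, not in whether it has changed in either direction.

**(b) If their previous average monthly energy usage was 3124 kilowatt hours, what are the null and alternative hypotheses?**

H₀: μ ≥ 3124

H₁: μ < 3124

where μ is the mean monthly energy usage (in kilowatt hours) after the new equipment was installed.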

**(c) List the full five-step testing procedure.**

There are 5 main steps in hypothesis testing:

- State your research hypothesis as a null hypothesis (H₀) and alternate hypothesis (Hₐ or H₁).
- Collect data in a way designed to test the hypothesis.
- Perform an appropriate statistical test.
- Decide whether to reject or fail to reject your null hypothesis.
- Present the findings in your results and discussion section.

## Step 1: State your null and alternate hypothesis

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H₀) and alternate (Hₐ) hypothesis so that you can test it mathematically.

The **alternate hypothesis** is usually your initial hypothesis that predicts a relationship between variables. The **null hypothesis** is a prediction of no relationship between the variables you are interested in.

## Step 2: Collect data

For a statistical test to be valid, it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

## Step 3: Perform a statistical test

There are a variety of statistical tests available, but they are all based on the comparison of **within-group variance** (how spread out the data is within a category) versus **between-group variance** (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low *p*-value. This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high *p*-value. This means it is likely that any difference you measure between groups is due to chance.

## Step 4: Decide whether to reject or fail to reject your null hypothesis

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the *p*-value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis (Type I error).
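Steps 3 and 4 can be sketched for the energy-usage question, assuming a left-tailed one-sample t-test at the 0.05 level. The monthly readings below are hypothetical and are not data from the assignment. The critical value 1.796 is the one-tailed t value for 11 degrees of freedom, as found in a standard t table.

```python
import math
import statistics

MU_0 = 3124          # previous average monthly usage in kWh, from the question
T_CRITICAL = 1.796   # one-tailed critical t for alpha = 0.05, df = 11

# Hypothetical readings for 12 months; NOT data from the assignment.
usage = [2950, 3100, 2900, 3050, 3000, 2980,
         3020, 2890, 3110, 2960, 3040, 3000]

n = len(usage)
sample_mean = statistics.mean(usage)
standard_error = statistics.stdev(usage) / math.sqrt(n)   # s / sqrt(n)
t_stat = (sample_mean - MU_0) / standard_error

# Left-tailed test: reject H0 only if t falls below the negative critical value.
reject_null = t_stat < -T_CRITICAL

print(round(t_stat, 2), reject_null)
```

Comparing the test statistic with a critical value is equivalent to comparing the p-value with the significance level. The critical-value form is used here only because it needs nothing beyond the standard library.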

## Step 5: Present your findings

The results of hypothesis testing will be presented in the results and discussion sections of your research paper.

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated *p*-value). In the discussion, you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.
