Statistical Analysis - An introduction

Statistics is a science that helps in interpreting, understanding, and presenting large volumes of numerical and categorical data. It provides ways to calculate parameters and statistics that represent the data as a whole. But why is it required? Why not use the complete series of numbers, whether collected by researchers through a survey or gathered by machines as terabytes of data?

The answer lies within the question. Such data is huge, and it is difficult to analyze all of it at once. Hence a researcher or data scientist needs a compact representation of this very large collection of numbers. This is where statistics comes into the picture: it gives an overall idea of the data's central tendency through the mean, median, and mode, and of its spread or distribution through the variance or standard deviation.

These calculated values are information generated from the data. How is this information going to benefit a data scientist, and does it carry any "significance" with respect to the statistics of the complete data? Enter the analysis part of the process. Analysis means taking this information and turning it into useful insights that are not visible otherwise. It is generally supplemented by the data scientist's own knowledge of the subject and by hypothesis tests.

Let's take an example!


We have a small data set of class V students, which comprises:

No. of students : 60
Male students : 38
Female students : 22



Student ID   Gender   Height (in cm)   Weight (in kg)   Grade
1   Male 78 33   C
2   Female 78 31   A+
3   Female 103 35   C
4   Female 71 37   C
5   Female 84 28   A
6   Male 104 34   C
7   Male 85 30   B+
8   Male 103 28   A+
9   Male 66 38   C
10   Female 90 40   F
11   Male 108 29   B
12   Female 110 28   A+
13   Female 108 40   A
14   Male 78 36   B+
15   Male 106 30   A
16   Male 91 45   F
17   Male 85 36   A+
18   Male 104 41   F
19   Male 73 42   A+
20   Male 89 32   B
21   Female 74 33   F
22   Male 91 30   A
23   Male 66 43   F
24   Male 104 42   A
25   Male 113 42   B
26   Male 89 42   B+
27   Male 98 37   B+
28   Female 99 34   B+
29   Male 76 42   B
30   Female 71 35   A
31   Male 82 38   B
32   Female 69 38   B
33   Female 85 37   A+
34   Female 69 40   A
35   Female 75 43   C
36   Male 109 28   B
37   Female 73 29   B
38   Male 72 40   B
39   Male 111 40   A+
40   Male 106 33   F
41   Female 115 44   A+
42   Female 68 35   F
43   Female 73 30   A
44   Male 90 37   B
45   Female 66 42   A
46   Male 114 40   B
47   Male 108 28   A
48   Female 107 33   A
49   Male 69 31   A+
50   Male 117 44   B+
51   Male 94 42   F
52   Male 109 35   A
53   Male 92 30   A
54   Male 82 37   A+
55   Female 82 30   B
56   Male 66 36   B
57   Female 105 44   A+
58   Male 88 39   B+
59   Male 83 43   A
60   Male 77 28   C
             


This data covers each student's height in cm, weight in kg, and last year's grade.
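As a rough illustration of the summary measures mentioned earlier (mean, median, mode, variance, standard deviation), here is a minimal Python sketch using the standard `statistics` module. It uses only the heights of the first ten students in the table, an arbitrary subset chosen to keep the example short, so the numbers it prints describe those ten students and not the whole class. The variable names are our own.

```python
import statistics

# Heights in cm of students 1-10, copied from the table above.
# This is only a subset, chosen to keep the example short.
heights_cm = [78, 78, 103, 71, 84, 104, 85, 103, 66, 90]

# Measures of central tendency
mean_height = statistics.mean(heights_cm)
median_height = statistics.median(heights_cm)
modes = statistics.multimode(heights_cm)  # every value tied for most frequent

# Measures of spread (sample variance and sample standard deviation)
variance = statistics.variance(heights_cm)
std_dev = statistics.stdev(heights_cm)

print(f"mean     = {mean_height:.1f} cm")
print(f"median   = {median_height:.1f} cm")
print(f"mode(s)  = {modes}")
print(f"variance = {variance:.1f} cm^2")
print(f"std dev  = {std_dev:.1f} cm")
```

Five numbers now stand in for ten, and the same five would stand in for ten million.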



Now if we say that the mean height of female students is 85.2 cm and that of male students is about 91.5 cm, it means that these mean values are single-number representations of the heights of the female and male populations respectively.

If we compare the mean heights of female and male students, we find that the female students of class V are, on average, shorter than the male students. This is an insight we can draw just by looking at the mean values of the data. Let's leave it at that until we reach the forthcoming topic of hypothesis testing.
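If you would like to check the two group means yourself, the sketch below is one way to do it in Python. The genders and heights are transcribed by hand from the table above in Student ID order; the helper name `mean_height_by_gender` and the one-letter gender codes are our own choices for this illustration.

```python
from statistics import mean

# Gender of students 1-60 in Student ID order ("M" = Male, "F" = Female),
# transcribed from the table above, ten students per line.
genders = (
    "MFFFFMMMMF"  # IDs 1-10
    "MFFMMMMMMM"  # IDs 11-20
    "FMMMMMMFMF"  # IDs 21-30
    "MFFFFMFMMM"  # IDs 31-40
    "FFFMFMMFMM"  # IDs 41-50
    "MMMMFMFMMM"  # IDs 51-60
)

# Height in cm of students 1-60, in the same order.
heights_cm = [
    78, 78, 103, 71, 84, 104, 85, 103, 66, 90,
    108, 110, 108, 78, 106, 91, 85, 104, 73, 89,
    74, 91, 66, 104, 113, 89, 98, 99, 76, 71,
    82, 69, 85, 69, 75, 109, 73, 72, 111, 106,
    115, 68, 73, 90, 66, 114, 108, 107, 69, 117,
    94, 109, 92, 82, 82, 66, 105, 88, 83, 77,
]


def mean_height_by_gender(genders, heights):
    """Group the heights by gender code and return the mean of each group."""
    groups = {}
    for gender, height in zip(genders, heights):
        groups.setdefault(gender, []).append(height)
    return {gender: mean(values) for gender, values in groups.items()}


group_means = mean_height_by_gender(genders, heights_cm)
for gender in sorted(group_means):
    count = genders.count(gender)
    print(f"{gender}: n = {count}, mean height = {group_means[gender]:.1f} cm")
```

Each group is reduced to a single number, which is exactly the "single-number representation" idea described above.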


A question may arise: why can't we go for a one-to-one comparison of the data? The answer is simple and straightforward: it can be done by creating a cross table for male and female students and then analyzing the outcomes further. But this is easier said than done for large data with millions of observations over numerous variables.
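To give a feel for what such a cross table looks like, here is a small Python sketch that counts students by gender and grade. Treating the cross table as "gender by grade" is our own assumption for the sake of illustration, and only the first ten rows of the table are used to keep it short.

```python
from collections import Counter

# (gender, grade) pairs for students 1-10, copied from the table above.
rows = [
    ("Male", "C"), ("Female", "A+"), ("Female", "C"), ("Female", "C"),
    ("Female", "A"), ("Male", "C"), ("Male", "B+"), ("Male", "A+"),
    ("Male", "C"), ("Female", "F"),
]

# Count how many students fall into each (gender, grade) cell.
cell_counts = Counter(rows)

genders = sorted({gender for gender, _ in rows})
grades = sorted({grade for _, grade in rows})

# Print the cross table: one row per gender, one column per grade.
print("Gender".ljust(8) + "".join(grade.rjust(4) for grade in grades))
for gender in genders:
    cells = "".join(str(cell_counts[(gender, grade)]).rjust(4) for grade in grades)
    print(gender.ljust(8) + cells)
```

Even with two genders and a handful of grades the table already has ten cells; with many variables and millions of rows, reading such tables cell by cell quickly becomes impractical.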

This sets the stage for a complete series on data analysis and the statistical approach. Next, we will discuss the various components of data.



>>This series of blog posts is an attempt to demystify the various concepts of statistical analysis in a simpler way. We hope it helps you get things right.
