Introduction
Statistics is the branch of mathematics that studies data. Data are raw resources that are collected and then processed into meaningful information. Overall, data are measured values, and they can be classified from four different perspectives.
 Based on collection (primary and secondary)
 Based on scale of measurement (nominal, ordinal, interval, and ratio)
 Based on nature (qualitative and quantitative)
 Based on distribution (individual, discrete, and continuous)
Statistics has two main branches:
 Descriptive statistics
 Inferential statistics
Descriptive statistics
Descriptive statistics collects, organizes, and summarizes information. Examples are a bar graph and the mean.
Inferential statistics
Inferential statistics uses current data to draw conclusions or make predictions beyond the data at hand. Examples are a hypothesis test and regression analysis.
Measure of Central tendency
A measure of central tendency is a statistic that summarizes an entire quantitative data set in a single representative value, located where the observations tend to concentrate, near the center of the data. The tendency of the observations to cluster in the central part of the data is called the central tendency. Because it measures the central location (or position) of the data set, it is also called a measure of location, or simply an average.
A measure of central tendency has the following properties:
 It always lies within the range of the data set.
 It remains unchanged when the data set is rearranged in a different order.
The common measures of central tendency are:
 Mean
 Median (together with quartiles, deciles, and percentiles)
 Mode
Mean
The mean is a measure of central tendency that uses every observation to give a single best value. The arithmetic mean, or simply the mean, is also known as the average. It is obtained by dividing the sum of all the observations by the total number of observations.
It is denoted by \(\bar{X}\) and defined as follows, where \(f\) is the frequency, \(m\) is the mid value of a class, \(w\) is the weight, and \(n=\sum f\) for discrete and continuous data.
Mean  Individual data  Discrete data  Continuous data
Arithmetic Mean  \(\bar{X}=\frac{\sum{x}}{n}\)  \(\bar{X}=\frac{\sum{fx}}{n}\)  \(\bar{X}=\frac{\sum{fm}}{n}\)
Geometric Mean  \(\bar{X}= \left(\prod x \right)^{\frac{1}{n}}\)  \(\bar{X}=\left(\prod x^{f} \right)^{\frac{1}{n}}\)  \(\bar{X}=\left( \prod m^{f} \right)^{\frac{1}{n}}\)
Harmonic Mean  \(\bar{X}= \frac{n}{\sum \left( \frac{1}{x}\right )}\)  \(\bar{X}=\frac{n}{\sum \left( \frac{f}{x}\right )}\)  \(\bar{X}=\frac{n}{\sum \left( \frac{f}{m}\right )}\)
Weighted Mean  \(\bar{X}= \frac{\sum wx}{\sum w}\)  \(\bar{X}= \frac{\sum wx}{\sum w}\)  \(\bar{X}= \frac{\sum wx}{\sum w}\)
Arithmetic Mean (AM)
The arithmetic mean answers the question, "If all the quantities had the same value, what would that value have to be to achieve the same total?" For example, if Ram has Rs 100 and Shyam has Rs 120, then the average amount is the AM:
\( AM =\frac{a+b}{2} =\frac{100+120}{2} =110\), that is, Rs 110.
In the figure below, a+b is the same as AM+AM.
Geometric Mean (GM)
The geometric mean answers the question, "If all the quantities had the same value, what would that value have to be to achieve the same product?" The geometric mean is useful when the data change in percentages, as rates of change or ratios. It is used in finance to determine average growth rates, also known as the compound annual growth rate. For example, suppose Ram deposits Rs 100 in a bank and earns 80% growth in the first year and 25% growth in the second year. The average growth factor is the GM:
\( GM =\sqrt{1.80 \times 1.25}=1.50\), so the average growth is 50% per year.
Please note that the situation can NOT be explained by \(\frac{80+25}{2} =52.5\%\), because \(1.525 \times 1.525 \approx 2.33\) while the actual two-year growth factor is \(1.80 \times 1.25 = 2.25\).
In the figure below, a*b is the same as GM*GM.
Harmonic Mean (HM)
The harmonic mean is used to average rates, such as speeds over equal distances. For example, suppose Ram travels 100 km with a fuel efficiency of 25 km per liter and the next 100 km with a fuel efficiency of 16 km per liter. The average fuel efficiency is the HM:
\( HM =\frac{2 \times 25 \times 16}{25+16}=19.51\) km per liter
Please note that the situation can NOT be explained by the AM or the GM, because
The fuel consumed over the first 100 km is \( \frac{100}{25}=4\) liters
The fuel consumed over the second 100 km is \( \frac{100}{16}=6.25\) liters
The overall fuel efficiency is
\(\frac{200}{4+6.25}=19.51\) km per liter
Weighted Mean (WM)
A weighted mean is a kind of average where some data points contribute more “weight” than others. If all the weights are equal, then the weighted mean equals the arithmetic mean.
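The four means described above can be checked with a short Python sketch. It uses the standard `statistics` module for the arithmetic, geometric, and harmonic means; `weighted_mean` is a hypothetical helper written here for illustration, and the numbers are the Ram and Shyam examples from the text.

```python
from statistics import mean, geometric_mean, harmonic_mean


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Arithmetic mean: Ram has Rs 100 and Shyam has Rs 120.
am = mean([100, 120])

# Geometric mean of the yearly growth factors 1.80 and 1.25.
gm = geometric_mean([1.80, 1.25])

# Harmonic mean of the fuel efficiencies over two equal distances.
hm = harmonic_mean([25, 16])

# With equal weights, the weighted mean reduces to the arithmetic mean.
wm = weighted_mean([100, 120], [1, 1])

print(am, round(gm, 2), round(hm, 2), wm)
```

The geometric mean is applied to growth factors (1.80 and 1.25), not to the percentages themselves, which is why the result is read as 50% average growth.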
Application of Mean
The mean is calculated from all the data values, so it is affected by each and every value of the data set. It is applicable when the data are
 Quantitative
 Closed ended
 Normally distributed
Relation between AM, GM and HM
Let a and b be two positive numbers. Then
 \( GM^2=AM \times HM \)
 \( AM \ge GM \ge HM \), that is, the arithmetic mean is greater than or equal to the geometric mean, which in turn is greater than or equal to the harmonic mean.
For two positive numbers a and b, the three means are
\( AM=\frac{a+b}{2}, GM=\sqrt{ab}, HM=\frac{2ab}{a+b}\)
The proofs are as follows:

Now, we have
\( GM^2=ab\)
or \( GM^2=\frac{a+b}{2} \times \frac{2ab}{a+b} \)
or \( GM^2=AM \times HM \) 
Now, we have
\(AM-GM=\frac{a+b}{2}-\sqrt{ab}\)
or \(AM-GM=\frac{a+b-2\sqrt{ab}}{2}\)
or \(AM-GM=\frac{(\sqrt{a})^{2}+(\sqrt{b})^{2}-2\sqrt{a}\sqrt{b}}{2}\)
or \(AM-GM=\frac{(\sqrt{a}-\sqrt{b})^{2}}{2} \ge 0\)
or \(AM\ge GM\) (1)
Similarly,
\(GM-HM=\sqrt{ab}-\frac{2ab}{a+b}\)
or \(GM-HM=\frac{\sqrt{ab}(a+b)-2ab}{a+b}\)
or \(GM-HM=\frac{\sqrt{ab}(a+b)-2\sqrt{ab}\sqrt{ab}}{a+b}\)
or \(GM-HM=\frac{\sqrt{ab}}{a+b}(a+b-2\sqrt{ab})\)
or \(GM-HM=\frac{\sqrt{ab}}{a+b}(\sqrt{a}-\sqrt{b})^{2} \ge 0\)
or \(GM\ge HM\) (2)
Combining (1) and (2), we get
\(AM\ge GM\ge HM\)
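The two relations can also be checked numerically. The following is a minimal sketch for a few pairs of positive numbers; `two_number_means` is a hypothetical helper name, and a numerical check of this kind illustrates the result but does not replace the proof.

```python
import math


def two_number_means(a, b):
    """Return (AM, GM, HM) of two positive numbers a and b."""
    am = (a + b) / 2
    gm = math.sqrt(a * b)
    hm = 2 * a * b / (a + b)
    return am, gm, hm


for a, b in [(4, 9), (25, 16), (7, 7)]:
    am, gm, hm = two_number_means(a, b)
    # GM^2 = AM x HM, up to floating-point rounding.
    assert math.isclose(gm ** 2, am * hm)
    # AM >= GM >= HM, with equality when a == b.
    assert am >= gm >= hm
    print(a, b, am, gm, round(hm, 4))
```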
Visualization of the proof
Let us suppose that a and b are two given numbers. Now, draw a semi circle with diameter a+b.

Visualization of AM
By the property of radius and diameter, we get that
\( AM =\frac{a+b}{2} \) 
Visualization of GM
By the mean proportional property (squaring a rectangle) and the similarity of triangles, the perpendicular DQ is the geometric mean, given by
\( GM =\sqrt{ab} \) 
Visualization of HM
By using proportionality, we get
Triangles ADQ and QDB are similar with AD=a, DB=b, and DQ=GM, so we have
\( \frac{GM}{a}=\frac{b}{GM} \)
or \( GM= \sqrt{ab} \)
Again, using proportionality, we get
Triangles DRQ and ODQ are similar with QR=HM, QD=\(\sqrt{ab}\), and OQ=\(\frac{a+b}{2}\), so we have
\( \frac{HM}{\sqrt{ab}}=\frac{\sqrt{ab}}{\frac{a+b}{2}} \)
or \( HM= \frac{2ab}{a+b} \)
Example 1
Find the mean of the numbers 3, −7, 5, 13, −2
The sum of the numbers is
\(\sum X= 3 − 7 + 5 + 13 − 2 = 12\)
There are 5 numbers, so n=5.
Hence, the mean of the numbers is
\(\bar{X}=\frac{\sum X}{n}=\frac{12}{5}=2.4\)
Example 2
Find the mean of the wages from the following data
Wages  50  70  90  110  130  150
Number of Workers  2  4  5  6  2  1 
Wages \( (X)\)  Number of workers \( (f)\)  \( f.x\) 
50  2  100 
70  4  280 
90  5  450 
110  6  660 
130  2  260 
150  1  150 
\( \sum f=n=20\)  \( \sum f x=1900\) 
\( \bar{X}=\frac{\sum fx}{n}=\frac{1900}{20}=95\)
Example 3
Find the average marks from the following data
Marks of the Students  0-20  20-40  40-60  60-80  80-100
Number of Students  20  50  55  40  15
Marks of students \( (X)\)  Mid value of marks \( (m)\)  Number of students \( (f)\)  \( fm\)
0-20  10  20  200
20-40  30  50  1500
40-60  50  55  2750
60-80  70  40  2800
80-100  90  15  1350
Total  -  \( \sum f=n=180 \)  \( \sum fm=8600\)
\( \bar{X}=\frac{\sum fm}{n}=\frac{8600}{180}=47.8\)
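The calculations in Example 2 and Example 3 follow the same pattern, which can be sketched in Python. `frequency_mean` is a hypothetical helper name; for continuous data, each class is first replaced by its mid value.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


# Discrete data from Example 2.
wages = [50, 70, 90, 110, 130, 150]
workers = [2, 4, 5, 6, 2, 1]
wage_mean = frequency_mean(wages, workers)

# Continuous data from Example 3: use the mid value of each class.
classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
students = [20, 50, 55, 40, 15]
mid_values = [(lower + upper) / 2 for lower, upper in classes]
marks_mean = frequency_mean(mid_values, students)

print(wage_mean, round(marks_mean, 1))
```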
Median
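The median is a measure of central tendency that uses the middle portion of the data to give a single best value. It is the middle value of the data series when the values are arranged in order of magnitude (ascending or descending). Therefore, the median is not affected by extreme values. It is denoted by \(M_d\) and defined as follows.
Measure  Individual  Discrete  Continuous
Median  \( M_d=\left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} \)  \( M_d=\left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} \)  \( M_d \text{ class}=\left(\frac{n}{2}\right)^{\text{th}} \text{ item} \), with \( M_d=L+\frac{\frac{n}{2}-cf}{f} \times i\)
Here \(L\) is the lower limit of the median class, \(cf\) is the cumulative frequency of the class preceding the median class, \(f\) is the frequency of the median class, and \(i\) is the class width.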
For individual data, the median is found by the following steps.
 Sort the data in ascending order.
 Find the middle position of the sorted data.
 If the number of observations is odd, take the value exactly in the middle.
 If the number of observations is even, take the mean of the two middle values.
The median is applicable to
 Qualitative data
 Open ended or skewed data
Proof of \( M_d=L+\frac{\frac{n}{2}-cf}{f} \times i\)
Let us consider continuous data given as below
X  \(x_1 - x_2\)  \(x_2 - x_3\)  ...  \(x_{k-1} - x_k\)  \(x_k - x_{k+1}\)  ...  \(x_{n-1} - x_n\)  \(x_n - x_{n+1}\)
f  \(f_1\)  \(f_2\)  ...  \(f_{k-1}\)  \(f_k\)  ...  \(f_{n-1}\)  \(f_n\)
Then the cumulative frequency distribution table with
\(\displaystyle F_k=\sum_{i=1}^k f_i\)
is given by
X  Number of items \( f\)  Cumulative frequency \( F=cf\)
\(x_1 - x_2\)  \(f_1\)  \(F_1\)
\(x_2 - x_3\)  \(f_2\)  \(F_2\)
...  ...  ...
\(x_{k-1} - x_k\)  \(f_{k-1}\)  \(F_{k-1}\)
\(x_k - x_{k+1}\)  \(f_k\)  \(F_k\)
...  ...  ...
\(x_{n-1} - x_n\)  \(f_{n-1}\)  \(F_{n-1}\)
\(x_n - x_{n+1}\)  \(f_n\)  \(F_n\)
Now, suppose that \(x_k - x_{k+1}\) is the median class. Then we must have \(F_{k-1} < \frac{n}{2} \le F_k\), where \(n\) in \(\frac{n}{2}\) denotes the total frequency.
Then in the figure given below, we consider that
\(x_{k+1} - x_k=i\)
A represents the lower limit \(x_k\), say L (that is, \(L=x_k\)), and AB represents \(F_{k-1}\)
E represents the upper limit \(x_{k+1}\), and EF represents \(F_k\)
CD represents \(\frac{n}{2}\)
so that point C is the required median of the data, which lies in the class \(x_k - x_{k+1}\).
Now from the figure, we get that
\(\triangle BDG \sim \triangle BFH\)
or \(\frac{DG}{BG}=\frac{FH}{BH}\)
or \(\frac{\frac{n}{2}-cf}{BG}=\frac{F_k-F_{k-1}}{i}\) ∵ \(F_{k-1}=cf\)
or \(BG= \frac{\frac{n}{2}-cf}{f} \times i \) ∵ \(F_k-F_{k-1}=f_k=f\)
Hence the median is
\(M_d=A+AC\)
or \(M_d=x_k+BG\)
or \(M_d=L+\frac{\frac{n}{2}-cf}{f} \times i\)
Example 1
Find the median of the following wages(in hundreds): \(40,30,35,42,32,45,48\)
Given wages (in hundreds) are
\(40,30,35,42,32,45,48\)
Arranging the wages in ascending order, we get
30  32  35  40  42  45  48 
1st item  2nd item  3rd item  4th item  5th item  6th item  7th item 
Here, the number of observations is n=7. Thus, based on the formula, the median is
\(M_d= \left (\frac{n+1}{2} \right )^{\text{th}}\) item
or \(M_d= \left (\frac{7+1}{2} \right )^{\text{th}}\) item
or \(M_d= 4^{\text{th}}\) item
or \(M_d= 40\) hundreds
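The median of individual data (sort, then take the middle value or the mean of the two middle values) can be sketched as follows. `median_individual` is a hypothetical helper name; the standard library function `statistics.median` behaves the same way.

```python
def median_individual(data):
    """Median of ungrouped data: sort, then take the middle value(s)."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the value exactly in the middle.
        return ordered[middle]
    # Even count: the mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


wages = [40, 30, 35, 42, 32, 45, 48]  # in hundreds
print(median_individual(wages))
```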
Example 2
Find the median of the wages from the following data
Wages  50  70  90  110  130  150
Number of Workers  2  4  5  6  2  1 
Wages \( X\)  Number of Workers \( f\)  Cumulative frequency \( cf\) 
50  2  2 
70  4  6 
90  5  11 
110  6  17 
130  2  19 
150  1  20 
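Here, the number of observations is \(n=20\). Thus, based on the formula, the median is
\( M_d= \left (\frac{n+1}{2} \right )^{\text{th}}\) item
or \( M_d= \left (\frac{20+1}{2} \right )^{\text{th}}\) item
or \( M_d= 10.5^{\text{th}}\) item
Here, the \(10.5^{\text{th}}\) item lies in the \(cf\) of 11, which corresponds to the wage 90, thus
\( M_d= 90\)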
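For discrete data, the lookup in the cumulative frequency column can be sketched as below. This is a minimal sketch that assumes the values are listed in ascending order and follows the rule used in this example (take the first value whose cumulative frequency reaches the \(\frac{n+1}{2}\)th position); `median_discrete` is a hypothetical helper name.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a discrete frequency distribution (values ascending)."""
    n = sum(freqs)
    position = (n + 1) / 2
    # Walk down the cumulative frequency column until it reaches the position.
    for value, cf in zip(values, accumulate(freqs)):
        if cf >= position:
            return value
    raise ValueError("empty frequency distribution")


wages = [50, 70, 90, 110, 130, 150]
workers = [2, 4, 5, 6, 2, 1]
print(median_discrete(wages, workers))
```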
Example 3
Find the median marks from the following data
Marks of the Students  0-20  20-40  40-60  60-80  80-100
Number of Students  20  50  55  40  15
Marks of the Students \( X\)  Number of Students \( f\)  Cumulative frequency \( cf\)
0-20  20  20
20-40  50  70
40-60  55  125
60-80  40  165
80-100  15  180
Here, the number of observations is \( n=180\). Thus, based on the formula, the median class is
Md Class\( = \left (\frac{n}{2} \right )^{\text{th}}\) item
or Md Class\( = \left (\frac{180}{2} \right )^{\text{th}}\) item
or Md Class\( = 90^{\text{th}}\) item
Here, the \( 90^{\text{th}}\) item lies in the \( cf\) of 125, so the median class is 40-60, thus
\( L=40,f=55, cf=70,i=20\)
Hence, the median is
\(M_d=L+\frac{\frac{n}{2}-cf}{f} \times i\)
or \(M_d=40+\frac{\frac{180}{2}-70}{55} \times 20=47.27\)
Example 4
Find the median marks from the following data
Marks of the Students  0-20  20-40  40-60  60-80  80-100
Number of Students  2  3  5  4  6
Marks of the Students \(X\)  Number of Students \(f\)  Cumulative frequency \(cf\)
0-20  2  2
20-40  3  5
40-60  5  10
60-80  4  14
80-100  6  20
Here, the number of observations is \(n=20\). Thus, based on the formula, the median class is
Md Class\(= \left (\frac{n}{2} \right )^{\text{th}}\) item
or Md Class \(= \left (\frac{20}{2} \right )^{\text{th}}\) item
or Md Class \(= 10^{\text{th}}\) item
Here, the \(10^{\text{th}}\) item lies in the \(cf\) of 10, so the median class is 40-60, thus
\(L=40,f=5, cf=5,i=20\)
Hence, the median is
\(M_d=L+\frac{\frac{n}{2}-cf}{f} \times i\)
or \(M_d=40+\frac{\frac{20}{2}-5}{5} \times 20=60\)
NOTE
In the example above, students may ask why the median \(60\) does not lie strictly inside the class \(40-60\), as the rules for class groupings would suggest. Teachers should encourage them to apply the usual rules for computing the median.
Quartile, Decile and Percentile
The formulas for quartiles, deciles, and percentiles are similar to that of the median.
Measure  Individual  Discrete  Continuous
Quartile \(k=1,2,3\)  \( Q_k=\frac{k(n+1)}{4} \text{th item} \)  \( Q_k=\frac{k(n+1)}{4} \text{th item} \)  \( Q_k \text{ class}=\frac{kn}{4} \text{th item} \), with \( Q_k=L+\frac{\frac{kn}{4}-cf}{f} \times i\)
Decile \(k=1,2,\cdots, 9\)  \( D_k=\frac{k(n+1)}{10} \text{th item} \)  \( D_k=\frac{k(n+1)}{10} \text{th item} \)  \( D_k \text{ class}=\frac{kn}{10} \text{th item} \), with \( D_k=L+\frac{\frac{kn}{10}-cf}{f} \times i\)
Percentile \(k=1,2,\cdots, 99\)  \( P_k=\frac{k(n+1)}{100} \text{th item} \)  \( P_k=\frac{k(n+1)}{100} \text{th item} \)  \( P_k \text{ class}=\frac{kn}{100} \text{th item} \), with \(P_k=L+\frac{\frac{kn}{100}-cf}{f} \times i\)
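Because the median, quartiles, deciles, and percentiles of continuous data share one interpolation formula, they can be sketched with a single function. `grouped_quantile` and its parameters are hypothetical names chosen for illustration; the data are those of Example 3 and Example 4 of the median section, and the classes are assumed to be listed in ascending order.

```python
def grouped_quantile(classes, freqs, k, parts):
    """k-th quantile of grouped data split into `parts` equal parts.

    classes: list of (lower, upper) class limits in ascending order
    freqs:   frequency of each class
    parts:   2 for the median, 4 for quartiles, 10 for deciles,
             100 for percentiles
    """
    n = sum(freqs)
    position = k * n / parts
    cf_before = 0  # cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cf_before + f >= position:
            # L + (position - cf) / f * i
            return lower + (position - cf_before) / f * (upper - lower)
        cf_before += f
    raise ValueError("position lies beyond the last class")


classes = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
students = [20, 50, 55, 40, 15]

median = grouped_quantile(classes, students, 1, 2)
q1 = grouped_quantile(classes, students, 1, 4)
q3 = grouped_quantile(classes, students, 3, 4)
print(round(median, 2), q1, q3)
```

Calling the same function with `parts=10` or `parts=100` gives deciles and percentiles.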
Mode
The mode, as a measure of central tendency, is preferable when we want to know the most typical value, e.g., the most common shoe size, the most common size of a ready-made garment, the most common income, the most common pocket expenditure of a college student, the most common family size in a locality, the most common duration of cure of viral fever, the most popular candidate in an election, etc.
Thus, the mode is a measure of central tendency that uses the most fashionable (most repeated) observation to give a single best value. In other words, the mode is the value that occurs most frequently in a set of data, i.e., it indicates the most common result. It is not affected by every value in the data set. It is denoted by \(M_0\) and defined as follows.
Measure  Individual  Discrete  Continuous
Mode  Most repeated value  Most repeated value or table analysis  \(M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i\)
Here \(L\) is the lower limit of the modal class, \(f_1\) is the frequency of the modal class, \(f_0\) and \(f_2\) are the frequencies of the classes before and after it, and \(i\) is the class width.
The mode is applicable to
 Nominal data
 Frequency related data
 Fashionable data
Proof of \(M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i\)
Let us consider continuous data given as below
X  \(x_1 - x_2\)  \(x_2 - x_3\)  ...  \(x_{k-1} - x_k\)  \(x_k - x_{k+1}\)  \(x_{k+1} - x_{k+2}\)  ...  \(x_{n-1} - x_n\)  \(x_n - x_{n+1}\)
f  \(f_1\)  \(f_2\)  ...  \(f_{k-1}\)  \(f_k\)  \(f_{k+1}\)  ...  \(f_{n-1}\)  \(f_n\)
Let us consider that \(f_k\) is the maximum frequency, then the modal class is \(x_k - x_{k+1}\)
Now, as we supposed that \(x_k - x_{k+1}\) is the modal class, we consider
\(x_{k+1} - x_k=i\)
A represents the lower limit \(x_k\), say L (that is, \(L=x_k\)), and AB represents \(f_k\)
E represents the upper limit \(x_{k+1}\), EQ represents \(f_{k+1}\), and EF represents \(f_k\)
CD represents the maximum frequency for the modal value
so that point C is the required mode of the data, at which the frequency curve has its maximum, and which lies in the class \(x_k - x_{k+1}\)
Let us join PF and BQ, which intersect at M
Now from the figure, we get that
\(\triangle BPM \sim \triangle QFM\)
or \(\frac{LM}{MN}=\frac{PB}{QF}\)
or \(\frac{LM}{i-LM}=\frac{f_k-f_{k-1}}{f_k-f_{k+1}}\) ∵ \(LN=i\)
or \(LM= \frac{f_k-f_{k-1}}{2f_k-f_{k-1}-f_{k+1}} \times i\)
Hence the mode is
\(M_0=A+AC\)
or \(M_0=x_k+LM\)
or \(M_0=L+\frac{f_k-f_{k-1}}{2f_k-f_{k-1}-f_{k+1}} \times i\)
In our usual calculation, we write \(f_k=f_1, f_{k-1}=f_0\), and \(f_{k+1}=f_2\), thus the formula is
\(M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i\)
Example 1
Find the mode of the following data: \(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29\)
The given data set is
\(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29\)
In a frequency table, the data set becomes
X  3  5  7  12  13  14  20  23  29  39  40  56
f  1  1  1  1  1  1  1  4  1  1  1  1
Since 23 has the highest frequency (4), the mode is \(M_0=23\).
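Counting repetitions, as in the frequency table above, can be sketched with `collections.Counter` from the Python standard library. The list `modes` may contain more than one value when several values share the highest frequency.

```python
from collections import Counter

data = [3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23, 29]

# Build the frequency table: value -> number of occurrences.
counts = Counter(data)

# The mode is every value that attains the highest frequency.
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest]

print(modes, highest)
```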
Example 2
Find the mode of the wages from the following data
Wages  50  70  90  110  130  150
Number of Workers  2  4  5  6  2  1
Since the highest frequency is 6, which corresponds to the wage 110, the mode is \(M_0=110\).
Example 3
Find the mode from the following data
Wages  0-10  10-20  20-30  30-40  40-50  50-60  60-70
Number of Workers  4  12  15  18  32  14  13
Since the highest frequency is 32, the modal class is \(40-50\). Thus,
\(L=40,f_0=18,f_1=32,f_2=14,i=10\)
Hence, using the formula, the mode is
\(M_0=L+\frac{f_1-f_0}{2f_1-f_0-f_2} \times i\)
or \(M_0=40+\frac{32-18}{2 \times 32-18-14} \times 10=44.38\)
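The interpolation formula for the mode of continuous data can be sketched as below. `grouped_mode` is a hypothetical helper name. The sketch assumes a regular distribution with a single highest frequency, and it treats a missing neighbouring class as having frequency 0, which is an assumption made here rather than a rule stated in the text.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: L + (f1 - f0) / (2*f1 - f0 - f2) * i."""
    k = freqs.index(max(freqs))  # index of the modal class
    lower, upper = classes[k]
    f1 = freqs[k]
    # Frequencies of the classes before and after the modal class
    # (assumed 0 when the modal class is the first or the last class).
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
workers = [4, 12, 15, 18, 32, 14, 13]
print(grouped_mode(classes, workers))
```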
Analytical method to find the Mode
If the frequency distribution is regular, then mode is determined by the value corresponding to maximum frequency. There may be a situation where frequency distribution is NOT regular, means the concentration of observations around a value having maximum frequency is less than the concentration of observations around some other value. In such a situation, mode cannot be determined by the use of maximum frequency criterion. Further, there may be concentration of observations around more than one value of the variable and, accordingly, the distribution is said to be bimodal or multimodal depending upon whether it is around two or more than two values. In such cases, we use analytical method (also called tabular or grouping or empirical method) to find the Mode.
This method is used when the mode of the given series is unclear, or in the following situations.
 The highest frequency occurs more than once.
 The highest frequency lies at the beginning or at the end of the data.
 There are large frequencies around the highest frequency.
 The frequencies rise and fall irregularly.
Example 4
Find the mode from the following data
Wages  10  20  30  40  50  60  70  80  90
Number of Workers  1  5  17  22  21  20  9  3  4
Here, the maximum frequency is \(22\); however, there are large frequencies around 22, thus we use the analytical method to find the mode.
Hence, based on the rule, the analytic table is given below. Each sum is written in the row of the first item of its group.
Wages  \(f\)  1st+2nd  2nd+3rd  1st+2nd+3rd  2nd+3rd+4th  3rd+4th+5th
10  1  6  -  23  -  -
20  5  -  22  -  44  -
30  17  39  -  -  -  60
40  22  -  43  63  -  -
50  21  41  -  -  50  -
60  20  -  29  -  -  32
70  9  12  -  16  -  -
80  3  -  7  -  -  -
90  4  -  -  -  -  -
The analytic table is prepared by the following steps.
 Prepare a table consisting of 7 columns: the 1st column for X and the 2nd column for the frequencies of X.
 In the third column, add the frequencies in groups of two, starting from the top.
 In the fourth column, add the frequencies in groups of two, starting from the second.
 In the fifth column, add the frequencies in groups of three, starting from the top.
 In the sixth column, add the frequencies in groups of three, starting from the second.
 In the seventh column, add the frequencies in groups of three, starting from the third.
 Finally, prepare a frequency chart based on the analytic table, marking the values that make up the maximum of each column.
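Column  10  20  30  40  50  60  70  80  90
1  -  -  -  1  -  -  -  -  -
2  -  -  -  -  1  1  -  -  -
3  -  -  -  1  1  -  -  -  -
4  -  -  -  1  1  1  -  -  -
5  -  -  -  -  1  1  1  -  -
6  -  -  1  1  1  -  -  -  -
Total  0  0  1  4  5  3  1  0  0
Since the wage 50 has the highest total (5), the mode is \(M_0=50\).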
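The grouping procedure can also be sketched in Python. `grouping_mode` is a hypothetical helper name; the six `(size, start)` pairs correspond to the frequency column and the five grouped columns of the analytic table, and each value receives one tally mark every time it belongs to the maximum of a column.

```python
def grouping_mode(values, freqs):
    """Grouping (analytical) method for the mode of a discrete series."""
    tally = {value: 0 for value in values}
    # (group size, starting index) for the six analysis columns.
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, start in columns:
        # Sum the frequencies in consecutive, non-overlapping groups.
        groups = [
            (sum(freqs[i:i + size]), values[i:i + size])
            for i in range(start, len(freqs) - size + 1, size)
        ]
        best_sum = max(total for total, _ in groups)
        # Mark every value that belongs to a group with the maximum sum.
        for total, members in groups:
            if total == best_sum:
                for value in members:
                    tally[value] += 1
    best = max(tally.values())
    modes = [value for value, marks in tally.items() if marks == best]
    return modes, tally


wages = [10, 20, 30, 40, 50, 60, 70, 80, 90]
workers = [1, 5, 17, 22, 21, 20, 9, 3, 4]
modes, tally = grouping_mode(wages, workers)
print(modes, tally)
```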
Relation between Mean Median and Mode
A distribution in which the values of mean, median and mode coincide (i.e. mean = median = mode) is known as a symmetrical distribution.
Conversely, when values of mean, median and mode are not equal the distribution is known as asymmetrical or skewed distribution. In moderately skewed or asymmetrical distribution, a very important relationship exists among these three measures of central tendency. In such distributions
Mode = 3 Median - 2 Mean
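As an illustration of this empirical relation, the following sketch estimates the mode from the mean and median of the grouped marks data used in the earlier examples (mean \(\frac{8600}{180}\), median from the interpolation formula). `empirical_mode` is a hypothetical helper name, and the result is only an approximation that presumes a moderately skewed distribution.

```python
def empirical_mode(mean, median):
    """Empirical estimate of the mode: 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


# Mean and median of the grouped marks data from the earlier examples.
mean_marks = 8600 / 180
median_marks = 40 + (180 / 2 - 70) / 55 * 20

print(round(empirical_mode(mean_marks, median_marks), 2))

# In a symmetrical distribution, mean = median, so the estimate coincides.
print(empirical_mode(5, 5))
```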