Statistics

www.akankshaclasses.com
CLASS IX Mathematics ~6–8 marks/year Ch 12 of 12

Class 9 · Mathematics · NCERT chapter notes · Akanksha Classes

Snapshot
  • Statistics is the branch of mathematics that deals with the collection, presentation, analysis and interpretation of numerical data.
  • Data can be primary (collected firsthand by the investigator) or secondary (obtained from existing sources).
  • Raw data is organised using frequency distribution tables with tally marks, class intervals and class marks.
  • Graphical representations include bar graphs (discrete/categorical data, gaps between bars) and histograms (continuous data, NO gaps between bars).
  • A frequency polygon is formed by joining the mid-points of the tops of histogram bars; it can also be drawn directly without a histogram.
  • Three measures of central tendency: Mean (arithmetic average), Median (middle value when sorted), Mode (most frequent value).
  • Board weightage: ~6–8 marks/year — typically one graphical question (2–3 marks) and one central-tendency calculation (3–4 marks).
Detailed notes

1. Primary vs Secondary Data

Data is any collection of facts, numbers, measurements or other information gathered for a purpose.

  • Primary data: Information collected first-hand by the investigator themselves — through surveys, direct measurements, experiments. The collector knows the context, source and accuracy exactly. Example: a teacher recording attendance each day.
  • Secondary data: Information taken from existing sources not originally created by the investigator. Examples: census reports, newspaper statistics, government publications, internet databases, published research.

Raw data is data exactly as it was collected: unsorted and unorganised, often just a jumble of numbers. The first step in statistics is to organise it.

Arrayed data: When raw data is arranged in ascending or descending order it becomes an array. Arranging the data this way is required before finding the median, and it makes a frequency table easier to build.

Range: The difference between the largest and smallest values in the data. It gives a rough idea of how spread out the data is.

NCERT Illustration — Blood groups of students

The blood groups of 30 students are recorded: A, B, O, O, AB, O, A, O, B, A, O, B, A, O, O, A, AB, O, A, A, O, O, AB, B, A, O, B, A, B, O. This list is raw primary data. A frequency table immediately reveals that O is the most common blood group (12 students) and AB is the rarest (3 students).

2. Frequency Distribution — Tally Marks, Class Intervals, Class Width, Class Mark

A frequency distribution table lists each value (or range of values) with the number of times it occurs. This compresses raw data into a clear summary.

Tally marks

While building the table, we go through the data one value at a time. Instead of writing numerals, we draw tally marks: four vertical strokes, with the fifth drawn diagonally across them (like a "gate"). Each complete bundle represents 5, so counting in bundles of five is quick and less error-prone.

Ungrouped frequency distribution

Lists every distinct individual value with its frequency. Best when the range of values is small (e.g., blood groups A, B, O, AB).

Grouped frequency distribution

Groups data into class intervals. Used when the range is large (e.g., marks 0–100). Each class interval has a lower class limit and upper class limit.

| Term | Definition | Example (class 20–30) |
|---|---|---|
| Lower class limit | Smaller boundary of the class | 20 |
| Upper class limit | Larger boundary of the class | 30 |
| Class width (class size) | Upper limit minus lower limit | 30 − 20 = 10 |
| Class mark (mid-point) | (Lower + Upper) divided by 2 | (20 + 30) / 2 = 25 |
| Frequency | Number of observations in the class | e.g., 8 students scored 20–30 |

$$\text{Class mark} = \dfrac{\text{Lower class limit} + \text{Upper class limit}}{2}$$

$$\text{Class width} = \text{Upper class limit} - \text{Lower class limit}$$

Exclusive (continuous) form vs inclusive form

Exclusive form: In 10–20, 20–30, ... the upper limit of each class is excluded from that class, so a value of exactly 20 is counted in 20–30, not in 10–20. This form is convenient for continuous data.
Inclusive form: In 1–10, 11–20, ... both limits belong to the class, which leaves a gap of 1 between consecutive classes. To draw a histogram from inclusive data, first make the classes continuous by subtracting half the gap (0.5 here) from each lower limit and adding it to each upper limit, giving 0.5–10.5, 10.5–20.5, ...
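
The inclusive-to-continuous conversion can be sketched in Python. This is only an illustration, not part of the NCERT text: the function name `to_continuous` is hypothetical, and the sketch assumes the classes are listed in ascending order with the same gap between every pair of consecutive classes.

```python
def to_continuous(classes):
    """Convert inclusive classes such as (1, 10), (11, 20) to continuous boundaries.

    Assumes at least two classes, in ascending order, with a uniform gap.
    """
    # Gap between the upper limit of the first class and the lower limit of the next.
    gap = classes[1][0] - classes[0][1]
    half = gap / 2
    # Subtract half the gap from each lower limit and add it to each upper limit.
    return [(lower - half, upper + half) for lower, upper in classes]


inclusive = [(1, 10), (11, 20), (21, 30)]
print(to_continuous(inclusive))
# [(0.5, 10.5), (10.5, 20.5), (20.5, 30.5)]
```

After the conversion each class ends exactly where the next begins, which is what a histogram needs.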

NCERT Example 1 — Blood group frequency table

Blood groups of 30 students: A, B, O, O, AB, O, A, O, B, A, O, B, A, O, O, A, AB, O, A, A, O, O, AB, B, A, O, B, A, B, O.

Blood Group    Tally           Frequency
A              |||| ||||       9
B              |||| |          6
O              |||| |||| ||    12
AB             |||             3
Total                          30

Conclusion: O is most common; AB is rarest.

NCERT Example 2 — Grouped frequency distribution (Daily income of workers)

The daily income (in Rs) of 50 workers: 100–120 (12 workers), 120–140 (14), 140–160 (8), 160–180 (6), 180–200 (10). Class width = 20. Class marks: 110, 130, 150, 170, 190.
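
The class marks and class width quoted for the daily-income classes can be checked with a few lines of Python. This is a minimal sketch; the helper names `class_mark` and `class_width` are chosen here for illustration only.

```python
def class_mark(lower, upper):
    """Mid-point of a class interval."""
    return (lower + upper) / 2


def class_width(lower, upper):
    """Size of a class interval."""
    return upper - lower


# Class intervals and frequencies from the daily-income example.
income_classes = [(100, 120), (120, 140), (140, 160), (160, 180), (180, 200)]
frequencies = [12, 14, 8, 6, 10]

marks = [class_mark(lo, hi) for lo, hi in income_classes]
widths = [class_width(lo, hi) for lo, hi in income_classes]

print(marks)             # [110.0, 130.0, 150.0, 170.0, 190.0]
print(widths)            # [20, 20, 20, 20, 20]
print(sum(frequencies))  # 50
```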

3. Bar Graphs vs Histograms

Both use rectangular bars and look similar at first glance, but represent fundamentally different data types with one critical visual difference.

| Feature | Bar graph | Histogram |
|---|---|---|
| Data type | Discrete or categorical | Continuous (grouped into intervals) |
| Gaps between bars | Yes: bars do NOT touch | No: bars are adjacent, no gaps |
| Width of bars | Uniform, chosen for appearance | Equal to class width; must reflect the interval |
| X-axis labels | Categories (subjects, months, cities) | Class intervals on a continuous numerical scale |
| Y-axis | Frequency or count | Frequency (or frequency density when class widths differ) |
| Area meaning | Not meaningful | Area is proportional to frequency; with equal class widths the height alone shows the frequency |

Why no gaps in a histogram? Class intervals are continuous — 10–20 ends exactly where 20–30 begins. There is no gap in the data scale, so there must be no gap between the bars.

NCERT Example 3 — Constructing a histogram (Daily income of 50 workers)
| Daily income (Rs) | Number of workers |
|---|---|
| 100–120 | 12 |
| 120–140 | 14 |
| 140–160 | 8 |
| 160–180 | 6 |
| 180–200 | 10 |

Steps: (1) Draw a continuous x-axis from 100 to 200. (2) Mark the y-axis as frequency. (3) Draw adjacent bars of heights 12, 14, 8, 6, 10, with each bar touching the next. (4) The class 120–140 has the tallest bar, so more workers earn Rs 120–140 than fall in any other class (14 of the 50).

Note: If the class intervals were unequal (e.g., 100–120, 120–150), we would use frequency density (frequency divided by class width) on the y-axis so the area still represents frequency.
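
The frequency-density idea in the note can be illustrated with a short Python sketch. The class limits echo the note, but the frequencies (10 and 30) are invented for this example, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(frequency, lower, upper):
    """Bar height for a class when class widths are unequal."""
    return frequency / (upper - lower)


# Hypothetical data: (lower limit, upper limit, frequency); the frequencies are invented.
classes = [(100, 120, 10), (120, 150, 30)]

for lower, upper, f in classes:
    height = frequency_density(f, lower, upper)
    area = height * (upper - lower)   # the area of the bar recovers the frequency
    print(f"{lower}-{upper}: height = {height}, area = {area}")
# 100-120: height = 0.5, area = 10.0
# 120-150: height = 1.0, area = 30.0
```

The second class holds three times as many observations but its bar is only twice as tall, because it is also wider.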

NCERT Example 4 — Reading a histogram (Ages of hospital patients)

A histogram shows the ages of 360 patients admitted over a year, with classes 10–20, 20–30, 30–40, 40–50, 50–60, 60–70 and frequencies 90, 50, 60, 80, 50, 30 respectively. The 10–20 age group has the most admissions (90 patients). Total = 360.

4. Frequency Polygon

A frequency polygon is a line graph that conveys the same information as a histogram but uses connected points instead of bars. Its main advantage: two frequency polygons can be drawn on the same axes to compare two distributions.

Method 1 — From a histogram

  1. Mark the mid-point of the top of each bar. This point sits above the class mark at a height equal to the frequency.
  2. Join consecutive mid-points with straight line segments.
  3. Extend the line to the mid-point of a "phantom" class with zero frequency before the first class and after the last class, touching the x-axis. This closes the polygon.

Method 2 — Direct (without histogram)

  1. Compute the class mark of each class interval.
  2. Plot the points (class mark, frequency) for each class.
  3. Also plot (mid-point of phantom class before first, 0) and (mid-point of phantom class after last, 0).
  4. Join all points in order with straight line segments.
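Phantom classes: the class imagined before the first runs from (lower limit of first class − class width) to the lower limit of the first class, so its mid-point is the lower limit of the first class minus half the class width. Similarly, the class imagined after the last has its mid-point at the upper limit of the last class plus half the class width.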
NCERT Example 5 — Frequency polygon for students' marks

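Marks (out of 100) of 51 students in a test:

| Marks | Frequency | Class mark |
|---|---|---|
| 0–10 | 5 | 5 |
| 10–20 | 10 | 15 |
| 20–30 | 4 | 25 |
| 30–40 | 6 | 35 |
| 40–50 | 7 | 45 |
| 50–60 | 3 | 55 |
| 60–70 | 2 | 65 |
| 70–80 | 2 | 75 |
| 80–90 | 3 | 85 |
| 90–100 | 9 | 95 |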

Plot (5,5), (15,10), (25,4), (35,6), (45,7), (55,3), (65,2), (75,2), (85,3), (95,9). Add anchor points (-5, 0) and (105, 0). Join all with straight line segments to form a closed polygon.

The polygon shows the highest peak in the 10–20 range and a secondary peak at 90–100.
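
The points of this frequency polygon, including the two zero-frequency anchor points, can be generated with a short Python sketch. The function name `polygon_points` is hypothetical, and the sketch assumes every class has the same width.

```python
def polygon_points(classes, frequencies):
    """Return the (class mark, frequency) points of a frequency polygon,
    with zero-frequency anchor points added at both ends.

    Assumes all classes have the same width.
    """
    width = classes[0][1] - classes[0][0]
    marks = [(lower + upper) / 2 for lower, upper in classes]
    points = list(zip(marks, frequencies))
    # Mid-points of the imagined classes before the first and after the last.
    start = (marks[0] - width, 0)
    end = (marks[-1] + width, 0)
    return [start] + points + [end]


classes = [(lo, lo + 10) for lo in range(0, 100, 10)]   # 0-10, 10-20, ..., 90-100
frequencies = [5, 10, 4, 6, 7, 3, 2, 2, 3, 9]

points = polygon_points(classes, frequencies)
print(points[0], points[1], points[-1])   # (-5.0, 0) (5.0, 5) (105.0, 0)
print(len(points))                        # 12
```

Joining the 12 points in order with straight segments gives the closed polygon described in the example.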

5. Mean for Ungrouped Data

The arithmetic mean (usually called "the mean" or "average") is found by summing all observations and dividing by the total count.

$$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\displaystyle\sum_{i=1}^{n} x_i}{n}$$

Here $x_1, x_2, \dots, x_n$ are the $n$ observations and $\bar{x}$ (read "x-bar") is their mean.

NCERT Example 6 — Mean of ungrouped data

The marks (out of 100) obtained by 5 students in a mathematics test are: 55, 60, 48, 72, 65. Find the mean.

Sum $= 55 + 60 + 48 + 72 + 65 = 300.$ Number of students $n = 5.$

$\bar{x} = \dfrac{300}{5} = \mathbf{60}.$

Interpretation: On average a student scored 60 marks. Here one student did score exactly 60, but that is a coincidence: the mean need not equal any observation in the data (in Example 8 below, the mean of 40 is not one of the scores).

NCERT Example 7 — Finding a missing observation from the mean

The mean of $6, 4, 7, p$ and $10$ is $8$. Find $p$.

$\dfrac{6 + 4 + 7 + p + 10}{5} = 8$

$27 + p = 40 \Rightarrow p = \mathbf{13}.$
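
The same reasoning (total = count × mean, then subtract the known values) can be written as a small Python sketch. The function name `missing_observation` is made up for this illustration.

```python
def missing_observation(known, n, mean):
    """Find the one unknown value, given the other values, the count n and the mean."""
    total = n * mean            # the sum of all n observations
    return total - sum(known)   # whatever is left over must be the unknown value


p = missing_observation([6, 4, 7, 10], n=5, mean=8)
print(p)   # 13

# Check: the mean of the completed data should be 8.
print(sum([6, 4, 7, p, 10]) / 5)   # 8.0
```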

NCERT Example 8 — Mean of runs scored in cricket matches

Runs scored by Sachin Tendulkar in 10 innings: 52, 15, 11, 65, 0, 99, 8, 70, 29, 51. Find his mean score.

Sum $= 52 + 15 + 11 + 65 + 0 + 99 + 8 + 70 + 29 + 51 = 400.$

$\bar{x} = \dfrac{400}{10} = \mathbf{40}$ runs per innings.

Key properties of the mean:

  • Unique — every dataset has exactly one mean.
  • Uses every observation — the most "information-rich" measure.
  • Sensitive to extreme values (outliers). A single very large value can pull the mean far above the "typical" value.
  • The sum of deviations from the mean is always zero: $\displaystyle\sum_{i=1}^{n}(x_i - \bar{x}) = 0.$
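
The last property follows directly from the definition of the mean. A short derivation, using only the symbols defined above and the fact that $\sum x_i = n\bar{x}$:

$$\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0$$

For instance, the marks 55, 60, 48, 72, 65 have mean 60, so the deviations are $-5, 0, -12, 12, 5$, and these add up to 0.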

6. Mean for Grouped Data — Direct Method and Assumed Mean Method

When data is presented in a grouped frequency distribution, individual values are not known. We use the class mark as the representative value for all observations in that class.

Direct Method

$$\bar{x} = \dfrac{\displaystyle\sum f_i x_i}{\displaystyle\sum f_i}$$

where $x_i$ is the class mark of the $i$-th class and $f_i$ is its frequency. Multiply each class mark by its frequency, add up all products, then divide by the total frequency.

Assumed Mean Method (Shortcut)

When class marks are large numbers, the direct multiplications $f_i x_i$ become tedious. Choose an assumed mean $a$ (typically the class mark of the middle class or the class with the highest frequency). Calculate deviations $d_i = x_i - a$ for each class. Then:

$$\bar{x} = a + \dfrac{\displaystyle\sum f_i d_i}{\displaystyle\sum f_i}$$

This gives the same result with simpler arithmetic: the $d_i$ values are small, and the positive and negative products $f_i d_i$ partly cancel.
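
To see why the shortcut always agrees with the direct method, substitute $x_i = a + d_i$ into the direct formula. This short derivation uses only the symbols already defined:

$$\sum f_i x_i = \sum f_i (a + d_i) = a\sum f_i + \sum f_i d_i$$

Dividing both sides by $\sum f_i$ gives

$$\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i} = a + \dfrac{\sum f_i d_i}{\sum f_i}$$

Nothing in this argument depends on the particular value of $a$.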

NCERT Example 9 — Mean by Direct Method (Daily wages of 50 workers)
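| Daily wages (Rs) | Frequency $f_i$ | Class mark $x_i$ | $f_i x_i$ |
|---|---|---|---|
| 100–120 | 12 | 110 | 1320 |
| 120–140 | 14 | 130 | 1820 |
| 140–160 | 8 | 150 | 1200 |
| 160–180 | 6 | 170 | 1020 |
| 180–200 | 10 | 190 | 1900 |
| Total | 50 | | 7260 |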

$\bar{x} = \dfrac{7260}{50} = \mathbf{Rs\;145.20}$

NCERT Example 10 — Mean by Assumed Mean Method (same data)

Let assumed mean $a = 150$ (class mark of the middle class).

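| Class interval | $f_i$ | $x_i$ | $d_i = x_i - 150$ | $f_i d_i$ |
|---|---|---|---|---|
| 100–120 | 12 | 110 | -40 | -480 |
| 120–140 | 14 | 130 | -20 | -280 |
| 140–160 | 8 | 150 | 0 | 0 |
| 160–180 | 6 | 170 | +20 | +120 |
| 180–200 | 10 | 190 | +40 | +400 |
| Total | 50 | | | -240 |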

$\bar{x} = 150 + \dfrac{-240}{50} = 150 - 4.8 = \mathbf{Rs\;145.20}$ — same answer, far less arithmetic.

Tip: The choice of assumed mean $a$ does not affect the final answer. Choose it to make $d_i$ values as small as possible.
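
Both methods can be checked side by side with a short Python sketch using the daily-wage data. The function names are chosen for illustration; the third call uses a different assumed mean (130) to show that the choice of $a$ does not change the result.

```python
def grouped_mean_direct(class_marks, frequencies):
    """Direct method: sum of f*x divided by sum of f."""
    total_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    return total_fx / sum(frequencies)


def grouped_mean_assumed(class_marks, frequencies, a):
    """Assumed mean method: a plus (sum of f*d divided by sum of f), where d = x - a."""
    total_fd = sum(f * (x - a) for f, x in zip(frequencies, class_marks))
    return a + total_fd / sum(frequencies)


class_marks = [110, 130, 150, 170, 190]
frequencies = [12, 14, 8, 6, 10]

# round(..., 2) only tidies the display of floating-point results.
print(round(grouped_mean_direct(class_marks, frequencies), 2))          # 145.2
print(round(grouped_mean_assumed(class_marks, frequencies, a=150), 2))  # 145.2
print(round(grouped_mean_assumed(class_marks, frequencies, a=130), 2))  # 145.2
```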

7. Median for Ungrouped Data

The median is the value that sits in the exact middle of the data when arranged in ascending order. It divides the distribution into two equal halves — half the values lie below and half above.

Formula

Step 1: Arrange the $n$ observations in ascending order.

If $n$ is odd: $\text{Median} = \left(\dfrac{n+1}{2}\right)\text{-th observation}$

If $n$ is even: $\text{Median} = \dfrac{\left(\dfrac{n}{2}\right)\text{-th observation} + \left(\dfrac{n}{2}+1\right)\text{-th observation}}{2}$
NCERT Example 11 — Median when n is odd

The heights (in cm) of 9 students: 162, 155, 160, 148, 152, 170, 165, 158, 175.

Arrange in ascending order: 148, 152, 155, 158, 160, 162, 165, 170, 175.

$n = 9$ (odd). Median $= \left(\dfrac{9+1}{2}\right)\text{-th} = 5\text{-th value} = \mathbf{160}$ cm.

NCERT Example 12 — Median when n is even

The weights (in kg) of 10 students: 55, 60, 65, 45, 50, 70, 45, 65, 55, 70.

Sorted: 45, 45, 50, 55, 55, 60, 65, 65, 70, 70.

$n = 10$ (even). 5th value $= 55$, 6th value $= 60$.

Median $= \dfrac{55 + 60}{2} = \dfrac{115}{2} = \mathbf{57.5}$ kg.
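
The odd/even position rule can be sketched as a small Python function and checked against Examples 11 and 12. This is an illustrative sketch; note that Python lists are indexed from 0, so the $k$-th observation is `data[k - 1]`.

```python
def median(values):
    """Median of ungrouped data, following the odd/even position rule."""
    data = sorted(values)          # Step 1: arrange in ascending order
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th observation.
        return data[(n + 1) // 2 - 1]
    # Even n: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (data[n // 2 - 1] + data[n // 2]) / 2


heights = [162, 155, 160, 148, 152, 170, 165, 158, 175]
weights = [55, 60, 65, 45, 50, 70, 45, 65, 55, 70]

print(median(heights))   # 160
print(median(weights))   # 57.5
```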

Why median beats mean when outliers exist: Consider salaries of 5 employees: Rs 10,000; 12,000; 11,000; 13,000; 1,00,000.

Mean $= \dfrac{1,46,000}{5} = \text{Rs }29,200$ — this misrepresents 4 of the 5 employees.

Median (3rd value of sorted data) $= \text{Rs }12,000$ — far more representative of a "typical" salary.
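
The effect of the single extreme salary can be reproduced with Python's standard `statistics` module. The second half of the sketch, which drops the outlier, is an addition made here for comparison and is not part of the original example.

```python
from statistics import mean, median

salaries = [10_000, 12_000, 11_000, 13_000, 100_000]

print(mean(salaries))     # 29200
print(median(salaries))   # 12000

# Without the one extreme salary the two measures agree.
typical = [s for s in salaries if s < 100_000]
print(mean(typical) == median(typical))   # True
```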

8. Mode for Ungrouped Data

The mode is the observation that appears most frequently in the dataset — the value with the highest frequency.

  • Unimodal: One value appears most often (most common case).
  • Bimodal: Two values tie for most frequent.
  • Multimodal: More than two values tie for most frequent.
  • No mode: All values appear equally often.
NCERT Example 13 — Finding the mode

Marks of 15 students: 14, 25, 14, 28, 18, 17, 18, 14, 23, 22, 14, 18, 14, 13, 14.

Count: 14 appears 6 times, 18 appears 3 times, 25, 28, 17, 23, 22, 13 each once.

Mode $= \mathbf{14}.$
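
Counting frequencies by hand is where slips happen, so here is a minimal Python sketch of the same count using `collections.Counter`. The helper name `modes` is hypothetical; it returns a list so that bimodal and multimodal data are handled too. It does not treat the "no mode" case specially: if every value is equally frequent, it simply returns all of them.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


marks = [14, 25, 14, 28, 18, 17, 18, 14, 23, 22, 14, 18, 14, 13, 14]

counts = Counter(marks)
print(counts[14], counts[18])    # 6 3
print(modes(marks))              # [14]
print(modes([2, 2, 5, 5, 9]))    # [2, 5]  (bimodal)
```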

NCERT Practical context — Shoe size production

A shoe manufacturer records daily sales. Size 7 sells 60 pairs/day; size 6 sells 40; size 8 sells 35; size 5 sells 20. The modal shoe size is 7. The manufacturer should produce most pairs of size 7.

Weighting each size by its sales, the mean shoe size is $\dfrac{5(20) + 6(40) + 7(60) + 8(35)}{155} = \dfrac{1040}{155} \approx 6.7$, which is not a real shoe size and gives the manufacturer nothing to act on. Mode is the right measure for this manufacturing decision.
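
The comparison between the mean and the modal shoe size can be checked with a few lines of Python, using the sales figures from the example. The variable names are chosen here for illustration.

```python
# Daily sales from the shoe example: size -> pairs sold.
sales = {5: 20, 6: 40, 7: 60, 8: 35}

total_pairs = sum(sales.values())
mean_size = sum(size * pairs for size, pairs in sales.items()) / total_pairs
modal_size = max(sales, key=sales.get)   # the size with the largest sales figure

print(total_pairs)           # 155
print(round(mean_size, 2))   # 6.71
print(modal_size)            # 7
```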

Mode for grouped data (Class 9 introduction): The class interval with the highest frequency is the modal class. The exact formula to find the precise mode within a class is studied in Class 10.

9. Choosing Which Measure to Use

Mean, median and mode each answer a subtly different question about "what is the central value?" Choosing the wrong one gives a misleading picture.

| Situation | Best measure | Reason |
|---|---|---|
| Exam scores, temperatures, no extreme outliers | Mean | Uses every value; most mathematically precise |
| Incomes, house prices, data with extreme outliers | Median | Not pulled by very high or very low values |
| Shoe size, shirt size, dress size (manufacturing) | Mode | Tells which size is most popular to produce/stock |
| Qualitative / categorical data (blood groups, colours) | Mode | Mean and median cannot be computed for non-numeric, unordered categories |
| Open-ended distributions ("Rs 10,000 and above") | Median | The open class has no class mark, so the mean cannot be computed; the median can still be located |

Empirical relationship (for moderately skewed distributions):

$$\text{Mode} \approx 3 \times \text{Median} - 2 \times \text{Mean}$$

This is an approximate empirical formula, not a definition. It is useful in CBSE problems when one of the three measures is unknown and the other two are given.
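
As a quick worked illustration (the numbers are made up for this sketch and are not taken from NCERT), suppose a moderately skewed distribution has Mean $= 26$ and Median $= 27$. The relation then estimates the mode as:

$$\text{Mode} \approx 3 \times 27 - 2 \times 26 = 81 - 52 = 29$$

Rearranged for a missing median, the same relation reads:

$$\text{Median} \approx \dfrac{\text{Mode} + 2 \times \text{Mean}}{3} = \dfrac{29 + 52}{3} = 27$$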

Summary comparison

  • Mean: Unique; uses all values; affected by outliers; need not equal any actual observation.
  • Median: Unique; unaffected by outliers; equals an actual observation (odd $n$) or average of two (even $n$).
  • Mode: May not be unique; unaffected by extreme values; always an actual observation in the dataset.

10. Common Mistakes to Avoid

  • Forgetting to sort data before finding the median — the position formula only works on ordered data.
  • Leaving gaps between histogram bars — histograms show continuous data; bars must be adjacent with no spaces.
  • Using class limits instead of class marks in the mean formula for grouped data — always compute $x_i = \dfrac{\text{lower} + \text{upper}}{2}$ first.
  • Wrong median position for even n — for $n = 10$ the median is the average of the 5th and 6th values, NOT just the 5th.
  • Not closing the frequency polygon — always bring the line to the x-axis using zero-frequency anchor points at both ends.
  • Computing the mean for qualitative (categorical) data — never take the mean of blood groups, colours, or other non-numeric categories; use mode.
  • Confusing bar graph and histogram — bar graphs have gaps (discrete categories), histograms have no gaps (continuous intervals).

11. Quick Revision Checklist

  • Primary data = collected by investigator; Secondary data = from existing sources.
  • Class mark $= \dfrac{\text{lower} + \text{upper}}{2}$; Class width $= \text{upper} - \text{lower}$.
  • Histogram: no gaps, continuous data, bars touch each other.
  • Bar graph: gaps between bars, discrete or categorical data.
  • Frequency polygon: join class marks at heights = frequencies; close with zero-frequency anchor points.
  • Mean (ungrouped) $= \dfrac{\sum x_i}{n}$; Mean (grouped) $= \dfrac{\sum f_i x_i}{\sum f_i}$.
  • Assumed mean shortcut: $\bar{x} = a + \dfrac{\sum f_i d_i}{\sum f_i}$ where $d_i = x_i - a$.
  • Median: sort first; odd $n$: middle value; even $n$: average of two middle values.
  • Mode: most frequent value; use for categorical data or "most popular size" questions.
  • Empirical relation: Mode $\approx 3 \times \text{Median} - 2 \times \text{Mean}$.
Practice MCQs
1. The class mark of the class interval 25–35 is:
  (A) 25
  (B) 35
  (C) 30
  (D) 10
Answer: (C) Class mark $= \dfrac{25+35}{2} = 30.$
2. The mean of $6, 8, 10, 12, 14$ is:
  (A) 9
  (B) 10
  (C) 11
  (D) 12
Answer: (B) $\dfrac{6+8+10+12+14}{5} = \dfrac{50}{5} = 10.$
3. The median of $3, 7, 9, 2, 5, 8, 6$ is:
  (A) 5
  (B) 6
  (C) 7
  (D) 8
Answer: (B) Sorted: $2, 3, 5, 6, 7, 8, 9.$ With $n=7$ (odd), median $= 4$-th value $= 6.$
4. In a histogram, the bars:
  (A) have equal gaps between them
  (B) represent only categorical data
  (C) have no gaps between them
  (D) can be of different widths for the same class size
Answer: (C) Histogram bars are adjacent with no gaps because class intervals are continuous: one class ends exactly where the next begins.
5. The mode of $2, 3, 5, 3, 7, 3, 5, 2, 3, 8$ is:
  (A) 2
  (B) 5
  (C) 3
  (D) 8
Answer: (C) $3$ appears $4$ times, more than any other value.
6. Data collected by the investigator themselves, directly from the source, is called:
  (A) secondary data
  (B) raw data
  (C) primary data
  (D) arrayed data
Answer: (C) Primary data is first-hand data collected directly by the investigator for a specific purpose.
7. The mean of $n$ observations is $\bar{x}$. If each observation is multiplied by $3$, the new mean is:
  (A) $\bar{x}$
  (B) $\bar{x} + 3$
  (C) $3\bar{x}$
  (D) $\dfrac{\bar{x}}{3}$
Answer: (C) Multiplying every observation by a constant $k$ multiplies the mean by $k$. New mean $= 3\bar{x}.$
8. The median of $4, 8, 12, 16, 20, 24$ is:
  (A) 12
  (B) 14
  (C) 16
  (D) 15
Answer: (B) $n = 6$ (even). 3rd value $= 12$, 4th value $= 16.$ Median $= \dfrac{12+16}{2} = 14.$
9. In a grouped frequency distribution, the class width of the class 15–25 is:
  (A) 15
  (B) 20
  (C) 25
  (D) 10
Answer: (D) Class width $= 25 - 15 = 10.$
10. Which measure of central tendency is best suited for finding the most popular shoe size in a shop?
  (A) Mean
  (B) Median
  (C) Mode
  (D) Range
Answer: (C) Mode gives the most frequently occurring size, which is exactly what a shopkeeper needs to know for stocking decisions.
Assertion–Reason
1. A: The median is a better measure of central tendency than the mean when the data contains extreme values (outliers).
   R: The median is not affected by extreme values because it depends only on the middle position, not on the magnitude of each observation.
Answer: Both A and R are true, and R is the correct explanation of A.
2. A: The frequency polygon must always be a closed figure.
   R: To close the frequency polygon, imaginary classes with zero frequency are added before the first class and after the last class, and the polygon is brought down to the x-axis at these points.
Answer: Both A and R are true, and R is the correct explanation of A.
Previous-year questions
PYQ 1. The mean of 5 numbers is 18. If one number is excluded, the mean of the remaining 4 numbers is 16. Find the excluded number. (CBSE, 2 marks)
Solution: Sum of 5 numbers $= 5 \times 18 = 90.$ Sum of remaining 4 $= 4 \times 16 = 64.$ Excluded number $= 90 - 64 = \mathbf{26}.$
PYQ 2. Find the median of: $17, 2, 7, 27, 15, 5, 14, 8, 10, 24, 48, 10, 8, 7, 18, 28.$ (CBSE, 2 marks)
Solution: $n = 16$ (even). Sorted: $2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48.$ 8th value $= 10$, 9th value $= 14.$ Median $= \dfrac{10 + 14}{2} = \mathbf{12}.$
PYQ 3. Runs scored by 11 cricket players: $6, 15, 120, 50, 100, 80, 10, 15, 8, 10, 15.$ Find the mean, median and mode. (CBSE, 3 marks)
Solution: Mean: Sum $= 6+15+120+50+100+80+10+15+8+10+15 = 429.$ Mean $= \dfrac{429}{11} = 39.$
Sorted ($n=11$): $6, 8, 10, 10, 15, 15, 15, 50, 80, 100, 120.$ Median $= 6$-th value $= \mathbf{15}.$
Mode $= \mathbf{15}$ (appears 3 times).
Note: Mean (39) is much higher than median/mode because of outliers 100 and 120. Median better represents the typical score here.
PYQ 4. Find the mean literacy rate (%) of 35 cities using the assumed mean method: 45–55 (3 cities), 55–65 (10), 65–75 (11), 75–85 (8), 85–95 (3). (CBSE, 3–4 marks)
Solution: Class marks: 50, 60, 70, 80, 90. Let $a = 70.$ $d_i$ values: $-20, -10, 0, 10, 20.$ $\sum f_i d_i = 3(-20)+10(-10)+11(0)+8(10)+3(20) = -60-100+0+80+60 = -20.$ $\sum f_i = 35.$ $\bar{x} = 70 + \dfrac{-20}{35} = 70 - \dfrac{4}{7} \approx 70 - 0.57 \approx \mathbf{69.43\%}.$
PYQ 5. Draw a frequency polygon (without histogram) for: marks 0–10 (5 students), 10–20 (10), 20–30 (4), 30–40 (6), 40–50 (7). (CBSE, 3 marks)
Solution: Class marks: 5, 15, 25, 35, 45. Plot points $(5,5), (15,10), (25,4), (35,6), (45,7).$ Add zero-frequency anchor points at $(-5, 0)$ and $(55, 0).$ Join all 7 points in order with straight line segments to form a closed frequency polygon.