Describing Distributions of Scores

 

       After running an experiment, we are typically left with a large number of scores.  Knowing each of those individual scores, however, is not normally very informative.  What we’re really interested in is the characteristics of the distribution of scores.

 

 

       Later, we’ll learn how to characterize the distribution with a few calculated values, but it’s often informative to examine the entire distribution of scores as a first step.

 


So how do we determine the distribution of the data?  Consider the following simple example:

 

       A researcher asks 20 students how many hours of sleep they had the night before, and receives the following 20 responses:

 

       5            7            7            8            6    

       9            6            8            7            7

       8            6            9            10          5

       7            8            6            9            7

 

       We can characterize these data by constructing a frequency distribution.

 

·      Frequency distribution: shows the number of times each score occurs in a set of data.

 


We determine the frequency distribution by:

 

1.  Determining the unique scores present in the data (e.g., 5, 6, 7, 8, 9, 10 in our example)

2.  Counting how often each score appears in the data (its frequency, f)

a.   e.g.

       Score     Frequency (f)
       10        1
       9         3
       8         4
       7         6
       6         4
       5         2

 

       Looking at this table, we can easily see that most students (14 of 20) slept between 6 and 8 hours, while only a few slept less than 6 hours (2 students) or more than 8 hours (4 students).
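
       The two steps above can be sketched in a few lines of Python.  This is only an illustration, not part of the original procedure; the function name frequency_distribution is made up for this example, and it assumes the scores are stored in a plain list.

```python
from collections import Counter

# The 20 responses (hours of sleep) from the example above.
hours = [5, 7, 7, 8, 6,
         9, 6, 8, 7, 7,
         8, 6, 9, 10, 5,
         7, 8, 6, 9, 7]


def frequency_distribution(scores):
    """Return (score, frequency) pairs, highest score first."""
    counts = Counter(scores)  # counts how often each unique score appears
    return [(score, counts[score]) for score in sorted(counts, reverse=True)]


table = frequency_distribution(hours)

print("Score  Frequency (f)")
for score, f in table:
    print(f"{score:>5}  {f}")
```

       The frequencies should always sum to the number of scores (20 here), which is a quick check that no score was missed.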


       It’s even easier to visualize the data by displaying the frequency distribution as a graph.  In fact, graphs are a useful way of presenting data in general.

·      Graphs are useful for making complex data sets more understandable.

·      Graphs make it easier to visualize relationships between variables.

·      Graphs can be useful for identifying potential relationships and guiding further analyses.
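
       As a rough illustration of turning a frequency distribution into a picture, the sketch below prints one row of asterisks per score for the hours-of-sleep data.  It is a crude, text-only stand-in for a real graph, and the function name text_histogram is hypothetical.

```python
from collections import Counter

# The 20 responses (hours of sleep) from the example above.
hours = [5, 7, 7, 8, 6,
         9, 6, 8, 7, 7,
         8, 6, 9, 10, 5,
         7, 8, 6, 9, 7]


def text_histogram(scores, symbol="*"):
    """Return one text line per score value, with one symbol per occurrence."""
    counts = Counter(scores)
    lines = []
    # Walk every value from the smallest to the largest score so that
    # values with zero frequency would still get an (empty) row.
    for value in range(min(counts), max(counts) + 1):
        lines.append(f"{value:>2} | {symbol * counts[value]}")
    return lines


chart = text_histogram(hours)
print("\n".join(chart))
```

       The longest row (7 hours) marks the most common response, and the rows shorten toward both ends.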


Types of Graphs

 

1.        Bar graph: used primarily with ordinal or nominal data.

 

2.        Histogram: used primarily with interval or ratio data.

 

3.        Polygon: also used for interval or ratio data; similar to histogram, but with points, rather than bars, plotted.

 

Listing the frequency of each unique score works fine when there are only a few unique scores, as in our example of number of hours slept.  When there are many unique scores, though, the table of individual scores stays long and hard to read, even after constructing the frequency distribution.

      

       In that case, we group similar scores together, and construct a frequency distribution on the grouped scores.

 

      


To construct a frequency distribution for the grouped scores:

 

1. Determine the range of scores

·      Range = largest score - smallest score

·      In the example used here (maze completion times, in seconds, for a group of 120 rats): Range = 155 - 80 = 75

 

2. Determine the number of intervals you need (typically 10-20 intervals for 100 or more scores).

·      We’ll use 10 intervals.

 

3. Determine the required interval width (i).

       i = Range / number of intervals = 75 / 10 = 7.5

4.   Round i to the same precision as the raw scores in the data set (7.5 rounds to 8 in this example).

 


5.  Construct intervals of width i, starting with a lower bound that is no greater than the smallest score and is a multiple of i (in this example, the lower bound is 80, which gives the intervals 80-87, 88-95, and so on).

 

6.  Tally the scores into the appropriate interval.

 

7.   Sum the tallies to determine the frequency for each interval.

 

 

       Interval     Frequency
       152-159      3
       144-151      9
       136-143      13
       128-135      15
       120-127      20
       112-119      21
       104-111      17
       96-103       13
       88-95        7
       80-87        2

 

Again, it’s much easier to visualize the data using the frequency distribution than it was given just the raw data.  And, again, it’s even easier to visualize using a graph.
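
       The interval-building steps can be sketched in Python as follows.  This is a minimal sketch under some assumptions: scores are whole numbers, the width is rounded half up, and the function names build_intervals and tally are invented for the illustration.  The raw maze times are not reproduced in these notes, so the tally is demonstrated on a handful of made-up scores.

```python
import math


def build_intervals(smallest, largest, n_intervals):
    """Steps 1-5: return (low, high) intervals covering smallest..largest."""
    score_range = largest - smallest                      # step 1
    # Steps 3-4: width = range / number of intervals, rounded half up
    # to a whole number (the precision of the raw scores).
    width = math.floor(score_range / n_intervals + 0.5)
    # Step 5: start at a multiple of the width that is not above the smallest score.
    lower = (smallest // width) * width
    intervals = []
    while lower <= largest:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


def tally(scores, intervals):
    """Steps 6-7: count how many scores fall in each interval."""
    counts = {interval: 0 for interval in intervals}
    for score in scores:
        for low, high in intervals:
            if low <= score <= high:
                counts[(low, high)] += 1
                break
    return counts


intervals = build_intervals(smallest=80, largest=155, n_intervals=10)
print(intervals[0], intervals[-1])

# Hypothetical scores, made up for illustration only (not the original data).
demo_scores = [80, 87, 88, 110, 155]
demo_counts = tally(demo_scores, intervals)
```

       With a smallest score of 80, a largest score of 155, and 10 requested intervals, this yields the same ten intervals as the table, from 80-87 up to 152-159.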


Graphing the data can help identify trends for further analysis.  For example, suppose the researcher had another group of 120 rats, who had been raised in an enriched environment, and she wanted to find out how they compare to the first group of rats.  Call the first group of rats “Group A” and the second group “Group B.”

       After running Group B through the maze, the researcher has 120 more scores.  How do the two groups compare?  Does Group B tend to be faster than Group A? 

       It’s hard to spot that sort of trend based just on the raw scores, but comparing graphs makes the trend more obvious.

 


       Sometimes it’s useful to know what percentage of scores lie above and below a particular score.  For example, after you’ve taken the SAT, you want to know not only your raw score, but also what percentage of people scored higher or lower than you did.

       To do this, we construct a cumulative frequency distribution or a cumulative percentage distribution.

 

       For each interval, the cumulative frequency is the sum of the frequency in that interval and the frequencies in all intervals below it.  The cumulative percentage is the cumulative frequency divided by the total number of scores, multiplied by 100.


 

 

       Interval     Frequency     Cumulative Frequency     Cumulative Percentage
       152-159      3             120                      100.0
       144-151      9             117                      97.5
       136-143      13            108                      90.0
       128-135      15            95                       79.2
       120-127      20            80                       66.7
       112-119      21            60                       50.0
       104-111      17            39                       32.5
       96-103       13            22                       18.3
       88-95        7             9                        7.5
       80-87        2             2                        1.7
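
       The cumulative columns can be computed directly from the interval frequencies.  The sketch below uses the frequencies from the grouped example, ordered from the lowest interval to the highest; the function name cumulative_distribution is made up for this illustration.

```python
from itertools import accumulate

# Intervals and frequencies from the grouped example, lowest interval first.
intervals = ["80-87", "88-95", "96-103", "104-111", "112-119",
             "120-127", "128-135", "136-143", "144-151", "152-159"]
frequencies = [2, 7, 13, 17, 21, 20, 15, 13, 9, 3]


def cumulative_distribution(freqs):
    """Return (cumulative frequencies, cumulative percentages), lowest interval first."""
    cum_f = list(accumulate(freqs))   # running total of the frequencies
    n = cum_f[-1]                     # total number of scores
    cum_pct = [round(100 * c / n, 1) for c in cum_f]
    return cum_f, cum_pct


cum_f, cum_pct = cumulative_distribution(frequencies)

for label, f, cf, cp in zip(intervals, frequencies, cum_f, cum_pct):
    print(f"{label:>8}  {f:>3}  {cf:>4}  {cp:>6}")
```

       Note that the printout runs from the lowest interval up, the reverse of the order used in the table.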

 


Once we’ve constructed a cumulative frequency distribution, we can calculate percentiles or percentile ranks.

 

·      Percentile: a value below which a particular percentage of scores fall.

 

For example, we can calculate the value below which 75% of the maze completion times for our first group of rats fall (P75).

       To do this:

 

1.        Calculate how many scores will fall below the percentile (cum fp)

 

cum fp = (% of scores below) x # of scores (N)

 

              cum fp = (0.75) x 120 = 90

 

2.        Determine the lower real limit (XL) of the interval containing the percentile.

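a.   A cumulative frequency of 90 is first reached in the 128-135 interval (80 scores fall below that interval, and 95 fall at or below its top), so XL = 127.5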

 

3.        Determine how many additional scores are required within the interval in order to reach the percentile.

a.   What’s the cumulative frequency below XL?   80

b.  What cumulative frequency do we need (cum fp)?  90

c.   Therefore, we need 10 more scores.

 

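4.        Determine the number of units within the interval we need in order to get those extra scores.

       additional units = (scores needed / frequency in the interval) x interval width

       additional units = (10 / 15) x 8 = 5.3
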

5.        Determine the score (the percentile point) that corresponds to the correct percentile.

 

Percentile point = XL + additional units

P75 = 127.5 + 5.3  =  132.8

 

75% of the obtained scores are less than 132.8.
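
       The percentile steps can be sketched in Python using linear interpolation within the interval.  This is a minimal sketch: the function name grouped_percentile is hypothetical, the lower real limits are assumed to sit half a unit below each interval (79.5, 87.5, and so on), and the interval width is assumed to be 8 throughout.

```python
def grouped_percentile(p, lower_limits, frequencies, width):
    """Score below which proportion p of the scores fall.

    lower_limits: lower real limit of each interval, lowest interval first.
    frequencies:  frequency of each interval, in the same order.
    """
    n = sum(frequencies)
    target = p * n                     # step 1: cum fp
    below = 0                          # cumulative frequency below the current interval
    for x_l, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= target:
            needed = target - below    # step 3: additional scores required
            # Steps 4-5: convert the needed scores into score units and add to XL.
            return x_l + (needed / f) * width
        below += f
    raise ValueError("p must be between 0 and 1")


# Frequencies from the grouped example, lowest interval (80-87) first.
frequencies = [2, 7, 13, 17, 21, 20, 15, 13, 9, 3]
lower_limits = [79.5 + 8 * k for k in range(10)]   # 79.5, 87.5, ..., 151.5

p75 = grouped_percentile(0.75, lower_limits, frequencies, width=8)
print(round(p75, 1))
```

       The interpolation treats the scores as spread evenly across each interval, which is an approximation rather than a property of the actual data.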


       Suppose what you’re really interested in is how a particular rat (your favorite rat, I suppose) did on the task, relative to other rats in Group A.  Did your rat do better than 50% of the others?  80%?

       To answer that question, you need to compute a percentile rank: the percentage of scores falling below a particular score.

 

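       Suppose your rat escaped the maze in 110 seconds.  That score falls in the 104-111 interval, which has a lower real limit (XL) of 103.5 and a frequency (f) of 17, with 22 scores below it.

       percentile rank = [cum f below XL + ((X - XL) / interval width) x f] / N x 100

       percentile rank = [22 + ((110 - 103.5) / 8) x 17] / 120 x 100

percentile rank = 29.8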

 

       So, 29.8% of the rats had lower completion times than your rat.  When completion time is the dependent variable, however, lower is better, so your rat outperformed 70.2% of his cohort!
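
       A percentile rank can be sketched in Python by running the percentile interpolation in reverse: count the scores below the interval, then add the share of the interval that lies below the score.  The function name grouped_percentile_rank is hypothetical, and the lower real limits (79.5, 87.5, and so on) and the interval width of 8 are taken from the grouped example.

```python
def grouped_percentile_rank(score, lower_limits, frequencies, width):
    """Percentage of scores falling below `score`, by linear interpolation."""
    n = sum(frequencies)
    below = 0                                   # cumulative frequency below the interval
    for x_l, f in zip(lower_limits, frequencies):
        if x_l <= score < x_l + width:
            # Fraction of the interval lying below the score, times its frequency.
            within = (score - x_l) / width * f
            return 100 * (below + within) / n
        below += f
    raise ValueError("score lies outside the intervals")


# Frequencies from the grouped example, lowest interval (80-87) first.
frequencies = [2, 7, 13, 17, 21, 20, 15, 13, 9, 3]
lower_limits = [79.5 + 8 * k for k in range(10)]   # 79.5, 87.5, ..., 151.5

rank = grouped_percentile_rank(110, lower_limits, frequencies, width=8)
print(round(rank, 1), round(100 - rank, 1))
```

       The second printed value is the percentage of scores above 110 seconds, which is the relevant figure when lower times are better.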


       One problem with using frequency distributions on grouped scores is that you lose some information about the individual scores within each interval.

       One method for overcoming this in some cases is to represent the data using a stem and leaf diagram rather than a histogram or polygon.

       In a stem and leaf diagram, each score is represented by two components: a stem (usually the first digit) and a leaf (usually the remaining digits).

 


       Consider the following 30 scores on a hypothetical memory test:

 

       85   90   64   73   94   82   67   78   89   98

       76   63   84   92   76   85   88   93   69   72

       84   66   78   82   94   75   86   95   63   78

 

       An example stem and leaf diagram for these data would be:

 

6        |  3 3 4 6 7 9

7        |  2 3 5 6 6 8 8 8

8        |  2 2 4 4 5 5 6 8 9

9        |  0 2 3 4 4 5 8

 


Note that you can stretch this out a little to make it more informative by repeating stems, with leaves 0-4 on the first line for each stem and leaves 5-9 on the second:

 

6 |  3 3 4

6 |  6 7 9

7 |  2 3

7 |  5 6 6 8 8 8

8 |  2 2 4 4

8 |  5 5 6 8 9

9 |  0 2 3 4 4

9 |  5 8

 

       Stem and leaf diagrams represent a nice compromise between the simplicity of graphical representation and the usefulness of retaining individual data values.  Their usefulness is limited, though, to cases in which there are relatively few scores (< 100).
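
       A stem and leaf diagram is simple to build programmatically.  The sketch below assumes two-digit whole-number scores, uses the tens digit as the stem and the ones digit as the leaf, and, when split=True, puts leaves 0-4 on the first line for a stem and leaves 5-9 on the second (one common convention, assumed here).  The function name stem_and_leaf is made up for this example.

```python
from collections import defaultdict

# The 30 memory test scores from the example above.
scores = [85, 90, 64, 73, 94, 82, 67, 78, 89, 98,
          76, 63, 84, 92, 76, 85, 88, 93, 69, 72,
          84, 66, 78, 82, 94, 75, 86, 95, 63, 78]


def stem_and_leaf(values, split=False):
    """Return the display lines of a stem and leaf diagram.

    With split=True each stem is repeated: leaves 0-4 first, then 5-9.
    A half with no leaves is simply omitted.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)              # e.g. 85 -> stem 8, leaf 5
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    return [f"{stem} | {' '.join(map(str, leaves))}"
            for (stem, half), leaves in sorted(rows.items())]


plain = stem_and_leaf(scores)
stretched = stem_and_leaf(scores, split=True)
print("\n".join(plain))
print()
print("\n".join(stretched))
```

       Because every leaf is kept, the original scores can be read back from the diagram, which is exactly what a grouped frequency distribution gives up.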

 

 

       Quantitative SAT Score     Final Exam Score
       595                        68
       520                        55
       715                        65
       405                        42
       680                        64
       490                        45
       565                        56
       580                        59
       615                        56
       435                        42
       440                        38
       515                        50
       380                        37
       510                        42
       565                        53
       520                        46
       495                        43
       600                        56
       580                        53
       525                        50
       485                        45
       560                        52
       620                        58
       680                        64
       570                        56