Digital Maestro Publications

View Original

Histogram charts with Google Sheets

Introduction

Histograms are used in statistics. They are used to show the frequency of a set of continuous data. Numerical data can be discrete or continuous. Discrete data is counted. Continuous data is measured. Examples of discrete data include the number of students in a class or the number of faces on a die. Continuous data can take any value. Examples of continuous data include a person’s height or weight.

The data used in the lesson is from a paper airplane contest. I love paper airplane contents; they involve all sorts of learning opportunities. The contest has one variable—a basic airplane design. The goal is to fly the farthest distance from a starting line.

The distances in this scenario are rounded to the nearest foot. Any plane landing 6-inches or above the nearest foot is rounded up. For example, a plane landing at 10-feet 7-inches is rounded to 11 feet. A plane landing 10-feet 5-inches is rounded to 10 feet.

Resources

Use the links below to get a copy of the working document and previews of the final product.

Airplane contest histogram preview

Airplane contest working document

Pre-requisite

To understand the concepts in the rest of this lesson, we need to understand how to prepare the data for a histogram. The sheet in the working document has a list of students and their attempts. The measurements are not in any order.

The data has a range. The range is the difference between the smallest and largest measurement. The smallest distance is 10-feet; the largest is 30-feet. We subtract these distances to get a range of 20-feet. This information is used to determine the range for each bar in the histogram. The range is the number of measurements held in each bin.

The bars in a histogram are called bins or buckets. The terms are descriptive of what they do. They are bins or buckets that hold content.

We are creating a histogram with five buckets. Five buckets tend to be the typical histogram format. It works well for our information because there isn't much of it.

Take the range and divide it by the number of buckets we want—(30-10)/5. This results in a range width for each bucket of 5. We want five buckets and it so happens that the range of each bucket is also 5.

This is how we construct the bucket ranges and frequencies. We begin with the smallest value; that value is 10. *We don’t want our buckets to begin with the value they are recording; you will see why this is important later.* To make sure this doesn’t happen we subtract .5 from the smallest value and add .5 to the largest value.

Bucket table

We make a table with a range of information. Look at the image below. The smallest value is 10 and we subtract .5 to get 9.5 for the starting value. We calculated each bucket range to be 5; we add 5 to 9.5 and get 14.5. The next range begins at 14.5 and we add 5. The next range is from 14.5 to 19.5.

We repeat this process until we have five buckets.

The information that goes into these buckets is the frequencies or the number of times a value falls within each bucket. In other words; how many times does a distance fall within the values of one of these ranges.

The table below shows that 7 distances fall between 9.5 and 14.5. Five distances fall between 14.5 and 19.5. We continue counting until all the distances are accounted for in each bin.

The histogram we will create needs to represent the information shown in the table.

Google Sheet histogram

We need the raw values for each distance. Histograms are created with one column of data.

Select the values; you can include the names. The names won’t part of the histogram chart.

Click the insert chart button.

Google thinks I want to create a line chart.

Click the chart selector; scroll to the bottom of the options; select the histogram chart.

The histogram looks very good. It creates a good basic chart. The titles come from the titles above the chart data. We will adjust these later.

The Chart editor panel opens on the right. The panel has several sections. Click the Histogram section. The histogram options are basic. The number of buckets is selected for us. The chart shows the data is divided into five buckets or bins. Place a checkmark on the Show item dividers. This helps count the items in each bin. This will help us throughout the lesson.

The bucket selector has options for 1, 2, 5, 10, 25, and 50. These are the common bucket options.

Open the Horizontal axis section. The minimum and maximum values are set from the data. The range of values falls onto the data points. This is not standard. This results in errors like the one in this chart. Let's take a look.

We have buckets with values from 10 to 15, 15 to 20, 20 to 25, 25 to 30, and 30 to 35. The bucket for the range of 30 to 35 shows two values. These are the values from the two 30-foot measurements. Our data doesn't have any values above 30. This is misleading information. The issue with ranges will pop up again later.

Let’s update the range.

Update the minimum and maximum values. Use 10 for the minimum value and 30 for the max.

The chart updates the bucket values. We still have the five buckets. The distribution of data has changed. The counts for each bucket are 7, 4,6,4, and 7. These don't match the ones from the table we created in the Sheet.

The bucket ranges are still misleading. Let's take a look at the count for the bucket that ranges from 22 to 26. The data shows the count of 4 is, that's correct, however, look a the count for the fifth bucket. It shows a count of 7, but 9 values technically meet this criterion.

This is the problem; we don’t know where to place the count for the 26-foot measurement. Does it fall under the fourth or fifth bin? This is why bin ranges should not fall onto values.

It is customary to set the minimum value to a percentage of the lowest and highest number; this is typically .5.

Use 9.5 for the min value and 30.5 for the max.

The ranges in the chart change and so do the counts in the bins. The counts are 7, 4, 8, 2, and 9. The lower and upper ranges are set to the values we placed. The range for each bin is adjusted for us. The counts are set to 4.2; not the 5 we calculated. This causes problems with the counts again.

The count in each bin isn’t exactly correct here. The measurements are in whole numbers. A bin of 4.2 forces us to split the difference when counting the distances and placing them into each bin.

Go to the histogram section; use 5 for the bucket size.

The bucket ranges update. The bucket counts update: 7, 5, 9, 7, and 2.

Use the chart title section to change the main title. Set the title to Paper Airplane Flight Contest.

Select the Vertical axis title option. Set the title to Distance in feet. We don’t need to change the horizontal axis title.

This is an excellent example for students. It demonstrates that they need to understand the manual process. The histogram would have been created with errors if they did not understand what needed to be done and what it needed to look like ahead of time.