How to choose the bins of a histogram?

Gianluca Malato
3 min read · Nov 22, 2021

Histograms are a very useful tool when we want a quick look at the shape of our data. However, we always have to choose the right number of bins.

What is a histogram?

A histogram is an approximate representation of the probability distribution of a dataset. Given a bin width, the range of the variable is split into non-overlapping intervals of that width and, for each interval, we count how many values fall inside it. That count determines the height of the corresponding bar.
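To make the counting step concrete, here is a minimal sketch using NumPy's np.histogram. The data values and the bin width of 1.0 are made up for illustration and are not taken from the example later in this article.

```python
import numpy as np

# Hypothetical toy data, chosen only for illustration
values = np.array([0.2, 0.7, 1.1, 1.4, 1.9, 2.3, 2.8, 3.5])
bin_width = 1.0  # assumed bin width

# Bin edges covering the data range: [0, 1), [1, 2), [2, 3), [3, 4]
# (NumPy treats the last bin as closed on the right)
edges = np.arange(0.0, 4.0 + bin_width, bin_width)

# Count how many values fall inside each interval
counts, edges = np.histogram(values, bins=edges)

print(edges)   # the five edges that delimit the four bins
print(counts)  # one count per bin, i.e. the bar heights
```

Each entry of counts is the height of one bar, and the counts add up to the number of data points.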

Histograms are very useful because they give us a clear overview of the shape of a variable's distribution. We can easily see whether a variable is skewed, whether it's multimodal, whether it has fat tails and so on. That's why mastering histograms is essential for any data scientist or analyst.

But there's a problem: how do we choose the number of bins in a histogram?

The number of bins

Let's work through a simple example in Python by simulating 6000 random points drawn from a normal distribution.

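import numpy as np
import matplotlib.pyplot as plt
# Fix the seed so the simulated sample is reproducible
np.random.seed(0)
# Draw 6000 points from a standard normal distribution
x = np.random.normal(size=6000)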

If we plot the histogram using the plt.hist function, the default number of bins is 10.
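As a quick check of that default, the sketch below draws the histogram without passing a bins argument and inspects what comes back. It assumes matplotlib's default settings are in effect (that is, rcParams["hist.bins"] has not been changed); the axis labels are my own additions.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same simulated sample as above
np.random.seed(0)
x = np.random.normal(size=6000)

fig, ax = plt.subplots()

# No bins argument: matplotlib falls back to its default number of bins
counts, edges, patches = ax.hist(x)
ax.set_xlabel("x")
ax.set_ylabel("count")

# Number of bars and number of bin edges (always one more than the bars)
print(len(counts), len(edges))

# Call plt.show() here to display the figure in an interactive session
```

The returned counts array holds one value per bar, so its length tells us how many bins were actually used.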
