Let’s see how to increase the stability of a model using the bagging technique

Data scientists usually search for the model with the highest possible accuracy. However, they should pay attention to another property too: stability. In this article, I explain what it is and how to increase it using a technique called “bagging”.
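As a rough illustration of the idea, here is a minimal sketch, not the code from the article. It assumes that "stability" means low variance of the prediction when the training set changes, and it uses a deliberately unstable base model (a 1-nearest-neighbor regressor) on synthetic data. The function names, the data-generating process and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def one_nn_predict(x_train, y_train, x0):
    """Predict at x0 with the target of the nearest training point."""
    return y_train[np.argmin(np.abs(x_train - x0))]


def bagged_predict(x_train, y_train, x0, n_estimators, rng):
    """Average 1-NN predictions over bootstrap resamples of the training set."""
    n = len(x_train)
    preds = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        preds.append(one_nn_predict(x_train[idx], y_train[idx], x0))
    return float(np.mean(preds))


def compare_stability(n_datasets=200, n_points=50, n_estimators=50, seed=0):
    """Variance of the prediction at a fixed point across many training sets."""
    rng = np.random.default_rng(seed)
    x0 = 0.5  # fixed query point (arbitrary choice)
    single, bagged = [], []
    for _ in range(n_datasets):
        # Synthetic data: a sine wave plus Gaussian noise (assumption).
        x = rng.uniform(0, 1, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.5, n_points)
        single.append(one_nn_predict(x, y, x0))
        bagged.append(bagged_predict(x, y, x0, n_estimators, rng))
    return np.var(single), np.var(bagged)


single_var, bagged_var = compare_stability()
print(f"single model variance: {single_var:.3f}")
print(f"bagged model variance: {bagged_var:.3f}")
```

In this setup the bagged prediction should vary less from one training set to another than the single model does, which is the sense in which bagging makes a model more stable.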

Bias-variance tradeoff

In machine learning, the prediction error can be…

Let’s see how to use Random Forest for feature selection

Feature selection has always been a great problem in machine learning. In my experience, it’s the most important part of a data science project, because it helps us reduce the dimensions of a dataset and remove useless variables. Fortunately, there are some models that help…

Let’s see some differences between these two disciplines

Data Science and machine learning are two wonderful and exciting disciplines and are a great part of our lives. Sometimes people confuse them, but they are quite different things.

What data science is

Data Science is, as the name suggests, the science of data. It’s a set of techniques and tools that make the…

Myths and reality about passive income

There are several ways to think about passive income. Some people say it doesn’t exist at all, for others it’s a goal, and for some it’s a tool to reach other goals. Here comes my personal definition of passive income.

It’s all about the work-money ratio

Passive income is often referred to as that…

Office Hours

Myths about what managers ask of data scientists and what they should ask instead

Data Science has entered the world of big companies, where the data is. Managers of such companies often ask for things they don’t actually need and forget to demand the only things that are actually useful.

“I want an algorithm per month”. Yes, I once heard somebody saying something like that and…

So what follows is an incredibly inspirational story of how I became a millionaire before the age of 30. Perhaps it’ll motivate you to be like me, and make a lot of money, as well. …

During this horrible period of the pandemic, every creative person faces the great problem of continuing to produce creative content while people’s minds are filled with non-creative things. Masks, vaccines and social distancing are all words that fill our heads, leaving little room for our creativity. …

A simple Python library for dealing with collinear variables

Collinearity is a very common problem in machine learning projects. It is the correlation between the features of a dataset, and it can reduce the performance of our models because it increases variance and the number of dimensions. It becomes worse when you have to work with unsupervised models.
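To make the definition concrete, here is a minimal sketch that flags pairs of strongly correlated features using plain NumPy. It is not the library this article refers to. The helper name `correlated_pairs`, the 0.9 threshold and the synthetic data are assumptions made for illustration, and pairwise correlation is only a simple proxy for collinearity.

```python
import numpy as np


def correlated_pairs(X, threshold=0.9):
    """Return (i, j, r) for every pair of columns with |Pearson r| > threshold."""
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    n = corr.shape[0]
    return [
        (i, j, float(corr[i, j]))
        for i in range(n)
        for j in range(i + 1, n)
        if abs(corr[i, j]) > threshold
    ]


# Synthetic example: x2 is almost a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

pairs = correlated_pairs(X)
for i, j, r in pairs:
    print(f"features {i} and {j} are correlated (r = {r:.3f})")
```

Only the pair made of the first two columns should be reported here. In a real dataset, one feature from each flagged pair is a candidate for removal.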


Data scientists usually need to check the statistics of their datasets, particularly against known distributions or in comparison with other datasets. There are several hypothesis tests we can run for this goal, but I often prefer using a simple, graphical representation. I’m talking about the Q-Q plot.

What is a Q-Q plot?
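The comparison against a known distribution can be sketched in a few lines. This example computes the points of a normal Q-Q plot with NumPy and the standard library, under the assumption that the reference distribution is a standard normal. The helper name `qq_points`, the plotting positions `(i - 0.5) / n` and the synthetic sample are choices made for this illustration, not taken from the article.

```python
import numpy as np
from statistics import NormalDist


def qq_points(sample):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    observed = np.sort(np.asarray(sample, dtype=float))
    n = len(observed)
    # Plotting positions strictly inside (0, 1), one per sorted observation.
    probs = (np.arange(1, n + 1) - 0.5) / n
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = np.array([std_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


# Synthetic sample drawn from a normal with mean 2 and standard deviation 3.
rng = np.random.default_rng(0)
theoretical, observed = qq_points(rng.normal(loc=2.0, scale=3.0, size=1000))

# For normally distributed data the points lie close to a straight line
# whose slope and intercept approximate the standard deviation and the mean.
slope, intercept = np.polyfit(theoretical, observed, 1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

Passing `theoretical` and `observed` to any scatter plot function gives the graphical representation. Points that bend away from a straight line suggest the sample does not follow the reference distribution.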

Q-Q plot is often…

Statistical analysis of the correlation between the vaccination campaign in Italy and Covid-19 infections

Vaccines are supposed to lower the impact of Covid-19 infection. Is that actually true? In this article, I show some results that point towards a correlation between vaccination and the decrease in the number of infections.

The analytical framework

The kind of analysis I want to do is meant to show a correlation…

Gianluca Malato

Theoretical physicist, data scientist and fiction author. I teach Data Science, statistics and SQL on YourDataTeacher.com
