1 - With Big Data you should use parallel computing frameworks such as Spark. | by Gianluca Malato

Very useful post and code, thank you. I have a couple of questions if I may:
Pier Paolo Olimpieri
Gianluca Malato
Sep 30, 2020
1 - With Big Data you should use a parallel computing framework such as Spark. The code will differ from what I've written in this article, depending on the framework you use.
2 - Yes, a well-performed stratified sampling will preserve the interactions, as long as the destination dataset is large enough for the statistics to converge properly.
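
To make the second point concrete, here is a minimal sketch of stratified sampling in plain Python. It is not the code from the article: the function name `stratified_sample`, the toy rows, and the 10% fraction are all assumptions chosen for illustration. On Spark, the rough counterpart would be `DataFrame.sampleBy`, which takes a per-stratum fraction.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw the same fraction of rows from every stratum.

    A stratum is the group of rows that share the same key(row) value.
    This helper is hypothetical and only illustrates the idea.
    """
    rng = random.Random(seed)

    # Group the rows by stratum.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Keep at least one row per stratum so that no group disappears.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical toy data: 900 rows of class "a" and 100 rows of class "b".
rows = [{"label": "a", "x": i} for i in range(900)]
rows += [{"label": "b", "x": i} for i in range(100)]

sample = stratified_sample(rows, key=lambda r: r["label"], fraction=0.1)
counts = Counter(r["label"] for r in sample)
print(dict(counts))
```

The sample keeps the 9:1 class ratio of the source data. Whether finer structure such as feature interactions survives still depends on each stratum being large enough, which is the caveat made above.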