A short guide on how to create comparable training and test datasets

Data scientists usually split a dataset into training and test sets. The model is trained on the former, and its performance is then evaluated on the latter. But if these sets are sampled poorly, the measured performance may be skewed by sampling bias.

Should training and test sets be similar? From the beginning of my career, everybody used to…