For those unfamiliar with Kaggle, it’s the best and worst place to do data science on the internet. It’s the best because you’re in a very controlled environment: you’re given the exact training and test data, and it’s a pure arms race to see who can minimize RMSE, MAPE, or some other metric on predictions for the hidden test responses. It’s the worst because that purity can be overly simplistic, and at some point the tricks behind each slight improvement are so far from real-world practice that it can feel a little silly.

That being said, I had fun competing in the Backpack Price Prediction competition series. I’ll go over the three things I tried in this competition and what I learned.

  • Gradient Boosted Trees
  • LightGBM
  • KNN Approach
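As a quick preview of the third item: a k-nearest-neighbors regressor predicts a target as the average of the k most similar training rows. Here’s a minimal pure-Python sketch on toy data (the feature and prices are made up for illustration, not the actual competition data):

```python
def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points
    under squared Euclidean distance. Toy sketch, not a tuned pipeline."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_X, train_y)
    )
    neighbors = [y for _, y in dists[:k]]
    return sum(neighbors) / len(neighbors)

# Toy example: one hypothetical feature (say, capacity) vs. price.
X = [(10,), (12,), (30,), (32,), (50,)]
y = [20.0, 22.0, 40.0, 42.0, 60.0]
print(knn_predict(X, y, (11,), k=2))  # averages the two closest prices: 21.0
```

In practice you’d scale the features first, since raw Euclidean distance lets large-valued columns dominate the neighbor selection.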