Kaggle Learning Log 1
Foreword
As a data science student, I started using Kaggle to practice machine learning and data analysis.
Kaggle is useful because it provides real datasets, public notebooks, competitions, and a community where beginners can learn from other people’s workflows.
Day 1
The first day was mainly about reviewing Python and Pandas.
Although I had learned some Python before, using it for data analysis is different from just learning the syntax. In a data science workflow, Python is used to:
- load data,
- inspect columns,
- clean missing values,
- create features,
- train models,
- evaluate results.
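The six steps above can be sketched end to end on a tiny table. This is only a minimal illustration under stated assumptions: the CSV text, the column names (`area`, `rooms`, `price`), and the `area_per_room` feature are all made up for the example, and the model is a plain least-squares fit with NumPy rather than any particular Kaggle approach.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a real Kaggle CSV file.
CSV_TEXT = """area,rooms,price
50,2,150
80,3,230
,3,200
120,4,340
65,2,190
100,,290
"""

# 1. Load data (io.StringIO plays the role of a file path here).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Inspect columns and count missing values per column.
print(df.dtypes)
print(df.isna().sum())

# 3. Clean missing values: fill each gap with that column's median.
df = df.fillna(df.median(numeric_only=True))

# 4. Create a feature: area per room.
df["area_per_room"] = df["area"] / df["rooms"]

# 5. Train a model: ordinary least squares with an intercept column.
features = ["area", "rooms", "area_per_room"]
X = np.column_stack([np.ones(len(df)), df[features].to_numpy()])
y = df["price"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# 6. Evaluate results: mean absolute error on the training rows.
mae = np.abs(X @ coef - y).mean()
print(f"training MAE: {mae:.2f}")
```

Scoring on the training rows is only a sanity check. On a real dataset the error would normally be measured on held-out data.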
loc and iloc
One useful difference I reviewed is loc vs iloc.
loc selects data by label:
    reviews.loc[0]
iloc selects data by integer position:
    reviews.iloc[0]
Simple memory:
- loc means label-based selection.
- iloc means integer-position-based selection.
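With the default index 0, 1, 2, ... the two accessors return the same row, which hides the difference. The sketch below uses a small, hypothetical stand-in for `reviews` (the column names and values are assumptions made for illustration) whose index labels are not positions, so the two lookups can be told apart.

```python
import pandas as pd

# Hypothetical stand-in for the `reviews` DataFrame; the index labels
# are deliberately not 0, 1, 2 so that labels and positions differ.
reviews = pd.DataFrame(
    {"country": ["Italy", "Portugal", "US"], "points": [87, 87, 90]},
    index=[10, 20, 30],
)

# loc looks up the row whose index label is 10.
by_label = reviews.loc[10]

# iloc looks up the row at integer position 0 (the first row).
by_position = reviews.iloc[0]

# Both point at the same row here, but for different reasons.
print(by_label["country"], by_position["country"])

# Slices differ too: loc includes the end label, iloc excludes the end position.
print(len(reviews.loc[10:20]))  # rows labelled 10 and 20
print(len(reviews.iloc[0:1]))   # only the row at position 0

# reviews.loc[0] would raise KeyError here, because no row is labelled 0.
```

The slicing behaviour is the part that is easy to forget: a `loc` slice is inclusive of its end label, while an `iloc` slice follows the usual Python convention and stops before the end position.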
Linear algebra review
I also reviewed some basic linear algebra.
For two vectors:
    v1 = (x1, y1)
    v2 = (x2, y2)
their dot product is:
    v1 · v2 = x1*x2 + y1*y2
This operation appears everywhere in machine learning. For example, linear regression, logistic regression, neural network layers, and attention mechanisms all involve dot products in some form.
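As a small check of the formula, here is a sketch in NumPy. The vector values, the weights, and the bias are all made-up numbers chosen for illustration. The second half shows one way the dot product appears in a model: a linear regression prediction for a single row is the dot product of the weights and the features, plus an intercept.

```python
import numpy as np

# Two example 2-D vectors (values chosen only for illustration).
v1 = np.array([1.0, 2.0])
v2 = np.array([3.0, 4.0])

# Written out term by term: x1*x2 + y1*y2.
manual = v1[0] * v2[0] + v1[1] * v2[1]

# The same quantity via NumPy.
dot = np.dot(v1, v2)
print(manual, dot)  # 11.0 11.0

# A linear regression prediction is a dot product plus a bias.
weights = np.array([0.5, -1.0])  # hypothetical learned weights
bias = 2.0                       # hypothetical learned intercept
x = np.array([4.0, 1.0])         # one input row with two features
prediction = np.dot(weights, x) + bias
print(prediction)  # 3.0
```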
Reflection
Kaggle is not just a competition platform. For me, it is a practice field.
The goal at this stage is not to win competitions immediately. The goal is to learn a complete workflow:
- Understand the task.
- Load and clean the data.
- Build a baseline model.
- Evaluate the model.
- Improve the model step by step.
This is also the mindset I want to build for future data mining, NLP, and recommendation projects.