Kaggle Learning Log 1

Foreword

As a data science student, I started using Kaggle to practice machine learning and data analysis.

Kaggle is useful because it provides real datasets, public notebooks, competitions, and a community where beginners can learn from other people’s workflows.

Day 1

The first day was mainly about reviewing Python and Pandas.

Although I had learned some Python before, using it for data analysis is different from just learning the syntax. In a data science workflow, Python is used to:

  • load data,
  • inspect columns,
  • clean missing values,
  • create features,
  • train models,
  • evaluate results.
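The first four steps above can be sketched with Pandas. This is only a minimal illustration: the CSV text, the column names (title, score, price), and the score_per_price feature are made up for the example and do not come from a real Kaggle dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for a Kaggle CSV file; the columns are invented.
RAW_CSV = """title,score,price
Wine A,90,20.0
Wine B,85,
Wine C,,15.0
Wine D,88,40.0
"""

# Load data (read_csv accepts a file path or a file-like object).
reviews = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect columns and count missing values per column.
column_names = list(reviews.columns)
missing_counts = reviews.isna().sum()

# Clean missing values: drop rows without a score,
# then fill missing prices with the median price.
cleaned = reviews.dropna(subset=["score"]).copy()
cleaned["price"] = cleaned["price"].fillna(cleaned["price"].median())

# Create a feature: score points per unit of price.
cleaned["score_per_price"] = cleaned["score"] / cleaned["price"]

print(column_names)
print(missing_counts.to_dict())
print(cleaned)
```

Dropping rows and filling with the median are just two common choices; which one is appropriate depends on the dataset and the task.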

loc and iloc

One useful distinction I reviewed is loc versus iloc.

loc selects data by label:

reviews.loc[0]             # the row whose index label is 0
reviews.loc[:, ["score"]]  # all rows, the "score" column, as a DataFrame

iloc selects data by integer position:

reviews.iloc[0]     # the first row, by position
reviews.iloc[:, 0]  # all rows, the first column, as a Series

A simple way to remember it:

  • loc means label-based selection.
  • iloc means integer-position-based selection.
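With the default index 0, 1, 2, ... the two accessors often return the same row, which hides the difference. A small sketch with a non-default index makes it visible. The DataFrame and its index labels here are hypothetical.

```python
import pandas as pd

# Hypothetical data; the index labels are deliberately not 0, 1, 2.
reviews = pd.DataFrame(
    {"score": [90, 85, 88], "price": [20.0, 30.0, 40.0]},
    index=[10, 20, 30],
)

# loc looks up the index label 10, which happens to be the first row.
first_by_label = reviews.loc[10]

# iloc looks up position 0, regardless of what the label is.
first_by_position = reviews.iloc[0]

# Slices differ too: loc includes the end label,
# while iloc excludes the end position.
by_label_slice = reviews.loc[10:20]    # rows labeled 10 and 20
by_position_slice = reviews.iloc[0:1]  # only the row at position 0

print(first_by_label.equals(first_by_position))
print(len(by_label_slice), len(by_position_slice))

# reviews.loc[0] would raise KeyError here, because no row has the label 0.
```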

Linear algebra review

I also reviewed some basic linear algebra.

For two vectors:

v1 = (x1, y1)
v2 = (x2, y2)

their dot product is:

v1 · v2 = x1*x2 + y1*y2
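As a quick check of the formula, here is a small sketch that computes the dot product by hand and with NumPy. The helper name dot_2d and the example vectors are my own choices for illustration.

```python
import numpy as np


def dot_2d(v1, v2):
    """Dot product of two 2D vectors given as (x, y) pairs."""
    x1, y1 = v1
    x2, y2 = v2
    return x1 * x2 + y1 * y2


v1 = (1.0, 2.0)
v2 = (3.0, 4.0)

# By hand: 1*3 + 2*4
manual = dot_2d(v1, v2)

# The same computation with NumPy.
with_numpy = float(np.dot(np.array(v1), np.array(v2)))

print(manual, with_numpy)
```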

This operation appears everywhere in machine learning. For example, linear regression, logistic regression, neural network layers, and attention mechanisms all involve dot products in some form.
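To make that concrete, the sketch below shows how a dot product sits inside a linear regression prediction, a logistic regression prediction, and a dense neural network layer. All weights, biases, and inputs are arbitrary numbers chosen for the example, not values from a trained model.

```python
import numpy as np

# Hypothetical weights, bias, and one input row with three features.
w = np.array([0.5, -1.0, 2.0])
b = 0.1
x = np.array([2.0, 1.0, 0.5])

# Linear regression prediction: dot product plus bias.
linear_pred = float(np.dot(w, x) + b)

# Logistic regression: the same dot product passed through a sigmoid.
logistic_pred = float(1.0 / (1.0 + np.exp(-(np.dot(w, x) + b))))

# Dense layer (before bias and activation): one dot product per
# output unit, which is a matrix-vector product.
W = np.array([
    [0.5, -1.0, 2.0],
    [1.0, 0.0, -1.0],
])
layer_out = W @ x

print(linear_pred, logistic_pred, layer_out)
```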

Reflection

Kaggle is not just a competition platform. For me, it is a practice field.

The goal at this stage is not to win competitions immediately. The goal is to learn a complete workflow:

  1. Understand the task.
  2. Load and clean the data.
  3. Build a baseline model.
  4. Evaluate the model.
  5. Improve the model step by step.
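Steps 3 to 5 can be sketched on toy data with NumPy alone. This is an illustration under stated assumptions, not a recipe for a real competition: the data is invented, the "baseline" is simply predicting the training mean, and the single "improvement" is a least-squares line.

```python
import numpy as np

# Hypothetical toy data: y is roughly 2*x + 1 with a little noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.1, 2.9, 5.2, 7.0, 8.8, 11.1, 13.0, 14.9])

# Hold out the last two points for evaluation.
x_train, y_train = x[:6], y[:6]
x_valid, y_valid = x[6:], y[6:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Baseline: always predict the mean of the training targets.
baseline_pred = np.full_like(y_valid, y_train.mean())
baseline_mae = mean_absolute_error(y_valid, baseline_pred)

# One improvement step: fit a straight line by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)
line_pred = slope * x_valid + intercept
line_mae = mean_absolute_error(y_valid, line_pred)

print(f"baseline MAE: {baseline_mae:.2f}, line MAE: {line_mae:.2f}")
```

Keeping the same validation split and the same metric for every model is what makes the comparison between the baseline and each later improvement meaningful.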

This is also the mindset I want to build for future data mining, NLP, and recommendation projects.

Author

RichardF

Published on

2023-11-20

Updated on

2026-06-24

License