Python for Data Science: Tutorial 1/10

— Why Your Setup Matters More Than Your First Program

Jun 11, 2026

You’re going to write your first Python program soon. It’ll be something small — a loop, a function, maybe a script that calculates something trivial. And then you’ll feel nothing.

Not because the code is boring. Most Python tutorials skip the actual reason you’re learning this. They treat the language like syntax to memorize, rather than a thinking tool. You memorize a for loop. You understand a list. You miss the whole point.

We’re not starting with code. We’re starting with setup, mindset, and the foundation that makes every program after this one matter.

*computer desk with laptop showing terminal window with Python environment — Photo by [Jakub Zerdzicki]* on Pexels*

## CONTEXT

Here’s what I notice: people learn Python in two ways.

The first way — the way most tutorials teach — is additive. You collect syntax rules. Variables, loops, functions, classes. Each one another rule to remember. By week six you’re drowning in details, and you never quite see how they fit together.

The second way is structural. You learn how to think about problems first, then you use Python as a language to express those solutions. Setup matters here. Your environment, your testing habits, your first script — these aren’t overhead. They’re the scaffolding that makes learning stick.

If you’re learning Python for data science, you’re in that second group whether you know it or not. You’re not learning a language. You’re learning to communicate with data. And that requires a different starting point than a web developer or an automation engineer uses.

This tutorial assumes you have zero Python experience but reasonable comfort with a computer. It also assumes you want to understand why we’re doing each step, not just follow a checklist.

## SECTION 1 — Environment Setup: The Non-Negotiable Foundation

Your environment is the difference between “I learned Python” and “I can use Python to think clearly.”

Most tutorials have you install Python, maybe pip, and you’re done. That works for toy projects. For data science, it doesn’t work. You need isolation. You need reproducibility. You need to know what version of what library you’re using, because a bug you have today might disappear when you upgrade something next month, and you need to trace that.

Here’s the reality: bad environment setup is why most people’s first data science projects fail silently. Code runs. You get an answer. But you can’t recreate it, you can’t extend it, and six months later when you need that analysis again, nothing works.

Install Python 3.11 or higher (3.12 if your libraries support it — check before you commit). Use Homebrew on macOS, the official installer on Windows, apt on Linux. Then immediately create a virtual environment:

# Create a project directory and move into it

mkdir python_data_science && cd python_data_science

# Create a virtual environment

python3 -m venv env

# Activate it

source env/bin/activate  # macOS/Linux

# or

env\Scripts\activate  # Windows

That `source env/bin/activate` line? That’s the most important line in your Python journey. Every time you sit down to code, run it. Every time you close your laptop, your environment is still clean. No accidental global installs. No version conflicts. No “it works on my computer” problems.

Install one thing: `pip install jupyter notebook`. Jupyter is where you’ll learn Python for data science. It’s not where you’ll *build* things, but it’s where curiosity lives.

I learned this lesson the hard way while building AI and data projects across different environments — local machines, Docker containers, cloud notebooks, even managed runtimes like Databricks. One project worked perfectly on my laptop and completely failed inside a container because the Python version and dependency tree didn’t match. Another time, a subtitle-generation workflow broke because a library update silently changed behavior between environments. I didn’t lose time because Python was hard. I lost time because the setup was sloppy. That’s when I stopped treating environments like optional cleanup work and started treating them like part of the actual engineering.

The takeaway: spend 15 minutes on this. It’s boring. It’s also the only part of this course that has to be done exactly right.

## SECTION 2 — Computational Thinking Before Code: The Mental Model

Before we write a single program, we need to talk about how to think like a programmer.

There’s a difference between running code that works and understanding why it works. The second skill is what you need.

Computational thinking has four parts: decomposition, pattern recognition, abstraction, and algorithm design. These words sound abstract. They’re not. They describe how you solve any problem.

**Decomposition:** Take a big problem and break it into smaller pieces. Instead of “analyze this dataset,” you break it into: load the data, clean it, explore it, find patterns, visualize results. Each step is smaller and manageable.

**Pattern recognition:** As you break problems down, you notice that similar steps repeat. You load data the same way each time. You clean common problems in the same way. These patterns become reusable.

**Abstraction:** Instead of writing the same data-loading code fifty times, you write it once, name it, and reuse it. That’s a function. That’s abstraction.

**Algorithm design:** Once you know what you need (decomposed), how you’ll do the repeated parts (patterns), and how to reuse solutions (abstraction), you have a sequence of steps. That’s an algorithm. That’s your program.

Here’s why this matters: Python’s syntax doesn’t teach you this. A tutorial that starts with “print(’Hello World’)” doesn’t teach you this either. But every line of Python you write from this point forward should reflect this structure.

When you sit down to solve a problem, ask: What’s the smallest piece I can solve first? What steps repeat? Where can I write code once and reuse it?

The code follows. The thinking comes first.

One of the biggest shifts in my own work happened when I stopped asking “What code should I write?” and started asking “What are the actual pieces of this problem?” I remember building workflows around long-form content repurposing — turning videos into subtitles, clips, structured posts, and metadata. At first it felt overwhelming because I saw it as one giant system. But once I decomposed it into stages — extract audio, generate transcript, clean text, identify highlights, render captions, export formats — the implementation became straightforward. Most programming becomes easier the moment the problem stops feeling like one giant blob.

## SECTION 3 — Your First Script: Hello Data, Not Hello World

Now we write code. But we’re not writing “Hello World.” We’re writing a script that touches data.

Create a new file in your project directory called `first_script.py`:

import json

# Define some data

people = [

    {”name”: “Alice”, “age”: 28, “field”: “data science”},

    {”name”: “Bob”, “age”: 35, “field”: “software engineering”},

    {”name”: “Carol”, “age”: 42, “field”: “product management”},

]

# Calculate average age

total_age = 0

for person in people:

    total_age += person[”age”]

average_age = total_age / len(people)

# Output

print(f”Average age: {average_age:.1f}”)

print(f”Total people: {len(people)}”)

# Save results

results = {

“average_age”: average_age,

“total_count”: len(people)

}

with open(”results.json”, “w”) as f:

    json.dump(results, f)

Run it: python first_script.py

You’ll see output in your terminal and a new file `results.json` with your results. Notice three things happening here:

First — you’re working with structured data (a list of dictionaries), not strings. That’s the data science part.

Second — you’re using a loop to accumulate a result. This is a pattern. You’ll do this thousands of times in data science: iterate, calculate, collect the result.

Third — you’re saving your output in a reproducible format (JSON). This matters because next time you run the script, you can reload this file and extend your analysis.

This is a real program. Not “Hello World.” A working piece of thinking.

This tiny script already contains the structure behind real-world data systems. Input data comes in, transformation logic runs, outputs get stored somewhere reproducible. The scale changes later — maybe the data lives in a warehouse, maybe Spark processes millions of rows, maybe an API triggers the pipeline automatically — but the mental structure stays surprisingly similar. That’s why learning small scripts properly matters more than jumping into “advanced AI” tutorials too early. The foundations scale farther than people think.

## SECTION 4 — The One Skill That Matters: Reading Error Messages

You’re going to write broken code. Today, tomorrow, forever. The only difference between someone who learns Python and someone who gives up is how they respond to errors.

Most tutorials don’t teach this because it’s not glamorous. But it’s the skill that compounds.

Python gives you error messages—they’re helpful. Let me show you how to read them.

Modify your first script. Change the line:

total_age += person[”age”]

to:

total_age += person[”ages”]  # Typo: “ages” instead of “age”

Run it:

python first_script.py

You get an error:

KeyError: ‘ages’

That’s your error message. Here’s how to read it:

**KeyError** — the type of error. Python is saying: “You tried to access a key that doesn’t exist.”

**’ages’** — the specific key you tried to access.

Now you know: I misspelled something. You fix it:

total_age += person[”age”]  # Fixed

It’s obvious when it’s a typo. But in real data science, you’ll get errors like `ValueError: could not convert string to float` or `IndexError: list index out of range`. The process is the same: read the error type, identify what went wrong, fix it.

Most of your learning will happen through errors. The people who give up are the ones who see an error and think “I’m not smart enough for this.” The people who succeed treat errors as feedback: “What did I do wrong, and how do I fix it?”

Start building this habit now. Run your code. When it breaks (and it will), read the error carefully. Google it if you don’t understand. Fix it. Run again. That cycle is 80% of programming.

## TAKEAWAY

You now have:

- An isolated environment where you can code without breaking anything

- A mental model for breaking problems into pieces

- Your first real program — one that reads data, calculates something, and saves results

- A way to debug when things break

You don’t have a lot of syntax memorized yet. You don’t need to. You have something better: a foundation.

Everything in the remaining nine tutorials builds on this structure. You’ll learn functions so you can reuse patterns. Classes so you can organize complexity. Libraries so you can stand on the shoulders of people who solved these problems before you. But none of it matters if your foundation is shaky.

Come back to this when things feel overwhelming. Re-run your first script. Remember that it worked because you set up your environment correctly and broke the problem down into pieces. That’s all programming is.

Run your script. Make sure `results.json` gets created. Reply with what you broke first (everyone breaks something — the fact that you broke it and fixed it means you’re learning correctly).

Ready for Tutorial 2? We’ll build on this foundation. Same structure: thinking first, code second.

Breath of Data Science

Discussion about this post

Ready for more?