AIRFLOW

What Airflow is, why orchestration, install & first look

Apache Airflow is an open-source platform for orchestrating data pipelines. Learn what it is and why orchestration matters, then get familiar with the UI on a real VM.

What You Will Learn

You will get an overview of Airflow and learn what orchestration means, why cron jobs fall short, and how Airflow fills the gap. Then you'll start the VM and open the Airflow UI for the first time.

Watch the video before working through the steps below.

Step 1: What is Orchestration?

Imagine a data pipeline that runs every day:

  • Pulls data from an API.
  • Cleans and transforms it.
  • Loads it into a database.
  • Sends a report when done.
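The four steps above can be sketched as plain Python functions that you run by hand, in order. This is a minimal illustration under stated assumptions: the function names (pull_from_api, clean_and_transform, load_into_db, send_report) are hypothetical, hard-coded sample records replace the real API call, and an in-memory dict stands in for the database. Nothing here uses Airflow yet.

```python
# A minimal sketch of the four-step pipeline as plain Python functions.
# All names and the sample data are hypothetical placeholders; a real
# pipeline would call an actual API and write to an actual database.


def pull_from_api() -> list[dict]:
    # Stand-in for an HTTP call: raw records, one of them missing a value.
    return [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": None},
        {"id": 3, "amount": "4"},
    ]


def clean_and_transform(records: list[dict]) -> list[dict]:
    # Drop records with no amount and convert amounts from strings to floats.
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in records
        if r["amount"] is not None
    ]


def load_into_db(rows: list[dict], db: dict) -> int:
    # Stand-in for a database insert: upsert rows into a dict keyed by id.
    for row in rows:
        db[row["id"]] = row["amount"]
    return len(rows)


def send_report(loaded: int) -> str:
    # Stand-in for an email or chat notification.
    return f"Loaded {loaded} rows"


if __name__ == "__main__":
    database: dict[int, float] = {}
    # Running the steps by hand: the nesting fixes the order, and if any
    # step raises, the script just stops. Nothing retries or reports it.
    report = send_report(load_into_db(clean_and_transform(pull_from_api()), database))
    print(report)  # Loaded 2 rows
```

Notice that the order is fixed only by how the calls are nested, and an exception in any step simply ends the script: nothing retries the step and nothing records which one failed.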

You could run each step manually, but that doesn't scale. You could use cron jobs, but cron only knows about time: it has no dependency management (it cannot hold back the transform step until the pull has finished), no automatic retries, and no visibility into what failed and why.

This is the exact problem orchestration solves. An orchestrator runs your steps in the right order, retries on failure, skips downstream tasks if something breaks upstream, and gives you a visual UI to monitor everything.
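To make those three behaviours concrete, here is a toy orchestrator in plain Python. It is only a sketch of the idea, not how Airflow works internally: it uses the standard library's graphlib.TopologicalSorter to find a valid run order, retries each task a fixed number of times, and skips tasks whose upstream did not succeed. The state names loosely mirror Airflow's task states (success, failed, upstream_failed); the function run_pipeline and the demo tasks are hypothetical.

```python
# Toy orchestrator: a plain-Python illustration, not Airflow's implementation.
# It runs tasks in dependency order, retries failures, and skips downstream
# tasks. Task names and the failing tasks in the demo are hypothetical.
from collections.abc import Callable
from graphlib import TopologicalSorter


def run_pipeline(
    tasks: dict[str, Callable[[], None]],
    upstream: dict[str, set[str]],
    retries: int = 2,
) -> dict[str, str]:
    """Run each task once its upstream tasks have succeeded; return final states.

    Assumes every name used in `upstream` is also a key in `tasks`.
    """
    # Every task is a node; tasks missing from `upstream` have no dependencies.
    graph = {name: upstream.get(name, set()) for name in tasks}
    state: dict[str, str] = {}
    # static_order() yields each task only after all of its upstream tasks.
    for name in TopologicalSorter(graph).static_order():
        if any(state.get(dep) != "success" for dep in graph[name]):
            # Something upstream broke, so this task is skipped, not run.
            state[name] = "upstream_failed"
            continue
        # One initial attempt plus `retries` retries.
        for _ in range(retries + 1):
            try:
                tasks[name]()
                state[name] = "success"
                break
            except Exception:
                state[name] = "failed"
    return state


if __name__ == "__main__":
    calls = {"clean": 0}

    def flaky_clean() -> None:
        # Fails on its first call only, so one retry is enough.
        calls["clean"] += 1
        if calls["clean"] == 1:
            raise RuntimeError("transient error")

    def broken_load() -> None:
        # Always fails, e.g. because the database is down.
        raise RuntimeError("database is down")

    demo_tasks = {
        "pull": lambda: None,
        "clean": flaky_clean,
        "load": broken_load,
        "report": lambda: None,
    }
    demo_upstream = {"clean": {"pull"}, "load": {"clean"}, "report": {"load"}}
    print(run_pipeline(demo_tasks, demo_upstream))
    # {'pull': 'success', 'clean': 'success', 'load': 'failed', 'report': 'upstream_failed'}
```

In the demo, clean fails once and then succeeds on retry, load fails on every attempt, and report is never run because its upstream failed. In Airflow you do not code this loop yourself; retries, for example, are a per-task setting you declare in the DAG file.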

Airflow = orchestration for data pipelines, defined entirely in Python.

Step 2: Core Concepts

Before touching the UI, there are three terms you absolutely need to know:

  • DAG (Directed Acyclic Graph): This is your pipeline. Its tasks are connected by one-way dependencies with no loops, and it is defined in a Python file that declares all your tasks and the exact order they run in.
  • Task: A single unit of work inside a DAG. It can be a Python function, a bash command, or an SQL query.
  • Scheduler: The engine that watches your DAGs and triggers runs on a schedule (every hour, every day, or using any cron expression).
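The DAG and Scheduler ideas can also be sketched with the standard library. This is not Airflow code: Airflow builds the graph from your DAG file, and its scheduler does the timing. The helpers execution_order and to_cron are hypothetical. The first shows why the graph must be acyclic (a cycle leaves no valid run order); the second expands the cron presets Airflow accepts as a schedule, such as @daily, into their equivalent cron expressions.

```python
# A stdlib-only sketch of the DAG and Scheduler concepts. NOT Airflow code;
# execution_order and to_cron are hypothetical helpers for illustration.
from graphlib import CycleError, TopologicalSorter

# Cron presets that Airflow accepts as a schedule, with their cron equivalents.
PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def execution_order(upstream: dict[str, set[str]]) -> list[str]:
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        return list(TopologicalSorter(upstream).static_order())
    except CycleError as err:
        # A cycle means the graph is not a DAG: some task would wait on itself.
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


def to_cron(schedule: str) -> str:
    """Expand a preset to a cron expression; pass a raw cron string through."""
    return PRESETS.get(schedule, schedule)


if __name__ == "__main__":
    # The four-step pipeline: each task maps to the set of tasks it waits on.
    pipeline = {"clean": {"pull"}, "load": {"clean"}, "report": {"load"}}
    print(execution_order(pipeline))  # ['pull', 'clean', 'load', 'report']
    print(to_cron("@daily"))          # 0 0 * * *
```

For a straight chain like the four-step pipeline the order is unique; when tasks do not depend on each other, several orders are valid, and an orchestrator is free to run those tasks in parallel.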

Step 3: Open the UI

Airflow is already running. Click the Airflow UI link in the environment panel and log in using the following credentials:

  • Username: airflow
  • Password: airflow

You will land on the DAGs page. This is your main dashboard where every pipeline lives. Here, you can quickly see the last run, whether it succeeded or failed, and whether the pipeline is currently active or paused.

Explore the pipeline:

  1. Click on any example DAG.
  2. Click the Graph tab to see how the tasks connect to each other. This visual layout is exactly what Airflow sees when it runs your pipeline.

After Hibernation

If the VM hibernates, reconnect to your terminal and run the following commands:

cd ~/airflow 
docker compose up -d

What's next

Now try this out in the live environment: start the VM, open the Airflow UI, and explore the example DAGs.
