
XComs and passing data between tasks in Airflow

Tasks in a DAG run in isolation and XComs let them share data. Learn how to pass values between real tasks and inspect them in the UI.

What we're doing

You'll learn why tasks can't share data directly, what XComs are, and build a DAG where one task produces a value and another consumes it.

Step 1: The problem — tasks are isolated

In a normal Python script you'd do this:

data = extract()
result = transform(data)

One function returns a value, the next one uses it.

But in Airflow this doesn't work. Each task runs in complete isolation — potentially on a different worker, at a different time, in a different process. A variable created inside one task simply doesn't exist for the next task. They don't share memory.
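To make the isolation concrete, here is a rough analogy in plain Python (it does not use Airflow at all). Each snippet runs in its own fresh interpreter process, much as two tasks may run in separate processes. The helper name run_in_new_process is made up for this sketch.

```python
import subprocess
import sys


def run_in_new_process(code):
    """Run a Python snippet in a brand-new interpreter (hypothetical helper)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip(), proc.stderr.strip()


# "extract" creates a variable in its own process, then that process exits.
extract_result = run_in_new_process("data = [1, 2, 3]; print(len(data))")

# "transform" starts from scratch, so the name `data` does not exist there.
transform_result = run_in_new_process("print(data)")

print("extract exit code:", extract_result[0], "output:", extract_result[1])
print("transform exit code:", transform_result[0])
print("transform error:", transform_result[2].splitlines()[-1])
```

The second process fails with a NameError because nothing from the first process survives. That is the gap XComs are meant to fill.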

So how does the transform task get the data that extract produced? Let's work through it step by step.

Step 2: What are XComs

XCom stands for cross-communication. It's a small storage layer inside Airflow's database where tasks can leave values for other tasks to pick up.

Two operations:

  • Push: a task stores a value. The simplest way is to return it from your function; Airflow pushes the return value automatically.
  • Pull: another task retrieves that value using the task id of the task that pushed it.
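As a mental model only, push and pull can be pictured as writes and reads against a small key-value table. The class below is a toy stand-in written for this tutorial, not Airflow's implementation; the real table also tracks details such as the DAG and the run, which are left out here.

```python
class ToyXComStore:
    """Conceptual stand-in for the XCom table; not Airflow's real implementation."""

    def __init__(self):
        self._rows = {}

    def push(self, task_id, value, key="return_value"):
        # Store the value under the pushing task's id and a key.
        self._rows[(task_id, key)] = value

    def pull(self, task_id, key="return_value"):
        # Look the value up by the id of the task that pushed it.
        # A missing entry yields None instead of raising an error.
        return self._rows.get((task_id, key))


store = ToyXComStore()
store.push("extract", 42)

pulled = store.pull("extract")
missing = store.pull("transform")  # nothing was pushed by this task
print(pulled, missing)
```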

One critical rule — XComs are for small data only. A filename, a count, a date, a small dict. They're stored in Airflow's metadata database, which is not built for large payloads.
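A common way to respect this rule is to write the large data somewhere else and pass only a reference to it. The sketch below is plain Python with hypothetical function names. It uses a local temporary file, which assumes both tasks can see the same filesystem; with several workers you would likely need shared storage instead.

```python
import json
import os
import tempfile


def extract_to_file():
    """Write the (potentially large) rows to disk and return only the path."""
    rows = [{"id": i, "value": i * 10} for i in range(1000)]
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)
    return path  # a short string: this is what would travel through XCom


def transform_from_file(path):
    """Read the rows back using the path received from the upstream task."""
    with open(path) as f:
        rows = json.load(f)
    return len(rows)  # a small count, which is also fine for XCom


path = extract_to_file()
row_count = transform_from_file(path)
os.remove(path)  # clean up the temporary file
print(row_count)
```

Only the path and the count would be stored as XComs; the thousand rows never touch the metadata database.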

Step 3: Create the DAG file

Click VS Code in the environment panel. Right-click the dags folder and create a new file called xcom_dag.py.

Step 4: Add the imports and the first function

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def extract():
    record_count = 42
    print(f"Extracted {record_count} records")
    return record_count

The imports are the same as always. The new thing is in the function — return record_count.

When a function used by a PythonOperator returns a value, Airflow automatically pushes it to XCom under the key return_value, tagged with the task's id. In other words, returning a value is how the task pushes an XCom.
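Returning a value is the implicit route. The task instance also has an xcom_push method for storing a value under a key you choose. The sketch below is not part of xcom_dag.py: the function names and the key record_count are assumptions for illustration, and a mock object stands in for the task instance so the snippet runs without Airflow.

```python
from unittest.mock import MagicMock


def extract_with_explicit_push(**context):
    ti = context["ti"]
    record_count = 42
    # Explicit push under a custom key (the key name is an assumption for this sketch).
    ti.xcom_push(key="record_count", value=record_count)
    # No return statement, so nothing is stored under return_value.


def transform_with_key(**context):
    ti = context["ti"]
    # Pull by both task id and key; without key= the lookup uses return_value.
    count = ti.xcom_pull(task_ids="extract", key="record_count")
    return count * 2


# Local check with a mock standing in for the task instance (no Airflow needed).
fake_ti = MagicMock()
extract_with_explicit_push(ti=fake_ti)
fake_ti.xcom_push.assert_called_once_with(key="record_count", value=42)

fake_ti.xcom_pull.return_value = 42
doubled = transform_with_key(ti=fake_ti)
print(doubled)
```

For a single value per task, returning it is usually simpler. A custom key is mostly useful when one task needs to publish several distinct values.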

Step 5: Add the function that pulls the value

def transform(**context):
    ti = context["ti"]
    count = ti.xcom_pull(task_ids="extract")
    print(f"Transforming {count} records...")
    return count * 2

  • **context: the same context dictionary you saw in the scheduling tutorial
  • context["ti"]: ti stands for task instance. It's the object representing this specific task run, and it has the XCom methods
  • ti.xcom_pull(task_ids="extract"): pulls the value that the extract task pushed. You reference the task by its task_id
  • return count * 2: this function also returns a value, so it pushes its own XCom too

Step 6: Add the final function and the DAG

def load(**context):
    ti = context["ti"]
    result = ti.xcom_pull(task_ids="transform")
    print(f"Loading {result} processed records into the database")

with DAG(
    dag_id="xcom_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False
) as dag:

    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)

    t1 >> t2 >> t3

load pulls from transform the same way transform pulled from extract. The DAG block and tasks are the same pattern you already know.

So the data flows: extract returns 42 → transform pulls 42, returns 84 → load pulls 84 and prints it.
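The same flow can be traced locally before touching the UI. This is a dry run under stated assumptions: FakeTaskInstance is a hypothetical stand-in that only mimics xcom_pull with a dictionary, and load returns its message instead of printing it so the result can be inspected.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's task instance, for local dry runs only."""

    def __init__(self, store):
        self.store = store

    def xcom_pull(self, task_ids):
        # Return what the named task "pushed", or None if it pushed nothing.
        return self.store.get(task_ids)


def extract():
    return 42


def transform(**context):
    count = context["ti"].xcom_pull(task_ids="extract")
    return count * 2


def load(**context):
    result = context["ti"].xcom_pull(task_ids="transform")
    return f"Loading {result} processed records into the database"


store = {}
ti = FakeTaskInstance(store)

# Run the callables in DAG order, saving each return value like an XCom.
store["extract"] = extract()
store["transform"] = transform(ti=ti)
message = load(ti=ti)

print(store)
print(message)
```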

Save with Ctrl+S.

Step 7: Trigger it and check the logs

Open the Airflow UI from the environment panel. Find xcom_dag on the DAGs page and trigger it with the play button.

Open the Graph view and wait for all three tasks to go green. Then check each log:

  • extract log: Extracted 42 records
  • transform log: Transforming 42 records...
  • load log: Loading 84 processed records into the database

Step 8: Inspect the XComs in the UI

Click on the extract task and look for the XCom tab. You'll see:

  • Key: return_value
  • Value: 42

Do the same on transform.

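This is where you debug data passing. If a downstream task gets None, check the XCom tab of the upstream task. If there's no value there, the upstream function most likely never returned anything. If the value is there, check that the task_ids argument in the downstream xcom_pull matches the upstream task_id exactly.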
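One way to surface this problem earlier is to check the pulled value before using it. The helper pull_required below is a hypothetical name, not an Airflow function, and a mock again stands in for the task instance. Without such a check, count * 2 on a None value fails with a less helpful TypeError.

```python
from unittest.mock import MagicMock


def pull_required(ti, task_id):
    """Pull an upstream return value and fail loudly if it is missing (hypothetical helper)."""
    value = ti.xcom_pull(task_ids=task_id)
    if value is None:
        raise ValueError(
            f"No XCom found for task '{task_id}'. Check that the task id is "
            "spelled correctly and that the upstream function returns a value."
        )
    return value


def transform(**context):
    count = pull_required(context["ti"], "extract")
    return count * 2


# Local check: simulate an upstream task that returned nothing.
fake_ti = MagicMock()
fake_ti.xcom_pull.return_value = None
try:
    transform(ti=fake_ti)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The error then names the upstream task in the log of the failing task, which points you to the right XCom tab.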

After hibernation

If the VM hibernates, reconnect and run in the VS Code terminal:

cd ~/airflow
docker compose up -d

What's next

Now try this out in a live environment: start Airflow, trigger xcom_dag, and inspect its XComs.
