XComs and passing data between tasks in Airflow
Tasks in a DAG run in isolation and XComs let them share data. Learn how to pass values between real tasks and inspect them in the UI.
What we're doing
You'll learn why tasks can't share data directly, what XComs are, and build a DAG where one task produces a value and another consumes it.
Step 1: The problem — tasks are isolated
In a normal Python script you'd do this:
data = extract()
result = transform(data)
One function returns a value, the next one uses it.
But in Airflow this doesn't work. Each task runs in complete isolation — potentially on a different worker, at a different time, in a different process. A variable created inside one task simply doesn't exist for the next task. They don't share memory.
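The isolation described above can be reproduced without Airflow. The sketch below is not Airflow code: it only assumes that each task behaves like a separate Python process, and it uses the standard library's subprocess module to start two of them. The helper name run_in_new_process is made up for this illustration.

```python
import subprocess
import sys


def run_in_new_process(code: str) -> subprocess.CompletedProcess:
    """Run a snippet in a brand-new Python interpreter (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


# "Task 1" creates a variable inside its own process.
first = run_in_new_process("data = 42; print(data)")

# "Task 2" runs in a different process and tries to read that variable.
second = run_in_new_process("print(data)")

print("first stdout:", first.stdout.strip())
print("second failed:", second.returncode != 0)
print("NameError raised:", "NameError" in second.stderr)
```

The second process fails because the name data was never defined in its own memory, which is the same situation a downstream task is in.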
So how does the transform task get the data that extract produced? Let's work through it step by step.
Step 2: What are XComs
XCom stands for cross-communication. It's a small storage layer inside Airflow's database where tasks can leave values for other tasks to pick up.
Two operations:
Push — a task stores a value. The simplest way: just return it from your function. Airflow pushes the return value automatically.
Pull — another task retrieves that value using the task id of the task that pushed it.
One critical rule — XComs are for small data only. A filename, a count, a date, a small dict. They're stored in Airflow's metadata database, which is not built for large payloads.
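To make push and pull concrete, here is a toy model of such a store. It is a simplified sketch, not Airflow's real schema or API: the class ToyXComStore, its table layout, and the sample values are all assumptions for illustration, and an in-memory SQLite database stands in for the metadata database. The default key return_value is the one Airflow uses for returned values.

```python
import json
import sqlite3


class ToyXComStore:
    """Toy model of a push/pull store. Not Airflow's real schema or API."""

    def __init__(self) -> None:
        # An in-memory SQLite database stands in for the metadata database.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE xcom ("
            "task_id TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (task_id, key))"
        )

    def push(self, task_id: str, value, key: str = "return_value") -> None:
        # Values are serialized to text before being written to a row.
        self.conn.execute(
            "INSERT OR REPLACE INTO xcom VALUES (?, ?, ?)",
            (task_id, key, json.dumps(value)),
        )

    def pull(self, task_id: str, key: str = "return_value"):
        row = self.conn.execute(
            "SELECT value FROM xcom WHERE task_id = ? AND key = ?",
            (task_id, key),
        ).fetchone()
        # Nothing stored under that task id and key: return None.
        return json.loads(row[0]) if row else None


store = ToyXComStore()
store.push("extract", 42)                             # a small count
store.push("extract", {"file": "a.csv"}, key="meta")  # a small dict

print(store.pull("extract"))
print(store.pull("extract", key="meta"))
print(store.pull("transform"))  # nothing was pushed for this task id
```

Every value becomes a serialized row in a database table, which is why a count or a filename fits well and a large dataset does not.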
Step 3: Create the DAG file
Click VS Code in the environment panel. Right click on the dags folder and create a new file called xcom_dag.py.
Step 4: Add the imports and the first function
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
def extract():
    record_count = 42
    print(f"Extracted {record_count} records")
    return record_count
The imports are the same as always. The new thing is in the function — return record_count.
When a function used by a PythonOperator returns a value, Airflow automatically pushes it to XCom under the key return_value, tagged with the task's id. In other words, returning a value is how you push an XCom.
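Returning a value is the implicit route. A task instance also has an xcom_push(key=..., value=...) method for storing a value under a key of your choosing. The sketch below shows that variant. The RecordingTI class is a made-up stand-in that only records calls so the function can run outside Airflow, and the key name record_count is an arbitrary choice.

```python
class RecordingTI:
    """Hypothetical stand-in for a task instance; records pushes in a dict."""

    def __init__(self) -> None:
        self.pushed = {}

    def xcom_push(self, key, value):
        self.pushed[key] = value


def extract_explicit(**context):
    ti = context["ti"]
    record_count = 42
    # Explicit push under a custom key instead of relying on the return value.
    ti.xcom_push(key="record_count", value=record_count)


fake_ti = RecordingTI()
extract_explicit(ti=fake_ti)
print(fake_ti.pushed)
```

A downstream task would then pass the same key when pulling, for example ti.xcom_pull(task_ids="extract", key="record_count").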
Step 5: Add the function that pulls the value
def transform(**context):
    ti = context["ti"]
    count = ti.xcom_pull(task_ids="extract")
    print(f"Transforming {count} records...")
    return count * 2
- **context: the same context dictionary you saw in the scheduling tutorial
- context["ti"]: ti stands for task instance. It's the object representing this specific task run, and it has the XCom methods
- ti.xcom_pull(task_ids="extract"): pulls the value that the extract task pushed. You reference the task by its task_id
- return count * 2: this function also returns a value, so it pushes its own XCom too
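Because transform only needs an object with an xcom_pull method, it can be tried outside Airflow by passing a stub as ti. This is a testing sketch, not how Airflow calls the function: the stub built with SimpleNamespace and its canned value of 42 are assumptions for illustration.

```python
from types import SimpleNamespace


def transform(**context):
    ti = context["ti"]
    count = ti.xcom_pull(task_ids="extract")
    print(f"Transforming {count} records...")
    return count * 2


# Stub task instance: pretends the "extract" task pushed 42.
stub_ti = SimpleNamespace(
    xcom_pull=lambda task_ids: 42 if task_ids == "extract" else None
)

# The context arrives as keyword arguments; here we pass only "ti".
result = transform(ti=stub_ti)
print(result)
```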
Step 6: Add the final function and the DAG
def load(**context):
    ti = context["ti"]
    result = ti.xcom_pull(task_ids="transform")
    print(f"Loading {result} processed records into the database")

with DAG(
    dag_id="xcom_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
load pulls from transform the same way transform pulled from extract. The DAG block and tasks are the same pattern you already know.
So the data flows: extract returns 42 → transform pulls 42, returns 84 → load pulls 84 and prints it.
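The same flow can be traced in plain Python before running it in Airflow. The sketch below is a simulation under stated assumptions: the xcoms dict stands in for the XCom table, and LocalTI and run_task are hypothetical helpers that mimic pulling by task id and the automatic push of a return value. It does not use or reproduce Airflow's scheduler.

```python
xcoms = {}  # task_id -> returned value; stand-in for the XCom table


class LocalTI:
    """Hypothetical task-instance stand-in backed by the dict above."""

    def xcom_pull(self, task_ids):
        return xcoms.get(task_ids)


def extract():
    return 42


def transform(**context):
    return context["ti"].xcom_pull(task_ids="extract") * 2


def load(**context):
    result = context["ti"].xcom_pull(task_ids="transform")
    print(f"Loading {result} processed records into the database")


def run_task(task_id, func, **kwargs):
    # Mimic the automatic push: store whatever the callable returns.
    value = func(**kwargs)
    if value is not None:
        xcoms[task_id] = value


# Run the tasks in dependency order: extract, then transform, then load.
run_task("extract", extract)
run_task("transform", transform, ti=LocalTI())
run_task("load", load, ti=LocalTI())

print(xcoms)
```

Since load returns nothing, it leaves no entry behind in this simulation; only extract and transform do.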
Save with Ctrl+S.
Step 7: Trigger it and check the logs
Open the Airflow UI from the environment panel. Find xcom_dag on the DAGs page and trigger it with the play button.
Open the Graph view and wait for all three tasks to go green. Then check each log:
- extract log: Extracted 42 records
- transform log: Transforming 42 records...
- load log: Loading 84 processed records into the database
Step 8: Inspect the XComs in the UI
Click on the extract task and look for the XCom tab. You'll see:
- Key: return_value
- Value: 42
Do the same on transform, where the value should be 84.
This is where you debug data passing. If a downstream task gets None, check the XCom tab of the upstream task. If there's no value there, the upstream function most likely never returned anything.
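A missing value can also be caught in code rather than discovered later as a TypeError from None * 2. The helper below is one possible guard, not something Airflow provides: pull_required is a hypothetical name, and MagicMock from the standard library plays the role of a task instance whose upstream function forgot to return.

```python
from unittest.mock import MagicMock


def pull_required(ti, task_id):
    """Pull an upstream value and fail loudly if it is missing (hypothetical)."""
    value = ti.xcom_pull(task_ids=task_id)
    if value is None:
        raise ValueError(
            f"No XCom found for task '{task_id}'. "
            "Check that it ran and that its function returns a value."
        )
    return value


# Simulate an upstream task that pushed nothing.
ti = MagicMock()
ti.xcom_pull.return_value = None

try:
    pull_required(ti, "extract")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Inside transform, the pull line would become count = pull_required(ti, "extract"), so the task log names the upstream task that needs checking.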
After hibernation
If the VM hibernates, reconnect and run in the VS Code terminal:
cd ~/airflow
docker compose up -d