“The more examples you can see, the more clearly you can understand.”
A Primer
Let me share a second example – sometimes it helps to see other task breakdowns, to get a more holistic perspective. This breakdown covers roughly one working day of effort (the estimates below total 9 hours).
Your Task
Periodically execute two endpoints to scan records and to send records.
OK, seems easy, right?
Well, not exactly. Especially for someone new to Airflow and to setting up task-based DAGs, which can be difficult even for a senior engineer.
Where do I begin? Can I start out with simple POCs [ proofs of concept ]? It’s hard to get an estimate for how long this will take. Will it take me a day? A week? A few hours?
Hmmmm, before we dive into code, let’s dive into a high-level breakdown. We can gather a couple of requirements and scope out well-defined units of work.
The Task knowns
- We have identified the periodic execution mechanism – Airflow DAGs – whose use is up-and-coming.
- We know the first endpoint has to query a database.
- The second endpoint has to send payloads, based on those queries.
- We have the endpoints working locally and remotely in our lower-level environments; they’ve been verified from our API testing clients and programmatically from code.
Hmm – Lemme spend 10 minutes of my day and conjure up tasks!
The Task Breakdowns
- Task #1 ( A single print ) – Set up a “Hello World” Airflow DAG that prints “helloWorld” once, in a single execution, in our lower-level env.
- Task #2 ( The recurring basis ) – Modify the DAG to print “helloWorld” on a repeating, configurable interval of 5-15 seconds. Verify log files show execution.
- Task #3 ( Endpoint #1 ) – Modify the DAG to execute the first endpoint periodically and print the scanned records to log files.
- Task #4 ( Endpoint #1 & #2 ) – Modify the DAG to execute the second endpoint as well, setting up the pipe from endpoint #1 code to endpoint #2 code. Verify plumbing works.
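To make Tasks #3 and #4 concrete, here is a minimal sketch of the plumbing between the two endpoints, written as plain Python with no Airflow in it. Everything named here is hypothetical: `scan_records`, `send_records`, `run_once`, and the `fetch` / `post` callables are stand-ins for however your code actually calls endpoint #1 and endpoint #2. The idea is only that the pipe can be verified on its own before it gets wrapped in a DAG.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]  # assumed record shape; the real payload may differ


def scan_records(fetch: Callable[[], Iterable[Record]]) -> List[Record]:
    """Task #3: call endpoint #1 (via the hypothetical `fetch`) and print each record."""
    records = list(fetch())
    for record in records:
        print(f"scanned: {record}")
    return records


def send_records(records: Iterable[Record], post: Callable[[Record], object]) -> int:
    """Task #4: hand each scanned record to endpoint #2 (via the hypothetical `post`)."""
    sent = 0
    for record in records:
        post(record)
        sent += 1
    print(f"sent {sent} record(s)")
    return sent


def run_once(fetch: Callable[[], Iterable[Record]], post: Callable[[Record], object]) -> int:
    """One pass through the pipe: scan with endpoint #1, then send with endpoint #2."""
    return send_records(scan_records(fetch), post)


if __name__ == "__main__":
    # In-memory stand-ins for the two endpoints, just to exercise the plumbing.
    outbox: List[Record] = []
    run_once(lambda: [{"id": 1}, {"id": 2}], outbox.append)
```

Because the endpoint calls are passed in, the same functions can be exercised with fakes first and pointed at the real endpoints later, which lines up with finishing Task #3 before Task #4.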
Alright, can I quickly set up a time table for focused blocks of work? YES!
The Task Time Table
| Task | Estimated Time To Task Completion |
| Task #1 : A single print | 2 hours |
| Task #2 : The recurring basis | 1 hour |
| Task #3 : Endpoint 1 | 3 hours |
| Task #4 : Endpoint 1 & 2 | 3 hours |
| Total | 9 hours [ buffered to 10 hours ] |
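The arithmetic behind the table is small enough to sketch in a few lines. The per-task hours come straight from the table; the flat one-hour buffer is an assumption, since it is just one simple way to get from 9 hours to 10.

```python
# Per-task estimates in hours, taken from the time table.
estimates_hours = {
    "Task #1 : A single print": 2,
    "Task #2 : The recurring basis": 1,
    "Task #3 : Endpoint 1": 3,
    "Task #4 : Endpoint 1 & 2": 3,
}

BUFFER_HOURS = 1  # assumption: a flat one-hour buffer on top of the raw total

total_hours = sum(estimates_hours.values())
buffered_hours = total_hours + BUFFER_HOURS

print(f"Total: {total_hours} hours [ buffered to {buffered_hours} hours ]")
```

Keeping the estimates in one place like this makes it easy to adjust a single task and see the effect on the total.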
Whoa. A seemingly insurmountable major task, hard to estimate, is suddenly … easier to estimate?
But the best part is the communication benefit. You just made communicating progress easier. It’s easier to confidently say “We’re making positive progress” when you have completed 1 of 4 or 3 of 4 tasks and can tell a product owner “We verified that we can periodically print output in an Airflow DAG and that deployments to a given environment work”. The sense of completion is much better defined than when you say the feature is “not done” or “partially done”.
