harisrid Tech News

Spanning across many domains – data, systems, algorithms, and personal

TOTW/19 ( Part Two ) – The Road to Tackling Ambiguity – Case Study #2

“The more examples you can see, the more clearly you can understand.”

A Primer

Let me share a second example – sometimes it helps to see other task breakdowns to get a more holistic perspective. This breakdown entails a single day’s worth of effort – 8 hours.

Your Task

Periodically call two endpoints: one to scan records and one to send records.

OK, seems easy, right?

Well, not exactly. Especially for someone new to Airflow and to setting up task-based DAGs, which can be difficult even for a senior engineer.

Where do I begin? Can I start out with simple POCs [ proofs-of-concept ]? It’s hard to get an estimate for how long this will take. Will it take me a day? A week? A few hours?

Hmmmm, before we dive into code, let’s dive into a high-level breakdown. We can gather a couple of requirements and scope out well-defined units of work.

The Task Knowns

  • We have identified the periodic execution mechanism – Airflow DAGs – whose use is up-and-coming.
  • We know the first endpoint has to query a database.
  • The second endpoint has to send payloads, based on those queries.
  • We have the endpoints working locally and remotely in our lower-level environments; they’ve been verified from our API testing clients and programmatically from code.

Hmm – Lemme spend 10 minutes of my day and conjure up tasks!

The Task Breakdowns

  • Task #1 ( A single print ) – Set up a “Hello World” Airflow DAG that prints “helloWorld” once in our lower-level env.
  • Task #2 ( The recurring basis ) – Modify the DAG to print “helloWorld” on a repeated, configurable basis of 5-15 seconds. Verify that the log files show each execution.
  • Task #3 ( Endpoint #1 ) – Modify the DAG to execute the first endpoint periodically and print the scanned records to the log files.
  • Task #4 ( Endpoints #1 & #2 ) – Modify the DAG to execute the second endpoint as well, setting up the pipe from endpoint #1 code to endpoint #2 code. Verify that the plumbing works.
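To make the four tasks above concrete, here is a minimal sketch of where they could end up. It is not the actual implementation: the endpoint URLs, the function names, the `dag_id`, and the 10-second default interval are all hypothetical, and the wiring assumes Airflow 2.4 or later (where `DAG` accepts a `schedule` argument). Airflow is imported lazily inside `build_dag()` so the plain callables can be exercised without an Airflow installation.

```python
import json
from datetime import datetime, timedelta
from urllib import request

# Hypothetical endpoint URLs; substitute the real ones for your environment.
SCAN_URL = "http://localhost:8080/scan"
SEND_URL = "http://localhost:8080/send"


def hello_world():
    """Tasks #1 and #2: the smallest callable that proves the DAG runs."""
    print("helloWorld")
    return "helloWorld"


def scan_records(fetch=None):
    """Task #3: call endpoint #1 and log the records it returns."""
    # `fetch` is injectable so the function can be tested without a server.
    fetch = fetch or (lambda: json.load(request.urlopen(SCAN_URL)))
    records = fetch()
    print(f"scanned {len(records)} records: {records}")
    return records  # Airflow stores a callable's return value as an XCom


def send_records(records, post=None):
    """Task #4: pass the records from endpoint #1 on to endpoint #2."""
    def default_post(payload):
        req = request.Request(
            SEND_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with request.urlopen(req) as resp:
            return resp.status

    post = post or default_post
    return post(records)


def build_dag(interval_seconds=10):
    """Wire the callables into a DAG (assumes Airflow 2.4+)."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def send_from_xcom(ti):
        # Pull whatever the scan task returned and forward it.
        return send_records(ti.xcom_pull(task_ids="scan_records"))

    with DAG(
        dag_id="scan_and_send",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=timedelta(seconds=interval_seconds),
        catchup=False,
    ) as dag:
        scan = PythonOperator(task_id="scan_records", python_callable=scan_records)
        send = PythonOperator(task_id="send_records", python_callable=send_from_xcom)
        scan >> send  # endpoint #1 runs before endpoint #2
    return dag
```

In a real DAG file you would likely add `dag = build_dag()` at module level so the scheduler can discover it. For Tasks #1 and #2, the same wiring with a single `PythonOperator` pointing at `hello_world` should be enough to verify deployment and scheduling before any endpoint code is involved.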

Alright, can I quickly set up a time table for focused blocks of work? YES!

The Task Time Table

| Task | Estimated Time To Task Completion |
|---|---|
| Task #1 : A single print | 2 hours |
| Task #2 : The recurring basis | 1 hour |
| Task #3 : Endpoint 1 | 3 hours |
| Task #4 : Endpoint 1 & 2 | 3 hours |
| Total | 9 hours [ buffered to 10 hours ] |
Figure #2 : In actuality, it’s around 2-3 business days of work for the deliverable.

Whoa. A seemingly insurmountable task that was hard to estimate is suddenly … easier to estimate?

But the best part is the communication benefit. You just made communicating progress easier. It’s easier to confidently say “We’re making positive progress” when you have completed 1 of 4 or 3 of 4 tasks and can tell a product owner “We verified that we can periodically print output in an Airflow DAG and that deployments to a given environment work”. The sense of completion is much better defined than saying the feature is “not done” or “partially done”.
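As a small illustration of that point, the estimates from the time table can be turned into a one-line status update. This is only a sketch: the `progress_report` helper and the wording of its output are hypothetical, and the hours are copied from the table.

```python
# Estimates in hours, copied from the time table.
ESTIMATES = {
    "Task #1: A single print": 2,
    "Task #2: The recurring basis": 1,
    "Task #3: Endpoint 1": 3,
    "Task #4: Endpoint 1 & 2": 3,
}


def progress_report(completed):
    """Summarize progress as tasks done and estimated hours done."""
    done_hours = sum(ESTIMATES[task] for task in completed)
    total_hours = sum(ESTIMATES.values())
    return (
        f"{len(completed)}/{len(ESTIMATES)} tasks done, "
        f"{done_hours}/{total_hours} estimated hours"
    )


print(progress_report(["Task #1: A single print"]))
# prints: 1/4 tasks done, 2/9 estimated hours
```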
