
TOTW/10 – Bolstering your logging matters. Your more seasoned senior engineers are on the money.

But logging work is so painful!

Detoinne, an eager-to-learn, strongly self-driven junior engineer, wants to submit his code quickly to meet a feature deliverable. But his more astute senior engineer, Youngju, digs in during code review and notices a lack of solid logging, monitoring, and observability statements. She observes this along the functions’ code paths – both the happy paths ( 200 OK ) and unhappy paths ( 429 error responses ) – before HTTP responses and errors are returned to the end user.

Detoinne feels frustrated and grumbles to himself. “Ugh, it’s so excruciating. My senior engineer is super pedantic about my code. Why do I even have to bolster the logging presence? This is so boring and painful.”

I hear you. JIRA tickets titled “Refactor our centralized logging posture” and back-and-forth code reviewers telling you to amend your logging statements can feel annoying. Your developer velocity, a key metric used in performance reviews, precipitously drops. But hear me out on the value of good logging – a couple of extra minutes spent while writing and reviewing code can save you hours of debugging, triaging, and executing root cause analysis during the worst production bugs and SEV incidents, where you’ll need to scour hundreds of files to find a needle-in-a-haystack error.

The value isn’t only in saving time; it’s also in upskilling as an engineer. Solid logging practices frame your thought process around not just a single run of execution on a local machine, but how your program interacts with multiple machines, failure modes, and assumptions. Better logging naturally engenders better readability and a better understanding of system design.

What would a junior engineer log?

It’s not that junior engineers can’t write good logs, but they typically author log statements from the frame of reference of single executions on a local machine – typical of a classroom setting. They’re reminiscent of println(debug) or console.log(debug) statements – useful, but not well-suited for enterprise-grade production applications.

println("Noticed bad request {request}").

The debug statement – here, a println() statement – is not informative enough. Suppose, for example, this request traces across multiple machines, triggering calls to internal microservices or the storage and retrieval of database-held enterprise records. An external developer reading this log naturally has a lot of questions.
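
To make this concrete, here’s a minimal sketch of that junior-style log sitting inside a request handler. The Request class and handler are hypothetical stand-ins – nothing here comes from a real codebase – and note how the print statement carries no severity, timestamp, request ID, or error code.

// Hypothetical sketch of the junior-style log in context.
// The Request class and handler are illustrative only.
class Request {
    final String body;
    Request(String body) { this.body = body; }
    @Override public String toString() { return "Request[body=" + body + "]"; }
}

class JuniorHandler {
    void handle(Request request) {
        if (request.body == null || request.body.isEmpty()) {
            // Fine on one local run; nearly useless across many machines
            // and thousands of concurrent requests.
            System.out.println("Noticed bad request " + request);
        }
    }

    public static void main(String[] args) {
        new JuniorHandler().handle(new Request(""));
    }
}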

What clarifying questions would a senior engineer ask?

  1. Request protocol – Is it an HTTP, RPC, or another protocol that’s failing?
  2. Enterprise assets – What enterprise assets does the request correspond to? Is it a request for customer records retrieval?
  3. Request type – What request type is failing? Is it a failing /GET, /POST, or /DELETE?
  4. Error Code – Did I expect to see error codes? Should I be seeing a 4xx error code? Is the error code HTTP-specific or an enterprise-specific enum?
  5. Request Metadata – Do I need to see request metadata too? How do I correlate the request ID to the request body being sent?
  6. Timestamp of failure – What time was the request sent? I lack timestamp information at a granularity of HH:MM:SS.
  7. Location of failure – What machine, process, and thread did the request fail on? We don’t have notions of a PID ( process ID ) or TID ( thread ID ).
  8. Log severity – What’s the log severity? Should I halt program execution on a LOG::FATAL or dispatch a LOG::WARN to AWS CloudWatch logging and carry on with processing?
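
To tie the list together, here’s a hypothetical sketch of a structured log record whose fields answer those eight questions. The field names and the record type are illustrative, not a real logging library’s schema.

import java.net.InetAddress;
import java.time.Instant;

// Hypothetical sketch: one structured log record whose fields answer the
// eight clarifying questions above. All names are illustrative.
record StructuredLogRecord(
        String severity,      // 8. e.g. "FATAL" or "WARN"
        Instant timestamp,    // 6. time of failure
        String host,          // 7. machine ...
        long pid,             // 7. ... process ...
        long tid,             // 7. ... and thread the request failed on
        String protocol,      // 1. HTTP, RPC, ...
        String asset,         // 2. e.g. "customer-records"
        String requestType,   // 3. GET, POST, DELETE
        int errorCode,        // 4. e.g. 429, or an enterprise-specific enum value
        String requestId,     // 5. correlates the log line to the request body
        String message) {

    public static void main(String[] args) throws Exception {
        StructuredLogRecord entry = new StructuredLogRecord(
                "WARN", Instant.now(), InetAddress.getLocalHost().getHostName(),
                ProcessHandle.current().pid(), Thread.currentThread().getId(),
                "HTTP", "customer-records", "GET", 429, "req-1234",
                "Rate limit exceeded while retrieving customer records");
        System.out.println(entry);  // records get a readable toString for free
    }
}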

What would a senior engineer log?

Alright, we’ve gone over the clarifying questions. Let’s analyze how a senior developer would log a failed request, and what makes it better.


splunkLogger.log(LOG::FATAL, "Attempted to store customer financial record into database column {RESOURCE_URN} on database platform {DATABASE_PLATFORM}. Encountered malformed user request {REQUEST.METADATA} at timestamp {TIMESTAMP}. Failing on HTTP error code {HTTP_ERROR_CODE}.")
  1. splunkLogger.log – The use of a pre-existing logging framework library – such as those from third-party tools like Splunk and New Relic – specifies (A) the logging software and (B) the destination of the logs
  2. LOG::FATAL – incorporates an enum/library-based severity level
  3. store customer financial records – explains the enterprise assets/business logic under execution in a verbose message
  4. RESOURCE_URN – informs the level of failure: database > schema > table > column
  5. DATABASE_PLATFORM – identifies the database platform ( e.g. SQL, Amazon DDB )
  6. REQUEST.METADATA – dumps a lightweight representation of the failing request, which can include other useful data such as customerId and customerName. We could scope down to just request.uid, but object hydration may be lacking.
  7. TIMESTAMP – informs the time of failure
  8. HTTP_ERROR_CODE – informs the error type ( e.g. HTTP 404 for Resource not Found cases )
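
The splunkLogger call above is pseudocode, so as a rough sketch of what it might look like in real code, here’s a version using the JDK’s built-in java.util.logging (whose closest level to FATAL is SEVERE). The class, method, and parameter names are hypothetical stand-ins for the placeholders in the message.

import java.time.Instant;
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch of the senior-style log using java.util.logging.
// resourceUrn, databasePlatform, requestMetadata, and httpErrorCode are
// hypothetical parameters standing in for the {PLACEHOLDERS} above.
class RecordWriter {
    private static final Logger LOG = Logger.getLogger(RecordWriter.class.getName());

    void storeFinancialRecord(String resourceUrn, String databasePlatform,
                              String requestMetadata, int httpErrorCode) {
        // Failure branch: the write was rejected, so log before returning the error.
        LOG.log(Level.SEVERE, String.format(
                "Attempted to store customer financial record into database column %s "
                + "on database platform %s. Encountered malformed user request %s "
                + "at timestamp %s. Failing on HTTP error code %d.",
                resourceUrn, databasePlatform, requestMetadata, Instant.now(), httpErrorCode));
    }

    public static void main(String[] args) {
        new RecordWriter().storeFinancialRecord(
                "urn:db:payments:ledger:balance_cents", "Amazon DDB",
                "{requestId=req-5678, customerId=c-42}", 400);
    }
}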

Are there other benefits to logging?

There absolutely are! It’s not just about saving developer cycles and resources while pre-empting future edge cases. There’s more!

  1. Enables fuzzy or exact search – external developers can comb through log files and type in a few keywords to immediately locate an issue – especially across a collection of timestamp-ordered logfiles isolated to a single machine.
  2. Collection and aggregation of log metrics – do we need a count of how often we run into HTTP 404 errors? Do we need a count of how many LOG::FATAL versus LOG::ERROR messages we’re seeing, and to possibly reconsider log level granularity? Do we need to store these metrics in a database as part of upcoming business needs? A short sketch after this list shows one way to derive such counts.
  3. Best practices standardization – if developers in a team or a company agree on unified logging practices and standards, shipping, reviewing, and debugging code speeds up.
  4. Formatting eases filling in information – logging practices which follow well-defined formats enable engineers to quickly backfill debug, warning, error, and fatal statements with the correct granularity of data.
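
As a hypothetical illustration of benefit #2, the sketch below tallies LOG::FATAL, LOG::ERROR, and HTTP 404 occurrences from a shared-format log file. The file path and the substring matching are purely illustrative – a naive stand-in for what a real aggregation pipeline or Splunk query would do.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical sketch: once log lines share a format, a small script can
// aggregate them into metrics. The path is illustrative, and the naive
// substring checks assume each line carries tokens like "LOG::FATAL" or "404".
class LogMetrics {
    public static void main(String[] args) throws IOException {
        Map<String, Long> counts = new TreeMap<>();
        try (Stream<String> lines = Files.lines(Path.of("/var/log/app/service.log"))) {
            lines.forEach(line -> {
                if (line.contains("LOG::FATAL")) counts.merge("fatal", 1L, Long::sum);
                if (line.contains("LOG::ERROR")) counts.merge("error", 1L, Long::sum);
                if (line.contains("404"))        counts.merge("http_404", 1L, Long::sum);
            });
        }
        counts.forEach((metric, count) -> System.out.println(metric + " = " + count));
    }
}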

The Silver Lining

I’ve made the case for logging, introducing its many benefits across the domains of observability, monitoring, tracing, and debugging. The silver lining: the earlier and more frequently engineers practice good logging, the faster and more natural it becomes. It soon feels natural to think about good logging while writing source code, before it ever reaches code reviewers. And like flossing, it’s better to learn this skill when starting out your tech career than years down the road, when bad habits have already developed.
