harisrid Tech News

Spanning across many domains – data, systems, algorithms, and personal

  • PERSONAL – Prepping for Interviews – marathon runners beat sprinters

    “They have hunted like this for countless generations” – David Attenborough, The Intense 8 Hour Hunt, on the San Bushmen of the Kalahari Desert

    How I Ran a Marathon

    Hi all,

    I want to write another blog post – this time, on effective study habits ( because I’ve actually been asked this before by a couple other folks ).

    Ok, so let’s begin. When I practiced Leetcode/algorithmic problems, I kept consistent with a predictable, regimented routine of 15 to 30 minutes daily, every day of the week ( yep, that often included Saturdays and Sundays ). Now of course, there were days or weeks where I didn’t do problems ( trust me, I did not solve problems on days when I was sick, days spent on vacation with family, my weekend hiking day trips, or family quality time ).

    But here’s the thing.

    I’ve never sprinted and crammed!

    I’ve never been like other engineers out there who talk loudly about spending three intense months practicing 3-4 hours per day. I could never see myself doing that. I am the type who burns out after around the 30-to-45-minute mark ( except on a good day, where I put in two fragmented 30-minute sessions, or the days I’ve done a technical onsite, which operates differently ). Everyone is “wired and built differently”, and to be candid, I am not built for this type of intensity.

    But I am good at being persistent. And I think there are benefits to the slow-burning approach over the intense one. I’m confident that my volume is the same – to quickly tabulate the equivalencies, 3 hours daily over 2 months = 1 hour daily over 6 months = 30 minutes daily over 12 months ( assuming a month is 30 days ).
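    As a quick sanity check of that tabulation, here’s a throwaway back-of-the-envelope script ( `total_hours` is just my own helper name ):

```python
# Back-of-the-envelope check that the three schedules carry the same
# total practice volume, assuming a 30-day month.
def total_hours(hours_per_day, months, days_per_month=30):
    return hours_per_day * months * days_per_month

plan_a = total_hours(3, 2)      # 3 hours/day for 2 months
plan_b = total_hours(1, 6)      # 1 hour/day for 6 months
plan_c = total_hours(0.5, 12)   # 30 minutes/day for 12 months

assert plan_a == plan_b == plan_c == 180  # all three land at 180 hours
```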

    And I think this approach actually turned out much better. Let me review them!

    The Benefits

    1. Leverages Spaced Repetition – There are more days of practice, and more time in between each practice session to review and go over concepts. You’re less likely to forget what you learned if you have more time to learn the material ( versus learning it all at once ).
    2. Accrues More Thinking Time – if you space out your studying, you get more thinking time. All that other subconscious daily time – the shower, a walk, talking to friends – also becomes available over a long-term horizon. You get to leverage more of the time in your life, ironically, toward finding solutions to your problems – subconscious and conscious.
    3. Reduced Burnout Probability – studying concepts intensely typically leads to burnout. Spaced repetition and small units of practice are less mentally taxing, so staying persistent is easier.
    4. Higher Success Rate – the more time you have at your disposal, the higher your probability of finding a solution to a problem. You’re also more likely to solve 1-2 problems in a day than to solve five or six problems in three hours ( they used to do this in competitive programming, and it’s probably why I thought I was bad at it in my first year of college ).
    5. Builds Up Healthy Studying/Practice Habits – studying and practicing interview skills isn’t something everyone “naturally does” – most of us ( because there are exceptions ) are not born like this ; it takes time to build the habits and the muscle memory to get into the groove of practicing problems. I think it’s because I spaced out my practice patterns over the long run that I always found it easy to immediately “get back into the zone”, as they say, and bust out a problem if I needed to practice again for interviews ( or do them for fun ).

    The Drawbacks ( which aren’t much IMHO )

    1. You ( may ) have to study ( a bit ) over weekends or holidays – the approach doesn’t give much headway for “free days”. But think about the other benefits ( and I’m not asking that much time from you – trust me, I’m just as hedonistic and I want you to enjoy living ). Also, my approach included spaced repetition practice on weekends, but I could’ve stuck to and enforced boundaries of Monday-Friday only.
    2. You’re going to start thinking more analytically – which is a good thing in disguise. I mean, the more time you spend doing smart things, the smarter you get ( that seems to be the right adage )?
  • PERSONAL – Igniting Passion : What Inspired you to write your posts?

    “What really lights you up” – a good friend 🙂

    That’s an excellent question!

    Reasons and motives are aplenty – building a brand, monetization of secondary passive income streams supplementing active income, and practicing or honing my writing skills and my communication skills ( the written word is harder than the spoken word ). Sure, these are some of my personal motivators. But I have other motivators too – more intrinsic ones.

    The Mentor-Mentee Relationship

    I’ve always held strong mentor figures in my life as admirable – there’s something profoundly amazing about mentor-mentee relationships, where an older master of a domain teaches a younger student skills and shares the wisdom and learnings accrued along the way. Back in my undergraduate days – the days of learning and the days of research – I felt amazed at how passionate, inspirational professors who really knew their craft like the back of their hand could get their students to learn challenging skills. Some were so good that it felt like the time I spent with them literally made me “smarter” ( as the dictum goes, iron sharpens iron ). The time I spent with them still shapes how I like to approach work and how I like to operate as an engineer – to always be curious and interested, and to make sure to ask good questions.

    To Disseminate my Accumulated Wisdoms

    Presuming they are, that is to say :-P.

    And personally, I want to share and disseminate the knowledge I’ve learnt along the way to the next generation of engineers. I think in the long run, writing a blog ( perhaps even a future O’Reilly-esque book or two ) would prove far more personally fulfilling. I’ve spent quite a number of years deep in the dense thickets of software engineering, wearing hats across multiple domains – infrastructure, DevOps, algorithms, systems, product road-mapping, and leadership. I’m personally antagonistic to witnessing the totality of my accumulated knowledge either (a) go to waste or (b) remain forever locked and isolated within a couple of individuals across the teams I’ve worked on. Provided that anything I share doesn’t leak private or confidential information, I’m open to getting it out into the world.

    A Story of Teaching and of Learning

    There’s also something amazing about teaching – and that’s making someone else in the world bright.

    Let me be a raconteur – a narrator – and tell you a story. The setting is a Middle Eastern restaurant back in Menlo Park, California, and it’s a cold, dry night : typical of the Bay Area’s weather around that time of year. Fortuity had it that my practice buddy was able to meet up that night for a meal. I met him while I was preparing for my technical interviews – I used to practice mock 1:1 sessions with peers over algoexpert.io and pramp.io ( I made some lifelong friendships too ! ). He was one of the select few with whom I engaged in consistent back-and-forth 1:1 practice sessions.

    At the end of our meal, he said something along these lines :

    “I learned a lot from you.”

    and hearing that was meaningful. It communicated something to me. It communicated that the time I spent in someone else’s life made a positive difference. That I managed to help someone get somewhere in the world ( in this case, a big tech company offer letter ). If someone tells me “I learned a lot from you”, it tells me that I did my job well. Those words are a powerful thing to say to someone else.

  • TOTW/13 – Don’t just return error codes on API methods. Please return thoughtfully-written error messages too.

    “Drill down”

    In the beginning, there’s just ERROR codes

    Hi all,

    I want to write a post inspired by real-life developer work that took place back at Capital One.

    Alright, let’s imagine that junior engineer Andrea needs to write APIs for objects held in the Enterprise data layer. She’s been tasked with writing APIs so that internal developers on other teams can execute CRUD – create, read, update, and delete – operations on Enterprise objects. The APIs not only enhance scalability, but allow for further processing and verification steps ( e.g. ACLs, rate limiting, resource sharing ).

    When she initially wrote the APIs, she did think about error handling, but only from the scope of HTTP status codes ( see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes ), such as the 400 Bad Request error. This error exists for cases where backend servers cannot process requests due to possible client-side errors ( e.g. badly formatted or deceptive inputs ).

    In the initial version, everything seems dandy and works fine. And in the context of a standalone web application with a single client and a single server, it’s ( mostly ) sufficient to return numeric status codes. But problems arise later.

    Trouble brews months later !

    Alright, so senior engineer Javier has to develop out his team’s application, which operates on the same set of Enterprise objects. This means that his application behaves as a client and executes Andrea’s API calls, exposed as conventional HTTP /GET, /POST, /PUT, and /DELETE methods. During client-side development, Javier recognizes that he needs to introduce error-handling paths ( e.g. halting his program, or engaging in graceful degradation and capturing bad payloads in a dead letter queue ) in the event of 4xx errors when executing /GET calls.

    But there’s a gotcha.

    He’s running into a problem: he gets a 4xx, but it turns out that the client-side payload is actually well-formed. Everything is correct there. What’s going on? 4xx should be a client-side issue, right? I clearly shouldn’t be failing a request – or even the processing of my events – if things are all good here.

    Yes … and no

    So it turns out that the payload is correct, BUT the Enterprise asset the payload operates on is stale. The assets themselves are out-of-date by a couple of months, and underneath the hood, that /GET call executes a SQL query to filter and retrieve an asset based on unique primary keys.

    And the problem is both the scaffolding – everything around the query – and the SQL query are “technically” correct. It’s actually an issue with the underlying database not being up-to-date ( due to a laundry list of reasons I will avoid delving into 😛 ). Perhaps a Database admin or other user forgot to purge out old, stale records.
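    For illustration only – the real table and column names are unknown to me, so everything below is hypothetical – the lookup behind that /GET might resemble a primary-key filter like this, where a perfectly correct query still returns nothing once the record has gone stale or been purged:

```python
# Hypothetical sketch of the lookup behind the /GET call, using an
# in-memory sqlite3 database to stand in for the Enterprise store.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enterprise_assets (asset_id TEXT PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO enterprise_assets VALUES ('asset-123', '{}')")

def fetch_asset(asset_id):
    # Filter and retrieve a single asset by its unique primary key.
    row = conn.execute(
        "SELECT asset_id, payload FROM enterprise_assets WHERE asset_id = ?",
        (asset_id,),
    ).fetchone()
    return row  # None when the record was purged or never loaded

assert fetch_asset("asset-123") is not None
assert fetch_asset("asset-999") is None  # well-formed request, missing row
```

The second call is exactly Javier’s situation: nothing about the request is wrong, yet the server has nothing to return.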

    So tell me, what’s the hot-fix?

    After some back-and-forth conversations, Javier and Andrea recognize that a new version of the APIs, with modifications, is needed in production. Returning the 4xx is correct, but we need more information. We need a short, 150-character, tweet-sized textual error response as well ( and maybe even an additional API enum code ). The response and code should reveal more information, such as :

    • “Failure code <A> : Did not locate Enterprise objects; client-side payload is valid.”
    • “Failure code <B> : Located Enterprise objects; badly-formed client-side payload.”
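    A minimal sketch of what such a response body could look like – `FailureCode` and `make_error_response` are hypothetical names of mine, not the actual API:

```python
# Sketch of a 400 response carrying both an enum failure code and a
# short, tweet-sized human-readable message (names are hypothetical).
from enum import Enum

class FailureCode(Enum):
    STALE_OR_MISSING_ASSET = "A"   # payload valid, object not located
    MALFORMED_PAYLOAD = "B"        # object located, payload badly formed

def make_error_response(code, message):
    assert len(message) <= 150, "keep error messages tweet-sized"
    return {
        "status": 400,
        "failure_code": code.value,
        "error_message": message,
    }

resp = make_error_response(
    FailureCode.STALE_OR_MISSING_ASSET,
    "Did not locate Enterprise objects; client-side payload is valid.",
)
```

The client can now branch on `failure_code` instead of guessing what a bare 400 means.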

    Andrea wants to change the APIs, but changing APIs requires extensive approval and testing. As such, she arranges a meeting not only with Javier, but with his team’s product owners and other senior engineers to generate “buy-in” and get approval to release the same /GET call, but on a version two form that returns more information.

    After this release, Javier’s project gets unblocked. He returns to his client-side code and adds the error-handling paths, keyed not only on the 4xx status code, but also on the textual error response and the additional API enum failure code. His failure-handling code resembles the following structure :

    if HTTP.STATUS_CODE == 4XX:
        if HTTP.STATUS_TEXT.FAILURE_CODE == <B> or HTTP.STATUS_TEXT.STR == "Located Enterprise objects, but ... ":
            logger.LOG_ERROR("Ran into malformed payload - delay processing of events.")
            logger.LOG_FATAL("Shutting down event stream processing")
            return
        elif HTTP.STATUS_TEXT.FAILURE_CODE == <A> or HTTP.STATUS_TEXT.STR == "Unable to locate Enterprise object; client-side payload is good.":
            logger.LOG_ERROR("Ran into stale asset case with assetId={assetId}. Logging into DLQ {DLQ_id} with name = {DLQ_name} and topic = {DLQ_topic}.")
            DLQClient.appendToDLQ(DLQ_id, DLQ_name, DLQ_topic, assetId)
            continue  # skip this event and move on to the next

    The real-life story

    In this scenario, I’m senior engineer Javier. I spent a couple of hours deep in debugging systems, triaging the root cause of failed Kafka stream events. I had to determine the cause of failure for events that stalled event stream processing – was the failure attributable to the staleness of downstream assets ( server-side issues ) or to malformed inputs ( client-side issues )? For the server-side issues, I introduced custom Cloudwatch-level logging – with 90 days of retention – which enabled human-in-the-loop intervention on stale assets. As for client-side issues, I biased towards graceful degradation and halting the event stream; I preferred notifying upstream producers as soon as possible that they sent malformed payloads.

    This logging had repercussions – I worked with an event stream processing over 1 million events/second, and too many stalls meant that events remained unprocessed and backed up. The probability of stale-asset events exceeded the probability of malformed client payloads ( I don’t have a specific multiplier for how much, but intuition intimated it ). By changing error handling, I reduced the frequency of halting event stream processing, thus enabling the processing of more customer traffic.

    The Challenges I ran into

    There were a good number of challenges I ran into with the design – let me review them :

    • Generating buy-in and consensus – I had to get other senior engineers, a staff engineer, and my two direct managers to agree on the architecture & design approaches. I set the agendas and led the meetings.
    • Communicating unexpected delays and blockers – #TODO
    • Spending time to look into alternatives to pre-empt future issues – #TODO
  • PERSONAL – The Plagiarism Post – If someone has concerns of content being plagiarized, let me know ASAP!

    All views expressed here are solely my own

    Always Give Credit where Credit Is Due!

    ( https://www.youtube.com/watch?v=dPtH2KPuQbs ) 🙂 !

    Also, to avoid plagiarism, legal claims, litigation, court battles, or accusations that content is unoriginal or not properly attributed, I assert the following statements and dictums :

    1. That there are many other folks who may or may not have contributed ideas presented in these blog posts, implicitly or explicitly, and for the purpose of journalistic integrity, I will not disclose their names here ( unless I explicitly ask them for permission to put this under disclosure ).
    2. All views expressed here are solely my own. I do not intend to impose my views on others.
    3. I do not intend to profess 100% accuracy of information disseminated – if you feel that information is inaccurate, reach out to me and I will seek to correct errata or misinformation.
    4. If you believe – or have evidence – that an article has plagiarism, reach out to me ( using my provided contact information ) and I’ll look into addressing your issues personally. This may entail (a) amending citations, (b) amending sources to attribute your name accordingly, or (c) removing and moderating content. I do not wish to pursue legal battles – private settlements, third-party arbitration with a neutral party, or even a personal one-on-one session can remove much unneeded hassle. Nonetheless, please present STRONG evidence to back up and undergird your claims – I want to avoid seeing anything unsubstantiated.
    5. Any traces of posts which may have involved development done in a work environment have all enterprise-specific information removed and redacted ( to the best extent possible ). All content is written to be as generic as possible. There is no leakage of sensitive information, PII ( Personally-Identifiable Information ), or SDEs ( sensitive data elements ).
    6. Names of individuals in stories were chosen to represent a crowd as diverse as possible – spanning ages, genders, ethnic origins, national origins, and every other category under the sun. They are inspired by people I met in real life ( because a bank of names is hard to provision – even parents spend a lot of time figuring out how to name their kids 🙂 ).
    7. The set of legal disclaimers written here is last updated as of Monday, April 7th, 2025, and is by no means comprehensive. I’ll update them accordingly.

    Legal Disclaimers

    General Disclaimers:

    References

  • TOTW/1 – Create Wrapper Methods around API Calls – separate out your API logic from your non-API logic.

    An Intro

    Hi all,

    I want to give background on a frequently-encountered situation that I’ve seen across a few software engineering places.

    Alright, what’s the story?

    Junior engineer Kartono needs to quickly code up a feature, which involves the retrieval of GraphQL entities to operate on enterprise assets. He’s developing a brand-new business workflow, and the pseudo-code for his changelists resembles the following structure :

    def execute_business_workflow(...):
        ...
        unorganized_preprocessing_steps()
        graphQLClient = makeGraphQLClient(params)
        graphQLResponse = graphQLClient.fetchData(datasetParams)
        unorganized_postprocessing_steps()
        ...

    This is a good start, BUT senior engineer Quorra notices a couple of changes that can be put in to make the code better. Quorra immediately gets into her mentoring mode, starting with a few observations.

    Quorra’s Observations

    • Logging Posture – There’s a major lack of logging logic. If I have to onboard a new database type and execute a business workflow to verify that it’s working, how do I tell how much positive progress I’m making on the code? Which step am I on? Am I able to get through all the pre-processing steps? Or am I stuck on some odd step in the unorganized post-processing steps?
    • Profiling – What if I need to profile the speed of function execution and methods? If there’s latency issues coming up in calls to execute_business_workflow(), how do I filter and triage the location of performance degradations or failures? What part of that workflow behaves slowly? Is it in the pre-processing steps, the API calls, or the post_processing steps?
    • Single Responsibility Principle – that graphQLClient really doesn’t belong in the entirety of executing a business workflow. Can we factor it out?
    • Evolvability – what if I suddenly change my APIs ( e.g. today we’re executing GraphQL calls, but in a couple of weeks, we might be executing HTTP calls )?

    What does better code resemble ?

    def executeGraphQLCallWrapper(params):
        graphQLClient = makeGraphQLClient(params)
        graphQLResponse = graphQLClient.fetchData(datasetParams)
        return graphQLResponse


    def execute_business_workflow(...):
        preprocessing_steps()
        graphQLResponse = executeGraphQLCallWrapper(params)
        postprocessing_steps()

    In this piece of code, we’ve (1) introduced functional decomposition and (2) isolated the entirety of the GraphQL flow into its own wrapper call. We’re going to get a multitude of benefits, such as :

    • Faster debug time – I can introduce a couple of log lines at the start and end of each method call and see if I entered or exited the function. I can also introduce logging lines at the granularity of code within those functions, to get a deeper view into which step fails
    • Profiling Latency Delays – suppose invocations of execute_business_workflow() consume 2.1 seconds, and I need to meet a tight Enterprise SLA of 1.5 seconds. Which step is slow? Because we introduced functional decomposition, I can quickly put start-stop monotonically-increasing timers in the three functions. Let’s suppose this breakdown :
    • preprocessing_steps() [ 0.3 seconds ]
    • postprocessing_steps() [ 0.1 seconds ]
    • executeGraphQLCallWrapper() [ 1.7 seconds ]

    Alright, looks like the graphQL calls are taking up the most cycles – maybe we should throw a cache or other pre-computation structures here?
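    Those start-stop timers can be sketched as a small decorator – `timed` is a hypothetical helper of mine, built on Python’s monotonic clock, not part of any framework mentioned here:

```python
import functools
import time

def timed(fn):
    """Wrap fn with start/stop monotonic timers and log the elapsed time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            print(f"{fn.__name__} took {elapsed:.3f} s")
    return wrapper

@timed
def preprocessing_steps():
    ...  # placeholder for real work

preprocessing_steps()
```

Decorating the three decomposed functions gives you the per-step breakdown without touching their bodies.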

    What’s my Takeaway?

    Your takeaway is quick – functional decomposition governs good coding practices, and partitioning out the functions that introduce network calls or external dependencies will help you expedite your software development 🙂 !!!

  • TOTW/2 – Playing Detective and Debugging API Calls: Essential Tips for Developers

    But why should I care about this stuff? Isn’t it enough to write good code and assume released APIs work as expected?

    Hi all,

    I want to share a couple of tips-and-tricks up my sleeve to help you debug your API calls when you’re testing out your applications.

    Frequently, you’ll run into needing to test API endpoints – typically /GET or /POST calls – for a customer-facing transactional application, for sending or retrieving data from an Enterprise data tier, or for triggering an event/action to an existing system. You’ll typically leverage open-source software or licensed-products such as ThunderClient ( VSCode ), Postman, or Insomnia to create small configurations to test API calls.

    A Street-Fighting Checklist of Good Tips-and-Tricks – the Concise Version

    1. Check VPN Connectivity
    2. Check Environment configurations
    3. Inspect SSL/Certificate Issues
    4. Enable/disable SSL verification
    5. Check API call layers
    6. Check certificates, CA Certificates, and certificate paths
    7. Headers check
    8. Check the parameter string / query correctness
    9. Install and update latest certificate libraries
    10. Install and update the latest language SDKs and libraries
    11. Leverage sandbox/isolated environments/containers ( Docker )
    12. Introduce isolation levels
    13. Verify payload correctness and comprehensiveness
    14. Check security information
    15. Set the correct method type
    16. Passwords and Accounts Check
    17. Compare to other calls
    18. Sync with other developers : brainstorm and get ideas
    19. Check certificate environment variables
    20. Leverage CoPilot/AI textual tools
    21. Generate code from Testing Clients ( e.g. Postman )

    A Street-Fighting Checklist of Good Tips-and-Tricks – with explanations

    1. Check VPN Connectivity
    2. Check Environment configurations – e.g. passwords ( DB_PASSWORD ) or token values. Check that the value is correct for the environment and that the environment itself is correct ( e.g. DEV, QA, PROD )
    3. Introduce isolation levels
      • Network isolation – test if the issues persist when VPN connectivity is on versus off.
      • Client-side versus server-side isolation – if there’s an issue, ask if another developer can execute the same API calls from their machine. If they can execute the call, the issue is server-side – otherwise, there’s still the possibility of client-side issues.
    4. Verify payload correctness and comprehensiveness
      • Oftentimes, payloads can be malformed – key:value pairs are incorrect or not comprehensive enough. Check that payloads are fully correct.
    5. Check security information
      • For AuthN, AuthZ, and other security reasons, API calls usually involve a subset of headers or BEARER tokens. Verify correctness
    6. Inspect SSL/Certificate Issues
      • Issues could be due to the SSL ( Secure Sockets Layer ) protocol, used for network encryption. Sometimes, you’ll be able to disable SSL and have requests flowing. But other times, you’ll need to update client-side or server-side certificates ( or even CAs : Certificate Authorities ).
    7. Enable/disable SSL verification – setting this parameter to FALSE can enable us to bypass client-side or server-side certificate checks. It’s not recommended ( we lose encryption in transit ).
    8. Adding certificates to library paths – you may need to append server-side certificates on the library path your client-side applications recognize. This may entail appends to your cacert.pem files ( or others ). Can we experiment around and manually specify paths such as certificate_path or ca_certificate_path? Leverage libraries like Python’s certifi.where() method to view where your virtual env/runtime loads certificates.
    9. Underlying Queries – are your queries correct? API calls include queries – either in their parameter string ( following the ? symbol ) or in request body payloads. The call itself can be correct, but the query executed can be incorrect, and you might be misinterpreting a 4xx error.
    10. Experiment with isolated environments – if applications don’t run locally, can we run them in a more isolated environment? A container is a good example, where we can set up a barebones minimal configuration. This isolates and helps us detect whether misconfigurations arose on local machines.
    11. Check your language’s SDKs and Libraries – maybe you need to upgrade or change the name of a web methods library ( e.g. requests for Python ). See commands like python3 -m pip install <latest_library>
    12. Check API call layers – e.g. say you’re working with a REST client to contact another part of an Enterprise, but before doing so, you first need to connect to said Enterprise. Can we diagnose the underlying connection ( the layers below )? Issues can be upstream versus downstream.
    13. Set the correct method – you’d be surprised how often a dev struggles with an API call because they’re executing a /GET that should be a /POST, or even a /POST that should be a /PUT. This happened to me on an attempt to execute /GET calls to GraphQL that should have actually been /POST calls.
    14. Headers check – There’s a good number of headers – Authorization, Content-Type, Accept, Content-Encoding, Keep-Alive, Accept-Language – passed on API calls. Make sure they are correct ( even ask if they’re needed – a lot of headers are superfluous and can be deleted ).
    15. Passwords and Accounts Check – passwords are frequently rotated ( typically on a 90-day period ) within organizations.
    16. Compare to other calls – if your current call doesn’t work, test two other calls whose structures share similarity. Like a puzzle, make incremental changes until your call starts working ( or still doesn’t ). We can build off of what already works on the client side.
    17. Certificate environment variables – explicitly set the paths to certificates in your code ( or your local environment ).
    18. Sync with other engineers – every engineer brings their different domain knowledge and ways of looking at problems. You might not see what another engineer sees, and vice versa.
    19. Leverage Co-Pilot/AI Tools – we can provide code context on what an issue can be and get (nearRT) feedback within 1-2 minutes of what’s going on. On each iteration, add more debug and logging output – provide more context and help the AI tool output more accurate results.
    20. Client-code generation – manually-developed calls have a high success rate in Postman or Thunderclient. What if we use their code auto-generation capabilities to write up quick unit tests?
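    To tie a few of these together, here’s a minimal, hedged sketch of items 5 ( security information ), 8 ( explicit certificate paths ), and 14 ( headers check ). `build_request_kwargs` is my own hypothetical helper, and the token and CA bundle path are placeholders, not real credentials:

```python
def build_request_kwargs(token, ca_bundle_path):
    """Assemble kwargs you might pass to e.g. requests.get(url, **kwargs)."""
    return {
        "headers": {
            "Authorization": f"Bearer {token}",   # item 5: BEARER token present
            "Accept": "application/json",         # item 14: only needed headers
        },
        "verify": ca_bundle_path,                 # item 8: explicit CA bundle path
        "timeout": 10,                            # fail fast instead of hanging
    }

kwargs = build_request_kwargs("example-token", "/etc/ssl/certs/ca-bundle.pem")
```

Making the CA bundle an explicit argument ( rather than whatever the runtime picks up ) is exactly the kind of isolation the checklist is pushing for – in practice the path might come from certifi.where() or a REQUESTS_CA_BUNDLE environment variable.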

    More Best Practices

    1. Folder structures – introduce a folder structure to API calls. Start organizing and grouping them. E.g. if you have API calls ( /GET and /POST ) for application one and for application two, introduce two folders with corresponding titles.
    2. Upload your collections – surprisingly, you’ll also run into the scenario where multiple developers on a team use the same set of configurations, so go ahead and create collections and then share them in a central location!!!
    3. Create environments – creating environments ( for DEV/QA/PROD ) and setting env-specific configurations ( e.g. passwords or hostnames ) is useful for quick testing. We can avoid creating duplicate API call tests if we leverage global configs.
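    The env-specific configuration idea can be sketched like this – the hostnames, the ENVIRONMENTS map, and the APP_ENV variable name are all hypothetical placeholders of mine:

```python
import os

# Hypothetical per-environment configuration; hostnames are placeholders.
ENVIRONMENTS = {
    "DEV":  {"host": "dev.example.internal",  "verify_ssl": False},
    "QA":   {"host": "qa.example.internal",   "verify_ssl": True},
    "PROD": {"host": "prod.example.internal", "verify_ssl": True},
}

def get_config(env_name=None):
    """Resolve the active environment, falling back to an APP_ENV variable."""
    env_name = env_name or os.environ.get("APP_ENV", "DEV")
    return ENVIRONMENTS[env_name.upper()]

config = get_config("qa")
```

One lookup per environment keeps the API test definitions themselves identical across DEV, QA, and PROD.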
  • TOTW/3 –  Boost your efficiency with Solid Documentation Practices

    Documents. Documents. Documents. Show me the documents! – some detective you see in a popular crime TV show

    Hey all!

    I briefly want to go over documentation, and why it matters! I’ve taken my turn across quite a number of big tech organizations, and I’m surprised how little we document. Yeah, I get it, documentation feels like flossing your teeth – there’s little technical value to it, because it’s mostly writing. But wait, there is value. There’s value in making it easier for other folks on your team – or across teams – to understand what’s taking place in your code. And there’s value in leaving a cleaner codebase and making it easy for your future maintainers and future developers to know what they are doing.

    But What the Heck Does Good Documentation Even Resemble? And across different levels?

    You know it when you see it.

    That’s a good question!

    Good documentation should be concise but comprehensive. For example, if I’m designing a system, I should be able to create a brief FRD – Functional Requirements Document – which presents a top-level view spanning a couple of sections : background, non-functional requirements, functional requirements.

    If I am developing code in a single microservice, per se, I should be able to open a quick ReadMe.md file explaining the feature – the business reasons for why it was built, key high-level methods, test cases, and detailed steps for unit tests and end-to-end tests.

    At the project/platform level, documentation needs less granularity. I need to see a slightly different ReadMe.md file. It should explain the business reasons justifying the project, a list of relevant microservices, a breakdown of data flows and workflows, identified dependencies, and owners/points of contact ( the individuals who develop and maintain the services ).

    The Many Benefits!

    • Authoritative Source of Truth – if a group of developers lacks consensus on what is a team best practice or a piece of the system, they can reference what’s been historically written and approved.
    • Centralization – documentation lends to the pattern of a central source of truth, and it’s easy for other personas to quickly locate pertinent information.
    • Expedited onboarding – the better the documentation, the less time a dev spends in onboarding. This ties in with reduced KT session frequency.
    • Reduced Knowledge Transfer ( KT ) Session Frequency – Devs don’t have to spend as much time learning tribal knowledge and asking other engineers clarifying questions on how a system operates or how to test it, if the information is already available in an easy-to-access form.
    • Shareability – builds atop centralization. Information can be quickly shared across multiple engineers or personas. We can reduce the number of clicks and web-page navigation depth.
  • TOTW/12 – System Designers and Architects, please be considerate to your supporting cast! Don’t just think about the stage actors!

    “I fight for the users!” – Rinzler

    Hi all,

    I want to share huge tips for system design – how to think about building for your actors, your personas, and your users. Because when we build enterprise-grade applications, we don’t just build for ourselves – we build for end customers, the employees of the industries we build for, third parties, auditors, and so on. And in those 40–50 minutes, knowing which end customers we build for heavily influences how the design evolves.

    What a junior / inexperienced candidate does

    Let’s imagine someone needs to build out a ride-sharing service. Ok, TC – the candidate – will probably think about two personas, drivers and riders, and how to match them. They’ll come up with business workflows such as:

    1. Riders can make a booking
    2. Riders can view their history of bookings
    3. Riders can cancel a booking
    4. Drivers can register and earn money on the platform
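
    The rider workflows above can be sketched as a minimal in-memory service – a hedged illustration where the class and method names are my own, not a prescribed design:

```python
from dataclasses import dataclass

@dataclass
class Booking:
    booking_id: int
    rider: str
    status: str = "active"

class RideService:
    """Toy in-memory stand-in for the rider-facing workflows."""

    def __init__(self) -> None:
        self._bookings: dict[int, Booking] = {}
        self._next_id = 1

    def create_booking(self, rider: str) -> Booking:
        # Workflow 1: riders can make a booking
        booking = Booking(self._next_id, rider)
        self._bookings[booking.booking_id] = booking
        self._next_id += 1
        return booking

    def booking_history(self, rider: str) -> list[Booking]:
        # Workflow 2: riders can view their history of bookings
        return [b for b in self._bookings.values() if b.rider == rider]

    def cancel_booking(self, booking_id: int) -> None:
        # Workflow 3: riders can cancel a booking
        self._bookings[booking_id].status = "cancelled"
```

    A real service would back this with a database and expose the methods as API endpoints, but the shape of the workflows is the same.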

    It’s a good start, and for the scope of 45 minutes, it’s justified. But what if I had more time, or the interviewer asks, “Who else might you build for?” Ok, let’s dive in!

    What a more senior candidate does

    A more experienced engineer recognizes that there are multiple parties in an organization:

    • Administrators – maintainers of web apps and databases
    • Business Analysts – execute ad-hoc analytical queries
    • Security Specialists – enforce rate-limiting and enterprise-wide policies
    • DevOps Folks and Developers – deploy changes, typically through CI/CD pipelines

    In the real world, we need to think about these other parties. And parties can come and go, or change. In that case, we also need to accommodate plug-and-play functionality and, consequently, microservices – services that can be plugged into an application through well-defined interfaces.

    Alright, so what would their workflows resemble?

    • Admin – a dedicated UI – or a separate portal, on an existing UI – to view all data, including sensitive data elements.
    • Business Analysts/Data Analysts – they need minimal interfaces to retrieve analytical data, such as moving averages, running sums, or aggregations over different time granularities ( days, weeks, months ). A good designer would introduce an OLTP→OLAP ETL pipeline with the sequence OLTP -> Read Service -> Kafka Queue -> Worker Pool Write Service -> OLAP. Afterwards, they’d talk about (A) leveraging the OLAP DB’s pre-existing query capabilities or (B) building a UI/API to access the data records.
    • Business Employees – think of Design a Hotel Reservation System. We need to build not just for customers, but also for hotel staff, who need to update booking information in some database ( e.g. number of hotels, hotel types, hotel room types ).
    • Content Creators – this comes up for questions like Design Netflix or Design YouTube. We can’t just think about flows for the end users ( the viewers ). It turns out there are complex pipelines and video chunkification that take place for the creators.
    • DevOps – We need to think about spinning up microservices to deploy the latest version of a binary or an ML model. Can I build tools so that developers can build and deploy the latest version of their code? Can we set up an ML model store ( using S3 or similar technologies )?
    • Financial Auditors and Regulators – you’ll see them come up for questions like Design a Stock Exchange or Design a Payments System. Financial ecosystems typically need a ledger-esque, immutable, append-only database showing a history of transactions – credits and debits.
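
    The analytics pipeline described above ( OLTP -> Read Service -> Kafka Queue -> Worker Pool -> OLAP ) can be sketched in-process. In this hedged illustration, an in-memory queue and plain Python lists stand in for Kafka and the OLAP store, and all names and data are made up:

```python
import queue
import threading

oltp_rows = [{"ride_id": i, "fare": 10 + i} for i in range(5)]  # fake OLTP data
events: queue.Queue = queue.Queue()  # stands in for the Kafka topic
olap_store: list = []                # stands in for the OLAP database

def read_service() -> None:
    """Publish each OLTP row onto the queue."""
    for row in oltp_rows:
        events.put(row)
    events.put(None)  # sentinel: no more rows

def worker() -> None:
    """Consume events and write them into the analytical store."""
    while True:
        row = events.get()
        if row is None:
            break
        olap_store.append(row)

t = threading.Thread(target=worker)
t.start()
read_service()
t.join()

# Analysts can now aggregate over the OLAP copy, e.g. a sum of fares:
total_fares = sum(r["fare"] for r in olap_store)
```

    A real worker pool would run many consumers and batch its writes, but the producer/queue/consumer shape carries over directly.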

    Good Questions to ask:
    1. Do we have to build UIs, or can we limit exposure to programmatic APIs?
    2. Do we need to add ACLs, security, AuthN, or AuthZ?
    3. Do we need to think about SMS/JWT/other token-based mechanisms?
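
    On question 3, a JWT-style token is just a signed claim. Here is a minimal HMAC-signing sketch using only the standard library – the secret and claims are made up for illustration, and production code should use a vetted library ( e.g. PyJWT ) rather than rolling its own:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # illustrative only; real secrets come from a vault

def _b64(data: bytes) -> bytes:
    # JWTs use unpadded URL-safe base64
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def sign_token(claims: dict) -> str:
    """Build a header.payload.signature token, HS256-style."""
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps(claims).encode())
    signature = _b64(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    return b".".join([header, payload, signature]).decode()

def verify_token(token: str) -> bool:
    """Recompute the signature and compare in constant time."""
    header, payload, signature = token.encode().split(b".")
    expected = _b64(hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest())
    return hmac.compare_digest(signature, expected)
```

    The payload is only encoded, not encrypted – anyone can read it – so the signature is what lets the service trust the claims without a database lookup.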

  • PERSONAL – The Blog’s Purpose

    I created my blog with multiple intentions in mind.

    Firstly – to disseminate technical information, interests, projects, and content to outside readers. Topics and themes encompassed include algorithmic interviewing, system design interviewing, niche architectural topics, useful command-line hacks, and others.

    Secondly – to share the knowledge and skills that I’ve accumulated over years of working across multiple industries and verticals in software engineering. At some point, I see myself mentoring and sharing more of what I know, as well as raising the next generation of technically savvy engineers to be competent enough to take on the world’s challenges.

    Thirdly – to encourage a knowledge-sharing world. At some point, I may or may not monetize all or a targeted segment of articles, but in its current form, the blog’s posts are free content.

    My Blog’s Value Statements

    I am open to any criticism or feedback regarding this blog too. I’m also open to commenters, readers, or viewers who want to share interesting articles or other forms of knowledge.

    If I do enable commentary, I want to abide by content moderation and keep all commentary positive. Humor is welcome, BUT I ( or an AI tool ) will flag comments with negative sentiment and ban users identified as mistreating others based on religion, national origin, gender, race, sexual orientation, and so on. All consumers of the material are to be treated equally and deserve fair and impartial treatment.

  • PERSONAL – Senior SWE – harder skills to master

    Tell me about a skill you’ve struggled to learn in your role. What’s been harder for you to master?

    That’s an excellent question!


    I think learning how to communicate to different audiences is a hard skill. It’s hard because a lot of us are, well, engineers – we’re used to communicating with other engineers. It’s a running joke that most of us spend our time “glued to our machines” – we’re deep in code. Not only that, but the majority of weekly meetings usually involve communicating with other engineers. It’s not that most of us don’t practice communication, per se, but most of that practice is in the technical domain. And getting practice talking to other parties is harder, because, well – there just aren’t as many opportunities.

    It’s very easy for engineers to understand and talk to other engineers. But talking to non-engineers – management and leadership, product owners, stakeholders, customers, and vendors – is harder.

    Engineers are very “detail-oriented” people – we tend to get “granular” to effectively deliver our points. That granularity serves a solid purpose during code reviews or design reviews, where deep technical explanations are needed to justify decisions or explain why teams abide by specific best practices. Engineers also bring deep thinking, excitement, and passion to their jobs – and as a consequence, they tend to start talking “super fast”.

    But when we talk to someone else in an org – someone who isn’t an engineer, someone who thinks, but doesn’t “think like us” – well, we need to communicate differently. We need to adopt a different form of communication, delivering the right level of context in the clearest, most concise way. There’s the old adage of the two-minute elevator pitch, or even “five minutes or I’m out”; it holds true – those first few minutes really matter when speaking to someone else.

    Let me share useful tips :

    • Top-Down View – focus on presenting the big picture first. The eye in the sky. The 30K-foot view. It’s tempting to give all the details right away, but can we save the details for later?
    • Concise – Keep your communication concise and present the right context
    • ELI5 – When in doubt, leverage the reddit-esque “ELI5” ( Explain-Like-I’m-Five ) approach or the Feynman Method.
    • Leverage Additional Communication Modes – Leverage visuals, tables, or diagrams in place of textual form.
    • Speak slower – remind yourself to consciously speak slower
    • Solicit Feedback – ask others for feedback on how to convey your messages. Join a Toastmasters session, an improv class, or a public speaking class.