harisrid Tech News

Spanning across many domains – data, systems, algorithms, and personal

SYSTEMS – Testing Strategies Used For Large Volume Financial Networks

A Primer

Hi all,

I want to briefly talk about my work back at S.W.I.F.T. SCRL, the Society for Worldwide Interbank Financial Telecommunication: the secure bank-to-bank messaging network that undergirds high-volume, cross-border payments, including FX (foreign exchange) transactions and their settlement. It was both my first job out of college and a place of tremendous learning and growth.

My first set of tasks involved QA (Quality & Automation) testing work: I had to verify that existing and upcoming financial systems met end-customer SLAs [1] in production, as measured by their SLIs [2]. That meant executing tests for two payload types: messages (the smaller) and files (the larger).

In production, the application processes an estimated [3] one million+ messages per day, and testing at that scale is difficult. Financial payloads also come with strict requirements (100% comprehensiveness, correctness, and determinism), so a handful of one-off tests can never justify the claim that “things work”. So what do we do instead?
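
One way to make “100% comprehensiveness, correctness, and determinism” concrete is a machine-checked reconciliation over every payload instead of a spot check. This is a hypothetical sketch, not a description of S.W.I.F.T.’s internal tooling: it assumes payloads are raw bytes, that a reference function (the illustrative `expected` parameter) can compute the correct output for each input, and it compares outputs as multisets of SHA-256 digests so dropped, duplicated, or altered payloads all surface. Determinism is checked by replaying the same inputs and requiring identical digests.

```python
import hashlib
from collections import Counter
from typing import Callable

# Hypothetical stand-in for the system under test: bytes in, bytes out.
Processor = Callable[[bytes], bytes]


def digest(payload: bytes) -> str:
    """SHA-256 content hash, so payloads can be compared as fixed-size keys."""
    return hashlib.sha256(payload).hexdigest()


def reconcile(inputs: list[bytes], outputs: list[bytes], expected: Processor) -> dict[str, int]:
    """Completeness and correctness in one pass: compare the multiset of
    expected outputs against the multiset of actual outputs."""
    want = Counter(digest(expected(p)) for p in inputs)
    got = Counter(digest(o) for o in outputs)
    return {
        "missing": sum((want - got).values()),     # expected but never produced
        "unexpected": sum((got - want).values()),  # produced but not expected
    }


def is_deterministic(inputs: list[bytes], process: Processor, runs: int = 2) -> bool:
    """Replay the same inputs `runs` times; every run must yield identical digests."""
    results = {tuple(digest(process(p)) for p in inputs) for _ in range(runs)}
    return len(results) == 1


def upper(p: bytes) -> bytes:
    """Toy 'processor' used only for this demo."""
    return p.upper()


payloads = [b"payload-1", b"payload-2"]
print(reconcile(payloads, [upper(p) for p in payloads], upper))  # {'missing': 0, 'unexpected': 0}
print(is_deterministic(payloads, upper))  # True
```

Because every payload is checked, a single altered output shows up as one `missing` plus one `unexpected` entry rather than slipping past a visual check.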

Identify what’s at our disposal

So which existing industry practices can we leverage? What testing tools do we have? Can we get creative and deviate from typical testing norms? Both human intuition and machine-based verification come into play.

Luckily, we had two types of pre-production environments available:

  1. A local dev environment (at the individual level)
  2. A shared benchmark environment (at the organization level) capable of executing closer-to-production-volume tests (around 100,000 payloads, close to 10% of production volume).
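
To put the two volumes side by side, here is a quick back-of-the-envelope sketch. It uses only the post’s own figures (an estimated one million+ messages per day in production, about 100,000 in the benchmark environment) and assumes, purely for illustration, that the benchmark volume is compared against a single production day and that traffic is spread evenly over 24 hours. Real traffic is burstier, so peak rates will sit above this mean, and because one million is a lower bound, the 10% coverage figure is an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def benchmark_coverage(benchmark_volume: int, production_daily_volume: int) -> float:
    """Share of one production day's volume that a benchmark run exercises."""
    return benchmark_volume / production_daily_volume


def average_rate_per_second(daily_volume: int) -> float:
    """Mean rate if the daily volume were spread evenly over 24 hours (an assumption)."""
    return daily_volume / SECONDS_PER_DAY


production_daily = 1_000_000  # "one million +" estimate, i.e. a lower bound
benchmark = 100_000           # approximate benchmark-environment volume

# prints "benchmark coverage: 10%"
print(f"benchmark coverage: {benchmark_coverage(benchmark, production_daily):.0%}")
# prints "average production rate: 11.6 msg/s"
print(f"average production rate: {average_rate_per_second(production_daily):.1f} msg/s")
```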

We also had a team of at least four developers. Let’s get to strategizing!

The Testing Strategy

a. Partition test types (step #0): split tests by payload type (messages and files) and by financial environment (e_1, …, e_n), so that individual developers can test each payload in isolation.
b. Local tests (step #1): individual developers run one-off, visual checks of 5-10 payloads on their local machines.
c. Benchmark tests (step #2): at least two separate developers [4] run the tests in the benchmark environment.
d. Deploy to production (step #3): once a payload passes the benchmark tests, obtain sign-off from the approving parties and promote it to production.
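
The four steps can be sketched as a gated pipeline. This is a minimal, hypothetical model rather than the actual process tooling: the environment names (`e_1`, `e_2`, `e_3`), the `StagedCase` record, the round-robin assignment, and the developer names are all illustrative assumptions. It shows step #0 as a cross product of payload types and environments, and the gate before step #3 as requiring a local check, benchmark passes from at least two distinct developers (the rule of two from footnote 4), and approval.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical labels: the post only says "messages and files" and "e_1, ..., e_n".
PAYLOAD_TYPES = ("message", "file")
ENVIRONMENTS = ("e_1", "e_2", "e_3")


@dataclass
class StagedCase:
    """One (payload type, environment) slice moving through the stages."""
    payload_type: str
    environment: str
    local_ok: bool = False  # step #1: local visual check of 5-10 payloads
    benchmark_passes: set[str] = field(default_factory=set)  # step #2: developers who passed it
    approved: bool = False  # sign-off from the approving parties


def partition() -> list[StagedCase]:
    """Step #0: one independent case per (payload type, environment) pair."""
    return [StagedCase(p, e) for p, e in product(PAYLOAD_TYPES, ENVIRONMENTS)]


def assign(cases: list[StagedCase], developers: list[str]) -> dict[str, list[StagedCase]]:
    """Round-robin the cases so each developer works on isolated slices."""
    plan: dict[str, list[StagedCase]] = {dev: [] for dev in developers}
    for i, case in enumerate(cases):
        plan[developers[i % len(developers)]].append(case)
    return plan


def ready_for_production(case: StagedCase, min_reviewers: int = 2) -> bool:
    """Gate before step #3: local check done, benchmark passed by at least
    `min_reviewers` distinct developers (a set ignores repeats), and approved."""
    return case.local_ok and len(case.benchmark_passes) >= min_reviewers and case.approved


if __name__ == "__main__":
    cases = partition()
    plan = assign(cases, ["dev_a", "dev_b", "dev_c", "dev_d"])
    case = plan["dev_a"][0]
    case.local_ok = True                 # step #1 done
    case.benchmark_passes.add("dev_a")   # first benchmark run
    print(ready_for_production(case))    # False: one sign-off, no approval yet
    case.benchmark_passes.add("dev_b")   # second, independent run
    case.approved = True
    print(ready_for_production(case))    # True
```

Storing benchmark sign-offs as a set means the same developer re-running a test does not count twice, which is the point of the rule of two.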

By setting up staged testing across multiple environments, we’re able to assert with confidence that a feature works as expected.

But there are a couple of caveats: it’s not the most “ideal” testing strategy. There are issues, and future testing could have been expedited. Let’s dig in and look at what’s going on.

My Learnings – What slows testing? How can we speed testing?

a. Using a common shared environment: the benchmark environment is more compute-intensive, so it’s shared across engineers. If one engineer books it for a two-hour test from 8:00 a.m. to 10:00 a.m., no other engineer can use it during that slot. That leaves less time for testing, and code releases end up paced by the benchmark environment’s availability.
A Solution: Can we leverage isolated, pre-production sandbox environments? Production is always a single, commonly shared environment, BUT non-production environments are customizable and under our control. Giving engineers isolated sandboxes removes the wait for a single shared slot, so tests can run in parallel and finish sooner.
b. Limited testing time windows: if tests can only be executed during certain hours (e.g. 9:00 a.m. to 5:00 p.m.), that leaves only eight hours of availability per day. These windows are limited for legal and compliance reasons (or other external factors).
A Solution: Find workarounds and fixes, temporary or permanent, to extend the windows.
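
Both bottlenecks can be illustrated with a toy scheduling model. This sketch assumes, hypothetically, eight two-hour benchmark tests (echoing the 8:00 to 10:00 a.m. example), an eight-hour window (9:00 a.m. to 5:00 p.m.), and greedy list scheduling in which each test takes exclusive use of whichever environment frees up first. Real bookings would also involve priorities, setup time, and reruns, so treat the numbers as an illustration of the trade-off, not a measurement.

```python
import heapq


def schedule(durations_min: list[int], environments: int, window_min: int) -> tuple[int, int]:
    """Greedy list scheduling: each test takes exclusive use of whichever
    environment frees up first. Returns (tests finished inside the window,
    makespan in minutes)."""
    free_at = [0] * environments  # minute at which each environment is next free
    heapq.heapify(free_at)
    finished, makespan = 0, 0
    for duration in durations_min:
        start = heapq.heappop(free_at)  # earliest-available environment
        end = start + duration
        heapq.heappush(free_at, end)
        makespan = max(makespan, end)
        if end <= window_min:
            finished += 1
    return finished, makespan


tests = [120] * 8  # eight hypothetical two-hour benchmark tests
print(schedule(tests, environments=1, window_min=8 * 60))   # (4, 960): one shared environment
print(schedule(tests, environments=4, window_min=8 * 60))   # (8, 240): four isolated sandboxes
print(schedule(tests, environments=1, window_min=16 * 60))  # (8, 960): one environment, wider window
```

Under these assumptions, one shared environment fits only half the tests into the day. Four sandboxes (one per developer on a four-person team) finish everything in four hours, while widening the window helps only by letting the serial queue run longer.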

Footnotes

  1. SLA – Service Level Agreement
  2. SLI – Service Level Indicator
  3. I never got the exact figures at the time (most likely for proprietary or legal & compliance reasons).
  4. The rule of two: the benchmark environment can experience intermittent failures, so a second, independent verification lends stronger credibility.