‘Consistency’ has about as much sex appeal as preparing one’s tax return. With the likes of machine learning & AI, RPA and DLT all jostling for our attention, it’s little wonder that few around the top table are giving thought to such a dull-sounding concept…
The alluring young woman above might have thought otherwise. Born in the first decade of the last century, the stark reality of two world wars defined her worldview. She grew up in a world that prized stoicism, ‘keep calm and carry on’ and the stiff upper lip. For her, ‘consistency’ was a worthy quality, something to be aspired to…
Who was this young woman? – An early pioneer of women in finance or technology? Sadly not. As a prize-winning scholar, she did not lack for intellect, but she belonged to an era in which women’s education was considered dispensable, and her potential was never realised.
She was my grandmother. For me, she epitomises everything that we commonly associate with the term ‘consistency’, but which has fallen so out of vogue.
It seems a world apart from today’s reality, in which we prize spontaneity and ‘authenticity’, and consider it perfectly normal for pretty much anyone, from hormonal teen to Head of State, to vent their innermost musings to the world, uncensored. (Granny must be turning in her grave…)
This is a pity, because in the world of technology, ‘consistency’ has a very specific meaning that I don’t believe many non-techs (like myself until recently) grasp. In my view the negative associations around this term are holding us back from talking about something that should be high on the agenda for organisations with complex, data-driven operations looking to modernise their tech. So with this article I hope to bring back a little ‘allure’ to this maligned term.
So what is ‘consistency’, from a tech standpoint?
Consistency is a property of a system (typically delivered by the database) which guarantees that whenever data is read, an accurate, complete state is seen. In a consistent system, it is impossible to apply only part of a collection of changes (usually grouped as a ‘transaction’); the transaction is a contract that guarantees that either all of the changes are completed, or none at all. (Note that ‘transaction’ here means any chunk of work which collectively forms a coherent unit. A financial transaction would be one example, but there are many others.) The concept of a transaction is very important to data quality: a partially loaded transaction could be at best meaningless and at worst highly misleading. For example, if just one leg of a composite derivative were committed to the database without its hedging leg, it might cause limits to be erroneously breached.
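To make this concrete, here is a minimal sketch in Python using SQLite’s built-in transaction handling. The table, trade and leg names are purely illustrative – the point is simply that both legs are committed together, or not at all.

```python
import sqlite3

# Illustrative schema: one row per leg of a composite derivative trade.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade_legs (trade_id TEXT, leg TEXT, notional REAL)")

def book_composite_trade(conn, trade_id, legs):
    """Book every leg of a trade atomically: either all legs are
    committed or, if anything fails part-way, none are visible."""
    with conn:  # sqlite3 commits on success and rolls back on any exception
        for leg_name, notional in legs:
            conn.execute(
                "INSERT INTO trade_legs VALUES (?, ?, ?)",
                (trade_id, leg_name, notional),
            )

# The risk leg and its hedging leg go into the database together, or not at all.
book_composite_trade(conn, "T-1001",
                     [("risk_leg", 50_000_000), ("hedge_leg", -50_000_000)])
```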
Getting a consistent view of a relational database on a single server is easy; all the data is in one place. Things get slightly harder when you have data replicated across multiple locations (e.g. for disaster recovery). Typically there is a slight delay between the two datastores, so things can get out of sync for a brief period.
It gets really tricky in a fully distributed environment, where data is being changed simultaneously across a large number of nodes, data is also replicated, and users or processes may simultaneously be reading that data. There are essentially two approaches to ensuring consistency in a distributed environment:
(1) Pessimistic, in which the system must request permission to update a specific set of data, and then locks that data down until the changes are complete.
(2) Optimistic, in which the system makes changes in parallel, but prior to performing any update it first reads the current version of the data that is to be changed. When the system then attempts to write the updates, it checks that the record has not changed in the tiny window of time since the first read. If the record is ‘dirty’ (i.e. has a different version), the updates are aborted. These exceptions may be handled either by users or by bespoke business logic.
Pessimistic is clearly the simpler approach to implement, but it has the disadvantage of creating a bottleneck and the potential for deadlocks within an otherwise distributed architecture. This limits scalability, puts a ceiling on the volume of data the system can handle and leads to a higher volume of operating issues. For this reason, at Cyoda we’ve gone for the optimistic approach, applied through distributed transactions.
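For the technically curious, here is a minimal sketch of the optimistic approach, again in Python with SQLite. The positions table and its version column are my own illustrative choices, not a description of any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE positions (id TEXT PRIMARY KEY, qty INTEGER, version INTEGER)")
conn.execute("INSERT INTO positions VALUES ('ACME', 100, 1)")
conn.commit()

def update_position(conn, pos_id, delta):
    """Optimistic update: read the current value and version, then write
    only if the version is still unchanged. Returns True on success."""
    qty, version = conn.execute(
        "SELECT qty, version FROM positions WHERE id = ?", (pos_id,)
    ).fetchone()

    cursor = conn.execute(
        "UPDATE positions SET qty = ?, version = ? WHERE id = ? AND version = ?",
        (qty + delta, version + 1, pos_id, version),
    )
    conn.commit()
    # rowcount == 0 means another writer changed the row in the meantime:
    # the record was 'dirty', so this update did not apply.
    return cursor.rowcount == 1

if not update_position(conn, "ACME", -25):
    print("Conflict detected: hand off to a user or to bespoke business logic")
```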
It’s also important to differentiate between two distinct types of consistency: ACID (strong) consistency and eventual consistency.
Eventual consistency basically says that, provided no new updates are made, all views on that system will eventually give the same value. In reality, this term is a bit of a misnomer, because the ‘eventual consistency’ will never be achieved if a process fails part way through updating a collection of changes. Eventually consistent systems are ignorant of whether or not they are consistent at any given point in time, so they simply can’t be relied upon where pinpoint accuracy is needed. However, for many applications, eventual consistency is quite sufficient. For example, maybe you’ve noticed you’ve had a new Twitter follower, but the follower count hasn’t ticked up? No big deal – you know the chances are that it’ll sort itself out in time, and you don’t really need to know when that happens.
But when you want to report on financial transactions, you typically want known, strong consistency. This means that consistency is either immediate, or available within a minuscule time delay with the last consistent view known, enabling you to report and analyse in human real-time.
For example, as a retail customer, if you make a transfer between accounts, you don’t want your funds to seemingly disappear into thin air as the first account is debited, with no sign of the funds in the second account. Within investment banks, the stakes are far higher, spanning a huge breadth of operational requirements (limit checking, intra-day reporting, KYC…).
For a particularly disturbing example, consider this:
13:07:35 Notification received of a new US sanction on Jon Smith Enterprises
13:07:45 $50m drawdown on Jon Smith Enterprises’ $75m facility
13:08:01 Update filters through to the loans system & blocks Jon Smith Enterprises’ account.
Fancy explaining that one to the US authorities?
So why don’t we just make sure everything is immediately consistent?
‘Everything’ is a big word.
As we approach the 2020s, the lion’s share of systems in banks today is still based on the relational database, a technology of the 1980s. This technology has been so sticky precisely because it does consistency extremely well (offering gold-standard ‘ACID’ consistency).
But relational databases have serious drawbacks in terms of scalability and performance – which is why a separate database is needed for pretty much every requirement – creating a massive proliferation of systems.
Whilst each individual system is consistent, there are no such guarantees across a bank’s many systems. In a typical bank there are hundreds or even thousands of systems, with data flowing between them and stored in multiple locations. Data amendments often don’t flow seamlessly from source; updates may be partial, and there may be failures or interruptions in these flows. Different systems also accept data in different ways, meaning data has to be manipulated along the way, introducing further discrepancies.
Back in the ‘80s this lack of consistency across a systems architecture wasn’t too much of a problem; data volumes and complexity were far lower, and businesses were largely run as independent silos. In today’s heavily regulated post-GFC world, that approach is no longer viable. Banks want to know their business in near real-time, with timely views of their positions, risks and liquidity at both firm-wide and granular levels.
Does data inconsistency really matter? – What’s the impact?
The lack of cross-system consistency has several consequences:
Firstly, banks need to check the data across their disparate systems for discrepancies. Given the hundreds (or, more typically, thousands) of systems in a typical bank, this creates an entire industry of reconciliations. And whilst the checks may be automated, much of the exception handling remains manual… a big operating overhead.
These checks are typically end-of-day at best; it’s near impossible to reconcile independent systems that are not time-series databases on a near real-time basis.
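Purely for illustration, here is a toy version of what one of those automated checks might look like. The two ‘systems’ are just dictionaries of end-of-day balances, and all of the names and figures are invented.

```python
# Toy reconciliation: compare end-of-day balances held in two systems
# and flag any breaks for (largely manual) exception handling.
ledger_system = {"ACME": 1_250_000, "GLOBEX": 310_000, "INITECH": 88_000}
risk_system   = {"ACME": 1_250_000, "GLOBEX": 305_000}

def reconcile(system_a, system_b, tolerance=0):
    """Return a list of breaks: accounts missing from one side, or
    whose balances differ by more than the tolerance."""
    breaks = []
    for account in sorted(set(system_a) | set(system_b)):
        a, b = system_a.get(account), system_b.get(account)
        if a is None or b is None:
            breaks.append((account, a, b, "missing on one side"))
        elif abs(a - b) > tolerance:
            breaks.append((account, a, b, "balance mismatch"))
    return breaks

for account, a, b, reason in reconcile(ledger_system, risk_system):
    print(f"BREAK {account}: ledger={a} risk={b} ({reason})")
```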
As a result, there is an acceptance that during the course of the trading day, disparate systems will be marginally out of line. Does this matter? – Well, yes, rather a lot. The bank will need to hold significant buffers of liquidity to ensure there is no chance of breaching its regulatory requirements. For large European banks, the annual cost of this buffer can run into the hundreds of millions of pounds, or even billions.
Add to this the fact that inconsistency problems tend to go hand in hand with system proliferation. The more systems in place, the greater the maintenance overhead, the more interfaces to be kept up-to-date, the more vendors to be managed – all leading to a mushrooming of cost, complexity and risk.
But what about ‘Big Data’? Doesn’t that handle scale & complexity?
Sadly the remarkable innovations in scalability and performance brought by the big data revolution were conceived to solve a very different problem. These technologies are essentially about harnessing the hidden meaning buried in extremely large datasets in order to create business value (hence the term ‘big data analytics’). They are concerned with the quality of the data in aggregate, rather than the accuracy of each individual piece of data. They are eventually consistent, because that is perfectly adequate for their intended purpose. Hence, within banks these technologies are predominantly found in applications such as algorithmic trading, understanding customers’ buying patterns, spotting market trends and so on, whilst core operations remain dominated by the relational database.
There have been attempts to adapt ‘big data’ technologies for use cases that require consistency, but these have had limited success. The resultant architectures have been slow and complex, and have failed to yield the anticipated benefits. For example, several major banks attempted this using Hadoop a few years ago; most of these projects have been abandoned or de-scoped after investing hundreds of millions of pounds.
What’s the solution?
There’s no quick fix to the lack of firm-wide consistency as it’s ultimately a consequence of legacy architectures. Nevertheless, every major new or replacement system offers an opportunity to take a step in the right direction. Factors to consider include:
- Reduce data duplication: Whenever the same data is stored in more than one place, there is an opportunity for inconsistency. Ideally all data would be stored, processed and reported on from a single location.
- Extensibility/Flexibility: Could the chosen technology be adapted to deliver more than you need right now? – If so, then it may enable you over time to reduce the total number of systems and consequently the consistency problem.
- Scalability: Is the amount of data that can be stored and processed on the technology effectively unlimited? – This is another characteristic that will enable a reduction in the total number of systems over time.
- Time series: Does the chosen system store data as an immutable series of facts, i.e. every update stored sequentially with a timestamp? As more systems become time series-based, it becomes feasible to call up data ‘as at’ a specific millisecond in time, reducing synchronisation issues (see the sketch just after this list).
- Report accurately on live, changing data: Some distributed technologies offer ACID consistency, but don’t enable large-scale, accurate reporting on that live, changing data. To run reports, the data needs to be exported to a separate reporting system… ergo a new opportunity for inconsistency!
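To show what ‘as at’ querying over an immutable series of facts might look like, here is a minimal sketch. The facility-limit history and timestamps are invented, and a real time-series datastore would of course do this at far greater scale.

```python
from bisect import bisect_right
from datetime import datetime

# Each update is stored as an immutable fact: (timestamp, value).
# Nothing is ever overwritten; later changes are simply appended.
facility_limit_history = [
    (datetime(2019, 6, 3, 9, 0, 0), 75_000_000),  # facility opened
    (datetime(2019, 6, 3, 13, 7, 35), 0),         # facility frozen by sanction
]

def as_at(history, when):
    """Return the value in force at the given point in time,
    or None if no fact had yet been recorded."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, when)
    return history[idx - 1][1] if idx else None

# What was the limit 'as at' 13:07:45?
print(as_at(facility_limit_history, datetime(2019, 6, 3, 13, 7, 45)))  # -> 0
```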
Of course, the reality is that all technology decisions are inevitably trade-offs; most technologies offer some, but not all, of the above. This is beginning to change, though. At Cyoda, we believe it’s critical to have all of the above, and it’s something we’ve built into our proprietary technology. But there are open-source options too, such as FoundationDB.
Once firms start using technology with these attributes for new systems, and gradually extend their scope and scale, they will ultimately find they have a single, consistent, timely, firm-wide view of their data – opening the door to jaw-dropping savings in both operating and capital costs.
(A quick aside – It’s worth noting that the budget holder for a new system may have little direct gain from the long-term goal of firm-wide consistency – so there needs to be strong encouragement from senior levels of the organisation to make this happen)
A quick aside on microservices
Microservices is very much on-trend within the tech world. It’s an approach to architecture design in which requirements are broken down into small, independent services, which are then developed using agile methods. The approach confers significant benefits – faster time-to-market, flexibility/agility, re-usability – and is often hailed as the successor to 20th-century ‘monolith’ architectures, which are complex, opaque and hard to change.
The microservices approach has undoubtedly proven transformational in many contexts – but I have some doubts about its potential in banking. There are a few reasons for this which I’ll cover in a future article, but the main point to raise here is that by separating individual small services, each with independent data stores, consistency becomes extremely tricky – if not impossible.
At Cyoda, the approach we’ve taken borrows from microservices, in the sense that we develop small services according to agile principles, but we advocate running those services from a single, consistent, distributed datastore, ensuring consistent outputs across the entire architecture.
To conclude:
My hypothesis is that ‘consistency’ is a term that does not receive the attention that it should, because whilst it’s an exceptionally important concept in the world of technology and banking, it has fallen out of vogue as a human quality.
In any industry, such as banking, which lives and breathes on the accuracy of complex, inter-related operational data, consistency should be a paramount concern. Inconsistent data means bad decisions, misallocation of resources and value left on the table. In today’s always-on, 24/7, data-driven world, the opportunity cost of inconsistency is astoundingly high.
As we approach 2020, it’s no longer a pipe-dream to envisage enterprise-wide consistency for large, global institutions. The technology is out there. Today.
The challenge that remains is how do we get people talking about ‘consistency’? – Is it time for a re-brand?
I’d love to hear your thoughts below.
Post-script: And what of Granny, my epitome of consistency?
Well, after a brief rebellion in early adulthood (she eloped to marry into a bohemian family led by a matriarch who had the audacity to pursue a career through motherhood; an achievement I thoroughly applaud but find hard to picture from the Granny I knew), she settled into a very consistent and measured existence. She lived to 96, raised three children, all girls, all born in October, each four years apart… Remarkable consistency in an era that long pre-dated modern family planning!