Adoption of AI is a top priority for businesses. According to Accenture, AI could increase business productivity by 40 percent.

Why, then, do so many AI projects stall or fail? 

The challenges derive from the volume, velocity and variety of data. Estimates suggest that 80% of data is dark (its value unknown), redundant, obsolete or trivial. It is also widely accepted that AI model builders typically spend at least 80% of their time preparing data: translating it between formats, reconciling it across sources, cleaning it, interpreting it and more. In other words, the ability to get the right data into AI models is limiting the value that firms are getting from their AI investments.

Until recently, firms’ main response to the data challenge was to throw people at the problem – but that’s changing. Emerging data technologies are now capable of significantly reducing the data challenge associated with implementing AI. 


The data sourcing conundrum

An AI engine will need to source many different types of data from a myriad of internal and external sources – most of which will be in completely different formats. Financial data is typically complex, with convoluted logic often required to extract precise subsets needed for a specific purpose.  To meaningfully consolidate disparate data, it must be well defined, understood and synchronised. 

Firms typically choose one of two approaches. Either they employ modern big data analytics technologies, which offer scalability and are designed for fast, agile development projects, but are built for analytical rather than transactional workloads. This means they lack the data consistency that is critical to many financial applications (in which data is meaningless if out of sync). Alternatively, firms adopt a more traditional relational database, which guarantees data consistency but typically requires a fixed central data model to which all required data must be translated via ETL (Extract, Transform, Load) processes. This approach is slow to build, inflexible and cannot handle the pace of change that typifies AI projects.
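The rigidity of the traditional ETL approach can be sketched in a few lines. This is a minimal, hypothetical illustration (the schema, field names and function are invented for this example, not taken from any real firm's system): every new source needs its own hand-written mapping into the fixed central model, and any change to that model ripples through every mapping.

```python
# Hypothetical sketch of the traditional ETL approach: every source must be
# translated into one fixed central model before it can be used.

FIXED_CENTRAL_MODEL = ("trade_id", "instrument", "quantity", "price")

def etl_trade(source_record: dict, mapping: dict) -> tuple:
    """Extract fields via a source-specific mapping and transform them
    into the central model's column order, ready to load."""
    return tuple(source_record[mapping[col]] for col in FIXED_CENTRAL_MODEL)

# Source A uses its own field names; a mapping must be written (and
# maintained) for every new source -- this is the slow, inflexible part.
source_a = {"id": "T1", "sym": "VOD.L", "qty": 100, "px": 72.5}
mapping_a = {"trade_id": "id", "instrument": "sym",
             "quantity": "qty", "price": "px"}

print(etl_trade(source_a, mapping_a))  # ('T1', 'VOD.L', 100, 72.5)
```

Each additional source multiplies this mapping effort, which is why the approach struggles with the pace of change in AI projects.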

Even when users are able to find and bring together the required data, they may struggle to make business sense of it. Legacy data stores may be opaque, undocumented and difficult to interpret.

A further challenge is that many systems (market data aside) hold current state only, with updates overwriting previous information. This makes it virtually impossible to perform historical queries, or to run 'what if' scenarios on real-life data.
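The contrast between a current-state store and one that preserves history can be sketched as follows. This is a simplified, hypothetical example (the entities and timestamps are invented): an overwriting store destroys the information an 'as-of' query needs, while an append-only version log keeps it.

```python
# Sketch of why current-state-only stores block historical queries,
# and how an append-only version log restores them. Hypothetical data.
import bisect

# Current-state store: each update overwrites the previous value.
current = {}
current["ACME"] = {"rating": "A"}    # initial state
current["ACME"] = {"rating": "BBB"}  # update overwrites -- history is gone

# Append-only store: keep every version with a timestamp.
history = {"ACME": [(1, {"rating": "A"}), (5, {"rating": "BBB"})]}

def as_of(entity: str, t: int):
    """Return the state of `entity` as it was at time t, or None."""
    versions = history[entity]
    idx = bisect.bisect_right([ts for ts, _ in versions], t) - 1
    return versions[idx][1] if idx >= 0 else None

print(as_of("ACME", 3))  # {'rating': 'A'} -- recoverable only from history
```

A 'what if' scenario is then just a query against the historical versions rather than the (irrecoverable) overwritten state.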


Solution: semantic tools and virtual data models

Imagine enhancing your platform to semi-automate both the access and the interpretation of raw data. Model builders could deliver prototypes much faster, without needing to understand the complexity of the underlying disparate formats and ETL processes: more time modelling, less on data prep.

This is now possible thanks to two exciting innovations. Firstly, semantic technologies such as those offered by Model Drivers allow firms to build tools, or a 'digital twin', that automate much of the data modelling and interpretation, saving vast amounts of manual effort.

Secondly, 'schema-on-read' data consolidation enables firms to efficiently harness data from different sources with diverse formats, dispensing with onerous ETL processes. For example, Cyoda's Virtual Data Models may be assembled for each use case and iterated rapidly, with no inter-dependency on other projects. Aliases then enable users to work with simple business terms to prepare inputs to their AI models, without worrying about the complexity of the underlying raw data. Paired with a modern distributed, yet consistent, database, this approach offers the scalability, speed and flexibility that AI demands, without the compromise of inconsistent data.
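The schema-on-read idea with aliases can be sketched in outline. This is a hedged, illustrative example only (the records, the `virtual_model` mapping and the `read` function are invented for this sketch, and do not represent Cyoda's actual API): raw records keep their native shapes, and a lightweight per-use-case mapping translates business terms onto them only at read time.

```python
# Illustrative sketch of schema-on-read with aliases: raw data stays in
# its native format; a per-use-case "virtual model" maps business terms
# onto raw field names only when the data is read.

raw_sources = [
    {"_src": "bookingsys", "TradeRef": "T1", "Notional": 1_000_000},
    {"_src": "riskfeed", "deal_id": "T2", "exposure_usd": 250_000},
]

# Aliases per source: business term -> raw field name.
virtual_model = {
    "bookingsys": {"trade_id": "TradeRef", "notional": "Notional"},
    "riskfeed": {"trade_id": "deal_id", "notional": "exposure_usd"},
}

def read(record: dict) -> dict:
    """Interpret a raw record through the virtual model at read time."""
    aliases = virtual_model[record["_src"]]
    return {term: record[field] for term, field in aliases.items()}

rows = [read(r) for r in raw_sources]
print(rows)
```

Because the mapping lives with the use case rather than with a central schema, a new source or a changed requirement means editing one small dictionary, not rebuilding an ETL pipeline.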


Data mastery: accelerating AI

With these approaches, you will save time and money; most importantly, your data scientists will not be distracted from the AI mission.


To Learn More: 

  • Watch our interview with two data experts, Greg Soulsby of Model Drivers and Patrick Stanton from Cyoda, discussing the AI data challenges and the emerging technological solutions: