Two recent additions to the field of data analytics are the concepts of data fabric and data mesh. At first glance, one might mistake the two for synonyms: both involve data, and ‘fabric’ and ‘mesh’ can be used interchangeably in other contexts. However, they refer to two distinct concepts, and together they represent the future of data architecture.
Before I distinguish these two concepts, I should set some context. Since the beginning of digitisation, companies have sought out the most efficient ways to store data. This has led to a variety of methods being tried and discarded over the course of twenty years, from databases and warehouses to data lakes and cloud stores. Businesses still use many of these monolithic storage methods, in varying capacities.
Whichever storage method a company chooses, though, collating all of its data in one location is an enormous challenge, and it is bound to fail. Even if the sheer volume of information is manageable, transferring that data from its point of origin to a central repository, and then extracting a relevant piece of data when required, is time-consuming, resource-heavy, and expensive. Data fabric and data mesh offer solutions to this problem.
The Alternative to the Centralised Approach
One approach to the hurdles posed by data centralisation is to provide a pathway to decentralisation. Logical data integration connects users to decentralised data by leaving all gathered data at its source and enabling users to access it virtually, without the need for physical data replication. Through this architecture model, data users are freed from the limitations posed by data that has been gathered in one enormous repository, and are instead able to view what they require, as needed, through a virtual connection.
Accessing data in this way leaves the source data untouched and enables individual information silos to be maintained at manageable levels. This decentralised approach offers the best of both worlds – all the benefits of an enormous, central store of data coupled with the nimbleness and rapid response of data housed in smaller individual repositories.
Logical data fabric and data mesh are both representative of this approach, relying on progressive, distributed models that connect a series of data silos, rather than attempting to gather everything in one location. The difference lies in the way they approach this task.
Examining Data Fabric
A data fabric is made up of data drawn from a variety of locations, in various formats and types. In a traditional data fabric, the one commonality is that the data is physically integrated through replication. A logical data fabric, however, replaces physical data integration with logical data integration, in the form of data virtualization. Logical data fabric is a modern data integration method that offers access and visibility to various data sources on demand, without requiring the data to be moved.
Logical data fabrics are structured to integrate data from various sources by serving as a conduit, without actually carrying the source data itself. This is achieved by maintaining a minimal amount of metadata, with details on the data storage location, access requests, viewing authorisation, and so on. This streamlines data access and approvals and offers a detailed overview of all gathered data. With centralized access to metadata across an organization, data virtualization provides businesses with a built-in security and data governance framework. These capabilities can be augmented by artificial intelligence (AI) and machine learning (ML).
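To make the idea of a metadata-only conduit concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular data virtualization product works: the names `CatalogEntry`, `LogicalFabric`, `register`, and `query` are hypothetical, and the two ‘sources’ are in-memory lists standing in for real systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class CatalogEntry:
    """Metadata only: where the data lives and who may read it."""
    location: str                      # descriptive label for the source system
    fetch: Callable[[], List[Row]]     # reads from the source on demand
    allowed_roles: Set[str]            # viewing authorisation
    access_log: List[str] = field(default_factory=list)  # access requests


class LogicalFabric:
    """Hypothetical virtual access layer; it stores metadata, never rows."""

    def __init__(self) -> None:
        self._catalog: Dict[str, CatalogEntry] = {}

    def register(self, name: str, entry: CatalogEntry) -> None:
        self._catalog[name] = entry

    def query(self, name: str, user: str, role: str) -> List[Row]:
        entry = self._catalog[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"{role} may not read {name}")
        entry.access_log.append(user)   # record the access request
        return entry.fetch()            # read at the source; nothing is replicated


# Two stand-in sources; in practice these might be databases, lakes, or APIs.
crm_rows = [{"customer_id": 1, "name": "Acme"}]
billing_rows = [{"customer_id": 1, "amount": 250.0}]

fabric = LogicalFabric()
fabric.register("customers", CatalogEntry("crm-db", lambda: crm_rows, {"sales", "finance"}))
fabric.register("invoices", CatalogEntry("billing-db", lambda: billing_rows, {"finance"}))

customers = fabric.query("customers", user="dana", role="sales")
```

The fabric object holds only locations, permissions, and an access log. A request from a role that is not authorised for a source, such as a sales user asking for invoices, raises `PermissionError` before any source is touched.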
Another advantage offered by logical data fabric is the ability to append business-specific information to gathered data without changing the core information. This empowers businesses to create custom-made data stores for specific projects and requirements, or to experiment with new models without changing the source data. The ability to access data regardless of storage location also means that data can be accessed even when it is being migrated to a new location. The end result is a unified system that is faster to access and easier to use.
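One way to picture appending business-specific information without changing the core data is a derived view that merges source rows with a separate annotation layer. The sketch below is only an assumption about how such a layer might look; `source_orders`, `annotations`, and `project_view` are hypothetical names, and the read-only rows stand in for a source system the fabric cannot modify.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping

# Stand-in for source data; MappingProxyType makes each row read-only.
source_orders: List[Mapping[str, object]] = [
    MappingProxyType({"order_id": 101, "amount": 40.0}),
    MappingProxyType({"order_id": 102, "amount": 900.0}),
]

# Business-specific annotations kept in a separate layer, keyed by order_id.
annotations: Dict[int, Dict[str, object]] = {
    102: {"segment": "enterprise"},
}


def project_view(rows, extra):
    """Build a derived view: source fields plus any annotations, as new dicts."""
    return [{**row, **extra.get(row["order_id"], {})} for row in rows]


view = project_view(source_orders, annotations)
```

The view for order 102 carries the extra `segment` field, while the rows in `source_orders` are unchanged, so a project team can experiment with its own annotations without affecting anyone else who reads the same source.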
The Structure of a Data Mesh
While a logical data fabric is a technological solution, data mesh is a structure that helps businesses organise their data, employees, processes, and tasks in one unified configuration. However, a logical approach to data integration can also be applied to a data mesh.
First conceptualised in 2018 by Zhamak Dehghani, director of emerging technologies at Thoughtworks, the data mesh approach to decentralisation was created to address the issues that arise from giving control of enterprise data to the IT department of an organisation.
The primary drawback of centralised IT control is that all data-related queries from other departments are routed through a single point, which delays access requests. The problem grows with the size of a company, and is especially acute in a multinational firm with dozens of departments and thousands of employees.
A data mesh addresses this obstacle through the creation of multiple ‘domains’. These are organisational units at the departmental level, each responsible for the data of one group or department. The decentralised units are tied together as part of a larger collective data network. Stakeholders within each domain can create their own data ‘products’, which can be shared organisation-wide and contain information such as customer data, financial transactions, and employee information. The only element of a data mesh that remains centralised is a provisioning and governance function, which oversees the interoperability and compatibility of data products.
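The split between domain ownership and central governance can be sketched in a few lines of Python. This is a simplified illustration rather than a reference design: the classes `Domain`, `DataProduct`, and `Governance`, and the rule that every product must declare an `owner` and a `schema_version`, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

REQUIRED_FIELDS = {"owner", "schema_version"}   # hypothetical shared standard


@dataclass
class DataProduct:
    name: str
    domain: str
    metadata: Dict[str, str]
    rows: List[dict]


class Governance:
    """The one centralised function: checks products against shared rules."""

    def __init__(self) -> None:
        self.registry: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        missing = REQUIRED_FIELDS - set(product.metadata)
        if missing:
            raise ValueError(f"{product.name} is missing metadata: {sorted(missing)}")
        self.registry[f"{product.domain}.{product.name}"] = product


class Domain:
    """A departmental unit that owns and publishes its own data products."""

    def __init__(self, name: str, governance: Governance) -> None:
        self.name = name
        self.governance = governance

    def publish(self, name: str, rows: List[dict], **metadata: str) -> DataProduct:
        product = DataProduct(name, self.name, metadata, rows)
        self.governance.publish(product)   # interoperability check, not ownership
        return product


governance = Governance()
sales = Domain("sales", governance)
finance = Domain("finance", governance)

sales.publish("customers", [{"customer_id": 1}],
              owner="sales-team", schema_version="1")
finance.publish("transactions", [{"txn_id": 9, "amount": 250.0}],
                owner="finance-team", schema_version="1")
```

Each domain builds and publishes its own products without routing requests through a central team. The governance object never creates data; it only refuses products that do not meet the shared standard, so that everything in the registry stays compatible.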
As with a logical data fabric, data virtualization can play a critical role in the implementation of a data mesh by providing an abstracted data access layer that unites the various data sources. Data virtualization maintains a separate layer of access and business metadata, which enables company-wide data governance across the various domains.
Ultimately, logical data fabric offers a model through which data can be seamlessly integrated, and data mesh provides a business-wide organisational model. Each addresses different issues and provides unique solutions to them. Companies that use both can enjoy a significant improvement in overall operating efficiency. Organisations eager to future-proof their data infrastructures would do well to consider these models closely in the years to come.