We live in a fast-paced world, with information available at our fingertips all the time. Not that long ago, the world was a very different place. Have you ever considered what happened to the office typing pool? Even as recently as the 1990s, most written work was produced by typists. But as technology evolved and people's demand for instant answers grew, the need for office typing pools declined.
Unfortunately, despite technological advances, enterprise data is not always available to all stakeholders when they need it. Modern enterprises contain a broad set of users, with widely varying skill sets, who make data-driven decisions on a daily basis. Truly empowering these users through data democratization means not only putting data at their fingertips through a set of applications, but also enabling better collaboration among peers and stakeholders for data sharing and recommendation, activating metadata for better data search and discovery, and securely providing the right kind of data access to the right individuals. Deploying an enterprise-wide data infrastructure with legacy technologies such as ETL to serve that broad set of users is proving costly, slow to deploy, and resource intensive, and it cannot provide data access in real time. It does not stop there: constant replication of data exposes companies to very costly compliance issues related to sensitive and private data such as PII.
However, a logical data fabric architecture holds the promise of seamless access to data, enabling democratization of the data landscape.
Most of today's organizations spread their operations worldwide, not only to capture global market share but also for cost advantages, access to local talent pools, and many other benefits. But in the process, their enterprise data becomes distributed across locations around the globe, spanning cloud and on-premises systems, and that creates obstacles to seamless, real-time data access for business users. When organizations adopt a logical data fabric architecture, they create an environment in which data access and data sharing are easy, and business users can access data with minimal IT involvement. A logical data fabric, if properly constructed, also provides the necessary security and data governance in a centralized fashion. For this reason, the provision of a data virtualization layer and the adoption of a logical data fabric architecture are the foundation for the democratization of data.
Let's take a look at a few important capabilities and characteristics of a logical data fabric that are critical for data democratization, keeping in mind the limitations of a physical data fabric wherever applicable.
- Augmentation of information and better collaboration using active metadata – Marketplaces are important for users to find what they need in a self-service manner. Think about the Amazon and Netflix user experiences: you not only find what you are looking for, but you also get recommendations for what you may like or which items pair well with others. That is where the enterprise data access experience is heading with a logical data fabric in place. Because a logical data fabric is built on the foundation of data virtualization, access to all kinds of metadata and metadata-based machine learning is easier to deploy than with a physical data fabric. In a single-platform logical data fabric, the data catalogue is tightly integrated with the underlying data delivery layer, which helps a broad set of users discover and explore data quickly. With a logical data fabric in place, business stewards can create a catalogue of business views based on metadata, classify them according to business categories, and assign them tags for easy access. A logical data fabric with enhanced collaboration features enables users to endorse datasets or register comments or warnings about them, which further contextualizes dataset usage and helps users understand how their peers experience those datasets.
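To make the catalogue idea above concrete, here is a minimal sketch of a business-view catalogue supporting tags, tag-based search, and peer endorsements. All class, view, and user names are illustrative assumptions, not any product's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class BusinessView:
    """A catalogued business view: hypothetical fields for illustration."""
    name: str
    category: str
    tags: set = field(default_factory=set)
    endorsements: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

class Catalogue:
    def __init__(self):
        self._views = {}

    def register(self, view):
        self._views[view.name] = view

    def search(self, tag):
        # Tag-based discovery: return every view carrying the tag.
        return [v for v in self._views.values() if tag in v.tags]

    def endorse(self, name, user, comment=""):
        # Peers endorse a dataset, adding context for other consumers.
        self._views[name].endorsements.append((user, comment))

cat = Catalogue()
cat.register(BusinessView("customer_360", "Sales", tags={"PII", "gold"}))
cat.register(BusinessView("daily_orders", "Sales", tags={"gold"}))
cat.endorse("customer_360", "alice", "Validated against CRM extract")

print([v.name for v in cat.search("gold")])  # ['customer_360', 'daily_orders']
```

A real catalogue would persist this metadata and feed it to recommendation models; the sketch only shows the shape of the collaboration features described above.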
- Seamless data integration in a hybrid or multi-cloud environment – These days it is easy to find organizations, especially large ones, with data spread across multiple clouds and on-premises systems. This may happen because of mergers and acquisitions, because various functional departments cherry-pick cloud providers based on certain services or applications, or simply to avoid vendor lock-in. Whatever the reason, many business stakeholders, such as line-of-business owners, executives, and BI analysts, need data that is spread across multiple geographic locations and multiple clouds. Only a logical data fabric can provide an enterprise-wide view of that data without any replication. A physical data fabric is unable to synchronize two or more systems in real time. In contrast, a logical data fabric can access data from multiple systems, spread across multiple clouds and on-premises locations, and integrate it in real time in a way that is transparent to the user. Its semantic layer presents that integrated data in consistent, business-friendly terms, so consumers work with meaningful entities rather than the physical schemas of each source.
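The federation idea above can be sketched in a few lines: two in-memory lists stand in for a cloud CRM and an on-premises order system, and a join is computed at access time with no copy of either dataset persisted. Source names and schemas here are hypothetical stand-ins, not real systems.

```python
cloud_crm = [  # stand-in for a SaaS CRM in one cloud
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
onprem_orders = [  # stand-in for an on-premises order system
    {"order_id": 10, "customer_id": 1, "amount": 250.0},
    {"order_id": 11, "customer_id": 1, "amount": 75.0},
    {"order_id": 12, "customer_id": 2, "amount": 310.0},
]

def virtual_join(customers, orders):
    """Join the two sources on customer_id at query time (no replication)."""
    by_id = {c["customer_id"]: c["name"] for c in customers}
    for o in orders:
        yield {"name": by_id[o["customer_id"]], "amount": o["amount"]}

# The consumer sees one unified view, unaware of where each row lives.
total_by_customer = {}
for row in virtual_join(cloud_crm, onprem_orders):
    total_by_customer[row["name"]] = total_by_customer.get(row["name"], 0) + row["amount"]

print(total_by_customer)  # {'Acme': 325.0, 'Globex': 310.0}
```

A data virtualization engine does the same thing at scale, pushing filters and aggregations down to each source rather than pulling raw rows, but the principle is identical: integration happens on read, not by copying.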
- Broader and better support for advanced analytics and data science use cases – Data scientists and advanced analytics teams like data lakes as their playground. The latest trend, the data lakehouse, aims to let IT teams support BI analysts and line-of-business users as well as data scientists. But lakehouses have some inherent limitations: they still require a great deal of data replication, they involve exorbitant egress charges to pull data out, it is impractical to assume one physical lakehouse can hold all of an enterprise's data, and the list goes on. Because a logical data fabric enables seamless access to a wide variety of data sources and seamless connectivity to a wide variety of consuming applications, data scientists can work with a much wider variety of models and tools, each using the tools with which they are most familiar. A logical data fabric enables data scientists to iterate quickly on data models and fine-tune the ones that best support their efforts. It also lets them focus less on data collection, preparation, and transformation, because these, too, can be handled by the logical data fabric itself.
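The quick-iteration point above can be illustrated with a "virtual view" that applies preparation steps lazily on read, so a data scientist can swap a transformation without reloading or re-running an ETL job. The function and field names are illustrative assumptions only.

```python
raw_events = [
    {"user": "alice", "ms": 1200},
    {"user": "bob", "ms": None},   # dirty record to be filtered out
    {"user": "alice", "ms": 800},
]

def virtual_view(rows, *transforms):
    """Yield rows with each transform applied on read; nothing is materialized."""
    for row in rows:
        for t in transforms:
            row = t(row)
            if row is None:  # a transform returning None drops the row
                break
        if row is not None:
            yield row

# Two composable preparation steps; replacing either requires no data copy.
drop_nulls = lambda r: r if r["ms"] is not None else None
to_seconds = lambda r: {**r, "seconds": r["ms"] / 1000}

clean = list(virtual_view(raw_events, drop_nulls, to_seconds))
print(clean)  # two rows remain, each with a derived 'seconds' field
```

In a real logical data fabric these steps live in the virtualization layer as view definitions, so every consuming tool sees the same prepared data without the scientist maintaining a pipeline per tool.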
While these are some of the most important considerations for deploying a logical data fabric, there are many others. For example, physical data fabrics cannot handle real-time integration of streaming data with data at rest. As for data security, governance, and compliance, a physical data fabric leaves enterprise data prone to non-compliance with regulations such as the GDPR and the UK Data Protection Act, and data security rules cannot be centralized in a physical data fabric. With all these considerations in mind, many Fortune 500 and Fortune 1000 companies are deploying a logical data fabric with data virtualization as the data integration approach, to make data available and self-serviceable for all their data consumers and stakeholders.