Enterprises today are decision factories like never before (to paraphrase author Roger L. Martin). Decision-making requires access to data, and “the more the merrier” seems to be the rule in these days of big data and artificial intelligence.
Traditionally, data has been managed in silos, with a structure imposed on it that often reduces its usefulness. Sometimes, enterprises don’t even know that data of a specific type exists within the enterprise, or what value it has. While structure is not without its merits, it is most often imposed quasi-arbitrarily, driven by the limitations of the technologies in use or the capabilities of the people managing the data rather than by the perspective of its users.
Several factors are driving change today. First, enterprises have access to huge amounts of data. Second, the available data come in multiple types and formats, including time series, discrete data and everything in between. Third, structuring data makes it lose much of its value. Fourth, unidimensional views of data are useless. And fifth, data is used to understand behavior and make predictions, for which it helps to look at all available types of data.
Of course, we live in the era of big data. There is also a shift in the way data is crunched and analyzed, driven by the rise of artificial intelligence (AI) tools and models that help extract much more value from data. AI is expanding the range of data considered and removing the need to restrict analysis to linear relationships or to one specific dimension of, for example, customer data. AI enables everything to be connected and correlated in multiple ways, freeing data from a unidimensional view. All this means that in the era of big data and AI, enterprises will need new approaches to managing data and ensuring that value can be extracted from it. Enterprises are increasingly turning to data lake strategies to use, manage and store data effectively.
Data lakes seem to be the panacea for all data problems because they allow different types of data, e.g., video, audio, logs, texts, social media, sensor data and documents, to be stored and handled without much artificial structure being enforced on the data. It may be possible to store customer data, including name, contact details, addresses, preferred modes of purchase, time of purchase, social media behavior, clickstream data, likes and dislikes, and anything else required, without worrying too much about table structures, primary keys, foreign keys and the like. It may also be possible to make sense of the data using tags, and with AI in the picture, this tagging can be applied to incoming data with little loss due to improper validation.
Data is exceedingly important in this era of AI, and it must always be accessible to AI models and algorithms so that learning and other sense-making can proceed without being affected by human fallibility and error. Data lakes have been shown to support learning models better: by simply separating the data at random into training and test sets, it is possible to train these models and then unleash them on other data to evaluate whether their results are in line with what is seen in the real world.
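The random separation into training and test data mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming the records already sit in memory as a list; the helper name `split_records`, the 80/20 ratio and the fixed seed are hypothetical choices, not part of any particular data lake product.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(
    records: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: randomly separate records into (training, test) sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)  # copy, so the raw data itself is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test shuffled records are held out for testing; the rest train the model.
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    raw = list(range(100))
    train, test = split_records(raw)
    print(len(train), len(test))  # 80 20
```

Holding out the test set before any training means the model is evaluated only on records it has never seen, which is what allows its results to be compared against real-world outcomes.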
But the true power of data lake strategies comes into play when AI systems are used to unearth previously unseen patterns and flows in the data, which is only possible when the system has access to all types of data, free of a human lens that may skew the insights that emerge.
Especially where AI-enabled systems are used, data lakes can gather and store raw data in its original form, which can be used for both supervised and unsupervised learning. Depending on its architecture, a data lake can store data without a predefined schema, retaining the original attributes of the data, which can be very useful for AI-enabled systems, especially for learning purposes. Various correlations and connections within the data can be identified or explored, potentially leading to unexpected insights. Another advantage of such storage is the ability to query the data during the transform step of an Extract-Load-Transform (ELT) process, which can be very illuminating because the data still retains its original attributes.
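To make the idea of storing data without a predefined schema and querying it during ELT more concrete, here is a minimal Python sketch. It assumes raw events arrive as JSON lines; the in-memory list stands in for files in the lake, and the record fields and function names (`load_raw`, `purchases_view`, `spend_by_customer`) are hypothetical. Data is loaded untouched, and a structure is imposed only when a query needs it.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Extracted raw events (hypothetical examples), deliberately of different shapes.
RAW_LINES: List[str] = [
    '{"customer": "c1", "type": "click", "page": "/shoes", "ts": "2023-01-05T10:00:00"}',
    '{"customer": "c2", "type": "purchase", "amount": 59.9, "channel": "mobile"}',
    '{"customer": "c1", "type": "purchase", "amount": 20.0, "channel": "web", "coupon": "NEW10"}',
]


def load_raw(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Load step: parse each line as-is, enforcing no schema and keeping every attribute."""
    return [json.loads(line) for line in lines]


def purchases_view(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Transform step: project a schema onto the raw data only at query time."""
    for rec in records:
        if rec.get("type") == "purchase":
            yield {
                "customer": rec["customer"],
                "amount": float(rec.get("amount", 0.0)),
                "channel": rec.get("channel", "unknown"),
            }


def spend_by_customer(records: Iterable[Dict[str, Any]]) -> Dict[str, float]:
    """A query built on the view: total purchase amount per customer."""
    totals: Dict[str, float] = {}
    for row in purchases_view(records):
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


if __name__ == "__main__":
    lake = load_raw(RAW_LINES)
    print(spend_by_customer(lake))  # {'c2': 59.9, 'c1': 20.0}
    # Attributes that no view asked for (e.g. the coupon code) are still in the raw data.
    print([r for r in lake if "coupon" in r])
```

Because the raw records are never reshaped on the way in, a later query can use attributes such as `coupon` or `page` that the purchase view ignores, without re-ingesting anything.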
Using a well-architected data lake can be helpful in multiple ways, including:
- Enabling reuse of data
- End-to-end data management
- Optimized ingesting of data
- Generation of business-relevant insights
- Access to the right data at the right time
- Scaling to meet business needs, especially when combined with the cloud
All this and more is possible with the appropriate integration of data lake strategies, especially in AI-enabled systems that support enterprise decision-making.
The author of this article is R. V. Raghu, ISACA Ambassador.