Manosiz Bhattacharyya, Chief Technology Officer, Nutanix.

Enterprise data, particularly unstructured data, is exploding. Gartner estimates, as cited by LUMIQ, that “unstructured data represents an estimated 80 to 90 percent of all new enterprise data and is growing three times faster than structured data.”

Furthermore, humans and applications generate this unstructured data everywhere, from remote edge sites to the public cloud, and applications, even more than humans, are consuming and producing data at an astounding pace. To handle this explosion, we need a global data management solution that can secure and analyze this data and surface the deep insights that are critical to our businesses.

In our current state, and I say this as an enterprise data owner, we have no global visibility. Our data is siloed, which leads to all sorts of problems: setting security policies that span multiple data systems, figuring out where secured data is placed and fulfilling replication requests across on-premises environments and the cloud.

In many situations, we manage this data manually or define these complex policies in segregated silos. Critical policies like access control, data placement, retention and replication are managed through tools that work independently of one another. We must bring this together and simplify.

Addressing Data Visualization And Management With Tags

To manage this data deluge, we need a central place (a proverbial single pane of glass) to visualize and manage all our data. The public clouds have addressed this problem through object storage, where most of their unstructured data is kept.

Just as the public clouds do this across their various regions, we want to create a data management system that can do it across multiple clouds, from the private edge and core data centers to the public cloud, and across data formats. This data management system would let us manage massive volumes of data through intent instead of through action. Instead of doing all this work for every piece of data, we could simply tag the data, and the system would apply our policies automatically. That would indeed be data management nirvana!
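
To make this concrete, here is a minimal sketch of what intent-driven, tag-based policy could look like. The tag names and policy fields are illustrative assumptions, not a specific product's schema; the point is that policy attaches to tags rather than to buckets or shares, and that conflicts always resolve toward the stricter setting.

```python
# A minimal sketch of intent-based management: policies attach to tags,
# not to buckets or shares. All tag names and policy fields are illustrative.
from dataclasses import dataclass

@dataclass
class Policy:
    encrypt: bool = False
    retention_days: int = 0
    replicate_to: frozenset = frozenset()   # target sites/clouds
    allowed_roles: frozenset = frozenset()  # empty = unrestricted

# The data owner states intent once, per tag.
POLICIES = {
    "pii":       Policy(encrypt=True, retention_days=2555,
                        allowed_roles=frozenset({"privacy-officer"})),
    "financial": Policy(encrypt=True, retention_days=3650,
                        replicate_to=frozenset({"dr-site"})),
}

def effective_policy(tags):
    """Combine the policy of every tag on a piece of data,
    resolving each conflict toward the stricter setting."""
    matched = [POLICIES[t] for t in tags if t in POLICIES]
    combined = Policy()
    role_sets = [p.allowed_roles for p in matched if p.allowed_roles]
    if role_sets:  # restrict access to roles allowed by every tag
        combined.allowed_roles = frozenset.intersection(*role_sets)
    for p in matched:
        combined.encrypt = combined.encrypt or p.encrypt
        combined.retention_days = max(combined.retention_days, p.retention_days)
        combined.replicate_to = combined.replicate_to | p.replicate_to
    return combined

# Data tagged both "pii" and "financial" is encrypted, kept 10 years,
# replicated to the DR site and readable only by the privacy officer.
print(effective_policy({"pii", "financial"}))
```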

However, the current enterprise does not work like that. We have to manually set up policies on object buckets or file shares and then ensure that the correct type of file or object lands in the proper share or bucket. If a critical object or file accidentally lands in a share without the right protections, we have a security hole. This manual process is riddled with security gaps and human error. We need to streamline this.

Automatically Generating Tag Types

Managing data through tags can give tremendous flexibility to the enterprise data owner. Imagine organizing your data in a flat structure: when the system ingests a file, some compute analyzes it to determine what type of file it is and what kind of information it contains. The system could then tag the file automatically and have the right policy kick in.
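
A toy sketch of such an ingest hook follows, with deliberately simple stand-in rules; in practice this compute could be anything from regexes to a trained model, and the patterns, paths and the apply_policy hook here are hypothetical.

```python
# A toy ingest hook: inspect a file's content and derive tags automatically.
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")     # US social security numbers
CARD = re.compile(r"\b(?:\d[ -]*?){13,16}\b")  # rough credit-card pattern

def tags_for(path: str, text: str) -> set:
    tags = set()
    if SSN.search(text):
        tags.add("pii")
    if CARD.search(text):
        tags.add("financial")
    if path.endswith((".log", ".trace")):
        tags.add("telemetry")
    return tags or {"unclassified"}

def on_ingest(path: str, text: str):
    """Called whenever the system ingests a file; the right policy
    then kicks in via the tags, as sketched earlier."""
    tags = tags_for(path, text)
    print(f"{path}: tagged {sorted(tags)}")
    # apply_policy(path, effective_policy(tags))  # hypothetical hook

on_ingest("hr/payroll.csv", "name,ssn\nJane,123-45-6789")
```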

This is why we need to bring automation to global data management. At the end of the day, you don’t really care whether a piece of data is in a particular bucket or a particular share. You care about the data type and the data’s access policy, placement policy, retention policy and so on. We must move toward a world where all these different policies are part of the data itself.

The public clouds have created mechanisms to provide compute through lambdas or functions where, on ingestion or update of an object, custom compute can examine the data, determine its type and set the right tags on it. We need to provide similar mechanisms for all enterprise data in all its locations. With a global data manager that defines policies based on tags, and functions (lambdas) that generate those tags, we can build a geo-distributed data management system that scales enormously.
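
As one concrete illustration, here is roughly what such a function could look like on AWS, where S3 can invoke a Lambda on every object upload. The event shape and boto3 calls are the standard ones, but classify_bytes is a hypothetical stand-in for the actual classification logic.

```python
# Sketch of an event-driven tagger: S3 invokes this Lambda on each upload;
# the function samples the object and writes tags back, where tag-based
# policies can then act on them.
import boto3

s3 = boto3.client("s3")

def classify_bytes(data: bytes) -> list:
    # Stand-in classifier; a real one might be rules, a model or both.
    return ["pii"] if b"ssn" in data.lower() else ["unclassified"]

def handler(event, context):
    for record in event["Records"]:  # standard S3 notification shape
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        sample = s3.get_object(Bucket=bucket, Key=key)["Body"].read(4096)
        tags = classify_bytes(sample)
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={"TagSet": [{"Key": "data-class",
                                 "Value": ",".join(tags)}]},
        )
```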

Where Does AI Fit In?

In traditional use cases, AI has been an excellent classifier: it can find patterns in your data that are invisible to normal rule-based classification algorithms. For a long time, we have used AI for exactly that, and in our use case, we can use sophisticated AI to tag data. The question remains: How do we generate tags for the models themselves?
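
To illustrate the classifier angle, here is a small sketch using scikit-learn: train a simple text model on documents whose tags are already known, then let it tag new data. The labels and the tiny training set are purely illustrative.

```python
# A toy AI tagger: learn tag categories from already-tagged documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "employee name address social security number",
    "quarterly revenue balance sheet audit",
    "cpu utilization pod restart kernel log",
]
labels = ["pii", "financial", "telemetry"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)

# New, untagged data gets a tag prediction; a real system would feed
# this into the same policy machinery as the rule-based tags.
print(model.predict(["invoice ledger and revenue totals"]))
```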

AI models fundamentally summarize all the data they have been trained on. In fact, once a model has seen a piece of data, there is no way for it to forget it. The model therefore needs the same security policies as the data on which it was trained.

This is where tagging can help. Instead of remembering every piece of data a model has seen, we could retain the composite set of tags associated with that data and give the model the same policy controls through a conservative composition of those tags. In other words, once your AI model has seen your data, it must be governed by the union of all the policies attached to the data it was trained on.
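
A minimal sketch of that composition, assuming a simple tag registry (all names illustrative): accumulate the union of every tag the model's training data carried, then gate access conservatively against that composite set.

```python
# Govern a model by the union of the tags of everything it has seen.
MODEL_TAGS: dict[str, set] = {}

def record_training(model_id: str, dataset_tags: set):
    """Fold each training dataset's tags into the model's composite set."""
    MODEL_TAGS.setdefault(model_id, set()).update(dataset_tags)

def may_access(model_id: str, user_clearances: set) -> bool:
    """Conservative rule: the user needs clearance for every tag the
    model's training data ever carried."""
    return MODEL_TAGS[model_id] <= user_clearances

record_training("churn-model-v1", {"pii"})
record_training("churn-model-v1", {"financial"})

print(may_access("churn-model-v1", {"financial"}))         # False: lacks pii
print(may_access("churn-model-v1", {"pii", "financial"}))  # True
```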

Final Thoughts

As the security and privacy of all this data become ever more important, we must shift from data systems that define policies based on storage constructs to data and access policies based on data types.

We could do global data tagging in two ways. First, move all the data to a central compute domain (like the public cloud) and analyze and categorize it there—or second, move the associated compute close to where the data is and categorize it there. We should always prefer the second, as data has gravity and is very expensive to move, while compute is orders of magnitude lighter and less complex to move. In order to do this correctly, you need a platform that can run sophisticated compute uniformly from the edge to the public cloud.
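
A back-of-envelope illustration of that gravity, under assumed figures (1 PB of data, a roughly 500 MB container image for the classification function, a dedicated 1 Gb/s link):

```python
# Rough data-gravity arithmetic; all figures are assumptions for scale.
DATA_BYTES = 1e15     # 1 PB of enterprise data
IMAGE_BYTES = 500e6   # ~500 MB container image holding the compute
LINK_BPS = 1e9 / 8    # 1 Gb/s link, in bytes per second

days = DATA_BYTES / LINK_BPS / 86_400
seconds = IMAGE_BYTES / LINK_BPS

print(f"moving the data:    ~{days:.0f} days")  # ~93 days
print(f"moving the compute: ~{seconds:.0f} s")  # ~4 s
```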

Thus, if you want to solve your global data management problem, you need a platform that doesn’t offer just data or just compute, but data and compute.

