AI penetration is deepening. We’re starting to put AI-fuelled advancements in the hands of everyone from nurses to construction engineers. These advancements are built on deep learning mechanisms that draw upon pattern recognition, complex algorithmic logic and domain-specialized intelligence stemming from techniques such as retrieval-augmented generation.
When a new part rolls down a production line today, manufacturers use imaging devices to scan the interior of components, then run those data files through AI models for analysis. Better products and services are emerging every day as a result of the data analytics applied across industries and the avenues identified for AI implementations that make parts, processes, people and products work better.
These systems discover flaws in real time that would otherwise be impossible to find, improving product quality and reducing costs. The benefits are crystal clear. Yet enterprise deployment of AI is rarely this straightforward.
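To make the pattern concrete, and only as a rough sketch rather than the specific systems described above, here is what automated inspection can look like in code. A statistical baseline built from known-good scans stands in for a trained deep learning model, and parts that deviate sharply are flagged. The file layout, image size and threshold are all illustrative assumptions.

```python
# Minimal sketch of AI-aided visual inspection via anomaly scoring.
# Assumptions: scans are grayscale .png files, and a simple per-pixel
# statistical baseline stands in for a trained deep model.
import numpy as np
from pathlib import Path
from PIL import Image

def load_scan(path: Path) -> np.ndarray:
    """Load a scan and normalize pixel intensities to [0, 1]."""
    img = Image.open(path).convert("L").resize((128, 128))
    return np.asarray(img, dtype=np.float32) / 255.0

def fit_baseline(good_scans: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Per-pixel mean and std over a reference set of known-good parts."""
    stack = np.stack(good_scans)
    return stack.mean(axis=0), stack.std(axis=0) + 1e-6

def anomaly_score(scan: np.ndarray, mean: np.ndarray, std: np.ndarray) -> float:
    """Mean absolute z-score: high values suggest a structural flaw."""
    return float(np.mean(np.abs((scan - mean) / std)))

# Hypothetical usage: flag parts whose score exceeds a tuned threshold.
good = [load_scan(p) for p in Path("scans/known_good").glob("*.png")]
mean, std = fit_baseline(good)
for p in Path("scans/incoming").glob("*.png"):
    if anomaly_score(load_scan(p), mean, std) > 3.0:
        print(f"Flag for manual inspection: {p.name}")
```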
Why AI Isn’t Magic
There is an expectation, given the tremendous capabilities of consumer-facing AI tools, that all a company needs to do is add a bit of corporate data to an AI solution and the result will be magic. In reality, AI enterprise deployment is a very complex data engineering challenge, points out Jim Liddle, chief innovation officer for data intelligence and AI at cloud file service platform company Nasuni. So what’s the big challenge here?
“One of the largest and least understood obstacles to successful AI deployment is ‘file data’ related to working applications inside any given enterprise IT stack. Structured data is familiar territory for most organizations, but unstructured [file] data such as documents, images, videos and other files accounts for 90% of all data generated,” explained Liddle. “This is the raw material that organizations want AI tools to work with. So how is it done?”
How should organizations go about working with data at this scale if it is to be of productive use for AI consumption?
Synthesize & Curate… Then Make
First, we need to remember that data engineers need to source this unstructured data and find out where it lives… whether it resides on local storage devices within office premises, in the cloud, or spread across various software platforms. Next, the same data engineering team needs to “synthesize and curate” these files.
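As a rough sketch of that discovery step, the loop below builds a single inventory across a local share and a cloud bucket. The paths, the bucket name and the use of boto3 for S3 are illustrative assumptions rather than any particular vendor’s approach.

```python
# Minimal sketch of unstructured-data discovery: build a single inventory
# of files spread across local storage and a cloud bucket.
# The paths and bucket name below are hypothetical.
from pathlib import Path
import boto3

def inventory_local(root: str) -> list[dict]:
    """Walk a local share and record basic metadata per file."""
    return [
        {"source": "local", "path": str(p), "bytes": p.stat().st_size}
        for p in Path(root).rglob("*") if p.is_file()
    ]

def inventory_s3(bucket: str) -> list[dict]:
    """List objects in an S3 bucket with the same record shape."""
    s3 = boto3.client("s3")
    records = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            records.append({"source": "s3", "path": obj["Key"], "bytes": obj["Size"]})
    return records

catalog = inventory_local("/mnt/office-share") + inventory_s3("acme-design-files")
print(f"{len(catalog)} files catalogued across both sources")
```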
“This is where the deep data engineering work of exploring, cleansing, normalizing and organizing data in a scalable, repeatable fashion becomes critical,” clarified Liddle.
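One way to read “cleanse and normalize” in code is the repeatable pass sketched below: exact duplicates are collapsed by content hash and text files are re-encoded consistently. The directory names and rules are hypothetical, and a real pipeline would add many more steps.

```python
# Illustrative sketch of a repeatable cleanse-and-normalize pass:
# deduplicate files by content hash and normalize text to UTF-8.
# Directory layout and rules are assumptions, not any vendor's pipeline.
import hashlib
from pathlib import Path

def content_hash(path: Path) -> str:
    """Hash file contents so byte-identical duplicates collapse to one key."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def normalize_text(path: Path, out_dir: Path) -> Path:
    """Re-encode a document as UTF-8 with a lowercased, space-free name."""
    text = path.read_text(encoding="utf-8", errors="replace")
    out = out_dir / path.name.lower().replace(" ", "_")
    out.write_text(text, encoding="utf-8")
    return out

seen: set[str] = set()
out_dir = Path("curated")
out_dir.mkdir(exist_ok=True)
for doc in Path("raw_files").rglob("*.txt"):
    digest = content_hash(doc)
    if digest not in seen:  # skip exact duplicates
        seen.add(digest)
        normalize_text(doc, out_dir)
```

Because the pass is deterministic, it can be re-run as new files arrive without re-processing what has already been curated.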
“Once a business has a grasp on the data engineering job, the next major question is which AI tool or type of model they should deploy. This depends on the particular needs of an organization and there are associated concerns which the data engineering team might not usually need to prioritize.”
From this point onward, the data team needs to look for any presence of “latent bias” in the data being used (or built into the foundation model of the chosen AI provider). This check goes hand in hand with evaluating data privacy and security in the context of how the business plans to use an AI model with its data.
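A minimal version of such a bias check, assuming tabular training data with hypothetical “region” and “approved” columns, might simply compare outcome rates across groups; real audits go considerably further.

```python
# One simple, commonly used latent-bias check: compare outcome rates
# across groups in the training data. The column names, file name and
# disparity threshold are hypothetical illustrations.
import pandas as pd

def outcome_rate_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Fraction of positive outcomes per group value."""
    return df.groupby(group_col)[outcome_col].mean()

df = pd.read_csv("training_data.csv")  # assumed dataset
rates = outcome_rate_by_group(df, "region", "approved")
disparity = rates.max() - rates.min()
print(rates)
if disparity > 0.2:  # arbitrary illustrative threshold
    print(f"Warning: {disparity:.0%} gap between groups; investigate before training")
```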
Questions to ask include:
- Will the AI model be run strictly in-house?
- Will the business embed AI-driven tools inside a product?
The latter point is important because a firm will need to consider the regulatory landscape and adhere to the rules governing the territories in which it operates. A business can easily fall foul of some of the new regulations that call for companies to disclose to their customers if they are using AI in their product set.
Validation Break Vs. AI for AI’s Sake
“Last but not least, an organization will likely be asked to detail (or at least attempt to project) the business value associated with its data engineering efforts. These projects are not going to be AI for AI’s sake. The executive leadership board or team will want to see how every data engineering and AI implementation plan will drive revenue, reduce costs, or both. With something like the manufacturing use case described above, the value is clear. An AI-aided inspection tool helps companies minimize product flaws and improve quality, thereby reducing costs and increasing customer satisfaction,” detailed Liddle, explaining how these projects need to be deployed in real-world scenarios.
He provides another example, that of a global media and marketing company with studios based around the world.
If each of this company’s offices has its files stored independently, then a creative team pitching or developing a new project cannot tap into the larger firm’s institutional knowledge base. But if that global marketing giant’s files have been consolidated, curated and made securely available to AI-enhanced search and indexing tools, then the creatives in one office can quickly access previous work on projects in similar industries or locations, or even past work for the same client. This rapid access to institutional knowledge will help them generate more informed work in less time.
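A minimal sketch of what “AI-enhanced search” over consolidated files can look like follows, assuming the open source sentence-transformers library and a folder of curated text documents; both are illustrative choices, not the company’s actual stack.

```python
# Minimal sketch of AI-enhanced search over consolidated files: embed every
# document once, then retrieve the closest matches for a creative brief.
# The model name and file paths are illustrative assumptions.
from pathlib import Path
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = sorted(Path("curated").glob("*.txt"))
texts = [d.read_text(encoding="utf-8") for d in docs]
doc_vecs = model.encode(texts, normalize_embeddings=True)  # one vector per file

def search(query: str, k: int = 3) -> list[tuple[str, float]]:
    """Return the k most similar documents by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are unit length
    top = np.argsort(scores)[::-1][:k]
    return [(docs[i].name, float(scores[i])) for i in top]

# Hypothetical brief from a creative team in another office.
for name, score in search("retail client pitch, Nordics market"):
    print(f"{score:.2f}  {name}")
```

The same index could sit behind a retrieval-augmented generation tool, so past work surfaces as context rather than as a manual search result.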
“The key step here is not exactly which AI tool you select or develop,” said Nasuni’s Liddle. “This is of course important. But first, a firm needs to do the data engineering work that allows these tools to be successful. That starts with a strong data management framework for files and unstructured data. This framework should give visibility into the dataset with a rich understanding of that dataset and global access to those files. Finally, the data management framework should be able to ingest new data and make it available to AI tools without forcing the data engineering team to jump through technical hoops. The fresher and more relevant the data AI can access, the better the results.”
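The “ingest new data without technical hoops” property might be approximated, in miniature, by a loop that notices new or changed files and hands them back to the indexing step. A production framework would use storage-platform events rather than polling; this is purely a sketch.

```python
# Illustrative sketch of continuous ingestion: a polling loop that yields
# new or changed files so they can be re-embedded and re-indexed.
# The interval and directory are assumptions.
import time
from pathlib import Path

def poll_for_changes(root: Path, interval: float = 30.0):
    """Yield files whose modification time is newer than the last pass."""
    seen: dict[Path, float] = {}
    while True:
        for p in root.rglob("*.txt"):
            mtime = p.stat().st_mtime
            if seen.get(p) != mtime:
                seen[p] = mtime
                yield p  # new or updated: hand back to the indexer
        time.sleep(interval)

for changed in poll_for_changes(Path("curated")):
    print(f"re-indexing {changed.name}")  # e.g. re-run the embedding step
```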
AI Is Not Magic
Data engineering has never been an easy job and, in many ways, it is more difficult than ever. Sadly, laments Liddle, there is no magic involved, but there are strong data management frameworks that can help firms get the most out of AI and generate true, measurable returns.
In reality, and as essential as data management frameworks no doubt are, a holistic approach to total data engineering is also needed, along with an exhaustive approach to data provenance control and to onward data propagation, preparation and proliferation. Just remember: AI looks like magic, but the guts, gears and grease show that it’s really not.