Laying the Groundwork for AI Transformation Through Infrastructure

By Carm Taglienti, Chief Data Officer and Distinguished Engineer
3/30/2023

Artificial Intelligence (AI) offers substantial business benefits, but transformation doesn’t happen automatically. Carm Taglienti, Distinguished Engineer and product portfolio director at Insight, recently spoke with the “AI in Business Podcast” by Emerj to discuss some of the challenges organizations face — both when laying the groundwork for AI transformation from scratch and when developing existing systems in a way that avoids technical debt.




The transcript below is edited and condensed for clarity.

What are the best strategies for building AI infrastructure?

Carm: When we think about building an AI infrastructure, normally, we would think about things like models, and you know, how do we create a cool model that's going to allow us to realize some value within the organization or create competitive advantage. But fundamentally, it's mostly about the data, which is interesting.

So, when I think about infrastructure, I usually think about building up the data assets. It’s the data assets that really allow us to figure out what data we want to store. Where do we store it? In what form do we store it? And there are a lot of different facets that are associated with that. How do I get the data I need? How do I know that it's fit for purpose? And then how do I move forward?

When you're designing these systems, you want them to fit the organization, but you don't want them to be bespoke to the point where they instantly become dated. How do you avoid that?

What you're describing is really the assumption that just because we have had something historically, it should be what we use moving forward. But we have to think about the time at which the particular data asset or infrastructure was created. And unfortunately, back when we were first looking to get value from our data, we might have written our code or built our architecture quickly and without a lot of focus on the future. And so that contributes to this technical debt concept.

Fit for purpose is an important characteristic. So, we really do have to go back and look at how much of the data is really valid or valuable to us moving forward, and then look at what some of the strategies might be for creating the set of data that we would require to realize the benefit of AI within our organization.

Let's say we're starting from scratch. What are some of the challenges? What's a typical sort of front door?

We typically see customers struggling to know where to start. Because there's a lot of fanfare about AI. Then you start to dig in a little bit more deeply, then you say, “Okay, that's great. I can do all kinds of cool analysis, but maybe I don't have the data.” Now, suddenly, you have to build out a data infrastructure.

So, I think, a good place to start, what makes things kind of easy is if you're using existing products, for example, PaaS or SaaS services that you already have, you can leverage some of the AI services that they provide. For example, something like Salesforce would apply some AI services to help you understand how to write your customer base a little bit more effectively. So that's a good entrée into this whole concept and to help you just sort of dip your toe in the water. And then think more deeply about how to create a program that allows you to then be able to do things in a more effective way.

How do teams grow alongside infrastructure, especially for in-house operations that need to build the systems from scratch themselves?

I like to think about this as a culture of analytics. And so most organizations, certainly over the last decade, have become more data-driven organizations. And so, understanding the real value of data allows you to be able to look at the things that you might be implementing, and then really put that lens on and look around and ask, “How much value am I getting out of this data?” or “What is the asset that's made available to me, so I can make decisions in a more effective way?”

And what I've seen is that when companies have a culture of analytics and emphasize learning as you go, it becomes a model where people are interested in participating in the overall process. And so, it's creating their own services, or using low-code, no-code solutions, for example, to be able to create their own infrastructure to be able to do the analysis on their own. And so, it really sort of opens this door to how do I take advantage of the infrastructure and services that are made available to me to become more impactful to the organization.

What are the typical challenges for businesses, like legacy financial services institutions, that already have an existing infrastructure?

Organizations in those sectors seem to be doing a good job of collecting data these days, or at least of understanding that this information could be valuable to them in the future, so they put it someplace. And so now what we see is this concept of dark data. So how do we shine a light on it? How do we allow ourselves to be able to take advantage of those data assets?

And there are a couple of ways that we can think about doing that. The first one is to engage with our IT teams to help us understand how to inventory what information might be available, and then produce things like metadata so that we can say, “Well, these are the data assets that we have, and here's what's included in them.” And then make that available to the consumers or the people in the business that will use that information.

And then from there, the sky is the limit. Once people understand that data is available, then they can use a wide variety of different tools, whether they're AI tools, or even BI tools, or even Excel for that matter, to be able to extract and access that data, and then really realize the value of that information in terms of driving the business forward.

But that's a hard thing to do, to create the sort of service orientation related to underlying data. But like I said before, usually organizations are good at collecting data, not so good at being able to make it available to the people that really need it.
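As a rough illustration of the inventory-and-metadata step described above, here is a minimal Python sketch. It assumes, purely for the example, that the data assets are CSV files under a single folder and that the first row of each file is a header. The names `DatasetInfo` and `inventory_csv_files` are hypothetical and not something named in the interview; a real inventory would also cover databases, logs, and other operational systems.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass
class DatasetInfo:
    """One catalog entry: what the asset is and what it contains (hypothetical)."""
    name: str
    columns: list[str]
    row_count: int


def inventory_csv_files(root: Path) -> list[DatasetInfo]:
    """Walk `root` and record basic metadata for every CSV file found."""
    catalog = []
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, [])           # assumption: first row is the header
            row_count = sum(1 for _ in reader)  # remaining rows are data rows
        catalog.append(DatasetInfo(path.name, header, row_count))
    return catalog
```

Calling `inventory_csv_files(Path("some_folder"))` returns a list that could be published to business users as a simple "here is what we have" catalog, which is the kind of starting point the answer above describes.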

Can you provide a more concrete definition for dark data?

Basically, it's data that is sitting on a system that's been tucked away someplace where nobody really uses it or knows where it is or if it’s available. And so that's why I call it dark data; it’s because it hasn't really seen the light of day. When it's finally made available, then suddenly, it's like, “Wow! We didn't realize we had 10 years of customer data” or whatever it might be. So that's where the term dark data comes from.

Where are the most common places for dark data to come from?

Well, I think it's probably coming from internal systems, mostly because data is being generated, and it's being stored. I like to call it “the Vapor Trail” of the ongoing operational processes. Certain information might be, for example, social media data, or maybe new programs that are being run by a bank or financial institution. And so that data is being kept, and people are very aware of it.

But it's all the operational systems where that data is being generated someplace. But it's not really being made available to the end users through a service interface, or even from a metadata and data governance perspective. People don't really know it's there. Business folks don't really know that it's there, and they don't know how to take advantage of it.

What do you recommend for strategies for choosing subject matter experts that might know those places even better than business leadership?

Normally, it's driven from the business units, and these kinds of things start by asking “What is the business goal? What are the business problems we're trying to solve?” And then usually, you start digging your way back to where the data is coming from, and you uncover all of this interesting information that's made available to you. So, and I don't want to go so draconian as to say, “Everybody needs a data governance environment.” But everybody does need a data governance environment.

So, it basically has to do with what is the data? Who is the steward for the data? How is the data made available to us? And then how is it going to be curated? How is the data going to be kept up to date? What's its valuable lifetime? What is the asset value to the organization? In general, that's how you would do it. Ensure that there's a business purpose first, and then sort of drive your way back into those operational systems. Because you know that that data probably exists somewhere within the organization.
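The governance questions above (what the data is, who the steward is, what its useful lifetime is) can be captured in a very small record. The Python sketch below is one possible shape under stated assumptions: the field names, the `GovernanceRecord` class, and the rule "flag anything with no steward or past its useful lifetime" are all hypothetical choices made for illustration, not a method described in the interview.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class GovernanceRecord:
    """Answers to the basic governance questions for one data asset (hypothetical)."""
    dataset: str
    business_purpose: str      # the business problem the data supports
    steward: str               # who is responsible; empty string if nobody is
    last_refreshed: date       # when the data was last brought up to date
    useful_lifetime_days: int  # how long the data stays valuable after a refresh

    def is_past_useful_lifetime(self, today: date) -> bool:
        """True when the data is older than its stated useful lifetime."""
        expiry = self.last_refreshed + timedelta(days=self.useful_lifetime_days)
        return today > expiry


def needs_review(records: list[GovernanceRecord], today: date) -> list[str]:
    """Names of datasets with no steward or past their useful lifetime."""
    return [
        r.dataset
        for r in records
        if not r.steward or r.is_past_useful_lifetime(today)
    ]
```

A list like the one `needs_review` returns gives a governance group a concrete agenda: assign a steward, refresh the data, or retire it.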

What are you looking for in the AI capability to build proper AI infrastructure where you are doing proper and regular assessments on your technical debt?

A lot of times people think, “Oh, well, if I'm starting an AI program, or I want to start doing more with the data I have, I have to hire a team of data scientists.” Well, over the last four or five years, we've seen a lot of AutoML models come into play with things that are low-code, no-code solutions, by using some of these advanced features.

So, that’s a convenient way to put some of the power into the hands of your business users, where you don’t have to focus as much on having a PhD in statistical methods or algorithmic studies. But then you get back to this concept of fit for purpose.

Can you say more about the fit for purpose concept?

I would say it’s more of a philosophy. When you think about if it’s fit for purpose, you would look at it from the perspective of what technologies can help me to create the kind of data that I need. And then can I automate it and make it presentable for the time in which it’s required?

In general, you also must think beyond “I built something. Therefore, I'm done, and I can move on.” It's more of a constant evolution of determining what data you need — and for what purposes. And then determining its useful lifetime.

So, sometimes there's a temptation to adopt a hoarder’s mentality and keep the data longer than it's useful. Do you run into that problem?

Yes, definitely. And the interesting thing about it is that sometimes that hoarding of data occurs in sub pockets where we end up with shadow IT. Because people are like, “Oh, I fixed the data. And it looks exactly like I need to drive my AI process or my BI process.” And suddenly, now you get the shadow IT program that's sitting off to the side where it's like, “Oh, yeah, don't worry about it, we fixed everything. And it's in an Excel spreadsheet.”

It’s not maintainable. The IT team doesn't even know that that exists in the grand scheme. And you know, I'm not faulting anybody in IT groups, but sometimes IT says they're not even aware, or it takes too long. So really, mastering that agility model is super important to prevent shadow IT from popping up.

So, it's an interesting dilemma. You really do need a cohesive kind of governance strategy that takes into consideration the things that we've been talking about. Because without it, you end up with these pockets of either aging or useless data, or you end up making decisions off data that's not even really relevant to the business any longer.

To what extent are a lot of these choices policy driven by the company versus the soundest data strategy?

It's a great question, and I've got two answers for it. The first answer is related to the regulatory environment in which your industry or company works. In that case, you have a lot of different kinds of data-driven policies that are related to your regulatory compliance standards. And it’s what you must do. The second answer is about companies focusing on the value proposition and the value of information and data. And so, you know, if you do align your ultimate business strategy to using data-driven design and decision-making processes, then that can be considered a culture of innovation, or a culture of data-driven decision-making.

And that really, I think, is where some of these policies come into play, where companies are focused on understanding the value of data, of accessibility to the information and data, and then just having everybody participate in the overall process. When I do see organizations with a culture of data-driven decision-making that has really taken root, everybody knows where to go to find data, and they make decisions based off the data. There's not speculation in the process. And those organizations are light-years ahead of other organizations that just operate by their gut, if you will.

