How To Improve Your Data Science Project Maturity

By Kris Schroeder, Business Architect & Agilist
9/13/2022

Due to a lack of maturity in data science delivery approaches, companies are missing out on valuable business insights. In this second part of our three-part series, we're offering advice on how you can improve your company's data science project maturity.

In our last post, we outlined common reasons why data science projects fall short or fail. Here, we’ll explore tactics to improve project maturity.

You need to know what question you are trying to answer.

Does your product vision align with the company’s goals? How does your product strive to address your customers’ needs? Has this vision been clearly communicated to the team, including the data scientists? Stakeholders must engage with the data and Artificial Intelligence (AI) teams to formulate the question they want to answer. The data and AI teams, in turn, need business and relationship skills that allow them to ask questions that uncover customer goals, clarify the business priority, and establish ways to validate adoption and impact.

Our advice:

  • Clearly communicate the product vision and how it aligns with company goals.
  • Encourage close collaboration between the data science team and the business units they support.
  • Ensure your team has the skills necessary to ask the questions that result in an initial hypothesis.

Promote experimentation.

Perfection is the enemy of done. Data scientists are known to be perfectionists and for good reason — you don’t want bad data going into production. But overprioritizing perfection often leads to investing too much time and effort on hypotheses that don’t yield results. It’s a shift in mindset: It’s better to accept the potential for rework than it is to be perfect at delivering the wrong thing. One way to help teams focus on short feedback loops and on getting work done is to use experiments.

An experiment starts with your hypothesis. Your hypothesis should indicate what you will explore and the benefits you hope to achieve. Once you have settled on your hypothesis, develop experiments that will test it. Documenting the tests ensures the team has a shared understanding of the work to be done and a record of what has been tried. Next, set a timebox that limits the amount of churn and establishes a reflection point at which you decide whether to continue investing time in the hypothesis. When defining your experiment, it’s also helpful to identify who will lead it and any dependencies that must be resolved before work can start. The next post in this series will discuss how experimentation not only clarifies work and shortens feedback loops but also drives continuous improvement.

  • Use an experimentation format for defining and clarifying work.
  • Adhere to a timebox to consistently reflect on what you have learned.
  • Intentionally decide if further investment in the hypothesis is valuable.
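
The experiment format described above can be captured in a small record. What follows is a minimal sketch, not a prescribed template: the `Experiment` class, its field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Experiment:
    """One timeboxed experiment. All field names are illustrative."""
    hypothesis: str          # what you will explore
    expected_benefit: str    # the benefit you hope to achieve
    tests: list[str]         # documented tests, shared with the team
    lead: str                # who leads the experiment
    start: date
    timebox_days: int        # limits churn
    dependencies: list[str] = field(default_factory=list)

    @property
    def reflection_date(self) -> date:
        """The point at which the team decides to continue or stop."""
        return self.start + timedelta(days=self.timebox_days)

    def due_for_reflection(self, today: date) -> bool:
        return today >= self.reflection_date


# Invented example values, for illustration only.
experiment = Experiment(
    hypothesis="Recent purchase history improves churn prediction",
    expected_benefit="Earlier outreach to at-risk customers",
    tests=["Add a 90-day purchase feature", "Compare against the current model"],
    lead="Data science team",
    start=date(2022, 9, 13),
    timebox_days=14,
    dependencies=["Access to the purchase history table"],
)
print(experiment.reflection_date)
```

Keeping the reflection date on the record itself makes the decision to continue or stop an explicit, scheduled step rather than something that happens only when time runs out.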

Ensure all employees have a basic comprehension of data science.

I'm not implying you need to enroll your organization in statistics courses or machine learning boot camps, but I am suggesting that a basic understanding of what data science is, and of how inherently unpredictable it is, matters to your company's success with model delivery. Man Chan, founder of WeR.ai, shared that Netflix spent nine months achieving a breakthrough in its Machine Learning (ML) recommendations. The team worked through hundreds of hypotheses and found that few yielded results. And although Netflix invested tens of millions of dollars during that timeframe, the net result was billions gained. This short but powerful story emphasizes the risk involved in every hypothesis that is investigated and the patience required to see results.

  • Prepare stakeholders for long lead times.
  • Set expectations that many experiments will fail.
  • Get comfortable with an unclear roadmap.

Include stakeholders in data governance.

Bad data leads to dangerous results. Business leaders will resist turning business decisions over to model predictions if they don't trust the data. To build that trust, stakeholders need to participate actively in data governance. Just as the business should be engaged in the integrity of the data, it should also be involved in training and improving the models to ensure adoption.

  • Establish checks and balances to enforce proper data usage.
  • Deploy monitoring services to alert on suspicious data.
  • Engage business stakeholders in data validations, model training, and monitoring.
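
As a rough illustration of alerting on suspicious data, the sketch below runs a handful of validation rules over incoming records and collects an alert for each failure. The `Rule` class, the `find_suspicious` function, the column names, and the thresholds are all hypothetical. In practice, business stakeholders would help define the rules.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A named validity check on one column (hypothetical structure)."""
    name: str
    column: str
    is_valid: Callable[[object], bool]


def find_suspicious(rows: list[dict], rules: list[Rule]) -> list[str]:
    """Return one alert message per failed rule. Missing values also fail."""
    alerts = []
    for index, row in enumerate(rows):
        for rule in rules:
            value = row.get(rule.column)
            if value is None or not rule.is_valid(value):
                alerts.append(
                    f"row {index}: {rule.name} failed for {rule.column}={value!r}"
                )
    return alerts


# Invented rules and records, for illustration only.
rules = [
    Rule("non-negative order total", "order_total", lambda v: v >= 0),
    Rule("plausible customer age", "customer_age", lambda v: 0 < v < 120),
]
rows = [
    {"order_total": 25.0, "customer_age": 34},
    {"order_total": -5.0, "customer_age": 41},
    {"order_total": 12.5, "customer_age": None},
]
alerts = find_suspicious(rows, rules)
for alert in alerts:
    print(alert)
```

Because each rule has a plain-language name, the alerts stay readable for the business stakeholders who are asked to take part in data validation.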

Foster collaboration.

A single team member who is an expert statistician, mathematician, machine learning specialist, data architect, DevOps engineer, data visualization developer, data translator, business domain master, business analyst, and storyteller doesn’t exist. You are unlikely to find a unicorn who can do it all. Data science is a team initiative. That's why it's important to establish a collaborative environment for the solution and product teams. Data scientists need to work together to share discoveries, offer advice, and identify opportunities to improve. They need to work closely with the business team to get deeper context on business problems and gain more trust and adoption of data models. Data scientists need to work with data and DevOps engineers to collect the data needed to test their hypotheses and deploy models to production.

As data science has grown in popularity, so has the number of ML models built by organizations. According to a survey by SAS, organizations deploy fewer than 50% of their best models, and 90% of models take more than three months to deploy. The marrying of DevOps and data science has created a burgeoning field within data science called Machine Learning Operations, or MLOps, along with a brand-new role of ML engineer. This field takes the traditional CI/CD (Continuous Integration/Continuous Delivery) from application development and adds CT (Continuous Training): CI/CD/CT. Characteristics of a mature data science team now include structures such as Feature Stores, Model Repositories, and Model Monitoring. Each is a key piece of the puzzle to help organizations serve their customers.

  • Coach and mentor teams to reflect on ways to improve their interactions with one another in addition to their work methods.
  • Get data scientists, engineers, and business stakeholders working together.
  • Create shared libraries and tools to allow the data science team to deploy results faster.
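
To make the Continuous Training idea more concrete, here is one possible sketch of a retraining trigger driven by model monitoring. It assumes a single monitored accuracy metric and a fixed tolerance. The function name `should_retrain`, the threshold, and the numbers are illustrative assumptions, and real MLOps pipelines typically watch several signals.

```python
def should_retrain(
    baseline_accuracy: float,
    recent_accuracies: list[float],
    max_drop: float = 0.05,
) -> bool:
    """Flag retraining when recent accuracy falls too far below the baseline.

    The threshold and the choice of metric are assumptions for illustration.
    """
    if not recent_accuracies:
        # No monitoring data yet, so there is nothing to act on.
        return False
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_mean) > max_drop


# Invented monitoring values, for illustration only.
stable = should_retrain(0.90, [0.89, 0.88, 0.90])
degraded = should_retrain(0.90, [0.82, 0.80, 0.81])
print(stable, degraded)
```

A check like this could run on a schedule alongside the CI/CD pipeline, so that retraining is kicked off by monitored evidence rather than by the calendar alone.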

Concluding advice

Influencing change in culture, processes, and relationships is necessary to mature your company's ability to deliver data science projects. Focusing on collaboration and experimentation will provide a deeper understanding of the data you’re working with, increase the probability of more models making it into production, and grow adoption and trust once the models are in place.

Read our third and final part of this series, Lasting Success With Data Science: Fostering Agility Across Data Science Teams.

Or you can jump back to part 1: Why Data Science Projects Fail — And What To Do Next.