Companies and teams want to extract more business value from their data and stay competitive through data-driven decisions, and they increasingly rely on Machine Learning (ML) and Artificial Intelligence (AI) to do so. That makes it important to keep technical debt under control and to apply best practices from closely related fields such as Software Engineering. This is easier said than done: once many stakeholders from different business, technical, and scientific backgrounds start to collaborate, it is hard to operationalize what may have begun as an ad-hoc analysis on the laptop of a single data scientist. Moreover, blindly applying Software Engineering best practices is not enough; the intricacies of ML and AI systems must be taken into account.

As we approach 2020, new open source tools have appeared, such as DVC for version control of data sets and MLflow for tracking data science and machine learning models and experiments, yet many teams still struggle to keep complexity under control. Managing the full ML production lifecycle is a prerequisite for wide-scale adoption and deployment of machine learning and deep learning across industries, and for businesses to benefit from the core ML algorithms. These ongoing challenges have only recently started to receive more attention: the first USENIX Conference on Operational Machine Learning (OpML ’19) was held in 2019.
At TM Data ICT Solutions, we help our clients with technology and process solutions for complex ML and AI system development. Below are a few questions to consider in your ML/AI projects; our consultants are ready to listen to the pain points you encounter, so that we can work towards a solution:
- How well do you keep track of your data set versions?
    - Do you annotate your data sets?
    - Do you track the lineage of your data sets?
    - Can you link your ML and AI predictions back to a particular version of a data set?
    - Can you get a “diff” between two different data set versions?
- How well do you keep track of your ML / AI models?
    - Can you get a “diff” between two different models?
    - Can you link a prediction to a particular model version, coupled to a data set version and the hyperparameters involved? (A minimal tracking sketch follows after this list.)
- How well do you track the features used in your model?
    - Are they up to date?
    - Are there any unused features? How do you track them?
    - Do you have metrics that relate the cost of a feature to its predictive benefit?
- How do you handle Data Governance and Data Quality for your data sets?
    - You want reliable and trustworthy results from your AI / ML systems, but how aware are you of the data issues in the data used to train and build those systems?
- How well do you do code reviews?
    - Did you know that “Thousands of Scientific Papers May be Invalid Due to Misunderstanding Python”? How would you prevent something similar from happening in your process and team?
    - What about other popular open source systems you might be using, such as scikit-learn? Do you apply good code review and documentation practices there as well?
- Do you have undeclared consumers of your ML / AI prediction system?
    - Without access controls, some of these consumers may be undeclared, and undeclared consumers are expensive at best and dangerous at worst. The problem is the tight coupling of model A to other parts of the stack: changes to A will very likely impact these other parts, sometimes in ways that are unintended, poorly understood, or detrimental to a downstream model B that consumes its output.
- Are your AI / ML system and its results reproducible?
    - What would it cost to automate this in your environment?
- Do you keep the configuration of your system under control?
    - What do you consider “configuration” for your ML / AI system?
    - What about the “configuration” of the model training part?
- How do you monitor your AI / ML system in production?
    - Do you go beyond the low-level infrastructure monitoring that is provided out of the box?
    - Do you have an easy way to design statistical monitoring of your ML / AI system, e.g. to detect model drift and raise early warnings with potential risk management actions? (See the drift-detection sketch after this list.)
- What testing strategies and techniques does your ML / AI / Data Science team use?
    - Do you go beyond unit and integration testing?
    - Do you use property-based testing? (See the property-based testing sketch after this list.)
    - How do you deal with the statistical and randomness aspects of your ML / AI systems when it comes to testing?
- How do you deal with failures in your ML / AI data pipelines?
    - What are your error handling and retry strategies? (See the retry sketch after this list.)
    - How do you handle intermittent failures?
    - How do you evaluate the robustness of your end-to-end AI / ML system?
    - Do your engineers and scientists take enough care to create idempotent components, so that running them more than once does not lead to unpredictable results?
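
To make the data and model versioning questions above more concrete, here is a minimal sketch of how a training run could be linked to a data set revision and its hyperparameters using MLflow’s tracking API. It assumes a Git repository (for example, one whose data is tracked with DVC) and a default local MLflow tracking store; the run name, parameters, and metric are hypothetical placeholders, not a prescription.

```python
# Minimal sketch: link an ML training run to a data set revision and its
# hyperparameters. Assumes a Git repository (e.g. with data tracked by DVC)
# and a local MLflow tracking store; names and values are placeholders.
import subprocess

import mlflow

# Revision of the repository (and thus of the DVC-tracked data) used for training.
data_revision = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

params = {"n_estimators": 200, "max_depth": 8}  # hypothetical hyperparameters

with mlflow.start_run(run_name="churn-model") as run:
    mlflow.set_tag("data_revision", data_revision)  # link run -> data set version
    mlflow.log_params(params)                       # hyperparameters involved
    # ... train and evaluate the model here ...
    mlflow.log_metric("validation_auc", 0.87)       # placeholder metric
    print(f"Run {run.info.run_id} trained on data revision {data_revision}")
```

If the run ID is stored next to each prediction, a prediction can later be traced back to the model version, the data revision, and the hyperparameters that produced it.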
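
For the statistical monitoring question, one simple approach is to compare the training-time distribution of a feature with a recent window of production values, for example with a two-sample Kolmogorov–Smirnov test. The sketch below uses synthetic data and an illustrative alert threshold; a real setup would cover many features and feed a proper alerting channel.

```python
# Minimal sketch of drift detection for a single numeric feature using a
# two-sample Kolmogorov-Smirnov test. Data and alert threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_values = rng.normal(loc=0.0, scale=1.0, size=5_000)    # reference window
production_values = rng.normal(loc=0.3, scale=1.0, size=1_000)  # recent traffic

result = ks_2samp(training_values, production_values)
if result.pvalue < 0.01:  # hypothetical alert threshold
    print(f"Possible drift: KS statistic {result.statistic:.3f}, p-value {result.pvalue:.4f}")
else:
    print("No significant drift detected")
```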
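
As an example of property-based testing, the sketch below uses the Hypothesis library to check a property that should hold for any valid input of a preprocessing function, rather than a handful of hand-picked cases. The min_max_scale function is a hypothetical stand-in for your own code.

```python
# Minimal sketch of a property-based test with Hypothesis: for any non-empty
# list of finite floats, min-max scaling must produce values in [0, 1].
from hypothesis import given
from hypothesis import strategies as st


def min_max_scale(values):
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


@given(st.lists(st.floats(allow_nan=False, allow_infinity=False,
                          min_value=-1e6, max_value=1e6), min_size=1))
def test_scaled_values_stay_in_unit_interval(values):
    scaled = min_max_scale(values)
    assert all(0.0 <= v <= 1.0 for v in scaled)
```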
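
Finally, for the pipeline failure and idempotency questions, here is a sketch of a small retry helper with exponential backoff, combined with an idempotent write: the output path is derived deterministically from the partition, so rerunning the step overwrites the same artifact instead of producing duplicates. The fetch step, paths, and retry parameters are hypothetical.

```python
# Minimal sketch: retries with exponential backoff plus an idempotent write.
# The data, paths, and retry parameters are illustrative placeholders.
import hashlib
import json
import time
from pathlib import Path


def with_retries(func, max_attempts=3, base_delay=1.0):
    """Call func(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # in practice, narrow this to transient errors only
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


def write_features(partition_date, rows):
    # Idempotent: the same partition always maps to the same output file,
    # which is overwritten (never appended to) on reruns.
    digest = hashlib.sha256(partition_date.encode()).hexdigest()[:12]
    out_path = Path("features") / f"{partition_date}-{digest}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(rows))
    return out_path


# Stand-in for a flaky upstream fetch that may fail intermittently.
rows = with_retries(lambda: [{"customer_id": 1, "churned": False}])
print(write_features("2019-12-01", rows))
```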
Further Reading
- Software Engineering for Machine Learning: A Case Study
- Machine Learning: The High-Interest Credit Card of Technical Debt
- Hidden Technical Debt in Machine Learning Systems
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction
- Proceedings of the 2019 USENIX Conference on Operational Machine Learning (OpML ’19)