March 17, 2023
Data to AI to ROI: Managing Machine Learning in the Enterprise
Explore machine learning's impact on enterprise ROI, the ML lifecycle challenges, and the power of automation tools. Dive into real-world use cases like fraud detection.
The transformative power of Machine Learning (ML)
There is no shortage of headlines about how artificial intelligence (AI) has the potential to transform the enterprise, and how machine learning (ML) is the key to unlocking this potential. In parallel, the success of early-adopting small and medium-sized companies that already use AI to speed up laborious tasks, improve decision-making, and increase scalability, productivity, and cost-efficiency is fueling wider interest in the sector.
However, the reality is that ML is a complex and challenging field that requires significant investments in resources and specialist expertise, and companies may be hesitant to make those investments without a clear understanding of the potential risks and rewards.
This post explores:
+ The challenge of the ML lifecycle for enterprise
+ The value of using automation tools to manage and deploy ML models
+ Why the potential benefits make it worth the effort for enterprises looking to gain a competitive advantage in their respective industries.
Managing ML in the enterprise
The considerations for good ML infrastructure management are:
- Fast and scalable architecture design; and
- Simple but powerful model management.
The ML lifecycle
The machine learning lifecycle is a framework that describes the various stages involved in developing, deploying, and maintaining a machine learning (ML) model. As deploying ML is a dynamic process, the lifecycle is iterative. The following are the typical stages:
Fig 1. The machine learning lifecycle is dynamic and iterative, which makes infrastructure management a challenge.
Fig 2: The machine learning pipeline environment is a dynamic iterative cycle that calls for careful management, and a new approach to operations
The Operations Challenge
As productionised ML models need to be continuously re-trained on the latest data to stay optimal, the deployment process calls for careful management, and a new approach to operations.
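To make the re-training requirement concrete, the following is a minimal Python sketch of one possible trigger: it records whether recent predictions were correct and flags a re-train once accuracy over a sliding window falls below a threshold. The class name RetrainMonitor, the window size and the threshold are illustrative assumptions, not part of any specific product.

```python
from collections import deque


class RetrainMonitor:
    """Hypothetical helper: flags a re-train when recent accuracy drops too low."""

    def __init__(self, window=100, min_accuracy=0.9):
        # Only the most recent `window` outcomes are kept (True = correct).
        self._outcomes = deque(maxlen=window)
        self._min_accuracy = min_accuracy

    def record(self, prediction, actual):
        """Store whether a production prediction matched the observed outcome."""
        self._outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def should_retrain(self):
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self._outcomes) == self._outcomes.maxlen
        return window_full and self.accuracy() < self._min_accuracy
```

In practice the signal could equally be data drift or a business metric; accuracy is used here only because it is the simplest to show.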
For enterprises that are deploying technology, DevOps is a well-known set of practices that combines software development (Dev) and IT operations (Ops) to improve collaboration, communication, and automation between the two. DevOps focuses on delivering software quickly and reliably through continuous integration, delivery, and deployment (CI/CD) pipelines, which are used to automate the building, testing, and deployment of software on CPU hardware.
Machine Learning operations (MLOps), on the other hand, involves the integration of ML models into the software development lifecycle, with a focus on managing the end-to-end ML workflow, from data preparation and model training to deployment and monitoring.
Fig 3: DevOps is focused on software development and deployment, while MLOps is focused on managing the end-to-end ML workflow, from data preparation to deployment and monitoring.
Once the model has been trained and is ready to be in production, the data scientist hands over to the engineering team all the code required (ML pipeline) to run their model. The engineers then take care of packaging and optimising the pipeline to run at scale inside their infrastructure.
As well as having to manage an iterative workflow and maintain high-speed, low-latency performance, there are other challenges that ML engineers face. These include:
Challenge 1: Managing models
In many scenarios, a single ML system may have hundreds or even thousands of models, each with its own set of configurations, dependencies and requirements. Sometimes this is a specific ML framework, but other times it is a combination of binaries, Python libraries and other non-ML dependencies. Managing these models and requirements can be challenging, as they need to be deployed, updated and maintained in a way that ensures optimal performance, accuracy and scalability.
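As an illustration of what keeping track of per-model requirements can involve, here is a minimal in-memory sketch in Python. The names ModelSpec and ModelRegistry, and the fields they carry, are assumptions made for this example; a production system would persist this information and validate it far more thoroughly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record describing one deployable model."""
    name: str
    version: str
    framework: str                 # e.g. "pytorch" or "scikit-learn"
    python_packages: tuple = ()    # pinned Python dependencies
    system_binaries: tuple = ()    # non-ML dependencies
    needs_gpu: bool = False


class ModelRegistry:
    """In-memory registry keyed by (name, version)."""

    def __init__(self):
        self._specs = {}

    def register(self, spec):
        """Add a spec, refusing to overwrite an existing name and version."""
        key = (spec.name, spec.version)
        if key in self._specs:
            raise ValueError(f"{spec.name}:{spec.version} is already registered")
        self._specs[key] = spec

    def get(self, name, version):
        """Look up the spec for an exact name and version."""
        return self._specs[(name, version)]

    def gpu_models(self):
        """Return the specs that must be scheduled on GPU hardware."""
        return [s for s in self._specs.values() if s.needs_gpu]
```

Keying on both name and version means two versions of the same model, each with different dependencies, can be deployed side by side.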
Challenge 2: Infrastructure optimisation
The steep compute requirements of running ML call for higher-performing hardware for deployment, i.e. GPUs rather than CPUs. Managing GPU-based or hybrid (CPU and GPU) hardware well can greatly improve ML performance, including minimising ‘cold start’ (the time it takes for a model to warm up) and optimising the array for scale. When controlled manually, specialised technical expertise is required to optimise and manage these systems.
Challenge 3: Finding the right tools for the job
The productionization of ML is a relatively new requirement within many enterprises and, as an emerging technology, the deployment stack is still being developed. Existing solutions like Kubernetes help manage some of the problems. But engineering expertise is required to build, customise and maintain these Kubernetes-based systems. Furthermore, Kubernetes' strengths lie in CPU-based workflows, whereas ML workflows run better and faster with GPU-based hardware.
Challenge 4: Data security and compliance
Machine learning models can be vulnerable to security attacks due to the nature of the data they rely on, the complexity of the algorithms used, and the potential for malicious actors to exploit these vulnerabilities. It is important to take steps to mitigate these risks, such as using secure data storage, using robust algorithms, and regularly testing and auditing machine learning models. In addition, using open source software (like Kubernetes) as part of the deployment stack can open the stack up to vulnerabilities if patches and updates are not applied promptly.
The way to address all of these challenges is to automate the deployment process using a fit-for-purpose ML deployment solution.
Choosing an ML deployment solution
As the deployment lifecycle for ML is relatively new to most technical operations teams, it either falls on an existing DevOps team to manage, or means that the enterprise needs to look into other solutions.
The complex nature of the ML lifecycle can put a lot of strain on in-house Ops engineers, who may be servicing a number of different deployment workflows across an organisation. This can lead to a deployment backlog, which is frustrating for data scientists who want to iterate and deploy quickly.
An automated deployment solution is key to unlocking this backlog and empowering teams to deliver faster. But what are the key features to look for in a solution?
Feature 1: Managing multiple models
The need to deploy multiple models simultaneously (from one to thousands), or to orchestrate real-time data flows and inference, requires fast and scalable architecture design that is optimised for ML.
// Look for: a platform that has been built specifically to handle machine learning models and compute
Key features include:
+ Multi-model and multi-environment handling
+ Preemptive model caching
+ Horizontal model access, e.g. a platform that brings model initialisation time down to a few milliseconds
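To illustrate the idea behind preemptive model caching, the following minimal Python sketch keeps a limited number of loaded models in memory and lets the caller load them before the first request arrives. The ModelCache class and the loader callable are assumptions for this example, and the eviction rule (least recently used) is just one reasonable choice.

```python
from collections import OrderedDict


class ModelCache:
    """Hypothetical cache: keeps up to `capacity` loaded models, evicting the least recently used."""

    def __init__(self, loader, capacity=2):
        self._loader = loader            # callable: model name -> loaded model
        self._capacity = capacity
        self._models = OrderedDict()

    def preload(self, names):
        """Load models ahead of time so their first request avoids a cold start."""
        for name in names:
            self.get(name)

    def get(self, name):
        """Return a loaded model, loading it first if it is not already warm."""
        if name in self._models:
            self._models.move_to_end(name)      # mark as most recently used
            return self._models[name]
        model = self._loader(name)              # slow path: the cold start
        self._models[name] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)    # evict the least recently used
        return model

    def is_warm(self, name):
        """True if the model is currently held in memory."""
        return name in self._models
```

The trade-off is memory against latency: a larger capacity keeps more models warm but needs more CPU or GPU memory.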
Feature 2: Infrastructure optimisation
Building scaling policies requires a deep understanding of the underlying hardware and the specific requirements of each model. If scaling policies are not optimized, the result can be under-utilization of resources, leading to unnecessary costs, or over-utilization of resources, leading to sub-optimal performance and increased costs.
// Look for: a serverless solution that is optimised for ML workloads across an array of CPUs and GPUs.
Key features include:
+ Cloud agnostic: hosted anywhere (on-prem, third-party cloud or hybrid)
+ Heterogeneous hardware handling
+ Premium task allocation of workloads to minimise ‘cold-start’ times
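As a rough illustration of what a scaling policy does, the sketch below computes how many replicas of a model would bring average utilisation back towards a target. The function name, the default target of 0.5 and the replica limits are assumptions for this example. The proportional rule is similar in spirit to the one used by the Kubernetes Horizontal Pod Autoscaler, and real ML workloads usually need additional signals such as GPU memory and queue depth.

```python
import math


def desired_replicas(current_replicas, utilisation, target=0.5,
                     min_replicas=1, max_replicas=10):
    """Return the replica count that moves average utilisation towards `target`.

    `utilisation` is the observed average utilisation per replica (0.0 to 1.0).
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if target <= 0:
        raise ValueError("target must be positive")
    # Scale the replica count in proportion to how far utilisation is from target.
    wanted = math.ceil(current_replicas * utilisation / target)
    # Clamp to the configured limits to cap cost and keep at least one replica.
    return max(min_replicas, min(max_replicas, wanted))
```

For example, desired_replicas(4, 0.75) returns 6 (over-utilised, so scale up), while desired_replicas(4, 0.25) returns 2 (under-utilised, so scale down).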
Feature 3: Finding the right tools for the job
The machine learning tech stack typically includes programming languages, machine learning libraries and frameworks, data storage and processing tools, cloud computing platforms, DevOps tools, and visualization tools. The specific tools and technologies used will depend on the specific machine learning application and the preferences of the developers and data scientists involved. The key is that the tools are compatible, and that the choice of partner does not tie the enterprise into expensive proprietary deals.
// Look for: A ‘deploy anywhere’ approach which enables the enterprise to be flexible on options for compute (price and location), as well as support for a wide range of common tools including:
+ Python, the most commonly used language for machine learning thanks to its wide range of libraries and frameworks
+ Machine learning frameworks and libraries, e.g. TensorFlow, PyTorch, scikit-learn and Hugging Face
+ Databases or big data platforms such as Hadoop or Apache Spark
+ Data preprocessing and cleaning tools such as Pandas or NumPy
Feature 4: Data security and compliance
The National Cyber Security Centre (NCSC) recommends two key approaches to security;
A deployment solution, therefore, should have robust security features, such as data encryption, access control, and auditing as well as real-time monitoring and analytics capabilities.
// Look for: Zero trust for workloads; fine-grained security policies to restrict communication between workloads, third-party applications and the internet.
Key features include:
+ Compliance reporting and alerts – Continuously monitor and enforce compliance controls, easily create custom reports for audit.
+ Intrusion detection & prevention (IDS/IPS)
+ Data encryption and access control
+ ISO 27001, HIPAA, SOC 2 compliance depending on the needs of your data set
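The zero-trust idea mentioned above can be sketched as a default-deny rule set: communication between workloads is blocked unless a rule explicitly allows it. The WorkloadPolicy class and the workload names used with it are hypothetical; real deployments would typically express this through the network policy mechanism of their platform rather than in application code.

```python
class WorkloadPolicy:
    """Hypothetical default-deny policy for traffic between workloads."""

    def __init__(self):
        # Each rule is an exact (source, destination, port) triple.
        self._allowed = set()

    def allow(self, source, destination, port):
        """Explicitly permit one source to reach one destination on one port."""
        self._allowed.add((source, destination, port))

    def is_allowed(self, source, destination, port):
        """Anything without a matching rule is denied."""
        return (source, destination, port) in self._allowed
```

Because rules are one-directional and port-specific, allowing an API gateway to call a model server does not allow the model server to call back or reach the internet.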
How to measure ‘value’ in ML automation
Measuring the value (and potential ROI) of ML automation tools requires a holistic approach that considers both the financial and non-financial benefits that the tools provide.
For enterprises that are already manually deploying ML as part of existing workflows, the value in a bespoke automated solution will be quickly realised through the enhanced speed of delivery.
For R&D teams or as part of digital transformation, ML automation empowers data scientists to gain end-to-end control of their projects. This autonomy removes friction and increases resource efficiency which reduces the total cost of ownership for delivering applied AI.
Speed of delivery
Time savings: ML automation tools can significantly reduce the time required for data preparation, model development, and deployment, leading to faster time-to-market and increased productivity. The time savings can be measured in terms of hours, days, or weeks.
Cost savings: ML automation tools can reduce the cost of model development and deployment by automating repetitive tasks and reducing the need for manual intervention. The cost savings can be measured in terms of direct costs, such as hardware and software, as well as indirect costs, such as labor and training.
Hardware optimisation: Serverless orchestration tools enable organizations to scale their machine learning initiatives more easily, allowing them to process larger datasets and handle more complex models. The increased scalability can be measured in terms of the number of models and data points that the organization can handle across the same software and hardware array.
Talent: Machine learning can require significant investments in specialised personnel. As a relatively new technical profession with a global shortage of talent, new hires are at the top end of the technical pay scale. With automation as part of the solution, data scientists are able to own the entire productionization of their ML pipelines without the overheads of engineering expertise.
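One simple way to combine the time and cost savings above into a single figure is a basic return-on-investment calculation: benefit minus cost, divided by cost. The Python sketch below shows the arithmetic. The function name and its inputs are assumptions for illustration, and any figures passed to it should come from the organisation's own measurements.

```python
def automation_roi(hours_saved_per_month, hourly_rate,
                   infra_savings_per_month, tool_cost_per_month):
    """Return monthly ROI as a fraction: (benefit - cost) / cost."""
    if tool_cost_per_month <= 0:
        raise ValueError("tool_cost_per_month must be positive")
    # Benefit combines the value of engineering time saved and infrastructure savings.
    benefit = hours_saved_per_month * hourly_rate + infra_savings_per_month
    return (benefit - tool_cost_per_month) / tool_cost_per_month
```

With purely hypothetical inputs of 100 hours saved at a rate of 50 per hour, 1,000 in infrastructure savings and a tool cost of 3,000, the result is 1.0, i.e. a 100% monthly return. Non-financial benefits such as faster iteration are not captured by this figure.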
The Innovation ‘edge’
Business impact: ML automation tools can drive business impact by improving decision-making, identifying new revenue streams, and reducing risk. The capacity to fast-track R&D and increase productivity means staying ahead of the competition in fast-evolving markets. The business impact can be measured in terms of metrics such as revenue, customer acquisition, and risk reduction.
In order to leverage the transformative power of AI, it is becoming imperative for companies seeking ROI from digital transformation to make the investment long before market turmoil requires it. As outlined in this white paper, there is a clear path to delivering a cost-effective ML solution by using automation tools as part of a technology stack.
CASE STUDY: PAYPAL
PayPal, a leading online payment platform, processes billions of transactions every year, making it a prime target for fraudsters. In order to combat fraud, PayPal implemented a machine learning solution that analyzed transaction data and used predictive models to identify and prevent fraudulent transactions. The machine learning algorithm analyzed a wide range of data points, such as the location of the transaction, the device used, and the user's transaction history, to identify patterns and predict whether a transaction was likely to be fraudulent. By using machine learning to combat fraud, PayPal was able to reduce its fraud rate to just 0.32%, which is well below the industry average. This saved PayPal millions of dollars in chargebacks and other costs associated with fraud.
CASE STUDY: Lemonade
Lemonade, a US-based insurance company, has started using machine learning algorithms to automate its claims processing and fraud detection. By automating these processes, Lemonade is able to save time and reduce its operational costs. In addition, ML can also help the company optimize its pricing and risk assessment processes. By analyzing data on customer behavior, demographic information, and other factors, the company can identify trends and patterns that may indicate higher or lower levels of risk, allowing it to adjust its pricing and underwriting practices accordingly. By using machine learning, Lemonade has been able to reduce its operational costs and improve its profitability. The company has reported that its use of machine learning has helped it achieve an expense ratio (the ratio of operating expenses to premiums earned) of just 27%, compared to the industry average of around 60%.
CASE STUDY: ETSY
Etsy, the e-commerce website focused on handmade or vintage items and craft supplies, uses machine learning algorithms to improve search results and product recommendations for its customers. Its algorithms analyze customer data in real time, including search queries, clicks, purchases, and other engagement signals, to provide personalized search results and recommendations. Its ML platform also enables it to analyze data on market demand, pricing trends, and other factors to recommend optimal pricing for each product based on seller preferences and goals. By using machine learning, Etsy has been able to improve its customer experience, increase sales, and enhance its marketplace operations. For example, the company has reported that its use of machine learning in search and discovery has resulted in a 20% increase in sales conversion rates.
About MYSTIC AI
Mystic AI is a fast-growth, venture-backed enterprise delivering solutions for managing machine learning (ML) in the enterprise and at the edge. We provide tailored solutions and expert guidance to companies looking to build and scale a robust machine learning infrastructure.
Having developed and launched our flagship MLOps automation toolkit and cloud solution, Pipeline AI, we have first-hand experience of the ML challenge for enterprise. We are experts in the end-to-end ML hardware and software stack, and in supporting complex, high-speed, distributed and real-time deployments.
MYSTIC AI is based in the UK, although we operate globally. Contact us to learn more about how we can help you integrate emerging ML technologies into your workflows and leverage the power of AI to solve business-critical challenges.
Pipeline AI - Enterprise Platform for ML
Pipeline AI is a fully managed enterprise-grade platform designed to deploy ML models at scale with speed, high-throughput, and consistent performance in the cloud of your choice. Upload, run and manage thousands of ML pipelines without the hassle of building and maintaining your infrastructure.
With a simple Python and API-first interface, data scientists can package their ML pipelines using our open-source library and deploy them with confidence on our platform, which handles scheduling and auto-scaling on their infrastructure of choice (cloud, hybrid or on-premise) while optimising for cost and performance depending on their use case.
Our platform provides transparency, monitoring and alerting of deployed ML pipelines via our Dashboard, CLI, API and integrations with the wider MLOps and infrastructure landscape.
With Pipeline AI you can reduce the cost of your infrastructure, save engineering time and empower teams to build and maintain an in-house end-to-end ML deployment solution.