This is our first case study on how ExaDeploy has helped cutting-edge ML companies save money and accelerate their time-to-market. Look out for more in the near future, and reach out to us if any of this case study resonates with you!
ExaDeploy was able to improve Motive's GPU resource utilization by 4x, translating to $100k+ in annual savings, all within only a week of engineering integration time.
Motive is the industry leader in automated operations, focusing on vehicle fleets across all verticals of the physical economy. Motive uses AI-powered applications to automate tracking and telematics, driver safety, compliance, maintenance, spend management, and much more. One product that stands out is their AI Dashcam. Deployed in hundreds of thousands of vehicles, these dashcams improve driver safety by running deep learning-powered monitoring, entity tracking, and kinematics estimation with a dual-facing camera that monitors both the driver and the road ahead. Motive processes hundreds of thousands of videos, or hundreds of millions of frames, each month on the edge and in the cloud.
Early on, Motive used AWS SageMaker for their ML serving needs, but moved off of it to a custom Kubernetes-based platform in 2020. Some of the initial issues with SageMaker included limits on maximum payload sizes, no gRPC interface, AWS charging a premium over basic instance costs, the inability to use spot instances, and end-to-end inference that was orders of magnitude slower.
Pains and Concerns
However, building their own in-house solution still came with painful lessons and shortcomings. Some pain points were obvious, such as autoscaling with empirical heuristics rather than responding dynamically to ever-growing loads. Others were harder to notice, such as low GPU resource utilization due to the lack of GPU virtualization.
Other challenges that maintaining an in-house solution posed included:
- Scaling and resource optimization of heterogeneous compute workloads between CPUs and GPUs
- Supporting diverse libraries (PyTorch / TensorFlow / etc.) and model architectures
- Reducing latency and network utilization for real-time serving
- Developer strain (both on the platform team and end users) from continual maintenance costs
Motive is a leader in proactive improvement of ML deployment, which has allowed their engineering team to scale their workloads and technology without being hindered by infrastructure. So naturally, Motive's team was thinking about how to level up and address these remaining pains. With the accelerated growth of their AI Dashcam products, Motive faced increased pressure to scale while managing costs.
Motive's past experience with SageMaker instilled general concerns about third-party solutions, even ones that could solve GPU utilization issues. Table stakes for a good solution now included the ability to use cloud provider commitment discounts, no restrictions on model size or architecture, the ability to leverage spot instances, and optimal network compression.
"Our video workloads were exponentially growing. High cost, performance and maintainability of an in-house solution drove us to seek out other industry experts and ML Serving architectures in the market. And we were lucky to find those industry experts in the team at Exafunction."
Duy Tran, Senior Software Engineer, Motive
Discussions between Motive and Exafunction began in January 2022, and ExaDeploy on paper seemed to address all of the pains and concerns that Motive had.
ExaDeploy was a step up in GPU utilization:
- Scalable serverless deployment model: ExaDeploy's autoscaler has been field-tested to scale from zero to thousands of GPUs
- Shared GPU resources: GPUs can be shared by multiple clients, with automatic rebalancing and node draining to maximize utilization without incurring wait latencies
And because ExaDeploy was built to address known concerns with third-party services, Motive was confident that ExaDeploy would not create the same problems as before:
- ExaDeploy runs in Motive's cloud: Motive could use every discount available from their cloud provider (and avoid cloud provider lock-in)
- No limits on model size, architecture, custom ops, etc.: Motive's development would not be constrained by the serving solution
- Fault-tolerant pipelining: ExaDeploy has built-in fault tolerance, which allowed Motive to (a) not worry about complex custom retry logic and (b) run their workloads entirely on spot instances
- Point-to-point gRPC-based execution: ExaDeploy clients connect directly to the remote GPUs, which allows for zonal awareness, and pass data over gRPC rather than HTTP, minimizing network costs
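To make the "complex custom retry logic" point concrete, here is a minimal sketch of the kind of client-side retry wrapper a team might otherwise maintain when running inference on spot instances. It is an illustration only and is not ExaDeploy code: `WorkerPreempted`, `call_with_retries`, and all parameter names are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WorkerPreempted(Exception):
    """Hypothetical error raised when a spot instance disappears mid-request."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on WorkerPreempted with jittered exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except WorkerPreempted:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The delay doubles on every attempt; jitter spreads out retries
            # from many clients that lost the same worker at the same time.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Even this small version leaves open questions such as which errors are safe to retry and how to resume a partially completed pipeline, which is the sort of work a serving layer with built-in fault tolerance takes off the client.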
The ExaDeploy solution even had features that could create unexpected wins for Motive:
- On-the-fly streaming compression: With large image data, this further reduces network egress costs on top of end-to-end gRPC
- Performant Python APIs: This improves per-frame end-to-end latencies even though model inference latency is unaffected
"We were initially hesitant with a third party solution given our past good-faith efforts, but it became quite evident in our first conversations with Exafunction that they truly understood the pains that companies like Motive face in ML deployment. They've impressively translated this understanding into a product that actually solves the issues without creating new ones."
Duy Tran, Senior Software Engineer, Motive
"The Motive team is a great example of people that understand both the pains they currently face and the ones they likely would face as they scale. It was clear from our initial conversations that Motive's requirements and Exafunction's solution would match well."
Varun Mohan, CEO, Exafunction
The integration of ExaDeploy into Motive's stack took less than a week of engineering effort, with the Exafunction team responding to troubleshooting questions in an average of 15 minutes.
Exafunction is committed to addressing even very customer-specific issues with robust solutions. These solutions are immediately rolled into the standard product offering to benefit all future customers as well. As examples:
- Motive developers use different platforms (Mac, Linux) and libraries (TensorFlow, PyTorch) across their ML work, and Exafunction quickly provided libraries for all combinations
- Exafunction exposed an appropriate set of tunable parameters for Motive to get the best ExaDeploy performance on their specific traffic and infrastructure
- Motive's frame-level batching was complex and incurred a lot of network overhead, so Exafunction added low-latency network compression to ExaDeploy, allowing Motive to eliminate their batching logic
"Working with the strong team at Motive allowed us to learn how to continue to improve ExaDeploy using the learnings from a real integration of ExaDeploy into a customer's stack."
Varun Mohan, CEO, Exafunction
The tl;dr on performance, utilization, and cost savings:
- GPU utilization: 4x improvement, 15% to 60%
- GPU pods requested: > 75% reduction
- Network egress: > 75% reduction
- Per frame latency: > 50% reduction
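The utilization and pod-count figures above are two views of the same change: for a fixed amount of GPU work, the number of pods needed scales inversely with average utilization. The sketch below works through that arithmetic. The load of 30 fully busy GPUs is a hypothetical number chosen for illustration and does not come from Motive's deployment; `pods_needed` is likewise an illustrative helper.

```python
def pods_needed(busy_gpus: int, utilization_pct: int) -> int:
    """GPU pods required to serve a fixed load at a given average utilization.

    Uses integer ceiling division so the result is exact.
    """
    return (busy_gpus * 100 + utilization_pct - 1) // utilization_pct


# Hypothetical steady load: work equivalent to 30 fully busy GPUs.
load = 30

before = pods_needed(load, 15)  # average utilization before: 15%
after = pods_needed(load, 60)   # average utilization after: 60%

improvement = 60 / 15           # 4.0, the "4x" utilization figure
reduction = 1 - after / before  # fraction of GPU pods no longer requested

print(f"pods before: {before}, pods after: {after}")
print(f"utilization improvement: {improvement:.0f}x, pod reduction: {reduction:.0%}")
```

At constant load, a 4x utilization improvement on its own corresponds to a 75% drop in requested pods, which lines up with the reduction reported in the list.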
Motive has cut their machine learning cloud compute costs by 30% (including ExaDeploy pricing) and sped up their inferences dramatically, all while maintaining the same levels of accuracy.
"It is rare to see such a large step function improvement in performance of anything with such little effort, let alone something as complex and nuanced as ML deployments. The results we have seen so far have reinforced our hypothesis that it is a no-brainer to use ExaDeploy for our serving needs."
Saam Talaie, Head of Platform Engineering, Motive
Motive and Exafunction started their partnership at an inflection point in Motive's product. Motive was facing a complexity explosion in their ML use cases and growing costs from ever-increasing scale. ExaDeploy helped Motive avoid a year or more of intense work on their own in-house ML serving framework, and has provided an elegant ML serving solution that saves money while keeping the developer experience simple.
"We no longer have to allocate multiple developers to managing and developing ML serving infrastructure. We can confidently rely on ExaDeploy to do the heavy lifting of model serving while our ML Engineers and Data Scientists can concentrate on solving the business problems our customers care about."
Saam Talaie, Head of Platform Engineering, Motive
ExaDeploy has enabled Motive to take on projects that were previously financially infeasible, such as further deep learning work around model testing and reliability, and new products that apply computer vision to improve safety in other parts of the supply chain operations life cycle, not just fleets.
Moving forward, Exafunction also plans to work with Motive to make their models run even faster on different kinds of GPUs, among other potential avenues of exploration.
"We are excited to know that Exafunction will continue to find even more ways to improve ExaDeploy, and we are looking forward to our continued collaboration."
Duy Tran, Senior Software Engineer, Motive
We are proud to be supporting Motive in their mission to leverage ML to improve safety in fleet operations.