Exafunction Raises $25M Series A

We are excited to announce our $25 million Series A financing, led by Greenoaks with participation from Founders Fund. This follows the $3 million seed round we raised last year, also led by Greenoaks, with participation from a number of angel investors, including Spencer Kimball (CEO, Cockroach Labs), Neha Narkhede (ex-CTO, Confluent), Sahir Azam (CPO, MongoDB), Carlos Delatorre (ex-CRO, MongoDB; CRO, TripActions), Howie Liu (CEO, Airtable), Richard Socher (CEO; ex-chief scientist, Salesforce), and more.

Deep learning is one of the most important technologies of our time, with applications ranging from autonomous vehicles to fraud detection to machine translation. But progress in deep learning comes at a steep cost: as models become more capable and complex, they demand exponentially more compute.

That presents at least three related challenges. First, GPUs are more expensive and scarce today than ever; second, most deep learning workloads underutilize hardware accelerators, bottlenecked by CPU or network constraints that leave precious resources idle; and third, the world’s most advanced companies are scaling compute so quickly that they rapidly hit capacity ceilings, impeding their progress.

At Exafunction, we’re building software infrastructure that solves these problems, abstracting away the difficulties of running deep learning at scale — starting with the virtualization of GPUs. By using our platform, customers achieve 5-10x improvements in hardware utilization, getting more from their existing resources, speeding up their models, and cutting costs. We're already working with some of the world's largest autonomous vehicle companies, which use our technology to massively scale their simulation workloads and reduce their time to deploy fleets on the road!

Our product today

At its heart, Exafunction offers GPU virtualization that can efficiently offload arbitrary computations: model inference in the major deep learning frameworks, Python functions, even single CUDA kernels in C++. This lets complex applications that previously required a GPU run entirely on CPU-only machines, without any need to worry about GPU utilization. The Exafunction service multiplexes many such applications onto remote GPU and CPU machines, autoscaling the cluster to keep resource utilization optimal.
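To make the offloading idea concrete, here is a minimal, self-contained sketch of what such an interface could feel like from the client side. The names (`RemoteWorkerPool`, `offload`) and the thread-pool "remote worker" are our own illustrative stand-ins, not Exafunction's actual API; in a real deployment the work would be dispatched over the network to a virtualized GPU.

```python
import concurrent.futures

class RemoteWorkerPool:
    """Stand-in for a pool of remote GPU machines (simulated with threads)."""

    def __init__(self, num_workers=2):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=num_workers)

    def offload(self, fn):
        """Decorator: run `fn` on a (simulated) remote worker instead of locally."""
        def wrapper(*args, **kwargs):
            future = self._pool.submit(fn, *args, **kwargs)
            return future.result()  # block until the remote result arrives
        return wrapper

pool = RemoteWorkerPool()

@pool.offload
def run_inference(batch):
    # In a real deployment this body would execute a model forward pass
    # on a remote GPU; here we just double each element.
    return [x * 2 for x in batch]

print(run_inference([1, 2, 3]))  # → [2, 4, 6]
```

The key property the sketch illustrates is that the caller's code is unchanged: the application looks like it calls a local function, while the actual compute can live on shared, multiplexed hardware elsewhere.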

The Exafunction client is also designed to be resilient to remote machine failures or preemptions — even for stateful computations like video decoding. If a failure occurs, the client can reconstruct the necessary state on another remote machine without interrupting the application. This allows the Exafunction scheduler to take advantage of cheaper spot instances and aggressively migrate clients away from underutilized instances, scaling down the cluster whenever possible. This contrasts with current solutions that support only stateless model inference and coarsely autoscale static groups of models, leaving the cluster underutilized when some models are rarely invoked.
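One common way to make a stateful client survive worker loss is operation logging with replay: record every state-mutating call, and if the worker is preempted, replay the log on a fresh worker to rebuild its state. The toy sketch below (class and method names are hypothetical, and the "worker" is just a local integer) shows that pattern, under the assumption that this is roughly the mechanism being described; the source does not specify Exafunction's actual implementation.

```python
class ResilientStatefulClient:
    """Toy sketch: rebuild remote state by replaying a log of operations."""

    def __init__(self):
        self._op_log = []       # every state-mutating operation applied so far
        self._worker_state = 0  # stand-in for state held on a remote worker

    def apply(self, delta):
        """Apply an operation, logging it so it can be replayed after a failure."""
        self._op_log.append(delta)
        self._worker_state += delta
        return self._worker_state

    def on_worker_lost(self):
        """Simulate a spot preemption: rebuild state on a fresh worker."""
        self._worker_state = 0           # new worker starts from scratch
        for delta in self._op_log:       # replay the log to reconstruct state
            self._worker_state += delta

client = ResilientStatefulClient()
client.apply(3)
client.apply(4)
client.on_worker_lost()     # worker preempted; state is reconstructed
print(client.apply(5))      # → 12 (3 + 4 replayed, then 5 applied)
```

Because recovery is transparent to the application, a scheduler built on this property is free to run clients on cheap, preemptible instances and migrate them at will.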

There's a lot more we're working on to efficiently scale out deep learning workloads for the most sophisticated companies in the world. We'll be going into more depth on the core technology in future posts! If this kind of work excites you, please visit our careers page.

Looking forward

Over the next year, we’re focused on deepening and broadening our product: helping our existing customers squeeze even more performance out of their available hardware, and expanding the range of verticals we can serve. We’re also building the fastest runtime for deploying models at scale, improving performance for companies with latency-sensitive applications. And we’re quickly staffing up a world-class engineering team with deep expertise in machine learning and distributed systems.

In time, our vision is to become the reference serverless platform for all deep learning workloads. For too long, deep learning’s fullest benefits have been reserved for only the most well-resourced organizations in the world. At Exafunction, we’re building an infrastructure layer to change that.