
Resemble AI unlocks 2x faster model cycles using AI Hypercomputer

Google Cloud Results
  • 2x faster epoch cycles by combining A3 VMs and Hyperdisk ML

  • 99% reduction in deepfake detection fine-tuning time (from 7 days to 1 hour)

  • 100+ inference requests per second sustained

  • Sub-250ms response times for voice models

  • Faster pipelines enabled more experimentation and faster delivery

Resemble AI doubled training speed with Google Cloud’s AI Hypercomputer, built to power next-generation AI workloads.

What happens when your models outpace your stack

Resemble AI isn’t your average generative AI startup. They build advanced voice and audio models that go beyond voice cloning – think deepfake detection, invisible watermarking, and media identification. With roots in entertainment and gaming and growing traction in sectors like banking, telecom, and law enforcement, Resemble is a powerhouse for enterprise-ready voice AI.

As the company grew, so did the sophistication and size of their AI/ML workloads. Training and tuning models with increasingly large datasets became a logistical puzzle. Datasets ballooned to over 60 terabytes, and that growth led to a challenge: How do you keep high-powered accelerators productive when you’re constantly waiting on data to move?

Up to 90% of the engineering team’s time went into data prep – moving, cleaning, and organizing files – instead of building and fine-tuning models. Additionally, inconsistencies in how different team members accessed and trained on datasets made version control tricky. When models misbehaved, it wasn’t always clear whether the issue was the data or the code.

Google Cloud was the natural fit to modernize infrastructure with AI-optimized tools and architecture.

Zohaib Ahmed

Co-founder and CEO, Resemble AI

Resemble’s momentum had outpaced their stack. But their relationship with Google Cloud paved the way for a scalable foundation. In 2020, they adopted Google Cloud’s ML Engine to support early model development. By 2022, they had moved to Vertex AI to scale training. So in 2024, says Zohaib Ahmed, CEO and co-founder of Resemble AI, “Google Cloud was the natural fit to modernize infrastructure with AI-optimized tools and architecture.”

They adopted Hyperdisk ML, a high-throughput storage solution built for AI workloads, alongside Google Compute Engine and Google Kubernetes Engine (GKE) to support full-scale development and production. This offered improved throughput, orchestration, and responsive infrastructure at scale. In 2025, they expanded with Gemini and Gemma to accelerate data labeling and support deepfake detection workflows. Each step in that journey reflects the value Google Cloud provides as Resemble scales further.
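
As a rough illustration of what standing up such a volume can look like, here is a minimal sketch using the google-cloud-compute Python client. The zone, disk name, capacity, and throughput figures are placeholders, not Resemble’s actual configuration.

```python
# Hypothetical sketch: creating a Hyperdisk ML volume with the
# google-cloud-compute client. All names and numbers are placeholders.
from google.cloud import compute_v1

PROJECT = "example-project"
ZONE = "us-central1-a"

disk = compute_v1.Disk()
disk.name = "voice-training-data"
# Hyperdisk ML is the disk type built for read-heavy ML workloads.
disk.type_ = f"zones/{ZONE}/diskTypes/hyperdisk-ml"
disk.size_gb = 65536                 # room for a ~60 TB dataset
disk.provisioned_throughput = 10000  # MB/s of read throughput to provision

client = compute_v1.DisksClient()
operation = client.insert(project=PROJECT, zone=ZONE, disk_resource=disk)
operation.result()  # wait for the create operation to finish
```

Once the dataset is written, a Hyperdisk ML volume can be attached read-only to many VMs at once, which is what lets a whole fleet of training machines share a single copy of the data.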

How Resemble got back in sync

Resemble’s engineering team finally had the infrastructure to match their ambition. That started with a dedicated Google Cloud account manager who helped improve performance and cost efficiency. “We knew where we wanted to go, and Google Cloud helped us get there faster,” said Ahmed. “Their team helped us match the right tools to our training and serving workflows.”

Our team doesn’t have to think too much about infrastructure – they just go do the work they’re supposed to do. And we have the flexibility to scale as needed, whether retraining models with 70 terabytes of data or serving requests at high concurrency.

Zohaib Ahmed

Co-founder and CEO, Resemble AI

From there, the pieces clicked into place. Resemble combined components of Google Cloud’s AI Hypercomputer – including A3 VMs and Hyperdisk ML. Hyperdisk ML loaded large static training datasets, while Local SSDs hosted smaller dynamic datasets. This mix provided high performance and data immutability alongside easy write access. Together, these choices improved data throughput, accelerated training, and enabled faster epoch cycles.
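
To make that split concrete, here is a minimal PyTorch-style sketch, assuming the Hyperdisk ML volume is mounted read-only at /mnt/hyperdisk-ml and a Local SSD at /mnt/local-ssd. The paths and the VoiceDataset class are hypothetical, not Resemble’s code.

```python
# Minimal sketch of the storage split: immutable training data on the
# shared Hyperdisk ML mount, dynamic writes on per-VM Local SSD.
from pathlib import Path

import torch
from torch.utils.data import DataLoader, Dataset

STATIC_DATA = Path("/mnt/hyperdisk-ml/voice-corpus")  # large, immutable, shared
SCRATCH = Path("/mnt/local-ssd/scratch")              # small, dynamic, per-VM
SCRATCH.mkdir(parents=True, exist_ok=True)

class VoiceDataset(Dataset):
    """Reads pre-processed audio tensors from the read-only Hyperdisk ML mount."""

    def __init__(self, root: Path):
        self.files = sorted(root.glob("*.pt"))

    def __len__(self) -> int:
        return len(self.files)

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.load(self.files[idx])

loader = DataLoader(
    VoiceDataset(STATIC_DATA),
    batch_size=32,
    num_workers=8,    # parallel readers keep a high-throughput volume saturated
    pin_memory=True,  # faster host-to-GPU copies on the training VMs
)

for step, batch in enumerate(loader):
    # Anything the run needs to write – augmented clips, checkpoints – goes
    # to Local SSD, so the shared training volume stays immutable.
    torch.save(batch, SCRATCH / f"checkpoint-{step}.pt")
    break
```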

They used N2 VMs for upstream data cleaning and transformation, feeding that output into Hyperdisk ML volumes for high-throughput access during training on A3 VMs. For model serving, they ran inference on both A3 and G2 instances to balance performance and cost, storing model weights in Cloud Storage and loading them using Cloud Storage FUSE. Vertex AI handled fine-tuning jobs, often using spot instances to optimize costs. The team relied on committed use discounts across Compute Engine for longer-running workloads.
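
A hedged sketch of what such a spot fine-tuning submission can look like with the Vertex AI Python SDK follows. The project, container image, machine shape, and especially the scheduling_strategy argument are assumptions about the current SDK surface, not Resemble’s actual pipeline; Vertex AI custom training mounts Cloud Storage buckets under /gcs/ via Cloud Storage FUSE, which is how the weights path below is resolved.

```python
# Hypothetical sketch: a Vertex AI fine-tuning job on Spot capacity.
from google.cloud import aiplatform
from google.cloud.aiplatform_v1.types import custom_job as gca_custom_job

aiplatform.init(project="example-project", location="us-central1")

job = aiplatform.CustomJob(
    display_name="finetune-deepfake-detector",
    worker_pool_specs=[
        {
            "machine_spec": {
                "machine_type": "g2-standard-12",  # L4 GPU host; A3 works similarly
                "accelerator_type": "NVIDIA_L4",
                "accelerator_count": 1,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": "us-docker.pkg.dev/example-project/train/finetune:latest",
                # Weights live in Cloud Storage and are read through the
                # Cloud Storage FUSE mount as if they were local files.
                "args": ["--weights", "/gcs/example-bucket/weights/base.pt"],
            },
        }
    ],
)

# Spot capacity trades preemptibility for a steep discount on accelerators.
job.run(scheduling_strategy=gca_custom_job.Scheduling.Strategy.SPOT)
```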

One of the biggest gains was simplicity. No one had to stitch together complex workarounds or juggle multiple tools. “Our team doesn’t have to think too much about infrastructure – they just go do the work they’re supposed to do,” said Ahmed. “And we have the flexibility to scale as needed, whether retraining models with 70 terabytes of data or serving requests at high concurrency.”

Even the setup process was smooth. “It was mostly clicking buttons on a console,” Ahmed said. “The longest part was just transferring the data over.”

The results speak volumes

Resemble saw immediate gains in speed, efficiency, and cost savings after adopting Hyperdisk ML and Google Cloud’s AI infrastructure. Hyperdisk ML helped eliminate Resemble’s biggest training bottleneck: getting data to the accelerators fast. Paired with A3 VMs, it doubled epoch speed, which accelerated every step of training.

One standout example: model training that once took a week could now be completed in an hour. That 99% reduction in training time proved critical when new generative models hit the market. After Veo 3 launched, Resemble updated their deepfake detection system with full coverage in under 60 minutes.

“We’ve doubled training speed at Resemble AI by combining Google Cloud’s A3 VMs and Hyperdisk ML. We can experiment faster, and we have an easier path from prototype to production,” said Ahmed. That means more chances to improve model quality, less time idling on expensive GPU infrastructure, and more time for work that matters. “If every experiment takes half the time, you’re doubling your research velocity,” he added.

Before Hyperdisk ML, our training workflow was 90% data prep and 10% modeling. Now it’s flipped. Our team can actually focus on building.

Zohaib Ahmed

Co-founder and CEO, Resemble AI

The performance improvements at scale were huge. “We’ve scaled to over 100 inference requests per second without hitting a ceiling,” said Ahmed. “That would’ve been impossible with our previous setup.” With better throughput and sub-250 millisecond response times in many cases, models feel faster and more conversational. Resemble can move quickly without sacrificing quality. They continue to release new models, like their open-source voice model Chatterbox, while scaling to meet growing demand across industries.

“Before Hyperdisk ML, our training workflow was 90% data prep and 10% modeling. Now it’s flipped,” said Ahmed. “Our team can actually focus on building.” Most recently, the team has started using Gemini and Gemma to support data labeling and deepen their work in deepfake detection – another step in expanding what their infrastructure makes possible.

Resemble AI powers enterprise voice tech with real-time cloning, multilingual dubbing, and deepfake detection.

Industries: Startup, Technology 

Location: United States 

Products: AI Hypercomputer, Vertex AI, Hyperdisk ML, Compute Engine, Google Kubernetes Engine, Cloud Storage, Cloud Storage FUSE
