Lightning Engine

Accelerate Apache Spark performance

Our vectorized engine is an easier way to optimize Spark: a smarter engine that delivers over 4.3x faster Spark performance*, reducing compute costs.

*The queries are derived from the TPC-DS standard and TPC-H standard and as such are not comparable to published TPC-DS standard and TPC-H standard results, as these runs do not comply with all requirements of the TPC-DS standard and TPC-H standard specification.

Apache Spark is a trademark of The Apache Software Foundation.

Features

Reduce job runtimes and lower costs

Experience a faster way to run Spark. Run your large-scale ETL, data science, and SQL workloads over 4.3x faster than with open source Apache Spark. This reduction in job runtime lowers the total cost of ownership for your Spark workloads by reducing compute time.

Accelerate Spark performance

Discover an easier way to improve performance. Spend fewer valuable engineering cycles on optimizing Spark.

Intelligent data access and caching

Leverage a smarter architecture. Lightning Engine automatically caches hot data in memory and utilizes high-throughput, optimized connectors for Cloud Storage and BigQuery, significantly improving I/O latency and throughput for large-scale Spark data processing.

The core technology: Vectorized execution

Lightning Engine leverages a native C++ vectorized execution engine to process data in batches, dramatically improving CPU efficiency over traditional row-by-row processing. This is a core component of its breakthrough Spark performance.
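
To make the contrast between row-by-row and batch processing concrete, here is a minimal Python sketch. It is a toy illustration only, not Lightning Engine's implementation: the `(price, quantity)` schema, the function names, and the batch size are all hypothetical. Pure Python will not show the CPU gains of a native engine; the sketch only shows the change in control flow, where work is dispatched once per columnar batch instead of once per row.

```python
from typing import Iterator, List, Tuple

Row = Tuple[float, int]  # hypothetical schema: (price, quantity)


def total_row_by_row(rows: List[Row]) -> float:
    """Row-at-a-time: the operator logic is dispatched once per row."""
    total = 0.0
    for price, quantity in rows:
        total += price * quantity
    return total


def to_column_batches(
    rows: List[Row], batch_size: int
) -> Iterator[Tuple[List[float], List[int]]]:
    """Regroup rows into columnar batches: one list per column per batch."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        prices = [row[0] for row in chunk]
        quantities = [row[1] for row in chunk]
        yield prices, quantities


def total_batched(rows: List[Row], batch_size: int = 1024) -> float:
    """Batch-at-a-time: the operator logic is dispatched once per batch.

    In a native vectorized engine, the inner loop over a batch would be a
    tight loop over contiguous column data, which suits CPU caches and SIMD.
    """
    total = 0.0
    for prices, quantities in to_column_batches(rows, batch_size):
        total += sum(p * q for p, q in zip(prices, quantities))
    return total
```

Both functions return the same result for the same input; only the unit of work differs. The per-batch layout is what lets a native engine amortize per-call overhead across many values.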

Availability

Lightning Engine is for your most demanding Spark workloads. You can access it with the premium tiers of Dataproc and Google Cloud Serverless for Apache Spark.

Product	Availability	Access
Google Cloud Serverless for Apache Spark - Premium tier	Generally available	Start here
Dataproc on Google Compute Engine	In preview	Coming soon

A decision guide for Dataproc and Google Cloud Serverless for Apache Spark

How It Works

Lightning Engine accelerates Spark data processing with a native C++ vectorized engine, intelligent caching, and optimized I/O. It processes data in batches for maximum CPU efficiency, reducing job runtimes and compute costs. This suite of optimizations delivers breakthrough Spark performance.

Common Uses

Ideal for your most demanding jobs

Large-scale ETL

Dramatically reduce the runtime of your most complex Spark data processing and transformation pipelines. This means you can meet tighter data freshness SLAs, shrink overnight batch windows, and significantly lower the TCO of your most resource-intensive data pipelines.

AI/ML data preparation

Accelerate the feature engineering and data preparation steps that are critical for your machine learning lifecycle. By speeding up the most time-consuming part of the ML workflow, your data scientists can run more experiments, iterate on models faster, and get valuable AI applications into production sooner.

Interactive analytics

Power fast, interactive SQL queries directly on your data lake for ad-hoc analysis and business intelligence. Empower your data analysts to maintain their train of thought with quicker query response times, leading to faster data exploration and more effective insights.

Accelerated Spark, your way

Lightning Engine is a feature of the premium tiers of Dataproc and Google Cloud Serverless for Apache Spark.

Product	Pricing
Google Cloud Serverless for Apache Spark	Pricing details
Dataproc	In preview, coming soon
