Lightning Engine

Accelerate Apache Spark performance

Our vectorized engine offers an easier way to optimize Spark: a smarter engine that delivers over 4.3x faster Spark performance*, reducing compute costs.

*The queries are derived from the TPC-DS and TPC-H standards and as such are not comparable to published TPC-DS and TPC-H results, as these runs do not comply with all requirements of the TPC-DS and TPC-H specifications.

Apache Spark is a trademark of The Apache Software Foundation.

Features

Reduce job runtimes and lower costs

Experience a faster way to run Spark. Run your large-scale ETL, data science, and SQL workloads over 4.3x faster than open source Apache Spark. This dramatic reduction in job runtime lowers the total cost of ownership for your Spark workloads by reducing compute time.

Accelerate Spark performance

Discover an easier way to improve performance. Spend fewer valuable engineering cycles on optimizing Spark.

Intelligent data access and caching

Leverage a smarter architecture. Lightning Engine automatically caches hot data in memory and uses high-throughput, optimized connectors for Cloud Storage and BigQuery, significantly reducing I/O latency and increasing throughput for large-scale Spark data processing.


The core technology: Vectorized execution

Lightning Engine leverages a native C++ vectorized execution engine to process data in batches, dramatically improving CPU efficiency over traditional row-by-row processing. This is a core component of its breakthrough Spark performance.
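To make the contrast between row-by-row and batch processing concrete, here is a minimal conceptual sketch in Python. It is not Lightning Engine code and does not reflect its internals; the query, the function names, and the batch size are all hypothetical and chosen only for illustration. Both functions compute the same result, but the second evaluates each operator over a whole column batch at a time.

```python
# Illustrative only: contrasts row-at-a-time and batch (columnar) evaluation
# of the same query, SUM(price * qty) WHERE qty > 1.
# All names here are hypothetical; this is not Lightning Engine code.
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, float]
Batch = Dict[str, List[float]]  # column name -> values for that batch


def sum_row_by_row(rows: Iterable[Row]) -> float:
    """Evaluate the filter and the expression once per row."""
    total = 0.0
    for row in rows:
        if row["qty"] > 1:
            total += row["price"] * row["qty"]
    return total


def to_batches(rows: List[Row], batch_size: int) -> Iterator[Batch]:
    """Pivot rows into fixed-size columnar batches."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield {
            "price": [r["price"] for r in chunk],
            "qty": [r["qty"] for r in chunk],
        }


def sum_batched(batches: Iterable[Batch]) -> float:
    """Evaluate each operator over a whole column batch at a time."""
    total = 0.0
    for batch in batches:
        price, qty = batch["price"], batch["qty"]
        # Filter operator: one pass over the qty column builds a selection list.
        selected = [i for i, q in enumerate(qty) if q > 1]
        # Projection and aggregation: one tight loop over the selected positions.
        total += sum(price[i] * qty[i] for i in selected)
    return total


rows = [
    {"price": 2.0, "qty": 3.0},
    {"price": 5.0, "qty": 1.0},
    {"price": 1.5, "qty": 4.0},
    {"price": 10.0, "qty": 2.0},
    {"price": 4.0, "qty": 0.0},
]

row_result = sum_row_by_row(rows)
batch_result = sum_batched(to_batches(rows, batch_size=2))
print(row_result, batch_result)
```

In pure Python the two versions perform similarly; the sketch only shows the shape of the computation. In a native engine, the batch form generally pays off because each operator runs as a tight loop over contiguous column data, which tends to be friendlier to CPU caches and SIMD instructions than dispatching per row.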


Availability

Lightning Engine is for your most demanding Spark workloads. You can access it with the premium tiers of Dataproc and Serverless for Apache Spark.
Product Availability Access

Generally available

Dataproc on Google Compute Engine

In preview

Coming soon

How It Works

Lightning Engine accelerates Spark data processing with a native C++ vectorized engine, intelligent caching, and optimized I/O. It processes data in batches for maximum CPU efficiency, reducing job runtimes and compute costs. This suite of optimizations delivers breakthrough Spark performance.

Common Uses

Ideal for your most demanding jobs

Large-scale ETL

Dramatically reduce the runtime of your most complex Spark data processing and transformation pipelines. This means you can meet tighter data freshness SLAs, shrink overnight batch windows, and significantly lower the TCO of your most resource-intensive data pipelines.

AI/ML data preparation

Accelerate the feature engineering and data preparation steps that are critical for your machine learning lifecycle. By speeding up the most time-consuming part of the ML workflow, your data scientists can run more experiments, iterate on models faster, and get valuable AI applications into production sooner.

Interactive analytics

Power fast, interactive SQL queries directly on your data lake for ad-hoc analysis and business intelligence. Empower your data analysts to maintain their train of thought with quicker query response times, leading to faster data exploration and more effective insights.

Pricing

Accelerated Spark, your way

Lightning Engine is a feature of the premium tiers of Dataproc and Google Cloud Serverless for Apache Spark.

Pricing: In preview, coming soon.

Pricing calculator

Estimate your monthly costs, including region-specific pricing and fees.

Custom quote

Connect with our sales team to get a custom quote for your organization.
