About 50 results
Open links in new tab
  1. Documentation | Apache Spark

    The documentation linked to above covers getting started with Spark, as well the built-in components MLlib, Spark Streaming, and GraphX. In addition, this page lists other resources …

  2. Downloads - Apache Spark

    Spark docker images are available from Dockerhub under the accounts of both The Apache Software Foundation and Official Images. Note that, these images contain non-ASF software …

  3. Spark SQL & DataFrames | Apache Spark

    Spark SQL includes a cost-based optimizer, columnar storage and code generation to make queries fast. At the same time, it scales to thousands of nodes and multi hour queries using …

  4. Spark SQL and DataFrames - Spark 4.1.0 Documentation

    Spark SQL is a Spark module for structured data processing. Unlike the basic Spark RDD API, the interfaces provided by Spark SQL provide Spark with more information about the structure …

  5. Getting Started — PySpark 4.1.0 documentation - Apache Spark

    There are more guides shared with other languages such as Quick Start in Programming Guides at the Spark documentation. There are live notebooks where you can try PySpark out without …

  6. Spark Connect | Apache Spark

    Check out the guide on migrating from Spark JVM to Spark Connect to learn more about how to write code that works with Spark Connect. Also, check out how to build Spark Connect custom …

  7. Structured Streaming Programming Guide - Spark 4.1.0 …

    Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. You can express your streaming computation the same way you would express a …

  8. News | Apache Spark

    Jan 9, 2026 · We’re proud to announce the release of Spark 0.7.0, a new major version of Spark that adds several key features, including a Python API for Spark and an alpha of Spark …

  9. MLlib | Apache Spark

    You can run Spark using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on Kubernetes. Access data in HDFS, Apache Cassandra, Apache HBase, Apache Hive, and …

  10. MLlib: Main Guide - Spark 4.1.0 Documentation

    “Spark ML” is not an official name but occasionally used to refer to the MLlib DataFrame-based API. This is majorly due to the org.apache.spark.ml Scala package name used by the …