
Apache Spark

Apache Spark is a unified in-memory analytics platform for Big Data processing, data streaming, SQL, Machine Learning and graph processing.

The open source project originated in UC Berkeley's AMPLab and has been a top-level project of the Apache Software Foundation since 2014. It has since become a major player in the Big Data ecosystem, as both an alternative to and an evolution of MapReduce.

Thanks to its distributed architecture, Apache Spark runs on a cluster to process large amounts of data in parallel and with high performance. It processes data in memory and is optimized to limit disk usage.
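The split-process-merge pattern behind this distributed, in-memory processing can be illustrated without Spark. The following Python sketch is a toy analogy, not Spark's API. `split_into_partitions`, `count_words` and `word_count` are hypothetical helpers, and a thread pool stands in for the executors of a cluster. In CPython, threads do not give true CPU parallelism, so the sketch only shows the structure: data is split into in-memory partitions, each partition is processed independently, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_partitions(records, num_partitions):
    """Distribute records round-robin across in-memory partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


def count_words(partition):
    """Per-partition step: count words entirely in memory."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_partitions=4):
    """Toy analogy of a distributed word count (not Spark's API)."""
    partitions = split_into_partitions(lines, num_partitions)
    # Each partition is handled independently, like a task on an executor.
    # Threads stand in for cluster workers here (an assumption of the sketch).
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge step: combine the partial results into a single Counter.
    return reduce(lambda a, b: a + b, partial_counts, Counter())


if __name__ == "__main__":
    lines = [
        "Spark processes data in memory",
        "Spark runs tasks in parallel",
        "data is split into partitions",
    ]
    print(word_count(lines, num_partitions=2))
```

In Spark itself, the partitioning, scheduling of tasks across the cluster and merging of results are handled by the engine rather than written by hand as above.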

Many users work with Spark DataFrames, which have been available in Scala, Python and Java since Spark version 2. Spark DataFrames, comparable to R data frames or Pandas DataFrames, let users query data organized in a table structure. Their integration with Spark's Machine Learning capabilities makes it possible to apply analytical models to Big Data. This versatility is why Spark is often referred to as the Swiss Army knife of data processing.

Spark runs on various platforms, including standalone hosts and clusters, Hadoop clusters with YARN, and the Databricks platform.

Related articles

Comparison of database architectures: data warehouse, data lake and data lakehouse


Categories: Big Data, Data Engineering | Tags: Data Governance, Infrastructure, Iceberg, Parquet, Spark, Data Lake, Data Warehouse, File Format

Database architectures have experienced constant innovation, evolving with the appearance of new use cases, technical constraints, and requirements. From the three database structures we are comparing…


By Gonzalo ETSE

May 17, 2022


We are a team of Open Source enthusiasts providing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to turn their use cases into projects in production, reduce their costs and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.