PySpark
Related articles
![H2O in practice: a protocol combining AutoML with traditional modeling approaches](/static/1fa130ce2060d6d5efe8fdbedc6ed3d8/0fd76/h2o-automl-protocol.png)
H2O in practice: a protocol combining AutoML with traditional modeling approaches
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost
H2O comes with a lot of functionality. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…
Nov 12, 2021
![H2O in practice: a Data Scientist feedback](/static/12298157086f95aa3e94c715d4f08041/0fd76/h2o_puzzle.png)
H2O in practice: a Data Scientist feedback
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists' toolbox. A few months ago, I introduced H2O, an open-source platform for…
Sep 29, 2021
![Faster model development with H2O AutoML and Flow](/static/ffdcb08fdc4a81159054a6ecdda2cfe3/0fd76/h2o.png)
Faster model development with H2O AutoML and Flow
Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python
Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…
Dec 10, 2020
![Introducing Apache Airflow on AWS](/static/358020001e61e6c5b394b2947b20fc2d/0fd76/apache-airflow.png)
Introducing Apache Airflow on AWS
Categories: Big Data, Cloud Computing, Containers Orchestration | Tags: PySpark, Learning and tutorial, Airflow, Oozie, Spark, AWS, Docker, Python
Apache Airflow offers a potential solution to the growing challenge of managing an increasingly complex landscape of data management tools, scripts and analytics processes. It is an open-source…
May 5, 2020
![Spark Streaming part 1: build data pipelines with Spark Structured Streaming](/static/3ec2e1f537e802c7e408b778ee49a1cc/0fd76/spark-streaming-data-pipelines-with-structured-streaming.png)
Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Kafka, Spark, Apache Spark Streaming, Big Data, Streaming
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
Apr 18, 2019
![What's new in Apache Spark 2.3?](/static/46372f50096c115597b7bfa686c38035/0fd76/spark-2-3.png)
What's new in Apache Spark 2.3?
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, PySpark, Tuning, ORC, Spark, Spark MLlib, Data Science, Docker, Kubernetes, pandas, Streaming
Let's dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…
May 23, 2018