Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI
May 26, 2021
- Categories
- Data Engineering
- Learning
- Tags
- Cloud
- Data Lake
- Databricks
- Delta Lake
- MLflow
Databricks offers self-paced training as part of its Academy program. Unlimited access to the courses for one year costs USD 2,000, but access is free for customers and qualified partners. The list of available courses is presented in alphabetical order and might evolve over time. At the time of this publication, there are 71 courses, designed for the five main personas in Big Data and AI. This article is meant as a guide to help you build your own learning program.
Thanks to the Databricks team, in particular Taggart McCurdy, for their feedback, review, and contributions to this article. Adaltas is a Databricks partner located in France. Don’t hesitate to contact us for additional information.
We propose the following approach:
- Complete the fundamental, high-level courses. They are included in all learning paths and provide an overview of the data and AI space that is relevant to everyone.
- Choose a role and its corresponding learning path from the following:
  - Business Leader
  - SQL Analyst
  - Platform Administrator
  - Data Scientist
  - Data Engineer
- Complete the main courses, additional courses, and accreditations of the chosen learning path.
- Note that the Platform Administrator, Data Scientist, and Data Engineer personas also have certifications you can earn, which include digital badges.
Each path we propose here is a succession of courses organized into three groups: main courses, additional courses, and accreditations. The main courses help you progressively build the fundamental knowledge for the related path. The additional courses shed light on very specific topics and will often come in handy to fill gaps. Finally, the accreditations let you practice and validate the knowledge acquired during your training.
In practice, keep in mind that some of the following courses require additional resources not provided by Databricks. We specify these requirements where relevant.
The image below illustrates where self-paced training fits in the learning program proposed by Databricks. The green outline shows the areas covered by these courses.
Self-paced courses let you earn three accreditations and prepare for more advanced training and workshops. We recommend them as a solid foundation for your education.
Fundamental, High-Level Courses for All Learning Paths
If you are just starting your Big Data & AI journey and do not know much about the solutions offered by Databricks, start with this path. It provides the foundational knowledge of Big Data & AI and of the Databricks platform that you need to move toward more advanced roles and take full advantage of the platform. All these courses can be followed with a free Databricks Community Edition account.
Main courses
- Fundamentals of Big Data (formerly titled Introduction to Big Data)
- Fundamentals of Unified Data Analytics with Databricks (formerly called Introduction to Unified Data Analytics with Databricks)
- Fundamentals of Delta Lake
Additional courses
- Databricks: How-To Videos
- Quick Reference: Databricks Workspace User Interface
- Quick Reference: Managing Databricks Notebooks with the Databricks Workspace
- How to Manage Clusters in Databricks
Accreditations
- Fundamentals of Unified Data Analytics with Databricks Accreditation
- Fundamentals of Delta Lake Accreditation (or Delta Lake Fundamentals Accreditation)
Business Leader Learning Path
The Business Leader learning path offers high-level training on several topics related to Data Engineering and Data Science. It is well suited to people who have experience with Big Data & AI projects but want to acquire the fundamental technical knowledge needed to use Databricks solutions. All the courses can be followed with a free Databricks Community Edition account.
Additional courses
- Fundamentals of Data Lakes and Data Warehouses
- Fundamentals of Lakehouse Architecture
- Fundamentals of Machine Learning
- Fundamentals of Structured Streaming
SQL Analyst Learning Path
If you want to learn data analysis with SQL and Databricks products, you should definitely follow this learning path. It contains several courses describing how Databricks leverages Spark and SQL to perform ETL and data analysis. However, some of the courses require Databricks SQL Analytics, which is not yet open to the public but is available on request from the Databricks website.
Main courses
- Introduction to SQL on Databricks
- Quick Reference: Spark Architecture
- Delta Lake Rapid Start with Spark SQL
- Applications of SQL on Databricks
- SQL Coding Challenges
- Databricks SQL Analytics Fundamentals
- Data Visualization with Databricks SQL Analytics (requires access to Databricks SQL Analytics)
Additional courses
- Fundamentals of Lakehouse Architecture
- Introduction to Apache Spark Architecture
- Databricks Workspace Fundamentals for Business Analytics
- How to Code-Along with Self-Paced Courses
- Just Enough Python for Apache Spark™
- Quick Reference: Relational Entities on Databricks
- What’s New In Spark 3.0
Accreditations
Platform Administrator Learning Path
The Platform Administrator path provides the knowledge needed to manage and administer clusters on Databricks. We advise having solid knowledge of and practical experience with Big Data, Databricks, and cloud engineering before choosing this path. Because these courses require a cloud provider and advanced privileges on the Databricks platform, they may incur additional costs to provision cloud resources. The table below lists the requirements for each of these courses. A certification for this learning path is expected to become available in mid to late 2021.
Main courses
- Collection: AWS Databricks Workspace Administration
- AWS Databricks SQL Analytics Administration
- Collection: Azure Databricks Workspace Administration
- Azure Databricks SQL Analytics Administration
- Google Cloud Fundamentals
- Databricks on Google Cloud: Workspace Deployment
- Databricks on Google Cloud: Architecture and Security Fundamentals
- Databricks on Google Cloud: Cloud Architecture and System Integration
- Databricks on Google Cloud: Cluster Usage Management
Additional courses
- Fundamentals of Lakehouse Architecture
- Databricks Command Line Interface (CLI) Fundamentals
- Quick Reference: CI/CD
- Setting Up SQL Analytics
Courses requiring a special account
Course | Requirements |
---|---|
AWS Databricks Workspace Deployment | Databricks account with Account Owner permissions |
AWS Databricks Identity Access Management | Databricks workspace deployment with administrator rights |
AWS Databricks Data Access Management | Databricks Premium Plan |
Collection: AWS Databricks Workspace Administration | Databricks Premium Plan; Administrator rights for an AWS Databricks workspace |
AWS Databricks SQL Analytics Administration | Databricks account on the Premium plan (with SQL Analytics enabled); Administrator credentials to your organization’s Databricks Workspace |
Azure Databricks Workspace Deployment | Access to the Admin Console in the deployed Azure Databricks workspace |
Azure Databricks Data Access Management | Azure Databricks Premium Plan |
Collection: Azure Databricks Workspace Administration | Azure Databricks Premium Plan; Administrator rights for an Azure Databricks workspace |
Azure Databricks SQL Analytics Administration | Databricks account on the Premium plan (with SQL Analytics enabled); Administrator credentials to your organization’s Databricks Workspace |
Data Scientist Learning Path
The Data Scientist path is not about teaching you how to become a data scientist! Instead, it shows you how to (i) leverage the Databricks platform to perform exploratory data analysis, (ii) train and test your models using Spark, and (iii) track and deploy them using MLflow. Consequently, this path is well suited to people who have experience in data science and want to sharpen their skills on the Databricks platform. It contains a lot of additional content that lets learners refresh their knowledge and fill gaps if needed. Notably, one of the additional courses prepares learners for the Databricks Certified Associate Developer for Apache Spark exam. All the courses can be followed with a free Databricks Community Edition account.
Main courses
- Apache Spark Programming with Databricks
- Scalable Machine Learning with Apache Spark
- Data Science on Databricks: The Bias-Variance Tradeoff
- Tracking Experiments with MLflow
- Deploying a Machine Learning Project with MLflow Projects
Additional courses
- Quick Reference: Spark Architecture
- Introduction to Apache Spark Architecture
- Applied Statistics with Databricks
- Quick Reference: Relational Entities on Databricks
- Lakehouse with Delta Lake Deep Dive
- Data Science on Databricks Rapidstart
- How to Code-Along with Self-Paced Courses
- Databricks with R
- Delta Lake Rapid Start with Python
- Just Enough Python for Apache Spark™
- Delta Lake Rapid Start with Spark SQL
- Introduction to Applied Unsupervised Learning
- Introduction to Feature Engineering and Selection with Databricks
- Introduction to Hyperparameter Optimization with Databricks
- Introduction to Natural Language Processing with Databricks
- Natural Language Processing at Scale with Databricks
- What’s New In Spark 3.0
- Certification Prep Course for the Databricks Certified Associate Developer for Apache Spark Exam
Data Engineer Learning Path
Junior or senior data engineers who want to master the data engineering tools offered by Databricks should take this path. The courses cover the knowledge needed to use Spark properly when designing data pipelines. The two main courses provide detailed knowledge of the Spark APIs (both Scala and Python) and present the inner workings of the Spark architecture needed to design optimized pipelines. As with the Data Scientist path, many additional courses are available to complete your training or to prepare for the Databricks Certified Associate Developer for Apache Spark exam. Most of the courses can be followed with a free Databricks Community Edition account.
Additional courses
- Introduction to Apache Spark Architecture
- Quick Reference: Spark Architecture
- Fundamentals of Lakehouse Architecture
- Quick Reference: Relational Entities on Databricks
- Lakehouse with Delta Lake Deep Dive
- How to Code-Along with Self-Paced Courses
- Just Enough Python for Apache Spark™
- Delta Lake Rapid Start with Python
- Delta Lake Rapid Start with Spark SQL
- AWS Databricks Cloud Architecture and System Integration Fundamentals
- Azure Databricks Cloud Architecture and System Integration Fundamentals
- Databricks Command Line Interface (CLI) Fundamentals
- Introduction to Databricks Connect
- Optimizing Apache Spark on Databricks
- Quick Reference: CI/CD
- Structured Streaming
- What’s New In Spark 3.0
- Certification Prep Course for the Databricks Certified Associate Developer for Apache Spark Exam
Conclusion
We have proposed a way to organize your learning path and ramp up your skills on the Databricks platform in different professional contexts. Pick an objective and dive into one of these paths. Be aware that some of the courses mentioned above may involve additional costs, which you should anticipate before starting them. We will update this article as Databricks adds content and changes its eLearning offerings.