Self-Paced training from Databricks: a guide to self-enablement on Big Data & AI

By Anna KNYAZEVA

May 26, 2021

Categories: Data Engineering, Learning | Tags: Cloud, Data Lake, Databricks, Delta Lake, MLflow

Databricks offers self-paced training courses through its Academy program. Unlimited access to the courses costs $2,000 USD for one year, but it is free for customers and qualified partners. The available courses are listed in alphabetical order, and the catalog might evolve over time. At the time of this publication, there are 71 courses, designed for the five main personas in Big Data and AI. This article acts as a guide to help you build your own learning program.

Thanks to the Databricks team, in particular Taggart McCurdy, for their feedback, review and contribution to this article. Adaltas is a Databricks partner located in France. Don’t hesitate to contact us for additional information.

We propose the following approach:

  • Take the fundamental, high-level courses, which are included in all learning paths and provide high-level information about the data and AI space that is relevant to everyone.
  • Choose a profession and its corresponding learning path among the following:
    • Business Leader
    • SQL Analyst
    • Platform Administrator
    • Data Scientist
    • Data Engineer
  • Complete the main, additional and accreditation courses from the chosen learning path.
  • Note that the Platform Administrator, Data Scientist, and Data Engineer personas also have certifications you can earn, which include digital badges.

Each pathway we propose here is designed as a succession of courses categorized into three groups: main, additional and accreditation. The main courses help you progressively build the fundamental knowledge for the related pathway. The additional courses shed light on very specific topics and will most often come in handy to fill some gaps. Finally, the accreditations let you practice and validate the knowledge acquired during your training.

From a practical point of view, keep in mind that some of the following courses require additional resources not provided by Databricks. We specify these requirements where necessary.

The image below illustrates where self-paced training fits within the learning program proposed by Databricks. The green outline shows the fields covered by these courses.

Databricks learning path

Self-paced courses give you the opportunity to earn three accreditations and to get ready for more advanced training and workshops. We recommend the self-paced courses as a solid basis for your education.

Fundamental, high-level Courses for All Learning Paths

If you have just started your Big Data & AI journey and do not know much about the solutions proposed by Databricks, you should start with this path. It gives you the necessary knowledge of Big Data & AI and of the Databricks platform to move towards more advanced roles and leverage the possibilities of the platform. All these courses can be followed with a free Databricks Community Edition account.

Main courses

Additional courses

Accreditations

Business Leader Learning Path

The courses in the Business Leader learning path offer high-level training on several topics related to Data Engineering and Data Science. This path suits people who have experience with Big Data & AI projects but want to acquire the fundamental technical knowledge necessary to use the Databricks solutions. All the courses can be followed with a free Databricks Community Edition account.

Additional courses

SQL Analyst Learning Path

If you want to gain knowledge in data analysis using SQL and Databricks products, you should definitely tackle this learning pathway. It contains several courses describing how Databricks leverages Spark and SQL to perform ETL and data analysis. Some of the courses, however, require Databricks SQL Analytics, which is not yet open to the public but is available on request on the Databricks website.

Main courses

Additional courses

Accreditations

Platform Administrator Learning Path

The Platform Administrator path provides the knowledge needed to manage and administer clusters on Databricks. We advise having good knowledge of and practical experience with Big Data, Databricks and Cloud Engineering before picking this path. Because these courses require the use of Cloud providers and advanced privileges on the Databricks platform, they might incur additional costs to provision Cloud resources. We gather and describe all the requirements in the table below. A certification for this learning path is also planned for mid to late 2021.

Main courses

Additional courses

Courses that require a special account

Course | Requirements
AWS Databricks Workspace Deployment | Databricks account with Account Owner permissions
AWS Databricks Identity Access Management | Databricks workspace deployment with administrator rights
AWS Databricks Data Access Management | Databricks Premium Plan
Collection: AWS Databricks Workspace Administration | Databricks Premium Plan; Administrator rights for an AWS Databricks workspace
AWS Databricks SQL Analytics Administration | Databricks account on the Premium plan (with SQL Analytics enabled); Administrator credentials to your organization’s Databricks Workspace
Azure Databricks Workspace Deployment | Access the Admin Console in the deployed Azure Databricks workspace
Azure Databricks Data Access Management | Azure Databricks Premium Plan
Collection: Azure Databricks Workspace Administration | Azure Databricks Premium Plan; Administrator rights for an Azure Databricks workspace
Azure Databricks SQL Analytics Administration | Databricks account on the Premium plan (with SQL Analytics enabled); Administrator credentials to your organization’s Databricks Workspace

Data Scientist Learning Path

The Data Scientist pathway is not about teaching you how to become a Data Scientist! Instead, it shows you how to (i) leverage the Databricks platform to perform exploratory data analysis, (ii) train and test your models using Spark and (iii) track and deploy them using MLflow. Consequently, this path suits people who have experience in Data Science and want to sharpen their tools on the Databricks platform. It contains a lot of additional content that lets the learner refresh some knowledge and fill some gaps if needed. Note that one additional course also prepares the trainee for the Databricks Certified Associate Developer for Apache Spark exam. All the courses can be followed with a free Databricks Community Edition account.

Main courses

Additional courses

Data Engineer Learning Path

Junior or senior data engineers who want to master the tools proposed by Databricks for data engineering should take this path. The courses cover the knowledge needed to use Spark properly in order to design data pipelines. The two main courses provide detailed knowledge about the Spark APIs (both Scala and Python) and also present the inner workings of the Spark architecture necessary to design optimized pipelines. As with the Data Scientist pathway, many additional courses are available to complete your training or to prepare for the Databricks Certified Associate Developer for Apache Spark exam. Most of the courses can be followed with a free Databricks Community Edition account.

Additional courses

Conclusion

We have proposed here a way to organize your learning pathway and ramp up your skills with the Databricks platform in different professional contexts. Set an objective and dive into one of these paths. Be aware that some of the courses mentioned above might incur additional costs, which you should anticipate before you decide to start one of them. Keep in mind that we will update this article as Databricks adds content and makes changes to its eLearning offerings.
