Spring 2022 internship - building a Datalab

David WORMS

Nov 24, 2021

Job Description

Over the last few years, we have developed the ability to use computers to process large amounts of data. The ecosystem has grown into a large offering of tools and libraries, and the field of data science has emerged. Connecting all these components into a coherent and secure platform is a daunting task. Newcomers as well as more experienced users benefit from platforms that offer a first-class developer experience.

Datalabs provide developers with a comprehensive suite of software to help them explore, visualize, process and expose data. Using their favorite languages, such as Python, JavaScript or SQL, they build pipelines to collect and store data, build visualization dashboards and deploy machine learning models.

As part of your internship, you will assemble multiple open source technologies to provide data scientists with a modern environment suited to their needs. Data scientists expect a user-friendly web interface to provision their favorite development editors, the ability to use their favorite libraries without restriction in an isolated and self-contained environment, resources that scale according to their requirements, and the ability to push their code into production.

The Datalab platform relies on a flexible Kubernetes backend coupled with object storage compatible with the standard S3 interface. On-demand containers should be provisioned and cover a wide range of databases (Elasticsearch, MongoDB, PostgreSQL, …), environments (TensorFlow, VSCode, Jupyter, RStudio, …) and complementary tools such as secrets management with Vault, automated provisioning with Argo CD, OpenID Connect authentication with Keycloak, workflow scheduling, API publishing, …

During this internship, you will become familiar with Kubernetes and the CNCF ecosystem, gain a deep understanding of the roles and responsibilities expected of data scientists, and become comfortable addressing their needs. You will join an agile team led by a data science expert.

In addition, at the end of the internship you will obtain a certification from a cloud provider and a Databricks certification.

Company presentation

Adaltas is a consulting agency led by a team of open source experts focusing on data management. We deploy and operate storage and computing infrastructures in collaboration with our customers.

As partners of Cloudera and Databricks, we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about the company.

Responsibilities

  • Understand and address the needs of data scientists
  • Learn the various moving pieces of a Datalab
  • Deploy the Datalab inside a Kubernetes cluster
  • Deploy machine learning workflows

Expected qualifications

  • Engineering school, final-year internship
  • Analytical and structured
  • Autonomous and curious
  • You are an open-minded person who enjoys sharing, communicating and learning from others
  • Good knowledge of Python, Spark and Linux systems

You will be in charge of understanding the architecture and integrating with an existing infrastructure. You will work with InfraOps and data scientists. We are looking for a person who will develop skills in the following tools and solutions:

All complementary experience is valuable.

Additional information

  • Location: Boulogne Billancourt, France
  • Languages: French or English
  • Start: February 2022
  • Duration: 6 months
  • Teleworking: possibility of working 2 days a week remotely

Available hardware

A laptop with the following characteristics:

  • 32GB RAM
  • 1TB SSD
  • 8c/16t CPU

A cluster made up of:

  • 3x 28c/56t Intel Xeon Scalable Gold 6132
  • 3x 192GB RAM DDR4 ECC 2666MHz
  • 3x 14 SSD 480GB SATA Intel S4500 6Gbps

Platforms, components, tools

A Kubernetes cluster.

Remuneration

  • Salary 1200 € / month
  • Restaurant tickets
  • Transportation pass
  • Participation in one international conference

In the past, the conferences we have attended include KubeCon, organized by the CNCF, the Open Source Summit from the Linux Foundation, and FOSDEM.

Contact

For any request for additional information and to submit your application, please contact David Worms: