Spring 2022 internship - building a Data Lab
By David WORMS
Nov 24, 2021
Over the last few years, we have developed the ability to use computers to process large amounts of data. The ecosystem has grown into a large offering of tools and libraries and has given rise to the field of data science. Connecting all those components into a coherent and secure platform is a daunting task. Newcomers, as well as more experienced users, benefit from platforms that offer a first-class developer experience.
As part of your internship, you will assemble multiple open source technologies to provide data scientists with a modern environment suited to their needs. Data scientists expect a user-friendly web interface to provision their favorite development editors, the ability to use their favorite libraries without restriction in an isolated and self-contained environment, the scaling of resources according to their requirements, and the ability to push their code into production.
The Datalab platform relies on the flexible Kubernetes backend coupled with document storage compatible with any S3 standard interface. On-demand containers should be provisioned and cover a wide range of databases (Elasticsearch, MongoDB, PostgreSQL, …), environments (TensorFlow, VSCode, Jupyter, RStudio, …), and complementary tools such as secrets management with Vault, automated provisioning with Argo CD, OpenID Connect authentication with Keycloak, workflow scheduling, API publishing, …
During this internship, you will become familiar with Kubernetes and the CNCF ecosystem, gain a deep understanding of the roles and responsibilities expected of data scientists, and become comfortable addressing their needs. You will join an agile team led by a data science expert.
Adaltas is a consulting agency led by a team of open source experts focusing on data management. We deploy and operate the storage and computing infrastructures in collaboration with our customers.
We are partners of Cloudera and Databricks and are also open source contributors. We invite you to browse our site and our many technical publications to learn more about the company.
- Understand and address the need for data science
- Learn the various moving pieces of a Datalab
- Deploy the Datalab inside a Kubernetes cluster
- Deploy machine learning workflows
- Engineering school, end-of-studies internship
- Analytical and structured
- Autonomous and curious
- You are an open-minded person who enjoys sharing, communicating, and learning from others
- Good knowledge of Python, Spark, and Linux systems
You will be in charge of understanding the architecture and integrating it with an existing infrastructure. You will work with InfraOps and data scientists. We are looking for a person who will develop skills on the following tools and solutions:
All complementary experiences are valuable.
- Location: Boulogne Billancourt, France
- Languages: French or English
- Start: February 2022
- Duration: 6 months
- Teleworking: possibility of working 2 days a week remotely
A laptop with the following characteristics:
- 32GB RAM
- 1TB SSD
- 8c/16t CPU
A cluster made up of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192GB RAM DDR4 ECC 2666MHz
- 3x 14 SSD 480GB SATA Intel S4500 6Gbps
A Kubernetes cluster.
- Salary 1200 € / month
- Restaurant tickets
- Transportation pass
- Participation in one international conference
For any request for additional information and to submit your application, please contact David Worms: