Internship in Data Engineering

By David WORMS

Oct 25, 2021

Job Description

Data is a valuable business asset. Some call it the new oil. The data engineer collects, transforms, and refines raw data into information that business analysts and data scientists can use.

As part of your internship, you will be trained in the different aspects of the data engineer's activities. You will build a real-time, end-to-end data streaming ingestion pipeline combining metrics collection, data cleansing and aggregation, storage in multiple data warehouses, (near) real-time analysis by exposing key metrics in a dashboard, and the use of machine learning models for the prediction and detection of weak signals.

You will participate in the application architecture and the implementation of the pipeline with the goal of going into production. You will join an agile team led by a Big Data expert.

In addition, at the end of the internship, you will obtain a certification from a Cloud provider as well as a Databricks certification.

Company presentation

Adaltas specializes in the processing and storage of data. We work on-premises and in the cloud to operate Big Data platforms and strengthen our clients’ teams in the areas of architecture, operations, data engineering, data science and DevOps. A partner of Cloudera and Databricks, we are also open source contributors. We invite you to browse our site and our many technical publications to learn more about Adaltas.

Responsibilities

  • Collecting system and application metrics
  • Supplying a distributed data warehouse with OLAP-type column storage
  • Cleansing, enrichment, aggregation of data flows
  • Real-time analysis in SQL
  • Dashboard creation
  • Putting machine learning models into production in an MLOps cycle
  • Deployment in an Azure cloud infrastructure and on-premise
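To give a flavor of the cleansing and aggregation work listed above, here is a minimal, illustrative Python sketch (plain Python rather than the actual pipeline stack, which will be defined during the internship): malformed metric samples are dropped, then CPU usage is averaged per host and per time window.

```python
from collections import defaultdict

def clean(records):
    """Cleansing: drop malformed samples (missing fields or non-numeric values)."""
    for r in records:
        if {"ts", "host", "cpu"} <= r.keys() and isinstance(r["cpu"], (int, float)):
            yield r

def aggregate(records, window=60):
    """Aggregation: average CPU usage per (host, time window in seconds)."""
    buckets = defaultdict(list)
    for r in records:
        buckets[(r["host"], r["ts"] // window)].append(r["cpu"])
    return {k: sum(v) / len(v) for k, v in buckets.items()}

samples = [
    {"ts": 0, "host": "a", "cpu": 0.5},
    {"ts": 30, "host": "a", "cpu": 0.7},
    {"ts": 10, "host": "b"},             # malformed: no "cpu" field, dropped
    {"ts": 65, "host": "a", "cpu": 0.9},
]
result = aggregate(clean(samples))       # averages keyed by (host, window index)
```

In the actual pipeline, the same two steps would typically run continuously on a stream rather than on a static list.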

Expected qualifications

  • Engineering school, end of studies internship
  • Analytical and structured
  • Autonomous and curious
  • You are an open-minded person who enjoys sharing, communicating and learning from others
  • Good knowledge of Python, Spark and Linux systems

You will be in charge of designing the technical architecture. We are looking for a person who masters, or is willing to develop, skills in the following tools and solutions:

All complementary experiences are valuable.

Additional information

  • Location: Boulogne Billancourt, France
  • Languages: French or English
  • Start: February 2022
  • Duration: 6 months
  • Teleworking: possibility of working 2 days a week remotely

Available hardware

A laptop with the following characteristics:

  • 32GB RAM
  • 1TB SSD
  • 8c/16t CPU

A cluster made up of:

  • 3x 28c/56t Intel Xeon Scalable Gold 6132
  • 3x 192GB RAM DDR4 ECC 2666MHz
  • 3x 14 SSD 480GB SATA Intel S4500 6Gbps

Platforms, components, tools

A Kubernetes cluster and a Hadoop cluster.

Remuneration

  • Salary 1200 € / month
  • Restaurant tickets
  • Transportation pass
  • Participation in one international conference

In the past, the conferences we attended include KubeCon, organized by the CNCF, the Open Source Summit from the Linux Foundation, and FOSDEM.

Contact

For any request for additional information and to submit your application, please contact David Worms:

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabat
Morocco

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into production projects, reduce their costs, and speed up their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.