Internship in Big Data infrastructure with TDP
By Daniel HARTY
Oct 25, 2021
Big Data and distributed computing are at the core of Adaltas. We support our partners in the deployment, maintenance and optimization of some of France's largest clusters. Adaltas is also an advocate of and active contributor to Open Source. Our latest focus is a new, fully open source Hadoop distribution: the TOSIT Data Platform (TDP).
During this internship, you will join the TDP project team and contribute to the development of the project. You will deploy and test production-ready TDP Hadoop clusters and contribute code as iterative improvements to the existing codebase. You will also share your knowledge of TDP by writing customer-ready support resources. Along the way, you will gain hands-on experience with core Hadoop components such as HDFS, YARN, Ranger, Spark, Hive and ZooKeeper.
This will be a serious challenge, with a large number of new technologies and development practices for you to tackle from day one. In return for your dedication, you will finish your internship fully equipped to take on a role in the domain of Big Data.
Adaltas specialises in Big Data, Open Source and DevOps, and operates both on-premise and in the cloud. We are proud of our open culture, and our Open Source contributions have helped users and companies across the world. Our articles share our knowledge of Big Data, DevOps and several complementary topics.
Developing TDP requires an understanding of Hadoop's distributed computation model and of how its core components (HDFS, YARN, etc.) work together to solve Big Data problems. A working knowledge of Linux and the command line is required.
During the course of the internship you will learn:
- Hadoop cluster governance
- Hadoop cluster security including Kerberos and SSL/TLS certificates
- High availability (HA) of services
- Scalability in Hadoop clusters
- Monitoring and health assessment of services and jobs
- Fault-tolerant Hadoop clusters, including recovery of lost data after an infrastructure failure
- Infrastructure as Code (IaC) via DevOps tools such as Ansible and Vagrant
- Code collaboration using Git on both GitLab and GitHub

Your responsibilities will be to:
- Become familiar with the architecture and configuration methods of the TDP distribution
- Deploy and test secure, fault-tolerant TDP clusters
- Contribute to the TDP knowledge base with troubleshooting guides, FAQs and articles
- Participate in discussions about the TDP project's objectives and roadmap
- Actively contribute ideas and code to make iterative improvements on the TDP ecosystem
- Research and analyse the differences between the major Hadoop distributions
Additional information:
- Location: Boulogne-Billancourt, France
- Languages: French or English
- Starting date: March 2022
- Duration: 6 months
Much of the digital world runs on Open Source software, and the Big Data industry is booming. This internship is an opportunity to gain valuable experience in both domains. TDP is now the only truly Open Source Hadoop distribution, and the project has strong momentum. As part of the TDP team, you will learn one of the core Big Data processing models and take part in the development and future roadmap of TDP. We believe this is an exciting opportunity, and that on completing the internship you will be ready for a successful career in Big Data.
Equipment available:

A laptop with the following characteristics:
- 32GB RAM
- 1TB SSD
- 8c/16t CPU
A cluster made up of:
- 3x 28c/56t Intel Xeon Scalable Gold 6132
- 3x 192GB RAM DDR4 ECC 2666MHz
- 3x 14 SSD 480GB SATA Intel S4500 6Gbps
A Kubernetes cluster and a Hadoop cluster are also available.
Conditions:
- Salary: 1200 € / month
- Meal vouchers
- Transportation pass
- Participation in one international conference
For additional information or to submit your application, please contact David Worms: