Data Science

Data science, and Artificial Intelligence (AI) more generally, differs from traditional programming and analysis in its ability to extract knowledge from data and to modify its behavior (that is, to learn) without being explicitly programmed. Whereas traditional software predefines the logic that governs its processes, data science algorithms build and discover models and can improve them continuously.

Data science brings together a set of skills including Machine Learning, Natural Language Processing (NLP), and speech, image, and face recognition, among other applications. In some applications, the algorithms go as far as simulating human intelligence.

Key takeaways

  • Data scientists build, train, and validate models to support critical decisions.
  • Data scientists manage data access, reproducibility, and collaboration in order to quickly create models that can be deployed at scale.
  • Adaltas enables data scientists to easily create, scale, and deploy machine learning models in minutes, helping to drive innovation across the entire company.

Articles related to data science

Deploy your containerized AI applications with nvidia-docker

Categories: Containers Orchestration, Data Science | Tags: containerd, DevOps, Learning and tutorial, NVIDIA, Container, Docker, Keras, TensorFlow

More and more products and services are taking advantage of the modeling and prediction capabilities of AI. This article presents the nvidia-docker tool for integrating AI (Artificial Intelligence…

By Robert Walid SOARES

March 24, 2022

Spring 2022 internship - building a Data Lab

Categories: Data Science, Learning | Tags: MongoDB, Spark, Argo CD, Elasticsearch, Internship, Keycloak, Kubernetes, OpenID Connect, PostgreSQL

Job description: Over the last few years, we developed the ability to use computers to process large amounts of data. The ecosystem evolved over a large offering of tools and libraries and the creation…

By David WORMS

November 24, 2021

H2O in practice: a protocol combining AutoML with traditional modeling approaches

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python, XGBoost

H2O comes with a lot of functionality. The second part of the series H2O in practice proposes a protocol to combine AutoML modeling with traditional modeling and optimization approaches. The objective…

H2O in practice: a Data Scientist feedback

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python

Automated machine learning (AutoML) platforms are gaining popularity and becoming a new important tool in the data scientists’ toolbox. A few months ago, I introduced H2O, an open-source platform for…

By Petra KAFERLE DEVISSCHERE

September 29, 2021

Apache Liminal: when MLOps meets GitOps

Categories: Big Data, Containers Orchestration, Data Engineering, Data Science, Tech Radar | Tags: Data Engineering, CI/CD, Data Science, Deep Learning, Deployment, Docker, GitOps, Kubernetes, Machine Learning, MLOps, Open source, Python, TensorFlow

Apache Liminal is open-source software that offers a solution for deploying end-to-end Machine Learning pipelines. It centralizes all the steps needed to construct Machine Learning…

By Aargan COINTEPAS

March 31, 2021

Storage size and generation time in popular file formats

Categories: Data Engineering, Data Science | Tags: Avro, HDFS, Hive, ORC, Parquet, Big Data, Data Lake, File Format, JavaScript Object Notation (JSON)

Choosing an appropriate file format is essential, whether your data transits on the wire or is stored at rest. Each file format comes with its own advantages and disadvantages. We covered them in a…

By Barthelemy NGOM

March 22, 2021

TensorFlow Extended (TFX): the components and their functionalities

Categories: Big Data, Data Engineering, Data Science, Learning | Tags: Beam, Data Engineering, Pipeline, CI/CD, Data Science, Deep Learning, Deployment, Machine Learning, MLOps, Open source, Python, TensorFlow

Putting Machine Learning (ML) and Deep Learning (DL) models into production is certainly a difficult task. It has been recognized as more failure-prone and time-consuming than the modeling itself, yet…

Faster model development with H2O AutoML and Flow

Categories: Data Science, Learning | Tags: Automation, Cloud, H2O, Machine Learning, MLOps, On-premises, Open source, Python

Building Machine Learning (ML) models is a time-consuming process. It requires expertise in statistics, ML algorithms, and programming. On top of that, it also requires the ability to translate a…

By Petra KAFERLE DEVISSCHERE

December 10, 2020

Data versioning and reproducible ML with DVC and MLflow

Categories: Data Science, DevOps & SRE, Events | Tags: Data Engineering, Databricks, Delta Lake, Git, Machine Learning, MLflow, Storage

Our talk on data versioning and reproducible Machine Learning, submitted to the Data + AI Summit (formerly known as Spark+AI), has been accepted. The summit will take place online on 17-19 November…

By Petra KAFERLE DEVISSCHERE

September 30, 2020

Experiment tracking with MLflow on Databricks Community Edition

Categories: Data Engineering, Data Science, Learning | Tags: Spark, Databricks, Deep Learning, Delta Lake, Machine Learning, MLflow, Notebook, Python, Scikit-learn

Introduction to Databricks Community Edition and MLflow. Every day, the number of tools helping Data Scientists build models faster increases. Consequently, the need to manage the results and the…

By Petra KAFERLE DEVISSCHERE

September 10, 2020

Version your datasets with Data Version Control (DVC) and Git

Categories: Data Science, DevOps & SRE | Tags: DevOps, Infrastructure, Operation, Git, GitOps, SCM

Using a Version Control System such as Git for source code is a good practice and an industry standard. Considering that projects focus more and more on data, shouldn’t we have a similar approach such…

By Grégor JOUET

September 3, 2020

Importing data to Databricks: external tables and Delta Lake

Categories: Data Engineering, Data Science, Learning | Tags: Parquet, AWS, Amazon S3, Azure Data Lake Storage (ADLS), Databricks, Delta Lake, Python

During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…

MLflow tutorial: an open source Machine Learning (ML) platform

Categories: Data Engineering, Data Science, Learning | Tags: AWS, Azure, Databricks, Deep Learning, Deployment, Machine Learning, MLflow, MLOps, Python, Scikit-learn

Introduction and principles of MLflow. With increasingly cheap computing power and storage, and with data collection increasing in all walks of life, many companies integrated Data Science…

Introduction to Ludwig and how to deploy a Deep Learning model via Flask

Categories: Data Science, Tech Radar | Tags: Learning and tutorial, Deep Learning, Ludwig Deep Learning Toolbox, Machine Learning, Python

Over the past decade, Machine Learning and deep learning models have proven to be very effective in performing a wide variety of tasks such as fraud detection, product recommendation, autonomous…

By Robert Walid SOARES

March 2, 2020

Internship Data Science & Data Engineer - ML in production and streaming data ingestion

Categories: Data Engineering, Data Science | Tags: Flink, DevOps, Hadoop, HBase, Kafka, Spark, Internship, Kubernetes, Python

Context: The exponential evolution of data has turned the industry upside down by redefining data storage, processing and data ingestion pipelines. Mastering these methods considerably facilitates…

By David WORMS

November 26, 2019

Avoid Bottlenecks in distributed Deep Learning pipelines with Horovod

Categories: Data Science | Tags: GPU, Deep Learning, Horovod, Keras, TensorFlow

The Deep Learning training process can be greatly sped up using a cluster of GPUs. When dealing with huge amounts of data, distributed computing quickly becomes a challenge. A common obstacle which…

By Grégor JOUET

November 15, 2019

Innovation, project vs product culture in Data Science

Categories: Data Science, Data Governance | Tags: DevOps, Agile, Scrum

Data Science carries the jobs of tomorrow. It is closely linked to the understanding of the business use cases, the behaviors and the insights that will be extracted from existing data. The stakes are…

By David WORMS

October 8, 2019

Machine Learning model deployment

Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema

“Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…

By Oskar RYNKIEWICZ

September 30, 2019

TensorFlow installation on Docker

Categories: Containers Orchestration, Data Science, Learning | Tags: CPU, Jupyter, Linux, AI, Deep Learning, Docker, TensorFlow

TensorFlow is open source software from Google for numerical computation using a graph representation: vertices (nodes) represent mathematical operations; edges represent N-dimensional data array…

By Pierre SAUVAGE

August 5, 2019

Spark Streaming part 4: clustering with Spark MLlib

Categories: Data Engineering, Data Science, Learning | Tags: Apache Spark Streaming, Spark, Big Data, Clustering, Machine Learning, Scala, Streaming

Spark MLlib is an Apache Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, the Spark framework can serve as a platform for…

By Oskar RYNKIEWICZ

June 27, 2019

Introduction to Cloudera Data Science Workbench

Categories: Data Science | Tags: Azure, Cloudera, Docker, Git, Kubernetes, Machine Learning, MLOps, Notebook

Cloudera Data Science Workbench is a platform that allows Data Scientists to create, manage, run and schedule data science workflows from their browser. Thus it enables them to focus on their main…

By Mehdi ELALAMI

February 28, 2019

Applying Deep Reinforcement Learning to Poker

Categories: Data Science | Tags: Algorithm, Gaming, Q-learning, Deep Learning, Machine Learning, Neural Network, Python

We will cover the subject of Deep Reinforcement Learning, more specifically the Deep Q Learning algorithm introduced by DeepMind, and then we’ll apply a version of this algorithm to the game of Poker…

By Oscar BLAZEJEWSKI

January 9, 2019

CodaLab – Data Science competitions

Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, Machine Learning, MySQL, Node.js, Python

CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…

By Robert Walid SOARES

December 17, 2018

Nvidia and AI on the edge

Categories: Data Science | Tags: Caffe, Edge computing, GPU, NVIDIA, AI, Deep Learning, Keras, PyTorch, TensorFlow

In the last four years, corporations have been investing a lot in AI and particularly in Deep Learning and Edge Computing. While the theory has taken huge steps forward and new algorithms are invented…

By Yliess HATI

October 10, 2018

Lando: Deep Learning used to summarize conversations

Categories: Data Science, Learning | Tags: Micro Services, Open API, Deep Learning, Internship, Kubernetes, Neural Network, Node.js

Lando is an application to summarize conversations using Speech To Text to translate the written record of a meeting into text and Deep Learning techniques to summarize contents. It allows users to…

By Yliess HATI

September 18, 2018

Deep learning on YARN: running Tensorflow and friends on Hadoop cluster

Categories: Data Science | Tags: YARN, GPU, Hadoop, MXNet, Spark, Spark MLlib, Deep Learning, PyTorch, TensorFlow, XGBoost

With the arrival of Hadoop 3, YARN offers more flexibility in resource management. It is now possible to perform Deep Learning analysis on GPUs with specific development environments, leveraging…

By Louis BIANCHERIN

July 24, 2018

YARN and GPU Distribution for Machine Learning

Categories: Data Science, DataWorks Summit 2018 | Tags: YARN, GPU, Machine Learning, Neural Network, Storage

This article goes over the fundamental principles of Machine Learning and what tools are currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be…

By Grégor JOUET

May 30, 2018

TensorFlow on Spark 2.3: The Best of Both Worlds

Categories: Data Science, DataWorks Summit 2018 | Tags: Mesos, YARN, C++, CPU, GPU, Tuning, Spark, JavaScript, Keras, Kubernetes, Machine Learning, Python, TensorFlow

The integration of TensorFlow with Spark has a lot of potential and creates new opportunities. This article is based on a talk attended at the DataWorks Summit 2018 in Berlin. It was about the new…

By Yliess HATI

May 29, 2018

Apache Apex: next gen Big Data analytics

Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Storm, Tools, Hadoop, Kafka, Data Science, Machine Learning

Below is a compilation of my notes taken during the presentation of Apache Apex by Thomas Weise from DataTorrent, the company behind Apex. Introduction: Apache Apex is an in-memory distributed parallel…

By César BEREZOWSKI

July 17, 2016

Apache Apex with Apache SAMOA

Categories: Data Science, Events, Tech Radar | Tags: Apex, Flink, Samoa, Storm, Tools, Hadoop, Machine Learning

Traditional Machine Learning: batch oriented; supervised (most common); training and scoring; one-time model building. Data set: training (model building), holdout (parameter tuning), test (accuracy). Online…

By Pierre SAUVAGE

July 17, 2016

Definitions of machine learning algorithms present in Apache Mahout

Categories: Data Science | Tags: Algorithm, Classification, Hadoop, Mahout, Clustering, Machine Learning

Apache Mahout is a machine learning library built for scalability. Its core algorithms for clustering, classification and batch-based collaborative filtering are implemented on top of Apache Hadoop…

By David WORMS

March 8, 2013

Hadoop and R with RHadoop

Categories: Business Intelligence, Data Science | Tags: Thrift, Learning and tutorial, R, Hadoop, HBase, HDFS, MapReduce, Data Analytics

RHadoop is a bridge between R, a language and environment to statistically explore data sets, and Hadoop, a framework that allows for the distributed processing of large data sets across clusters of…

By David WORMS

July 19, 2012

Installing and using MADlib with PostgreSQL on OSX

Categories: Data Science | Tags: Database, Greenplum, Statistics, PostgreSQL, SQL

We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an Open Source database with enterprise…

By David WORMS

July 7, 2012

Canada - Morocco - France

We are a team passionate about Open Source, Big Data, and related technologies such as Cloud, Data Engineering, Data Science, and DevOps…

We provide our clients with recognized expertise in using technology to turn their use cases into projects running in production, in reducing costs, and in accelerating the delivery of new features.

If you appreciate the quality of our publications, we invite you to contact us with a view to working together.
