Home | 2018-06-06T08:30:40+00:00

BigData

Data Engineering

Data collection, data preparation, data lake, data governance

Data Science

Writing algorithms, Spark, machine learning, exploration, statistics, Python, R

Data Streaming

Message Bus, Key Performance Indicator (KPI), Threshold Detection, Time Window Queries, Intelligent Behaviors

Data Analytics

Visualization, notebooks

Latest articles

Clusters and workloads migration from Hadoop 2 to Hadoop 3

July 25th, 2018 | Categories: Big Data

Hadoop 2 to Hadoop 3 migration is a hot subject. How do you upgrade your clusters, which features in the new release may solve current problems and bring new opportunities, how are your current processes [...]

Deep learning on YARN: running Tensorflow and friends on Hadoop cluster

July 24th, 2018 | Categories: Data Science, DataWorks Summit 2018

With the arrival of Hadoop 3, YARN offers more flexibility in resource management. It is now possible to run Deep Learning workloads on GPUs with specific development environments, leveraging available resources. This article is a [...]

Curing the Kafka blindness with the UI manager

June 20th, 2018 | Categories: Big Data, DataWorks Summit 2018

Today it’s really difficult for developers, operators and managers to visualize and monitor what happens in a Kafka cluster. This article covers a new graphical interface to oversee Kafka. It was presented by George Vetticaden, [...]

A CoreOS development cluster with Vagrant and VirtualBox

June 20th, 2018 | Categories: Container, DevOps

Following CoreOS’s instructions on how to set up a development environment in VirtualBox did not work out well for me. Here are the steps I followed to get Container Linux up and running with Vagrant. [...]

Data Lake ingestion best practices

June 18th, 2018 | Categories: Data Engineering, DevOps

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, for both batch and stream architectures, that we recommend and implement with our customers. [...]

DataWorks Summit 2018: A few days speaking Hadoop

June 5th, 2018 | Categories: DataWorks Summit 2018

The Adaltas crew went to the DataWorks Summit 2018 held in Berlin on the 18th and 19th of April 2018. On this occasion, we compiled a series of articles about the conferences that have marked [...]

Accelerating query processing with materialized views in Apache Hive

May 31st, 2018 | Categories: Data Engineering, DataWorks Summit 2018

Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0. This article covers the main principle of [...]