Home | 2018-11-05T13:43:57+00:00

Big Data

Data Engineering

Data Collection, Data Preparation, Data Lake, Data Governance

Data Science

Writing algorithms, Spark, Machine Learning, exploration, statistics, Python, R

Data Streaming

Message Bus, Key Performance Indicator (KPI), Threshold Detection, Time Window Queries, Intelligent Behaviors

Data Analytics

Visualization, notebooks

Latest articles

Data Lake ingestion best practices

June 18th, 2018 | Categories: Data Engineering, DevOps

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, in both batch and streaming architectures, that we recommend and implement with our customers. […]

DataWorks Summit 2018: A few days speaking Hadoop

June 5th, 2018 | Categories: DataWorks Summit 2018

The Adaltas crew went to the DataWorks Summit 2018 held in Berlin on the 18th and 19th of April 2018. On this occasion, we compiled a series of articles about the conferences that have marked [...]

Accelerating query processing with materialized views in Apache Hive

May 31st, 2018 | Categories: Data Engineering, DataWorks Summit 2018

Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0. This article covers the main principle of [...]

YARN and GPU Distribution for Machine Learning

May 30th, 2018 | Categories: Data Science, DataWorks Summit 2018

This article goes over the fundamental principles of Machine Learning and the tools currently used to run machine learning algorithms. We will then see how a resource manager such as YARN can be useful [...]

Apache Metron in the Real World

May 29th, 2018 | Categories: Cyber Security, DataWorks Summit 2018, Events

Apache Metron is a storage and analytics platform specializing in cyber security. This talk demonstrated the uses and capabilities of Apache Metron in the real world. The presentation [...]

TensorFlow on Spark 2.3: The Best of Both Worlds

May 29th, 2018 | Categories: Big Data, DataWorks Summit 2018, Deep Learning

The integration of TensorFlow with Spark has a lot of potential and creates new opportunities. […]

Running Enterprise Workloads in the Cloud with Cloudbreak

May 28th, 2018 | Categories: Big Data, DataWorks Summit 2018

This article is based on Peter Darvasi and Richard Doktorics’ talk “Running Enterprise Workloads in the Cloud” at the DataWorks Summit 2018 in Berlin. It presents Hortonworks’ automated deployment tool for cloud environments, Cloudbreak, describes [...]

Omid: Scalable and highly available transaction processing for Apache Phoenix

May 24th, 2018 | Categories: Big Data, DataWorks Summit 2018, Events

Apache Omid provides a transactional layer on top of key/value NoSQL databases. In practice, it is usually used on top of Apache HBase. […]