Database
Related articles
Download datasets into HDFS and Hive
Categories: Big Data, Data Engineering | Tags: Analytics, HDFS, Hive, Big Data, Data Analytics, Data Engineering, Data structures, Database, Hadoop, Data Lake, Data Warehouse
Nowadays, the analysis of large amounts of data is increasingly feasible thanks to Big Data technologies (Hadoop, Spark,…). This explains the explosion of the data volume and the…
By Aida NGOM
Jul 31, 2020
Comparison of different file formats in Big Data
Categories: Big Data, Data Engineering | Tags: Analytics, Avro, HDFS, Hive, Kafka, MapReduce, ORC, Spark, Batch processing, Big Data, CSV, Data Analytics, Data structures, Database, JSON, Protocol Buffers, Hadoop, Parquet, Kubernetes, XML
In data processing, there are different types of file formats to store your data sets. Each format has its own pros and cons depending on the use case and exists to serve one or several purposes…
By Aida NGOM
Jul 23, 2020
CodaLab – Data Science competitions
Categories: Data Science, Adaltas Summit 2018, Learning | Tags: Database, Infrastructure, MySQL, Machine Learning, Node.js, Python
CodaLab Competition is a platform for code execution in the field of Data Science. It is a web interface on which a user can submit code or results and compare themselves to others. Let’s see how it…
Dec 17, 2018
Yahoo's Vespa Engine
Categories: Tech Radar | Tags: Database, Elasticsearch, Search Engine, Tools
Vespa is Yahoo’s fully autonomous and self-sufficient big data processing and serving engine. It aims at serving results of queries on huge amounts of data in real time. An example of this would be…
Oct 16, 2017
MariaDB integration with Hadoop
Categories: Infrastructure | Tags: Hive, Database, HA, MariaDB, Hadoop
During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB’s High Availability (HA) strategy. Since the customer selected Cloudera’s CDH 5 distribution, the…
By David WORMS
Jul 31, 2017
Managing authorizations with Apache Sentry
Categories: Data Governance | Tags: Ansible, Hue, Database, LDAP, Nikita, Sentry, CDH, Deployment
Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster. In this article, we show how we are using Apache Sentry at…
By Axel JACQIN
Jul 24, 2017
Hive, Calcite and Druid
Categories: Big Data | Tags: Analytics, Druid, Hive, Database, Hadoop
BI/OLAP requires interactive visualization of complex data streams: real-time bidding events, user activity streams, voice call logs, network traffic flows, firewall events, application KPIs. Traditional…
By David WORMS
Jul 14, 2016
Testing the Oracle SQL Connector for Hadoop HDFS
Categories: Data Engineering | Tags: HDFS, Database, File system, Oracle, CDH, SQL
Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other…
By David WORMS
Jul 15, 2013
Options to connect and integrate Hadoop with Oracle
Categories: Data Engineering | Tags: Avro, HDFS, Hive, MapReduce, Sqoop, Database, Java, NoSQL, Oracle, R, RDBMS, SQL
I will list the different tools and libraries available to us as developers to integrate Oracle and Hadoop. The Oracle SQL Connector for HDFS described below is covered in a follow-up article…
By David WORMS
May 15, 2013
Installing and using MADlib with PostgreSQL on OSX
Categories: Data Science | Tags: Database, Greenplum, Statistics, PostgreSQL, SQL
We cover basic installation and usage of PostgreSQL and MADlib on OSX and Ubuntu. Instructions for other environments should be similar. PostgreSQL is an Open Source database with enterprise…
By David WORMS
Jul 7, 2012