Apache Parquet is a binary, open-source, columnar storage format in Hadoop ecosystem. Its support for efficient compression and the ability to be split onto multiple disks and parallelized makes it suitable for usage in Big Data environment.
During a Machine Learning project we need to keep track of the training data we are using. This is important for audit purposes and for assessing the performance of the models, developed at a later…
May 21, 2020
A few days ago, we have conducted a test in order to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The…
By David WORMS
Mar 13, 2012