Data Governance

Data governance is a set of procedures that ensure important data are formally managed throughout the company.

It builds trust in the datasets and establishes user accountability when data quality is low. This is particularly important in a Big Data platform that is fully integrated into the company, where multiple datasets, processing jobs, and users coexist.

Governance foundation

Articles related to governance

Innovation, project vs product culture in Data Science

Categories: Data Science, Data Governance | Tags: DevOps, Agile, Scrum

Data Science carries the jobs of tomorrow. It is closely linked to the understanding of the business use cases, the behaviors and the insights that will be extracted from existing data. The stakes are…

By David WORMS

Oct 8, 2019

Users and RBAC authorizations in Kubernetes

Categories: Containers Orchestration, Data Governance | Tags: Authentication, Authorization, Cyber Security, Kubernetes, RBAC, SSL/TLS

Having your Kubernetes cluster up and running is just the start of your journey, and you now need to operate it. To secure its access, user identities must be declared along with authentication and…

By Robert Walid SOARES

Aug 7, 2019

Self-sovereign identities with verifiable claims

Categories: Data Governance | Tags: Authentication, Blockchain, Cloud, Identity, Ledger

Towards a trusted, personal, persistent, and portable digital identity for all. Digital identity issues: Self-sovereign identities are an attempt to solve a couple of issues. The first is the…

By Nabil MELLAL

Jan 23, 2019

Managing User Identities on Big Data Clusters

Categories: Cyber Security, Data Governance | Tags: Ansible, FreeIPA, Identity, Kerberos, LDAP, Active Directory

Securing a Big Data cluster involves integrating or deploying specific services to store users. Some users are cluster-specific while others are available across all clusters. It is not always easy to…

By David WORMS

Nov 8, 2018

Managing authorizations with Apache Sentry

Categories: Data Governance | Tags: Ansible, CDH, Hue, Database, Deployment, LDAP, Nikita, Sentry

Apache Sentry is a system for enforcing fine-grained, role-based authorization to data and metadata stored on a Hadoop cluster. In this article, we show how we use Apache Sentry at…

By Axel JACQIN

Jul 24, 2017

About the new BSD license and its difference with other BSD licenses

Categories: Data Governance | Tags: License, Open source

As a non-restrictive Open Source license, the “new BSD license” is commonly used across the Node.js community. However, it is only one of the BSD licenses available, alongside the original “BSD…

By David WORMS

Aug 8, 2013

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to turn their use cases into production projects, reduce their costs, and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.