Apache Metron in the Real World
May 29, 2018
Apache Metron is a storage and analytics platform specialized in cyber security. This talk demonstrated the uses and capabilities of Apache Metron in the real world. It was presented by Dave Russell, Principal Solutions Engineer - EMEA + APAC at Hortonworks, at the DataWorks Summit 2018 (Berlin).
Apache Metron is a cyber security application framework that provides organizations the ability to ingest, process and store diverse security data feeds in order to detect cyber anomalies and enable them to rapidly respond.
It provides a scalable, advanced security analytics framework built with Hadoop technologies. It is specifically designed to monitor network traffic and machine logs within an organization by continuously consuming live data from many “data in motion” sources.
Apache Metron overview
Metron has a clear and intuitive interface.
Apache Metron interface
For each input, Metron provides some useful information, and we can filter on our own data too:
- A score to evaluate the level of the threat
- A timestamp
- The alert status
- The threat reason (e.g. “The distinct number of machines that user U22 attempted to login to (2) is more than 5 standard deviations (0.29) from the median (1.00)”)
- An associated user
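The threat reason quoted above is a statistical outlier test: a value is compared with the median of a baseline and flagged when it lies too many standard deviations away. A minimal Python sketch of that idea follows. It is not Metron's implementation; the function names, the baseline data and the default threshold of 5 are assumptions made for illustration.

```python
from statistics import median, pstdev


def deviations_from_median(history, value):
    """Return how many standard deviations `value` lies from the median of `history`."""
    centre = median(history)
    spread = pstdev(history)  # population standard deviation of the baseline
    if spread == 0:
        # Flat baseline: any different value is treated as infinitely unusual.
        return 0.0 if value == centre else float("inf")
    return abs(value - centre) / spread


def is_anomalous(history, value, threshold=5.0):
    """Flag `value` when it is more than `threshold` deviations from the median."""
    return deviations_from_median(history, value) > threshold


# Hypothetical baseline: distinct machines one user logged in to, per day.
baseline = [1, 1, 1, 2, 1, 1, 1, 1, 1, 1]

print(is_anomalous(baseline, 2))  # a value already seen in the baseline
print(is_anomalous(baseline, 3))  # a value well outside the baseline
```

Using the median rather than the mean as the centre keeps a single past spike from shifting the baseline.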
What answer does Metron bring?
Currently, data is retained for less time than it takes to detect a breach: the average data retention period is 6 months, while breach detection takes 8 months on average. So we need a system that stores huge amounts of data over several years, and that’s where Metron comes in!
“Sometime in the next few years we’re going to have our first category-one cyber-incident; one that will need a national response”
Ian Levy, Technical Director of National Cyber Security Center
Metron also comes with algorithmic components to detect threats.
Profiling by time
For cluster sizing there are several points to consider:
- Events per second (average and peak)
- Retention time for Hot/Warm/Cold zones
- Node sizing
- I/O Considerations
- PCAP (API for capturing network traffic)
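As a rough illustration of how the first two points translate into storage, the sketch below multiplies event rate, event size and retention time. The average event size, the replication factor of 3 and the function names are assumptions, and the estimate ignores compression, index overhead and PCAP volume, so treat it as a back-of-the-envelope calculation only.

```python
SECONDS_PER_DAY = 86_400


def daily_storage_gb(events_per_second, avg_event_bytes, replication=3):
    """Estimate the storage consumed per day, in GB, including replicas."""
    raw_bytes = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return raw_bytes * replication / 1e9


def retention_storage_tb(events_per_second, avg_event_bytes,
                         retention_days, replication=3):
    """Estimate the storage needed to keep `retention_days` of events, in TB."""
    per_day = daily_storage_gb(events_per_second, avg_event_bytes, replication)
    return per_day * retention_days / 1000


# Hypothetical workload: 10,000 events/s on average, 1 KB per event.
print(daily_storage_gb(10_000, 1_000))
print(retention_storage_tb(10_000, 1_000, retention_days=365))
```

Running the same calculation with the peak rate instead of the average gives an upper bound for ingest and I/O planning.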
The sizing of a cluster must be progressive:
- Today to 3 months: we use a fast indexing layer (using Apache Solr or Elasticsearch)
- 3 months to 12 months: we use a warm HDFS layer
- After 12 months: we use a cold HDFS layer
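The tiering above amounts to a lookup on the age of the data. A minimal sketch, assuming that 3 months is approximated as 90 days and 12 months as 365 days (the function name and tier labels are illustrative, not part of Metron):

```python
from datetime import date


def storage_tier(event_date, today):
    """Return the tier ("hot", "warm" or "cold") for data of a given age."""
    age_days = (today - event_date).days
    if age_days < 0:
        raise ValueError("event_date is in the future")
    if age_days <= 90:    # up to about 3 months: fast indexing layer
        return "hot"
    if age_days <= 365:   # 3 to 12 months: warm HDFS layer
        return "warm"
    return "cold"         # older than 12 months: cold HDFS layer


today = date(2018, 5, 29)
print(storage_tier(date(2018, 5, 1), today))
print(storage_tier(date(2017, 12, 1), today))
print(storage_tier(date(2016, 1, 1), today))
```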
Metron offers many different solutions to each problem:
- Apache Nifi: syslog, socket, file, web services, SQL, RDBMS, Windows Event Log, FTP, MQ, JMS, Splunk and others
- High-performance DPDK Packet Capture
- Cisco ASA
- Palo Alto
- Snort IDS
- Bro DPI
- Netflow, IPFIX
- Grok (Custom)
- Java (Custom)
- CEF, LEEF (ArcSight, QRadar compat.)
- Applications: DHCPD, AD
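A custom Grok parser is essentially a set of named regular expressions that turn a raw log line into structured fields. The sketch below shows the same idea with Python's `re` module rather than Grok itself; the pattern, the field names and the sample log line are assumptions chosen for illustration.

```python
import re

# Named groups play the role of Grok field captures.
SYSLOG_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[\w\-/]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of fields for a syslog-style line, or None if it does not match."""
    match = SYSLOG_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


# Hypothetical log line.
sample = "May 29 10:15:02 gateway01 sshd[4242]: Failed password for U22 from 10.0.0.5"
print(parse_syslog(sample))
```

Returning `None` for lines that do not match lets the caller route them to an error stream instead of dropping them silently.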
Enrichment and threat feeds
- Profiler and statistical baselining engine
- Model Services for advanced ML
- Threat Triage rules and scoring engine
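To make the triage idea concrete, here is a small Python sketch of a rule-and-score engine: each rule pairs a predicate with a score, and an aggregator combines the scores of the rules that match. Metron defines its rules in its own configuration; the rule names, scores, event fields and threat feed below are hypothetical and only illustrate the principle.

```python
# Hypothetical threat feed of known-bad source addresses.
THREAT_FEED = {"203.0.113.7"}

# Hypothetical rules: each pairs a predicate on the event with a score.
RULES = [
    {"name": "login fan-out outlier", "score": 80,
     "predicate": lambda e: e.get("login_outlier", False)},
    {"name": "source on threat feed", "score": 100,
     "predicate": lambda e: e.get("src_ip") in THREAT_FEED},
    {"name": "outside business hours", "score": 10,
     "predicate": lambda e: not 8 <= e.get("hour", 12) < 18},
]


def triage_score(event, rules=RULES, aggregator=max):
    """Combine the scores of all matching rules; return 0 when none match."""
    scores = [rule["score"] for rule in rules if rule["predicate"](event)]
    return aggregator(scores) if scores else 0


event = {"user": "U22", "login_outlier": True, "hour": 3}
print(triage_score(event))                  # highest matching score
print(triage_score(event, aggregator=sum))  # sum of matching scores
```

Taking the maximum keeps the score on a fixed scale, while summing rewards events that trip several rules at once; which one fits is a policy choice.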
Index and search features
Data science features
- Spark Machine Learning
- Zeppelin notebooks
- Wide partner ecosystem
- PCAP inspector
- PCAP query
- Long term data store
Like sizing, deploying a Metron cluster must be progressive.
A fully deployed Apache Metron ecosystem
For example, a three-phase deployment:
- Phase 1: Set up an HDP and an HDF cluster. You must ingest your files, streams and syslogs (through Apache NiFi) and enrich the data with Storm, parsing, GeoIP, etc. You also need visualization tools like Grafana or Kibana.
- Phase 2: Install the Apache Metron Profiler (with Alert and Triage) and enrich the data with Netflow, PCAP and Snort (through Apache Kafka).
- Phase 3: Finally, perform historical analysis with Apache Spark and set up alerts (via the UI and automated responses).
- DataWorks Summit 2018 session (Apache Metron in the Real World by Dave Russell)
- Metron Architecture: https://cwiki.apache.org/confluence/display/METRON/Metron+Architecture
- Metron Installation: https://cwiki.apache.org/confluence/display/METRON/Installation