Oskar is an all-round engineer with data science and software development skills. He takes interest in Big Data and has an aptitude for Machine Learning. Lately, he has been focused on harnessing Spark, Hadoop, and distributed systems. Over last 2 years, he acquired fluency in Python Data Science ecosystem tools. Having done 6 months internship as Research Engineer in a top university in Australia, he has the ability to address problems formulated with mathematics.
He obtained his diploma from the French school of engineering IMT Atlantique, with specialization in Information Processing Systems. He has been studying Computer Science and Statistics, participating in numerous projects and collaborating with people from various nationalities and backgrounds. Three years outside his home country, Poland, allowed him to obtain a broad perspective and competencies in French and English. He is a generalist with a diverse skill set, always keen on learning and pushing his technical abilities.
Published articles

Machine Learning model deployment
Categories: Big Data, Data Engineering, Data Science, DevOps & SRE | Tags: DevOps, Operation, AI, Cloud, Machine Learning, MLOps, On-premises, Schema
“Enterprise Machine Learning requires looking at the big picture […] from a data engineering and a data platform perspective,” lectured Justin Norman during the talk on the deployment of Machine…
Sep 30, 2019

Spark Streaming part 4: clustering with Spark MLlib
Categories: Data Engineering, Data Science, Learning | Tags: Apache Spark Streaming, Spark, Big Data, Clustering, Machine Learning, Scala, Streaming
Spark MLlib is an Apache’s Spark library offering scalable implementations of various supervised and unsupervised Machine Learning algorithms. Thus, Spark framework can serve as a platform for…
Jun 27, 2019

Spark Streaming part 3: DevOps, tools and tests for Spark applications
Categories: Big Data, Data Engineering, DevOps & SRE | Tags: Apache Spark Streaming, DevOps, Learning and tutorial, Spark
Whenever services are unavailable, businesses experience large financial losses. Spark Streaming applications can break, like any other software application. A streaming application operates on data…
May 31, 2019

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop
Categories: Data Engineering, Learning | Tags: Apache Spark Streaming, Spark, Python, Streaming
Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…
May 28, 2019

Spark Streaming part 1: build data pipelines with Spark Structured Streaming
Categories: Data Engineering, Learning | Tags: Apache Spark Streaming, Kafka, Spark, Big Data, Streaming
Spark Structured Streaming is a new engine introduced with Apache Spark 2 used for processing streaming data. It is built on top of the existing Spark SQL engine and the Spark DataFrame. The…
Apr 18, 2019

Publish Spark SQL DataFrame and RDD with Spark Thrift Server
Categories: Data Engineering | Tags: Hive, Thrift, JDBC, Hadoop, Spark, SQL
The distributed and in-memory nature of the Spark engine makes it an excellent candidate to expose data to clients which expect low latencies. Dashboards, notebooks, BI studios, KPIs-based reports…
Mar 25, 2019