Our 49th meetup goes into depth on large geospatial datasets in our traditional format. For this topic, TomTom is kind enough not only to host us but also to give the first presentation; Rockestate tackles the second.
1/ MapMapReduce: Managing Big Geospatial Data (TomTom)
TomTom handles large volumes of geospatial data. The shape of this data poses some unique challenges, but also some opportunities to exploit when it comes to distributed processing. In this talk we will shed some light on the data processing pipeline we have built and do a deep dive into geospatial indexing on top of HBase.
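The talk doesn't spell out the indexing scheme, but a common way to make geospatial data range-scannable in a key-ordered store like HBase is a Z-order (Morton) row key, where latitude and longitude bits are interleaved so that nearby points share key prefixes. A minimal sketch (function name and bit width are illustrative, not TomTom's implementation):

```python
def zorder_key(lat, lon, bits=16):
    """Interleave latitude and longitude bits into a Morton (Z-order) key.

    Nearby points share high-order key prefixes, so an HBase range scan
    over a key prefix roughly corresponds to a spatial tile.
    """
    # Normalise coordinates to unsigned integers in [0, 2**bits).
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # even bits: longitude
        key |= ((y >> i) & 1) << (2 * i + 1)   # odd bits: latitude
    return key
```

Two points a few hundred metres apart in Brussels end up in the same coarse prefix, while Sydney lands far away in key space.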
2/ Building a 3D model of EACH building in Flanders (Mathieu Carette, Rockestate)
Processing big point cloud datasets: more and more cities, regions and countries gather point cloud data through airborne lidar sensors. We'll explain what point cloud data is, discuss Flanders' large point cloud dataset, and go over the challenges posed by computing a 3D model for every building in Flanders.
20h00 1st presentation
21h00 2nd presentation
Companies are getting up to speed with their big data efforts. However, next to volume, they also see added value in speedier data. But how does that fit into the architecture?
1/ Capturing, storing and anonymising CDRs (Matej Pazdic & Pedro Santos, Voxbone)
At Voxbone we are dealing with millions of Call Detail Records (CDRs) per day generated by our customers. With the growth of our customer base, we needed a faster way to store and process this data to enable personalised data access and fulfil the company’s specific business needs. We will show how we leveraged Kafka and Kafka Streams to build a custom solution for Voxbone that copes with different use cases such as rating CDRs, complying with the GDPR, and data dashboarding powered by the ELK stack.
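Voxbone's actual solution runs on Kafka Streams; as a much-simplified illustration of only the anonymisation step (the field names and salting scheme below are assumptions, not Voxbone's), a salted hash pseudonymises phone numbers while keeping them stable enough for rating and joins:

```python
import hashlib

def anonymise_cdr(cdr, salt):
    """Return a copy of a CDR with caller/callee numbers pseudonymised.

    Hypothetical record layout. A salted SHA-256 hides the raw phone
    number but stays deterministic, so the same subscriber still maps
    to the same pseudonym across records.
    """
    def pseudo(number):
        return hashlib.sha256((salt + number).encode()).hexdigest()[:16]

    out = dict(cdr)
    out["caller"] = pseudo(cdr["caller"])
    out["callee"] = pseudo(cdr["callee"])
    return out
```

In a streams topology this function would sit in the map step between the raw-CDR topic and anything that is persisted or dashboarded.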
2/ Pragmatic Streaming, beaming to BigQuery
At Vente-Exclusive.com we've taken a pragmatic approach to streaming. Each microservice streams data over Pub/Sub to our data lake. We'll go over the mechanics and see how BigQuery, Bigtable and Elastic fit into the picture. You'll also gain some insight into what our data lake looks like and how we handle streaming data in BigQuery.
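One pattern behind "streaming into BigQuery" is routing each event to a date-partitioned table via BigQuery's `$YYYYMMDD` partition decorator, so events land in the partition of the day they occurred rather than the day they arrived. A generic sketch, not necessarily Vente-Exclusive.com's implementation:

```python
from datetime import datetime, timezone

def partitioned_table(base_table, event_ts):
    """Map an event timestamp (epoch seconds) to a BigQuery partition.

    Date-partitioned tables accept a `$YYYYMMDD` decorator, letting a
    streaming writer target a specific day's partition directly.
    """
    day = datetime.fromtimestamp(event_ts, tz=timezone.utc)
    return "%s$%s" % (base_table, day.strftime("%Y%m%d"))
```

A writer would compute this target per event before handing the row to the BigQuery streaming API.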
Alex Van Boxel, Big Data Architect
Bob De Schutter, Product Manager
19h30 pizza and beers, thanks to Voxbone!
20h00 Storing and processing CDRs (Voxbone)
20h45 Pragmatic Streaming, beaming to BigQuery (Alex Van Boxel)
Industry 4.0 is the name for the current trend of automation and data exchange in manufacturing technologies. It includes cyber-physical systems, the Internet of Things, cloud computing and cognitive computing. And it is mostly data-driven: an excellent topic for our meetup! We have some major companies and startups active in the field!
1/ Marry Workflow and Big Data at TenForce (Bastiaan Deblieck, TenForce)
TenForce built a software product that is essentially a workflow system. Over the last 15 years we have also built a solid reputation through our projects with semantic technology, linked open data, metadata, big data, …
In our presentation we will share our vision and activities for making our workflow system smarter, and we will show you what we learned from some of the research projects we have been working on.
• Industry 4.0, the TenForce solution and our research with KUL (Jetro Wils)
• Big Data Europe, learnings and results from a large scale EC research project (Jonathan Langens)
• SPECIAL, how to enforce GDPR in big data solutions (Uroš Milosevic)
2/ Prescriptive maintenance on filters (Youri Soons, Sitech)
It is not uncommon for a chemical plant to generate revenue of up to 1 million euros per day. Understandably, unplanned downtime of critical assets is prevented as much as possible by designing a tailor-made maintenance strategy. In a prescriptive approach, not only is the upcoming failure predicted, but actions are also prescribed to prevent or mitigate the consequences. A combination of sensor and process data has been successfully used by Sitech to develop models for over 600 pieces of equipment running live in an online platform. I'm working on a decision support tool for filters at a chemical plant. The highly variable service life of these filters, ranging from 1 to 10 days, prevents them from being operated efficiently. After years of fruitless attempts to develop a physical (white box) model, my task is now to take a data-driven approach and accurately predict the Remaining Useful Life (RUL). I am still working on this project, so I'm actively soliciting input during this interactive presentation.
19h30 - doors
20h00 - Industry 4.0 (Bastiaan Deblieck, TenForce)
20h45 - break
21h00 - Prescriptive maintenance on filters (Youri Soons, Sitech)
21h45 - Networking
Time series are a difficult topic in any data environment, not only big ones, and when the data is fast it becomes even more complex. KX will present how they leveraged their tiny but blazingly fast technology, kdb, in a time-series use case.
Time-Series Analytics for Big Fast Data (Sean Lang, KX.com)
Learn how to use a relational, columnar time-series database with a tightly integrated query language capable of aggregations and consolidations on billions of streaming, real-time and historical records.
From optimised single node to distributed time-series analytics: linear regression on steroids. TrendMiner has transitioned from a highly optimised single-node application to a distributed time-series analytics platform for the domain expert. We'll discuss the challenges of this transition and illustrate it with the transformation of a linear regression algorithm into a memory- and compute-distributed implementation.
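TrendMiner's implementation isn't detailed here, but the core idea behind distributing linear regression is that the fit depends only on a handful of sums, which each node can compute over its own chunk and a coordinator can merge. A minimal single-variable sketch:

```python
def partial_stats(xs, ys):
    """Sufficient statistics for simple linear regression on one chunk.

    Each worker only needs its own slice of the data to compute these.
    """
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(x * y for x, y in zip(xs, ys)))

def combine_and_solve(stats):
    """Merge per-chunk statistics and solve for slope and intercept."""
    n = sx = sy = sxx = sxy = 0.0
    for cn, csx, csy, csxx, csxy in stats:   # the 'reduce' step
        n += cn; sx += csx; sy += csy; sxx += csxx; sxy += csxy
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept
```

Because the statistics are plain sums, they can be merged in any order, which is exactly what makes the algorithm map-reduce friendly.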
Finally, Joren Van Severen (consultant in sensor data analytics) will discuss tips and tricks in data science for mobile sensor data. Specific difficulties arise when analysing the sensors available in many mobile devices: accelerometers and gyroscopes generate tons of noisy data, which can be overwhelming. This talk covers both the data science and the data engineering techniques that exist to address these issues.
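As one concrete example of taming noisy accelerometer streams (an illustration, not taken from the talk), an exponential moving average is about the cheapest possible low-pass filter, needing only one value of state per channel:

```python
def ema_lowpass(samples, alpha=0.1):
    """Exponential moving average: a one-line low-pass filter.

    Smaller alpha means heavier smoothing. The single value of state
    makes it cheap enough for high-rate accelerometer/gyroscope streams
    on a mobile device.
    """
    out = []
    state = samples[0]
    for s in samples:
        state = alpha * s + (1 - alpha) * state
        out.append(state)
    return out
```

A constant signal passes through unchanged, while short spikes are attenuated.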
Our meetup is kindly hosted by Trendminer at Corda INCubator!
Pizzas and beer will be provided, so there is no reason not to come!
On January 31, 1958, Explorer 1 became the first successfully launched American satellite. It was designed, built and operated by the Jet Propulsion Laboratory and carried a cosmic ray detector. That detector led to Explorer Principal Investigator Dr. James Van Allen's discovery of radiation belts around Earth, held in place by the planet's magnetic field. The belts were later named Van Allen belts in honor of their discoverer.
Is that relevant to our 45th meetup? No ... but still very interesting to know! What is relevant is our presentation lineup for tonight:
- our friends at Real Impact Analytics are not only kind enough to host us this evening, but will also present Trumania, the tool they are open-sourcing to generate synthetic datasets. You can read up on RIA's blog https://realimpactanalytics.com/en/news/trumania-or-the-need-for-an-artificial-digital-world. (Gautier Krings & Sven Vanderveken, RIA)
- VRT has been building a realtime streaming stack, at first for the audience measurements of their video streaming platform https://vrt.nu, but widening to VRTNWS, monitoring, etc. The platform is gradually being open-sourced at https://github.com/dataprism, covering ingestion, processing, provisioning and laboratory exploration of data, all from within the browser ... yes, even developing the processing. (Matthias De Vriendt, VRT)
- Dataminded has been building their own tools to organise and structure data lakes. Lighthouse will be the open source version of their years of expertise in the field. (Dataminded)
See you on the last day of the month!
19h30 - doors
20h00 - Trumania, FOSS synthetic data generator (Gautier Krings & Svend Vanderveken, RIA)
20h30 - Dataprism, FOSS realtime data platform (Matthias De Vriendt, VRT)
21h00 - break
21h15 - Lighthouse, FOSS tool to build and structure your datalake (Dataminded)
21h45 - Networking
Big data is a team effort, so software engineering best practices apply, like continuous integration and continuous deployment. AXA is kind enough to host our meetup and discuss how they tackle CI/CD for their data pipelines, especially on Spark. Real Impact Analytics will share their experiences on this important subject as well. Edward De Brouwer (PhD researcher, KU Leuven) will explain how to do visual detection of car damage, while David Massart (D.E. Solution) will talk about the big data architecture put in place to collect and process, in near real time, the data generated by thousands of traffic cameras across Belgium using Kafka and Akka. Drop by and have our community vibe boost your creativity!
19h00 - doors
19h30 - Continuous integration / continuous deployment for full data pipelines (Mehdi OUAZZA - AXA)
19h50 - Continuous integration / continuous deployment for full data pipelines (Daniel Mescheder - Real Impact Analytics)
20h10 - break
20h20 - Car damage visual detection (Edward De Brouwer - PhD KU Leuven)
20h40 - Managing Nation-Wide Traffic Cameras and Sensors (David Massart - D.E. Solution)
21h00 - networking
• Important to know
Because we are at a bank, security will be a bit tighter. So keep your RSVP up to date so that we can track attendees properly! We'll also have to close RSVPs some time before the actual meetup!
As our 43rd meetup is so kindly hosted by the Collibra rockstars, a meetup on data governance is self-explanatory. It's a topic we as data practitioners are all confronted with ... even more so thanks to the GDPR lurking around the corner.
See you there!
1/ Big data: A Catalog and how to build it (Collibra)
Big data technology stacks have quickly become a default go-to for new application and data analysis architectures. Unfortunately, they risk running into the same problems as the old data architectures, and you end up with a data swamp instead of a data lake. Many people look to the data catalog as a solution to index, control and search what is in the lake. Peter Princen (Collibra’s PM for Catalog) and Maxime Jeanmart (senior software developer, data science) will explain Collibra’s view on a catalog: how we are building it, including the use of big data technology (like Spark), how cloud versus on-premise changes part of the game, and what we are working on next.
2/ Data governance tools (Data Minded)
We compared several open source tools in the master data management and data governance space, such as Hortonworks Schema Registry, Apache Atlas, LinkedIn WhereHows and more. We'll share our lessons learned and advise on when to use them and when not to.
For our 42nd meetup we are invited to the Corda Campus! An ecosystem that inspires with much more than its location alone. Let's find that out for ourselves!
19h30 - doors
20h00 - Evolution of a scalable and reliable SaaS solution for enterprise training (Tom Pennings, Onsophic)
In 2010, several Silicon Valley veterans joined forces to rethink the world of education and how technology could truly be applied to the learning process, setting the scene for the early EdTech startup Onsophic Inc. Today the company is helping large organizations scale education across thousands of learners thanks to a highly flexible and sophisticated data collection, analysis and recommendation engine. How did a relatively simple web-based application evolve into a multi-region solution with ever-increasing stability and scalability while maintaining data privacy? Keywords are realtime, analytics, privacy, nosql, containers, devops, auto-deployment.
20h45 - break
21h00 - Cassandra for time series data (Joris Gillis, Trendminer)
The process industry is riding the IIoT wave, with millions of sensors monitoring every aspect of a process and producing large sets of dense time series. TrendMiner, the "real Google" for the process industry, is a high-performance discovery analytics engine for process measurement data. Using patent-pending pattern recognition and machine learning big data technologies, it provides powerful search capabilities and advanced machine learning algorithms for forensic analysis of past events and predictive monitoring of future anomalies. All this functionality is put directly in the hands of the process engineer, without the need for a data scientist. This talk addresses three questions: Why should (or shouldn't) you use Cassandra for time series data? How do you model time series in Cassandra? And what configuration options are available for time series in Cassandra?
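To make the modelling question concrete: a widespread Cassandra pattern for time series (not necessarily TrendMiner's) is to bucket each sensor's readings per day, so partitions stay bounded while a single partition still answers a whole day's range query. A sketch of the keying logic, with illustrative CQL in the comment:

```python
from datetime import datetime, timezone

# Illustrative CQL for the table this keying scheme targets:
#   CREATE TABLE readings (
#       sensor_id text, day text, ts timestamp, value double,
#       PRIMARY KEY ((sensor_id, day), ts)
#   ) WITH CLUSTERING ORDER BY (ts DESC);

def partition_key(sensor_id, epoch_seconds):
    """Bucket a reading into a (sensor, day) partition.

    Day-sized buckets keep each partition bounded (~86,400 rows at
    1 Hz), avoiding the unbounded-partition problem of keying on
    sensor_id alone.
    """
    day = datetime.fromtimestamp(
        epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return (sensor_id, day)
```

The clustering column `ts` then gives cheap in-partition range scans, which is what makes Cassandra attractive for this workload in the first place.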
21h45 - networking
The meetup is gratefully hosted by Onsophic, Trendminer and Corda INCubator.
On site, you will find us in Corda building 1 at 1st floor.
Our 41st meetup is again packed with interesting topics by interesting Belgian organisations. Check it out and RSVP!
1/ Optimizing big data pipelines: an Elasticsearch geospatial case study (Johan De Gelas, Sizing Servers)
Discovering pitfalls, quantifying technology choices and optimising case studies in different sectors: that is the added value we deliver at the applied research group Sizing Servers (Howest, department NMCT). In this talk, we'll give an overview of the different data pipelines we studied and zoom in on optimising Elasticsearch for a geospatial application.
2/ Building an intelligent cross cloud platform (Bram Pieters, Victhorious)
This talk is about how data helps to build an intelligent platform that can help developers optimize applications, reduce operational costs, allow business owners to optimize their conversion ratios, automatically react to changing data variables, or predict and act based on captured data. This leads to a cognitive intelligence powered by data from performance analytics, machine learning, anomaly detection, social media and log analytics, resulting in an adaptive decision-making model. This model allows the platform to power the most complex applications of the future.
The General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) intends to strengthen and unify data protection for individuals within the EU. It also addresses the export of personal data outside the EU.
The primary objectives of the GDPR are to give citizens back control of their personal data and to simplify the regulatory environment for international business by unifying regulation within the EU. The GDPR extends EU data protection law to all foreign companies processing data of EU residents and introduces severe penalties of up to 4% of worldwide turnover!
So, although the GDPR was developed with a focus on cloud providers and social networks, it will impact all organisations and applications that process and store data. And, by the nature of the beast, it will impact any big data solution even more!
1/ Introduction to GDPR (Natalie Bertels, Centre for IT & IP Law, KUL)
During the first presentation, Natalie explains in more detail what the GDPR is and how it will impact data processing and organisations.
2/ Privacy Engineering with LINDDUN (Kim Wuyts, Aram Hovsepyan, Distrinet, Imec-KUL)
Privacy must be integrated into the software development lifecycle as early as possible. LINDDUN is a privacy threat analysis methodology that supports analysts in eliciting privacy requirements.
3/ How Rombit deals with GDPR and privacy (Nico Janssens, technical director RomCore, Rombit)
The big data team at Rombit will go into the details of how they have set up their big data tools (Kafka, among others) in the cloud to comply with the GDPR.
Rombit, the leading Belgian IoT provider, is kind enough to open their offices (Frankrijklei 115, Antwerp) to host us! It certainly looks like a brilliant place, right in the center of Antwerp!
Park Indigo (Nationale bank), Frankrijklei 166, 2000 Antwerpen
The meetup will be video recorded for a documentary on applications and use of big data in Flanders/Belgium!