• Popular ML algorithms with scikit-learn & ML model optimization with SageMaker

    Alba Graduate Business School, The American College of Greece

    Hello wonderful big data developers and enthusiasts. We are excited to announce our next event, which will take place on Thursday, 14 March at 7pm. This time our venue is the auditorium on the ground floor of Alba Graduate Business School, The American College of Greece. Our speakers for the evening will be Julien Simon (https://linkedin.com/in/juliensimon/) from Amazon Web Services and Pavlos Mitsoulis Ntompos (https://linkedin.com/in/pavlosmitsoulis/) from Expedia Group. In the first talk Julien will walk us through the most popular ML algorithms, and in the second talk Pavlos will show us the art of building and optimizing ML models. Looking forward to seeing you there!
    Adrianos | Euangelos | Stavros
    Talks:
    1st Talk: An intro to popular ML algorithms with Python and scikit-learn
    In this session, we'll introduce you to popular Machine Learning algorithms: regression, classification, decision trees, etc. We'll use Python, scikit-learn and Jupyter notebooks. No prior ML experience required.
    2nd Talk: A focus on building and optimizing ML models with Amazon SageMaker
    The biggest challenge facing a Machine Learning professional is to train, tune, and deploy Machine Learning models in the cloud. Amazon SageMaker offers a powerful infrastructure for building end-to-end ML solutions. This talk will teach you to run your new or existing ML project on SageMaker. You will train, tune, and deploy your models in an easy and scalable manner, because many low-level engineering tasks are abstracted away. You will see how to code training and prediction workflows by working on a novel ML problem using embeddings. The talk will focus on two Python libraries: sagemaker (https://sagemaker.readthedocs.io/en/stable/) and sagify (https://kenza-ai.github.io/sagify/).
    Speakers:
    1st Speaker: As the Global AI & Machine Learning Evangelist at AWS, Julien focuses on helping developers and enterprises bring their ideas to life. He frequently speaks at conferences and also blogs actively at https://medium.com/@julsimon .
    2nd Speaker: Pavlos Mitsoulis Ntompos has 7 years of Machine Learning and Software Engineering experience. Currently, he is a Staff Software Engineer (Machine Learning) at HomeAway (an Expedia Group brand), leading Machine Learning initiatives that support growth marketing. He recently published a Packt video course about Amazon SageMaker, "Hands-On Machine Learning Using Amazon SageMaker" (https://www.packtpub.com/application-development/hands-machine-learning-using-amazon-sagemaker-video). He is also the co-creator of Sagify, an open-source library that simplifies training, evaluating, and deploying ML models to SageMaker. In the past, he taught in the MSc in Business Analytics programme offered by Athens University of Economics and Business, covering applications of Machine Learning using big data technologies. He holds a Master's degree in Computer Science from Imperial College London. Finally, Pavlos is always looking to apply and discover new Machine Learning theories and best practices.
    Sponsors:
    - Intracom Telecom : [ http://intracom-telecom.com/ ]
    - Channel VAS : [ http://channelvas.com/ ]
    - Glispa : [ http://glispa.com/ ]
    - Alba Graduate Business School, The American College of Greece : [ http://alba.acg.edu/ ]
    Schedule:
    7:00 - Socialize
    7:25 - Welcome
    7:30 - 1st Talk
    8:15 - 2nd Talk
    9:00 - Drinks & Pizzas
    We are always looking for speakers for our meetups. If you would like to give a talk, please drop us a line!
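The first talk covers algorithms such as regression using scikit-learn. As a dependency-free illustration of the simplest of these, here is a sketch of one-feature linear regression by ordinary least squares in plain Python, roughly what scikit-learn's `LinearRegression` estimates when given a single feature. The function names `fit_simple_linear_regression` and `predict` are hypothetical and are not part of scikit-learn.

```python
# One-feature linear regression by ordinary least squares, in plain Python.
# Illustration only: the talk itself uses scikit-learn.
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept


def predict(slope, intercept, x):
    return slope * x + intercept


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]  # exactly y = 2x + 1
    slope, intercept = fit_simple_linear_regression(xs, ys)
    print(slope, intercept)  # 2.0 1.0
```

Classification and decision trees replace the squared-error objective with other criteria, but the workflow of fitting parameters on data and then predicting is the same one scikit-learn exposes through `fit` and `predict`.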

  • Running Spark on Mesos & Using Big Data Science for Risk Management

    Hello wonderful big data developers and enthusiasts. We are excited to announce our next event, which will take place on Tuesday, 27 November at 7pm. Our venue is once again the ground floor of The Cube Athens. Our speakers for the evening will be Christos Sidiropoulos (https://linkedin.com/in/chris-sidiropoulos-2a6b156a/) from Encode and Stelios Lelis (https://linkedin.com/in/stelios-lelis-78179a138/) from Channel VAS. In the first talk Christos will walk us through running Spark on Mesos, while in the second talk Stelios will share the key role and business value of big data science in the field of risk management. Looking forward to seeing you there!
    Adrianos | Euangelos | Stavros
    Talks:
    1st Talk: Running Spark on Mesos
    In this talk, we will go through setting up a Mesos cluster for scheduling Spark jobs. We will use DC/OS, the Distributed Cloud Operating System, to set up our cluster and start executing jobs. We will cover cloud and on-premises deployment options. We will see how to scale the cluster and look at considerations specific to running and configuring Spark jobs. We will also examine the monitoring and logging it provides. Some of the tools we are going to use: Terraform, Ansible, CloudFormation, Prometheus, Grafana, Docker.
    2nd Talk: Risk Managing Half a Billion People: Putting Big Data Science in Action for Business Value
    Big Data Science lies at the core of many recent successful business ventures. Combining Big Data technologies with effective Data Science to support core business functions is essential for such success. In this talk we describe our experience, data flow and architecture, and the technological and algorithmic solutions we used to apply Big Data Science to risk-managing half a billion people.
    Speakers:
    Christos Sidiropoulos is a Lead DevOps Engineer at Encode (http://encodegroup.com/), where he is responsible for automating and optimizing the software delivery pipeline of Encode’s products. He has a strong background in *nix system administration and information security and likes to "keep things simple". He holds an M.Sc. in "Advanced Informatics and Computing Systems" from the University of Piraeus.
    Stelios Lelis is a seasoned manager in the fields of credit risk, data, algorithms, AI, machine learning and business optimization. He has more than 15 years of experience building and guiding the development of results-driven algorithms, models and machine learning solutions in various application fields (e.g. online marketing, mobile content optimization, credit risk). Several of these have been incorporated into, or lie at the core of, innovative products and services. He currently leads Credit Risk, Big Data and Data Science at Channel VAS (http://channelvas.com/), where he is responsible for developing innovative credit risk models for nano- and micro-finance and for the risk management of more than 300 million active users and more than $2 billion in loans granted annually. Stelios holds a PhD in Informatics from the University of Manchester, and an MSc in Computer Science and a BSc in Mathematics from the University of Crete.
    Sponsors:
    - Intracom Telecom : [ http://intracom-telecom.com/ ]
    - OpenBet : [ http://openbet.com/ ]
    Schedule:
    7:00 - Socialize
    7:25 - Welcome
    7:30 - 1st Talk
    8:15 - 2nd Talk
    9:00 - Drinks & Pizzas
    We are always looking for speakers for our meetups. If you would like to give a talk, please drop us a line!

  • Athens Big Data Meetup 2018 - Workshop II

    ALBA Graduate Business School

    Hello wonderful big data developers and enthusiasts. We are happy to announce our second Workshop (Part II) for 2018, which will take place on Saturday, June 23 at ALBA Graduate Business School (http://www.alba.acg.edu). We are organizing this workshop in partnership with Landoop (http://www.landoop.com). This time our agenda is all about Kafka! This is an event you don't want to miss! More about the event below.
    • Sessions
    Short Speech: “Apache Kafka entering the streaming world via Lenses”
    What is Apache Kafka? What solution does it bring to a complex streaming world, and how can you manage online, in-motion data?
    Workshop: “Let’s set things up”
    Time for hands-on experience: a workshop on how to set up Apache Kafka and explore some tools and extensions.
    **********************
    • Important to know
    Seats are limited to 25 (only the first 25 valid registrations will be accepted). Please fill in the registration form below. The first 25 registered members will receive a confirmation request, to which they must reply in order to be confirmed.
    • Registration Form : https://docs.google.com/forms/d/e/1FAIpQLSeIwuKZ7XsjpBFnDw83PP0z-1mH75qHoFy9IM4dtsTYRHYPEQ/viewform
    **********************
    Instructor Bios:
    John is a software engineer at Landoop, aiming to help organisations and developers move and extract value from their data by enhancing our data streaming platform. Currently he spends his days (and nights) testing our upgrades to the Lenses streaming platform, Kafka connectors and APIs. John comes to share with you, in his own humorous and simple way, how Apache Kafka is entering the streaming world through Lenses.
    Marios, our DevOps guru, is Head of DevOps at Landoop and a serial entrepreneur with numerous personal projects. For him, professionalism and commitment towards partners and team members are a given. His motto? “Do not panic!”. Come and meet him as he introduces you to Apache Kafka through a hands-on workshop on setting up Kafka and using tools and extensions.
    **********************
    • Prerequisites: Will be sent to you soon, stay tuned!
    • What to bring:
    - This is a Bring Your Own Device (BYOD) event, so do not forget your device!
    - Basic knowledge of programming
    - Basic knowledge of Kafka
    - You need to set up the environment in order to attend. Instructions will be provided.
    • Environment setup
    - Instructions for Environment Setup & Configuration : (details will be provided soon)
    • Sponsors
    - Intracom Telecom : [http://www.intracom-telecom.com/]
    - Landoop : [http://www.landoop.com/]
    - ALBA Business School : [http://www.alba.acg.edu/]
    • Workshop Agenda
    10:30 - 10:45 Welcoming & Registration
    10:45 - 11:00 Opening Remarks - Getting to know each other
    11:00 - 11:45 Speech
    11:45 - 12:00 Break
    12:00 - 15:30/16:00 Workshop
    16:00 - 17:00 Socializing

  • Current Trends in Enterprise Data Science

    The Cube Athens

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are excited to announce our first event of this summer, which will take place on Thursday, June 21st at 7pm. Our venue is once again the ground floor of The Cube Athens. Our speakers will be Nikolaos Vasiloglou (https://linkedin.com/in/vasiloglou/) and Christos Malliopoulos (https://linkedin.com/in/cmalliopoulos/) from MLTrain (https://mltrain.cc/), who will talk about Current Trends in Enterprise Data Science, with a focus on data privacy, data representation and neural computation frameworks.
    Adrianos | Euangelos | Stavros
    About the Talks:
    1st Talk: Privacy, Security, and Ethics in Data Science
    When data scientists work on a dataset, they focus on the scientific side of the problem. In many cases, though, the data contains private and sensitive information that the data scientist might not even be allowed to see. In the first part of this talk we will explore methods and techniques for automatically preprocessing the data so that the data scientist can create models without direct access to private and sensitive data. In the second part we will explore biases in ML algorithms that raise ethical issues. How do you check that your classifier is not racist?
    2nd Talk: Training Neural Networks with Enterprise Relational Data
    A basic property of enterprise data is its qualitative nature: we usually handle data that represent categories rather than quantities. This is not a major concern when we employ tree-based methods for inference (random forests and gradient boosted trees). Model-based methods, on the other hand, generalize better than trees but are suited to numeric rather than categorical data. Traditionally we overcome this limitation by converting categories to binary variables, but this dramatically increases the input dimensionality and makes the learning task harder. In the talk we explain vector embeddings of categorical variables and how we can use them to train a feed-forward neural network with TensorFlow.
    Speakers:
    Nikolaos Vasiloglou holds a Diploma in Electrical and Computer Engineering from the National Technical University of Athens and a PhD from the department of Electrical and Computer Engineering at the Georgia Institute of Technology. His thesis focused on scalable machine learning over massive datasets. After graduating from Georgia Tech he founded Analytics1305 LLC and Ismion Inc. He architected and developed the PaperBoat machine learning library, which has been successfully integrated into and used in the LogicBlox and HPCCSystems platforms. He currently works as a machine learning consultant for Symantec and Infor, focusing on Google's TensorFlow, and has been active in developing the syllabus for a series of TensorFlow training events. His work has resulted in patents and production systems.
    Christos Malliopoulos holds a Diploma in Electrical and Computer Engineering, an MSc (summa cum laude) in Probability and Statistics and a PhD (summa cum laude) in signal processing and machine learning, all from the National Technical University of Athens. He has worked as a research scientist at the "Institute for Language and Speech Processing" of the "Athena Research Center", and as a BI specialist and later as manager of the BI department of the Hellenic Telecommunications Organization, a subsidiary of Deutsche Telekom AG. He has been a consulting contractor for the data science group of LogicBlox Inc. He currently works as a machine learning consultant for Infor Inc., focusing on declarative numerical optimization frameworks and in-database machine learning.
    • Sponsors
    - Intracom Telecom : [http://www.intracom-telecom.com/]
    - MLTrain : [https://mltrain.cc/]
    - efood : [https://www.e-food.gr/]
    • Schedule
    7:00 - Socialize
    7:25 - Welcome
    7:30 - 1st Talk
    8:15 - 2nd Talk
    9:00 - Drinks & Pizzas
    We are always looking for speakers for our meetups. If you would like to give a talk, please contact us.
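The second talk's abstract contrasts converting categories to binary variables with learning vector embeddings. A minimal sketch of why that matters for input width, assuming a hypothetical column of 1,000 product codes: one-hot encoding yields one column per category, while an embedding lookup yields a fixed-size dense vector. The embedding table here is randomly initialised purely for illustration; in a network built with TensorFlow (or any other framework) it would be trained by backpropagation, and none of this is the speakers' code.

```python
# One-hot encoding vs. embedding lookup for a single categorical feature.
import random


def one_hot(category, vocabulary):
    """Binary encoding: one column per distinct category, so width == len(vocabulary)."""
    index = vocabulary.index(category)
    return [1.0 if i == index else 0.0 for i in range(len(vocabulary))]


def make_embedding_table(vocabulary, dim, seed=0):
    """One dense vector of length `dim` per category; width does not grow with the vocabulary.
    Random values stand in for weights that a real network would learn."""
    rng = random.Random(seed)
    return {c: [rng.uniform(-0.05, 0.05) for _ in range(dim)] for c in vocabulary}


def embed(category, table):
    return table[category]


if __name__ == "__main__":
    vocabulary = [f"product_{i}" for i in range(1000)]  # hypothetical categorical column
    table = make_embedding_table(vocabulary, dim=8)
    print(len(one_hot("product_42", vocabulary)))  # 1000 input columns
    print(len(embed("product_42", table)))         # 8 input columns
```

Looking up a row of the table is equivalent to multiplying the one-hot vector by a |V| × d weight matrix, which is why embedding layers are usually implemented as lookups rather than as dense matrix multiplications.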

  • Athens Big Data Workshop Part I

    Microsoft Hellas SA

    Hello wonderful big data developers and enthusiasts. We are happy to announce our first Workshop (Part I) for 2018, which will take place on Saturday, March 17 at the headquarters of Microsoft Hellas in Marousi. We are organizing this workshop in partnership with Microsoft Hellas. This time our agenda is about stream processing in the context of Big Data. This is an event you don't want to miss! More about the event below.
    • Session
    1) Big data/streaming data processing: an introduction to streaming data and the business case; the definition of the Lambda architecture for data processing; creating client(s) that send data at high speed.
    2) Real-time data ingress: handling data ingress; setting up real-time integration with Kafka.
    3) Streaming data processing: processing data with Apache Spark Structured Streaming (ETL, windowing, etc.).
    Instructor: Jan Pospisil
    **********************
    From Jan: Come and see how you can leverage advanced data processing using advanced analytics tools like Spark, Kafka, Data Lake, Hadoop, Data Factory and others. We will walk you through a real-life-like project and show how to build a solution using platform services on Azure. My passions are IoT gadgets, IoT solutions, Big Data, Machine Learning, Cognitive Services, new technologies (even bleeding edge), e-commerce, robotics, automation, coding, ...
    Bio: Sr. Technology Evangelist @ Microsoft, software and solution architect, developer, IoT & DIY geek, father, husband, technocrat, ...
    Twitter: https://twitter.com/pospanet
    Linkedin: https://cz.linkedin.com/in/pospa
    **********************
    • Important to know
    Seats are limited to 25 (only the first 25 valid registrations will be accepted). Please fill in the registration form below. The first 25 registered members will receive a confirmation request, to which they must reply in order to be confirmed.
    • Registration Form : https://docs.google.com/forms/d/e/1FAIpQLSeqKquFRcu_8BvTQqiAg9DTvlAs_yYDWw4ARUjpW1iWuAA4xQ/viewform
    • SW requirements:
    - Modern web browser
    - Visual Studio Code or any similar IDE
    - Python environment
    - .NET Core environment
    - Azure subscription (you can use the Azure Pass provided by Microsoft)
    - RDP client
    • What to bring:
    - This is a Bring Your Own Device (BYOD) event, so do not forget your device!
    - Basic knowledge of programming
    - Basic knowledge of Spark, Hadoop, Kafka and Azure
    - You need to set up the environment in order to attend. Instructions will be provided.
    • Environment setup
    - Instructions for Environment Setup & Configuration : (details will be provided soon)
    • Workshop Agenda
    10:00 - 10:10 Welcome speech
    10:10 - 11:20 Session 1
    11:20 - 11:30 Coffee Break
    11:30 - 12:40 Session 2
    12:40 - 12:50 Coffee Break
    13.00 - 14.00 Session[masked] - 14.30 Pizza Time
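Session 3 mentions windowing in Spark Structured Streaming. A minimal sketch of a tumbling (fixed-size, non-overlapping) window count in plain Python, assuming events arrive as hypothetical `(timestamp_seconds, key)` pairs. In Structured Streaming the same grouping is typically written with `groupBy(window(...), ...)` from `pyspark.sql.functions`, and that engine also handles late data through watermarks, which this sketch ignores.

```python
# Tumbling-window counts over a stream of (timestamp_seconds, key) events.
# Plain-Python illustration of the idea, not Spark code.
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key); every event lands in exactly one window."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = ts - (ts % window_seconds)  # floor the timestamp to its window boundary
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "a"), (3, "a"), (9, "b"), (10, "a"), (14, "a"), (21, "b")]
    print(tumbling_window_counts(events, window_seconds=10))
    # {(0, 'a'): 2, (0, 'b'): 1, (10, 'a'): 2, (20, 'b'): 1}
```

Because each event maps to exactly one window, the per-window totals always sum to the number of events; sliding windows, by contrast, assign an event to several overlapping windows.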

  • Machine Learning in Fintech & Blockchain meets Big Data

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are happy to announce our last (sixth) event of 2017, which will take place next Wednesday, December 27, 2017 at The Cube Athens. This time our agenda is about Fintech and Blockchain in the context of Big Data. Both speakers have extensive international experience, having lived and worked abroad for many years, as well as deep expertise in a number of domains, including but not limited to Finance, Fintech, Machine Learning, Data Science, Blockchain, Data Engineering and Big Data. This is an event you don't want to miss! More about the talks below. The event will start at 7:00 PM and will take place at The Cube Athens (Kleisovis 8, Athens[masked], Greece). We are really looking forward to seeing you there, and don't forget to spread the word!
    Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos)
    Agenda:
    1st Talk: Challenges of ML-based solutions for credit risk assessment
    End-to-end, fully automated, machine-learning-based decision engines for credit risk assessment present several challenges: scientific, engineering, financial and regulatory. In this presentation we will focus mainly on the scientific challenges and briefly touch upon some of the engineering ones. What data to collect and how to process it, which machine learning algorithms to use, how to evaluate the models, how to achieve high performance, and how to maintain and monitor the engines in production are a few of the questions that will be answered. If time permits, we will dive deeper into feature selection and data source evaluation, concept drift and reject inference mitigation, transfer, active and reinforcement learning, the growth vs. risk trade-off, and pricing strategy.
    Konstantinos Papakonstantinou (https://www.linkedin.com/in/konstantinospapakonstantinou/) has a proven track record of managing data science teams and developing data products in high-tech and fintech organizations. At Kreditech (https://www.kreditech.com/), he built from the ground up, and currently directs, the "Data Science Lab", a team that has prototyped and productionized machine-learning-based solutions for credit risk management, product personalization, customer acquisition and retention. Konstantinos holds an MSc in Electrical Engineering from the University of Southern California and a PhD in Statistical Signal Processing from Telecom ParisTech.
    2nd Talk: Beyond Bitcoin: Blockchain technologies and how they disrupt every industry
    Bitcoin proved that open-source software, with open data and open infrastructure, can manage trillions of dollars without a single glitch. In this presentation, we will investigate the technologies behind Bitcoin and Ethereum and see how they promise to disrupt every major industry by bringing significant benefits over traditional infrastructure.
    Dimitrios Kouzis-Loukas (https://www.linkedin.com/in/lookfwd/) studied Applied Mathematics and Physics in Athens and a decade later ended up designing low-latency distributed infrastructure for financial applications in New York. He enjoys working on challenging, high-impact problems and appreciates simple, elegant and pragmatic solutions.
    Schedule:
    7:00 - Socialize
    7:25 - Welcome
    7:30 - 1st Talk
    8:15 - 2nd Talk
    9:00 - Drinks & Snacks
    We are always looking for speakers for our meetups. If you would like to give a talk, please contact Adrianos (https://www.linkedin.com/in/adrianosdadis), Euangelos (https://www.linkedin.com/in/eualin), or Stavros (https://www.linkedin.com/in/stavroskontopoulos).
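The second talk looks at the technologies behind Bitcoin. One core idea is hash chaining: each block commits to the hash of the previous block, so altering history invalidates every later link. The sketch below is a deliberately simplified toy under that assumption only; real Bitcoin blocks hash a binary header (including a Merkle root of transactions) with double SHA-256 and add proof-of-work, none of which is modelled here, and the record strings are hypothetical.

```python
# Toy hash-linked chain: each block stores the SHA-256 hash of its predecessor.
import hashlib
import json


def block_hash(block):
    # Hash a canonical JSON serialisation so identical contents give identical hashes.
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(records):
    chain = []
    prev = "0" * 64  # placeholder "previous hash" for the first block
    for i, data in enumerate(records):
        block = {"index": i, "prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain):
    """Every block must reference the hash of the block before it."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain))
    )


if __name__ == "__main__":
    chain = build_chain(["alice->bob: 5", "bob->carol: 2", "carol->alice: 1"])
    print(is_valid(chain))  # True
    chain[0]["data"] = "alice->bob: 500"  # tamper with history
    print(is_valid(chain))  # False
```

Note that tampering with the newest block is only caught once another block commits to it; in Bitcoin, proof-of-work and network consensus are what make rewriting recent history expensive.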

  • Deep Learning: Introduction, Architectures and Use Cases

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are happy to announce our fifth event of 2017, which will take place this Monday, December 11, 2017 (just 3 days away!) at The Cube Athens (the venue has changed!). This time our agenda is a bit broader, hosting two interesting talks about deep learning. By doing so, we hope to attract a bigger audience, including but not limited to data scientists and machine learning engineers. After all, we believe that big data and data science go hand in hand, and we hope that both presentations will be beneficial to engineers and scientists alike! More about the talks below. The event will start at 7:00 PM as usual, but this time at a different venue: The Cube Athens (Kleisovis 8, Athens[masked], Greece). We are really looking forward to seeing you there, and don't forget to spread the word!
    Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos)
    Agenda:
    1st Talk: Deep learning for content recommendation at high scale
    Taboola is the world's leading content discovery platform. The challenge we face is selecting the most suitable content for each user in a given context, in less than a second, out of millions of available items. In this lecture we will discuss Taboola's high-scale deep learning solution for content recommendation and its real-world challenges. We will then dive into the solution architecture, which combines neural networks and matrix factorization concepts, and discuss some of our key challenges.
    Dr. Gil Chamiel (https://www.linkedin.com/in/gil-chamiel-1020185/) is Director of Data Science and Algorithm Engineering at Taboola (https://www.taboola.com/). Gil holds a PhD in Computer Science (AI) from the University of New South Wales, Australia, in the area of personalization and preference elicitation. He is a Taboola veteran and has been working on Taboola's core algorithmic engine for 7 years.
    2nd Talk: Deep learning for modeling visual and textual modalities: research and applications
    In this talk we present an introduction to modeling visual and textual modalities using deep learning. We will talk about convolutional neural network (CNN) architectures for object classification and image understanding, as well as recurrent neural networks (RNNs) for text modeling and topic/sentiment classification. Finally, we will highlight indicative applications such as recommender systems, e-commerce product categorization, image captioning and visual question answering.
    Dr. Stavros Theodorakis (https://www.linkedin.com/in/stheodorakis) is a co-founder and senior research engineer at DeepLab (http://deeplab.ai). DeepLab develops machine learning solutions for real-world applications and production systems, bridging the gap between research and industry. Stavros holds a PhD in machine learning from the National Technical University of Athens, Greece, has worked as a research assistant in EU-funded research projects, and has applied machine learning solutions to real-world applications.
    Schedule:
    7:00 - Socialize
    7:25 - Welcome
    7:30 - 1st Talk
    8:15 - 2nd Talk
    9:00 - Drinks & Snacks
    We are always looking for speakers for our meetups. If you would like to give a talk, please contact Adrianos (https://www.linkedin.com/in/adrianosdadis), Euangelos (https://www.linkedin.com/in/eualin), or Stavros (https://www.linkedin.com/in/stavroskontopoulos).
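The first talk describes an architecture that combines neural networks with matrix factorization concepts. As a minimal sketch of the classic matrix factorization part only (not Taboola's system), the code below learns user and item factor vectors whose dot product approximates observed ratings, trained with stochastic gradient descent on a squared-error loss with L2 regularisation. The toy ratings and the hyperparameters `k`, `lr`, `reg` and `epochs` are arbitrary assumptions.

```python
# Toy matrix factorisation: rating(u, i) is approximated by dot(P[u], Q[i]),
# with P and Q learned by stochastic gradient descent.
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.02, reg=0.01, epochs=500, seed=0):
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors
    for _ in range(epochs):
        order = list(ratings)
        rng.shuffle(order)  # visit observed ratings in random order each epoch
        for u, i, r in order:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict_rating(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(ratings, P, Q):
    se = sum((r - predict_rating(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5


if __name__ == "__main__":
    # (user, item, rating) triples for 4 users and 4 items; unobserved pairs are simply absent
    ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1), (2, 0, 1),
               (2, 1, 1), (2, 3, 5), (3, 0, 1), (3, 2, 5), (3, 3, 4)]
    P, Q = train_mf(ratings, n_users=4, n_items=4)
    print(round(rmse(ratings, P, Q), 3))
    print(round(predict_rating(P, Q, 1, 1), 2))  # score for an unobserved user-item pair
```

Scoring any user-item pair is a single dot product, which is one reason factorisation-style representations suit recommendation under tight latency budgets.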

  • Big data meets renewable energy / Lead scoring & grading

    ALBA Graduate Business School

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are happy to announce our fourth event of 2017! This time, we welcome Stamatis Stefanakos (https://www.linkedin.com/in/stefanakos/) (Managing Director at D ONE), who will talk about the architecture and challenges of building big data and real-time solutions in the energy sector. Our second speaker is Lefteris Mantelas (https://www.linkedin.com/in/lefterismantelas/) (Business Intelligence & Analytics at Beat), who will present how lead scoring and lead grading can help sales teams increase their revenues. At the end, MLTrain (https://mltrain.cc/) will tell us about a very promising Data Science and Machine Learning course (https://mltrain.cc/events/python-for-big-data-analytics-and-machine-learning-101/) in Athens. Our venue is the auditorium on the ground floor of ALBA Graduate Business School. The venue has around 110 seats, but there is space for people to stand as well. Please RSVP early, but do remember to keep your RSVP up to date so that others who would like to attend have a chance to come if your plans change. We are really looking forward to seeing you there!
    Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos)
    Agenda:
    1st Talk: Big data meets renewable energy: Building a real-time asset management platform for renewable energy
    How does one cope with some[masked] wind turbines worldwide, each delivering 100+ measurements per second? How do you guarantee profitable operations in an industry attracting $110 billion of investment globally, one of the fastest-growing industrial segments in the world? In this talk, we discuss how WinJi built its TruePower Asset Management Platform. In particular, we will discuss the overall architecture and the motivation behind it (lambda architecture, data vault, in-stream analysis), the physics behind the data (wind measurement corrections, in-stream calibration of turbine efficiency using neural networks) as well as the business case (analytics of lost production opportunities, predictive maintenance, expected power prediction & "day-ahead" production forecast).
    Dr. Stamatis Stefanakos (https://www.linkedin.com/in/stefanakos/) is a managing director with D ONE, a premium business consultancy headquartered in Zurich. He advises his clients on data analytics, focusing on architecture and strategy. He holds a PhD in theoretical computer science from the Swiss Federal Institute of Technology Zurich (ETH). He received his diploma from the University of Patras in 2000.
    2nd Talk: Lead scoring & grading: A near-live combined implementation in a SaaS company
    The combination of lead scoring and lead grading aims at optimising the use of sales team resources. This is done by allocating to the team the most qualified leads, those with the best chances of matching the company’s customer profile. The talk gives a high-level view of how a near-live combined analytical approach was implemented at Workable using data from 7 different data sources.
    Dr. Lefteris Mantelas (https://www.linkedin.com/in/lefterismantelas/) is a BI & Analytics professional who recently moved from Workable to Beat (until recently Taxibeat). Before that, he worked at consulting firms (EY & IRi) on analytics projects and in academia (FORTH). He holds a BA in Applied Mathematics from the University of Crete, an MSc in Geoinformatics and a PhD in Urban Growth Modelling from NTUA. For the last 10 years, he has been applying data analytics and modeling to solve problems and support intelligent business decisions in various areas and domains.
    Mini presentation: MLTrain (https://mltrain.cc/) will give a short introduction and presentation of a new Data Science and Machine Learning course (https://mltrain.cc/events/python-for-big-data-analytics-and-machine-learning-101/) in Athens. MLTrain, an educational endeavor of 'Ismion Inc.', focuses on machine learning in practice, leveraging the content of recent academic research to meet the needs of industrial applications. MLTrain offers courses at different levels of detail, addressing varying audiences, from developers to business analysts and executives. It is our tenet that effective machine learning is done by understanding the potential, the effects and the practical implications of state-of-the-art algorithms, and by implementing them using toolsets that are open and adopted by industry giants like Google and Amazon. MLTrain is based in Atlanta, US, and offers its services worldwide, with a recent record of courses and workshops in the US, Australia and South Africa.
    Schedule:
    7:00-7:25 - Socialising
    7:25-7:30 - Welcome
    7:30-8:05 - 1st Talk
    8:15-8:50 - 2nd Talk
    8:50-9:00 - MLTrain mini presentation
    9:00++ - Drinks and Pizzas
    Sponsors: A massive thank you to our sponsors!
    Wanna Join? We are always looking for speakers for our meetups. If you would like to give a talk this year, please contact Adrianos (https://www.linkedin.com/in/adrianosdadis), Euangelos (https://www.linkedin.com/in/eualin) or Stavros (https://www.linkedin.com/in/stavroskontopoulos).
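The first talk mentions "the physics behind the data" and wind measurement corrections. As background (textbook wind physics, not WinJi's actual models), the power a turbine extracts is P = ½·ρ·A·v³·Cp, where ρ is air density, A the rotor swept area, v the wind speed and Cp the power coefficient, capped by the Betz limit 16/27. A common correction normalises measured wind speed to a reference air density, v_n = v·(ρ/ρ₀)^(1/3), which is the form IEC 61400-12-1 uses for pitch-regulated turbines. The rotor size and Cp in the example are hypothetical.

```python
# Illustrative wind-power physics:
#   P   = 0.5 * rho * A * v**3 * Cp     (power extracted from the wind)
#   v_n = v * (rho / rho_0) ** (1/3)    (air-density normalisation of wind speed)
import math

RHO_0 = 1.225  # kg/m^3, standard sea-level air density


def wind_power_watts(v, rotor_diameter_m, cp, rho=RHO_0):
    if not 0 <= cp <= 16 / 27:  # Betz limit, about 0.593
        raise ValueError("Cp must lie between 0 and the Betz limit 16/27")
    area = math.pi * (rotor_diameter_m / 2) ** 2  # swept area of the rotor
    return 0.5 * rho * area * v ** 3 * cp


def density_normalised_speed(v, rho, rho_0=RHO_0):
    return v * (rho / rho_0) ** (1 / 3)


if __name__ == "__main__":
    # hypothetical 100 m rotor with Cp = 0.45 at 10 m/s
    p = wind_power_watts(10.0, rotor_diameter_m=100.0, cp=0.45)
    print(f"{p / 1e6:.2f} MW")  # 2.16 MW
    print(density_normalised_speed(10.0, rho=1.15))  # thinner air maps to a lower equivalent speed
```

The cubic dependence is the key design consideration: doubling wind speed multiplies power by eight, so even small systematic errors in measured wind speed translate into large errors in expected-power estimates.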

  • Riding the Streaming Wave with Kafka / Analytics Beyond RAM Capacity with R

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are happy to announce our third event for 2017! This time, we welcome Konstantine Karantasis (https://github.com/kkonstantine) (Software Engineer at Confluent Inc (https://www.confluent.io)) who will talk about how to build streaming pipelines with Apache Kafka (https://kafka.apache.org/) & Confluent Open Source (https://www.confluent.io/product/confluent-open-source/) tools. Our second speaker is Alex Palamides (https://www.linkedin.com/in/alex-palamides-622a5ba/) (Data Scientist at Clayton Euro Risk (http://www.claytonerm.com)) who will present us the Microsoft R Server (https://www.microsoft.com/en-us/cloud-platform/r-server) solution and how to perform big data operations with it. At the end, Landoop (http://landoop.com/) engineers will show us their new powerful framework Lenses (http://landoop.com/kafka-lenses/) for Apache Kafka ™, in a mini presentation. Our venue is the auditorium on the ground floor of ALBA Graduate Business School. The venue has around 110 seats but there is space for people to stand as well. Please RSVP early but do remember to keep your RSVP up to date to allow other people who would like to attend a chance to come if your plans change. We are really looking forward to seeing you there! Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos) Agenda: 1st Talk: Title: Riding the Streaming Wave DIY style: Using & Building Kafka Connect Plugins with Confluent Open Source Stream processing is changing the way companies organize their data systems architecture and respond to events critical to their business. In this talk, we'll review how software available with Confluent Open Source can help you hit the ground running when integrating your data systems to Apache Kafka. 
We'll see how Kafka Connect API can be leveraged to do the heavy lifting at scale and how new tools in Confluent Open Source help you use, test and even develop Kafka Connect plugins. Konstantine Karantasis (https://github.com/kkonstantine) is a Software Engineer at Confluent, Inc. working from Palo Alto, CA. He's the main contributor to open source projects such as the Confluent S3 Connector, classloading isolation in Apache Kafka Connect, Confluent CLI and many more. Previously, he built open source web-services for big data at Yahoo and did HPC research at the University of Illinois at Urbana-Champaign. Konstantine holds a Ph.D. from the University of Patras. 2nd Talk: Title: Analytics Beyond RAM Capacity: The Microsoft R Server Solution R is a language and environment for statistical computing and graphics which was developed at Bell Laboratories and is considered one of the default choices for a data scientist. However as by design all computations take place in RAM, it suffers from memory limitations in big data applications. Microsoft R Server (MSR) on the other hand by utilizing RevoScaleR package capabilities follows a different approach; Datasets are stored on the disk and computations are performed into chunks of data, therefore data is inherently distributed. However as most open-source R algorithms require the whole data frame loaded into RAM, the first challenge is to process distributed data indirectly utilizing open-source R algorithms. On the other hand in the MSR most common data operations (manipulation and analysis) are supported by counterpart functions. Moreover the inherently parallel processing makes deployment to a production environment such as SQL Server or on HDFS relative easy. MSR runs either in standalone mode, either within the SQL Server branded as R Services. Dr. 
Alex Palamides (https://www.linkedin.com/in/alex-palamides-622a5ba/) is a data scientist at Clayton Euro Risk, deploying risk and marketing models in the banking sector, programming mainly in R. Previously he was with IRI (EU and US), with the European Space Agency and in various consulting roles. He holds a BSc in Electronic and Computer Engineering from the Technical University of Crete and a PhD in Computational Statistics from the University of Peloponnese. Mini presentation: Landoop will give a short introduction and presentation. Last week, during Kafka Summit (San Francisco), Landoop announced their powerful new framework Lenses (http://landoop.com/kafka-lenses/) for Apache Kafka ™ (a visual interface for interactive queries on Kafka topics via Kafka SQL). Landoop is a company based in London, Amsterdam and Athens. Schedule: 7:00-7:15 - Socialising 7:15-7:20 - Welcome 7:20-8:05 - 1st Talk 8:10-8:55 - 2nd Talk 9:00-9:10 - Landoop mini presentation 9:10++ - Drinks and Pizzas Sponsors: A massive thank you to our sponsors: Wanna Join? We are always looking for speakers for our meetups. If you would like to give a talk this year, please contact Adrianos, Euangelos or Stavros.

  • Scaling data pipelines at Connected Home / Master BigQuery & Redshift by Blendo

    Hello wonderful big data developers and enthusiasts. We hope this email finds everyone well! We are happy to announce our second event for 2017! This time, we welcome Angelos Petheriotis (https://gr.linkedin.com/in/apetheriotis) (Big Data Engineer at Connected Home (https://www.hivehome.com/) of Centrica (https://www.centrica.com/) / British Gas (http://www.britishgas.co.uk/)), who will present the lessons learned while designing pipelines that handle billions of messages a day using Kafka Connect and Kafka Streams. Our second speaker is Kostas Pardalis (https://gr.linkedin.com/in/kostaspardalis) (Co-Founder and Software Engineer at Blendo.co (https://www.blendo.co/)), who will talk about the pros and cons of Google BigQuery and Amazon Redshift, two very popular analytical data warehousing technologies. Our venue is the auditorium on the ground floor of ALBA Graduate Business School. The venue has around 110 seats, but there is space for people to stand as well. Please RSVP early, but do remember to keep your RSVP up to date so that other people who would like to attend get a chance to come if your plans change. We are really looking forward to seeing you there! Adrianos (https://www.linkedin.com/in/adrianosdadis) | Euangelos (https://www.linkedin.com/in/eualin) | Stavros (https://www.linkedin.com/in/stavroskontopoulos) Agenda: 1st Talk: Processing 4 Billion Messages a Day: Lessons Learned Designing a pipeline that handles billions of messages from IoT devices offers exciting challenges to engineers. The system needs to operate at scale and recover from failures seamlessly in order to reliably deliver content to the rest of the company and to customers. In this talk we will analyze how the Connected Home data back end has been designed as an event-based system running on top of Kafka.
Furthermore, we will describe why we are replacing our Spark pipelines with Kafka Connect and Kafka Streams, and the tools we use around this new ecosystem. We will conclude by describing how we collaborate with our data science team, how their models get into production pipelines, and what we have learned on our journey of implementing and operating the system. Angelos Petheriotis is a Scala enthusiast who enjoys working on fast-data projects, particularly with the Spark and Kafka ecosystems. His passion is back-end development, mostly for high-performance, distributed and scalable systems. He has significant experience writing multithreaded applications, and he has been involved in the analysis and redesign of systems to improve their performance. 2nd Talk: Amazon Redshift Vs Google BigQuery Some of the most interesting innovation in cloud computing is taking place in the space of analytical data warehousing. Amazon and Google are leading the race with Redshift and BigQuery respectively. In this presentation we will go through the pros and cons of both technologies, point out their similarities and differences, and see how these affect the lives of both data engineers and data analysts. Blendo is a new breed of integration-as-a-service platform that enables companies to extract many kinds of data (sales, marketing, product, customer support, etc.) from different cloud services, integrate it and load it into their own cloud-based data warehouses for analysis. Schedule: 7:00-7:15 - Socialising 7:15-7:20 - Welcome 7:20-8:05 - 1st Talk 8:10-8:55 - 2nd Talk 9:00++ - Drinks and Pizzas Sponsors: A massive thank you to our sponsors: Wanna Join? We are always looking for speakers for our meetups. If you would like to give a talk this year, please contact Adrianos, Euangelos or Stavros.
