• Data Science Salon - Travel, Finance & Technology

    Online event

    PyData Berlin is collaborating with Data Science Salon to promote their events, which are diverse in topics and speakers. NOTE: You need to register on their website; an RSVP to this event is not a ticket to the events.

    From DSS: The Data Science Salon is a unique vertical-focused conference which grew into a diverse community of senior data science, machine learning and other technical specialists. We gather face-to-face and virtually to educate each other, illuminate best practices and innovate new solutions in a casual atmosphere.

    The December edition is on Applying AI & Machine Learning To Finance & Technology, a four-day conference on December 8-11. Please register for free here: https://www.datascience.salon/travel-finance-and-technology/

  • PyData Berlin November Meetup

    Online event

    Welcome to the November Virtual Meetup. The talks will start at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call.

    Talk 1 by Rebecca Raper
    Title: Can we teach machines how to be moral?
    Abstract: Morality is often seen as something distinctly human, in that it would be impossible for a machine to make moral decisions. In this presentation, combining philosophy, cognitive science, and computing, I introduce the area of Machine Ethics: the pursuit to make machines that can decide between right and wrong. I present motivations for creating such machines and discuss the idea that a teaching model might be the key to their creation.
    Bio: Rebecca Raper is a final-year PhD candidate and senior consultant in Ethical AI at Oxford Brookes University. She has a background in philosophy and psychology, but is now working within computing, tackling the area of machine ethics to try to find a suitable way to teach machines how to become moral. Combining developmental psychology with ethics and machine learning techniques, she is working on developing a system that is taught morals by a role model. Her interests include formal logic, metaphysics and cognitive science. She also has a wider interest in Ethical AI and has been working to develop a Risk Classification System for AI Systems as a consultant in The Institute for Ethical AI.

    Talk 2 by Klaus G Paul and Álvaro Corrales Cano
    Title: Analyses of the Economic Impact of COVID-19
    Abstract: As part of a not-for-profit activity led by Emergent Alliance, a joint team from Rolls-Royce and IBM has performed a variety of analyses around COVID-19, from health, public opinion, and the impact on tourism to the impact on the economy. Most blog entries at https://emergentalliance.org/?page_id=1128 are from us, and most of our work is on https://github.com/emergent-analytics/workstreams.
    Description: We aim to supply governments and businesses with insights which could be used to support decision making during the COVID-19 crisis. Some of our developments include:
    - A risk index to evaluate the public health risk of a region.
    - A pulse dashboard to understand the sentiment about COVID expressed in the media and how travelling has changed.
    - An economic simulation engine to model the impact of shocks in the different sectors of the economy.
    - A labelling tool to understand what has happened so far, which measures were put in place, and how the economy reacted.
    We are a team of data scientists from IBM’s Data Science & AI Elite Team, IBM’s Cloud Pak Acceleration Team, and Rolls-Royce’s R² Data Labs working on Regional Risk-Pulse Index: forecasting and simulation within Emergent Alliance. Have a look at our challenge statement, our articles on the blog, and our GitHub repository!
    Bios: Álvaro Corrales Cano is a Data Scientist at IBM’s Cloud Pak Acceleration Team. With a background in Economics, Álvaro specialises in a wide array of econometric techniques and causal inference, including regression, discrete choice models, time series and duration analysis. Klaus G. Paul is a Data Scientist and the Berlin AI Hub and Emerging Technologies Capabilities Lead for R² Data Labs at Rolls-Royce Deutschland. As an aerospace engineer, Klaus has a background in condition monitoring, big data systems, predictive analytics, and a passion for intelligent assistants that support decision making.

    ----------------------------------------------------------------------------------------------------
    NumFOCUS Code of Conduct
    https://numfocus.org/code-of-conduct
    Please have a look at the comment section for the short version of our Code of Conduct.

  • Data Science Salon - Retail & Ecommerce

    Online event

    PyData Berlin is collaborating with Data Science Salon to promote their events, which are diverse in topics and speakers. NOTE: You need to register on their website; an RSVP to this event is not a ticket to the events.

    From DSS: The Data Science Salon is a unique vertical-focused conference which grew into a diverse community of senior data science, machine learning and other technical specialists. We gather face-to-face and virtually to educate each other, illuminate best practices and innovate new solutions in a casual atmosphere.

    The November edition is on Applying AI & Machine Learning To Retail & Ecommerce, a two-day conference on November 17-18. Please register here: https://www.datascience.salon/retail-and-ecommerce/ There are a few free tickets available with promotion code PyDataBER, first come, first served.

  • PyData Berlin October Meetup

    Online event

    Welcome to the October Virtual Meetup. The talks will start at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call.

    Talk 1 by Limor Gultchin
    Title: Long Story Short: Using BERT for abstractive text summarization on a small, curated corpus
    Abstract: Machine Learning provides a myriad of exciting new ways to extract and analyze data from the ever-growing number of information sources we have today. While the internet indeed provides vast amounts of high-quality data, a lot of information is still enclosed in documents, and PDF documents in particular. To unlock their potential, OCR and other information retrieval tools already provide a convenient way to extract knowledge from well-structured files. An obstacle remains in the realm of tables: the format in which most quantitative information in documents is stored. And tables, while they can be extracted, usually only make sense in the context of their original document. In this talk, I'm going to share my experience working on a project to automatically compose informative table titles, using the powerful NLP model BERT, and connect the task to generative abstractive text summarization, for a specialized domain with limited amounts of data.
    Bio: Limor is a PhD student in Machine Learning and Causal Inference at the computer science department of the University of Oxford, and at the Alan Turing Institute. Her current research interests are in Causal Inference in the service of Responsible ML, but previously she worked on Natural Language Processing, ML for social science research and computational humor. Limor will be very happy to discuss any of those topics in the Q&A.

    Talk 2 by Arnault Chazareix
    Title: Building a NLP pipeline to detect relationship between fictional characters
    Abstract: Who's this guy again? Every time we start watching the new season of our favorite show, we find ourselves asking this question. What if we didn't need to binge-watch all the previous seasons? What if we could just look at a graph summarizing all the characters and their relationships to one another? All this information is available on the Internet, but there is no easy way to use it because the data is unstructured. This talk will demonstrate how to use Natural Language Processing to extract a relationship graph from any TV show.
    Bio: Arnault studied Computer Science & AI at Centrale Paris (a French engineering Grande Ecole). He interned as an NLP data scientist at Feedly in Palo Alto. Nowadays Arnault is working as a lead data scientist at Sicara, a data consulting startup specialized in Computer Vision. He specializes in Detection and Few-Shot Learning, and in building great data sets. He is interested in the ability to transform unstructured "human" data (text, images, video, sound...) into structured data.

  • PyData Berlin September Meetup

    Online event

    Welcome to the September Virtual Meetup. The talks will start at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call.

    Talk 1 by Alexis Toumi
    Title: Language Processing on Quantum Hardware with DisCoPy
    Abstract: String diagrams are a mathematical tool that describes the information flow both in categorical quantum mechanics (CQM) and natural language processing (NLP). DisCoPy (https://github.com/oxford-quantum-group/discopy) is a Python implementation of string diagrams, which we used to translate the grammatical structure of natural language onto the architecture of a quantum circuit: a proof-of-concept for quantum natural language processing (QNLP). I’ll give a short introduction to the category theory behind DisCoPy, followed by a demo where I’ll teach a quantum computer how to say “Alice loves Bob”.
    Bio: I’m a PhD student at the Quantum Group of the Computer Science department in Oxford, under the supervision of Prof. Bob Coecke. I work at the intersection of natural language processing, quantum computing and category theory (a.k.a. general abstract nonsense). I’m also a Python programmer with some background in data science and deep learning for medieval paleography.

    Talk 2 by Procheta Sen
    Title: Socially Responsible AI: Cognitive Bias-Aware Multi-Objective Learning
    Abstract: Human society has had a long history of suffering from cognitive biases leading to social prejudices and mass injustice. The prevalent existence of cognitive biases in large volumes of historical data can pose a threat of being manifested as unethical and seemingly inhumane predictions as outputs of AI systems trained on such data. To alleviate this problem, we propose a bias-aware multi-objective learning framework that, given a set of identity attributes (e.g. gender, ethnicity etc.) and a subset of sensitive categories of the possible classes of prediction outputs, learns to reduce the frequency of predicting certain combinations of them, e.g. predicting stereotypes such as "most blacks use abusive language" or "fear is a virtue of women". Our experiments, conducted on an emotion prediction task with balanced class priors, show that a set of baseline bias-agnostic models exhibit cognitive biases with respect to gender, such as that women are prone to be afraid whereas men are more prone to be angry. In contrast, our proposed bias-aware multi-objective learning methodology is shown to reduce such biases in the predicted emotions.
    Reference: https://arxiv.org/pdf/2005.06618.pdf

  • PyData Berlin August Meetup

    Online event

    Welcome to the August Virtual Meetup. The talks will start at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call.

    Talk 1 by Pan Kessel
    Title: Can explanations be trusted?
    Abstract: Explanation methods in Machine Learning are on the rise. This is unsurprising as they promise to provide a tool to make black-box algorithms transparent. This, in turn, can lead to increased trust and reliability. Furthermore, explanation methods are very simple to deploy as they are now integrated in standard deep learning libraries. In this talk, however, I will demonstrate that explanations have to be considered with care. This is because they can be easily manipulated to closely reproduce an almost arbitrary target explanation. The underlying mechanisms for this surprising degree of manipulability can be theoretically understood using the mathematics of the General Theory of Relativity, i.e. Differential Geometry.
    Reference: https://papers.nips.cc/paper/9511-explanations-can-be-manipulated-and-geometry-is-to-blame
    Bio: Pan Kessel is a member of the machine learning group at Technische Universität Berlin. He received his PhD in String Theory at the Max Planck Institute for Gravitational Physics. His main research interests currently are theoretically grounded explainable AI, generative models and their application to quantum physics, and the theory of learning.

    Talk 2 by Manas Gaur and Kaushik Roy
    Title: Knowledge-infused Statistical Learning for Social Good Applications
    Abstract: Humans are able to provide symbolic knowledge in structured form for potential use by an AI system in learning human-desirable concepts. In clinical settings, for instance, prediction of patient outcomes by an AI can be guided by knowledge from patient history. This history contains concepts such as treatment information, observational and drug-related information, mental health condition, and severity of disease/disorder. Additionally, there is often a certain graphical structure to the knowledge among the concepts, for example, "patient symptoms cause certain tests to be taken", which in turn affects prescription of medication. This type of structure between human-interpretable concepts contained in knowledge can guide the AI to an informed prediction.
    References: http://kidl2020.aiisc.ai/ http://wiki.aiisc.ai/index.php/Main_Page
    Bios: Manas Gaur is currently a Ph.D. student in the Artificial Intelligence Institute at the University of South Carolina. He has been a Data Science and AI for Social Good Fellow with the University of Chicago and Dataminr Inc. His interdisciplinary research, funded by NIH and NSF, operationalizes the use of Knowledge Graphs, Natural Language Understanding, and Machine Learning to solve social good problems in the domains of Mental Health, Cyber Social Harms, and Crisis Response. His work has appeared in premier AI and Data Science conferences (CIKM, WWW, AAAI, CSCW), journals in science (PLOS One, Springer-Nature, IEEE Internet Computing), and healthcare-specific meetings (NIMH MHSR, AMIA). Personal webpage: https://manasgaur.github.io/
    Kaushik Roy is currently a Ph.D. student in the Artificial Intelligence Institute at the University of South Carolina. He completed his master's in Computer Science at Indiana University Bloomington and has worked at UT Dallas’s starling lab. His research interests include Statistical Relational Artificial Intelligence, Knowledge Graphs, Machine Learning, and Reinforcement Learning. His work has been featured at reputed conferences (IEEE, KR).

  • PyData Berlin July Meetup

    Online event

    Welcome to the July Virtual Meetup - this time cross-linked with PyData Jeddah https://www.meetup.com/PyData_Jeddah/ The talks will start at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call. We have two great speakers for the upcoming meetup, as well as an informative PyData break in between.

    The schedule for the evening:

    Talk 1: 35 mins
    Title: Quantum Machine Learning for Programmers
    Speaker: Dr. Maria Schuld
    Abstract: Algorithms that run on quantum computers - so-called quantum circuits - obey different laws of information processing than conventional computations. By optimizing the physical parameters of quantum circuits we can use them like neural networks, and train circuits to generalize from data. This talk highlights different aspects of such "variational quantum machine learning algorithms", including their role in the development of near-term quantum technologies, their connection to classical machine learning, and strategies for fitting the quantum model to data. The theory will be illustrated by code examples from the Python-based open-source software framework "PennyLane" throughout the talk.
    Bio: Maria Schuld works as a researcher for the Toronto-based quantum computing start-up Xanadu, as well as for the Big Data and Informatics Flagship of the University of KwaZulu-Natal in Durban, South Africa. She received her PhD from the University of KwaZulu-Natal in 2017 for her work on the intersection of quantum computing and machine learning, which was published as the book "Supervised Learning with Quantum Computers" (Springer, 2018, co-authored by F. Petruccione). Besides her physics background, Maria has a postgraduate degree in political science and a keen interest in the interplay of emerging technologies and society.

    Talk 2: 35 mins
    Title: Application of NLP in the UK rail Industry
    Abstract: This talk will describe how the Python NLP ecosystem has been used to help with maintenance of lifts and escalators in UK stations. By the end of this talk, members will learn how to convert an unsupervised NLP problem to a semi-supervised one. Alongside this, the talk will explore how parallelism with Python multithreaded processing was used to spell-check and clean 150,000 maintenance reports. The talk will also explore when to build ML models and when to use good old pure Python for solving NLP problems.
    Bio: Ali Parandeh is a Chartered Engineer and Microsoft-certified data scientist (MCADSA) with 5 years of engineering consulting experience in the rail industry, and he is the founder of and an instructor at the Beginners Machine Learning Group in London. He uses data science and machine learning technologies to help transport clients make key decisions by developing data analytics and predictive solutions.

    ----------------------------------------------------------------------------------------------------
    NumFOCUS Code of Conduct
    THE SHORT VERSION
    Communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not acceptable.
    This meetup is a strictly professional meetup, and the default assumption is that people are not looking for dates/partners here.
    Sending an individual message on any digital platform should ideally be done with the recipient's explicit consent, and any message you attach to a connection request, or any conversation you have afterwards, should be kept strictly professional unless you have their explicit consent.
    NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity, and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form.
    Thank you for helping make this a welcoming, friendly community for all.
    https://numfocus.org/code-of-conduct

  • PyData June 2020 Virtual Meetup

    Online event

    6th meetup in 2020 and again we will make it remote! We will start with the talks at 19:00. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup, and there will be a YouTube live stream for those not on the Zoom call. We have two great speakers for the upcoming meetup, as well as an informative PyData break in between. Hope to see you all there.

    Talks:

    Karol Przystalski: Computer vision methods for skin cancer recognition
    Pattern recognition of images is one of the most popular approaches used in machine learning solutions supporting medical doctors. We show how to use image processing methods to do simple image analysis to find skin cancer cases. In the next step, we use neural networks and simple white-box methods to recognize skin cancer patterns on multilevel images.
    Karol obtained a PhD degree in Computer Science in 2015 at the Jagiellonian University in Cracow. He is CTO and founder of Codete, where he leads and mentors teams and works with Fortune 500 companies on data science projects. He built a research lab at Codete that is working on machine learning methods and big data solutions. He gives talks and workshops on data science in German and English, with a focus on applied machine learning. https://github.com/codete/PyData-Berlin

    -------------------

    Andrés Ruiz (https://www.linkedin.com/in/andres-ruiz-montanez): Introduction to Time Series Analysis and Forecast
    When dealing with time as an independent variable, it is valuable to comprehend the effects that the underlying relationships among the features and time may have on our analysis. This presentation aims to introduce the main concepts behind time series analysis and to provide the audience with a basic understanding of the processes and techniques. Initially, key definitions such as stationarity, patterns, autocorrelation, and lag variables will be explored. Afterward, there will be a brief overview of the differences and factors to take into consideration while preparing and modeling time-dependent data. Finally, there will be a high-level overview of the most relevant models, such as Naïve, AR/MA and ARIMAX, the traditional multiple regression approach, and the long short-term memory (LSTM) artificial recurrent neural network (RNN). The presentation focuses on the theoretical aspects of time series and will not cover the technical and implementation aspects in depth.
    Andrés is a Colombian architect, based in Berlin while pursuing an M.Sc. in Project Management and Data Science. He is a self-taught developer with more than 8 years of experience in the digital ecosystem and entrepreneurship. He is currently working as Chief Technology Officer at Aequales, a fast-growing company dedicated to providing tools for closing gender gaps in the workplace through technology and data. He is the lead developer of the Ranking PAR Platform, a tool for measuring the gender equality conditions of organizations in Latin America. The PAR Ranking provides each participating company with an internal report on gender equality and the ability to compare itself with more than 800 of the biggest companies across the continent and to see its progress over time.

    -------

    NumFOCUS Code of Conduct
    Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS. All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not appropriate. NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity, and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form. Thank you for helping make this a welcoming, friendly community for all. If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct

  • PyData May 2020 Virtual Meetup

    Online event

    Fifth meetup in 2020 and again we will make it remote! We will start with the talks at 7 pm. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup. We have two great speakers for the upcoming meetup.

    Talks:

    Alon Nir: Getting an Edge with Network Analysis with Python
    Networks are all around us. People, places, things and even ideas are interconnected in innumerable networks, and these can have a great (yet sometimes inconspicuous) impact on our lives. The purpose of this talk is to provide an introduction to network analysis and its importance, and present the basic building blocks for applied network analysis with Python (using the friendly NetworkX library). I hope this talk will encourage members of the audience to consider network analysis approaches in their line of work/research and intrigue them to learn more. The talk will weave theory and practice, and given the limited time compared to the breadth of the topic, the focus will lean towards intuitive understanding of concepts and seeing them in practice, over deep theoretical formulations and hardcore mathematics.
    Alon's Bio: I'm a London-based senior data scientist. Currently I am a data science lead at Deliveroo, working on Plus, the company's premium subscription service. Come geek out on data with me at https://www.linkedin.com/in/alonnir/ or twitter.com/alonnir

    -------

    Alan Akbik: The Flair Framework for Text Analytics and NLP Research
    Abstract: This talk gives an overview of the Flair Framework for Natural Language Processing (NLP). Its two main features are that (1) it is very easy to use and (2) it gets state-of-the-art accuracies on many NLP tasks. I'll give an introduction from the practitioner's side and show how Flair can be used for tasks such as Named Entity Recognition or Sentiment Analysis on your data, and show how you can train your own models. I'll also briefly cover research aspects of the framework, such as learning word and sentence representations with neural language modeling, and discuss future directions.
    Alan's Bio: Alan recently joined the Humboldt-Universität zu Berlin as professor of machine learning. His group focuses on natural language processing (NLP), i.e. methods that enable machines to understand human language. His research is made available in the form of the open source NLP framework Flair, which allows anyone to use state-of-the-art NLP methods in their research or applications. Before that, he spent many years in industrial research labs, first at IBM Research in California, then at Zalando Research in Berlin. He completed his PhD in 2015 at the TU Berlin, vowing to never return to academia again. Go ahead and check out https://github.com/flairNLP/flair

    -------

    Lightning talks: Lightning talk slots are still open. Send us a message on Meetup, email us at [masked] or just ask us at the event if slots are still open.

    Looking forward remotely to seeing you at the May meetup! We hope this all works out for everybody.

  • PyData April 2020 Virtual Meetup

    Online event

    Fourth meetup in 2020 and this time we try to make it remote! We will start with the talks at 7 pm. After a lovely experience in March we will have another virtual meetup. The link to the Zoom meeting will be sent to all attendees about an hour before the meetup.

    Talks:

    Georgios Ntanakas & Gaiar Baimuratov: String Similarity Matching via Distributed Cloud Computations
    This talk is about how Applift tackled the problem of matching fuzzy app names with apps in official app stores at a large scale. To do so, an algorithm that employs TF-IDF vectorization and cosine similarity was deployed on Kubernetes, using Dask to parallelize computations.

    -------

    Eoin Murray: Managing organizational data-science knowledge remotely
    In this talk I will go through the benefits and setup of a system for sharing knowledge in a company or research group. I will talk about how to structure a process of using Jupyter notebooks to run experiments and then share results with all stakeholders, including non-technical as well as technical members of the team.
    Speaker Bio: Currently co-founder of Kyso; previously founded Rinodrive and sold it in 2019 to Integumen. Built quantum computers when I was a Marie Curie fellow at Cambridge & Toshiba. Before that I worked at Tyndall, stochastically modelling quantum systems.

    -------

    Lightning talks: Lightning talk slots are still open. Send us a message on Meetup, email us at [masked] or just ask us at the event if slots are still open.

    Looking forward remotely to seeing you at the April meetup! We hope this all works out for everybody.
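
    For readers unfamiliar with the technique named in the string similarity talk above, here is a minimal sketch of TF-IDF vectorization plus cosine similarity for fuzzy name matching. It is not the speakers' implementation: the character-trigram tokenization, the smoothed IDF formula, the function names and the sample app names are all assumptions made for illustration, and the Dask/Kubernetes distribution layer is not covered.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Split a lowercased, space-padded string into overlapping character n-grams."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(names, n=3):
    """Return one sparse TF-IDF vector (dict: n-gram -> weight) per name."""
    docs = [Counter(char_ngrams(name, n)) for name in names]
    # Document frequency: in how many names each n-gram occurs at least once.
    doc_freq = Counter(gram for doc in docs for gram in doc)
    total = len(docs)
    vectors = []
    for doc in docs:
        # Smoothed IDF (an assumption of this sketch) keeps all weights positive.
        vectors.append({
            gram: count * (math.log((1 + total) / (1 + doc_freq[gram])) + 1)
            for gram, count in doc.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, catalog, n=3):
    """Return (score, name) for the catalog entry most similar to the query."""
    vectors = tfidf_vectors(catalog + [query], n)
    query_vec = vectors[-1]
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for vec, name in zip(vectors, catalog)
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical catalog and a misspelled query, purely for illustration.
    catalog = ["Candy Crush Saga", "Clash of Clans", "Subway Surfers"]
    score, name = best_match("candy crush sga", catalog)
    print(f"{name} ({score:.2f})")
```

    Character n-grams are used here because they tolerate typos better than whole-word tokens; at scale, each query's comparison is independent of the others, which is what makes this kind of workload straightforward to parallelize.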
