Turning story-based insights into meaningful BIG DATA solutions
Details
Target Audience: Developers, Software Engineers, Data Engineers, Algorithm Developers
Abstract:
PayPal is the leading online payments system, with more than 165 million active consumers globally and a presence in 203 countries. Every day, PayPal processes 11 million payments and more than 2 billion events.
PayPal’s Risk organization is a world leader in fraud prevention, securing hundreds of millions of users globally as they shop safely online, using state-of-the-art machine learning algorithms and huge amounts of data in real time.
- “I’m not a number, I’m a person” -
Many PayPal customers have more than one account. In order to reach a decision regarding customer transactions, I need to be able to look at all accounts that belong to the same real-world person. However, matching users poses a series of analytical and technical challenges.
In this session, we’ll cover the lessons learned from building a graph-based entity resolution system and discuss PayPal Risk’s unique development methodology, in which analysts and developers work side by side.
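As a rough illustration of what graph-based entity resolution can look like, the sketch below treats accounts as nodes, links two accounts when they share an attribute value, and reports each connected component as one real-world person. It is not PayPal’s system: the attribute names, the account ids, and the rule "any shared attribute means same person" are assumptions made for the example, and a production system would need far more careful matching.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge linked accounts into one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited node straight at the root.
        while self.parent[item] != root:
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def resolve_entities(accounts):
    """Group account ids that are connected through shared attribute values.

    `accounts` maps an account id to a set of (attribute, value) pairs.
    Returns a sorted list of sorted account-id groups.
    """
    uf = UnionFind()
    first_seen = {}  # (attribute, value) -> first account that carried it
    for account_id, attributes in accounts.items():
        uf.find(account_id)
        for attribute in attributes:
            if attribute in first_seen:
                uf.union(first_seen[attribute], account_id)
            else:
                first_seen[attribute] = account_id

    groups = defaultdict(set)
    for account_id in accounts:
        groups[uf.find(account_id)].add(account_id)
    return sorted(sorted(group) for group in groups.values())
```

Given hypothetical accounts where `acct_1` and `acct_2` share a device and `acct_2` and `acct_3` share a card, all three end up in one group even though `acct_1` and `acct_3` have nothing directly in common. That transitive linking is the main reason to model the problem as a graph.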
- Challenges in developing and deploying a scikit-learn-based classification algorithm on a 1,000+ node cluster
Until recently, PayPal Risk classified PayPal sellers’ websites with a home-grown classification algorithm developed in 2010. However, data mining and ML have advanced rapidly in recent years.
SVM (support vector machine) was chosen as the new classification algorithm, run through scikit-learn, a Python-based ML library.
In this session, we’ll go over the problem we set out to solve by replacing the home-grown algorithm, and the technical and analytical challenges we faced in developing and deploying a Python-based classification algorithm on a 1,000+ node cluster.
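For readers unfamiliar with scikit-learn, a minimal sketch of an SVM text classifier follows. It assumes website text is turned into TF-IDF features and fed to a linear SVM; the abstract does not say which features or SVM variant were used, and the category names and training snippets are invented toy data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, made-up examples standing in for text scraped from seller websites.
TRAIN_TEXTS = [
    "cheap flights and hotel booking deals",
    "book your holiday flight and hotel today",
    "smartphones laptops and phone accessories",
    "buy laptop chargers and phone cases online",
]
# Hypothetical category labels, not PayPal's real taxonomy.
TRAIN_LABELS = ["travel", "travel", "electronics", "electronics"]


def build_site_classifier(texts, labels):
    """Fit a TF-IDF + linear SVM pipeline on labelled website text."""
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    model.fit(texts, labels)
    return model


model = build_site_classifier(TRAIN_TEXTS, TRAIN_LABELS)
predictions = model.predict(["discount hotel booking", "new phone cases"])
```

Wrapping the vectorizer and the classifier in one pipeline means a single fitted object can be serialized and shipped to worker nodes, which tends to simplify deployment on a cluster.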
- “If only pigs could fly” – PayPal’s contribution to the Pig-Eclipse plugin, and other examples of PayPal’s contributions to the open source community
What you will learn:
· How PayPal uses behavioral analytics to transform data into insights and observations about people, businesses, devices and their habits
· How to deploy and run a Python-based ML algorithm on Hadoop
· Job orchestration and flow management for big data applications
· QA for big data applications
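One common way to run Python code on Hadoop is Hadoop Streaming, where a mapper script reads records from standard input and writes tab-separated key/value pairs to standard output. The abstract does not say which mechanism PayPal used, so the sketch below only illustrates that general pattern. The input layout (`site_id<TAB>text`) and the `classify` rule are hypothetical stand-ins for real data and a trained model.

```python
import sys


def classify(text):
    """Placeholder for a trained model's prediction (hypothetical rule)."""
    return "travel" if "flight" in text.lower() else "other"


def map_lines(lines, predict=classify):
    """Yield 'site_id<TAB>label' for each 'site_id<TAB>text' input line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # Skip blank records instead of failing the task.
        site_id, _, text = line.partition("\t")
        yield f"{site_id}\t{predict(text)}"


def main(stream=sys.stdin, out=sys.stdout):
    """Entry point a streaming mapper script would call."""
    for record in map_lines(stream):
        out.write(record + "\n")
```

A mapper script would end with a call to `main()`. Keeping the per-line logic in `map_lines` lets it be unit-tested on a plain list of strings without a cluster, which is also relevant to the QA topic above.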
Presenter bios:
o Chen Kovacs - https://il.linkedin.com/in/chenkovacs
o Lior Ebel - https://www.linkedin.com/in/liorebel
o Eyal Allweil - https://il.linkedin.com/in/eyalallweil
