Making sense of Customer Feedback!! & Faster, Better Event Stream processing...
Details
Session-1: The site feedback link in the footer of most PayPal pages is a key conduit for customers to share their thoughts and concerns with us. FPTI is our de facto source for user behavior data. This talk describes how the two are combined to build a flexible analytics and reporting framework that processes these comments daily and provides invaluable reporting, analytics and insights to the business.
The feedback comments are first pre-processed and enriched with additional metadata such as NPS and product information. Text mining and cluster analysis are then employed to identify key themes and construct initial taxonomies. The text mining is based on a simple vector-space model in which an appropriate feature vector, including single words, n-grams and other metadata, is constructed for each comment. A ‘TF-IDF’ matrix is generated with a suitable weight function, and a singular value decomposition is applied to reduce its dimensionality. K-means clustering is then used to identify the ‘important’ taxonomies or comment themes.
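The steps above can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the sample comments, the function names, the smoothed IDF weighting, the farthest-point initialisation and the use of NumPy are all assumptions made for the example.

```python
import math
from collections import Counter

import numpy as np


def features(text):
    """Unigrams plus bigrams from a lowercased, whitespace-split comment."""
    words = text.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def tfidf_matrix(comments):
    """Build a row-normalised TF-IDF matrix (one row per comment)."""
    docs = [Counter(features(c)) for c in comments]
    vocab = sorted({term for doc in docs for term in doc})
    doc_freq = Counter(term for doc in docs for term in doc)
    n = len(docs)
    X = np.zeros((n, len(vocab)))
    for i, doc in enumerate(docs):
        for j, term in enumerate(vocab):
            if term in doc:
                # Assumed weight function: raw count times a smoothed IDF.
                idf = math.log((1 + n) / (1 + doc_freq[term])) + 1
                X[i, j] = doc[term] * idf
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, vocab


def reduce_svd(X, k):
    """Project each row onto the top-k singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def kmeans(Z, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        dists = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels


# Hypothetical comments, invented purely for illustration.
comments = [
    "cannot log in to my account",
    "log in page keeps failing",
    "refund not received for my order",
    "still waiting for refund on order",
]
X, vocab = tfidf_matrix(comments)
Z = reduce_svd(X, k=2)
labels = kmeans(Z, k=2)
```

On this toy input the two login comments fall into one cluster and the two refund comments into the other. A real pipeline would need proper tokenisation, stop-word handling and a principled choice of the number of dimensions and clusters.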
The reporting framework consists of email alerting and web-based charting, both with drill-down to the actual comments by product and comment theme. Pareto and time-series graphs are available online, with built-in statistical process control for exception alerting and trended regression.
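As one possible illustration of exception alerting via statistical process control, the sketch below applies a simple three-sigma rule to daily comment counts for a single theme. The counts, the function names and the choice of a three-sigma rule on a fixed baseline are assumptions; the talk abstract does not say which control-chart method is used.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Lower and upper control limits from a baseline window of daily counts."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return centre - sigmas * spread, centre + sigmas * spread


def exceptions(baseline, observed, sigmas=3.0):
    """Indices of observed values that fall outside the control limits."""
    lower, upper = control_limits(baseline, sigmas)
    return [i for i, value in enumerate(observed) if value < lower or value > upper]


# Hypothetical daily comment counts for one theme.
baseline_counts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 20]
new_counts = [21, 35, 19]

flagged = exceptions(baseline_counts, new_counts)
print(flagged)  # indices of days that would trigger an alert
```

Here only the second new day (35 comments) lies outside the limits, so it is the only one flagged for alerting.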
About the Speaker: Mark Scarr is currently a Statistician and sometimes Data Scientist at PayPal and has previously held similar positions at Intel Corp. and Yahoo Inc. As an avid consumer and advocate of PayPal FPTI data, he enjoys, amongst other things, exploring new ways to leverage the data and build cool data products for the business to use.
----------------
Session-2: A scalable, high-performance, locally persisted queuing library to reliably process heterogeneous event streams with minimal data loss.
While building a platform for ingesting streaming events from different sources into our data platform, we realised the need for locally persisted queues at each handover point to provide a high level of data reliability. Events were being routed via multiple tenants and networks, each of which had its own availability characteristics, causing some loss of data at several of these touch-points. To start with, we experimented with a JMS messaging framework as the relay for collecting the events, but a few issues caused us to rethink this approach. The result was our SnF (Store & Forward) framework, which we'll discuss during this session.
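The general store-and-forward idea can be sketched as below. This is not the SnF framework itself, whose design is not described here: the class name, the SQLite-backed storage and the method names are assumptions chosen to keep the example small and dependency-free.

```python
import json
import os
import sqlite3
import tempfile


class StoreAndForwardQueue:
    """Hypothetical locally persisted queue: store first, forward later."""

    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def store(self, event):
        """Persist the event locally before any attempt to send it on."""
        self.db.execute("INSERT INTO events (payload) VALUES (?)", (json.dumps(event),))
        self.db.commit()

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM events").fetchone()[0]

    def forward(self, send):
        """Send pending events in order; stop at the first failure.

        An event is deleted only after send() returns, so delivery is
        at-least-once: a crash between send and delete can cause a duplicate.
        """
        rows = self.db.execute("SELECT id, payload FROM events ORDER BY id").fetchall()
        sent = 0
        for event_id, payload in rows:
            try:
                send(json.loads(payload))
            except Exception:
                break  # downstream unavailable; keep the rest for a later retry
            self.db.execute("DELETE FROM events WHERE id = ?", (event_id,))
            self.db.commit()
            sent += 1
        return sent

    def close(self):
        self.db.close()


def unavailable(event):
    raise ConnectionError("downstream unavailable")


path = os.path.join(tempfile.mkdtemp(), "snf.db")
queue = StoreAndForwardQueue(path)
for seq in range(3):
    queue.store({"seq": seq})

sent_while_down = queue.forward(unavailable)  # nothing delivered, nothing lost
pending_while_down = queue.pending()
queue.close()

# Reopening the same file stands in for a process restart.
queue = StoreAndForwardQueue(path)
delivered = []
sent_after_recovery = queue.forward(delivered.append)
pending_after_recovery = queue.pending()
queue.close()
```

The events survive both the failed forwarding attempt and the simulated restart, and are delivered in their original order once the downstream hop accepts them. A production library would also need batching, back-pressure and duplicate handling, none of which are shown.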
About the Speaker: Geeta Iyer is currently a Software Engineer with the FPTI team @ PayPal and is responsible for the end-to-end ingestion pipeline, from where the events originate through to the analytics piece. Prior to joining PayPal, Geeta was part of Grid Data Management at Yahoo! Inc.
