Princeton Data Science - Shivani Patel, IntegriChain


Details
I'm pleased to announce our first speaker will be Shivani Patel, Director, Advanced Analytics, at IntegriChain.
IntegriChain (http://www.integrichain.com/) will host and provide pizza and refreshments.
Anyone interested in drinks after the event can join us at On the Border Mexican Grill & Cantina (https://www.google.com/maps/place/On+the+Border+Mexican+Grill+%26+Cantina/@40.316158,-74.658247,17z/data=!3m1!4b1!4m2!3m1!1s0x89c3e11530ca0bf9:0x331b07c2a135183b), across the parking lot from the IntegriChain (http://www.integrichain.com/) office.
Agenda
7:00-7:20 Pizza
7:20-7:30 Opening Remarks, David Stengle, Founder
7:30-7:35 IntegriChain, Kevin Leininger, CEO
7:35-8:35 Keynote, Shivani Patel
8:35-8:50 Q&A
8:50-9:00 Open Mike (one minute max, strict)
9:00 Meetup ends
Abstract
At IntegriChain, we deal with pharmaceutical supply chain data, namely channel commerce data. Because these data are used to track product through the value chain, they are inherently “big data”, both in transactional volume and in the challenges that are ubiquitous across similar datasets. Challenges we face daily include capture, reporting accuracy, storage, transfer, data quality, privacy, and analysis. There are three foundational data sets we deal with in the typical flow of product through the “channel”:
• Factory Sales Data:
Transactional records of product as they are shipped from the manufacturer to either downstream wholesalers or to direct point of sale accounts (pharmacies, hospitals, etc.)
• EDI 852 Data:
Transactional records of product as they are received by wholesalers. This data contains wholesaler-level information on shipments received, shipments out, and inventory.
• EDI 867 Data:
Transactional records of product as they are distributed by the wholesaler to downstream customers across all major channels of distribution (retail pharmacy, mail order, long term care, and non-retail).
While all three data sets present obstacles to us in their raw form, the focus of this discussion will be the EDI 867 data set. This data has become the cornerstone of our demand visibility applications, and although the challenges it presents may seem unique to us, they are not unique at all in the world of “big data”. There are two major challenges in the raw data structure itself. First, ~50% of the data is redacted, meaning it lacks transparency into the final location of shipments as they leave the wholesalers. Second, the data is not harmonized, meaning it lacks a static customer master. The bread and butter of our business is to address these two core challenges: we provide 100% visibility into this data, describing all shipments from the originating wholesaler down to an individual account.
We employ sophisticated modeling techniques to address the “blinding” aspect of the data. Using the transaction-level detail that is provided in the blinded records (such as transaction size, originating wholesaler, and final zip3-level destination), we have created a model that essentially assesses “goodness of fit” between the known characteristics of each blinded transaction and a known universe of potential final destinations. Our processes also employ a variety of more advanced statistical modeling techniques to scrub the data, ranging from K-means clustering, used to isolate the most similar transactions, to probabilistic geospatial predictive modeling, used to distribute aggregate downstream supplier volumes down to individual point-of-care accounts.
To address the second challenge, harmonizing the data at an account level, we’ve invested significant resources in master data management. Data elements are created and tracked at a transaction level, allowing us to trace transactions back to an individual point-of-care account that may be described in a myriad of ways across different receipts of the EDI 867 data.
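To make the “goodness of fit” idea concrete, here is a minimal Python sketch of scoring one blinded transaction against a universe of candidate accounts. It is not IntegriChain's model, which the abstract does not specify. The class names, the field names (`typical_units` in particular), and the scoring rule (exact match on zip3 and wholesaler, then a size-similarity ratio) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """A known potential final destination (hypothetical fields)."""
    account_id: str
    zip3: str              # first three digits of the ZIP code
    wholesaler: str        # wholesaler known to serve this account
    typical_units: float   # assumed typical order size, must be > 0


@dataclass(frozen=True)
class BlindedTransaction:
    """A redacted EDI 867-style record: destination known only to zip3."""
    zip3: str
    wholesaler: str
    units: float           # transaction size, must be > 0


def fit_score(txn: BlindedTransaction, acct: Account) -> float:
    """Return a score in [0, 1]; 0 means the account is ruled out."""
    # Hard constraints: the candidate must share the zip3 and wholesaler.
    if txn.zip3 != acct.zip3 or txn.wholesaler != acct.wholesaler:
        return 0.0
    # Soft constraint: ratio of smaller to larger size, 1.0 when equal.
    smaller = min(txn.units, acct.typical_units)
    larger = max(txn.units, acct.typical_units)
    return smaller / larger


def allocate(txn: BlindedTransaction, accounts: list[Account]) -> dict[str, float]:
    """Normalize fit scores into shares over the feasible accounts."""
    scores = {a.account_id: fit_score(txn, a) for a in accounts}
    total = sum(scores.values())
    if total == 0:
        return {}  # no feasible candidate in the known universe
    return {aid: s / total for aid, s in scores.items() if s > 0}


if __name__ == "__main__":
    candidates = [
        Account("A", "085", "W1", 100.0),
        Account("B", "085", "W1", 50.0),
        Account("C", "100", "W1", 100.0),  # wrong zip3, ruled out
    ]
    txn = BlindedTransaction(zip3="085", wholesaler="W1", units=100.0)
    print(allocate(txn, candidates))
```

The normalized shares can be read as a rough probability that each candidate account was the true destination. A production model would presumably use many more characteristics and a fitted statistical model in place of this hand-written ratio.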
Ultimately, as the structure of the core EDI 867 changes on a daily basis, we strive to ensure our processes are equipped with the latest in statistical modeling techniques as well as broader data management practices in order to achieve the highest level of reporting accuracy.
