February Meetup Night Hosted by Privacera

Hosted by Joseph Kambourakis

Details

This month’s meetup is sponsored by Privacera, whose data access governance platform breaks down data silos and simplifies data access control across on-premises and cloud data analytics services. Privacera’s Bill Brooks will walk through how the Privacera platform, based on Apache Ranger, enables fine-grained access control for data science and machine learning workloads on Apache Spark.

Our next talk is by Dan Nadler of Man Numeric, about NLP on Spark. Man Numeric is a quantitative asset manager. For most of our history, all financial data were relational and mostly numerical. In recent years, a wealth of text and language data has become available to help us create a more complete picture of the companies we are investing in, and to do so at scale. The goal of the framework that we’ve built is to make text data as easy to do research on as traditional financial data. It’s a nice example of Spark-native research software that provides an intuitive user experience and makes it easy and cheap to run and share NLP experiments. Find Man Group Alpha Technology on Twitter at @ManQuantTech and on GitHub at https://github.com/man-group/

Dan Nadler is an Engineer at Man Numeric where he has focused on building and improving tooling that helps researchers and portfolio managers conduct research and understand their portfolios. His recent focus has been on improving accessibility to unstructured data and distributed computing through an NLP research platform. Dan studied astrophysics and neuroscience at the University of Colorado at Boulder before receiving a Master’s degree in Finance from the University of Denver. Dan is a CFA® charterholder, and previously worked for Putnam Investments where he conducted research on long-term investing and portfolio construction.

Our last talk will feature Andre Mesarovic, who will talk about ONNX.
ONNX is a new, open, interoperable machine learning format sponsored by big vendors such as Microsoft, Facebook and Intel. ONNX decouples training from scoring with respect to language and framework: you train your model in your favorite framework and language, save it in ONNX format, and then run it on another framework. ONNX consists of two parts: the format spec itself (recently moved to the Linux Foundation) and a runtime (Microsoft recently open-sourced its implementation). We will discuss ONNX in general and its integration with MLflow, and see how well it actually works with several ML frameworks such as SparkML, scikit-learn, XGBoost, Keras and PyTorch.
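
For readers new to the idea, the train/save/score split described above can be illustrated with a toy sketch. This is not ONNX and does not use the ONNX API: the JSON layout, the "toy-linear" format name, and the function names train_linear, export_model and load_and_score are all made up for illustration. It only shows the two-part shape of the approach, a shared format plus a runtime that needs nothing from the training code.

```python
import json

# --- Training side (stands in for "your favorite framework") ---
def train_linear(xs, ys):
    """Fit y = w*x + b by ordinary least squares on one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return {"w": w, "b": b}

def export_model(params):
    """Serialize to a hypothetical interchange format (assumption, not ONNX)."""
    return json.dumps({
        "format": "toy-linear",
        "version": 1,
        "weights": [params["w"]],
        "bias": params["b"],
    })

# --- Scoring side (stands in for a separate runtime) ---
def load_and_score(serialized, xs):
    """Score inputs knowing only the format, not how the model was trained."""
    spec = json.loads(serialized)
    if spec["format"] != "toy-linear" or spec["version"] != 1:
        raise ValueError("unsupported model format")
    w = spec["weights"][0]
    b = spec["bias"]
    return [w * x + b for x in xs]

# Train on points from y = 2x + 1, export, then score elsewhere.
serialized = export_model(train_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
predictions = load_and_score(serialized, [4.0, 5.0])
print(predictions)
```

The scoring function never imports or calls the training code; the serialized string is the only thing the two sides share, which is the property the talk attributes to ONNX across real frameworks.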

Boston Data Technology (Boston Data Group/BDT)
Workbar Boston
24 School St 2nd floor · Boston, MA