Spark ETL and Canon for NLP

Name: Spark ETL and Canon for NLP
Start: 2019-05-29T18:00:00-04:00
End: 2019-05-29T21:00:00-04:00
Location: HealthVerity

Hosted By

Ramaa N. and Michael B.

Details

Get ready for another scintillating evening of presentations on Data Science.

The Dos and Don'ts of Spark ETL

Apache Spark is a fantastic framework for large-scale data processing. It can be used to transform and prepare structured data quickly and efficiently. However, it's not a drop-in replacement for a traditional RDBMS. Learn concepts, tips, tricks, and pitfalls of using Apache Spark for your data processing needs.

Speaker: Ilia Fishbein is Director of Software Engineering at HealthVerity, managing the Logistics team. His background is in parallel computing, map-reduce, and distributed data processing. Ilia and his team routinely transform hundreds of gigabytes of healthcare data.

Canon for NLP

Canon is a datastore for natural language documents that currently supports sentence segmentation, tokenization, part-of-speech tagging, and named entity recognition, with dependency parsing coming soon. It comes with an extensible scraper that does machine learning engineering on text streams in flight. It can be deployed at the press of a button.

Speaker: Alex Tecce is a Data Architect at MachineQ, working on storing and deriving insight from time series data from a range of IoT devices. He is also interested in NLP, both academically and in industry, and will be showing his pet project, Canon.

_____________________________________________________
Event Sponsors:
HealthVerity (https://healthverity.com) - HealthVerity offers a cloud-based platform to discover, license, and link HIPAA-compliant and de-identified healthcare data.
Revzilla (http://revzilla.com/) - A global eCommerce retailer providing motorcycle enthusiasts with premium apparel, accessories and parts for any riding adventure.

We are thankful to the event sponsors for their generous support of DataPhilly! If you're interested in sponsoring future events, please fill out our form: https://goo.gl/JLVfqh

Data Science Philadelphia (DataPhilly)

1818 Market St, 9th Floor · Philadelphia, PA
