TensorFlow on Apache Hadoop YARN and using CNN to detect cancer

Hosted by Chester Chen

Details

Talk 1: TensorFlow on Apache Hadoop YARN

TensorFlow™ is one of the most popular open source projects for machine learning and deep learning, and it can handle enterprise use cases such as image recognition, video analytics, and audio translation. However, training deep learning models is expensive and requires a lot of GPU resources. In addition, a real-life distributed TensorFlow application needs several services, such as workers, parameter servers, and TensorBoard, to work together. These services must be carefully configured so that they can talk to each other.
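
As a rough illustration of the configuration burden described above, the sketch below builds the JSON cluster description that TensorFlow reads from the `TF_CONFIG` environment variable, listing every worker and parameter server and naming the role of the current process. It is a minimal sketch and not material from the talk: the helper `make_tf_config` and the host names and ports are hypothetical, and each process in a real cluster would need its own correct copy of this value.

```python
import json


def make_tf_config(workers, ps, task_type, task_index):
    """Return a TF_CONFIG-style JSON string for one process in the cluster.

    make_tf_config is a hypothetical helper written for this example.
    """
    cluster = {"worker": list(workers), "ps": list(ps)}
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"no {task_type} with index {task_index}")
    # Every process sees the same "cluster" map but its own "task" entry.
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


# Hypothetical host names and ports, for illustration only.
WORKERS = ["worker-0.example.com:2222", "worker-1.example.com:2222"]
PARAMETER_SERVERS = ["ps-0.example.com:2222"]

if __name__ == "__main__":
    # The value the second worker would export as TF_CONFIG before starting.
    print(make_tf_config(WORKERS, PARAMETER_SERVERS, "worker", 1))
```

Every address in this map has to be known before any process starts, which is the kind of manual wiring that becomes tedious on a shared cluster.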

To let distributed TF applications be easily launched, managed, and monitored by YARN, we introduced YARN service assembly along with other improvements such as GPU support, container-DNS support, and scheduling improvements. With these improvements, running a distributed TF application on YARN is as simple as running it locally, which lets TF developers focus on deep learning algorithms instead of worrying about the underlying infrastructure. These improvements also let YARN better manage a shared cluster that runs TF alongside other services and batch jobs.

During this session, we will take a closer look at these improvements, and we will do a demo of running a distributed TF assembly which consists of workers, parameter servers, TensorBoard and prediction servers on YARN.

Speaker: Wangda Tan

Wangda Tan is a Project Management Committee (PMC) member of Apache Hadoop and a Staff Software Engineer at Hortonworks. His main area of work is the Hadoop YARN resource scheduler, where he has contributed to features such as node labeling, resource preemption, and container resizing. Before joining Hortonworks, he worked at Pivotal on integrating OpenMPI/GraphLab with Hadoop YARN. Before that, he worked at Alibaba, where he helped create a large-scale machine learning, matrix, and statistics computation platform using MapReduce and MPI.

Talk 2: Lightning Talk: Teaser talk on using a deep CNN to detect cancer

This lightning talk will be a quick intro to a main talk to be scheduled for next month.

Abstract:

Breast cancer is a leading cause of cancer death in women, accounting for 29% of all cancers in women within the U.S. Survival rates increase with earlier detection, giving pathologists and the medical world at large an incentive to develop improved methods for earlier detection. Currently, the primary driver of early detection is the analysis of tumor proliferation, the rate at which tumor cells grow. The most common technique for determining the proliferation speed is the mitotic count (mitotic index) estimate, in which a pathologist counts the dividing cell nuclei in hematoxylin and eosin (H&E) stained slide preparations to determine the number of mitotic bodies.
In this lightning talk, we give a teaser of our experience training a deep convolutional neural net for the task of detecting mitoses in high-resolution tumor slide images. Our approach makes use of techniques such as fine-tuning of pretrained models and model-bootstrapped data sampling, and is built with a hybrid TensorFlow/Keras setup.

Speakers: Mike Dusenberry, Madison J. Myers

Mike Dusenberry is a machine learning engineer at the IBM Spark Technology Center. He was on his way to an M.D. and a career as a physician in his home state of North Carolina when he teamed up with professors on a medical machine learning research project. A few years later in San Francisco, Mike is focused on deep learning algorithms and researching medical applications for deep learning.

Madison J. Myers is a data scientist at the Spark Technology Center, IBM. She received her BA from NYU, her first master's from King's College London, and her second master's from UC Berkeley. While Madison previously studied global political science and worked in a think tank focusing on food policy, she is now passionate about using data science for good, particularly in the health domain.

Agenda
6 -- 6:40 pm check-in and networking/light dinner
6:40 -- 6:50 pm Introduction and Announcement
6:50 -- 7:50 pm Main Talk + QA
7:55 -- 8:10 pm Lightning Talk + QA
8:15 -- 8:30 pm Networking
8:30 pm -- closing

SF Big Analytics
GoPro HQ
3025 Clearview Way · San Mateo, CA