Big Data Science Meetup Event

Name: Big Data Science Meetup Event
Start: 2012-05-12T13:30:00-07:00
End: 2012-05-12T16:30:00-07:00
Location: SGI (Eng Bldg., Baja Hall)

Hosted By

Shyam S.

Details

1:30 P.M. - 2:00 P.M. Networking

2:00 P.M. - 2:45 P.M.

Title: Condos in the Big Data Garden

Speaker: Jack Park, Senior Computer Scientist in the
areas of Artificial and Collective Intelligence

Abstract: The activity now known as Big Data
is, at once, a science, an art, and a social
enterprise, one of discovery. We present the
concept of a Knowledge Garden as a social and
knowledge-based infrastructure which offers the
opportunity to federate knowledge gained from
Big Data studies across all disciplines. We offer
a light sketch of a "condo in the garden" applied
to a biomedical aspect of Big Data.

Bio of Speaker: Jack Park is a computer scientist
working in the fields of artificial and collective
intelligence. He created, edited, and co-authored
the book XML Topic Maps: Creating and Using Topic
Maps for the Web, was a Ph.D. student researching
the topic of knowledge federation applied to
hypermedia discourse, and designs and builds
software platforms for knowledge gardening. He was
a research scientist at SRI International working
on their Cognitive Assistant that Learns and
Organizes (CALO) project, and authored and co-authored
several conference papers on the subjects of topic
mapping and semantic desktop applications for
collective intelligence. He is an avid player of
Jane McGonigal's IBIS card games.

2:45 P.M. - 3:00 P.M. Q&A

3:00 P.M. - 3:45 P.M.

Title: STORM and Real-time Machine Learning

Speaker: Ted Dunning, Chief Application Architect,
MapR Technologies

Abstract: Real-time processing is characterized by
more or less tight latency bounds on response to
requests. Batch processing systems like Hadoop have
historically had a very hard time dealing with these
constraints due to their fundamental nature. Recent
open source developments such as the Disruptor
toolkit ( http://code.google.com/p/disruptor/ ) and
ZeroMQ ( http://www.zeromq.org/ ) have provided the
basic components of real-time processing, but
real-time processing has lacked a metaphorical
equivalent of Hadoop. Storm is intended to fill
exactly that role for real-time systems with response
requirements in the millisecond to second time range.

Ted will provide a basic introduction to Storm and
describe a simple aggregation application. In addition,
Ted will show how Storm can be used to do completely
real-time machine learning. In such a learning
application, the classification algorithm makes
decisions which affect which data the algorithm gets
to use as training data. The Bayesian Bandit is a
recent algorithm that provides state-of-the-art
performance but is exceedingly simple.
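
As a rough illustration of the feedback loop described
above, here is a minimal sketch that assumes "Bayesian
Bandit" refers to Thompson sampling with Beta priors over
yes/no rewards. It is not the speaker's code and does not
use Storm; the names BayesianBandit and simulate and the
reward rates are hypothetical, chosen for the example.

```python
import random


class BayesianBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms (illustrative)."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        # Start every arm with a uniform Beta(1, 1) prior on its success rate.
        self.wins = [1] * n_arms
        self.losses = [1] * n_arms

    def choose(self):
        # Draw one sample from each arm's posterior and play the largest draw.
        samples = [
            self.rng.betavariate(w, l) for w, l in zip(self.wins, self.losses)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Only the arm that was played receives new training data.
        if reward:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


def simulate(true_rates, steps=5000, seed=0):
    """Run the bandit against hypothetical Bernoulli arms; return pulls per arm."""
    rng = random.Random(seed)
    bandit = BayesianBandit(len(true_rates), seed=seed + 1)
    pulls = [0] * len(true_rates)
    for _ in range(steps):
        arm = bandit.choose()
        reward = rng.random() < true_rates[arm]
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Made-up success rates; the last arm is the best one.
    print(simulate([0.1, 0.5, 0.8]))
```

Each decision determines which arm yields the next
observation, so the algorithm's choices shape its own
training data. Sampling from the posterior, rather than
always playing the current best estimate, keeps some
exploration going while an arm is still uncertain.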

Bio of Speaker: Ted Dunning has been involved with a
number of startups with the latest being MapR Technologies
where he is Chief Application Architect working on
advanced Hadoop-related technologies. He is also a PMC
member for the Apache ZooKeeper and Mahout projects.
Opinionated about software and data-mining and passionate
about open source, he is an active participant in Hadoop
and related communities and loves helping projects get
going with new technologies.

3:45 P.M. - 4:00 P.M. Q&A

4:00 P.M. - 4:45 P.M.

Title: Reinventing Structure for Big Data Analytics

Speaker: Mark Davis, Founder and CTO, Kitenga Inc.

Abstract: A critical dimension of Big Data is the notion of data diversity where unstructured, semi-structured, and structured data converge. Making effective use of unstructured information requires mining enough structure from the data in order to enhance analysis and user interaction. In this talk, I will describe use cases for unstructured information analytics that span topics in intellectual property management, research discovery, and information analysis based on content analytics approaches that scale to Big Data technologies. Key approaches include cascaded finite state methods, conditional random fields, co-occurrence analysis, Hadoop/GPU-based classifiers, and random indexing for scalable metadata generation. With the generation of great metadata comes great responsibility, of course, and the talk will conclude with a discussion of reifying user analysis artifacts that effectively bind to rich unstructured metadata to solve real customer problems.

Bio of Speaker: Mark Davis is Founder and CTO of Kitenga, Inc., a Santa Clara-based startup that created the first commercial unstructured information analysis platform for Hadoop. Mark's background includes heading enterprise search at Microsoft and spinning companies out of Xerox PARC. In an example of Silicon Valley serendipity, he was also first hired by fellow speaker Ted Dunning in 1992 to investigate the nascent field of information retrieval for the US Intelligence Community. His R&D work has included forays into machine learning, computational linguistics, visualization, cognitive science, and evolutionary optimization, and he remains fascinated by so many topics that his work sometimes appears to suffer--but only in a positive way.

4:45 P.M. - 5:00 P.M. Q&A

5:00 P.M. - 5:30 P.M. Networking

Coffee/tea and light snacks will be available.

Events in Fremont 94538, CA