
New Meetup: Discussion: Parallel Online Machine Learning for Big Data

From: Monica
Sent on: Sunday, August 1, 2010 5:28 PM
Announcing a new Meetup for Bay Area Artificial Intelligence Meetup Group!

What: Discussion: Parallel Online Machine Learning for Big Data

When: Sunday, August 8, [masked]:00 PM

Where: TechShop
120 Independence Dr.
Menlo Park, CA 94025

Moore's Law is the observation that computers get larger, faster, and cheaper every year. Larger and faster means we can economically attack larger and more complex problems. But "cheaper" means we can attack problems even larger than that by using clusters, server farms, and "cloud computing"... at least as long as the problems can be restated in ways that exploit parallelism. The algorithms used on clusters are often quite different from those used in traditional "single-thread" computing.

In a semi-related development, disk drives are getting larger, faster, and cheaper even more quickly than computers. At the same cost, you could (until recently) buy a computer twice as fast, or twice as much RAM, every eighteen months, but the current trend is that you can buy a disk drive twice as large after a mere twelve months. Today a one-terabyte disk drive (enough storage for the text of one million thick paperback books) can be had for $60.

The synergy of large clusters of cheap yet powerful computers managing very large data sets is providing the opportunity to develop new kinds of algorithms that exploit these capabilities. Google's MapReduce algorithm, and its Free and Open Source Software counterpart Apache Hadoop, are excellent examples of this. But we have barely scratched the surface.
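To make the MapReduce idea concrete, here is a minimal, single-process sketch of its three phases (map, shuffle, reduce) applied to word counting, the canonical example. This is purely illustrative: the function names are hypothetical, and a real MapReduce or Hadoop job would distribute each phase across a cluster.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: each document is processed independently, emitting (word, 1) pairs.
    Independence between documents is what makes this phase parallelizable."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, so each reducer
    sees every value for the keys it owns."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values; for word counting, sum them."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big clusters", "data beats algorithms"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["big"])   # 2
print(counts["data"])  # 2
```

Because the map and reduce steps only touch local data, a framework can run thousands of mappers and reducers in parallel; the shuffle is the only phase that moves data across the cluster.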

Machine Learning (ML) algorithms can also benefit from these capabilities. Peter Norvig of Google gave a presentation at this AI Meetup in December 2008 where he showed that in many different domains (ML and non-ML alike), simple algorithms using large amounts of data outperform older, more complex algorithms. Many such algorithms use Model Free Methods or are based on non-parametric models or other model-weak methods.

Some interesting questions, suggested by Luca Rigazio: "How, specifically, can we use these parallel computing and Big Data capabilities to do better (and possibly completely novel kinds of) Machine Learning? What kinds of algorithms should we in the Machine Learning community invest in, not only for data mining but also for representation and storage? How do these relate to biological learning, a process which is obviously highly parallelized?" He also refers us to an article in The Economist showing that in many domains, data is becoming available at an explosive clip, surpassing even the growth of disk drive and parallel computing capabilities.

We will spend this meetup discussing these questions (and any related issues you want to raise). Luca will briefly introduce the issue and we will then discuss it as a group.

RSVP to this Meetup:

Our Sponsors

  • Syntience Inc.

    AI research company. Provides video equipment, time, and web space
