addressalign-toparrow-leftarrow-rightbackbellblockcalendarcameraccwcheckchevron-downchevron-leftchevron-rightchevron-small-downchevron-small-leftchevron-small-rightchevron-small-upchevron-upcircle-with-checkcircle-with-crosscircle-with-pluscontroller-playcrossdots-three-verticaleditemptyheartexporteye-with-lineeyefacebookfolderfullheartglobegmailgooglegroupshelp-with-circleimageimagesinstagramFill 1linklocation-pinm-swarmSearchmailmessagesminusmoremuplabelShape 3 + Rectangle 1ShapeoutlookpersonJoin Group on CardStartprice-ribbonprintShapeShapeShapeShapeImported LayersImported LayersImported Layersshieldstartickettrashtriangle-downtriangle-uptwitteruserwarningyahoo

April Meetup.

  • Apr 18, 2012 · 6:00 PM
  • Pittsburgh Supercomputing Center

Shannon Quinn will be giving an introduction to Apache Mahout.



What is Apache Mahout? The Apache Mahout™ machine learning library's goal is to build scalable machine learning libraries. Mahout currently has
  • Collaborative Filtering
  • User and Item based recommenders
  • K-Means, Fuzzy K-Means clustering
  • Mean Shift clustering
  • Dirichlet process clustering
  • Latent Dirichlet Allocation
  • Singular value decomposition
  • Parallel Frequent Pattern mining
  • Complementary Naive Bayes classifier
  • Random forest decision tree based classifier
  • High performance java collections (previously colt collections)
  • A vibrant community
  • and many more cool stuff to come by this summer thanks to Google summer of code

With scalable we mean:

Scalable to reasonably large data sets. Our core algorithms for clustering, classfication and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms

Scalable to support your business case. Mahout is distributed under a commercially friendly Apache Software license.

Scalable community. The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases. Come to the mailing lists to find out more.

Currently Mahout supports mainly four use cases: Recommendation mining takes users' behavior and from that tries to find items users might like. Clustering takes e.g. text documents and groups them into groups of topically related documents. Classification learns from exisiting categorized documents what documents of a specific category look like and is able to assign unlabelled documents to the (hopefully) correct category. Frequent itemset mining takes a set of item groups (terms in a query session, shopping cart content) and identifies, which individual items usually appear together.


Join or login to comment.

  • Steve T.

    Thanks for the links, Todd. Hey, that was a great meetup yesterday. Thanks to the organizers and especially to Shannon.

    April 19, 2012

  • Todd K.

    Collaborative event listing (CMU, Pitt, others):

    April 18, 2012

  • Todd K.

    Possibly of interest?: tomorrow 4:30@CMU

    Machine Learning/Google Seminar
    Abstract: Annotation on the Cheap

    We consider the task of labeling an entire data set, given the ability to query the labels of specific points within it.

    Specifically, we define the "finite population annotation" (FPA) problem as follows. The input consists of: a set of n data points, each of which has an associated label that is initially missing; and parameters del

    April 18, 2012

10 went

People in this
Meetup are also in:

Sign up

Meetup members, Log in

By clicking "Sign up" or "Sign up using Facebook", you confirm that you accept our Terms of Service & Privacy Policy