Hands-on Lab: Big Data Analysis Tools - Algorithmic Approach

Time: Saturday 23.

The idea is to do real programming with big data analysis tools such as Hadoop and Scala, including Scalding.

However, we will focus on the algorithmic side of map-reduce rather than on the nitty-gritty of Hadoop and Scalding. We'll look at algorithms starting with word count and simple statistical analysis (mean, standard deviation, ...), then graph algorithms and social network analysis, and finally machine learning.
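As an illustration of the "simple statistics" end of that range, here is a minimal sketch in plain Java (no Hadoop dependency) of how mean and standard deviation can be phrased as map and reduce steps. The class name `StatsSketch` and its methods are hypothetical and not taken from the lab material; the sketch assumes the population standard deviation and uses the sum-of-squares formula, which is simple but can lose precision on large or badly scaled data.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical partial aggregate that a mapper or combiner could emit for one input split. */
final class StatsSketch {
    final long count;
    final double sum;
    final double sumSquares;

    StatsSketch(long count, double sum, double sumSquares) {
        this.count = count;
        this.sum = sum;
        this.sumSquares = sumSquares;
    }

    /** "Map" side: summarise one split of the input as (count, sum, sum of squares). */
    static StatsSketch map(double[] split) {
        double s = 0.0;
        double sq = 0.0;
        for (double x : split) {
            s += x;
            sq += x * x;
        }
        return new StatsSketch(split.length, s, sq);
    }

    /** "Reduce" side: partial aggregates are merged by component-wise addition. */
    static StatsSketch reduce(List<StatsSketch> parts) {
        long n = 0;
        double s = 0.0;
        double sq = 0.0;
        for (StatsSketch p : parts) {
            n += p.count;
            s += p.sum;
            sq += p.sumSquares;
        }
        return new StatsSketch(n, s, sq);
    }

    double mean() {
        return sum / count;
    }

    /** Population standard deviation, computed as sqrt(E[x^2] - E[x]^2). */
    double stdDev() {
        double m = mean();
        // Clamp at zero to guard against tiny negative values from rounding.
        return Math.sqrt(Math.max(0.0, sumSquares / count - m * m));
    }

    public static void main(String[] args) {
        StatsSketch total = reduce(Arrays.asList(
                map(new double[] {2, 4, 4, 4}),
                map(new double[] {5, 5, 7, 9})));
        System.out.println("mean=" + total.mean() + " stddev=" + total.stdDev());
    }
}
```

Because the merge step is just addition, the splits can be processed in any order and on any number of machines, which is what makes this kind of statistic a comfortable first map-reduce exercise.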

Note that we are unlikely to have experts in those fields, only some facilitators, so don't expect anything very precise. Bear with us on this. The idea is really to try to implement some algorithms with map-reduce, however imprecise the result.

OBJECTIVE

At the end of the lab, participants will have coded non-trivial map-reduce algorithms. Non-trivial means iterative map-reduce algorithms, or problems such as classification and clustering from machine learning.
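To make "iterative map-reduce" concrete, the following is a minimal sketch, assuming one-dimensional points and in-memory arrays, of k-means written as a repeated map step (assign each point to its nearest centroid) and reduce step (average the points of each cluster). The class `KMeansSketch` and its method names are hypothetical, and a real Hadoop job would launch one map-reduce job per iteration from a driver instead of looping in memory.

```java
import java.util.Arrays;

/** Hypothetical one-dimensional k-means expressed as repeated map and reduce steps. */
final class KMeansSketch {

    /** "Map": index of the centroid closest to x (ties go to the lowest index). */
    static int nearest(double x, double[] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++) {
            if (Math.abs(x - centroids[i]) < Math.abs(x - centroids[best])) {
                best = i;
            }
        }
        return best;
    }

    /** One pass: assign every point to a cluster, then average each cluster. */
    static double[] iterate(double[] points, double[] centroids) {
        double[] sum = new double[centroids.length];
        int[] count = new int[centroids.length];
        for (double x : points) {
            int c = nearest(x, centroids); // map: emit (cluster index, point)
            sum[c] += x;                   // reduce: accumulate per cluster index
            count[c]++;
        }
        double[] next = new double[centroids.length];
        for (int i = 0; i < next.length; i++) {
            // An empty cluster keeps its previous centroid.
            next[i] = count[i] == 0 ? centroids[i] : sum[i] / count[i];
        }
        return next;
    }

    /** Driver: repeat the pass until the centroids stop moving or maxIterations is reached. */
    static double[] fit(double[] points, double[] initial, int maxIterations) {
        double[] current = initial.clone();
        for (int it = 0; it < maxIterations; it++) {
            double[] next = iterate(points, current);
            if (Arrays.equals(next, current)) {
                break;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[] points = {1, 2, 3, 10, 11, 12};
        double[] centroids = fit(points, new double[] {0, 5}, 20);
        System.out.println(Arrays.toString(centroids));
    }
}
```

The part that makes the algorithm "non-trivial" in the sense above is the driver loop: the output of one map-reduce pass (the new centroids) is the input of the next.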

PROGRAMMING LANGUAGES & LIBRARY

  • We're going to code in Scala and Java. A typical application would have its driver written in Scala, while the mapper and reducer are still written in Java.
  • The Hadoop distribution used is Cloudera CDH4.
  • We may also play with Scalding or Scoobi (Scala APIs for Hadoop). You can just take the latest version of those.

PROBLEMS

Five algorithms are targeted, but the list is not limiting: if you want to go beyond it during the session, feel free to go ahead.

  • Word count, for warming up. This can be expanded to bigram count or N-gram count.
  • Naive Bayes.
  • Logistic regression.
  • K-means clustering and beyond.
  • Shortest path / triangle calculation.
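For the warm-up item, here is a minimal sketch in plain Java (no Hadoop dependency) of word count generalised to N-gram count. The class `WordCountSketch`, its methods, and the whitespace tokenisation are assumptions made for illustration; in a real job the grouping of identical keys is done by the framework's shuffle, which is only simulated here by collecting all emitted keys into one list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical word count / N-gram count written as a map step and a reduce step. */
final class WordCountSketch {

    /** "Map": emit one key per n-gram found in a line (n = 1 gives plain word count). */
    static List<String> map(String line, int n) {
        List<String> keys = new ArrayList<String>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return keys; // nothing to emit for blank lines
        }
        String[] tokens = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            StringBuilder key = new StringBuilder(tokens[i]);
            for (int j = 1; j < n; j++) {
                key.append(' ').append(tokens[i + j]);
            }
            keys.add(key.toString());
        }
        return keys;
    }

    /** "Reduce": every emitted key carries an implicit count of 1; sum them per key. */
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String k : keys) {
            Integer old = counts.get(k);
            counts.put(k, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Runs map over every line, then reduce over everything that was emitted. */
    static Map<String, Integer> count(List<String> lines, int n) {
        List<String> emitted = new ArrayList<String>();
        for (String line : lines) {
            emitted.addAll(map(line, n));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be");
        System.out.println(count(lines, 1));
        System.out.println(count(lines, 2));
    }
}
```

Note that in this sketch n-grams never span two lines, since each line is mapped independently; whether that is acceptable depends on the data set.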

Each problem will be introduced in a presentation. Each presentation goes into enough detail for participants to translate it into code.

REQUIREMENTS

- Basic Linux shell knowledge.

- Basic Java knowledge.

- A macOS/Linux laptop with JVM 6 and ssh installed.

- 2GB RAM and a wifi network card.

- A few GB of free disk space.

Windows support is possible, but macOS/Linux is better. If you plan to use Windows (nobody is perfect!), you need 4GB RAM and should be ready to install VirtualBox to run a Linux VM.

DATA

We will have some small data sets to play with for some algorithms, and bigger ones to get a feel for how the code behaves on more realistic data.

COST

Free of charge, of course.

We're looking for sponsors for the pizzas. If we don't find any, we will need to order pizzas and share the cost (to be settled on the day).

We will have one Amazon EC2 cluster shared by everybody. If you want to bring your own Amazon EC2 cluster, you're very welcome.

SCHEDULE

Here is the confirmed schedule:

09.30 - 10.00 Opening

10.00 - 11.00 Map Reduce Refresher and 1st Algorithms: Word count and Beyond (Paul de Schacht)

11.00 - 12.30 Installation & implementation of word count

12.30 - 13.45 Pizzas

13.45 - 14.45 Classification Algorithms (Mario Pastorelli)

15.00 - 16.00 Clustering (Nicolas Maillot)

16.05 - 16.25 Just Enough Scala to Survive (Tobo Atchou)

16.00 - 20.00 Machine Learning coding + pizza starting at 18.30

20.00 - 20.45 Introduction to Map Reduce Graph Programming (shortest path and minimum spanning tree) (Anwar Rizal)

20.45 - 21.00 Conclusion (Anwar Rizal)

21.00 - 22.00 Bootstrapping graph algorithms, free, home, coffee, finishing pizzas ...

22.00 End


  • Anwar R.

    I have uploaded the slides for the Clustering presentation given by Nicolas Maillot.
    Check here: http://files.meetup.com/2771782/meetup-clustering-nm.pdf

    March 16, 2013

  • Mario P.

    Hi guys, first of all thank you for coming: I really had a good time :-). Sorry for the delay but I had a lot of work to do during the last two weeks.
    BTW I put the source code of Naive Bayes on GitHub at https://github.com/melrief/NaiveBayesIMDB . This implementation is a little more powerful than the one I presented, in particular it supports smoothing, but it's based on the same idea.

    Thank you again and see you next time!

    March 9, 2013

    • Anwar R.

      Thank you for sharing this, Mario.

      March 10, 2013

  • Eric Djatsa Y.

    Thanks to the organizers and all the participants, I spent a nice Saturday ;-) Looking forward to the next meetup.
    PS: I couldn't find my small black umbrella branded "Pierre Cardin". I left it close to the door; did someone take it by mistake? ;-)

    February 25, 2013

  • Tobo A.

    So many laptop brands at the same location ;-) It was good to have brains at work on interesting algorithms the whole day. Nice presentations, good coding experience. Thank you all.

    February 25, 2013

  • Mark K.

    Great event thanks!

    February 24, 2013

  • A former member

    Thanks for the great event yesterday. It was a really nice, collaborative & productive Saturday!

    February 24, 2013

  • A former member

    Anwar, Nicolas & Tobo, thanks for organizing this event. I really enjoyed it.

    February 24, 2013

  • Patrick G.

    I forgot my umbrella next to the fridge. Did anyone pick it up?

    February 24, 2013

  • emil s.

    A nice experience. Thank you all for the presentations, I'm looking forward to meeting you again.

    February 24, 2013

  • Darren C.

    sorry guys, I have to cancel for this event :(

    February 22, 2013

  • Mario P.

    Guys tomorrow I'll go by car and I have three free seats in my car. If you need a lift and you live near Golfe/Juan/Antibes, just let me know.

    February 22, 2013

  • Corinne K.

    I prefer to free my place for the waiting list.

    February 21, 2013

  • Kris K.

    Guys ! How much Scala knowledge is needed (hello-world-beginner/mid-level/scala-guru) ?

    February 19, 2013

    • Nicolas B.

      Hi Kris ! No skill in scala required. You'll be provided with the minimum required during the hands on lab.

      February 19, 2013

  • Thomas G.

    Not available this week-end unfortunately ...

    February 14, 2013

  • A former member

    Guys, I'd really like to be a part of this meetup. I am a data mining and machine learning enthusiast currently working at a research institution in Sophia. Is it still possible for me to be a part of this hands-on lab? I'd really like to be a part of it.

    February 11, 2013

  • Sergiu O.

    I will be here

    February 7, 2013

  • Eric Djatsa Y.

    Yes, I will be there, it's a great opportunity for me to step back into big data.

    February 3, 2013

  • Thomas G.

    I'll try to be there!

    January 26, 2013

  • alexandre z.

    Tentatively yes, if weekend duties do not arise.

    January 25, 2013

  • A former member

    maybe

    October 19, 2012

  • Filippo P.

    I will try. I will be leaving a few days after, so I cannot guarantee it.

    September 29, 2012

22 went
