
Introduction to Data Science on Big Data

Update: The location is CSI Annex, at 720 Bathurst, Room 6, which is on the ground floor.

Our presenter for this session is Chris Stephens, the Field Director for the Pivotal Data Science team. Chris has spent the last 10 years in the data analytics space, working with SAS (as Sr. Analytics Product Manager) and EMC. In this presentation, Chris will explain the general flow of a data science engagement, based on real-world experience in a few industries such as Telecommunications, Retail, Utilities and Financial Services. He will also review some of the tools employed by the Pivotal Data Science team: MADlib, SAS, R, Greenplum DB and Chorus.


  • David T.

    Enjoyed the presentation and discussion, particularly as the content covered what some established companies look at for incremental improvement.

    August 25, 2013

  • Dean


    August 23, 2013

  • Markhaus

    Does anyone have a good script for zero-copy transfer of larger datasets across a network, or any suggestions? How well does GridFTP work?

    August 20, 2013

    • Luke S.

      There are two ways to load large files: GPFDIST or Postgres COPY. GPFDIST requires an external table and does not go through the master node but rather through the segments. This can give you something like 3500MB/sec per rack (depending on the network, of course) and can scale by adding more segments. COPY, by contrast, can only do 40-80MB/sec and is not as scalable, as it goes through the master node only. It does not require an external table definition, though.

      August 22, 2013

    • Luke S.

      If you have more questions, please find me at the meetup. Happy to chat more.

      August 22, 2013
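
To put the throughput figures quoted in the reply above into perspective, here is a minimal back-of-the-envelope sketch. It only divides a dataset size by the quoted rates; the 1 TB dataset size and the helper name `load_time_seconds` are assumptions made for illustration, and real load times also depend on parsing, network and disk contention.

```python
# Rough load-time estimate from the throughput figures quoted in the thread.
# The 1 TB dataset size is an assumed example, not a number from the discussion.

def load_time_seconds(size_mb: float, throughput_mb_per_s: float) -> float:
    """Return the ideal transfer time, ignoring parsing and contention."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_per_s


DATASET_MB = 1_000_000  # assumed: 1 TB expressed in decimal megabytes

# Figures quoted in the thread (MB/sec).
GPFDIST_PER_RACK = 3500
COPY_LOW, COPY_HIGH = 40, 80

estimates = {
    "gpfdist (one rack)": load_time_seconds(DATASET_MB, GPFDIST_PER_RACK),
    "COPY at 80 MB/sec": load_time_seconds(DATASET_MB, COPY_HIGH),
    "COPY at 40 MB/sec": load_time_seconds(DATASET_MB, COPY_LOW),
}

for label, seconds in estimates.items():
    print(f"{label}: {seconds / 60:.1f} minutes")
```

Under these assumptions, the parallel path finishes in roughly five minutes, while the single-node path takes somewhere between about three and a half and seven hours.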

  • Luke S.

    Hi all,

    We will be having this meeting at CSI Annex.

    The closest subway station is Bathurst Station. Just get off the train and head south from Bloor. 720 Bathurst is on the east side of the street, just south of Lennox.

    Please arrive before 7pm, as the door locks after that time, making it trickier to get in.

    For parking, there is a lot at Lennox and Bathurst, as well as street parking in the area (read the parking signs closely, though).

    Looking forward to seeing you all there.


    August 19, 2013

  • William W.

    I am excited to learn about this and will be there

    August 19, 2013

  • Michael W.

    Looking forward to it

    August 19, 2013

37 went

Our Sponsors

  • Pivotal Inc

    Thank you Pivotal for providing our group with awesome guest speakers.
