Natural Language Processing and Big Data

For our April Meetup, we are excited to bring you an event themed around Big Data Week! We have two presenters talking about their work with very different, very large text data sets. First, Ben Bengfort from UMBC and Full Stack Data Science will be talking about how to use Python's NLTK and Hadoop Streaming to make sense of large text corpora. Then, Tom Rindflesch from the National Library of Medicine will talk about his group's work building a system to help medical researchers keep up with the flood of current and historical articles published on PubMed.

Notes:

  • Please check out the other events around Big Data Week DC, and follow the #bdw13 hashtag on Twitter!
  • We're very happy to have new Meetup DC NLP cross-listing this event! Welcome to folks coming from DC NLP! Members of DSDC interested in Natural Language Processing should definitely consider joining DC NLP.
  • We're back at GWU for this event.

Agenda:

  • 6:30pm -- Networking and Refreshments
  • 7:00pm -- Introduction
  • 7:15pm -- Presentations and discussion
  • 8:30pm -- Post presentation conversations
  • 8:45pm -- Adjourn for Data Drinks (Tonic, 22nd & G St., space reserved!)

Presentations:

Natural Language Processing of Big Data using NLTK and Hadoop Streaming

Many of the largest and most difficult-to-process data sets that we encounter in big data processing are not well-structured log data or database row values, but unstructured bodies of text. In recent years, Natural Language Processing techniques have accelerated our ability to stochastically mine data from unstructured text, and these techniques themselves require large training data sets to produce meaningful results. At the same time, the growth of distributed computational architectures and file systems has allowed data scientists to deal with large volumes of data; clearly there is common ground that can allow us to achieve spectacular results. The two most popular open source tools for NLP and distributed computing, the Natural Language Toolkit and Apache Hadoop, are written in different languages: Python and Java, respectively. We will discuss the methodology for integrating them using Hadoop's Streaming interface, which sends data to and receives data from mapper and reducer scripts via the standard file descriptors.
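
As a rough illustration of the Streaming pattern the abstract describes (not the presenter's actual code), a token-count job can be sketched as a pair of Python functions that consume text lines and emit tab-separated key/value records. The names `tokenize`, `mapper`, and `reducer` are hypothetical, and the whitespace tokenizer is a stand-in assumption so the sketch runs without NLTK; with NLTK installed, `nltk.word_tokenize` could take its place.

```python
import sys
from itertools import groupby


def tokenize(line):
    # Placeholder tokenizer (assumption): lowercase, then split on whitespace.
    # With NLTK available, nltk.word_tokenize(line) could be used instead.
    return line.lower().split()


def mapper(lines):
    """Emit one tab-separated "token<TAB>1" record per token."""
    for line in lines:
        for token in tokenize(line):
            yield f"{token}\t1"


def reducer(lines):
    """Sum the counts for each token.

    Input records must already be sorted by key, which is what the
    shuffle/sort phase provides between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for token, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{token}\t{total}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Streaming pipes records in on stdin and collects results from stdout.
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as wordcount.py, the same script can be checked locally by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`, which mimics the map, shuffle, and reduce stages before the script is handed to a cluster.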

Semantic MEDLINE: An Advanced Information Management Application for Biomedicine

Semantic MEDLINE integrates information retrieval, advanced natural language processing, automatic summarization, and visualization into a single Web portal. The application is intended to help manage the results of PubMed searches by condensing core semantic content in the citations retrieved. Output is presented as a connected graph of semantic relations, with links to the original MEDLINE citations. The ability to connect salient information across documents helps users keep up with the research literature and discover connections which might otherwise go unnoticed. Semantic MEDLINE can make an impact on biomedicine by supporting scientific discovery and the timely translation of insights from basic research into advances in clinical practice and patient care.

Bios:

Benjamin Bengfort is a Data Science consultant at Full Stack Data Science, and has used Machine Learning and Natural Language Processing techniques to determine textual complexity in large literary corpora. He is a PhD candidate in Computer Science, with a focus on NLP, at the University of Maryland, Baltimore County, and has a MS in Computer Science from North Dakota State University.

Please follow Ben on Twitter at @bbengfort!

Thomas Rindflesch has a Ph.D. in linguistics from the University of Minnesota and conducts research in natural language processing in the Lister Hill Center for Biomedical Communications at the National Library of Medicine. He leads a research group that focuses on developing semantic interpretation of biomedical text and exploiting results in innovative informatics methodology for clinical practice and basic research. Recent efforts concentrate on supporting literature-based discovery.

 


  • Brand N.

    Continuation of Previous
    6. We are planning to continue the discussion at the Graph Connect Conference, October 3-4, San Francisco, California, and the data science team work in the Graph Database Meetup, October 22, 2013 (use of Semantic Medline in multiple graph database tools) http://www.graphconnect.com/san-francisco/agenda-san-francisco/

    http://www.meetup.com/graphdb-baltimore/events/125172912/

    http://semanticommunity.info/Data_Science/Graph_Databases

    http://semanticommunity.info/Data_Science/Graph_Databases/Tutorial

    September 11, 2013

  • Brand N.

    Continuation of Previous
    4. We (George Strawn, Tom Rindflesch, and Brand Niemann) discussed how to move forward at lunch: Tom: mental health and cancer use cases; Brand: help from YarcData getting the Semantic Medline database running more fully; and George Strawn: more encouragement.

    5. YarcData got the Semantic Medline database into their Graph Computer in short order and worked with Tom to explore the two new use cases that were presented live yesterday, September 10, 2013, to a very enthusiastic and excited audience:

    http://semanticommunity.info/Data_Science/Cloud_SOA_Semantics_and_Data_Science_Conference (Slides to be posted today) The YarcData Graph Computer was in Wisconsin and the response time to new queries was very fast!

    September 11, 2013

  • Brand N.

    We had a very successful conference yesterday!

    Tom (NLM) and Tim and Erin (from YarcData) gave amazing presentations and demos.

    We are ready to come back to present our progress to the Federal Big Data Senior Steering Work Group with the story as follows:

    1. We formed the Data Science Team and presented what we planned to do at our 14th SOA for eGov Conference, October 2, 2012

    http://semanticommunity.info/Federal_SOA/14th_SOA_for_E-Government_Conference_October_2_2012

    http://semanticommunity.info/A_NITRD_Dashboard/Semantic_Medline

    2. We also presented to the IAC Emerging Technology SIG Meeting: Big Data Committee, November 27, 2012:

    http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee

    3. We presented our progress to the Federal Big Data Steering Work Group, January 15, 2013:

    http://semanticommunity.info/Emerging_Technology_SIG_Big_Data_Committee/Government_Challenges_With_Big_Data

    September 11, 2013

  • Tony O.

    For those interested in learning more about NLP with Python's NLTK and how to analyze text yourself, Ben is teaching an introductory workshop for Data Community DC on the subject. Check it out here - http://www.meetup.com/Data-Community-DC/events/123049612/

    2 · June 20, 2013

    • Janet D.

      Ben will also be teaching an online course called "Introduction to Analytics using Hadoop and R" at Statistics.com this fall. You can check out our text analytics courses, including NLP, taught by Dr. Nitin Indurkhya, here - http://www.statistics...

      June 20, 2013

  • Andrew M.

    Hey all- anybody know the type of presentation platform Bengfort used? Sure looked better than PowerPoint to me!

    April 24, 2013

  • Benjamin B.

    Really great to see such an active community of Data Scientists!

    April 24, 2013

  • Abhijit

    Ben's presentation, if you don't follow him on Twitter, is at http://www.bengfort.com/presentations/nlp-and-big-data-using-nltk-hadoop

    1 · April 24, 2013

  • Ayah Z.

    It was great but I was hoping to delve deeper into technical details especially in the second presentation.

    April 24, 2013

  • Raj V.

    Good presentations. Liked the second one better.

    April 24, 2013

  • Amrinder A.

    Great meetup!

    April 24, 2013

  • Renal B.

    Good

    April 24, 2013

  • Lloyd B.

    Two excellent presentations

    April 24, 2013

  • Harlan H.

    Glad to see so many people last night! Two things. Is there anyone interested in writing an event review (what happened, what did you get out of it) for the DC2 blog? Please get in touch. Also, Ben and Tom suggested resources related to this event here: http://datacommunitydc.org/blog/2013/04/resources-and-readings-for-big-data-week-dc-events/

    April 24, 2013

  • Eric

    Would love to get copies of slides and/or associated papers to help topics sink in.

    1 · April 24, 2013

  • Loren

    Is there video of this event?

    2 · April 23, 2013

    • Sean Moore G.

      We'd love to have that, but we need a tripod, maybe a better camera...

      April 24, 2013

  • Ron S.

    Good event - came after another meetup and was hoping to shout out about TechBreakfast (http://www.meetup.com/techbreakfast), but didn't have a chance. See you all there?

    April 24, 2013

    • Sean Moore G.

      Sorry you didn't get a shout out about TechBreakfast, interested in cross-promotion?

      April 24, 2013

  • freddie s.

    1st talk was fantastic - 2nd was not up to par - functionality of course matters, but DSDC and similar meet-ups are essentially about the technology

    2 · April 23, 2013

  • DAVID A.

    Sorry a conflict came up, hope to make the next meeting.

    April 23, 2013

  • Greg B.

    Apologies for the late cancellation but I have to work on an important project.

    April 23, 2013

  • Lael C.

    I'm really sorry, I have a family emergency and need to get straight home. Will definitely have to make the next Big Data week

    April 23, 2013

  • Tom Z.

    would really like to attend, but schedule conflict

    April 23, 2013

  • Krishna A.

    I am driving from Columbia (near the 175 and 95 intersection). If anyone (3 max) wants to carpool with me, let me know. It takes roughly 50 mins from here, so I want to start at 5PM.

    April 23, 2013

    • Raj V.

      I can come to 7/11 or Trader Joes or the Farmers Market, whatever is good with you. Please let me know. Thanks.

      April 23, 2013

    • Krishna A.

      Hi Raj, Sure. we can carpool. I emailed you my numbers. Send me your numbers too.

      April 23, 2013

  • Ross

    Am needed at home; Big Data will have to wait! Have a blast!

    April 23, 2013

  • A former member

    Very sorry to have to miss it as well as for the same day cancel !!

    April 23, 2013

  • Paul T.

    and for those who are driving in - what's the best place to park?

    April 23, 2013

    • Trang P.

      I used to take classes at GWU and I think after 5, many parking garages have a discounted rate of $5 flat. I know of one particular garage on K street between 20th and 21st that does that, but it'll be a 4-5 block walk.

      2 · April 23, 2013

  • Erina H.

    I will come from Baltimore that day. Does anyone live or work in Baltimore, too?
    I was told the traffic will be very bad after 5 p.m., when I get off work.
    I am thinking about taking the train or something. I would highly appreciate any advice!
    Thank you!

    April 18, 2013

    • Erina H.

      It is near the Johns Hopkins Homewood campus

      April 23, 2013

    • Erina H.

      Thank you Briggs. My boss allows me to leave Baltimore before 4 p.m. Do you think that is a good time to drive? I am concerned about whether I can find a parking spot at the train station, since my friends told me that parking is not allowed there on weekdays :(

      April 23, 2013

  • Erina H.

    My boss allows me to leave at 4 p.m. to go from Baltimore to DC. Is that a good time to drive?

    Also, can anybody advise about parking near our meeting location?

    Thank you!!!

    April 23, 2013

  • michael k.

    Work conflict arose.

    April 23, 2013

  • A former member

    Conflicting obligation.

    April 22, 2013

  • Clay M.

    I'm sad to say that I cannot make it tomorrow.

    April 22, 2013

  • David H.

    Schedule conflict with civic assoc. annual meeting.

    April 22, 2013

  • Ramesh C.

    conflict with the Vienna meet. Going to miss this. But wanted to give somebody else a chance.

    April 22, 2013

  • Nevin H.

    Sadly, meeting conflict

    April 22, 2013

  • Erina H.

    Hi, I found out that this seminar is at an intermediate level. Will it work for someone at the entry level?

    Thank you!!!

    April 18, 2013

    • Erina H.

      Thank you Benjamin!

      April 18, 2013

    • Frank S.

      If you'd like to see an example of NLP in action, google "watson jeopardy" or attend our seminar about using NLP and Analytics in Healthcare on 4/25. more at ibm.com/ascdc

      April 18, 2013

  • Frank S.

    If you are looking for something less technical and more applied, consider our upcoming seminar called "Making Better Healthcare Decisions with IBM Watson (which uses NLP) and Advanced Analytics". Thurs, Apr 25, 9 am. Sign up at www.ibm.com/ASCdc - Free and open to the public.

    April 18, 2013

  • Erina H.

    I am a business analyst who wants to explore more in the data area!

    April 17, 2013

  • A former member

    I work for Esri, Inc. I have a background in computer science. I use ArcGIS professionally, and Weka, LibSVM, and Matlab for machine learning. I'm interested in how data can be gathered, analyzed, and displayed in an intelligent way. I'm interested in how raw sensor data can be interpreted by computer models and displayed visually (3-D or 2-D maps). I'm interested in LIDAR, and in simultaneous localization and mapping.

    April 11, 2013

  • A former member

    Data geek. I've played around with NLTK before, love Python, and I'm interested in using semantic technologies with the medical field, so this meetup is pretty much nirvana.

    March 27, 2013

  • Candice J.

    GW will have a representative from the F. David Fowler Career Center on hand (6:30 - 7p) for those interested in developing a relationship. Come find out how to help your org engage in on-campus recruiting, posting jobs, career fairs, panels, etc.!

    1 · March 25, 2013

  • Zhaoyang C.

    See you guys there

    March 20, 2013

  • Justin J.

    Looking forward to the presentation on Natural Language Processing and Big Data!

    March 15, 2013

  • A former member

    Looking forward to it!

    March 15, 2013

  • A former member

    I look forward to the meeting.

    March 15, 2013

  • Masahiko

    this looks interesting!!

    March 15, 2013

  • Atul K.

    Bringing Jai Jaiprakash

    March 15, 2013

210 went
