Past Meetup

More Data vs. Better Data vs. Better Algorithms

110 people went

Details

Hi All,

We're looking forward to this event! If you can't attend, please update your RSVP to give up your spot; we unfortunately have limited capacity, and there's a long waiting list.

If possible, please arrive promptly at 6pm; one of our speakers has to catch a train and will be leaving at 6:20, so we'd like to kick his talk off right away.

Also, we will be adjourning to City Bar (http://www.yelp.com/biz/city-bar-boston) after the event, and Basho (http://basho.com) has very generously offered to pick up the tab. Be sure to thank Chris Meiklejohn, beer enthusiast and functional programmer extraordinaire, if you see him!

Thanks,

The organizers.

Confirmed speakers:

Speaker: Daniel Abadi (http://www.linkedin.com/pub/daniel-abadi/1/10a/175), Hadapt (http://www.hadapt.com)
Arguing for: Better Data
Session: As it becomes clear that "one size does not fit all" in database systems, there has been a proliferation of start-ups that create new data management platforms. This has resulted in a general industry trend to use "the right tool for the job" and deploy multiple database products for a variety of data management tasks within an organization. Unfortunately, this has often led to a chaotic data environment with data silos and a general lack of understanding of where data came from and what the right version of "the truth" is. Ultimately, this leads to bad, unreliable data and therefore bad, unreliable decisions. In these times, we need greater emphasis on data governance and provenance, and we need to eliminate data silos when they are not necessary.
Bio: Daniel is Chief Scientist of Hadapt, a recognized expert in database systems, and one of the inventors of the company's patent-pending Adaptive Analytical Platform™. Daniel received his PhD from Massachusetts Institute of Technology, where his dissertation on column-store database systems led to the founding of Vertica (recently acquired by Hewlett Packard). He is a recipient of a Churchill Scholarship, an NSF CAREER Award, a Sloan Research Fellowship, the 2008 SIGMOD Jim Gray Doctoral Dissertation Award, and the 2007 VLDB best paper award. In addition to serving as Chief Scientist at Hadapt, he serves as a faculty member in Yale University’s Computer Science department.

Speaker: Paolo Gaudiano (http://gaudiano.com/), Icosystem (http://icosystem.com/)
Arguing for: Better Algorithms
Session: It is often thought that the accuracy of a model depends heavily on data quality and quantity. However, the notion that numerical data are the only type of information needed to build an accurate model is flawed. We present a modeling approach that combines domain expertise and quantitative data to demonstrate that predictive models can be developed without quantitative data, and that in general any model built with both quantitative data and domain expertise will outperform models developed with either type of information alone. We will also mention real-world situations where this approach has been applied successfully.
Bio: Paolo Gaudiano is President and CTO of Icosystem, where he enjoys solving challenging business and technology problems for clients, while striving to ensure that Icosystem continues to be a stimulating, productive and fun company. He also serves as interim CEO of Infomous, Inc. and President of Concentric, Inc., two spinoffs created by Icosystem. After starting an academic career at Boston University, Paolo left his tenured position to pursue entrepreneurial opportunities with two start-ups, Artificial Life (as Chief Scientist) and Aliseo (as Founder and CEO). In 2001 he joined Icosystem, where he is able to nourish his multifaceted, interdisciplinary interests. He also continues to satisfy his passion for teaching through a position as Senior Lecturer at The Gordon Institute of Tufts University, and through a variety of speaking engagements. Paolo holds a B.S. in Applied Mathematics, an M.S. in Aerospace Engineering and a Ph.D. in Cognitive and Neural Systems.

Speaker: Christopher Bingham (http://www.linkedin.com/pub/chris-bingham/0/52/a04), Crimson Hexagon (http://www.crimsonhexagon.com/)
Arguing for: Better Algorithms on More Data
Session: Often, analyzing more and more data doesn’t improve your results: you just make the same mistakes at a larger scale. We’ll discuss several techniques that leverage the quantity of data, increasing accuracy as you scale. Big data can thus lead to better analysis, not just bigger analysis.
Bio: Chris Bingham is the CTO and first employee of Crimson Hexagon, a leading provider of business intelligence based on social media analysis.

Speaker: Jeremy Rishel (http://www.linkedin.com/pub/jeremy-rishel/1/412/949), Bluefin Labs (http://bluefinlabs.com/)
Arguing for: “D: All of the Above”
Session: At Bluefin Labs we analyze social TV at large scale, with 24/7 real-time systems looking at the content on over 100 networks and the conversation and audience dynamics about brands, advertising, shows, and more in public social media. The analytics derived about engagement patterns and audiences provide rich insights for brands, agencies, and TV networks. To do this we pursue “all of the above”: more data, better data, and better algorithms. “More data” comes in many forms, including richer content streams and more granular sources. By including the broadest spectrum of data we’re able to gain insights not possible in other ways. “Better data” in our world comes from a fundamental approach of human-machine collaboration and data management that permits us to achieve consistently high data quality. Finally, we are always pursuing “better algorithms”, for example in understanding the connections between audiences, both as we learn more about social TV patterns and as engagement dynamics evolve. I’ll be discussing some examples of each from the Bluefin platform and why all three (more data, better data, and better algorithms) are necessary.
Bio: Jeremy heads up Bluefin Labs’ engineering, product, and data efforts. Jeremy was formerly the CTO and VP of Engineering at aPriori Technologies, which developed a groundbreaking approach to real-time analysis of complex design and manufacturing data to predict manufacturing methods and costs. Prior to that he led teams at i2 focused on transportation planning and optimization. He earned BS degrees in Computer Science and Philosophy from MIT and served in the US Marine Corps for seven years, leaving active duty as a Captain.

Speaker: Josh Wills (http://twitter.com/#!/josh_wills), Cloudera
Arguing for: Better Data
Session: When people are first introduced to Hadoop, one of the most common questions is, “When should I use Hadoop instead of a relational database?” In this talk, we’ll walk through several use cases where Hadoop can solve problems better and faster than a relational database, even on relatively small data sets, in order to illustrate how Hadoop complements traditional data warehousing solutions.
Bio: Josh Wills is Cloudera’s Director of Data Science, working with customers and engineers to develop Hadoop-based solutions across a wide range of industries. Prior to joining Cloudera, Josh worked at Google, where he worked on the ad auction system and then led the development of the analytics infrastructure used in Google+. He earned his Bachelor’s degree in Mathematics from Duke University and his Master’s in Operations Research from The University of Texas at Austin.