Climate Data Analysis Cyberinfrastructure & Data Scientists vs. Data Engineers


Details
University of Colorado Boulder - Tuesday July 23, 2013 @ 6:00pm MST
If you are unable to attend in person, register and we will email you a livestream link 2 hours prior to the event.
Note: Dr. Arvind Sathi has a scheduling conflict and will present at a future event. Michael Walker will present on Data Scientists vs. Data Engineers.
Location: ATLAS - 1125 18th St Bldg 223, Boulder, CO - Room 100
Map: http://goo.gl/maps/XTJ9v
Agenda:
6:00 - 6:15 Schmooze - Food will be served in Lobby.
6:15 - 7:30 Rethinking Cyberinfrastructure for Climate Data Analysis Workflows by Dr. Richard Loft
7:30 - 8:30 Data Scientists vs. Data Engineers by Michael Walker
8:30 - 9:30 Network at The Sink at 1165 13th Street.
See: http://bit.ly/ND8Kp
Rethinking Cyberinfrastructure for Climate Data Analysis Workflows - Abstract
Advancements in the computational capability of massively parallel supercomputers have offered the Earth system science community an unprecedented opportunity to dramatically improve its understanding of the Earth system. This has spurred a focused effort, over many years, to improve Earth system model scalability and performance. However, it has recently become painfully evident that the ancillary data analysis software and hardware systems have become the rate-limiting step in advancing scientific understanding. There are three reasons for this development: first, the rate of improvement in computing systems has outpaced improvements in storage system performance; second, many workflows and tools remain serial, while applications have become increasingly parallelized; and third, many analysis tools and applications make inefficient use of the underlying hardware.
This talk will cover the history and current state of Earth system modeling and data analysis, show how capabilities of the NCAR Wyoming Supercomputing Center are advancing that state, and suggest how infrastructure and the analysis software can and must co-evolve to address the massive amounts of data. The discussion will be framed through experiences at NCAR in pushing the boundaries of what is possible in data-centric computing, and the trends influencing the next co-evolutionary steps.
Bio
Dr. Loft has been involved with massively parallel computing since joining Thinking Machines Corporation as an Application Engineer in 1989. Throughout his career he has contributed to the understanding and effective use of parallelism as applied to grand challenge simulations. His algorithmic innovations dramatically improved the scalability of the atmospheric component of the Community Earth System Model, and were recognized with an honorable mention prize in the IEEE/ACM Gordon Bell competition at Supercomputing 2001. Rich is currently the Director of the Technology Development Division (TDD) in the Computational and Information Systems Laboratory (CISL) at NCAR. TDD is charged with improving application scalability and performance, exploring the use of new computer technologies, and developing software to serve and analyze large or complex datasets. He also serves as NCAR’s representative to the eXtreme Science and Engineering Discovery Environment (XSEDE) Service Provider Forum (SPF) and oversees NCAR’s participation in the XSEDE project. Dr. Loft also leads the Outreach Services Group for the CISL computing laboratory at NCAR. The education of future computational scientists is an area he is passionate about, which is why he founded the Summer Internships in Parallel Computational Science (SIParCS) program in 2007.
Data Scientists vs. Data Engineers - Abstract
Data science is a team sport. Organizations often make the mistake of mixing and confusing team roles on a data science project, resulting in an over-allocation of responsibilities to data scientists. For example, data scientists are often tasked with the role of data engineer, leading to a misallocation of human capital. Here the data scientist wastes precious time and energy finding, organizing, cleaning, sorting and moving data. The solution is to add data engineers to the data science team.
Data scientists should be spending their time and brainpower on applying data science and analytic results to critical business issues: helping an organization turn data into information, information into knowledge and insights, and valuable, actionable insights into better decision making and game-changing strategies.
Data engineers are the designers, builders and managers of the big data infrastructure. They develop the architecture that helps analyze and process data in the way the organization needs it, and they make sure those systems are performing smoothly.
This presentation will address key issues, including:
What is big data and how is it being used?
How can strategic plans for big data analytics be generated?
How does big data change analytics architecture?
Bio
Michael Walker is a managing partner at Rose Business Technologies (http://www.rosebt.com/index.html), a professional technology services and systems integration firm. He leads the Data Science Professional Practice at Rose. Mr. Walker received his undergraduate degree from the University of Colorado and earned a doctorate from Syracuse University. He speaks and writes frequently about data science and is writing a book on Data Science Strategy for Business. Learn more about the Rose Data Science Professional Practice at http://bit.ly/10TgVHG . Follow Mike on Twitter @Ironwalker76 (https://twitter.com/Ironwalker76).
