Please join us for a night talking business intelligence, predictive analytics, and 'big data' with Pentaho's Founder and Chief Geek, James Dixon! James will discuss Pentaho's open source business intelligence platform and features as detailed below. This event is co-hosted by the NoSQL NYC Meetup group - special thanks to Ivan Brusic for sharing this opportunity with us.
‘Open source predictive analytics – Using Weka and Pentaho Data Integration’
Weka is a collection of machine learning algorithms for data mining tasks, released as open source software under the GNU General Public License. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
As the leading open-source BI software company, Pentaho has become a major sponsor of Weka development and will take over the administration of Weka's SourceForge site in the near future. Pentaho also provides a live forum for interaction among Weka project community members. The combination of Weka and Pentaho provides an end-to-end open source approach to all your BI initiatives.
‘The Big Data Analytics Tool Kit’
Getting a Hadoop cluster up and running is only the first step in launching a company’s Big Data practices. You will need more than Hadoop to access all these individual data points and quickly deduce informative patterns never available before. This presentation will cover some of the tools necessary to streamline Big Data management:
• MapReduce, a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes;
• Hive, a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files;
• An ETL tool for key data integration and transformation functionality to help move data into and out of Hadoop;
• A user-friendly GUI to help set up the initial project and provide a management console for job management; and
• BI capabilities to drive compelling reporting and analytics.
Altogether, these tools will help accelerate Hadoop adoption and enable people to begin tackling problems that they thought were beyond their reach.
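For readers new to the MapReduce programming model mentioned above, here is a minimal single-process sketch in Python. It is only an illustration of the map, shuffle, and reduce phases using a word count; it does not use Hadoop or any Pentaho API, and the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record: (word, 1) for each word."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence within a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big clusters"]))
```

On a real cluster, the map and reduce calls would run in parallel across many nodes and the framework would handle the grouping; this sketch runs them one after another simply to show how the phases fit together.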
Attendees will learn about:
• Integrating with Hadoop and Hive to bring ETL, data warehousing and BI applications to the tasks of analyzing Big Data;
• Providing key data integration and transformation functionality to Hadoop data;
• Managing and controlling transformations and Hadoop jobs;
• Integrating Hadoop data with data from other sources to drive compelling reporting and analytics for today’s massive volumes of data.
About James Dixon:
As "Chief Geek" (CTO) at Pentaho, James Dixon is responsible for Pentaho's architecture and technology roadmap. James has over 15 years of professional experience in software architecture, development and systems consulting. Prior to Pentaho, James held key technical roles at AppSource Corporation (acquired by Arbor Software which later merged into Hyperion Solutions) and Keyola (acquired by Lawson Software). Earlier in his career, James was a technology consultant working with large and small firms to deliver the benefits of innovative technology in real-world environments.