• Real-time Sensor Data and Deep Learning

    Trace 3

    Hi Fellow Data Enthusiasts,

    Please join us for the next Orange County Advanced Analytics / Big Data Meetup on Wednesday, September 26th. We have two topics to share with you:
    - Harnessing Sensor Data with Machine Learning in Real-time for Predictive and Prescriptive Analytics
    - Towards Process Automation: a Deep Learning use case

    More info below. All are welcome! Our community includes lines-of-business professionals, data scientists, data engineers, and architects, and this content should be accessible and useful to all of us. When you attend, we can stay at a high level or dive deep; just ask us and we'll help! Looking forward to seeing you all there!

    ---------------------------------------------------------------------------------------------------
    Subject: Harnessing Sensor Data with Machine Learning in Real-time for Predictive and Prescriptive Analytics

    A real-life use case in which real-time sensor data is relayed to the data center, where predictive machine learning algorithms determine the health of a vehicle. Valuable insights are derived from blending the sensor data with a wide variety of data systems and sources that hold key information such as the age and expected useful life of parts, optimal maintenance schedules, etc. This solution is extremely valuable for keeping the fleet moving at the lowest possible operating cost, while demonstrating cutting-edge uses of Big Data, Machine Learning, and Analytics.

    About Hitachi Vantara: Hitachi Vantara combines 100 years of operational technology (OT) with 60 years of IT experience to connect business, human, and machine data and create IoT solutions that drive measurable benefits for businesses and society as a whole! Over 80% of the Fortune 500 trust Hitachi Vantara for their data-driven solutions. They trust Hitachi Vantara to help them store, protect, enrich, activate, and monetize their data with a uniquely designed data stairway to value that leverages Machine Learning and Artificial Intelligence, whether on premises, in the cloud, and/or as a service. The largest online retailer relies on Hitachi Vantara to run its most critical applications, and NASA leverages Hitachi Vantara to protect and enrich its space imagery, making it more useful for mankind.

    About the Speaker: Frank Dominguez is a member of Hitachi Vantara's Big Data and Analytics Engineering team. Based in beautiful Portland, OR, he helps organizations of all sizes find the true value in their data. Frank evangelizes that data-driven organizations operate differently and that industry thought leaders demand more from their data. From an advanced analytics perspective, Frank helps leaders spend less time prepping data and more time focusing on outcomes.

    ------------------------------------------------------------------------------------------------------
    Subject: Deep Learning: Towards Process Automation

    Artificial Intelligence (AI) is making a great impact on every industry. Researchers and practitioners are using traditional Machine Learning methods as well as Deep Neural Networks to complement advanced tools and technologies in process automation. In this presentation I will present the work that we did for one of our aerospace customers, which involved Natural Language Processing (NLP) and Deep Neural Networks (DNN), specifically a Bidirectional Long Short-Term Memory (BiLSTM) Recurrent Neural Network (RNN). I will walk through the use case with a little background on the problem statement and discuss the modeling approach along with some results.

    About the Speaker: Jayeeta is a Data Scientist at Trace3, where she uses Machine Learning and AI to help businesses find optimal solutions. She has consulting experience in the Life Insurance, Life Sciences, Retail, Aerospace, and Health Care industries. Jayeeta received her PhD and Master's in Chemical Engineering from the University of California, Davis. Her diverse research involved molecular modeling, data mining of materials, biomaterials, and cheminformatics.

  • The Era of AI & Applications to Business Outcomes in the Big Data Age

    Please join us for the next Orange County Advanced Analytics / Big Data meetup on Tuesday, June 26th. Featured discussions are:
    • The applications of AI today, the AI capabilities you can leverage to impact business outcomes, and the platforms you can use to build AI applications with Big Data now. Potential applications of AI include uncovering previously unknown insights and patterns that lead to more revenue, process automation that leads to improved operational metrics, and risk scoring and anomaly detection in cyber analytics. Nvidia will lead this discussion.
    • ML basics: Many have asked for a non-technical overview of Machine Learning, the engine behind AI, in the context of Big Data. We're going to start a series on this, and this session will provide a high-level overview. It is perfect for non-data scientists, and a chance for data scientists to share their knowledge.

    About Nvidia: There's a reason why Nvidia's stock has soared 72% over the past year. And it's likely not just video games. Here's a hint…

    THE ERA OF AI
    Three converging forces brought about the era of AI: the availability of immense stores of data, the invention of deep learning algorithms, and the intense performance of GPU computing. New internet services, like Google Assistant, have learned speech from sound. Self-driving cars use deep learning to recognize the space the car inhabits and what to avoid. In healthcare, neural networks trained with millions of medical images can find clues in MRIs that until now could only be found through invasive biopsies. AI will spur a wave of social progress unmatched since the industrial revolution.
    https://www.youtube.com/watch?v=GiZ7kyrwZGQ

    Agenda:
    • Networking and food: 5:45-6:20
    • Welcome: 6:20-6:30
    • Overview of ML: 6:30-7:15
    • Nvidia: 7:15-7:20
    • Kinetica: 7:20-7:35
    • MapD: 7:35-7:50
    • H2O: 7:50-8:05
    • NVIDIA Deep Learning: 8:05-8:15
    • Wrap-up: 8:15-8:30

  • #OCBigData Meetup #24

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages

    Elastic Stack - Machine Learning
    Speaker: Henry Pak, Elastic
    Data sets keep growing in size and complexity. Spotting infrastructure problems, cyber attacks, or business issues using only dashboards or rules becomes increasingly difficult as your data grows. Learn how the X-Pack Machine Learning feature can model the typical behavior of your time series data in real time to identify anomalies, streamline root cause analysis, and reduce false positives using an unsupervised approach.
    Henry Pak is a Solutions Architect for Elastic based in Los Angeles, CA. With a focus on data analytics and integration, Henry has been helping enterprises across a wide range of verticals more easily access and derive meaningful information from their data.

    Machine Learning and Analytics at Big Data Scale with the Vertica Analytics Database
    Speaker: James Chien, HPE Vertica
    Learn how Vertica can provide superior speed, scale, concurrency, and advanced in-database machine learning, all with a familiar SQL-based engine, so your organization can analyze data at a lower TCO than alternatives. Come find out what many of the most data-centric companies in the world already know: Vertica is the secret sauce to Big Data Analytics. With machine learning and predictive analytics being the talk of the tech industry this year, a lot of organizations are still struggling to get a handle on all of their data. Join us as we explore the emergence (or re-emergence) of machine learning and how data-management practices are evolving to keep up with the new speed and scale of business through continuous, streaming applications and real-time predictive capabilities.
    What you will learn:
    - How Vertica allows analytics to evolve from descriptive to predictive and prescriptive analytics
    - How you can leverage SQL on large data sets for advanced analytics
    - How leading firms are overcoming data science talent shortages through adoption of new technology
    James Chien is a Solutions Architect with HPE Vertica and has been in the Data Warehousing and BI space for over 15 years. When he's not dealing with corporate analytics, he's analyzing card players at the poker room.

  • #OCBigData Meetup #23

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30: Ask the Expert: Data Science Workbench, Cloudera
    7:30 - 8:15: Do You Still Know Someone Using Excel?, Paxata
    ******************************************************************
    Ask the Expert: Data Science Workbench
    Mike Harding, Sr. Sales Engineer, Cloudera
    Data science and machine learning are unlocking new value in our data. A quickly evolving ecosystem of powerful open-source tools coupled with big data systems like Apache Hadoop can drive better model accuracy and deliver more strategic analytic capabilities. However, today most data science teams work away from the Hadoop cluster, often on their laptops or in data silos. This limits the data that can be used in data science research and creates operational overhead and gaps in security. In this talk, we will provide background and a demo of the Data Science Workbench, an enterprise data science platform that accelerates analytics projects from exploration to production. Mike will explore why collaborative, customizable, self-service access to secure Hadoop environments via Python, R, and Scala is critical for data scientists.

    Ask the Expert: Do You Still Know Someone Using Excel?
    Ken Oakley, Sr. Sales Engineer, Paxata
    One of the biggest challenges with large amounts of data is getting it ready for analytics and decision making. The most time-consuming part of every analytic exercise continues to be combining, cleaning, and shaping varied data sets into actionable information worthy of analysis. Ken will demonstrate how to make preparing data for analytics easier and faster with a visual, intuitive data preparation platform that provides collaboration and security. Ken has an extensive background in big data, having previously managed information delivery at Netflix, eBay, and Visa.

  • #OCBigData Meetup #22

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30: ETL Modernization, Diyotta (https://www.diyotta.com/)
    7:30 - 8:15: TBD
    ******************************************************************
    ETL Modernization
    Sanjay Vyas, Co-founder and Chief Customer Officer, Diyotta
    The traditional data integration approach of Extract, Transform, and Load (ETL) was designed more than 20 years ago for a point-in-time approach to managing lower volumes of data and processes. Traditional ETL tools were never designed to handle the tenets of the modern data landscape, namely data that originates anywhere, is transformed anywhere, and is loaded or published anywhere. Because of their architecture, ETL tools create performance bottlenecks that result in sub-optimal approaches to modern data integration needs. Companies like Sprint, Scotiabank, Philip Morris International, and others benefit greatly from next-generation data integration architectures that allow them to build scalable data solutions, making them far more responsive to their business demands. In this discussion, we will examine how these organizations are modernizing their data integration operations and reducing its cost while improving overall performance, agility, and scalability.
    More about Sanjay Vyas
    Sanjay is Chief Customer Officer and a co-founder of Diyotta. With more than 17 years of experience in data management and architecture, he is responsible for business leadership and for delivering on Diyotta’s vision of modern data integration for big data. Prior to founding Diyotta, Sanjay was a senior architect at Bank of America, where he drove the data strategy for the entire enterprise risk platform. Previously, he held various information management leadership positions and led projects at companies like Time Warner Cable, TIAA-CREF, Pitney Bowes, Microsoft, and Hewlett Packard.

  • #OCBigData Meetup #21

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30: Learn how to get a functioning Hadoop cluster on bare metal, Greg Bruno, StackIQ (https://www.stackiq.com/)
    7:30 - 8:15: Data Prep using Spark, Kumar Jayaram, Paxata (http://www.paxata.com/)
    Paxata (http://www.paxata.com/) will be sponsoring the food for this event.
    *******************************************************************
    Speaker: Greg Bruno (https://www.linkedin.com/in/greg-bruno-46558584/), VP Engineering and co-founder, StackIQ
    Topic: Step 1 of every Hadoop vendor’s documentation reads something like this: “First install a cluster.” Without a consistent group of installed machines, a Hadoop installation is prone to failure. Architected, developed, and built completely in the open, the Hortonworks Data Platform (HDP) provides Hadoop designed to meet the needs of enterprise data processing. Deploying HDP on a cluster is a non-trivial task. And while Ambari is used to deploy HDP on a cluster, Ambari itself needs to be set up on the cluster too. Stacki automates the deployment of Ambari in a few simple steps. Stacki is an open source bare metal provisioning tool that installs machines to a ping and a prompt, providing the consistency and configuration required for modern applications, including Hadoop. The StackIQ (http://www.stackiq.com/) engineering team recently released an open source Stacki Pallet for Hortonworks (http://stackiq.com/hortonworks/), which provides the software necessary to easily deploy Ambari and then HDP on a cluster. This presentation will demonstrate how the Stacki Pallet for Hortonworks can be used to give you a functioning Hadoop cluster on bare metal. You will learn how to set up Stacki (http://stackiq.com/downloads/), the Pallet, and Ambari, and then install Hadoop on a running cluster. You can download the necessary ISOs for the Pallet and view the documentation on the GitHub repo: https://github.com/StackIQ/stacki-hdp-bridge

    Speaker: Kumar Jayaram (https://www.linkedin.com/in/kumarjayaram/), Paxata (http://www.paxata.com)
    Interactive applications on Spark? The how, what, and why of taking Spark to the next level. Paxata is built for those who want to dramatically increase their productivity with ever-increasing data volumes while avoiding the trap of data chaos. Business analysts work within an intuitive, visual, self-service data preparation application to gather, prepare, and publish data with clicks, not code, with complete governance and security. IT teams administer the scale of data volume and variety, data sources, and business scenarios for both ad-hoc and repeatable data service needs.

  • #OCBigData Meetup #20

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 8:15: Apache Mahout: What’s Next?
    Since v0.10, Apache Mahout has shifted its focus to enabling mathematicians and data scientists to quickly deploy their own distributed algorithms using a Scala DSL that provides R-like semantics for matrix operations, is backend-agnostic, and runs on Apache Spark, Apache Flink, and others. The upcoming release, v0.13, will introduce the ability to run algorithms in hybrid Spark-GPU environments, which promises significant gains in computational speed. Finally, we will demo “Apache Mahout: Beyond MapReduce” by creating our own algorithms and visualizing the results interactively in Zeppelin.
    Trevor Grant is a committer on the Apache Mahout project and Open Source Technical Evangelist at IBM. He holds an MS in Applied Math and an MBA from Illinois State University, and used to call himself a ‘data scientist’ before it was cool. Trevor is an organizer of the newly formed Chicago Apache Flink Meetup, and has presented at Flink Forward, ApacheCon, Apache Big Data, and other meetups nationwide.
    IBM (http://www.ibm.com) will be sponsoring the food for this event.

  • #OCBigData Meetup #19

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30 A Purpose-Built In-Memory NoSQL Database for IoT
    Basavaraj Soppannavar, Strategy Analyst @ Toshiba (http://taec.toshiba.com/)
    7:30 - 8:15 Enterprise-grade IT Operations on Hadoop
    Maxime Dumas, Field Engineer @ Rocana (https://www.rocana.com/)
    Toshiba (http://taec.toshiba.com/) will be sponsoring the food for this event.

  • #OCBigData Meetup #18

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30 Enterprise-grade Analytics for Hadoop
    Peter Reale, Sr. Solutions Engineer @ Datameer (http://www.datameer.com/)
    7:30 - 8:15 Enterprise-grade Foundation for Hadoop
    Carlos Ortega, SE @ MapR (http://www.mapr.com/)
    Datameer (http://www.datameer.com/) will be sponsoring the food for this event.

  • #OCBigData Meetup #17

    Trace 3

    5:45 - 6:45 Socialize over food and adult beverages
    6:45 - 7:30 Business Intelligence for Humans – Search-Driven Analytics for the Enterprise
    Christos Mousouris, Senior Sales Engineer @ ThoughtSpot (http://www.thoughtspot.com/)
    7:30 - 8:15 Hadoop for Humans – How to Simplify Your Big Data Integrations
    Ravi Dharnikota, Head of Enterprise Architecture @ SnapLogic (http://www.snaplogic.com/)
    ThoughtSpot (http://www.thoughtspot.com/) will be sponsoring the food for this event.
