• Online: Enabling Cross-Boundary Data Science with Privacy Enhancing Tech

    As data protection becomes more important, working with protected data becomes more challenging. How do you secure data while keeping it accessible? With homomorphic encryption, users can do just that. Join us in October to hear from one of the leaders in data protection on how data scientists can work with encrypted data.

    Agenda
    -------------------------------------------------
    6:00 PM -- Greetings

    6:05 PM -- Enabling Cross-Boundary Data Science with Privacy Enhancing Technologies - Ryan Carr

    7:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    Enabling Cross-Boundary Data Science with Privacy Enhancing Technologies
    Recent breakthroughs in Privacy Enhancing Technologies (PETs) have made it possible to build systems that can keep data encrypted for the entire processing lifecycle. These advances can uniquely enable data scientists to operate on data sets that they otherwise wouldn't be able to access due to an organizational "boundary," such as a security classification or regulatory barrier. This talk will provide a brief introduction to PETs and a detailed walk-through of two algorithms that leverage PETs for data science use cases.

    Speakers
    -------------------------------------------------
    Ryan Carr serves as CTO and VP of Engineering at Enveil, the pioneering Privacy Enhancing Technology company protecting Data in Use. With experience in leading engineering efforts at institutions such as the Johns Hopkins University Applied Physics Laboratory, Ryan’s fields of expertise include large scale analytic systems, distributed algorithms, artificial intelligence, game theory and social learning, and applying cloud computing techniques to simulate and analyze complex interactions among large numbers of autonomous agents. His research in these areas has been published in highly competitive venues such as Proceedings of the Royal Society, AAAI, and AAMAS. Ryan holds a PhD/BS in Computer Science. Ryan can be reached on Twitter at @jryancarr

    Resources
    -------------------------------------------------
    Enveil - www.enveil.com

  • Online: Idiomatic Pandas

    Online event

    Doing data science is already hard. Doing it right is even harder. Join us in September to hear from O'Reilly author Matt Harrison on how to write better Pandas code.

    Agenda
    -------------------------------------------------
    6:00 PM -- Greetings

    6:05 PM -- Idiomatic Pandas - Matt Harrison

    7:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Preparation
    -------------------------------------------------
    This is not an introduction to Pandas, so attendees may want to familiarize themselves with Pandas prior to the talk.

    Talks
    -------------------------------------------------
    Idiomatic Pandas
    Pandas can be tricky, and there is a lot of bad advice floating around. This talk will cut through some of the biggest issues I've seen with Pandas code after working with the library for a while and writing two books on it.

    Speakers
    -------------------------------------------------
    Matt is a world-renowned expert on Python and data science. He has a CS degree from Stanford University. He is a best-selling author on Python and data subjects. His books, Illustrated Guide to Learning Python 3, Intermediate Python, Learning the Pandas Library, and Effective PyCharm, have all been best-selling books on Amazon. He just published Machine Learning Pocket Reference and Pandas Cookbook (Second Edition). He has taught courses at large companies (Netflix, NASA, Verizon, Adobe, HP, Exxon, and more), universities (Stanford, University of Utah, BYU), as well as small companies. He has been using Python since 2000 and has taught thousands through live training both online and in person. Matt can be reached on Twitter at @__mharrison__

    Resources
    -------------------------------------------------
    Pandas - https://pandas.pydata.org/
    Matt's Author Page - https://www.oreilly.com/people/matt-harrison/
    MetaSnake - https://www.metasnake.com/

  • Estimating Lottery Revenue on a Quantum Computer

    Betamore

    Ever wonder how you actually use a quantum computer? What does the code to run a quantum computer program look like? Find out in August, as we return to in-person events and kick off with a discussion on quantum computing.

    Agenda
    -------------------------------------------------
    6:00 PM -- Food and Drink

    6:30 PM -- Greetings

    6:35 PM -- Data Science Corps Project - Amivi Atsu

    6:45 PM -- Estimating Lottery Revenue on a Quantum Computer - Stephen Penn, DM, PMP

    7:45 PM -- Closings

    Location
    -------------------------------------------------
    Betamore City Garage
    101 W Dickman Street
    Baltimore, MD 21230

    COVID Protocols
    -------------------------------------------------
    We ask that members continue to be cautious at the event and respectful of one another. We will do our best to provide ample room and sanitizer for attendees. Masks will be required.

    Parking
    -------------------------------------------------
    There is ample free parking surrounding the building.

    Food and Drinks
    -------------------------------------------------
    Complimentary food and drink will be provided.

    Talks
    -------------------------------------------------
    Estimating Lottery Revenue on a Quantum Computer
    Learning quantum computing can be daunting due to the need to learn matrix algebra and probabilities. However, that doesn’t have to be the case. You can solve existing business problems with just college algebra and Python. This presentation will show how an optimization problem can be set up on a quantum computer. The business problem is based on a state lottery trying to minimize sales (consumer taxes) in specific locations while maximizing state-wide total revenue. This presentation offers a first-step approach to learning quantum computing by walking through code that runs on a D-Wave adiabatic quantum machine.

    Speaker
    -------------------------------------------------
    Dr. Stephen Penn earned his Doctor of Management from the University of Maryland University College. His dissertation focused on data-driven decision making. Stephen has worked in Information Technology for almost thirty years, specializing in data warehousing and analytics. He is currently an Associate Professor at Harrisburg University and the Program Lead for the undergraduate business program.

    Resources
    -------------------------------------------------
    D-Wave - https://www.dwavesys.com/
    Dr. Penn @ Harrisburg - https://www.harrisburgu.edu/about/our-people/faculty-staff/stephen-penn/

  • Online: Introducing Datawave - Scalable Data Ingest and Query

    Big data storage can be challenging: complex data models, scalability issues, and the need to work with both structured and unstructured data. Datawave addresses many of these issues with a flexible, scalable, and robust architecture that utilizes proven technologies such as Accumulo. Join us in July to learn what Datawave is and how it can help solve your big data needs.

    Agenda
    -------------------------------------------------
    12:00 PM -- Greetings

    12:05 PM -- Introducing Datawave - Scalable Data Ingest and Query - Hannah Pellón

    1:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    Introducing Datawave: Scalable Data Ingest and Query on Apache Accumulo
    Out of the box, Accumulo's strengths are difficult to appreciate without first building an application that showcases its capabilities to handle massive amounts of data. Unfortunately, building such an application is non-trivial for many would-be users, which affects Accumulo's adoption.

    In this talk, we introduce Datawave, a complete ingest, query, and analytic framework for Accumulo. Datawave, recently open-sourced by the National Security Agency, capitalizes on Accumulo's capabilities, provides an API for working with structured and unstructured data, and boasts a robust, flexible, and scalable backend.

    We'll do a deep dive into Datawave's project layout, table structures, and APIs in addition to demonstrating the Datawave quickstart—a tool that makes it incredibly easy to hit the ground running with Accumulo and Datawave without having to develop a complete application.

    Speaker
    -------------------------------------------------
    Hannah Pellón received her B.S. in Mathematics from the University of Maryland while working as a software engineering intern at Northrop Grumman conducting RF signal analysis and spectrometry. She then spent 11 years at Northrop Grumman contributing to IR&D efforts and programs centered around Accumulo and Hadoop. She is currently a software developer and lead at Tiber Technologies focusing on Datawave and distributed computing technologies.

    Resources
    -------------------------------------------------
    Datawave - https://code.nsa.gov/datawave/

  • Online: Graph Analytics - Rich Relationships and Powerful Insights

    John returns for another action-packed live demo featuring graph analytics! Grab your laptop and headphones, and get ready for a tour of how graph analytics can be used for a range of problems.

    Agenda
    -------------------------------------------------
    6:00 PM -- Greetings

    6:05 PM -- Graph Analytics - Rich Relationships and Powerful Insights - John Hebeler

    7:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    Graph Analytics: Rich Relationships = Powerful Insights
    A demo-driven exploration of graph analytics to identify criminals, discover trolls, analyze social networks, and more, using community detection, centrality, link prediction, and graph embedding for incorporation into machine learning models, along with creating graphs from Wikipedia via Wikidata. Includes all you need to get started, plus a review and use of two graph query languages, Cypher and SPARQL, across multiple environments including Neo4j, Amazon Neptune, and Nvidia CUDA graphs for large-scale graph processing using GPUs. Relationships are what it is all about.

    Speakers
    -------------------------------------------------
    John Hebeler, Fellow for Lockheed Martin, is a developer of large-scale, data-driven solutions using machine learning, graph analytics, and high-speed messaging across computer resources that reach from the clouds to the edge. Along the way he writes, presents, and teaches (mostly learns and plays). He holds a PhD in Information Systems, an MBA, and a BSEE.

  • Online: The Role of Data During Apocalyptic Times

    Online event

    Data drives decisions. As the world dealt with COVID, data was being gathered to help decision-makers respond to the crisis. Join us in April to learn how data was used by JHU to track and manage COVID.

    Agenda
    -------------------------------------------------
    1:00 PM -- Greetings

    1:05 PM -- The Role of Data during Apocalyptic Times - John Piorkowski

    2:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    The Role of Data during Apocalyptic Times
    The COVID-19 pandemic is the most profound health crisis to impact the United States and the world in the past 100 years. One critical challenge since the beginning of the pandemic has been building accurate models to inform organizations’ responses. Numerous models and analytics emerged to address disease spread, hospital utilization, PPE demand and allocation, vaccine allocation, and mortality. The efficacy of these models and analytics depends on quality data that provides a high degree of trust. This talk will describe our experiences at the Johns Hopkins University Applied Physics Laboratory since the beginning of the COVID-19 pandemic in curating and building high-quality data pipelines to inform the global response.

    The talk will set the stage by describing the critical role high-quality data plays in other national security applications and medicine. In partnership with Johns Hopkins Medicine, we developed a precision medicine analytic platform that provides essential data for clinical research across many diseases, including COVID. Finally, the talk will describe the data wizardry underlying the JHU COVID Resource Center and the U.S. Government response since the beginning of the pandemic.

    Speakers
    -------------------------------------------------
    Dr. John Piorkowski serves as the Chief Artificial Intelligence Architect and Applied Information Sciences Branch Head within the Asymmetric Operations Sector at the Johns Hopkins University Applied Physics Laboratory. In these roles, he provides technical oversight and technical staff management for a multitude of national security and healthcare efforts.

    Dr. Piorkowski also serves as chair of the artificial intelligence program and co-chair of the data science program in the Whiting School of Engineering at Johns Hopkins University. As an adjunct faculty member, he teaches courses in social media analytics and artificial intelligence.

    Dr. Piorkowski received a B.S. in electrical engineering from The Pennsylvania State University, an M.S. in electrical engineering from the Johns Hopkins University, and a Ph.D. in information systems from UMBC.

  • Online: Data Science Product Management

    Online event

    Product management is vital to create a successful organization that can build successful products. But how does product management work with data science? Join us in March to hear how product management works and doesn't work with data science.

    Agenda
    -------------------------------------------------
    12:00 PM -- Greetings

    12:05 PM -- Data Science Product Management: The Highs, Lows, and "Oh No"s - Matt LeMay

    1:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    Data Science Product Management: The Highs, Lows, and "Oh No"s
    When put into service solving customer needs, data science can be a critical differentiator for digital products in ever-more-competitive markets. But productizing data science presents a unique set of challenges, and often leaves product managers and data scientists struggling to find common ground and a shared language.

    In this talk, product coach and consultant Matt LeMay shares the lessons he's learned building bridges between product management and data science at companies like Bitly, Songza, and Spotify. Expect a candid, direct, and entertaining conversation about mistakes made, lessons learned, and suggestions for how to move forward.

    Speakers
    -------------------------------------------------
    Matt LeMay is the author of Agile for Everybody (O’Reilly Media, 2018) and Product Management in Practice (O'Reilly Media, 2017). He has helped build and scale product management practices at companies ranging from early-stage startups to Fortune 500 enterprises. Matt was selected as a Top 50 Product Management influencer by the PM Year in Review for both 2016 and 2015. Matt is co-founder and partner at Sudden Compass, a consultancy that has helped organizations like Spotify, Clorox, and Procter & Gamble put customer centricity into practice. In his work as a technology communicator, Matt has developed and led digital transformation and data strategy workshops for companies like GE, American Express, Pfizer, McCann, and Johnson & Johnson. Matt can be reached at @mattlemay on Twitter, [masked], https://linkedin.com/in/mattlemay

    Company
    -------------------------------------------------
    Sudden Compass
    Sudden Compass co-creates the data strategy you need to move at the speed of your customers and implements and scales a practice that empowers teams to rapidly generate and activate insights. Learn more at https://www.suddencompass.com/

  • Online: ML Design Patterns and Designing ML Infrastructure

    Designing, building, deploying, and scaling ML systems can be challenging. By utilizing design patterns, engineers can leverage the best practices that have been proven to be successful. Join us in February to learn about several ML design patterns and their use in production systems.

    Agenda
    -------------------------------------------------
    6:00 PM -- Greetings

    6:05 PM -- ML Design Patterns and Designing ML Infrastructure - Lak Lakshmanan

    7:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    ML Design Patterns and Designing ML Infrastructure
    Design patterns are formalized best practices to solve common problems when designing a software system. As machine learning moves from being a research discipline to a software one, it is useful to catalog tried-and-proven methods to help engineers tackle frequently occurring problems that crop up during the ML process. In this talk, I will cover five patterns (Workflow Pipelines, Transform, Multimodal Input, Feature Store, Cascade) that are useful in the context of adding flexibility, resilience and reproducibility to ML in production. For data scientists and ML engineers, these patterns provide a way to apply hard-won knowledge from hundreds of ML experts to your own projects.

    Anyone designing infrastructure for machine learning will have to be able to provide easy ways for the data engineers, data scientists, and ML engineers to implement these, and other, design patterns.

    Speakers
    -------------------------------------------------
    Lak is the Director for Data Analytics and AI Solutions on Google Cloud. His team builds software solutions for business problems using Google Cloud's data analytics and machine learning products. He founded Google's Advanced Solutions Lab ML Immersion program and is the author of three O'Reilly books and several Coursera courses. Before Google, Lak was a Director of Data Science at Climate Corporation and a Research Scientist at NOAA. Lak can be reached on Twitter at @lak_gcp

    Resources
    -------------------------------------------------
    Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps 1st Edition
    https://www.amazon.com/Machine-Learning-Design-Patterns-Preparation/dp/1098115783

  • Online: Malware Detection, Enabled by Machine Learning

    Online event

    As malware becomes more sophisticated, new machine learning techniques and tools are needed in order to keep pace. Join us for our first talk of 2021 to learn how analysts can be kept informed through an automated machine learning process.

    Agenda
    -------------------------------------------------
    12:00 PM -- Greetings

    12:05 PM -- Malware Detection, Enabled by Machine Learning - Tina Coleman

    1:30 PM -- Closings

    Location
    -------------------------------------------------
    Zoom and YouTube Streaming
    A link will be sent out prior to the event. Please note that Zoom is capped at 100, so if you do not get into Zoom, you will be able to watch via YouTube.

    Talks
    -------------------------------------------------
    Malware Detection, Enabled by Machine Learning
    As the volume of new malware created each year grows, along with the expanding market opportunities for malware reuse, protecting systems can’t rely solely on downloading a vendor’s updated virus signature files. Our customers need ways to detect and cordon likely threats by using data retrieved from a combination of static and behavioral characteristics and comparing it to other classes of “good” versus “bad” files. Optimally, the solution cordons risky files, force ranks them according to their likelihood of causing harm, correlates some metadata to help with further learning and to provide context to analysts, and lets an analyst “release” a file after further analysis and a request from a user. That feedback is then relayed back into the model to support further tuning.

    This talk will delve into the IRAD efforts ClearEdge is undertaking to build and integrate malware detectors using machine learning algorithms.

    Speakers
    -------------------------------------------------
    Tina Coleman is a Technical Director for ClearEdge. In that role, she’s accountable for furthering the company’s depth in cybersecurity, particularly in aspects that allow ClearEdge to build solutions that scale for customer needs using its strengths in software engineering, DevOps, and data science. In addition to her work on contract and as a Technical Director, Ms. Coleman leads the Women In Technology program for ClearEdge, which seeks to encourage the participation and retention of women in technology. Ms. Coleman graduated from UMBC with undergraduate degrees in Computer Science and Economics and is currently pursuing her master's in Cybersecurity Technology from University of Maryland Global Campus. Tina can be found on LinkedIn at https://www.linkedin.com/in/tinadcoleman/
