• LA R Users: April Meeting 2021 - Emily Riederer - Column Names as Contracts

    LA R Users Group will have our April meeting online on Zoom (https://usc.zoom.us/j/92828963940).

    ***Column Names as Contracts***

    In this talk, I will explain how controlled vocabularies can be used to form contracts between data producers and data consumers. Explicitly embedding meaning in each component of variable names is a low-tech and low-friction approach which builds a shared understanding of how each field in the dataset is intended to work.
    Doing so can ease the burden on data producers by facilitating automated data validation and metadata management. At the same time, data consumers benefit from a reduced cognitive load when remembering names, a deeper understanding of variable encoding, and opportunities to analyze the resulting dataset more efficiently.
    After discussing the theory of controlled vocabulary column-naming and related workflows, I will demonstrate how to implement the creation, upkeep, and application of controlled vocabularies with various software tools, including the R package {convo} and custom SQL templating and testing with dbt.
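
    As a rough illustration of the idea, here is a minimal base R sketch of checking column names against a controlled vocabulary. It does not use the {convo} API. The stubs (`ID`, `IND`, `N`, `AMT`), their meanings, the helper functions, and the `orders` data are all hypothetical and chosen only for this example.

    ```r
    # Hypothetical controlled vocabulary: each stub makes a promise about the column.
    vocabulary <- c(
      ID  = "unique entity identifier",
      IND = "binary 0/1 indicator",
      N   = "non-negative count",
      AMT = "summable numeric amount"
    )

    # Return the column names whose first underscore-delimited component
    # is not a stub in the vocabulary.
    find_invalid_names <- function(col_names, vocab = vocabulary) {
      stubs <- sub("_.*$", "", col_names)
      col_names[!stubs %in% names(vocab)]
    }

    # Check one promise implied by a stub: every IND_ column holds only 0 or 1.
    check_indicators <- function(data) {
      ind_cols <- grep("^IND_", names(data), value = TRUE)
      vapply(data[ind_cols], function(x) all(x %in% c(0, 1)), logical(1))
    }

    # Hypothetical dataset; the last column deliberately ignores the vocabulary.
    orders <- data.frame(
      ID_CUSTOMER  = c(101, 102, 103),
      IND_RETURNED = c(0, 1, 0),
      N_ITEMS      = c(2, 1, 5),
      total        = c(19.99, 5.00, 42.50)
    )

    invalid_names <- find_invalid_names(names(orders))
    indicator_ok  <- check_indicators(orders)
    ```

    Here `invalid_names` flags the column that breaks the naming contract, and `indicator_ok` reports whether each indicator column keeps the promise its stub makes. The same pattern could be extended with one check per stub.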

    Bio:
    Emily Riederer is a Senior Analytics Manager at Capital One, where she leads a team to develop and sustain data products (datamarts, R packages, and dashboards) for business users. Emily blogs at https://emilyriederer.netlify.com/ , is a contributing author to The R Markdown Cookbook (CRC Press) and 97 Things Every Data Engineer Should Know (O’Reilly - forthcoming), and is the developer of the R packages projmgr and convo.

    **Code of Conduct**: https://github.com/laRusers/codeofconduct

    **LA R Users Group**
    Invite yourself to our Slack group: https://socalrug.herokuapp.com/
    Ask us any questions by email: [masked]
    Find our previous talks on GitHub: https://github.com/laRusers/presentations
    Follow us on Twitter: @la_Rusers
    Check out more events: https://socalr.org/

    ***** DON'T BE SHY *******
    Reach out if you want to be our speaker. First-time speakers are welcome

  • [OCRUG X-Post] Virtual Data Science Hackathon - **Sign up on Eventbrite**

    Cross-post for OCRUG:
    ***REGISTER ON EVENTBRITE***
    https://www.eventbrite.com/e/ocrug-virtual-data-science-hackathon-tickets-144355327671

    ******************************************************************

    We are excited to announce that tickets are now available for the second Orange County R Users Group (OCRUG) data science hackathon. It is a weekend-long event.

    During this event, we will be exploring an interesting data set, starting from the raw data all the way through to a final "end product" (e.g. a data visualization, an interesting data insight, a predictive model, etc.).

    This event is open to all experience levels, from the complete beginner to the highly experienced. Participants will work in teams to "hack" through the data and will present their work at the end of the event for prizes. You are free to create your own team, or we will assist in team creation if you do not have one. This is a great way to network, share, and learn from others.

    The main goal of the hackathon is to promote teamwork and foster education and learning in a welcoming environment. Though teams will be competing to present their best work at the end of the event for prizes, communication and providing assistance across teams is highly encouraged. We will provide in-person training during the hackathon and online training prior to the event.

    We have set up a GitHub repository that has more information about the hackathon, such as the schedule, how to set up Slack, and much more. Further information will be provided prior to the event.

    All participants must abide by the OCRUG Code of Conduct and R Consortium Code of Conduct.

    This will be an online event.

  • LA R Users: Mar Meeting 2021

    Online event

    LA R Users Group will have our March meeting online on Zoom (https://usc.zoom.us/j/92828963940).

    **Talk 1**

    In this talk, attendees will get an introduction to writing command-line interfaces in R. The talk begins with a discussion of fundamental command-line topics like managing output streams, exit codes, and environment variables. After that, the talk will explain each argument to the "Rscript" executable. The talk will then describe how to write your own R programs that take in command-line arguments, using built-in functions like `commandArgs()` and external packages like {argparse} and {crayon}. The talk concludes with a demo of two R command-line interfaces used in the {lightgbm} project: one for running linting with {lintr} in continuous integration and one for building a C++ library using CMake. Sample code will be available on GitHub prior to and following the talk.
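
    To give a flavor of these topics, here is a minimal base R sketch (not the speaker's sample code) that touches command-line arguments, output streams, exit codes, and environment variables. The script name `greet.R`, the `GREETING` environment variable, and the helper functions are hypothetical.

    ```r
    # Hypothetical script: greet.R

    # Validate the trailing command-line arguments without side effects.
    parse_args <- function(args) {
      if (length(args) != 1L || args[[1]] %in% c("-h", "--help")) {
        return(list(ok = FALSE, message = "Usage: Rscript greet.R <name>"))
      }
      list(ok = TRUE, name = args[[1]])
    }

    # Run the program and return an integer exit code (0 = success).
    main <- function(args = commandArgs(trailingOnly = TRUE)) {
      parsed <- parse_args(args)
      if (!parsed$ok) {
        message(parsed$message)  # message() writes to stderr
        return(1L)
      }
      # Read an optional setting from the environment, with a default.
      greeting <- Sys.getenv("GREETING", unset = "Hello")
      cat(sprintf("%s, %s\n", greeting, parsed$name))  # cat() writes to stdout
      0L
    }
    ```

    Ending the script with `quit(status = main())` would turn the return value into the process exit code, so that `Rscript greet.R Ada` succeeds and a call without arguments fails with a usage message on stderr.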

    **Speaker**: James Lamb

    James Lamb is an engineer at Saturn Cloud, where he works on a team building a managed Dask + Kubernetes product. He is a maintainer of {lightgbm}, and has made many contributions to other open source data science projects, including {xgboost} and prefect. He is also a maintainer and co-author of two other packages on CRAN: {pkgnet} and {uptasticsearch}. He holds master's degrees in Applied Economics (2014) and Data Science (2018). Before joining Saturn, he worked as an IoT Data Scientist at Amazon Web Services and Uptake.

    **Talk 2**

    While working remotely, I often have client projects where RStudio is unavailable for data security reasons or otherwise. R is a feature-rich environment for terminal users as well, and I will discuss the underappreciated functions around highlighting, graphics, editor integration, and other nice-to-haves for day-to-day usage of the terminal.

    **Speaker**: Neal Fultz

    Neal Fultz is a long-time friend of the LA RUG/Tech community. Check out his GitHub at https://github.com/nfultz or his past talks at https://nfultz.github.io/talks

    **Schedule**
    6:20 Open Zoom
    6:25 Breakout room social
    6:30 James
    7:00 Neal
    7:20~8:00 Q&A and social (Combined)

    **Code of Conduct**: https://github.com/laRusers/codeofconduct

  • Unified Data Applications with Shiny on Delta Lakes

    Online event

    **Talk**

    Shiny is arguably the most popular framework among data scientists for building advanced data apps. Once you build a Shiny application, two questions loom: how to keep the input data up to date, and where to host the app. The answers are often related. For example, if your data is in an on-premises database, you cannot host your app in the cloud. As data grows and changes, keeping the application's input current becomes more challenging. I have seen many enterprise users implement a two-step architecture: they run regular batch jobs to fetch and summarize data from their data lake or data warehouse into a staging environment. The Shiny app then loads the staged data and presents advanced analytics to end users.

    In this talk, I will show how you can use the Apache Spark and Delta Lake open source projects in your Shiny applications to load data directly from the Lake House. I will discuss how such a unified approach can remove several "moving parts" and simplify your work. I will present what the Lake House architecture is, and how R programs can interact with the Lake House using SparkR or sparklyr. I will also demo examples of interactively developing and hosting simple Shiny apps that access large data on Databricks.

    **Speaker**: Hossein Falaki

    Currently, I am a staff software engineer at Databricks. I joined Databricks in December 2013 as one of the first software engineers. As an early employee, I had the opportunity to wear different hats including development, product management, data science, and field engineering. I have been presenting my work at leading industry conferences.

    As a software engineer, in addition to regular software development responsibilities, I championed and implemented several key features in the Databricks product and contributed to Apache Spark. These include integration with third-party visualization libraries, end-to-end implementation of R Notebooks, integration with SparkR, integration with sparklyr, programmable input widgets, and the data ingest UI. I also made several contributions to the Apache Spark open-source project, including the CSV data source.

    As a founding member of the data science team, I built our first usage monitoring dashboards using our product and performed several deep dives and advanced analyses on topics of interest to the executive team.

    http://www.falaki.net/

    **Schedule**
    6:20 Open zoom
    6:30-7:30 Presentation
    ~7:30 Virtual Social

    **Code of Conduct**: https://github.com/laRusers/codeofconduct

  • [OCRUG X-Post] Modeling Normally Distributed Data with Repeated Measures

    Cross-post for OCRUG:
    ***REGISTER ON EVENTBRITE***
    https://www.eventbrite.com/e/ocrug-modeling-normally-distributed-data-with-repeated-measures-tickets-135236340535

    Develop the practical skills and foundation knowledge to effectively use some of the most common regression models used by data scientists.
    About this Event

    OCRUG understands that these are extraordinary times and we endeavour to keep our events free or very low cost. If you would like to attend the event but the registration fee would put a financial strain on you, please reach out to [masked]. There are a limited number of complimentary tickets available.

    # Abstract
    This workshop will give you the practical skills and foundational knowledge to effectively use some powerful regression models used by data scientists. When data are collected on the same subjects repeatedly over time (for example, in clinical trials or cohort studies) or under different conditions (for example, in a designed experiment), the measurements within the same individual are modeled as having correlated values. At the workshop, we will consider several models that can be employed to model a normally distributed response variable. The models that we will consider are: random slope and intercept (mixed-effects) model and generalized estimating equations models with unstructured, autoregressive, compound symmetric (exchangeable), and independent working correlation matrices. All models will be run in R version 4.0.3.
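
    For readers new to the terminology, the working correlation structures named above can be written out directly. The following base R sketch is not taken from the workshop materials. The function name `working_correlation` and the example values are hypothetical, and the unstructured case is omitted because its entries are all estimated from the data rather than generated from a single parameter.

    ```r
    # Build a working correlation matrix for n repeated measures on one subject.
    # rho is the assumed correlation parameter (ignored for "independent").
    working_correlation <- function(n,
                                    structure = c("independent", "exchangeable", "ar1"),
                                    rho = 0) {
      structure <- match.arg(structure)
      # lag[i, j] = |i - j|, the number of time steps between two measurements
      lag <- abs(outer(seq_len(n), seq_len(n), "-"))
      switch(structure,
        independent  = diag(n),                     # no within-subject correlation
        exchangeable = ifelse(lag == 0, 1, rho),    # same correlation for every pair
        ar1          = rho^lag                      # correlation decays with the lag
      )
    }

    # Example: four visits per subject with an assumed rho of 0.5
    exchangeable_example <- working_correlation(4, "exchangeable", rho = 0.5)
    ar1_example          <- working_correlation(4, "ar1", rho = 0.5)
    ```

    Under the exchangeable (compound symmetric) structure every pair of visits shares the same correlation, whereas under the autoregressive structure visits that are further apart are less correlated.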

    The course will be structured as follows. For each part, we will first discuss the theory, then work through an example. After that, the participants will work in small groups in break-out rooms to do hands-on exercises to help reinforce the material. All the files and RStudio will be made available to the participants.

    We would like to use RStudio Cloud. If you are not familiar with this technology, it lets participants access RStudio through a web browser. The environment will be set up and loaded with the code and data that are needed. This way, participants can focus on building models.

    The material covered by the workshop will be taken from my recently published book “Advanced Regression Models with SAS and R Applications” (https://www.amazon.com/dp/1138049018/), CRC Press, 2018.

    # About the Instructor
    Dr. Olga Korosteleva is a professor of Statistics at the Department of Mathematics and Statistics at California State University, Long Beach (CSULB). She received her Bachelor’s degree in Mathematics in 1996 from Wayne State University in Detroit, and a Ph.D. in Statistics from Purdue University in West Lafayette, Indiana, in 2002. Since then she has been teaching mostly Statistics courses in the Master’s program in Applied Statistics at CSULB, and loving it!

    Dr. Olga is an undergraduate advisor for students majoring in Mathematics with an option in Statistics. She is also the faculty supervisor for the Statistics Student Association. She is also the immediate past-president of the Southern California Chapter of the American Statistical Association (SCASA). Dr. Olga is the editor-in-chief of SCASA’s monthly eNewsletter and the author (co-author) of four statistical books.

    # Schedule
    * 06:30-06:40 Introduction
    * 06:40-07:30 Mixed-effects Model for Normal Response
    * 07:30-07:50 Mixed-effects Model Exercise
    * 07:50-08:00 Mixed-effects Model Solution
    * 08:00-08:10 Break
    * 08:10-08:30 Generalized Estimating Equations (GEE) Model for Normal Response
    * 08:30-08:50 GEE Exercise
    * 08:50-09:00 GEE Solution
    * 09:00-09:30 Additional Exercise and Solution
    * 09:30-09:45 Wrap up

    # Sponsors
    This event is sponsored by the University of California, Irvine, Paul Merage School of Business. https://merage.uci.edu/

    # Code of Conduct
    https://github.com/ocrug/hackathon-2019-05/blob/master/code-of-conduct.md

  • [OCRUG X-Post] Book club - Mastering Spark with R

    Online event

    ** PLEASE RSVP AT OCRUG. JOIN THE WAITLIST AND ORGANIZER WILL PROVIDE DETAILS **
    https://www.meetup.com/OC-RUG/events/275783285/

    Are you looking to improve your R skills? Come join us for a book club. It will be headed up by John Peach. The idea is to motivate us to read books and discuss them together. This will help all of us to develop our skills.

    The format will be that the leader will prepare a short summary of the material and present it to the group. We will then discuss the material and do some exercises.

    The expectations are that you will attend regularly. If you cannot make a commitment to attend, please wait until the next book club starts. It is also expected that participants will take turns being the leader. The leader should prepare a summary of the material that was read and present it to the group.

    We will meet each week, Monday evening between 6:30 and 8:00 PM for six weeks.

    Book: Mastering Spark with R
    https://therinspark.com/
    https://www.amazon.com/dp/149204637X/

    Session 1:
    * C1: Introduction
    * C2: Getting Started
    * C3: Analysis
    * Exercise

    Session 2:
    * C4: Modeling
    * C5: Pipelines
    * Exercise

    Session 3:
    * C6: Clusters
    * C7: Connections

    Session 4:
    * C8: Data
    * C9: Tuning
    * Exercise

    Session 5:
    * C10: Extensions
    * C11: Distributed R

    Session 6:
    * C12: Streaming
    * C13: Contributing
    * Exercise

  • [OCRUG X-Post] Book club - R for Data Science

    Online event

    ** PLEASE RSVP AT OCRUG**
    https://www.meetup.com/OC-RUG/events/275719547/

    Are you looking to improve your R skills? Come join us for a book club. It will be headed up by Xi Chen. The idea is to motivate us to read books and discuss them together. This will help all of us to develop our skills.

    The format will be that the leader will prepare a short summary of the material and present it to the group. We will then discuss the material and do some exercises. The majority of the time will be working on the exercises.

    The expectations are that you will attend regularly. If you cannot make a commitment to attend, please wait until the next book club starts. It is also expected that participants will take turns being the leader. The leader should prepare a summary of the material that was read and present it to the group.

    We will meet each week, Wednesday evening between 6:30 and 8:00 PM for six weeks.

    Book: R for Data Science
    https://r4ds.had.co.nz
    https://www.amazon.com/dp/1491910399/

    Session 1: Explore
    * Data visualization
    * Data transformation with dplyr
    * Workflow: start from a small project
    * Exercise

    Session 2: Wrangle (part I)
    * Data frames vs. tibbles
    * Import data
    * Tidy data with tidyr
    * Exercise

    Session 3: Wrangle (part II)
    * Relational data
    * Strings
    * Factors
    * Dates and time
    * Exercise

    Session 4: Program
    * Pipes
    * Functions
    * Vectors
    * Iteration
    * Exercise

    Session 5: Model
    * Model building
    * Many models with purrr and broom
    * Exercise

    Session 6: Communicate
    * R Markdown
    * Graphics
    * Exercise

  • LA R Users: The Download (Recent Updates in the R Markdown Family)

    LA R Users Group will have our December meeting online on Zoom (https://usc.zoom.us/j/92828963940).

    **Talk**

    Title: The Download

    Abstract: The R Markdown family of packages has grown a lot over the past few years! While each new package is truly a bundle of joy, the past few months we have worked hard to make our family of existing packages more consistent, supportive, and intuitive. In this talk, I’ll share some of what we are up to lately and what to expect, with highlights from the distill, blogdown, bookdown, and xaringan packages.

    **Speaker**: Alison Hill

    Alison Hill is a data scientist, behavioral scientist, and an award-winning educator. At RStudio, Dr. Hill works to expand how data scientists can communicate when they use RStudio’s tools for collaborating, sharing, and presenting. Alison loves teaching, and has led advanced workshops on data science communication and machine learning at rstudio::conf, R / Medicine, and R in Pharma. She is also an international keynote speaker (https://alison.rbind.io/talks), co-developer of the palmerpenguins (https://allisonhorst.github.io/palmerpenguins/) and distill (https://rstudio.github.io/distill/) R packages, and co-author of the book blogdown: Creating Websites with R Markdown (https://bookdown.org/yihui/blogdown/)

    **Code of Conduct**: https://github.com/laRusers/codeofconduct

    **Schedule**
    5:20 Open zoom
    5:30-6:30 Presentation
    ~7:20 Virtual Social

  • [OCRUG X-Post] Portfolio Website Building Tutorial

    Online event

    Cross-post for OCRUG: we encourage you to RSVP at https://www.meetup.com/OC-RUG/events/274071838/

    Are you interested in building a portfolio website to showcase your work? Then come to this tutorial and get hands-on instruction on how to build out a data analyst/data scientist website. We will be using RStudio to create the prototype site and publish it using GitHub and Netlify.

    Space is limited.

    Before the event, please make sure that you meet the following requirements:
    * A recent version of R and RStudio
    * GitHub account - https://github.com
    * Netlify account - https://www.netlify.com/

    Zoom Information:
    https://oracle.zoom.us/j/92244636410?pwd=emlKV1hZK0crTG5CbDN3cEl5U2Fodz09

    Meeting ID:[masked]
    Password:[masked]
    [masked] US (San Jose)

  • LA R Users: Reproducible computation at scale in R with targets

    LA R Users Group will have our October meeting online on Zoom (https://usc.zoom.us/j/92828963940).

    **Talk**

    Title: Reproducible computation at scale in R with targets

    Abstract: Ambitious workflows in R, such as machine learning analyses, can be difficult to manage. A single round of computation can take several hours to complete, and routine updates to the code and data tend to invalidate hard-earned results. You can enhance the maintainability, hygiene, speed, scale, and reproducibility of such projects with the targets R package. targets resolves the dependency structure of your analysis pipeline, skips tasks that are already up to date, executes the rest with optional distributed computing, and manages data storage for you. It surpasses the permanent limitations of its predecessor, drake, and provides increased efficiency and a smoother user experience. This talk demonstrates how to create and maintain a Bayesian model validation project using targets-powered automation.
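
    For orientation, a targets pipeline is declared in a `_targets.R` file at the project root. The sketch below is a minimal, hypothetical pipeline definition and is not taken from the talk materials. It assumes the targets package is installed, and the helper functions `simulate_data()` and `fit_model()` and the target names are invented for illustration.

    ```r
    # _targets.R: hypothetical two-step pipeline definition
    library(targets)

    # Illustrative helper: simulate a small dataset with a linear trend.
    simulate_data <- function(n) {
      x <- seq_len(n)
      data.frame(x = x, y = 2 * x + rnorm(n))
    }

    # Illustrative helper: fit a linear model and return its coefficients.
    fit_model <- function(data) {
      coef(lm(y ~ x, data = data))
    }

    # The file ends with a list of targets. Because model_coefs mentions
    # raw_data in its command, targets treats raw_data as its dependency.
    list(
      tar_target(raw_data, simulate_data(100)),
      tar_target(model_coefs, fit_model(raw_data))
    )
    ```

    Running `tar_make()` from the project directory builds both targets. A second call to `tar_make()` with nothing changed should skip them as up to date, and editing `fit_model()` should rerun only `model_coefs`.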

    Slides: https://wlandau.github.io/larug2020
    Materials: https://github.com/wlandau/larug2020

    **Speaker**: Will Landau

    Will Landau received his PhD in Statistics at Iowa State University in 2016. His dissertation research introduced a novel fully Bayesian, hierarchical model-driven, GPU-accelerated approach to the analysis of heterosis gene expression data (Landau, Niemi, and Nettleton 2019). He currently works at Eli Lilly and Company, where he develops capabilities for clinical statisticians. Will is the creator and maintainer of rOpenSci’s drake R package.

    **Code of Conduct**: https://github.com/laRusers/codeofconduct

    **Schedule**
    6:30 Open zoom
    6:40-7:40 Presentation
    ~8:00 Virtual Social
