LA East R Users: Forecasting with ARIMA in R and Cancer Mutations Scores


Details
Talk 1: Time Series & ARIMA forecasting
Speaker: Anthony Doan
Time series data are time-dependent, and they are everywhere. Planning for the future with statistical models can help companies maximize profit, allocate resources efficiently, and anticipate unexpected shortfalls. One example is Uber, which uses time series models for forecasting. Statistical time series models are currently the most accurate class of model compared to machine learning time series models, and this talk will go over one of the most widely used statistical models in industry, ARIMA. The presentation will start by describing the characteristics of time series data, then cover how to identify time series data, how to model it using ARIMA, and finally how to assess the model's performance. The main focus will be on modeling time series data for forecasting. The example code will be in R.
Anthony has a Bachelor's in Computer Science and a Master's in Applied Statistics. He has worked in industry for a little over a decade as a programmer and for about three years as a statistical modeler. His previous talk can be found here: https://github.com/mythicalprogrammer/timeseries_meetup
Talk 2: Classification and Statistical Analysis of Cancer Mutations Scores
Speaker: Yemi Odeyemi (Ph.D. Candidate in Data Science at Chapman University)
The talk will describe part of Yemi's doctoral work on building a statistical and predictive model to classify driver and passenger mutations. A logit model is used with 10-fold cross-validation. The data were preprocessed by imputing missing values with a rule-of-thumb approach, removing redundant features, and scaling the remaining features. Features were selected with a stepwise approach based on AIC. The objective was to determine the optimal class boundary on the predicted probability for discretization. The models were evaluated with the area under the receiver operating characteristic curve (ROC-AUC), which is based on sensitivity and specificity.
Timeline:
6:30: Socializing
7:00-8:00: Talks
8:00: Socializing
Address:
Room 115/116
USC
2001 N. Soto St.
Los Angeles, CA 90032
Invite yourself to our Slack group: http://bit.ly/laerug
Ask us any questions by email: laerusers@gmail.com
Find our previous talks on GitHub: https://github.com/laeRusers/presentations
Follow us on Twitter: @laeRusers
Check out more events: https://laocr.org/
