Let's get together for another Hackathon! The goal is to use publicly available datasets to extract new insights. It's pretty amazing how much we can learn with some very basic exploratory data analysis in R; from there, the sky's the limit.
We'll start with Elke showing us a really awesome project she did on small businesses in San Diego, using data from California's Employment Development Department (EDD).
If we have time, Juliana will also show how you can apply natural language processing to freely available abstracts from scientific papers to learn a bunch of new sciency things!
After that we'll divide into small groups to analyze other datasets or further explore one of the topics above.
You're welcome to use any dataset that interests you. Try Google Dataset Search to see if anything out there sparks your curiosity. Kaggle also has loads of cool datasets available.
We'll end the evening sharing what the different groups have done.
Hope to see you all soon!
Here are links to the resources mentioned above:
California EDD Labor Market Information Resources and Data: https://www.labormarketinfo.edd.ca.gov/LMID/Size_of_Business_Data.html
Biomedical natural language processing datasets and models: http://bio.nlplab.org/
Google's dataset search: https://datasetsearch.research.google.com/
Kaggle datasets: https://www.kaggle.com/datasets