We are using this listing to gauge interest in setting up a Kaggle Club.
If you are interested, please introduce yourself in the comments section.
Optionally, state which tools you use and what general level you reckon you are at.
("Don't know yet" is an acceptable answer to both questions.)
Distributed databases can make so many things easier for a developer... but not always for DevOps. OK, almost never for DevOps. Kubernetes has come to the rescue with easy application orchestration!
Orchestration is straightforward when the application relies on a relational database as its data layer.
However, it becomes trickier when a distributed SQL database or another kind of distributed storage is used instead.
In this talk you will learn how Kubernetes can orchestrate a distributed database such as Apache Ignite, in particular:
* Cluster Assembling - database nodes auto-discovery in Kubernetes.
* Database Resilience - automated horizontal scalability.
* Database Availability - what’s the role of Kubernetes and the database.
* Utilizing both RAM and disk - setting up Apache Ignite to get in-memory performance with the durability of disk.
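
To give a feel for the auto-discovery topic, here is a minimal Python sketch of the general idea: a node reads the Endpoints object of a Kubernetes service and turns the ready pod IPs into a list of peers to contact. This is not Apache Ignite's own code. The service name "ignite", the IP addresses and the helper `discover_peers` are all hypothetical, the payload is a hand-written sample rather than a live API response, and port 47500 is assumed to be the discovery port.

```python
import json

# Hand-written sample shaped like a Kubernetes v1 Endpoints object.
# The service name "ignite" and every address here are hypothetical.
SAMPLE_ENDPOINTS = json.loads("""
{
  "kind": "Endpoints",
  "metadata": {"name": "ignite", "namespace": "default"},
  "subsets": [
    {
      "addresses": [{"ip": "10.0.0.11"}, {"ip": "10.0.0.12"}],
      "notReadyAddresses": [{"ip": "10.0.0.13"}],
      "ports": [{"port": 47500}]
    }
  ]
}
""")


def discover_peers(endpoints, discovery_port=47500):
    """Return sorted 'ip:port' strings for every ready address.

    Pods listed under 'notReadyAddresses' are skipped, so a node that is
    still starting up is not offered as a peer. The default port is an
    assumption for this sketch.
    """
    peers = set()
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            peers.add(f"{address['ip']}:{discovery_port}")
    return sorted(peers)


if __name__ == "__main__":
    print(discover_peers(SAMPLE_ENDPOINTS))
```

In a real cluster the Endpoints object would come from the Kubernetes API rather than a string, and the database's own discovery mechanism would do this lookup for you.
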
An unheralded, but critically important, component of data science is the management of data. The data.table R package provides a high-performance version of base R's data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.
data.table performs extremely well for large datasets, and also offers powerful indexing, transformation/grouping, and merging/joining.
Features of data.table
Fast aggregation of large data (e.g. 100 GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, and a fast file reader (fread). It also offers a natural and flexible syntax for faster development.
About the presenter: Kevin O’Brien is a Limerick-based data scientist who specializes in agriculture and forestry, working with R and Python. He is also a teaching fellow at UL.