
Python and Big Data Frameworks

Hosted By
Bagmeet

Details

The Python community has developed a powerful and convenient data analysis software stack. But although the PyData tools and libraries are becoming more and more popular, they are often still associated with small to medium-scale data analysis. On the other hand, "Big Data" is sometimes treated as a trademark feature of Hadoop, Spark, and co.

In this talk we will present pragmatic approaches to bridging these two seemingly separate worlds and developing data-heavy software in Python while using cluster-computing technology from the Hadoop ecosystem in the backend. Main topics of the talk:

• Introduction: "Big Data" technology, scalable data engineering, Python and parallel/distributed computing, Python as glue language

• DevOps HowTo: Pragmatic approaches for developers/analysts to set up virtual development cluster environments using tools/systems like Docker, Vagrant, Ansible, AWS/EC2, VirtualBox

• Interactive demo using Python and technologies like HDFS, Hadoop MapReduce, Impala, Spark, Storm, Kafka, GraphLab (selection, not all)
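
To give a flavor of the "Python as glue language" idea in the topics above, here is a minimal sketch of a word count written in the map/reduce style that Hadoop MapReduce popularized. It is not taken from the talk: the function names `mapper`, `reducer`, and `run_local` are hypothetical, and the whole pipeline runs in a single local process, with a plain `sorted` call standing in for the shuffle/sort step a cluster would perform between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts per word. Expects pairs sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (no cluster)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["big data in Python", "Python as glue for big data"]
    print(run_local(sample))
```

Keeping the map and reduce steps as separate generator functions mirrors how the same logic would be split into two scripts if it were run on a cluster, while still being easy to test on a laptop.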

Expected audience, skill level: Intermediate. Familiarity with IPython and PyData tools is helpful but not necessary. We will only go into a few advanced topics towards the end; otherwise, the intention of this talk is to give a broad overview of relevant technologies in the field, along with some concrete, pragmatic approaches for getting started quickly.

Speaker: Frank Kaufer is co-founder of bakdata (http://www.bakdata.com), a Berlin-based company providing custom data engineering solutions using a wide range of technologies, including distributed data processing frameworks like Hadoop or Spark, relational databases, ETL tools, and various programming languages, Python among them. Before founding bakdata, Frank worked as a freelance developer, IT consultant, and IT architect, focusing on topics at the intersection of Artificial Intelligence, Distributed Systems, and Data Engineering. He has used Python in many data-oriented software projects for more than 10 years.

Hosted at

PyData Berlin
Sky Lounge @ Zalando Tech
Mollstr. 1 · Berlin