Saeed Amen - Making Python parallel with large datasets


London Marriott Hotel Canary Wharf

22 Hertsmere Rd · London

How to find us

Look for the Ginger Room in the Marriott Canary Wharf (a 9-minute walk from Canary Wharf Underground station or a 1-minute walk from West India Quay DLR).




Python is a great language for data science. When working with large datasets that don't fit entirely in memory, however, we may need different approaches. In this talk we will discuss Python libraries suited to working with large time series datasets in a pandas-like way, including Dask and vaex. We will also explore how to parallelize computation in Python, covering the differences between threading and multiprocessing, wrappers such as concurrent.futures, and the use of Celery to distribute tasks. The talk will be illustrated with a Jupyter notebook, including examples from finance (such as working with FX tick datasets).
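As a rough illustration of the concurrent.futures wrapper mentioned above, here is a minimal sketch and not material from the talk itself. The `mid_price` and `parallel_map` helpers and the sample (bid, ask) ticks are hypothetical, made up for this example. The sketch relies only on the fact that the standard library's `ThreadPoolExecutor` and `ProcessPoolExecutor` share the same `Executor` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def mid_price(tick):
    """Return the mid price of a hypothetical (bid, ask) FX tick."""
    bid, ask = tick
    return (bid + ask) / 2.0


def parallel_map(func, items, executor_cls=ThreadPoolExecutor, max_workers=4):
    """Apply func to every item using the given executor class.

    executor_cls is assumed to be ThreadPoolExecutor (usually suited to
    I/O-bound work, since threads share one interpreter) or
    ProcessPoolExecutor (usually suited to CPU-bound work, since each
    process has its own interpreter). Results come back in input order.
    """
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(func, items))


# Illustrative sample data, not real market ticks.
ticks = [(1.0, 2.0), (3.0, 5.0), (10.0, 12.0)]
mids = parallel_map(mid_price, ticks)
print(mids)
```

Passing `executor_cls=ProcessPoolExecutor` would run the same function in separate processes. In that case the function and its arguments must be picklable, and in a script the call is normally placed under an `if __name__ == "__main__":` guard.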

Brief Bio

Saeed Amen is the founder of Cuemacro. Over the past fifteen years, he has developed systematic trading strategies at major investment banks including Lehman Brothers and Nomura. He is the author of Trading Thalesians: What the ancient world can teach us about trading today (Palgrave Macmillan) and the co-author of The Book of Alternative Data (Wiley), due in 2020. Through Cuemacro, he now consults and publishes research for clients in the area of systematic trading. He has developed many Python libraries, including finmarketpy and tcapy, the latter for transaction cost analysis. His clients have included major quant funds and data companies such as Bloomberg. He has presented his work at many conferences and institutions, including the ECB, IMF, Bank of England and Federal Reserve Board. He is also a co-founder of the Thalesians.