
[online] Hong Kong Machine Learning Meetup Season 4 Episode 4

Hosted by Gautier M.

Details

This meetup will be online only. The event is generously sponsored by Darwinex https://hubs.li/H0WbChT0 (a platform for algo traders).

We are looking for venues in Hong Kong to host in-person meetups for upcoming episodes, and for speakers from Hong Kong as well! Please reach out to us if you have something interesting to present related to machine learning, or if you know a company or venue willing to host the event!

Talk 1: Asset Pricing with Panel Trees under Global Split Criteria

Sean Xin He, City University of Hong Kong (CityU)

Abstract:
We introduce a class of interpretable tree-based models (P-Trees) for analyzing panel data, with iterative and global (instead of recursive and local) splitting criteria to avoid overfitting and improve model performance. We apply P-Trees to generate a stochastic discount factor model and test assets for cross-sectional asset pricing. Unlike other tree algorithms, P-Trees accommodate imbalanced panels of asset returns and grow under the no-arbitrage condition. P-Trees also graphically capture nonlinearity and interaction effects and accommodate regime-switching and interactions between macroeconomic states and firm characteristics. For example, P-Trees identify inflation as the most important macro predictor with regime-switching in U.S. equity data. Based on multiple pricing, prediction, and investment metrics, we find that (boosted or time-series) P-Trees outperform standard factor models and PCA latent factor models. An equal-weighted portfolio of the five factors generated by P-Trees delivers an excess alpha of 1.09% against the Fama-French 3-factor benchmark and an out-of-sample annualized Sharpe ratio of 1.98. Data-driven cutpoints in P-Trees reveal that long-run reversal, volume volatility, and industry-adjusted market equity drive cross-sectional return variations, consistent with variable importance analysis using random forests.

paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3949463
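To illustrate what an "iterative and global" split means, here is a minimal sketch under assumptions that are not taken from the paper: each leaf's basis portfolio is equal-weighted, the global criterion is the squared Sharpe ratio mu' Sigma^{-1} mu of the tangency portfolio spanned by all leaf portfolios (with a small ridge term so Sigma stays invertible), and a candidate split is discarded if it leaves any leaf empty in some period. At each iteration every (leaf, characteristic, cutpoint) triple is scored against the whole tree, not just the node being split as in recursive, local CART splits. Because each period lists its own assets, the panel may be imbalanced. All function names and the toy panel are hypothetical.

```python
# Hypothetical sketch of a P-Tree-style global split search; not the authors' code.
# Assumptions: equal-weighted leaf portfolios, squared tangency Sharpe ratio as the
# global criterion, and a small ridge term so the covariance matrix stays invertible.
from statistics import mean


def in_leaf(chars, leaf):
    """A leaf is a tuple of (characteristic, cutpoint, goes_left) conditions."""
    return all((chars[c] <= cut) == left for c, cut, left in leaf)


def leaf_returns(panel, leaves):
    """Per-period equal-weighted return of each leaf portfolio; None if any leaf is ever empty."""
    series = []
    for period in panel:  # periods may contain different assets (imbalanced panel)
        row = []
        for leaf in leaves:
            rets = [r for chars, r in period if in_leaf(chars, leaf)]
            if not rets:
                return None
            row.append(mean(rets))
        series.append(row)
    return series


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def squared_sharpe(series, ridge=1e-6):
    """Squared Sharpe ratio mu' (Sigma + ridge*I)^{-1} mu of the tangency portfolio."""
    k, t = len(series[0]), len(series)
    mu = [mean(row[j] for row in series) for j in range(k)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in series) / (t - 1)
            + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    return sum(m_i * w_i for m_i, w_i in zip(mu, solve(cov, mu)))


def best_global_split(panel, leaves, characteristics, cuts):
    """Score every (leaf, characteristic, cutpoint) split against the whole tree."""
    best = None
    for idx, leaf in enumerate(leaves):
        for char in characteristics:
            for cut in cuts:
                children = [leaf + ((char, cut, True),), leaf + ((char, cut, False),)]
                candidate = leaves[:idx] + children + leaves[idx + 1:]
                series = leaf_returns(panel, candidate)
                if series is None:
                    continue
                score = squared_sharpe(series)
                if best is None or score > best[0]:
                    best = (score, candidate)
    return best


# Toy panel: 4 periods x 4 assets, each asset is ({characteristic: value}, return).
PANEL = [
    [({"size": 0.1}, 0.03), ({"size": 0.2}, 0.01), ({"size": 0.8}, 0.01), ({"size": 0.9}, -0.01)],
    [({"size": 0.1}, 0.04), ({"size": 0.2}, 0.02), ({"size": 0.8}, 0.03), ({"size": 0.9}, 0.01)],
    [({"size": 0.1}, 0.02), ({"size": 0.2}, 0.00), ({"size": 0.8}, 0.00), ({"size": 0.9}, -0.02)],
    [({"size": 0.1}, 0.05), ({"size": 0.2}, 0.03), ({"size": 0.8}, 0.04), ({"size": 0.9}, 0.02)],
]

if __name__ == "__main__":
    root_score = squared_sharpe(leaf_returns(PANEL, [()]))
    score, leaves = best_global_split(PANEL, [()], ["size"], [0.15, 0.5, 0.85])
    print(f"root: {root_score:.2f}, after split: {score:.2f}, leaves: {leaves}")
```

Growing a deeper tree in this sketch would mean calling `best_global_split` repeatedly on the returned leaves; a real implementation would also need value weighting, minimum leaf sizes, and out-of-sample evaluation.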

Talk 2: Top2Vec: Distributed Representations of Topics, with an application to 2020 10-K business descriptions

A quick walk-through of Top2Vec, a novel approach to topic modeling.

blog: https://marti.ai/ml/2021/11/14/top2vec-10k-business.html

Talk 3: 3D Infomax improves GNNs for Molecular Property Prediction

Hannes Stark, MIT Research Intern, https://hannes-stark.com

Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their performance for many molecular tasks. However, this information is infeasible to compute at the scale required by several real-world applications. We propose pre-training a model to reason about the geometry of molecules given only their 2D molecular graphs. Using methods from self-supervised learning, we maximize the mutual information between 3D summary vectors and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to improve downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.

paper: https://arxiv.org/pdf/2110.04126.pdf
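The pre-training step maximizes the mutual information between 3D summary vectors and the GNN's representations of the 2D graph. A common way to do this, assumed here rather than confirmed by the abstract, is a contrastive InfoNCE-style loss: in a batch of N molecules, each 2D embedding should be most similar to the 3D embedding of the same molecule, with the other molecules acting as negatives. Minimizing this loss L maximizes the lower bound log N - L on the mutual information. The minimal sketch uses plain lists in place of real GNN outputs; the function names, cosine similarity, and temperature `tau=0.1` are illustrative choices, and the paper's exact loss may be symmetrized or otherwise differ.

```python
# Hypothetical sketch of a contrastive (InfoNCE-style) objective for 3D Infomax-like
# pre-training; the paper's exact loss, similarity, and temperature may differ.
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z2d, z3d, tau=0.1):
    """Mean InfoNCE loss: row i of z2d (GNN on the 2D graph) and row i of z3d
    (3D summary vector) come from the same molecule; other rows act as negatives."""
    n = len(z2d)
    total = 0.0
    for i in range(n):
        logits = [cosine(z2d[i], z3d[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with the matching pair as target
    return total / n


if __name__ == "__main__":
    z2d = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]    # 3D views match their 2D views
    shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]   # 3D views belong to other molecules
    print(info_nce(z2d, aligned), info_nce(z2d, shuffled))
```

The loss is near zero when each 2D embedding points at its own molecule's 3D embedding and large when the pairing is scrambled, which is the signal that pushes the GNN to encode latent 3D information.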

Hong Kong Machine Learning Meetup