Tue, Oct 28 · 6:00 PM CET
Knowledge-augmented Graph Machine Learning for Drug Discovery and Its Next Generation in the Era of Large Language Models by Zhiqiang ZHONG
The integration of external knowledge with machine learning has become increasingly important in scientific discovery, particularly in biomedicine. In this talk, I will first introduce Knowledge-augmented Graph Machine Learning (KaGML) for drug discovery, which combines structured biomedical knowledge with graph-based learning to enhance prediction accuracy and interpretability in low-data regimes. Building upon these insights, I will then discuss how Large Language Models (LLMs) can propel the next generation of knowledge-augmented machine learning (KaML). LLMs can act as general knowledge sources and reasoning engines to automate key steps of knowledge preparation and integration, enabling flexible, end-to-end learning pipelines. I will present recent results from our studies showing that LLMs can (i) augment annotations, (ii) automatically select and weigh auxiliary tasks, and (iii) correct model predictions post-hoc. Together, these results illustrate the potential of LLM-enhanced KaML (LeKaML) as a foundation for more scalable, interpretable, and knowledge-driven AI systems.
Automated synthesis, debug, and deployment of end-to-end machine learning pipelines by Raoni Lourenço
Building machine learning (ML) models is a complex, multistep process that requires users with domain knowledge, mathematical competence, and computer science skills. Automated Machine Learning (AutoML) has emerged to simplify the application of ML techniques and reduce the need for expert users. This young research area still poses many open challenges. In this talk, we present our research on three fundamental challenges in automating machine learning: pipeline synthesis, debugging, and deployment. For the first challenge, we developed AlphaAutoML, an AutoML system based on meta-reinforcement learning using sequence models with self-play. Inspired by AlphaZero, we frame the problem of pipeline synthesis for model discovery as a single-player game in which the player iteratively builds a pipeline by selecting actions (insertion, deletion, and replacement of pipeline components). A neural network receives the entire pipeline, data meta-features, and the problem as input, and outputs action probabilities and an estimate of the pipeline's performance. A Monte Carlo Tree Search uses the network probabilities to run simulations that terminate at actual pipeline evaluations. For the second challenge, we present BugDoc, a system that automatically infers the root causes of failures in black-box pipelines and produces succinct explanations for them. BugDoc uses provenance from previous runs of a given pipeline to derive hypotheses for the errors and then iteratively runs new pipeline configurations to test these hypotheses. Finally, we reflect upon pipeline deployment considerations in the context of a No-Code AI platform.
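For readers new to the single-player game framing of pipeline synthesis, the following is a minimal Python sketch of the action space described in the abstract (insertion, deletion, and replacement of pipeline components). It is not AlphaAutoML's implementation: the component names, the `Action` type, and `toy_score` are hypothetical stand-ins, and a plain greedy one-step lookahead replaces the network-guided Monte Carlo Tree Search so that the example stays self-contained.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical component vocabulary; the abstract does not list the real primitives.
COMPONENTS = ("imputer", "scaler", "pca", "classifier")

Pipeline = Tuple[str, ...]


@dataclass(frozen=True)
class Action:
    kind: str                        # "insert", "delete", or "replace"
    position: int                    # index in the pipeline the action applies to
    component: Optional[str] = None  # unused for "delete"


def legal_actions(pipeline: Pipeline) -> List[Action]:
    """Enumerate every insert, delete, and replace move from this state."""
    actions = []
    for pos in range(len(pipeline) + 1):
        for comp in COMPONENTS:
            actions.append(Action("insert", pos, comp))
    for pos in range(len(pipeline)):
        actions.append(Action("delete", pos))
        for comp in COMPONENTS:
            if comp != pipeline[pos]:
                actions.append(Action("replace", pos, comp))
    return actions


def apply_action(pipeline: Pipeline, action: Action) -> Pipeline:
    """Return the new pipeline (game state) after one move."""
    items = list(pipeline)
    if action.kind == "insert":
        items.insert(action.position, action.component)
    elif action.kind == "delete":
        del items[action.position]
    elif action.kind == "replace":
        items[action.position] = action.component
    else:
        raise ValueError(f"unknown action kind: {action.kind}")
    return tuple(items)


def toy_score(pipeline: Pipeline) -> float:
    """Assumed stand-in for an actual pipeline evaluation on data."""
    if not pipeline or pipeline[-1] != "classifier":
        return 0.0  # a pipeline without a final estimator cannot predict
    score = 0.5
    if "imputer" in pipeline[:-1]:
        score += 0.2
    if "scaler" in pipeline[:-1]:
        score += 0.2
    return score - 0.05 * len(pipeline)  # mild penalty for longer pipelines


def greedy_search(start: Pipeline = (), max_steps: int = 10):
    """Greedy one-step lookahead; a simple substitute for MCTS in this sketch."""
    current, current_score = start, toy_score(start)
    for _ in range(max_steps):
        best, best_score = current, current_score
        for action in legal_actions(current):
            candidate = apply_action(current, action)
            candidate_score = toy_score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best == current:  # no move improves the score: stop
            break
        current, current_score = best, best_score
    return current, current_score


if __name__ == "__main__":
    pipeline, score = greedy_search()
    print(pipeline, round(score, 2))
```

In the system described in the talk, the search is instead guided by a neural network's action probabilities and performance estimates, and simulations end in real pipeline evaluations rather than a hand-written scoring function.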