Multimodal LLMs & Skill Extraction and Classification using LLMs
Details
Multimodal Large Language Models by Anselmo Talotta
This presentation explores the field of multimodal learning, which aims to enhance machine perception by integrating diverse data types such as text, images, audio, and video. The talk covers key developments in multimodal AI, from early fusion techniques to more advanced approaches like CLIP (Contrastive Language-Image Pre-training) and recent Multimodal Large Language Models.
Key topics include the challenges of combining different data modalities, the application of transformer architectures in multimodal contexts, and emerging capabilities in zero-shot learning. The presentation discusses practical applications such as visual question answering and text-based image retrieval, while also addressing current limitations of multimodal systems.
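To illustrate the zero-shot idea behind CLIP-style models mentioned above, here is a minimal sketch of the scoring step: an image embedding is compared with the embeddings of several candidate captions, and a softmax over the scaled cosine similarities gives a probability per caption. The embeddings below are hand-made toy vectors, not outputs of a real encoder, and the `temperature` value, function names, and captions are assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_probs(image_emb, text_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities.

    text_embs maps each candidate caption to its (toy) embedding.
    The temperature is an assumed value, not taken from the talk.
    """
    logits = [cosine(image_emb, t) / temperature for t in text_embs.values()]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(text_embs, exps)}


# Toy embeddings (invented): in a real system these would come from
# trained image and text encoders that share one embedding space.
image_emb = [0.9, 0.1, 0.2]
text_embs = {
    "a photo of a dog": [1.0, 0.0, 0.1],
    "a photo of a cat": [0.1, 1.0, 0.0],
    "a photo of a car": [0.0, 0.1, 1.0],
}

probs = zero_shot_probs(image_emb, text_embs)
best_label = max(probs, key=probs.get)
print(best_label)
```

The same similarity scores can be read the other way round for text-based image retrieval: fix one caption embedding and rank a collection of image embeddings against it.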
Skill Extraction and Classification in Online Job Vacancies Using NLP and Large Language Models by Najada Feimi
This presentation explores the extraction and classification of skills from Online Job Vacancies (OJV) across France, Luxembourg, Belgium, and Germany. The approach combines transformer-based embeddings with cosine similarity to validate OJV data against official registries and to retrieve industry NACE codes, which supports accuracy in job market analysis. A key focus is the automatic extraction of skills from job descriptions using Mixtral (mistralai/Mixtral-8x7B-Instruct-v0.1). The extracted skills are then classified into categories including Digital, Data-Related, AI, Frontier, Prediction, Judgment, Decision-Making, Social, and Other.
The talk discusses the challenges of skill identification, the impact of LLM-generated extractions, and the effectiveness of classification models in structuring labor market data. Finally, it addresses the strengths and limitations of using LLMs for workforce analysis and future directions for improving skill taxonomies.
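The registry validation step described in this abstract can be sketched as a nearest-neighbour lookup by cosine similarity. The talk uses transformer-based embeddings; to stay dependency-free, this sketch substitutes character trigram counts as a stand-in vector representation, so it shows the matching logic rather than the speaker's actual pipeline. The registry entries, the `threshold` value, and all function names are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Bag of character n-grams; a simple stand-in for a learned embedding."""
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_registry(employer, registry, threshold=0.5):
    """Return (best registry entry or None, similarity score)."""
    query = char_ngrams(employer)
    scored = [(cosine(query, char_ngrams(e["name"])), e) for e in registry]
    score, best = max(scored, key=lambda pair: pair[0])
    return (best, score) if score >= threshold else (None, score)


# Invented registry rows; the NACE codes are illustrative examples only.
registry = [
    {"name": "Example Logistics SA", "nace": "49.41"},
    {"name": "Sample Software GmbH", "nace": "62.01"},
]

# Employer name as it might appear in a job vacancy (hypothetical).
match, score = match_registry("example logistics s.a.", registry)
if match is not None:
    print(match["name"], match["nace"], round(score, 2))
```

Swapping `char_ngrams` for a sentence-embedding model would leave `match_registry` unchanged apart from using a dense-vector cosine; the threshold would need to be tuned on labelled matches in either case.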

