
AI Safety Fundamentals Week 6

Hosted By
Nico H.

Details

Hello everyone! πŸ‘‹
Our next meetup will be on Thursday 07.12 at 18:30 at CARL S03 πŸ˜ƒ

This week we will start learning about interpretability. You have probably heard that current neural networks are mostly black-box systems. The goal of the field of interpretability is to change this and find ways to make the computations done by AIs more human-understandable. This understanding can then be used to build safer systems, e.g. by creating neural lie detectors that catch deceptive models, or by otherwise showing that the computations done by AIs satisfy safety properties.
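As a small taste of what "making computations human-understandable" can look like in practice, here is a minimal PyTorch sketch of a linear probe, the technique covered in one of this week's core readings (arXiv:1610.01644): train a simple linear classifier on a frozen model's intermediate activations to test what information a layer encodes. This is only a sketch; the model, layer, and data names are hypothetical placeholders, not a specific library API.

```python
# Minimal sketch of a linear probe in the spirit of Alain & Bengio
# (arXiv:1610.01644): freeze a trained network, record the activations
# of one intermediate layer, and train a linear classifier on them.
# `model`, `layer`, `inputs`, and `labels` are hypothetical placeholders.
import torch
import torch.nn as nn

def get_activations(model, layer, inputs):
    # Capture the chosen layer's output via a forward hook; the model
    # itself is never updated.
    acts = []
    hook = layer.register_forward_hook(lambda m, i, o: acts.append(o.detach()))
    with torch.no_grad():
        model(inputs)
    hook.remove()
    return torch.cat(acts).flatten(1)  # shape: (batch, features)

def train_probe(acts, labels, num_classes, epochs=100):
    # The probe is just a single linear layer on top of frozen activations.
    probe = nn.Linear(acts.shape[1], num_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(probe(acts), labels).backward()
        opt.step()
    # High probe accuracy suggests the layer linearly encodes the label.
    return probe
```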

We will start with time to read the material from 18:30 to 19:30. The focus here is on understanding the content, but small discussions and questions are welcome too.

The main discussion part of the meetup will run from 19:30 to 20:30, with the option to get dinner together afterwards. We will have various discussion prompts to explore our ideas around the topics from this week's reading and AI Safety in general. If you prefer to read the material at home, you can come at 19:30.

Core reading for this week:
https://distill.pub/2020/circuits/zoom-in/ (Zoom In: An Introduction to Circuits)
https://arxiv.org/abs/1610.01644 (Understanding intermediate layers using linear classifier probes, Sections 1 and 3)
https://rome.baulab.info/ (ROME: Locating and Editing Factual Associations in GPT)

Optional reading for this week:
https://www.alignmentforum.org/posts/yRAo2KEGWenKYZG9K/discovering-language-model-behaviors-with-model-written (Discovering Language Model Behaviors with Model-Written Evaluations)

Looking forward to seeing you! 😊

AI Safety Aachen