DeepSeek mHC + Improving Claude Code's Code
Details
This will be a journal club event
Two Talks:
- mHC: Manifold-Constrained Hyper-Connections (link to paper)
- Understanding and Improving the Source Code for Claude Code
Speakers
- Rakshak Talwar, Principal AI Engineer at FindHelp
- Eugene De Hoyos, Software Engineer at 5OC
Abstract
1. Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a general framework that projects the residual connection space of HC onto a specific manifold to restore the identity mapping property, while incorporating rigorous infrastructure optimization to ensure efficiency. Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability. We anticipate that mHC, as a flexible and practical extension of HC, will contribute to a deeper understanding of topological architecture design and suggest promising directions for the evolution of foundational models.
2. What could you do if you had access to the alleged Claude Code source code?
When the Claude Code CLI client was released last spring, its source maps were uploaded to NPM, and the source code was reconstructed from them.
We'll talk about a year-long journey: first fixing the codebase simply to get it to run, then refactoring it to understand and improve its design, back-porting newer features (like OAuth and new tools) via reverse engineering, and ultimately extending the tool and building upon it.
The goal of this talk is to inspire other practitioners to delve into the inner workings of coding agents. You'll walk out with a better grasp of the internal anatomy of a coding agent, the underlying APIs, and simple techniques you can use to inspect what's going on under the hood. Knowing your tools can only make you better at using them.
You will finally understand why `/compact` is so bad, and how it could be better implemented. Bring your questions, and we'll try to answer them by looking at the source!
Info
Austin Deep Learning Journal Club is a group for committed machine learning practitioners and researchers alike. The group typically meets on the first Tuesday of each month to discuss research publications. The publications are usually ones that laid the foundation for ML/DL or explore novel, promising ideas, and they are selected by a vote. Participants are expected to read the publications so they can contribute to the discussion and learn from others. This is also a great opportunity to showcase your implementations and get feedback from other experts.
Sponsors:
Thank you to Station Austin for sponsoring Austin Deep Learning. Station Austin is the center of gravity for entrepreneurs in Texas. They bring together the best entrepreneurs in the state and connect them with their first investors, employees, mentors, and customers. To sign up for a Station Austin membership, click here.
AI summary
By Meetup
Journal club for committed ML practitioners and researchers to discuss foundational or novel ML/DL papers; outcome: receive feedback on your implementations.
