#19.05 - Transfer Learning for small data sets - Open source code security

Hosted By
Data Science meetup N.

Details

• "Cross domain residual Transfer Learning for person re-identification", François Brémond (INRIA, Labo STARS)

• "Automated classification of security fixes in open-source code repositories", Antonino Sabetta and Rocío Cabrera Lozoya (SAP Security Research)

F. Brémond:
We present a novel way to transfer model weights from one domain to another using a residual learning framework instead of direct fine-tuning. We also argue for hybrid models that combine learned (deep) features with statistical metric learning for multi-shot person re-identification when training sets are small. This contrasts with popular end-to-end neural-network models and with models that pair hand-crafted features with adaptive matching models (neural nets or statistical metrics).
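As a rough illustration of the contrast between residual transfer and direct fine-tuning, the sketch below keeps a "source" layer frozen and trains only a small additive branch on target-domain samples. It is a generic toy reading of the idea, not the architecture presented in the talk: the 2x2 linear layer, `W_SOURCE`, `train_residual` and the sample data are all hypothetical.

```python
# Illustrative only: a frozen "source" layer plus a trainable residual branch.
# This is NOT the architecture from the talk; names and sizes are hypothetical.

W_SOURCE = ((1.0, 0.0), (0.0, 1.0))  # pretrained source-domain weights, kept frozen


def matvec(matrix, x):
    """Multiply a small matrix (sequence of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def forward(residual, x):
    """Target-domain output = frozen source output + learned residual correction."""
    return [s + r for s, r in zip(matvec(W_SOURCE, x), matvec(residual, x))]


def train_residual(samples, lr=0.1, epochs=200):
    """Fit only the residual weights by SGD on squared error; W_SOURCE is never updated."""
    # Zero initialisation: before training, the model behaves exactly like the source model.
    residual = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(epochs):
        for x, y in samples:
            err = [p - t for p, t in zip(forward(residual, x), y)]
            for i in range(2):
                for j in range(2):
                    # Gradient of 0.5 * ||pred - y||^2 with respect to residual[i][j]
                    residual[i][j] -= lr * err[i] * x[j]
    return residual


# A tiny, made-up target-domain training set (input, desired output).
target_samples = [([1.0, 0.0], [1.5, 0.0]), ([0.0, 1.0], [0.0, 0.5])]
learned = train_residual(target_samples)
```

With direct fine-tuning, the gradient step would instead overwrite the source weights themselves. Here the source weights stay intact and only the correction is learned.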

Our experiments demonstrate that a hybrid model with residual transfer learning performs comparably to an end-to-end model on large datasets and can yield significantly better re-identification performance when the training set is small. On the iLIDS-VID and PRID datasets, we achieve rank-1 recognition rates of 89.8% and 95%, respectively, a significant improvement over the state of the art.
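For readers unfamiliar with the metric, the rank-1 recognition rate is the fraction of query samples whose top-ranked gallery match has the correct identity. The sketch below computes it on toy data. Plain Euclidean distance stands in for whatever learned metric a real system would use, and the feature vectors and identity labels are invented for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank1_rate(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery entry has the same identity."""
    hits = 0
    for feat, pid in zip(query_feats, query_ids):
        nearest = min(
            range(len(gallery_feats)),
            key=lambda j: euclidean(feat, gallery_feats[j]),
        )
        hits += gallery_ids[nearest] == pid
    return hits / len(query_ids)


# Toy gallery and queries (hypothetical 2-D features and identity labels).
gallery_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery_ids = ["a", "b", "c"]
query_feats = [[0.1, 0.0], [0.9, 1.2], [1.1, 1.0]]
query_ids = ["a", "b", "c"]

rate = rank1_rate(query_feats, query_ids, gallery_feats, gallery_ids)
```

In this toy set the third query (identity "c") lands next to the gallery entry for "b", so two of the three queries are matched correctly at rank 1.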

Khan & Brémond (2019): "Cross domain Residual Transfer Learning for Person Re-identification"

A. Sabetta & R. Cabrera Lozoya:
Managing vulnerabilities in software with open-source components is challenging because the process depends on unreliable standard sources of advisories and vulnerability data (such as the National Vulnerability Database, NVD). Previous efforts aimed to reduce this dependency by analyzing source code directly to detect security-relevant commits automatically.

In our previous work, we treated source code changes as natural-language documents, an approach that potentially ignores the structured nature of source code.
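To make the limitation concrete, here is a minimal sketch of a "commit as document" view: the changed lines of a patch are reduced to a bag of word-like tokens. The tokenizer, the `bag_of_words` helper and the two example patches are assumptions for illustration and are not taken from the speakers' work.

```python
import re
from collections import Counter

# Word-like tokens: identifiers and keywords; punctuation and operators are dropped.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def bag_of_words(patch: str) -> Counter:
    """Count tokens on added/removed lines of a unified diff, ignoring file headers."""
    tokens = []
    for line in patch.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not code
        if line.startswith(("+", "-")):
            tokens.extend(t.lower() for t in TOKEN.findall(line[1:]))
    return Counter(tokens)


# Two hypothetical one-line changes with opposite behaviour.
patch_deny_first = "+if (ok) { deny(); } else { grant(); }"
patch_grant_first = "+if (ok) { grant(); } else { deny(); }"

same_bag = bag_of_words(patch_deny_first) == bag_of_words(patch_grant_first)
```

Both patches produce the same token counts even though they do opposite things, which is the kind of structural information a purely textual view cannot see.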

In our recent work, we incorporate the semantic properties of code into our analysis. We leverage state-of-the-art approaches that generate distributed code representations by analyzing and aggregating paths extracted from the abstract syntax tree of the code. We extend one such approach (code2vec) to represent code changes (commits). We use a dataset of vulnerabilities (and the commits fixing them) affecting open-source components used in SAP software. This dataset was manually collected and curated by the team operating the vulnerability assessment tool known internally at SAP as Vulas. We show how this representation can be used to identify commits that address security bugs.
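The path-based idea can be sketched with Python's standard `ast` module. code2vec describes code through path-contexts: triples of a start token, the syntax-tree path between two leaves, and an end token. The sketch below extracts such triples from Python source (an assumption made for convenience here) and stops before the learned aggregation step. Describing a commit by the set difference of path-contexts before and after the change is a simplification introduced for this example, not the method presented in the talk.

```python
import ast
from itertools import combinations


def leaves_with_ancestry(tree):
    """Collect (token, [nodes from root to leaf]) for Name and Constant leaves."""
    found = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            found.append((node.id, trail))
        elif isinstance(node, ast.Constant):
            found.append((repr(node.value), trail))
        for child in ast.iter_child_nodes(node):
            visit(child, trail)

    visit(tree, [])
    return found


def path_contexts(source):
    """Return a set of (start_token, (up_path, top_node, down_path), end_token) triples."""
    contexts = set()
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(
        leaves_with_ancestry(ast.parse(source)), 2
    ):
        # Length of the shared prefix; both trails start at the Module root, so k >= 1.
        k = 0
        while k < min(len(trail_a), len(trail_b)) and trail_a[k] is trail_b[k]:
            k += 1
        up = tuple(type(n).__name__ for n in reversed(trail_a[k:]))
        top = type(trail_a[k - 1]).__name__  # lowest common ancestor
        down = tuple(type(n).__name__ for n in trail_b[k:])
        contexts.add((tok_a, (up, top, down), tok_b))
    return contexts


# Hypothetical "before" and "after" versions of a function touched by a commit.
before = "def check(user):\n    return user.name == 'admin'\n"
after = "def check(user, token):\n    return user.name == 'admin' and verify(token)\n"

added = path_contexts(after) - path_contexts(before)
removed = path_contexts(before) - path_contexts(after)
```

In this toy commit, every added path-context involves the new `verify(token)` call, while the path between `user` and `'admin'` is unchanged and so appears in neither set.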

Ponta, Plate, Sabetta, Bezzi & Dangremont (2019): "A manually-curated dataset of fixes to vulnerabilities of open source software"

Sabetta & Bezzi (2018): "A practical approach to the automatic classification of security-relevant commits", ICSME 2018

Ponta, Plate & Sabetta (2018): "Beyond metadata - code-centric and usage-based analysis of known vulnerabilities in open-source software", ICSME 2018 (recipient of the IEEE/TCS Distinguished Paper Award)

Data Science Meetup - Nice - Sophia-Antipolis
Route des Colles · Sophia-Antipolis