#19.05 - Transfer Learning for small data sets - Open source code security

Name: #19.05 - Transfer Learning for small data sets - Open source code security
Start: 2019-04-30T12:30:00+02:00
End: 2019-04-30T14:30:00+02:00
Location: Learning Center - Campus Sophia Polytech

Hosted by Data Science meetup N.

Data Science Meetup - Nice - Sophia-Antipolis

Details

• "Cross domain residual Transfer Learning for person re-identification", François Brémond (INRIA, Labo STARS)

• "Automated classification of security fixes in open-source code repositories", Antonino Sabetta and Rocío Cabrera Lozoya (SAP Security Research)

F. Brémond:
We present a novel way to transfer model weights from one domain to another using a residual learning framework instead of direct fine-tuning. We also argue for hybrid models that combine learned (deep) features with statistical metric learning for multi-shot person re-identification when training sets are small. This is in contrast to popular end-to-end neural-network-based models and to models that use hand-crafted features with adaptive matching models (neural nets or statistical metrics).
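The contrast between direct fine-tuning and residual transfer can be illustrated with a toy example. The sketch below is not the architecture from the talk: it assumes a single linear model, and the names `fit_residual` and `frozen_w` are hypothetical. The source-domain weights stay frozen, and only an additive residual, initialised to zero, is trained on the small target-domain set.

```python
def dot(w, x):
    """Plain dot product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fit_residual(frozen_w, xs, ys, lr=0.1, epochs=500):
    """Learn a residual r so that (frozen_w + r) . x fits the target data.

    frozen_w is never modified. r starts at zero, so training begins
    exactly at the source-domain model (toy assumption: linear model,
    mean squared error, batch gradient descent).
    """
    r = [0.0] * len(frozen_w)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(r)
        for x, y in zip(xs, ys):
            err = dot(frozen_w, x) + dot(r, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        r = [rj - lr * gj for rj, gj in zip(r, grad)]
    return r


# Hypothetical data: the target domain differs slightly from the source.
source_w = [1.0, 2.0]
target_w = [1.5, 1.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [dot(target_w, x) for x in xs]

residual = fit_residual(source_w, xs, ys)
adapted_w = [w + r for w, r in zip(source_w, residual)]
```

Because the residual starts at zero, the adapted model can only move away from the source model as far as the target data justifies, which is the intuition behind preferring this to overwriting the pretrained weights.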

Our experiments demonstrate that a hybrid model with residual transfer learning performs comparably to an end-to-end model on large datasets and can yield significantly better re-identification performance when the training set is small. On the iLIDS-VID and PRID datasets, we achieve rank-1 recognition rates of 89.8% and 95%, respectively, which is a significant improvement over the state of the art.
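For readers unfamiliar with the metric, the rank-1 recognition rate is the fraction of query samples whose nearest gallery sample has the same identity. A minimal sketch, assuming a precomputed query-by-gallery distance matrix; the function name `rank1_rate` and the toy numbers are illustrative and unrelated to the reported results:

```python
def rank1_rate(dist, query_ids, gallery_ids):
    """Fraction of queries whose closest gallery entry shares their identity.

    dist[q][g] is the distance between query q and gallery entry g.
    Ties are broken in favour of the first gallery entry.
    """
    hits = 0
    for q, row in enumerate(dist):
        best = min(range(len(row)), key=row.__getitem__)
        if gallery_ids[best] == query_ids[q]:
            hits += 1
    return hits / len(dist)


# Toy example: three queries, two gallery identities.
dist = [
    [0.1, 0.9],  # query 0 is closest to gallery 0 (correct)
    [0.8, 0.2],  # query 1 is closest to gallery 1 (correct)
    [0.3, 0.7],  # query 2 is closest to gallery 0 (wrong identity)
]
rate = rank1_rate(dist, query_ids=[1, 2, 2], gallery_ids=[1, 2])
```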

Khan & Brémond (2019): "Cross domain Residual Transfer Learning for Person Re-identification"

A. Sabetta & R. Cabrera Lozoya:
The vulnerability management process for software with open-source components is challenging due to its dependence on unreliable standard sources of advisories and vulnerability data (such as the National Vulnerability Database, NVD). Previous efforts aimed to reduce this dependency by directly analyzing source code to automatically detect security-relevant commits.

In our previous work, we treated source code changes as documents in the natural language processing sense, potentially ignoring the structured nature of source code.
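To make the "commit as document" view concrete, the sketch below turns a unified diff into a bag of words, the kind of input a standard text classifier could consume. It is an illustration under assumptions, not the speakers' pipeline: the function name `commit_to_bag_of_words`, the tokenisation regex, and the sample diff are all hypothetical.

```python
import re
from collections import Counter


def commit_to_bag_of_words(diff_text):
    """Count identifier-like tokens on the added and removed lines of a diff.

    Simplification: any line starting with '+++' or '---' is treated as a
    file header and skipped, and the code structure is ignored entirely.
    """
    tokens = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            tokens += re.findall(r"[A-Za-z_][A-Za-z0-9_]*", line[1:])
    return Counter(t.lower() for t in tokens)


# Hypothetical commit that replaces string concatenation with sanitising.
sample_diff = """--- a/Foo.java
+++ b/Foo.java
@@ -1,2 +1,2 @@
-String q = "SELECT " + input;
+String q = sanitize(input);
"""
bag = commit_to_bag_of_words(sample_diff)
```

Note that the resulting counts no longer record that `sanitize` wraps `input`, which is precisely the structural information such a representation discards.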

In our recent work, we incorporate the semantic properties of code into our analysis. We leverage state-of-the-art approaches that generate distributed code representations by analyzing and aggregating paths extracted from the abstract syntax tree of the code. We extend one such approach (code2vec) to represent code changes (commits). We use a dataset of vulnerabilities (and the commits fixing them) affecting open-source components used in SAP software. This dataset was manually collected and curated by the team operating the vulnerability assessment tool known internally at SAP as Vulas. We show how this representation can be used to identify commits that address security bugs.
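The path-based idea can be sketched with Python's standard `ast` module. This is only an illustration: code2vec itself targets other languages and learns embeddings over the extracted paths, and describing a commit by the path contexts that appear or disappear between the two versions is an assumption made here for the example, not a description of the speakers' method. The helper names `path_contexts` and `commit_contexts` are hypothetical.

```python
import ast
from itertools import combinations

TERMINALS = (ast.Name, ast.Constant, ast.arg)


def _token(node):
    """Surface token carried by a terminal node."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.arg):
        return node.arg
    return repr(node.value)


def path_contexts(source):
    """Return (token, path, token) triples for every pair of terminals.

    The path lists node types going up ('^') from the first terminal to
    the lowest common ancestor, then down ('_') to the second terminal.
    """
    trails = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, TERMINALS):
            trails.append(trail)
            return
        for child in ast.iter_child_nodes(node):
            visit(child, trail)

    visit(ast.parse(source), [])
    contexts = set()
    for a, b in combinations(trails, 2):
        k = 0
        while a[k] is b[k]:  # walk down the shared prefix
            k += 1
        up = "^".join(type(n).__name__ for n in reversed(a[k:]))
        top = type(a[k - 1]).__name__
        down = "_".join(type(n).__name__ for n in b[k:])
        contexts.add((_token(a[-1]), up + "^" + top + "_" + down, _token(b[-1])))
    return contexts


def commit_contexts(before, after):
    """Path contexts added and removed by a change (illustrative only)."""
    old, new = path_contexts(before), path_contexts(after)
    return new - old, old - new


before = "def f(x):\n    return query(x)\n"
after = "def f(x):\n    return query(escape(x))\n"
added, removed = commit_contexts(before, after)
```

In this toy change, the added contexts include the link between `escape` and `x`, so the representation captures that the argument is now wrapped in a call, which a bag of tokens alone would not show.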

Ponta, Plate, Sabetta, Bezzi & Dangremont (2019): "A manually-curated dataset of fixes to vulnerabilities of open source software"

Sabetta & Bezzi (2018): "A practical approach to the automatic classification of security-relevant commits", ICSME 2018

Ponta, Plate & Sabetta (2018): "Beyond metadata - code-centric and usage-based analysis of known vulnerabilities in open-source software", ICSME 2018 – recipient of the IEEE/TCS Distinguished Paper Award
