Using NLP to Predict the Severity of Cyber Security Vulnerabilities

Readme

File Size	1.7 KB
File Format	Plain text


Report

File Size	1.59 MB
File Format	Portable Document Format


Scripts

File Size	241 MB
File Format	ZIP Format
Scope And Content	Everything from GitHub https://github.com/twlim1/VulnerWatch.
Technical Details	AWS EC2 image: Deep Learning AMI (Amazon Linux 2) Version 46.0
AWS EC2 instances: t2.xlarge (non-GPU version); p2.xlarge (GPU version)
Docker image: openSUSE Leap distribution
Python packages: Flask==2.0.1, requests==2.25.1, APScheduler==3.7.0, psycopg2-binary==2.8.6, numpy==1.19.5, torch==1.8.1, transformers==4.6.1, pytz==2021.1, transformers==4.5.1


Input data

File Size	38.9 MB
File Format	ZIP Format
Scope And Content	CVE json data downloaded from NVD.
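As a rough illustration of how such input data might be prepared, the sketch below separates records that already carry a CVSS v3 vector from those that do not. It assumes the files follow the NVD JSON 1.1 data-feed layout (keys such as CVE_Items, description_data, and baseMetricV3); the record does not state the schema, so treat the key names as an assumption. The function names and the sample CVE IDs are hypothetical.

```python
from typing import List, Optional, Tuple


def english_description(item: dict) -> str:
    """Return the English description of one CVE item, or '' if none exists."""
    entries = item.get("cve", {}).get("description", {}).get("description_data", [])
    for entry in entries:
        if entry.get("lang") == "en":
            return entry.get("value", "")
    return ""


def cvss_v3_vector(item: dict) -> Optional[str]:
    """Return the CVSS v3 vector string, or None if the item has no v3 score."""
    # Assumed NVD JSON 1.1 path: impact -> baseMetricV3 -> cvssV3 -> vectorString
    cvss = item.get("impact", {}).get("baseMetricV3", {}).get("cvssV3", {})
    return cvss.get("vectorString")


def split_by_v3_score(feed: dict) -> Tuple[List[tuple], List[tuple]]:
    """Split a feed into (scored, unscored) lists.

    scored holds (id, description, vector) tuples usable as labeled examples;
    unscored holds (id, description) tuples that still need a prediction.
    """
    scored, unscored = [], []
    for item in feed.get("CVE_Items", []):
        cve_id = item.get("cve", {}).get("CVE_data_meta", {}).get("ID", "")
        text = english_description(item)
        vector = cvss_v3_vector(item)
        if vector is None:
            unscored.append((cve_id, text))
        else:
            scored.append((cve_id, text, vector))
    return scored, unscored


# Tiny in-memory feed with made-up records (IDs and text are hypothetical).
sample_feed = {
    "CVE_Items": [
        {
            "cve": {
                "CVE_data_meta": {"ID": "CVE-0000-0001"},
                "description": {"description_data": [
                    {"lang": "en", "value": "Buffer overflow in an example parser."}
                ]},
            },
            "impact": {"baseMetricV3": {"cvssV3": {
                "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"
            }}},
        },
        {
            "cve": {
                "CVE_data_meta": {"ID": "CVE-0000-0002"},
                "description": {"description_data": [
                    {"lang": "en", "value": "Improper input validation in an example service."}
                ]},
            },
            "impact": {},
        },
    ]
}

scored, unscored = split_by_v3_score(sample_feed)
```

In this sketch the scored list would supply training labels, while the unscored list corresponds to the records for which a predicted score is wanted.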


Output data

File Size	3.03 GB
File Format	ZIP Format
Scope And Content	Eight fine-tuned BERT models.


Poster

File Size	610 KB
File Format	Portable Document Format


Final presentation

File Size	4.7 MB
File Format	Portable Document Format


Tableau dashboard

File Size	266 KB
File Format	Extensible Markup Language


Collections

Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects

Educational Dataset Service Collection

Cite This Work

Cook, Bryan; Janamian, Saba; Lim, Teck; Logan, James; Ulloa, Ivan; Altintas, Ilkay; Gupta, Amarnath (2021). Using NLP to Predict the Severity of Cyber Security Vulnerabilities. In Data Science & Engineering Master of Advanced Study (DSE MAS) Capstone Projects. UC San Diego Library Digital Collections. https://doi.org/10.6075/J0TX3F89

Description

Cyber-attacks continue to be one of the world’s foremost safety and economic threats and, in recent years, have become more numerous and severe. Cybersecurity engineers use industry-standard “Common Vulnerabilities and Exposures” (CVE) records to understand and address known threats. CVE records generally contain “Common Vulnerability Scoring System” (CVSS) scores, which indicate a human-determined level of severity. Cybersecurity engineers rely on these scores to prioritize threats. Unfortunately, nearly half of all CVE records have not yet been assigned CVSS v3 scores, a critical component of the overall CVSS score. The VulnerWatch product is introduced as a machine learning solution for predicting CVSS v3 scores. Bidirectional Encoder Representations from Transformers (BERT) models are applied to CVE record text descriptions to predict eight metrics that, in aggregate, indicate a CVSS v3 score. VulnerWatch provides the user with a prioritized list of CVE records that do not have human-determined CVSS v3 scores, along with a predicted score for each. It also allows the engineer to manually enter text describing a threat and receive a predicted CVSS v3 score in near real time. The accuracy of predictions for the metrics that determine CVSS v3 scores is favorable, averaging close to 0.9, with similar levels of precision and recall. The resulting CVSS v3 score predictions are also favorably accurate (MSE = 1.27, MAE = 0.5, R² = 0.51). At this level of accuracy, VulnerWatch is deemed successful in providing a valuable tool for combating cyber-attacks.
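To make concrete how eight metric values can be combined into a single score, here is a minimal sketch of the base-score calculation from the published CVSS v3.1 specification. The weights and formula come from that specification; whether VulnerWatch aggregates its eight predicted metrics in exactly this way is an assumption, and the names parse_vector and base_score are hypothetical.

```python
import math

# Numeric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                        # Attack Complexity
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}   # Privileges Required, Scope unchanged
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}      # Privileges Required, Scope changed
UI = {"N": 0.85, "R": 0.62}                        # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}             # Confidentiality / Integrity / Availability


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def parse_vector(vector: str) -> dict:
    """Turn 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    parts = vector.split("/")[1:]  # drop the 'CVSS:3.x' prefix
    return dict(part.split(":") for part in parts)


def base_score(m: dict) -> float:
    """Compute the base score from the eight base metrics AV, AC, PR, UI, S, C, I, A."""
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]

    # Impact sub-score
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    # Exploitability sub-score
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]

    if impact <= 0:
        return 0.0
    if changed:
        return roundup(min(1.08 * (impact + exploitability), 10))
    return roundup(min(impact + exploitability, 10))


# Example: eight metric labels, as a classifier might predict them for one CVE.
predicted = parse_vector("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
score = base_score(predicted)
```

Because the score is a deterministic function of the eight labels, a single misclassified metric can shift the final score noticeably, which is one reason per-metric accuracy and score-level error are reported separately.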

Creation Date

2021-01 to 2021-06

Date Issued

2021



Language

English

Identifier

DOI: https://doi.org/10.6075/J0TX3F89

Related Resources

Source data

BERT model implementation was derived from: https://www.chrismccormick.ai/

CVE Data downloaded via REST API from: https://nvd.nist.gov/

Other resource

Bozorgi, Mehran, et al. “Beyond Heuristics.” Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '10, 2010. https://doi.org/10.1145/1835804.1835821

Elbaz, Clément, et al. “Fighting N-Day Vulnerabilities with Automated CVSS Vector Prediction at Disclosure.” Proceedings of the 15th International Conference on Availability, Reliability and Security, 2020. https://doi.org/10.1145/3407023.3407038

Khazaei, Atefeh, et al. “An Automatic Method for CVSS Score Prediction Using Vulnerabilities Description.” Journal of Intelligent & Fuzzy Systems, vol. 30, no. 1, 2015, pp. 89–96. https://doi.org/10.3233/ifs-151733

License

Creative Commons Attribution 4.0 International Public License

Rights Holder

Cook, Bryan; Janamian, Saba; Lim, Teck; Logan, James; Ulloa, Ivan

Copyright

Under copyright (US)

Use: This work is available from the UC San Diego Library. This digital copy of the work is intended to support research, teaching, and private study.

Constraint(s) on Use: This work is protected by the U.S. Copyright Law (Title 17, U.S.C.). Use of this work beyond that allowed by "fair use" or any license applied to this work requires written permission of the copyright holder(s). Responsibility for obtaining permissions and any use and distribution of this work rests exclusively with the user and not the UC San Diego Library. Inquiries can be made to the UC San Diego Library program having custody of the work.

Digital Object Made Available By

Research Data Curation Program, UC San Diego, La Jolla, 92093-0175 (https://lib.ucsd.edu/rdcp)

Last Modified

2024-06-28