Discussion issues from nexB's vulnerablecode

See https://openwrt.org/docs/guide-developer/security
It is unclear if there is anything that can be collected at all.

Define basic models for storing essentials of vulnerabilities, packages and their relations

Here is a rough sketch:

Create JSON API that accepts a scancode package name and version

... and returns essential data about any vulnerability found, with:

vulnerability references
package references

Collect AIX/IBM security advisories

See http://aix.software.ibm.com/aix/efixes/security and ftp://aix.software.ibm.com/aix/efixes/security

Should we store data of Debian releases apart from Jessie?

We are presently storing data for just Debian's Jessie release. Should we store data of other releases, too?

Improve vulnerability severity or scoring storage design

We currently store only CVSS as a vulnerability severity indicator. Some datasets instead classify severity with a textual indicator such as High or Low.

We can add:

vulnerability_score = models.CharField(max_length=50, help_text="Severity of the vulnerability")

This will ensure that we can save both textual and other non-CVSS severity indicators that a dataset provides.

Collect NixOS

The base is there:

Collect Red Hat RHSA

https://www.redhat.com/security/data/oval/
http://www.redhat.com/security/data/oval/com.redhat.rhsa-all.xml
https://www.redhat.com/security/data/metrics/

Evolve logic to deal with duplicates in data.

Example of: GET /vulncode_app/data/prototypejs

{
    "name": "prototypejs",
    "version": [
        "0",
        "1.6.0.2-1"
    ],
    "vulnerabilities": [
        {
            "summary": "Unspecified vulnerability in Prototype JavaScript framework (prototypejs) before 1.6.0.2 allows attackers to make \"cross-site ajax requests\" via unknown vectors.",
            "reference_id": "CVE-2008-7220"
        },
        {
            "summary": "Unspecified vulnerability in Prototype JavaScript framework (prototypejs) before 1.6.0.2 allows attackers to make \"cross-site ajax requests\" via unknown vectors.",
            "reference_id": "CVE-2008-7220"
        },
        {
            "summary": "Unspecified vulnerability in Prototype JavaScript framework (prototypejs) before 1.6.0.2 allows attackers to make \"cross-site ajax requests\" via unknown vectors.",
            "reference_id": "CVE-2008-7220"
        },
        {
            "summary": "The Prototype (prototypejs) framework before 1.5.1 RC3 exchanges data using JavaScript Object Notation (JSON) without an associated protection scheme, which allows remote attackers to obtain the data via a web page that retrieves the data through a URL in the SRC attribute of a SCRIPT element and captures the data using other JavaScript code, aka \"JavaScript Hijacking.\"",
            "reference_id": "CVE-2007-2383"
        }
    ]
}
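One possible way to collapse such repeats, sketched here under the assumption that reference_id alone identifies a vulnerability within a package payload (the function name is hypothetical and the summaries are shortened for the example):

```python
def dedupe_vulnerabilities(vulnerabilities):
    """Return the entries with repeated reference_id values removed.

    The first occurrence of each reference_id is kept and the original
    order is preserved.
    """
    seen = set()
    unique = []
    for vuln in vulnerabilities:
        key = vuln["reference_id"]
        if key in seen:
            continue
        seen.add(key)
        unique.append(vuln)
    return unique


# Same shape as the payload above, with shortened summaries.
vulnerabilities = [
    {"summary": "cross-site ajax requests", "reference_id": "CVE-2008-7220"},
    {"summary": "cross-site ajax requests", "reference_id": "CVE-2008-7220"},
    {"summary": "cross-site ajax requests", "reference_id": "CVE-2008-7220"},
    {"summary": "JavaScript Hijacking", "reference_id": "CVE-2007-2383"},
]
deduped = dedupe_vulnerabilities(vulnerabilities)
```

Deduplicating at query time only hides the problem in the response; avoiding the duplicate rows at import time would still be needed.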

Collect CentOS vulnerabilities

There are CESAs (CentOS Security Advisories), but it is not entirely clear how they differ from RHSAs (Red Hat Security Advisories).
See for instance https://lists.centos.org/pipermail/centos-announce/2019-September/023448.html
and https://access.redhat.com/errata/RHSA-2019:2729
Are these perhaps published only as mailing list posts?

Track licenses for each data pointer and record

We need to decide what we want to do about licenses for data.
See https://cve.mitre.org/about/termsofuse.html for instance for the CVE/NVD.
There are a few ways to think about this:

we are storing only pointers, so there are no license issues to track as we are not storing third-party data
we are storing only pointers and caching existing data, so we should handle this in a way similar to what search engines do
we are storing data, so we should track licenses either per record or per source

Each of these cases may have an impact on the license of the resulting data, which should be as open as possible (ideally something like CC0-1.0).

Django-based app that combines a DRF API and a cve-search instance.

To Do:

Initialise a Django-based app. ✅
Using the cve-search API code, return CVE IDs. ✅
Output the data in JSON format. ❎

EDIT:

Additions to the second point:

CVSS Score ❎
Summary ❎
URL ❎

Ability to query the public CVE-search instance and return vulnerabilities

Given a package identifier as input, return whether there is a known vulnerability for it.

Package identifiers:

name
name+version

Behind the scenes:

1. Query the CVE-search API and try to find a match for the package
2. Return results

Step 1 would later be replaced by a query to the local database, which the scrapers would populate with aggregated and correlated vulnerability data. But let's not store anything for now and simply get the data on demand.
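A minimal sketch of that flow, assuming the backend is passed in as a callable so that the public cve-search API can later be swapped for a local database query. The function names, the record shape, and the sample package and IDs are all hypothetical:

```python
from typing import Callable, Iterable, List, Optional


def find_vulnerabilities(
    name: str,
    version: Optional[str] = None,
    search: Callable[[str], Iterable[dict]] = lambda name: [],
) -> List[dict]:
    """Return known vulnerabilities for a package name, optionally narrowed to a version.

    `search` is a hypothetical backend. Today it would wrap the public
    cve-search API; later it would query the local database.
    """
    results = list(search(name))
    if version is not None:
        # Keep only records that list the requested version as affected.
        results = [r for r in results if version in r.get("versions", [])]
    return results


def fake_search(name):
    """Stand-in backend with made-up records, for illustration only."""
    data = {
        "examplepkg": [
            {"id": "EXAMPLE-0001", "versions": ["1.0", "1.1"]},
            {"id": "EXAMPLE-0002", "versions": ["1.1"]},
        ]
    }
    return data.get(name, [])


by_name = find_vulnerabilities("examplepkg", search=fake_search)
by_name_and_version = find_vulnerabilities("examplepkg", "1.0", search=fake_search)
```

Keeping the backend behind a single callable means the API view does not need to change when the data source does.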

Essential modifications required in Models.

This is our present model.

Some things to consider:

CVSS should be in Vulnerability Reference and not in Vulnerability, since CVSS scores are assigned to particular CVE IDs
There should be some distinction between a vulnerable package version and a fixed version. At the moment, we store just the fixed version.
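The two points above could look roughly like this. It is a sketch with plain dataclasses rather than Django models, and the class and field names are assumptions, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VulnerabilityReference:
    reference_id: str             # e.g. a CVE ID
    cvss: Optional[float] = None  # the score belongs to this specific ID


@dataclass
class Vulnerability:
    summary: str
    references: List[VulnerabilityReference] = field(default_factory=list)
    # Vulnerable and fixed versions are kept apart instead of sharing one field.
    impacted_versions: List[str] = field(default_factory=list)
    resolved_versions: List[str] = field(default_factory=list)


vuln = Vulnerability(
    summary="Multiple stack-based buffer overflows in mimetex.cgi in mimeTeX",
    references=[VulnerabilityReference("CVE-2009-2458")],  # no score known here
    resolved_versions=["1.50-1.1"],
)
```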

Evolve data dump to store or not to store data without a package name.

In reference to: #26 (comment)

The data dump logic shouldn't store data when there is no package name or version associated with it.
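A possible filter for this, assuming scraped records are plain dicts with package_name and fixed_version keys and that a record needs both to be worth storing. The helper name and that "both" rule are assumptions:

```python
def is_storable(record):
    """Return True only if the record has a non-empty package name and version."""
    return bool(record.get("package_name")) and bool(record.get("fixed_version"))


scraped = [
    {"package_name": "mimetex", "fixed_version": "1.50-1.1"},
    {"package_name": "", "fixed_version": "1.50-1.1"},  # no package name
    {"package_name": "mimetex"},                        # no version
]
to_store = [record for record in scraped if is_storable(record)]
```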

Add support for Microsoft vulnerabilities and CVRF

Microsoft publishes its vulnerabilities in the CVRF format: https://www.icasi.org/cvrf/

See https://github.com/Microsoft/MSRC-Microsoft-Security-Updates-API for details

@mschiffm's https://github.com/mschiffm/cvrfparse is a library that can likely handle this.

Data is not stored for all Debian "releases"

Currently, for Debian vulnerability data, we do not store the affected/resolved versions for each of the distro "releases".
Example:
The following JSON snippet is taken from the Debian security tracker:

{"mimetex": {
    "CVE-2009-2458": {
        "scope": "remote",
        "debianbug": 537254,
        "description": "Multiple stack-based buffer overflows in mimetex.cgi in mimeTeX",
        "releases": {
            "stretch": {
                "status": "resolved",
                "repositories": {"stretch": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "jessie": {
                "status": "resolved",
                "repositories": {"jessie": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "buster": {
                "status": "resolved",
                "repositories": {"buster": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "wheezy": {
                "status": "resolved",
                "repositories": {"wheezy": "1.73-2"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "sid": {
                "status": "resolved",
                "repositories": {"sid": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            }
        }
    },
    "CVE-2009-2459": {
        "scope": "un-remote",
        "debianbug": 537254,
        "description": "Multiple unspecified vulnerabilities in mimeTeX.",
        "releases": {
            "stretch": {
                "status": "resolved",
                "repositories": {"stretch": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "jessie": {
                "status": "not-resolved",
                "repositories": {"jessie": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "buster": {
                "status": "resolved",
                "repositories": {"buster": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "wheezy": {
                "status": "resolved",
                "repositories": {"wheezy": "1.73-2"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            },
            "sid": {
                "status": "resolved",
                "repositories": {"sid": "1.74-1"},
                "urgency": "medium",
                "fixed_version": "1.50-1.1"
            }
        }
    }
}}

The following is the final set of created records:

[
    {
        'fixed_version': '1.50-1.1',
        'package_name': 'mimetex',
        'status': 'resolved',
        'urgency': 'medium',
        'vulnerability_id': 'CVE-2009-2458',
        'description': 'Multiple stack-based buffer overflows in mimetex.cgi in mimeTeX'
    },
    {
        'fixed_version': '1.50-1.1',
        'package_name': 'mimetex',
        'status': 'not-resolved',
        'urgency': 'medium',
        'vulnerability_id': 'CVE-2009-2459',
        'description': 'Multiple unspecified vulnerabilities in mimeTeX.'
    }
]

As we can see, the created records do not capture the affected/resolved versions for each of the distro "releases".
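A sketch of how one record per release could be produced from the tracker JSON instead. The function name and the release and repository_version keys are assumptions added for illustration, and the sample is trimmed to two releases of one CVE from the snippet above:

```python
def records_per_release(tracker_data):
    """Flatten Debian tracker JSON into one record per package, CVE and release."""
    records = []
    for package_name, vulnerabilities in tracker_data.items():
        for vulnerability_id, details in vulnerabilities.items():
            for release, info in details.get("releases", {}).items():
                records.append({
                    "package_name": package_name,
                    "vulnerability_id": vulnerability_id,
                    "description": details.get("description", ""),
                    "release": release,
                    "status": info.get("status"),
                    "urgency": info.get("urgency"),
                    "fixed_version": info.get("fixed_version"),
                    # Version currently shipped in this release's repository.
                    "repository_version": info.get("repositories", {}).get(release),
                })
    return records


tracker_data = {
    "mimetex": {
        "CVE-2009-2458": {
            "description": "Multiple stack-based buffer overflows in mimetex.cgi in mimeTeX",
            "releases": {
                "jessie": {
                    "status": "resolved",
                    "repositories": {"jessie": "1.74-1"},
                    "urgency": "medium",
                    "fixed_version": "1.50-1.1",
                },
                "wheezy": {
                    "status": "resolved",
                    "repositories": {"wheezy": "1.73-2"},
                    "urgency": "medium",
                    "fixed_version": "1.50-1.1",
                },
            },
        }
    }
}
records = records_per_release(tracker_data)
```

With this shape, a per-release status such as the one reported for jessie is no longer collapsed into a single record per CVE.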

Replace objects.create with objects.update_or_create.

Replace objects.create with objects.update_or_create, and add new tests for this.

Use OVAL export for Ubuntu vulnerabilities source

The current data source for Ubuntu vulnerability information is an HTML page. Apart from the fact that this is not really meant to be machine-readable, it also does not include version information for affected/fixed packages.
Looking at quay/clair#804, it seems that the sources we want to use are the XML files in OVAL format available at https://people.canonical.com/~ubuntu-security/oval/.
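As a rough idea of what reading such a file could involve, here is a sketch using the standard library XML parser. The inline sample is hand-written with placeholder IDs and only mimics the general OVAL definitions layout; the real Ubuntu files are much larger, and extracting package versions from their criteria is not attempted here:

```python
import xml.etree.ElementTree as ET

# Namespace of the OVAL 5 definitions schema.
OVAL_NS = {"oval": "http://oval.mitre.org/XMLSchema/oval-definitions-5"}

# Hand-written sample with placeholder IDs, not real Ubuntu data.
SAMPLE = """
<oval_definitions xmlns="http://oval.mitre.org/XMLSchema/oval-definitions-5">
  <definitions>
    <definition id="oval:example:def:1" class="vulnerability" version="1">
      <metadata>
        <title>EXAMPLE-0001 on a sample release</title>
        <reference source="CVE" ref_id="EXAMPLE-0001"/>
      </metadata>
    </definition>
  </definitions>
</oval_definitions>
"""


def parse_definitions(xml_text):
    """Return id, title and reference IDs for each OVAL definition."""
    root = ET.fromstring(xml_text)
    results = []
    for definition in root.iterfind(".//oval:definition", OVAL_NS):
        metadata = definition.find("oval:metadata", OVAL_NS)
        if metadata is None:
            continue
        title = metadata.findtext("oval:title", default="", namespaces=OVAL_NS)
        references = [
            ref.get("ref_id")
            for ref in metadata.iterfind("oval:reference", OVAL_NS)
        ]
        results.append({
            "id": definition.get("id"),
            "title": title.strip(),
            "references": references,
        })
    return results


definitions = parse_definitions(SAMPLE)
```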

Add vmware/photon/wiki/Security-Advisories vulnerability advisories

https://github.com/vmware/photon/wiki/Security-Advisories

Collect FreeBSD

See https://www.freebsd.org/security/ and https://www.freebsd.org/security/advisories.html
And also

http://vuxml.freebsd.org/ and http://www.vuxml.org/freebsd/
https://github.com/freebsd/freebsd-ports/tree/master/security/vuxml
https://github.com/pombredanne/checkvuln (weirdly enough I am the head fork there but that's originally by @dyntopia ;) )

Collect OpenVAS Vulnerability Tests pointers

OpenVAS started as a FOSS fork of Nessus when Nessus went closed source, and is maintained by @greenbone, @janowagner and team.
See the plugins (OpenVAS Vulnerability Tests) feed at https://community.greenbone.net/t/about-greenbone-community-feed-gcf/1224
See also http://www.openvas.org/ and https://en.wikipedia.org/wiki/OpenVAS

Collect Gentoo

See GLSA example https://security.gentoo.org/glsa/201909-03
https://www.gentoo.org/support/security/ and feed at https://security.gentoo.org/subscribe

Refactor the current API view into a proper DRF view

From #26 (comment)

pombredanne wrote:
I would expect the returned payload to have some "header" data (e.g. some tool version, what query was made, number of results returned, ...) and to have the list of packages returned under the packages: element

See http://www.django-rest-framework.org/api-guide/generic-views/ and http://www.django-rest-framework.org/api-guide/viewsets/

Provide CLI for data import

Implement a custom Django management command that allows:

listing all available data sources
importing data from one or more sources by name (e.g. "debian")
importing data from all sources
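The three behaviours could be sketched as below. This uses plain argparse so it runs on its own; in a Django management command the same two functions would become the add_arguments and handle methods of a BaseCommand subclass, which also receives an argparse-based parser. The IMPORTERS registry, the flag names and the return values are hypothetical:

```python
import argparse

# Hypothetical registry mapping a source name to its import function.
IMPORTERS = {
    "debian": lambda: "imported debian",
    "ubuntu": lambda: "imported ubuntu",
}


def add_arguments(parser):
    parser.add_argument("sources", nargs="*", help="names of the sources to import")
    parser.add_argument("--list", action="store_true", dest="list_sources",
                        help="list all available data sources")
    parser.add_argument("--all", action="store_true", dest="all_sources",
                        help="import data from all sources")


def handle(sources, list_sources=False, all_sources=False):
    if list_sources:
        return sorted(IMPORTERS)
    names = sorted(IMPORTERS) if all_sources else sources
    unknown = [name for name in names if name not in IMPORTERS]
    if unknown:
        raise ValueError("unknown sources: " + ", ".join(unknown))
    return [IMPORTERS[name]() for name in names]


def run(argv):
    parser = argparse.ArgumentParser(prog="import")
    add_arguments(parser)
    return handle(**vars(parser.parse_args(argv)))
```

For example, run(["--list"]) returns the available source names, and run(["debian"]) imports only the Debian source.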

Collect Debian vulnerability data

The Debian security page is at: https://security-tracker.debian.org/tracker/
The main data source is a JSON file at: https://security-tracker.debian.org/tracker/data/json

Improve setup instructions in README

Relationships between a Package and Vulnerability are not created

During the data dump, debian_dump and ubuntu_dump (see https://github.com/nexB/vulnerablecode/blob/develop/app/vulncode_app/data_dump.py#L29 and https://github.com/nexB/vulnerablecode/blob/develop/app/vulncode_app/data_dump.py#L47) create the Vulnerability and Package objects but not the relationship (ImpactedPackage, ResolvedPackage) between the two.

This makes the data unusable.

Add basic requirements for prod and tests

Collect data from scrapers and dump in the database.

All code related to dumping data collected from scrapers will be referenced with this issue.

Package.version field needs proper definition

Currently, the JSON data from the Debian security tracker does not say which version is vulnerable, only which version is resolved/fixed. The JSON data from the Arch Linux security tracker, however, contains both the fixed version and the vulnerable version. This creates ambiguity when dumping the data to the Package.version field. In the case of Debian we simply dump the fixed_version value from the JSON data to the Package.version field, but which value should be dumped in the Package.version field for Arch Linux: the affected_version or the fixed_version value?

Arch Linux security tracker data - https://security.archlinux.org/json
Debian security tracker data - https://security-tracker.debian.org/tracker/data/json

Flesh out the README to explain the basic and minimal getting-started steps

We need to have a proper README

Collect vulnerabilities from Snyk

The data is at https://github.com/snyk/vulnerabilitydb and covers some Maven, npm and RubyGems vulnerabilities. The license of the data is AGPL. The impact of such a software license on other data is not clear and needs to be clarified before doing anything with this.

Add pylint.

Add pylint to CI, to check basic things in the code.

Set up Travis for CI tests

Collect vulnerabilities and package references from cve-search (and/or via4cve)

At first the goal is to collect data exposed by the API of https://cve.circl.lu/
There is also a dump for via4 data available at https://www.cve-search.org/feeds/via4.json
Longer term, we should set up our own instance of cve-search instead of using the public site API.

Collect Ubuntu vulnerability data

There are a couple of ways to get Ubuntu vulnerability data from the main https://people.canonical.com/~ubuntu-security page:

1. An XML OVAL feed at https://people.canonical.com/~ubuntu-security/oval/
2. HTML pages that can be scraped at https://people.canonical.com/~ubuntu-security/cve/ ... for instance at https://people.canonical.com/~ubuntu-security/cve/main.html
3. Possibly other sources such as:

a bzr repo
RSS/Atom feeds for higher-level security notices covering several vulns at once

Some notes:

About OVAL

The OVAL feed is likely the preferred option if it is comprehensive and contains the data we need. This needs to be verified. For OVAL there is likely parsing code that could be reused in via4cve ... see this or this for Red Hat that may be using OVAL.

About the bzr repo

There may also be interesting stuff in the bzr repo:

there is a README that looks maintained and up-to-date... and quite dense!
for instance this looks like deb822 documents, one for each vulnerability: http://bazaar.launchpad.net/~ubuntu-security/ubuntu-cve-tracker/master/files/head:/active/
the HTML pages in 2. seem to be created using scripts found there: http://bazaar.launchpad.net/~ubuntu-security/ubuntu-cve-tracker/master/files/head:/scripts/ and specifically here: http://bazaar.launchpad.net/~ubuntu-security/ubuntu-cve-tracker/master/view/head:/scripts/html-report ... therefore there may be an option to build on this rather than scraping the HTML.
the scripts ./scripts/oval_lib.py and ./scripts/generate-oval are likely used to generate the OVAL XML too.

Import of Debian and Ubuntu packages ignores status field

While going through the Debian security tracker data, I noticed that it contains fixed_version data. But in the data_dump file the mappings are created in the ImpactedReference table. See here:

vulnerablecode/vulnerabilities/data_dump.py

Line 46 in 55a633d

ImpactedPackage.objects.create(

But logically it should be the ResolvedReference table.
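One way the import could take the status field into account, sketched under the assumption that only a "resolved" status means the listed version is a fix and that every other status is treated as impacted. The helper name is hypothetical, and the returned strings stand for the relationship models named in the snippet above:

```python
def relation_for(status):
    """Pick the relationship to create for a tracker status value.

    Assumption: "resolved" means fixed_version is a fix; any other status
    is treated as still impacted.
    """
    return "ResolvedPackage" if status == "resolved" else "ImpactedPackage"


chosen = [relation_for(status) for status in ("resolved", "open", "undetermined")]
```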