cveproject / cvelistv5

CVE cache of the official CVE List in CVE JSON 5 format

cvelistv5's Issues

Trying

Question about JSON 5.0 format

Hey there,

I discovered this repository by accident, and I didn't know about the cvelist repository before either.

The (old) CVE website currently announces that the new JSON 5.0 format will be used for the CVE List downloads page starting late spring this year.

I am just a little confused as to "how official" this repository is, since it only has 4 stars 😞 and it isn't mentioned on either the new or the old CVE website.

For what it's worth: I love the new JSON 5.0 format, as it makes parsing and keeping track of everything much much easier. With the old format you always had to scrape a lot of details and unnecessarily drain mitre's / NVD's servers and/or proprietary databases that probably just scraped NVD themselves, and all this can be avoided with the new format and makes access to those details much more public.

Now my question is mostly about the contents of this repository.

Is this repository the new go-to resource for the new CVEs after the migration?
The cvelist repository still uses the JSON 4.0 format, so I'm assuming this will be deprecated after the migration?

Thanks in advance.

FEEDBACK - Missing license file

Please fill out the following sections

Is there a problem using the GitHub Repository?
Yes. There is no license.

Do you have any suggestions on how we could improve the repository?
Add a license, CC0 is probably best for content.

Please provide any other comments here

As per CWE, they have deployed the CC0 license: https://github.com/CWE-CAPEC/CWE-Submissions/issues/30

Published date changed on thousands of CVEs

A plot of the published dates in the JSON 5.0 data has two weird jumps in it.

There are 10,151 CVEs with the "datePublished" value on 2022-10-03. They have different times in the data, but all within that day.
9,921 of those also have the same date in the "last modified" field.
For example, when I look at CVE-2010-1124 in the old website, it has "date record created" as "20100326", but in the new website it claims both published and updated on 2022-10-03.
Random sample of CVEs published on Oct 3, 2022: CVE-2015-8758, CVE-2010-2983, CVE-2008-7279, CVE-2002-2386, CVE-2009-2610, CVE-2018-1999030, CVE-2018-1000198, CVE-2018-20371, CVE-2005-4691, CVE-2003-1567, CVE-2011-4391, CVE-2018-14492, CVE-2013-4487, CVE-2013-0317, CVE-2009-3256, CVE-2004-2712, CVE-2018-8710, CVE-2005-1652, CVE-2018-16978, CVE-2011-4771
When I compared an earlier version of the plot (below, using the legacy data), I also noticed a bump of 3,122 CVEs on 2017-05-11. Haven't dug into it, but I imagine it's similar?
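A jump like the ones described above can be located by counting records per publication day. The following is a minimal sketch, assuming each record is a parsed CVE JSON 5 document with a `cveMetadata.datePublished` ISO 8601 timestamp; the function name and the sample records are made up for illustration and are not real CVE data.

```python
from collections import Counter


def count_by_publish_day(records):
    """Count records per calendar day of cveMetadata.datePublished."""
    counts = Counter()
    for record in records:
        published = record.get("cveMetadata", {}).get("datePublished")
        if published:
            # ISO 8601 timestamps start with YYYY-MM-DD.
            counts[published[:10]] += 1
    return counts


# Made-up sample records for illustration only.
sample = [
    {"cveMetadata": {"datePublished": "2022-10-03T16:14:30.000Z"}},
    {"cveMetadata": {"datePublished": "2022-10-03T16:20:01.000Z"}},
    {"cveMetadata": {"datePublished": "2010-03-26T00:00:00.000Z"}},
    {"cveMetadata": {}},  # records without a publish date are skipped
]
busiest = count_by_publish_day(sample).most_common(1)
```

Sorting the resulting counter by count makes days with unusually many records stand out without needing a plot.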

List of `assignerShortName` and `assignerOrgId`

Hello.

I'm working on software that consumes CVE data and I'd like to know if there is such a thing as a list of assignerShortName and assignerOrgId, so that we can easily match an ID with an organization, programmatically.

From this repository, I can see that, for example:

assignerShortName	assignerOrgId
canonical	cc1ad9ee-3454-478d-9317-d3e869d708bc
debian	79363d38-fa19-49d1-9214-5f28da3f3ac5
mitre	8254265b-2729-46b6-b9e3-3dfca2d5bfca
...	...

I could just iterate over this whole project and create the list myself, but it would be nice to avoid that and use an official source (an API, for example). I saw the List of Partners section on the web, but it doesn't seem to include the organization ID (and even if it does, web scraping would be required to consume it).

If there is no such thing and I have to create this list myself: are these IDs going to change someday, or are they fixed?

Thanks in advance,
and great work here, BTW.
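For reference, the "iterate over the project" fallback mentioned above could look roughly like this. It is only a sketch, assuming a local checkout whose records carry `cveMetadata.assignerOrgId` and `cveMetadata.assignerShortName`; the function name `build_assigner_map` is hypothetical, and nothing here says whether the IDs are stable over time.

```python
import json
from pathlib import Path


def build_assigner_map(cves_root):
    """Map each assignerOrgId to the set of short names seen with it."""
    mapping = {}
    for path in Path(cves_root).rglob("CVE-*.json"):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip empty or malformed files
        if not isinstance(record, dict):
            continue
        meta = record.get("cveMetadata", {})
        org_id = meta.get("assignerOrgId")
        short_name = meta.get("assignerShortName")
        if org_id and short_name:
            # A set is used in case one ID appears with several names.
            mapping.setdefault(org_id, set()).add(short_name)
    return mapping
```

Calling `build_assigner_map("cves")` from the repository root would walk every record once; an ID that maps to more than one short name would be a hint that the names are not fixed.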

Multiple empty JSONs

Not sure what caused this, but there are many empty records, e.g.:

https://github.com/CVEProject/cvelistV5/blob/ee0bf97032cd4f2b2cb5874ba19b16a84cb12c85/review_set/2022/24xxx/CVE-2022-24112.json

Official way to synchronize the JSON 5.0 feeds

Hello,

First of all thank you for the awesome work you do concerning the CVE ecosystem!

I'm the developer of a CVE-related tool, and I would like to add MITRE to my sources (instead of relying only on NVD, as I do now). But to be honest I don't really know how to parse your feed.

So I would like to ask what the official and recommended way is to synchronize our local databases with the new JSON 5.0 CVE list.

I searched your blog posts and, if I'm not wrong, you're currently in the "Soft Deploy" state, meaning CNAs now use the new format to declare CVEs. The "Hard Deploy" is targeted for Q1 2023. At that point we (as consumers) will be able to officially use the JSON 5.0 feeds.

But where can I find the list? I think the old formats (csv, html, text, xml) will be removed, so maybe you will provide an API (or something similar to what the NVD offers) to fetch the latest changes?

Or maybe this repo (cveproject/cvelistv5) will become our official data feed? If so, do you recommend using the recent_activities.json file to detect changes, or simply running git pull periodically and parsing the new diffs?

Thank you in advance for your answer,
Nicolas
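The "git pull and parse the diffs" option raised above could be sketched as follows. This is not an official synchronization method, only an illustration: it assumes a local clone, the `cves/<year>/<N>xxx/CVE-*.json` layout seen in this repository, and two hypothetical helper names. Note that `git diff --name-only` also lists deleted files, so a consumer would still need to check whether each path exists.

```python
import re
import subprocess

# Matches paths such as cves/2021/41xxx/CVE-2021-41144.json
CVE_PATH = re.compile(r"^cves/\d{4}/\d+xxx/(CVE-\d{4}-\d{4,})\.json$")


def cve_ids_from_diff(name_only_output):
    """Extract CVE IDs from the output of `git diff --name-only`."""
    ids = []
    for line in name_only_output.splitlines():
        match = CVE_PATH.match(line.strip())
        if match:
            ids.append(match.group(1))
    return ids


def changed_cve_ids(repo_dir, old_rev, new_rev="HEAD"):
    """List CVE IDs whose files differ between two revisions of a clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "diff", "--name-only", old_rev, new_rev, "--", "cves"],
        check=True,
        capture_output=True,
        text=True,
    )
    return cve_ids_from_diff(result.stdout)
```

A consumer would remember the commit hash it last processed, run `git pull`, and then call `changed_cve_ids(repo_dir, last_seen_hash)` to get the records to re-import.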

Cve project

"opertion" misspelling

https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at c376add has

            activityLog.writeRecentFile();
        }
        console.log(`opertion completed in ${super.timerSinceStart() / 1000 / 60} minutes at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_5__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);

and

            console.log(`no new or updated CVEs`);
        }
        console.log(`opertion completed in ${super.timerSinceStart() / 1000} seconds at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_6__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);

where operation was intended.

CVE-2023-38894 in Tree-kit

Hi all,
Just a heads up there's a new prototype pollution CVE for cronvel/tree-kit found here: https://nvd.nist.gov/vuln/detail/CVE-2023-38894
More info about the CVE: https://www.code-intelligence.com/blog/treekit-prototype-pollution-cve-2023-38894
I don't have a json version for the CVE yet sorry.
Cheers

Vendors, Products and Versions are totally messed up

Hey there,

in the cvelist, all vendors and products and their versions are totally messed up.

First off, there seem to be more than one notation for the meaning of "n/a" (aka null). So far I've identified these notations: n/a, * n/a *, *** n/a ***, NONE, None, none, no, null, [UNKNOWN], [Unknown], Unknown.

Additionally, all vendors and products are messed up. Sometimes there's the product field containing the actual versions that are affected in a comma separated list. Sometimes the Vendor is redundantly marked e.g. as Example, Inc and Example Corporation and Example. Siemens alone has more than 10 different notations.

The versions themselves are a whole other story, because most of them are also totally invalid. Even when there's a lessThan field set, sometimes the value of it is set to None. It gets even more ridiculous when the same CVE has two different affected versions which logically contradict each other.
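Until the data is cleaned up at the source, a consumer can fold the "n/a" notations listed above into a single check. A minimal sketch, assuming the eleven spellings in this issue are the complete set; the helper name `is_placeholder` is made up, and treating a bare "no" as empty is taken from the list above rather than from any schema rule.

```python
# Lowercased cores of the notations listed in this issue.
PLACEHOLDERS = {"n/a", "none", "no", "null", "unknown"}


def is_placeholder(value):
    """Return True if a vendor/product/version value means 'not available'."""
    if value is None:
        return True
    # Drop surrounding whitespace, then decorative '*' and '[]', then lowercase.
    core = str(value).strip().strip("*[]").strip().lower()
    return core == "" or core in PLACEHOLDERS
```

This only handles the empty-value notations; merging vendor spellings such as "Example, Inc" and "Example Corporation" would need a separate alias table.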

standardize encoding

Currently, most files are ASCII-encoded, but some are not. At least 213,576 files are ascii and at least 6,330 are non-ascii.

Please unify all json files to a single encoding. A git action to verify encoding would be helpful long term.

For a hacky check:

#!/usr/bin/env python3

import glob
import subprocess

count_ascii = 0
count_non_ascii = 0

for path in glob.glob("cves/**/*.json", recursive=True):
    res = subprocess.run(args=["file", "-bi", path], capture_output=True)
    if b"ascii" in res.stdout:
        count_ascii += 1
    else:
        count_non_ascii += 1
        print(path)

print(f"ascii:     {count_ascii}")
print(f"non-ascii: {count_non_ascii}")

Descriptions contain newlines and other characters

Hi,

While parsing CVE 5 records, I have found multiple issues with non-sanitized descriptions. These include, for example, leading or trailing whitespace and (multiple) newlines in the middle of a description. Some cases:

./cves/2021/41xxx/CVE-2021-41144.json
"OpenMage LTS is an e-commerce platform. Prior to versions 19.4.22 and 20.0.19, a layout block was able to bypass the block blacklist to execute remote code. Versions 19.4.22 and 20.0.19 contain a patch for this issue.\n\n\n"

./cves/2021/45xxx/CVE-2021-45448.json
"Pentaho Business Analytics\n Server versions before 9.2.0.2 and 8.3.0.25 using the Pentaho \nAnalyzer plugin exposes a service endpoint for templates which allows a \nuser-supplied path to access resources that are out of bounds. \n\nThe software uses external input to construct a pathname that is intended to identify a file or \ndirectory that is located underneath a restricted parent directory, but the software does not \nproperly neutralize special elements within the pathname that can cause the pathname to \nresolve to a location that is outside of the restricted directory. By using special elements such as \n".." and "/" separators, attackers can escape outside of the restricted \nlocation to access files or directories that are elsewhere on the \nsystem.\n\n\n\n"

Would be nice to have descriptions somewhat sanitized into a single string.
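On the consumer side, the cases above can be flattened with a one-line normalization. This is a sketch of one possible clean-up, not what the CVE Program does: it collapses every run of whitespace (including newlines) into a single space, which also discards intentional paragraph breaks. The function name is hypothetical.

```python
def collapse_whitespace(text):
    """Strip the ends and replace each whitespace run with one space."""
    # str.split() with no arguments splits on any whitespace run
    # and drops leading/trailing whitespace.
    return " ".join(text.split())


# Excerpts taken from the two descriptions quoted above.
trailing = collapse_whitespace("contain a patch for this issue.\n\n\n")
embedded = collapse_whitespace("Pentaho Business Analytics\n Server versions")
```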

`recent_activities.json` not being updated

Hi,

I'm working on a small tool to sync new and updated CVEs to my app. To save network bandwidth and disk I/O, I use git to synchronize updates from this repository.

The tool relies on ./cves/recent_activities.json, but I haven't seen any updates to that file for 3 weeks, even though there are updates under ./cves/[year]/.

Is the recent_activities.json file under ./cves/ directory not being updated anymore?

Thank you

CVEs missing in deltas

I downloaded the following files from the releases area:

2023-10-15_all_CVEs_at_midnight.zip.zip
2023-10-16_all_CVEs_at_midnight.zip.zip
2023-10-15_delta_CVEs_at_xx00Z.zip, where xx runs from 00 through 23 - i.e. 24 zip files
2023-10-15_delta_CVEs_at_end_of_day.zip

I then unzipped all those files and proceeded to apply the deltas in each of the 25 files (24 hourly ones, plus the end-of-day one) to the 10/15 midnight snapshot (just snapshot henceforth). After doing that, I compared the contents of the 10/15 snapshot with those of the 10/16. I thought that, after applying all the deltas in the 25 delta files to the 10/15 snapshot its contents would be identical to those of the 10/16 snapshot.

However, they are not. For example, there is a file called CVE-2023-5591.json under cves/2023/5xxx in the 10/16 snapshot which is not present in the 10/15 snapshot after (or before, for that matter) applying the deltas. Looking into the deltas for 10/15 themselves, CVE-2023-5591.json is also not present in any of them: in the directory obtained from 2023-10-15_delta_CVEs_at_end_of_day.zip the last file is CVE-2023-5590.json.

I have noticed a similar behavior downloading the corresponding files for different dates: for the most part there will be differences between the midnight snapshot on a given day, with all of the 25 deltas applied, and the midnight snapshot for the next day; it is only occasionally that they both are identical.

Any idea what is going on here? At what point during 10/15 was CVE-2023-5591.json added? Am I missing something?
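The comparison described above (snapshot plus deltas versus the next snapshot) can be reproduced with a small tree diff. A minimal sketch, assuming both trees have already been unzipped to local directories; the function names are made up, and the code only reports differences, it does not explain where a missing record came from.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each JSON file's relative path to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*.json")
        if p.is_file()
    }


def compare_snapshots(expected_root, actual_root):
    """Return (missing, extra, changed) relative paths between two trees."""
    expected = tree_digest(expected_root)
    actual = tree_digest(actual_root)
    missing = sorted(expected.keys() - actual.keys())  # only in expected
    extra = sorted(actual.keys() - expected.keys())    # only in actual
    changed = sorted(
        k for k in expected.keys() & actual.keys() if expected[k] != actual[k]
    )
    return missing, extra, changed
```

With the 10/16 snapshot as `expected_root` and the patched 10/15 snapshot as `actual_root`, a record such as CVE-2023-5591.json would show up in `missing`.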

Update --minutes-ago does not appear to work correctly if an update has already occurred we issued this update

Steps to reproduce:

Manually ran Update action
Manually ran Update action with --minutes-ago 2880 (we did not get all of the CVE changes in the last 48 hours)
Manually ran Update action with --minutes-ago 4320 (we did not get all of the CVE changes in the last 72 hours)

Verify that the utility is working correctly and that the resulting /cves directory has the correct CVEs by running the rebuild command

Kaschey758

BNB

Delta tarballs not generated at 0000Z, changes done in 2300-0000 missing

Looking at the releases page, I see there's a full release at 0000Z, and then incremental releases with a delta tarball from the last full release every hour from 0100Z up to 2300Z. However, it seems to me that any changes from 2300Z until 0000Z will not be available in a delta tarball. That makes those delta tarballs a little less useful, as one will need to download a full tarball once a day anyway to stay up to date.

Would it be possible to have the 0000Z release contain a delta tarball from the previous 0000Z release? That way every change (including those made between 2300 and 0000) would be available in a delta tarball.

no RESERVED CVEs

If this repository is meant to be used as a definitive source of CVEs, it would be useful to include those that have been reserved, but they are not currently available.
For example, right now CVE-2023-38545 exists in the cvelist repository... but not cvelistV5

no guarantee that a cveRecords property will exist

In https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at dbd65c7

const response = await cveService.cve({ queryString });
let cves = [];
response.cveRecords.forEach(obj => {

const cves = await service.cve({ queryString });
const cveIds = [];
cves.cveRecords.forEach(record => {

const cves = await service.cve({ queryString });
// console.log(`getCvesByPage().cves=${JSON.stringify(cves, null, 2)}`);
const cveIds = [];
cves.cveRecords.forEach(record => {

there's no guarantee that a query to a CVE Services API in AWS will have a response with the application/json content type. In recent and realistic cases, the response can instead have:

Content-type: text/html

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
</body>
</html>

(for example, this was seen in production 2023-02-12T21:01Z)

For the text/html content type, Axios won't create a JavaScript object, and accessing the cveRecords property will fail.

There was a request for the CVE Services API documentation to mention that text/html may occur, but there was no action on this request: CVEProject/cve-services#549

To resolve this, one possibility is to read the cveRecords property only if the content type is application/json. (It is also realistic for the cveRecords property to be missing when the content type is application/json but the status is 429 - as shown in CVEProject/cve-services#885 - but this perhaps has not occurred in recent months.)
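The guard suggested above is sketched here in Python rather than in the workflow's JavaScript, purely to show the order of the checks. The function name and its three parameters are hypothetical; a real fix in index.js would apply the same checks to the Axios response before touching `cveRecords`.

```python
import json


def extract_cve_records(status, content_type, body):
    """Return the cveRecords list, or [] if the response is not usable."""
    if status != 200:
        return []  # e.g. 502 Bad Gateway or 429
    if "application/json" not in (content_type or "").lower():
        return []  # e.g. a text/html error page
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return []
    records = payload.get("cveRecords") if isinstance(payload, dict) else None
    return records if isinstance(records, list) else []
```

Returning an empty list keeps the caller's loop from failing, but it also hides the error, so a caller may prefer to log the status and retry instead of treating the result as "no changes".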

change user.name (or user.email)?

cvelistV5/.github/workflows/update.yml

Lines 48 to 49 in bb74fe0

	git config --global user.email "[email protected]"
	git config --global user.name "cvelistV5 Github Action"

results in this data format in the patch view of a commit:

https://github.com/CVEProject/cvelistV5/commit/bb74fe0f2f6fabe1d25aec108530e45a1b7fc644.patch

From: cvelistV5 Github Action <[email protected]>

Should the first 'h' be capitalized? (GitHub instead of Github)

Is there any potential advantage in using an email address that you control? For example:

https://github.com/CVEProject/cvelist/commit/f47fcda074db819cda8db7213116c29e21e15e14.patch

From: CVE Team <[email protected]>

118955 CVE records don't have an affected product/vendor or version

I have a question regarding the quality of the dataset.

Of all CVEs that ...

have not been rejected
have not been reserved

... 118955 records do not have valid affected software in their details. In the random picks I verified, the software is only named as text in the descriptions[] fields and is not set inside the containers/cna/affected array of the JSON file.

Is this a mistake in the database export? The CVE website doesn't list any of these details in its rendered fields.

I've generated a list of those records that do not contain valid affected fields and exported them here as a gist.
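A check of this kind could be written as below. The exact criterion behind the 118955 figure is not stated in this issue, so the rule here is an assumption: a record counts as having affected software if at least one `containers.cna.affected` entry has a `product` that is neither blank nor "n/a". The function name is hypothetical.

```python
# Assumed spellings of an empty product; not an official list.
BLANK = {"", "n/a"}


def has_named_product(record):
    """True if any containers.cna.affected entry names a product."""
    cna = record.get("containers", {}).get("cna", {})
    for entry in cna.get("affected") or []:
        product = str(entry.get("product") or "").strip().lower()
        if product not in BLANK:
            return True
    return False
```

Records for which this returns False are candidates where the product only appears in the free-text description.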

legacy severity data preservation

Will legacy severity information, such as CVSS 2.0 vectors, be archived in cvelistV5?

As an example, https://github.com/CVEProject/cvelistV5/blob/main/cves/2012/2xxx/CVE-2012-2125.json does not contain the CVSS 2.0 score from https://nvd.nist.gov/vuln/detail/CVE-2012-2125
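Whether a given record carries CVSS 2.0 data can be checked mechanically. This sketch assumes the CVE JSON 5 layout in which each `metrics` entry holds a key such as `cvssV2_0`, `cvssV3_0` or `cvssV3_1`, in either the `cna` container or an `adp` container; the function name and the sample records are made up for illustration.

```python
def cvss_versions(record):
    """Collect the cvssV* keys present in a record's metrics entries."""
    containers = record.get("containers", {})
    blocks = [containers.get("cna", {})] + list(containers.get("adp", []))
    versions = set()
    for block in blocks:
        for metric in block.get("metrics", []):
            versions.update(key for key in metric if key.startswith("cvssV"))
    return versions


# Made-up records for illustration only.
with_v2 = {"containers": {"cna": {"metrics": [{"cvssV2_0": {"baseScore": 5.0}}]}}}
without_metrics = {"containers": {"cna": {"descriptions": []}}}
```

A record like the CVE-2012-2125 example would be flagged by `"cvssV2_0" not in cvss_versions(record)`.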