cveproject / cvelistv5
CVE cache of the official CVE List in CVE JSON 5 format
Hey there,
I've discovered this repository by accident, and didn't know about the cvelist repository either before.
Currently, the (old) CVE website announced that there's the new JSON 5.0 format being used for the CVE List downloads page starting late spring this year.
I am just a little confused as to "how official" this repository is, since it only has 4 stars and there's no mention of it on either the new or the old CVE website.
For what it's worth: I love the new JSON 5.0 format, as it makes parsing and keeping track of everything much much easier. With the old format you always had to scrape a lot of details and unnecessarily drain mitre's / NVD's servers and/or proprietary databases that probably just scraped NVD themselves, and all this can be avoided with the new format and makes access to those details much more public.
Now my question is mostly about the contents of this repository.
The cvelist repository still uses the JSON 4.0 format, so I'm assuming it will be deprecated after the migration? Thanks in advance.
Please fill out the following sections
Is there a problem using the GitHub Repository?
Yes. There is no license.
Do you have any suggestions on how we could improve the repository?
Add a license, CC0 is probably best for content.
Please provide any other comments here
As per CWE, they have deployed the CC0 license: https://github.com/CWE-CAPEC/CWE-Submissions/issues/30
Making a plot of published dates in the JSON 5.0 has two weird jumps in it.
Hello.
I'm working on software that consumes CVE data, and I'd like to know whether there is a list of assignerShortName and assignerOrgId values, so that we can easily match an ID with an organization programmatically.
From this repository, I can see that, for example:
| assignerShortName | assignerOrgId |
|---|---|
| canonical | cc1ad9ee-3454-478d-9317-d3e869d708bc |
| debian | 79363d38-fa19-49d1-9214-5f28da3f3ac5 |
| mitre | 8254265b-2729-46b6-b9e3-3dfca2d5bfca |
| ... | ... |
I could just iterate over this whole project and create the list myself, but it would be nice to avoid that and use an official source (an API, for example). I saw the List of Partners section on the website, but it doesn't seem to include the organization ID (and even if it does, web scraping would be required to consume it).
In case there is no such thing and I have to create this list myself: are these IDs going to change someday, or are they fixed?
Thanks in advance,
and great work here, BTW.
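In the absence of an official list, the mapping can be derived from a local checkout. The following is a minimal sketch, assuming each record carries `assignerOrgId` and `assignerShortName` under `cveMetadata` (as the records in this repository appear to) and that the checkout lives at a path you pass in; the function names are hypothetical.

```python
import json
from pathlib import Path


def collect_assigners(records):
    """Map each assignerOrgId to the set of assignerShortName values seen for it."""
    mapping = {}
    for record in records:
        meta = record.get("cveMetadata", {})
        org_id = meta.get("assignerOrgId")
        short_name = meta.get("assignerShortName")
        # Skip records where either field is missing.
        if org_id and short_name:
            mapping.setdefault(org_id, set()).add(short_name)
    return mapping


def load_records(root):
    """Yield parsed CVE records from a local checkout (the layout is an assumption)."""
    for path in Path(root).glob("cves/**/CVE-*.json"):
        with open(path, encoding="utf-8") as handle:
            yield json.load(handle)
```

Collecting a set per ID also answers the stability question empirically: an ID that maps to more than one short name would show up as a set with several entries.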
Not sure what caused this but there are many empty records. eg.,
Hello,
First of all thank you for the awesome work you do concerning the CVE ecosystem!
I'm the developer of a CVE-related tool, and I would like to add MITRE to my sources (instead of relying only on NVD, as I do now). But to be honest, I don't really know how to parse your feed.
So I would like to ask what the official and recommended way is to synchronize our local databases with the new JSON 5.0 CVE list.
I searched your blog posts and, if I'm not wrong, you're currently in the "Soft Deploy" state, meaning CNAs now use the new format to declare CVEs. The "Hard Deploy" is targeted for Q1 2023. At that point we (as consumers) will be able to officially use the JSON 5.0 feeds.
But where can we find the list? I think the old formats (csv, html, text, xml) will be removed, so maybe you will provide an API (or something similar to what the NVD does) to fetch the latest changes?
Or maybe this repo (cveproject/cvelistv5) will become our official data feed? If so, do you recommend using the recent_activities.json file to detect changes, or simply running git pull periodically and parsing the new diffs?
Thank you in advance for your answer,
Nicolas
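For the git-based option, one possible approach is to record the commit you last processed and parse the output of `git diff --name-status <old> <new> -- cves` after each pull. This is only a sketch of the parsing step; the function name is hypothetical, and it assumes record files are named `CVE-*.json` under `cves/`.

```python
def changed_cve_files(name_status_output):
    """Split `git diff --name-status` text into (added_or_modified, deleted) paths."""
    changed, deleted = [], []
    for line in name_status_output.splitlines():
        # Fields are tab-separated: status first, path last (renames list two paths).
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        status, path = parts[0], parts[-1]
        name = path.rsplit("/", 1)[-1]
        # Keep only CVE record files; this skips helper files in cves/.
        if not (path.startswith("cves/") and name.startswith("CVE-") and name.endswith(".json")):
            continue
        if status.startswith("D"):
            deleted.append(path)
        else:
            changed.append(path)
    return changed, deleted
```

This does not depend on recent_activities.json at all, so it keeps working even if that file is not being refreshed.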
https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at c376add has
activityLog.writeRecentFile();
}
console.log(`opertion completed in ${super.timerSinceStart() / 1000 / 60} minutes at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_5__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);
and
console.log(`no new or updated CVEs`);
}
console.log(`opertion completed in ${super.timerSinceStart() / 1000} seconds at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_6__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);
where operation was intended.
Hi all,
Just a heads up there's a new prototype pollution CVE for cronvel/tree-kit found here: https://nvd.nist.gov/vuln/detail/CVE-2023-38894
More info about the CVE: https://www.code-intelligence.com/blog/treekit-prototype-pollution-cve-2023-38894
I don't have a JSON version of the CVE yet, sorry.
Cheers
Hey there,
in the cvelist, all vendors and products and their versions are totally messed up.
First off, there seems to be more than one notation for the meaning of "n/a" (aka null). So far I've identified these notations: n/a, * n/a *, *** n/a ***, NONE, None, none, no, null, [UNKNOWN], [Unknown], Unknown.
Additionally, all vendors and products are messed up. Sometimes the product field contains the actual affected versions as a comma-separated list. Sometimes the vendor is redundantly recorded, e.g. as Example, Inc and Example Corporation and Example. Siemens alone has more than 10 different notations.
The versions themselves are a whole other story, because most of them are also totally invalid. Even when there's a lessThan field set, sometimes its value is None. It gets even more ridiculous when the same CVE has two different affected versions which logically contradict each other.
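Until the data is cleaned up at the source, a consumer can fold the placeholder notations listed above into a single null value. A minimal sketch, assuming the list above is representative; the function name is hypothetical, and treating "no" as a placeholder could misfire on a real product with that name.

```python
# Lower-cased placeholder spellings, after stripping decoration such as "*" and "[]".
_PLACEHOLDERS = {"", "n/a", "none", "no", "null", "unknown"}


def normalize_placeholder(value):
    """Return None for values that mean 'not available', else the trimmed value."""
    if value is None:
        return None
    # Remove surrounding whitespace, asterisks and square brackets before comparing.
    core = value.strip("*[] \t\r\n").lower()
    if core in _PLACEHOLDERS:
        return None
    return value.strip()
```

This only addresses the "n/a" variants; unifying vendor spellings such as Example, Inc and Example Corporation would need a separate, curated alias table.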
Currently, most files have ascii encoding, but some are not. At least 213,576 files are ascii and at least 6,330 are non-ascii.
Please unify all json files to a single encoding. A git action to verify encoding would be helpful long term.
For a hacky check:
#!/usr/bin/env python3
import glob
import subprocess

count_ascii = 0
count_non_ascii = 0
for path in glob.glob("cves/**/*.json", recursive=True):
    # `file -bi` prints the MIME type and charset, e.g. a line ending in "charset=us-ascii"
    res = subprocess.run(args=["file", "-bi", path], capture_output=True)
    if b"ascii" in res.stdout:
        count_ascii += 1
    else:
        count_non_ascii += 1
        print(path)
print(f"ascii: {count_ascii}")
print(f"non-ascii: {count_non_ascii}")
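For a check that could run in CI without depending on the `file` utility, the classification can be done directly on the bytes. This is a sketch, not what the repository does today; the function name is hypothetical, and it only distinguishes ASCII, valid UTF-8, and everything else.

```python
def classify_encoding(data):
    """Classify raw bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Pure ASCII is also valid UTF-8, so this branch means "UTF-8 with non-ASCII bytes".
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"
```

A verification step could read each file with `open(path, "rb")` and fail the build when the result is "other".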
Hi,
While parsing CVE 5 records, I have found multiple issues with non-sanitized descriptions. That includes leading or trailing whitespaces, (multiple) newlines in the middle of a description, for example. Some cases:
./cves/2021/41xxx/CVE-2021-41144.json
"OpenMage LTS is an e-commerce platform. Prior to versions 19.4.22 and 20.0.19, a layout block was able to bypass the block blacklist to execute remote code. Versions 19.4.22 and 20.0.19 contain a patch for this issue.\n\n\n"
./cves/2021/45xxx/CVE-2021-45448.json
"Pentaho Business Analytics\n Server versions before 9.2.0.2 and 8.3.0.25 using the Pentaho \nAnalyzer plugin exposes a service endpoint for templates which allows a \nuser-supplied path to access resources that are out of bounds. \n\nThe software uses external input to construct a pathname that is intended to identify a file or \ndirectory that is located underneath a restricted parent directory, but the software does not \nproperly neutralize special elements within the pathname that can cause the pathname to \nresolve to a location that is outside of the restricted directory.  By using special elements such as \n".." and "/" separators, attackers can escape outside of the restricted \nlocation to access files or directories that are elsewhere on the \nsystem.\n\n\n\n"
Would be nice to have descriptions somewhat sanitized into a single string.
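On the consumer side, a workaround is to collapse all whitespace runs before storing a description. A minimal sketch with a hypothetical function name; note that it also flattens intentional paragraph breaks, which may not be what every consumer wants.

```python
import re


def sanitize_description(text):
    """Collapse runs of whitespace (including newlines and non-breaking spaces) into one space."""
    # For str patterns, \s matches Unicode whitespace, which covers U+00A0.
    return re.sub(r"\s+", " ", text).strip()
```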
Hi,
I'm working on a small tool to sync new and updated CVEs to my app. To save network bandwidth and disk I/O, I use git to synchronize updates from this repository.
The tool relies on ./cves/recent_activities.json, but I haven't seen any updates to that file for the last 3 weeks, even though there are updates under ./cves/[year]/.
Is the recent_activities.json file under the ./cves/ directory no longer being updated?
Thank you
I downloaded the following files from the releases area:
2023-10-15_all_CVEs_at_midnight.zip.zip
2023-10-16_all_CVEs_at_midnight.zip.zip
2023-10-15_delta_CVEs_at_xx00Z.zip, where xx runs from 00 through 23 - i.e. 24 zip files
2023-10-15_delta_CVEs_at_end_of_day.zip
I then unzipped all those files and proceeded to apply the deltas in each of the 25 files (24 hourly ones, plus the end-of-day one) to the 10/15 midnight snapshot (just snapshot henceforth). After doing that, I compared the contents of the 10/15 snapshot with those of the 10/16. I thought that, after applying all the deltas in the 25 delta files to the 10/15 snapshot its contents would be identical to those of the 10/16 snapshot.
However, they are not. For example, there is a file called CVE-2023-5591.json under cves/2023/5xxx in the 10/16 snapshot which is not present in the 10/15 snapshot after (or before, at that) applying the deltas. Looking into the deltas for 10/15 themselves, CVE-2023-5591.json is also not present in any of them: in the directory obtained from 2023-10-15_delta_CVEs_at_end_of_day.zip the last file is CVE-2023-5590.json.
I have noticed a similar behavior downloading the corresponding files for different dates: for the most part there will be differences between the midnight snapshot on a given day, with all of the 25 deltas applied, and the midnight snapshot for the next day; it is only occasionally that they both are identical.
Any idea what is going on here? At what point during 10/15 was CVE-2023-5591.json added? Am I missing something?
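The comparison described above can be reproduced with a small script. This is a sketch under the assumption that both snapshots are unpacked into directories with the same layout; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each JSON file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in root.rglob("*.json")
    }


def compare_digests(left, right):
    """Return (only_in_left, only_in_right, differing) as sorted lists of relative paths."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    differing = sorted(key for key in set(left) & set(right) if left[key] != right[key])
    return only_left, only_right, differing
```

Calling `compare_digests(tree_digest(patched_dir), tree_digest(next_day_dir))` would list a file like CVE-2023-5591.json under `only_in_right` if it is missing from the patched snapshot.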
Steps to reproduce:
--minutes-ago 2880 (we did not get all of the CVE changes in the last 48 hours)
--minutes-ago 4320 (we did not get all of the CVE changes in the last 72 hours)
Verify that the utility is working correctly and that the resulting /cves directory has the correct CVEs by running the rebuild command.
BNB
Looking at the releases page, I see there's a full release at 0000Z, and then incremental releases with a delta tarball from the last full release every hour from 0100Z up to 2300Z. However, it seems to me that any changes from 2300Z until 0000Z will not be available in a delta tarball. That makes those delta tarballs a little less useful, as one will need to download a full tarball once a day anyway to stay up to date.
Could it be possible to have the 0000Z release contain a delta tarball from the previous 0000Z release? So that every change (including those done between 2300 and 0000) would be available in a delta tarball.
If attempting to use this repository as a definitive source of CVEs, it seems that having the reserved ones would be useful, but they are not currently available.
For example, right now CVE-2023-38545 exists in the cvelist repository... but not cvelistV5
In https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at dbd65c7
const response = await cveService.cve({ queryString });
let cves = [];
response.cveRecords.forEach(obj => {
const cves = await service.cve({ queryString });
const cveIds = [];
cves.cveRecords.forEach(record => {
const cves = await service.cve({ queryString });
// console.log(`getCvesByPage().cves=${JSON.stringify(cves, null, 2)}`);
const cveIds = [];
cves.cveRecords.forEach(record => {
there's no guarantee that a query to a CVE Services API in AWS will have a response with the application/json content type. In recent and realistic cases, the response can instead have:
Content-type: text/html
<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
</body>
</html>
(for example, this was seen in production 2023-02-12T21:01Z)
For the text/html content type, Axios won't create a JavaScript object, and accessing the cveRecords property will fail.
There was a request for the CVE Services API documentation to mention that text/html may occur, but there was no action on this request: CVEProject/cve-services#549
To resolve this, one possibility is to read the cveRecords property only if the content type is application/json. (It is also realistic for the cveRecords property to be missing when the content type is application/json but the status is 429 - as shown in CVEProject/cve-services#885 - but this perhaps has not occurred in recent months.)
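The guard suggested above can be sketched independently of Axios. The following illustration is in Python rather than the repository's JavaScript, and the function name and parameters are hypothetical; it only shows the order of checks (status, content type, then the property itself).

```python
import json


def extract_cve_records(status, content_type, body):
    """Return the cveRecords list, or None when the response is not usable JSON."""
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if status != 200 or media_type != "application/json":
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    records = payload.get("cveRecords") if isinstance(payload, dict) else None
    return records if isinstance(records, list) else None
```

A caller would treat `None` as "retry later" instead of letting the property access fail on a 502 HTML page or a 429 body without cveRecords.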
cvelistV5/.github/workflows/update.yml
Lines 48 to 49 in bb74fe0
https://github.com/CVEProject/cvelistV5/commit/bb74fe0f2f6fabe1d25aec108530e45a1b7fc644.patch
From: cvelistV5 Github Action <[email protected]>
Should the first 'h' be capitalized? (GitHub instead of Github)
Is there any potential advantage in using an email address that you control? For example:
https://github.com/CVEProject/cvelist/commit/f47fcda074db819cda8db7213116c29e21e15e14.patch
From: CVE Team <[email protected]>
I have a question regarding the quality of the dataset.
From all CVEs that ...
... 118955 records do not have valid affected software in their details. In the random samples I checked, the software is only mentioned as text in the descriptions[] fields, but is not set in the containers/cna/affected array inside the JSON file.
Is this a mistake in the database export? The CVE website doesn't list any details in its rendered fields either.
I've generated a list of the records that do not contain valid affected fields and exported it here as a gist.
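For anyone who wants to reproduce such a count, here is one possible definition of "valid". It is an assumption, not the criterion used for the gist: a record counts as usable if at least one entry in containers/cna/affected has a product that is not a placeholder. The function name is hypothetical.

```python
# Placeholder product values treated as "no product given" (an assumed list).
_PLACEHOLDERS = {"", "n/a", "none", "unknown"}


def has_usable_affected(record):
    """True if containers.cna.affected names at least one non-placeholder product."""
    affected = record.get("containers", {}).get("cna", {}).get("affected", [])
    if not isinstance(affected, list):
        return False
    for entry in affected:
        if not isinstance(entry, dict):
            continue
        product = str(entry.get("product", "")).strip().lower()
        if product not in _PLACEHOLDERS:
            return True
    return False
```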
Will legacy severity information, such as CVSS 2.0 vectors, be archived in cvelistV5?
As an example, https://github.com/CVEProject/cvelistV5/blob/main/cves/2012/2xxx/CVE-2012-2125.json does not contain the CVSS 2.0 score from https://nvd.nist.gov/vuln/detail/CVE-2012-2125
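To check which severity formats a record actually carries, the metrics entries can be inspected directly. A minimal sketch, assuming CVSS data (when present) sits in containers/cna/metrics under keys such as `cvssV2_0` or `cvssV3_1`; the function name is hypothetical.

```python
def cvss_versions(record):
    """Return the sorted CVSS metric keys found in the CNA container's metrics list."""
    metrics = record.get("containers", {}).get("cna", {}).get("metrics", [])
    found = set()
    for metric in metrics:
        if isinstance(metric, dict):
            # Each metrics entry names its format through a key like "cvssV3_1".
            found.update(key for key in metric if key.startswith("cvssV"))
    return sorted(found)
```

An empty result for a record like CVE-2012-2125 would confirm that no CVSS vector of any version was carried over.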