cvelistv5's People

Contributors

hkong, hkong-mitre, jwhitmore-mitre, rbrittonmitre, rroberge

cvelistv5's Issues

Question about JSON 5.0 format

Hey there,

I discovered this repository by accident, and I didn't know about the cvelist repository before either.

The (old) CVE website currently announces that the new JSON 5.0 format will be used for the CVE List downloads page starting late spring this year.

I am just a little confused as to "how official" this repository is, since it has only 4 stars 😞 and is not mentioned on either the new or the old CVE website.

For what it's worth: I love the new JSON 5.0 format, as it makes parsing and keeping track of everything much, much easier. With the old format you always had to scrape a lot of details and unnecessarily drain MITRE's / NVD's servers and/or proprietary databases that probably just scraped NVD themselves. All of this can be avoided with the new format, which makes access to those details much more public.

Now my question is mostly about the contents of this repository.

  • Is this repository the new go-to resource for the new CVEs after the migration?
  • The cvelist repository still uses the JSON 4.0 format, so I'm assuming this will be deprecated after the migration?

Thanks in advance.

Published date changed on thousands of CVEs

A plot of the published dates in the JSON 5.0 records shows two weird jumps.

[Two plots of published dates were attached to the original issue.]

List of `assignerShortName` and `assignerOrgId`

Hello.

I'm working on software that consumes CVE data, and I'd like to know if there is such a thing as a list of assignerShortName and assignerOrgId values, so that we can easily match an ID with an organization programmatically.

From this repository, I can see that, for example:

assignerShortName   assignerOrgId
canonical           cc1ad9ee-3454-478d-9317-d3e869d708bc
debian              79363d38-fa19-49d1-9214-5f28da3f3ac5
mitre               8254265b-2729-46b6-b9e3-3dfca2d5bfca
...                 ...

I could just iterate over the whole repository and create the list myself, but it would be nice to avoid that and use an official source (an API, for example). I saw the List of Partners section on the website, but it doesn't seem to include the organization ID (and even if it does, web scraping would be required to consume it).

In case there is no such thing and I have to create this list myself: are these IDs going to change someday, or are they fixed?

Thanks in advance,
and great work here, BTW.
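
For reference, the "iterate over the repository" fallback mentioned above could look roughly like the following. This is a minimal sketch, assuming a local checkout and assuming each record stores both values under a top-level cveMetadata object; the function name build_assigner_map is made up for illustration and is not part of any official tooling.

```python
import json
from pathlib import Path


def build_assigner_map(cves_root: str) -> dict[str, str]:
    """Map each assignerOrgId to the first assignerShortName seen with it."""
    mapping: dict[str, str] = {}
    for path in Path(cves_root).rglob("CVE-*.json"):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable, badly encoded, or malformed files
        if not isinstance(record, dict):
            continue
        # Assumption: both fields live under "cveMetadata".
        meta = record.get("cveMetadata") or {}
        org_id = meta.get("assignerOrgId")
        short_name = meta.get("assignerShortName")
        if org_id and short_name:
            mapping.setdefault(org_id, short_name)
    return mapping
```

Calling build_assigner_map("cves") on a checkout returns a plain dict. It says nothing about whether the IDs are stable over time, which is the open question in this issue.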

Official way to synchronize the JSON 5.0 feeds

Hello,

First of all thank you for the awesome work you do concerning the CVE ecosystem!

I'm the developer of a CVE-related tool, and I would like to add MITRE as one of my sources (instead of relying only on NVD, as I do now). But to be honest, I don't really know how to parse your feed.

So I would like to ask what the official and recommended way is to synchronize our local databases with the new JSON 5.0 CVE list.

I searched your blog posts and, if I'm not mistaken, you're currently in the "Soft Deploy" state, meaning CNAs now use the new format to declare CVEs. The "Hard Deploy" is targeted for Q1 2023. From that point on we (as consumers) will be able to officially use the JSON 5.0 feeds.

But where can we find the list? I think the old formats (csv, html, text, xml) will be removed, so maybe you will provide an API (or something similar to what the NVD offers) to fetch the latest changes?

Or maybe this repo (cveproject/cvelistv5) will become the official data feed? If so, do you recommend using the recent_activities.json file to detect changes, or simply running git pull periodically and parsing the new diffs?

Thank you in advance for your answer,
Nicolas
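
The "git pull and parse the diffs" option could be sketched as follows. This is only an illustration under stated assumptions, not an officially recommended method: it assumes a local clone, that the caller remembers the last revision it processed, and that record files are named CVE-*.json. The helper names parse_name_status and changed_records are hypothetical.

```python
import subprocess


def parse_name_status(output: str) -> dict[str, list[str]]:
    """Group CVE record paths from `git diff --name-status` output by change type."""
    kinds = {"A": "added", "M": "modified", "D": "deleted"}
    changes: dict[str, list[str]] = {"added": [], "modified": [], "deleted": []}
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        kind = kinds.get(parts[0][:1])
        path = parts[-1]
        name = path.rsplit("/", 1)[-1]
        # Keep only record files; this skips e.g. recent_activities.json.
        if kind and name.startswith("CVE-") and name.endswith(".json"):
            changes[kind].append(path)
    return changes


def changed_records(repo: str, old_rev: str, new_rev: str) -> dict[str, list[str]]:
    """Run git in `repo` and list record files changed between two revisions."""
    result = subprocess.run(
        ["git", "-C", repo, "diff", "--name-status", "--no-renames",
         old_rev, new_rev, "--", "cves"],
        capture_output=True, text=True, check=True,
    )
    return parse_name_status(result.stdout)
```

A consumer would typically record the commit hash before pulling and then call changed_records(repo, old_hash, "HEAD") to decide which files to re-parse.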

"opertion" misspelling

https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at c376add has

            activityLog.writeRecentFile();
        }
        console.log(`opertion completed in ${super.timerSinceStart() / 1000 / 60} minutes at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_5__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);

and

            console.log(`no new or updated CVEs`);
        }
        console.log(`opertion completed in ${super.timerSinceStart() / 1000} seconds at ${_core_dateUtils_js__WEBPACK_IMPORTED_MODULE_6__/* .DateUtils.getIsoDate */ .E.getIsoDate()}`);

where operation was intended.

Vendors, Products and Versions are totally messed up

Hey there,

In the cvelist, the vendors, products and their versions are totally messed up.

First off, there seems to be more than one notation for "n/a" (i.e. null). So far I've identified these notations: n/a, * n/a *, *** n/a ***, NONE, None, none, no, null, [UNKNOWN], [Unknown], Unknown.

Additionally, vendors and products are inconsistent. Sometimes the product field contains the affected versions as a comma-separated list. Sometimes the same vendor appears under redundant names, e.g. as Example, Inc and Example Corporation and Example. Siemens alone has more than 10 different notations.

The versions themselves are a whole other story, because most of them are also invalid. Even when a lessThan field is set, its value is sometimes None. It gets even more ridiculous when the same CVE has two different affected versions that logically contradict each other.
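
Until the data is cleaned up, a consumer can at least fold the placeholder spellings listed above into one check. The following is a minimal sketch, not an official normalization rule. The set PLACEHOLDERS and the function is_placeholder are illustrative names, and treating "no" as a placeholder is taken from the list in this issue; it could in principle hide a legitimate value.

```python
# Lower-cased cores of the placeholder notations listed in this issue.
PLACEHOLDERS = {"n/a", "none", "no", "null", "unknown"}


def is_placeholder(value) -> bool:
    """Return True if a vendor/product/version string means "no value"."""
    if value is None:
        return True
    # Drop surrounding asterisks, square brackets and whitespace, then
    # compare case-insensitively against the known cores.
    core = str(value).strip("*[] \t").lower()
    return core == "" or core in PLACEHOLDERS
```

For example, is_placeholder("*** n/a ***") and is_placeholder("[UNKNOWN]") are both true, while is_placeholder("Siemens") is false. Merging vendor aliases such as "Example, Inc" and "Example Corporation" is a separate and harder problem that this check does not address.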

standardize encoding

Currently, most files are ASCII-encoded, but some are not. At least 213,576 files are ASCII and at least 6,330 are non-ASCII.

Please unify all JSON files to a single encoding. A GitHub Action that verifies the encoding would be helpful in the long term.

Here is a hacky check:

#!/usr/bin/env python3

import glob
import subprocess

count_ascii = 0
count_non_ascii = 0

for path in glob.glob("cves/**/*.json", recursive=True):
    res = subprocess.run(args=["file", "-bi", path], capture_output=True)
    if b"ascii" in res.stdout:
        count_ascii += 1
    else:
        count_non_ascii += 1
        print(path)

print(f"ascii:     {count_ascii}")
print(f"non-ascii: {count_non_ascii}")
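
A variant that does not depend on the external file utility, and so might be easier to run inside a CI job, could look like this. It is a sketch under the assumption that the only encodings of interest are pure ASCII and valid UTF-8; the names classify_encoding and scan are illustrative.

```python
from collections import Counter
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Classify raw file contents as 'ascii', 'utf-8', or 'other'."""
    if data.isascii():
        return "ascii"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "other"
    return "utf-8"


def scan(root: str) -> Counter:
    """Count JSON files under `root` by detected encoding class."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*.json"):
        counts[classify_encoding(path.read_bytes())] += 1
    return counts
```

A verification job could fail whenever scan("cves")["other"] is non-zero, or whenever anything other than the chosen target encoding shows up.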

Descriptions contain newlines and other characters

Hi,

While parsing CVE 5 records, I found multiple issues with unsanitized descriptions, for example leading or trailing whitespace and (multiple) newlines in the middle of a description. Some cases:

./cves/2021/41xxx/CVE-2021-41144.json
"OpenMage LTS is an e-commerce platform. Prior to versions 19.4.22 and 20.0.19, a layout block was able to bypass the block blacklist to execute remote code. Versions 19.4.22 and 20.0.19 contain a patch for this issue.\n\n\n"

./cves/2021/45xxx/CVE-2021-45448.json
"Pentaho Business Analytics\n Server versions before 9.2.0.2 and 8.3.0.25 using the Pentaho \nAnalyzer plugin exposes a service endpoint for templates which allows a \nuser-supplied path to access resources that are out of bounds.ย \n\nThe software uses external input to construct a pathname that is intended to identify a file or \ndirectory that is located underneath a restricted parent directory, but the software does not \nproperly neutralize special elements within the pathname that can cause the pathname to \nresolve to a location that is outside of the restricted directory. ย By using special elements such as \n".." and "/" separators, attackers can escape outside of the restricted \nlocation to access files or directories that are elsewhere on the \nsystem.\n\n\n\n"

It would be nice to have descriptions somewhat sanitized into a single string.
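
On the consumer side, a workaround could be as small as the following sketch. It assumes the description has already been decoded with a JSON parser (so the \n escapes shown above are real newline characters) and that collapsing every run of whitespace into a single space is acceptable; sanitize_description is an illustrative name.

```python
def sanitize_description(text: str) -> str:
    """Collapse all whitespace runs into single spaces and trim both ends.

    str.split() with no arguments splits on any Unicode whitespace, which
    covers newlines, tabs and non-breaking spaces alike.
    """
    return " ".join(text.split())
```

This discards intentional paragraph breaks as well, so a tool that wants to preserve them would need a gentler rule.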

`recent_activities.json` not being updated

Hi,

I'm working on a small tool to sync new and updated CVEs to my app. To save network bandwidth and disk I/O, I use git to synchronize updates from this repository.

The tool relies on ./cves/recent_activities.json, but I don't see any updates to that file even though there are updates under ./cves/[year]/ (since 3 weeks ago).

Is the recent_activities.json file under the ./cves/ directory no longer being updated?

Thank you

CVEs missing in deltas

I downloaded the following files from the releases area:

2023-10-15_all_CVEs_at_midnight.zip.zip
2023-10-16_all_CVEs_at_midnight.zip.zip
2023-10-15_delta_CVEs_at_xx00Z.zip, where xx runs from 00 through 23 - i.e. 24 zip files
2023-10-15_delta_CVEs_at_end_of_day.zip

I then unzipped all those files and proceeded to apply the deltas in each of the 25 files (24 hourly ones, plus the end-of-day one) to the 10/15 midnight snapshot (just snapshot henceforth). After doing that, I compared the contents of the 10/15 snapshot with those of the 10/16. I thought that, after applying all the deltas in the 25 delta files to the 10/15 snapshot its contents would be identical to those of the 10/16 snapshot.

However, they are not. For example, there is a file called CVE-2023-5591.json under cves/2023/5xxx in the 10/16 snapshot which is not present in the 10/15 snapshot after (or, for that matter, before) applying the deltas. Looking into the deltas for 10/15 themselves, CVE-2023-5591.json is also not present in any of them: in the directory obtained from 2023-10-15_delta_CVEs_at_end_of_day.zip the last file is CVE-2023-5590.json.

I have noticed a similar behavior downloading the corresponding files for different dates: for the most part there will be differences between the midnight snapshot on a given day, with all of the 25 deltas applied, and the midnight snapshot for the next day; it is only occasionally that they both are identical.

Any idea what is going on here? At what point during 10/15 was CVE-2023-5591.json added? Am I missing something?
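
The procedure described above can be reproduced with a short script. This is a sketch under stated assumptions: the zips are already unpacked, each delta directory contains files named CVE-<year>-<number>.json (possibly nested), and snapshot records live at <year>/<number // 1000>xxx/ below the cves directory, as in the paths quoted in these issues. All function names are illustrative.

```python
import hashlib
import shutil
from pathlib import Path


def record_path(cve_id: str) -> str:
    """Relative path of a record below cves/, e.g. 2023/5xxx/CVE-2023-5591.json."""
    _, year, number = cve_id.split("-")
    return f"{year}/{int(number) // 1000}xxx/{cve_id}.json"


def apply_delta(snapshot_cves: Path, delta_dir: Path) -> int:
    """Copy every record found in delta_dir over the snapshot; return the count."""
    copied = 0
    for src in sorted(delta_dir.rglob("CVE-*.json")):
        dst = snapshot_cves / record_path(src.stem)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        copied += 1
    return copied


def compare_trees(left: Path, right: Path) -> dict[str, list[str]]:
    """Report records present on one side only or differing in content."""
    def digests(root: Path) -> dict[str, str]:
        return {
            p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
            for p in root.rglob("CVE-*.json")
        }
    a, b = digests(left), digests(right)
    return {
        "only_left": sorted(a.keys() - b.keys()),
        "only_right": sorted(b.keys() - a.keys()),
        "different": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Applying the 25 deltas in chronological order and then calling compare_trees on the patched 10/15 tree and the 10/16 tree would list a record such as CVE-2023-5591.json under "only_right", which matches the discrepancy reported here.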

Delta tarballs not generated at 0000Z, changes done in 2300-0000 missing

Looking at the releases page, I see there's a full release at 0000Z, and then incremental releases with a delta tarball from the last full release every hour from 0100Z up to 2300Z. However, it seems to me that any changes from 2300Z until 0000Z will not be available in a delta tarball. That makes those delta tarballs a little less useful, as one will need to download a full tarball once a day anyway to stay up to date.

Would it be possible to have the 0000Z release contain a delta tarball relative to the previous 0000Z release? That way every change (including those made between 2300Z and 0000Z) would be available in a delta tarball.

no RESERVED CVEs

If one attempts to use this repository as a definitive source of CVEs, having the RESERVED ones would be useful, but they are not currently available.
For example, right now CVE-2023-38545 exists in the cvelist repository but not in cvelistV5.

no guarantee that a cveRecords property will exist

In https://raw.githubusercontent.com/CVEProject/cvelistV5/main/.github/workflows/dist/index.js at dbd65c7

const response = await cveService.cve({ queryString });
let cves = [];
response.cveRecords.forEach(obj => {

const cves = await service.cve({ queryString });
const cveIds = [];
cves.cveRecords.forEach(record => {

const cves = await service.cve({ queryString });
// console.log(`getCvesByPage().cves=${JSON.stringify(cves, null, 2)}`);
const cveIds = [];
cves.cveRecords.forEach(record => {

There's no guarantee that a query to a CVE Services API in AWS will have a response with the application/json content type. In recent and realistic cases, the response can instead have:

Content-type: text/html

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
</body>
</html>

(for example, this was seen in production 2023-02-12T21:01Z)

For the text/html content type, Axios won't create a JavaScript object, and accessing the cveRecords property will fail.

There was a request for the CVE Services API documentation to mention that text/html may occur, but there was no action on this request: CVEProject/cve-services#549

To resolve this, one possibility is to read the cveRecords property only if the content type is application/json. (It is also realistic for the cveRecords property to be missing when the content type is application/json but the status is 429 - as shown in CVEProject/cve-services#885 - but this perhaps has not occurred in recent months.)
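
The guard suggested above can be illustrated independently of Axios. The following is a Python sketch of the same logic, not a patch for index.js: it takes the status, content type and raw body as plain values, and the function name extract_cve_records is hypothetical. Returning an empty list on failure is a design choice of this sketch; a real client might prefer to retry or raise.

```python
import json


def extract_cve_records(status: int, content_type: str, body: str) -> list:
    """Return the cveRecords list only when the response really carries one."""
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if status != 200 or media_type != "application/json":
        return []
    try:
        payload = json.loads(body)
    except ValueError:
        return []
    records = payload.get("cveRecords") if isinstance(payload, dict) else None
    return records if isinstance(records, list) else []
```

With this shape, both the 502 text/html page shown above and a 429 JSON response without cveRecords yield an empty list instead of an exception on property access.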

change user.name (or user.email)?

git config --global user.email "[email protected]"
git config --global user.name "cvelistV5 Github Action"

results in this data format in the patch view of a commit:

https://github.com/CVEProject/cvelistV5/commit/bb74fe0f2f6fabe1d25aec108530e45a1b7fc644.patch

From: cvelistV5 Github Action <[email protected]>

Should the first 'h' be capitalized? (GitHub instead of Github)

Is there any potential advantage in using an email address that you control? For example:

https://github.com/CVEProject/cvelist/commit/f47fcda074db819cda8db7213116c29e21e15e14.patch

From: CVE Team <[email protected]>

118955 CVE records don't have an affected product/vendor or version

I have a question regarding the quality of the dataset.

Of all CVEs that ...

  • have not been rejected
  • have not been reserved

... 118955 records do not have valid affected software in their details. Judging by some random picks, the software is only mentioned as text in the descriptions[] fields, but is not set inside the containers/cna/affected array of the JSON file.

Is this a mistake in the database export? The CVE website doesn't list any such details in its rendered fields either.

I've generated a list of those records that do not contain valid affected fields and exported them here as a gist.
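
A check along these lines can be sketched as follows. The exact criterion used to produce the 118955 figure is not stated in this issue, so the rule below is an assumption: a record counts as having affected software only if some entry in containers/cna/affected has a product that is not an obvious placeholder. The names PLACEHOLDERS, is_real and has_affected_product are illustrative.

```python
# Assumed placeholder spellings (compared after trimming and lower-casing).
PLACEHOLDERS = {"", "n/a", "none", "null", "unknown"}


def is_real(value) -> bool:
    """True for a string that is not one of the assumed placeholders."""
    return isinstance(value, str) and value.strip("*[] \t").lower() not in PLACEHOLDERS


def has_affected_product(record: dict) -> bool:
    """True if containers/cna/affected names at least one real product."""
    cna = (record.get("containers") or {}).get("cna") or {}
    affected = cna.get("affected") or []
    return any(
        isinstance(entry, dict) and is_real(entry.get("product"))
        for entry in affected
    )
```

Records for which has_affected_product returns False, after filtering out rejected ones, would be candidates for a list like the one referenced above.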
