anthonyharrison / sbom4python

A tool to generate a SBOM (Software Bill of Materials) for an installed Python module

License: Apache License 2.0

Python 100.00%
cyclonedx devsecops python sbom sbom-generator security spdx

sbom4python's People

Contributors

anthonyharrison, vargenau, you-ne

sbom4python's Issues

SPDX Relationships Semantics

First of all, thanks for the work on this nice and lightweight CLI tool for creating SBOMs for Python projects.

Regarding SPDX SBOMs, I believe sbom4python currently generates dependency relationships with the wrong semantics. If I read the SPDX documentation on relationships correctly, a DEPENDS_ON relationship is more appropriate than a CONTAINS relationship to express a build or runtime dependency between two packages. CONTAINS is suited to archives, which physically contain other files.

Example:

pip show jinja2
Name: Jinja2
Version: 3.1.2
Summary: A very fast and expressive template engine.
Home-page: https://palletsprojects.com/p/jinja/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD-3-Clause
Location: /Users/david/repos/python/sbom4python/env/lib/python3.10/site-packages
Requires: MarkupSafe
Required-by: Flask

Extract from the generated sbom.spdx.json:

{
      "spdxElementId": "SPDXRef-Package-4-jinja2",
      "relatedSpdxElement": "SPDXRef-Package-5-markupsafe",
      "relationshipType": "CONTAINS"
}
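
For comparison, the relationship the issue asks for could look like the output of the short sketch below. The helper name `dependency_relationship` is made up for illustration and is not part of sbom4python; the element IDs are the ones from the extract above.

```python
import json


def dependency_relationship(parent_id: str, child_id: str) -> dict:
    """Build an SPDX JSON relationship saying that parent depends on child.

    Hypothetical helper for illustration only, not sbom4python's API.
    """
    return {
        "spdxElementId": parent_id,
        "relatedSpdxElement": child_id,
        "relationshipType": "DEPENDS_ON",
    }


rel = dependency_relationship(
    "SPDXRef-Package-4-jinja2", "SPDXRef-Package-5-markupsafe"
)
print(json.dumps(rel, indent=2))
```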

Dependency Tree vs Dependency Graph

As far as I understand the implementation of the scanner.py module, it builds a dependency tree. As a result, every package A ends up with at most one parent that requires A.

IMO this assumption is not correct, because multiple packages may require A as a first-level dependency:

Example (obtained using pip show XY):

  • flask requires: click, itsdangerous, Jinja2, Werkzeug
  • Jinja2 requires: MarkupSafe
  • Werkzeug requires: MarkupSafe

Here, MarkupSafe has multiple parents and thus one relationship is missed.

sbom4python should build a dependency graph rather than a dependency tree, so that no relationships are missing from the SBOMs.
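
A minimal sketch of the graph idea, using the requirements from the example above. It does not reflect how scanner.py is actually written; the function names are invented for illustration. Storing one edge per (parent, child) pair lets a package keep several parents.

```python
from collections import defaultdict

# Requirements as listed in the example above (from `pip show`).
requires = {
    "flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "Werkzeug": ["MarkupSafe"],
}


def build_edges(requires):
    """Return one (parent, child) edge per declared requirement."""
    return [
        (parent, child)
        for parent, children in requires.items()
        for child in children
    ]


def parents_of(edges):
    """Group the edges by child, so a child may list several parents."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return dict(parents)


edges = build_edges(requires)
parents = parents_of(edges)
print(parents["MarkupSafe"])  # ['Jinja2', 'Werkzeug']
```

Each edge can then be written out as its own relationship, so the Werkzeug to MarkupSafe link is no longer dropped.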

Problems running sbom4python on Windows

Hi Anthony,

I tried to run sbom4python on Windows. The result was

PS E:\Software\Python\sbom4python> sbom4python -m capycli --sbom cyclonedx --format json -o sbom_AH.json
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Program Files\Python311\Scripts\sbom4python.exe\__main__.py", line 4, in <module>
  File "C:\Program Files\Python311\Lib\site-packages\sbom4python\cli.py", line 14, in <module>
    from sbom4python.scanner import SBOMScanner
  File "C:\Program Files\Python311\Lib\site-packages\sbom4python\scanner.py", line 12, in <module>
    from sbom4files.filescanner import FileScanner
  File "C:\Program Files\Python311\Lib\site-packages\sbom4files\filescanner.py", line 7, in <module>
    import magic
  File "C:\Program Files\Python311\Lib\site-packages\magic\__init__.py", line 209, in <module>
    libmagic = loader.load_lib()
               ^^^^^^^^^^^^^^^^^
  File "C:\Program Files\Python311\Lib\site-packages\magic\loader.py", line 49, in load_lib
    raise ImportError('failed to find libmagic.  Check your installation')
ImportError: failed to find libmagic.  Check your installation

I assume all libraries have been installed:

pip list
Package                  Version     Editable project location
------------------------ ----------- ----------------------------------------
aiofiles                 22.1.0
aiosqlite                0.18.0
anyio                    3.6.2
argon2-cffi              21.3.0
argon2-cffi-bindings     21.2.0
arrow                    1.2.3
asttokens                2.2.1
attrs                    22.1.0
Babel                    2.12.1
backcall                 0.2.0
beautifulsoup4           4.11.2
binaryornot              0.4.4
black                    23.3.0
bleach                   5.0.1
BomConverter             0.1         E:\Siemens\Software\Python\bom-converter
boolean.py               4.0
certifi                  2022.12.7
cffi                     1.15.1
chardet                  5.1.0
charset-normalizer       2.1.1
click                    8.1.3
colorama                 0.4.6
comm                     0.1.2
commonmark               0.9.1
contourpy                1.0.7
cycler                   0.11.0
cyclonedx-bom            3.11.0
cyclonedx-python-lib     3.1.5
dateparser               1.1.7
debugpy                  1.6.6
decorator                5.1.1
defusedxml               0.7.1
docutils                 0.19
et-xmlfile               1.1.0
executing                1.2.0
fastjsonschema           2.16.3
filelock                 3.11.0
flake8                   5.0.4
fonttools                4.39.0
fqdn                     1.5.1
idna                     3.4
importlib-metadata       5.2.0
ipykernel                6.21.3
ipython                  8.11.0
ipython-genutils         0.2.0
isoduration              20.11.0
isort                    5.12.0
jaraco.classes           3.2.3
jedi                     0.18.2
Jinja2                   3.1.2
json5                    0.9.11
jsonpointer              2.3
jsonschema               4.17.3
jupyter_client           8.0.3
jupyter_core             5.2.0
jupyter-events           0.6.3
jupyter_server           2.4.0
jupyter_server_fileid    0.8.0
jupyter_server_terminals 0.4.4
jupyter_server_ydoc      0.6.1
jupyter-ydoc             0.2.3
jupyterlab               3.6.1
jupyterlab-pygments      0.2.2
jupyterlab_server        2.20.0
keyring                  23.13.1
kiwisolver               1.4.4
lib4sbom                 0.4.0
license-expression       30.1.0
MarkupSafe               2.1.2
matplotlib               3.7.1
matplotlib-inline        0.1.6
mccabe                   0.7.0
mistune                  2.0.5
more-itertools           9.0.0
mpmath                   1.3.0
mypy                     1.3.0
mypy-extensions          1.0.0
nbclassic                0.5.3
nbclient                 0.7.2
nbconvert                7.2.9
nbformat                 5.7.3
nest-asyncio             1.5.6
networkx                 3.1
notebook                 6.5.3
notebook_shim            0.2.2
numdifftools             0.9.41
numpy                    1.24.2
openpyxl                 3.1.2
packageurl-python        0.10.4
packaging                23.0
pandas                   1.5.3
pandocfilters            1.5.0
parso                    0.8.3
pathspec                 0.11.1
pickleshare              0.7.5
Pillow                   9.4.0
pip                      22.3.1
pip-requirements-parser  32.0.1
pkginfo                  1.9.2
platformdirs             3.1.1
prometheus-client        0.16.0
prompt-toolkit           3.0.38
psutil                   5.9.4
pure-eval                0.2.2
pycodestyle              2.9.1
pycparser                2.21
pyflakes                 2.5.0
Pygments                 2.13.0
pyparsing                3.0.9
pyrsistent               0.19.2
pyrsistent               0.19.2
python-dateutil          2.8.2
python-debian            0.1.49
python-json-logger       2.0.7
python-magic             0.4.27
python-magic-bin         0.4.14
pytz                     2022.7.1
pytz-deprecation-shim    0.1.0.post0
pywin32                  305
pywin32-ctypes           0.2.0
pywinpty                 2.0.10
PyYAML                   6.0
pyzmq                    25.0.0
readme-renderer          37.3
regex                    2022.10.31
regex                    2022.10.31
requests                 2.28.1
requests-toolbelt        0.10.1
reuse                    1.1.2
rfc3339-validator        0.1.4
rfc3986                  2.0.0
rfc3986-validator        0.1.1
rich                     12.6.0
sbom2dot                 0.3.0
sbom4files               0.3.0
sbom4python              0.10.0
scipy                    1.10.1
semantic-version         2.10.0
Send2Trash               1.8.0
setuptools               67.6.0
six                      1.16.0
sniffio                  1.3.0
sortedcontainers         2.4.0
soupsieve                2.4
stack-data               0.6.2
standard-bom-validator   0.1
StandardBomValidator     0.1
sympy                    1.11.1
terminado                0.17.1
tinycss2                 1.2.1
toml                     0.10.2
toml                     0.10.2
tomli                    2.0.1
torch                    2.0.0
tornado                  6.2
traitlets                5.9.0
twine                    4.0.2
types-colorama           0.4.15.12
typing_extensions        4.5.0
tzdata                   2022.7
tzdata                   2022.7
tzlocal                  4.2
tzlocal                  4.2
uri-template             1.2.0
urllib3                  1.26.13
wcwidth                  0.2.6
webcolors                1.12
webencodings             0.5.1
websocket-client         1.5.1
wheel                    0.34.2
y-py                     0.5.9
ypy-websocket            0.8.2
zipp                     3.11.0

I am running Python 3.11.0 on Windows 10.
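
The traceback shows that the failure happens at `import magic`, before any scanning starts. As a sketch only, and not how sbom4files is implemented, a caller could guard that import and fall back to the standard library's extension-based `mimetypes` module when libmagic cannot be loaded. The function name `guess_mime_type` is an assumption made for this example.

```python
import mimetypes

try:
    # python-magic raises ImportError when the libmagic library cannot be found.
    import magic
except ImportError:
    magic = None


def guess_mime_type(path: str) -> str:
    """Return a MIME type for path, preferring libmagic when it is usable.

    Hypothetical helper: falls back to a guess based on the file extension.
    """
    if magic is not None and hasattr(magic, "from_file"):
        return magic.from_file(path, mime=True)
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"
```

The fallback is less accurate than content inspection, since it only looks at the file name, but it keeps the tool usable on systems without libmagic.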

Feature Request: License URL from product source

I noticed that the new version emits the SPDX short-form license identifier together with a URL, like so:

            "id": "PSF-2.0",
            "url": "https://opensource.org/licenses/Python-2.0"

I've been told by a few of our teams that they'd rather have a link to the license where it appears in the source for the component, rather than a central site with license texts. Apparently we've had some issues where the reported license doesn't match the one in the code, and one of our legal reps now requires everyone to dig up the source code license file to verify and validate.

Obviously this isn't your problem, since you don't have to work with a grumpy legal rep, but I figured I'd put it in as a feature request just in case you can come up with a genius way to make this an option in the future. (Since GitHub has a standard location for licenses, it might be easy to find in some cases, but likely not all.)

Invalid SPDX generated

The SPDX file is in some cases invalid because of incorrect license identifiers.

scancode-toolkit.spdx.txt

Examples in the above scan:

PackageLicenseConcluded: Apache-2
PackageLicenseConcluded: ASL 2.0
PackageLicenseConcluded: BSD
PackageLicenseConcluded: LGPL
PackageLicenseConcluded: MIT/X

I understand that the information is taken from package metadata that is not in SPDX format, but it should not be output as-is.
Either map it to a valid SPDX identifier, or create a custom LicenseRef- identifier.
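
The two options above could be sketched as follows. The alias table and the set of valid identifiers are tiny stand-ins chosen for this example (a real tool would need the full SPDX license list and a reviewed mapping), and the function names are hypothetical.

```python
import re

# Stand-in for the SPDX license list; assumption for this sketch only.
VALID_SPDX_IDS = {"Apache-2.0", "MIT", "BSD-3-Clause"}

# Hypothetical alias table: only unambiguous spellings are mapped.
ALIASES = {
    "apache-2": "Apache-2.0",
    "asl 2.0": "Apache-2.0",
}


def to_license_ref(text: str) -> str:
    """Turn free text into a LicenseRef- id (letters, digits, '.' and '-')."""
    idstring = re.sub(r"[^A-Za-z0-9.\-]+", "-", text).strip("-")
    return f"LicenseRef-{idstring}" if idstring else "NOASSERTION"


def normalise_license(reported: str) -> str:
    """Map a reported license string to something valid in an SPDX field."""
    cleaned = reported.strip()
    if not cleaned:
        return "NOASSERTION"
    if cleaned in VALID_SPDX_IDS:
        return cleaned
    alias = ALIASES.get(cleaned.lower())
    if alias is not None:
        return alias
    # Ambiguous values such as "BSD" or "LGPL" are not guessed.
    return to_license_ref(cleaned)


for value in ["Apache-2", "ASL 2.0", "BSD", "LGPL", "MIT/X"]:
    print(value, "->", normalise_license(value))
```

Note that a LicenseRef- identifier is expected to be accompanied by a matching entry in the document's extracted licensing information, which this sketch does not generate.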

BUGFIX: various bugs when a line of `pip show module` output does not contain a ":"-delimited entry

Description

The output of pip show module is stored in out and then parsed line by line, with the parts of each line stored in the array entry. A bug arises whenever that output contains a line that either:

  1. cannot be split with ":" as a delimiter, because it does not contain one, or
  2. contains the ":" character, but not as the delimiter of a field that is relevant to sbom4python.

In case 1, the execution stops with a "list index out of range" error.
In case 2, the resulting SBOM contains meaningless fields.

CASE 1 error trace:

Traceback (most recent call last):
  File "/home/user/project/.venv/bin/sbom4python", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/cli.py", line 134, in main
    sbom_scan.process_python_module(module_name)
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 213, in process_python_module
    self.analyze(self.get("Name"), self.get("Requires"))
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 207, in analyze
    if self.process_module(r, parent):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 78, in process_module
    line.split(f"{entry[0]}:", 1)[1].strip().rstrip("\n")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

It seems that some modules, for example kiwisolver, return an unusually formatted value when pip show module is run.

Reproduction

To reproduce this error, create a fresh .venv (Python 3.11.2 in my case) and run:

pip install sbom4python kiwisolver
sbom4python -d -m kiwisolver
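
A parser that tolerates both cases described above might look like the sketch below. It is not the fix applied in sbom4python; the field list, the function name, and the sample text are assumptions made for illustration (the sample is invented, not real kiwisolver output).

```python
# Field names that appear in the `pip show` output quoted earlier on this page.
KNOWN_FIELDS = {
    "Name", "Version", "Summary", "Home-page", "Author", "Author-email",
    "License", "Location", "Requires", "Required-by",
}


def parse_pip_show(output: str) -> dict:
    """Parse `pip show` text, ignoring lines that are not known fields."""
    metadata = {}
    for line in output.splitlines():
        if line[:1].isspace():
            continue  # indented continuation of a multi-line value
        key, sep, value = line.partition(":")
        if not sep:
            continue  # case 1: no ":" on the line at all
        if key not in KNOWN_FIELDS:
            continue  # case 2: ":" present, but not a relevant field
        metadata.setdefault(key, value.strip())  # keep the first occurrence
    return metadata


# Invented sample text that exercises both cases.
sample = """Name: kiwisolver
Version: 1.4.4
License: =========================
The licensing terms follow
Note: this line has a colon but is not a field
Requires:
"""

parsed = parse_pip_show(sample)
print(parsed)
```

Using `str.partition` avoids the IndexError, because it always returns three parts even when the delimiter is missing.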

Regressions in v0.4.0

I have regenerated the SBOM for cve-bin-tool and I see a couple of regressions.

The most obvious one is that whitespaces between words are now replaced with underscores:

-PackageSupplier: Person: Terri Oda
+PackageSupplier: Person: Terri_Oda
-PackageSupplier: Organization: Andrew Svetlov <[email protected]>
+PackageSupplier: Organization: Andrew_Svetlov_<[email protected]>

Some PackageSuppliers have gone missing but I guess it could be some real-life change? I don't know how you get this info.
Example from aiosignal and idna:

-PackageSupplier: Person: Nikolay Kim
+PackageSupplier: NOASSERTION
-PackageSupplier: Person: Kim Davies
+PackageSupplier: NOASSERTION

Some licences were lost. Example from idna:

-##### Reported license BSD-3-Clause
-PackageLicenseConcluded: BSD-3-Clause
-PackageLicenseDeclared: BSD-3-Clause
+##### Reported license
+PackageLicenseConcluded: NOASSERTION
+PackageLicenseDeclared: NOASSERTION

Failed to use this tool to generate an SBOM from UTF-8 files

Thank you for your work. I encountered the following error when using sbom4python to generate an SBOM; could you please give me some advice? The log is as follows:
D:\tool\software\anaconda\envs\KNsbom\Scripts>sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json
Traceback (most recent call last):
  File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\tool\software\anaconda\envs\KNsbom\Scripts\sbom4python.exe\__main__.py", line 7, in <module>
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\cli.py", line 131, in main
    sbom_scan = SBOMScanner(
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\scanner.py", line 27, in __init__
    self.sbom_package = SBOMPackage()
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\data\package.py", line 13, in __init__
    self.license = LicenseScanner()
  File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\license.py", line 18, in __init__
    self.licenses = json.load(licfile)
  File "D:\tool\software\anaconda\envs\KNsbom\lib\json\__init__.py", line 293, in load
    return loads(fp.read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 87462: illegal multibyte sequence

It seems that the tool failed while reading a file that is not encoded in 'gbk'. I wonder whether I am using the wrong command. The command is as follows:
sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json

Could you please give me some advice? Thank you very much.
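
The traceback ends in `json.load(licfile)`, where the file was opened with the platform default encoding ('gbk' on this machine). A minimal sketch of reading a JSON file with an explicit encoding is shown below; the helper name and the sample file are made up for this example and do not come from lib4sbom.

```python
import json
import os
import tempfile


def load_json_utf8(path: str):
    """Load a JSON file as UTF-8, regardless of the platform default encoding."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Write a small UTF-8 file containing a non-ASCII character, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "licenses.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"name": "Beispiel-Lizenz für Tests"}, handle, ensure_ascii=False)
    data = load_json_utf8(path)

print(data["name"])
```

Until a library passes the encoding explicitly, enabling Python's UTF-8 mode (for example by setting the PYTHONUTF8=1 environment variable before running the command) may work around the problem, since it changes the default encoding used by `open()`.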

Feature request: Including optional feature's dependencies

I recently noticed a case where a package was missing from an SBOM that included twisted as a dependency. After careful review, I found that twisted was installed as twisted[tls] and, as a consequence, additional sub-dependencies were installed. I tried generating an SBOM for twisted[tls], without success. As a workaround, I had to generate SBOMs for the additional sub-dependencies and merge them. It would be great if these could be added automatically by sbom4python, given the correct command-line input.
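
One possible approach, sketched under assumptions: package metadata lists optional dependencies as requirement strings carrying an `extra == "..."` marker (the kind of strings `importlib.metadata.requires()` returns), so the requested extras can be used to filter them. The function name and the sample entries below are illustrative and are not twisted's actual metadata; the sketch also ignores other markers such as `python_version`, which a real implementation would evaluate with a proper requirement parser.

```python
import re

# Matches markers such as: extra == "tls"  or  extra == 'tls'
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def requirements_for(requires_dist, extras=()):
    """Return base requirements plus those enabled by the given extras."""
    wanted = {extra.lower() for extra in extras}
    selected = []
    for entry in requires_dist:
        requirement, _, marker = entry.partition(";")
        match = _EXTRA_MARKER.search(marker)
        if match is None or match.group(1).lower() in wanted:
            selected.append(requirement.strip())
    return selected


# Illustrative entries only.
sample = [
    "attrs>=19.2.0",
    'pyopenssl>=21.0.0; extra == "tls"',
    'idna>=2.4; extra == "tls"',
    'pyserial>=3.0; extra == "serial"',
]

print(requirements_for(sample))
print(requirements_for(sample, extras=["tls"]))
```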
