anthonyharrison / sbom4python
A tool to generate a SBOM (Software Bill of Materials) for an installed Python module
License: Apache License 2.0
First of all, thanks for the work on the nice and lightweight cli tool for creating SBOMs for Python projects.
Regarding SPDX SBOMs, I believe that sbom4python currently generates dependencies with the wrong semantics. If I read the SPDX documentation on relationships correctly, a DEPENDS_ON relationship is more appropriate than a CONTAINS relationship to express the build and run dependency between two packages. CONTAINS is suitable for archives, which physically contain other files.
Example:
pip show jinja2
Name: Jinja2
Version: 3.1.2
Summary: A very fast and expressive template engine.
Home-page: https://palletsprojects.com/p/jinja/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD-3-Clause
Location: /Users/david/repos/python/sbom4python/env/lib/python3.10/site-packages
Requires: MarkupSafe
Required-by: Flask
Extract from the generated sbom.spdx.json:
{
"spdxElementId": "SPDXRef-Package-4-jinja2",
"relatedSpdxElement": "SPDXRef-Package-5-markupsafe",
"relationshipType": "CONTAINS"
}
As far as I understand the implementation of the scanner.py module, it builds a dependency tree, so in the end every package A has at most one parent that requires A.
IMO this assumption is not correct, because multiple packages may require A as a first-level dependency.
Example (obtained using pip show XY):
flask requires: click, itsdangerous, Jinja2, Werkzeug
jinja2 requires: MarkupSafe
Werkzeug requires: MarkupSafe
Here, MarkupSafe has multiple parents and thus one relationship is missed.
sbom4python should build a dependency graph rather than a dependency tree, so that no relationships are missed in the SBOMs.
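To illustrate the difference, here is a minimal sketch of a graph-based approach. It is not the scanner.py implementation: the function names, the input dictionary and the SPDX identifier scheme are all assumptions made for this example. Each package keeps a set of direct requirements, so a package such as MarkupSafe can have any number of parents.

```python
from collections import defaultdict


def build_dependency_graph(requires):
    """Map each package to the set of packages it directly requires.

    `requires` is a hypothetical input: {package name: list of required names},
    for example collected from the "Requires" field of `pip show`.
    """
    graph = defaultdict(set)
    for package, deps in requires.items():
        parent = package.lower()
        graph[parent]  # make sure packages without requirements are nodes too
        for dep in deps:
            child = dep.lower()
            graph[parent].add(child)
            graph[child]  # register the child as a node as well
    return dict(graph)


def spdx_relationships(graph):
    """Emit one DEPENDS_ON relationship per edge (identifier scheme is made up)."""
    return [
        {
            "spdxElementId": f"SPDXRef-Package-{parent}",
            "relatedSpdxElement": f"SPDXRef-Package-{child}",
            "relationshipType": "DEPENDS_ON",
        }
        for parent in sorted(graph)
        for child in sorted(graph[parent])
    ]


requires = {
    "Flask": ["click", "itsdangerous", "Jinja2", "Werkzeug"],
    "Jinja2": ["MarkupSafe"],
    "Werkzeug": ["MarkupSafe"],
}
graph = build_dependency_graph(requires)
relationships = spdx_relationships(graph)
```

With the flask example above, both the jinja2 and the Werkzeug edge to MarkupSafe are kept, which a tree with a single parent per package cannot express.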
Hi Anthony,
I tried to run sbom4python on Windows. The result was
PS E:\Software\Python\sbom4python> sbom4python -m capycli --sbom cyclonedx --format json -o sbom_AH.json
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Program Files\Python311\Scripts\sbom4python.exe\__main__.py", line 4, in <module>
File "C:\Program Files\Python311\Lib\site-packages\sbom4python\cli.py", line 14, in <module>
from sbom4python.scanner import SBOMScanner
File "C:\Program Files\Python311\Lib\site-packages\sbom4python\scanner.py", line 12, in <module>
from sbom4files.filescanner import FileScanner
File "C:\Program Files\Python311\Lib\site-packages\sbom4files\filescanner.py", line 7, in <module>
import magic
File "C:\Program Files\Python311\Lib\site-packages\magic\__init__.py", line 209, in <module>
libmagic = loader.load_lib()
^^^^^^^^^^^^^^^^^
File "C:\Program Files\Python311\Lib\site-packages\magic\loader.py", line 49, in load_lib
raise ImportError('failed to find libmagic. Check your installation')
ImportError: failed to find libmagic. Check your installation
I assume all libraries have been installed:
pip list
Package Version Editable project location
------------------------ ----------- ----------------------------------------
aiofiles 22.1.0
aiosqlite 0.18.0
anyio 3.6.2
argon2-cffi 21.3.0
argon2-cffi-bindings 21.2.0
arrow 1.2.3
asttokens 2.2.1
attrs 22.1.0
Babel 2.12.1
backcall 0.2.0
beautifulsoup4 4.11.2
binaryornot 0.4.4
black 23.3.0
bleach 5.0.1
BomConverter 0.1 E:\Siemens\Software\Python\bom-converter
boolean.py 4.0
certifi 2022.12.7
cffi 1.15.1
chardet 5.1.0
charset-normalizer 2.1.1
click 8.1.3
colorama 0.4.6
comm 0.1.2
commonmark 0.9.1
contourpy 1.0.7
cycler 0.11.0
cyclonedx-bom 3.11.0
cyclonedx-python-lib 3.1.5
dateparser 1.1.7
debugpy 1.6.6
decorator 5.1.1
defusedxml 0.7.1
docutils 0.19
et-xmlfile 1.1.0
executing 1.2.0
fastjsonschema 2.16.3
filelock 3.11.0
flake8 5.0.4
fonttools 4.39.0
fqdn 1.5.1
idna 3.4
importlib-metadata 5.2.0
ipykernel 6.21.3
ipython 8.11.0
ipython-genutils 0.2.0
isoduration 20.11.0
isort 5.12.0
jaraco.classes 3.2.3
jedi 0.18.2
Jinja2 3.1.2
json5 0.9.11
jsonpointer 2.3
jsonschema 4.17.3
jupyter_client 8.0.3
jupyter_core 5.2.0
jupyter-events 0.6.3
jupyter_server 2.4.0
jupyter_server_fileid 0.8.0
jupyter_server_terminals 0.4.4
jupyter_server_ydoc 0.6.1
jupyter-ydoc 0.2.3
jupyterlab 3.6.1
jupyterlab-pygments 0.2.2
jupyterlab_server 2.20.0
keyring 23.13.1
kiwisolver 1.4.4
lib4sbom 0.4.0
license-expression 30.1.0
MarkupSafe 2.1.2
matplotlib 3.7.1
matplotlib-inline 0.1.6
mccabe 0.7.0
mistune 2.0.5
more-itertools 9.0.0
mpmath 1.3.0
mypy 1.3.0
mypy-extensions 1.0.0
nbclassic 0.5.3
nbclient 0.7.2
nbconvert 7.2.9
nbformat 5.7.3
nest-asyncio 1.5.6
networkx 3.1
notebook 6.5.3
notebook_shim 0.2.2
numdifftools 0.9.41
numpy 1.24.2
openpyxl 3.1.2
packageurl-python 0.10.4
packaging 23.0
pandas 1.5.3
pandocfilters 1.5.0
parso 0.8.3
pathspec 0.11.1
pickleshare 0.7.5
Pillow 9.4.0
pip 22.3.1
pip-requirements-parser 32.0.1
pkginfo 1.9.2
platformdirs 3.1.1
prometheus-client 0.16.0
prompt-toolkit 3.0.38
psutil 5.9.4
pure-eval 0.2.2
pycodestyle 2.9.1
pycparser 2.21
pyflakes 2.5.0
Pygments 2.13.0
pyparsing 3.0.9
pyrsistent 0.19.2
pyrsistent 0.19.2
python-dateutil 2.8.2
python-debian 0.1.49
python-json-logger 2.0.7
python-magic 0.4.27
python-magic-bin 0.4.14
pytz 2022.7.1
pytz-deprecation-shim 0.1.0.post0
pywin32 305
pywin32-ctypes 0.2.0
pywinpty 2.0.10
PyYAML 6.0
pyzmq 25.0.0
readme-renderer 37.3
regex 2022.10.31
regex 2022.10.31
requests 2.28.1
requests-toolbelt 0.10.1
reuse 1.1.2
rfc3339-validator 0.1.4
rfc3986 2.0.0
rfc3986-validator 0.1.1
rich 12.6.0
sbom2dot 0.3.0
sbom4files 0.3.0
sbom4python 0.10.0
scipy 1.10.1
semantic-version 2.10.0
Send2Trash 1.8.0
setuptools 67.6.0
six 1.16.0
sniffio 1.3.0
sortedcontainers 2.4.0
soupsieve 2.4
stack-data 0.6.2
standard-bom-validator 0.1
StandardBomValidator 0.1
sympy 1.11.1
terminado 0.17.1
tinycss2 1.2.1
toml 0.10.2
toml 0.10.2
tomli 2.0.1
torch 2.0.0
tornado 6.2
traitlets 5.9.0
twine 4.0.2
types-colorama 0.4.15.12
typing_extensions 4.5.0
tzdata 2022.7
tzdata 2022.7
tzlocal 4.2
tzlocal 4.2
uri-template 1.2.0
urllib3 1.26.13
wcwidth 0.2.6
webcolors 1.12
webencodings 0.5.1
websocket-client 1.5.1
wheel 0.34.2
y-py 0.5.9
ypy-websocket 0.8.2
zipp 3.11.0
I am running Python 3.11.0 on Windows 10.
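The traceback shows the import of magic failing at module load time, which takes the whole CLI down. As a rough sketch only (this is not how sbom4files is written, and both function names are hypothetical), a tool could treat libmagic as optional and fall back to the standard library's extension-based guess:

```python
import mimetypes


def load_magic():
    """Return the python-magic module, or None if it or libmagic is unavailable."""
    try:
        import magic  # raises ImportError when libmagic cannot be found
    except ImportError:
        return None
    return magic


def fallback_type(filename):
    """Guess a MIME type from the file extension only (stdlib, no libmagic)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


magic_module = load_magic()
```

The fallback is much less precise than libmagic because it never looks at the file contents, so this trades accuracy for the ability to run on systems where libmagic is missing.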
I noticed the new version has the SPDX short-form license and a url like so:
"id": "PSF-2.0",
"url": "https://opensource.org/licenses/Python-2.0"
I've been told by a few of our teams that they'd rather have a link to the license where it appears in the source for the component, rather than a central site with license texts. Apparently we've had some issues where the reported license doesn't match the one in the code, and one of our legal reps now requires everyone to dig up the source code license file to verify and validate.
Obviously this isn't your problem, since you don't have to work with a grumpy legal rep, but I figured I'd put it in as a feature request just in case you can come up with a genius way to make this an option in the future. (Since GitHub has a standard location for licenses, it might be easy to find in some cases but likely not all.)
The SPDX file is in some cases invalid because of incorrect license identifiers.
Examples in the above scan:
PackageLicenseConcluded: Apache-2
PackageLicenseConcluded: ASL 2.0
PackageLicenseConcluded: BSD
PackageLicenseConcluded: LGPL
PackageLicenseConcluded: MIT/X
I understand the information is taken from package metadata that is not in SPDX format, but you should not output it as it is.
Either map it to a correct SPDX identifier, or create a custom LicenseRef- identifier.
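A minimal sketch of that suggestion follows. The alias table and the list of valid identifiers are tiny, hypothetical stand-ins (a real tool would load the full SPDX license list), and the function name is made up for this example. Treating "ASL 2.0" as Apache-2.0 is an assumption, and ambiguous values such as "BSD" or "LGPL" are deliberately not guessed.

```python
import re

# Hypothetical stand-ins; a real implementation would use the full SPDX license list.
VALID_SPDX_IDS = {"Apache-2.0", "MIT", "BSD-3-Clause", "PSF-2.0"}
ALIASES = {
    "apache-2": "Apache-2.0",
    "asl 2.0": "Apache-2.0",  # assumption: "ASL" means Apache Software License
}


def normalise_license(raw):
    """Return a valid SPDX identifier, a LicenseRef- identifier, or NOASSERTION."""
    text = raw.strip()
    if text in VALID_SPDX_IDS:
        return text
    alias = ALIASES.get(text.lower())
    if alias:
        return alias
    # LicenseRef identifiers may only contain letters, digits, "." and "-".
    ident = re.sub(r"[^A-Za-z0-9.\-]+", "-", text).strip("-")
    if not ident:
        return "NOASSERTION"
    return f"LicenseRef-{ident}"


for value in ["Apache-2", "ASL 2.0", "BSD", "LGPL", "MIT/X"]:
    print(f"{value!r} -> {normalise_license(value)}")
```

Note that an SPDX document using a LicenseRef- identifier is also expected to carry a matching entry with the extracted license text, which this sketch does not produce.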
The result of pip show module is stored in out and then parsed line by line (the parts of each line are stored in the array entry). Whenever that output contains a line in a format that sbom4python does not expect, a bug arises.
In case 1, the execution stops with a "list index out of range" error.
In case 2, the resulting SBOM contains meaningless fields.
CASE 1 error trace:
Traceback (most recent call last):
File "/home/user/project/.venv/bin/sbom4python", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/cli.py", line 134, in main
sbom_scan.process_python_module(module_name)
File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 213, in process_python_module
self.analyze(self.get("Name"), self.get("Requires"))
File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 207, in analyze
if self.process_module(r, parent):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/project/.venv/lib/python3.11/site-packages/sbom4python/scanner.py", line 78, in process_module
line.split(f"{entry[0]}:", 1)[1].strip().rstrip("\n")
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
It seems that some modules, for example kiwisolver, return an unusually formatted value from pip show module.
To reproduce this error, create a fresh .venv (python 3.11.2 for me) and run:
pip install sbom4python kiwisolver
sbom4python -d -m kiwisolver
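One way to make the parsing tolerant of such output is sketched below. This is not the code in scanner.py: the function name, the heuristic and the sample text are assumptions for illustration. It uses str.partition, which never raises on a line without a colon, and treats indented lines, lines without a colon, and lines whose "key" contains a space as continuations of the previous field.

```python
def parse_pip_show(output):
    """Parse `pip show` style text into a dict without assuming every line is 'Key: value'."""
    fields = {}
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        looks_like_field = (
            bool(sep)
            and bool(key)
            and not key[0].isspace()
            and " " not in key.strip()
        )
        if looks_like_field:
            current = key.strip()
            fields[current] = value.strip()
        elif current is not None and line.strip():
            # Continuation of a multi-line value (for example a long license text).
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


# Made-up sample that imitates a multi-line License field; not real pip output.
sample = (
    "Name: kiwisolver\n"
    "Version: 1.4.4\n"
    "License: =========================\n"
    "         The licensing terms\n"
    "\n"
    "a line without any colon\n"
    "Requires: \n"
)
fields = parse_pip_show(sample)
```

The heuristic relies on the field names printed by pip show (Name, Version, Home-page, Requires and so on) containing no spaces. An unindented continuation line that happens to look like "Word: text" would still be misread as a new field.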
I have regenerated SBOM for cve-bin-tool and I see a couple of regressions.
The most obvious one is that whitespaces between words are now replaced with underscores:
-PackageSupplier: Person: Terri Oda
+PackageSupplier: Person: Terri_Oda
-PackageSupplier: Organization: Andrew Svetlov <[email protected]>
+PackageSupplier: Organization: Andrew_Svetlov_<[email protected]>
Some PackageSupplier values have gone missing, but I guess it could be some real-life change? I don't know how you get this info.
Example from aiosignal and idna:
-PackageSupplier: Person: Nikolay Kim
+PackageSupplier: NOASSERTION
-PackageSupplier: Person: Kim Davies
+PackageSupplier: NOASSERTION
Some licences were lost. Example from idna:
-##### Reported license BSD-3-Clause
-PackageLicenseConcluded: BSD-3-Clause
-PackageLicenseDeclared: BSD-3-Clause
+##### Reported license
+PackageLicenseConcluded: NOASSERTION
+PackageLicenseDeclared: NOASSERTION
Actually when the package is downloaded through pip, a non-functional version is retrieved.
Thank you for your work, but I encountered the following error when using sbom4python to generate an SBOM. Could you please give me some advice? The logs are as follows:
D:\tool\software\anaconda\envs\KNsbom\Scripts>sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json
Traceback (most recent call last):
File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\tool\software\anaconda\envs\KNsbom\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\tool\software\anaconda\envs\KNsbom\Scripts\sbom4python.exe\__main__.py", line 7, in <module>
File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\cli.py", line 131, in main
sbom_scan = SBOMScanner(
File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\sbom4python\scanner.py", line 27, in __init__
self.sbom_package = SBOMPackage()
File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\data\package.py", line 13, in __init__
self.license = LicenseScanner()
File "D:\tool\software\anaconda\envs\KNsbom\lib\site-packages\lib4sbom\license.py", line 18, in __init__
self.licenses = json.load(licfile)
File "D:\tool\software\anaconda\envs\KNsbom\lib\json\__init__.py", line 293, in load
return loads(fp.read(),
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 87462: illegal multibyte sequence
It seems the tool failed while handling a file that is not encoded in 'gbk'. I wonder whether I used the wrong command. The command was as follows:
sbom4python -m D:\the_code\python --sbom spdx --format json -o D:\the_code\STM32project\2-1STM32-22112801\output_sbom.json
Could you please give me some advice? Thank you very much.
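The traceback points at json.load reading a file that was opened with the platform default encoding ('gbk' on this machine). A minimal sketch of the usual remedy, assuming the license data file is UTF-8 encoded (the function name and the temporary file are hypothetical, not lib4sbom code), is to pass the encoding explicitly:

```python
import json
import os
import tempfile


def load_json_utf8(path):
    """Read a JSON file as UTF-8 regardless of the platform's default encoding."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration with a temporary file that contains non-ASCII characters.
data = {"name": "\u201cquoted\u201d licence text"}
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(json.dumps(data, ensure_ascii=False).encode("utf-8"))
    tmp_path = tmp.name

loaded = load_json_utf8(tmp_path)
os.remove(tmp_path)
```

Until the library opens the file this way, a possible workaround on the user side is to enable Python's UTF-8 mode, for example by setting the environment variable PYTHONUTF8=1 before running the command. Whether that is enough in this particular environment has not been verified here.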
I recently noticed a case where an SBOM that included twisted as a dependency did not list all of the packages that had been installed with it. After careful review, I found that twisted was installed as twisted[tls] and, as a consequence, additional sub-dependencies were installed. I tried generating an SBOM for twisted[tls], without success. As a workaround, I had to generate SBOMs for the additional sub-dependencies and merge them. It would be great if these could be added automatically by sbom4python given the correct command line input.
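As a rough sketch of what handling extras could involve (none of this is sbom4python code, and the function names and sample requirement strings are made up), the requested extras can be split off the module name and used to select the matching entries from the package's requirement metadata:

```python
import re

# Matches marker fragments such as: extra == "tls"
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")


def split_spec(spec):
    """Split 'twisted[tls,http2]' into ('twisted', {'tls', 'http2'})."""
    name, _, rest = spec.partition("[")
    extras = {part.strip().lower() for part in rest.rstrip("]").split(",") if part.strip()}
    return name.strip(), extras


def requirements_for(requires_dist, extras):
    """Keep unconditional requirements plus those guarded by a requested extra.

    Simplification: every marker other than `extra` (python_version, platform, ...)
    is ignored, so this may select more than pip would actually install.
    """
    selected = []
    for entry in requires_dist:
        requirement, _, marker = entry.partition(";")
        guarded_by = {name.lower() for name in _EXTRA_MARKER.findall(marker)}
        if not guarded_by or guarded_by & extras:
            selected.append(requirement.strip())
    return selected


# Made-up metadata for illustration; not the real requirement list of twisted.
sample_requires = [
    "zope.interface>=5",
    'pyopenssl>=21.0.0; extra == "tls"',
    'h2<5.0,>=3.0; extra == "http2"',
]
name, extras = split_spec("twisted[tls]")
selected = requirements_for(sample_requires, extras)
```

For an installed package, importlib.metadata.requires(name) returns requirement strings in this form. A real implementation would likely evaluate the full environment markers, for instance with the packaging library, rather than matching the extra with a regular expression.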