dlint-py / dlint Goto Github PK

View Code? Open in Web Editor NEW

162.0 2.0 17.0 540 KB

Dlint is a tool for encouraging best coding practices and helping ensure Python code is secure.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

python security linter static-analysis flake8 dlint

dlint's People

Contributors

Stargazers

Watchers

Forkers

hartwork drewdennison lyz-code admdev8 0x1mahmoud djmattyg007 jkklapp hughdavenport jonyscathe kianmeng clavedeluna sobolevn ericbn rc-mattschwager jogo arpitjain799 aw-junaid

dlint's Issues

DUO107 whitelist from xml.etree.ElementTree import Element, SubElement

defusedxml is not capable of creating Element or SubElement

Whitelisting in bad_xml_use.py

@property
    def whitelisted_modules(self):
        return [
            'xml.sax.saxutils',
            'xml.etree.ElementTree.Element',
            'xml.etree.ElementTree.SubElement',
        ]

would help solve this.

A work around could then be used as follows:

from xml.etree.ElementTree import Element, SubElement

import defusedxml.ElementTree as ET

ET.Element = _ElementType = Element
ET.SubElement = SubElement

Note that we still disallow

from xml.etree.ElementTree import parse

Relationship to bandit

What is the relationship between dlint and bandit?

What I can see:

dlint is a flake8 plugin while bandit is a project of its own (there is flake8-bandit, though)
bandit is part of PyCQA which includes also flake8 and flake8-bugbear
bandit has 2700 GitHub stars, dlint has 45

Could somebody maybe point out reasons to use one or the other? Do you maybe use both together? Is there an overlap between the communities?

Refactor redos alternation detection to better handle overlap

Alternation detection currently compares different expression types (e.g. literals, not literals, negate literals, ranges, dots) to ranges. It doesn't check all combinations of aforementioned types. E.g.

$ python -m dlint.redos -p '(a|[a-c])+'
('(a|[a-c])+', True)
$ python -m dlint.redos -p '(a|aa)+'
('(a|aa)+', False)

Though a and aa clearly overlap. We should check all combinations of previously mentioned expression types against each other. This will likely require a refactor to make the comparisons easier.

Some thoughts on a refactor: my initial thoughts are to turn each expression type into a list of ranges it supports, e.g.

Literal a becomes range a through a
Range [a-zA-Z] becomes range a through z and range A through Z
Category \w becomes the range of printable characters

We'll probably need to support some kind of negation operation on our ranges to correctly/easily handle not literals (e.g. [^a]) and negate literals (e.g. [^abc]).

After we've created a unified type that can be instantiated from each expression type, then performing comparisons will be much easier. We'll simply have to iterate over the 2-combination of all ranges and see if there's any overlap.

False positive in redos detection when backtracking doesn't occur

In order for catastrophic backtracking to occur there must be a character that forces backtracking to occur. E.g.

>>> re.match('(a+)+', 'a' * 64 + 'c')
<_sre.SRE_Match at 0x7f9106d32430>
>>> re.match('(a+)+b', 'a' * 64 + 'c')
...Spins...

The first expression has nested quantifiers, so Dlint will detect it, but since there's nothing after the nested quantifier to backtrack the match no catastrophic backtracking will occur. Similarly:

$ python -m dlint.redos -p '(a+)+'
('(a+)+', True)
$ python -m dlint.redos -p '(a+)+b'
('(a+)+b', True)

We should avoid detecting the first expression because catastrophic backtracking cannot occur.

Do not package `tests` folder with distributed `dlint` package

Hi,

I have a small improvement suggestion for the contents of the dlint python package distributed in PyPi. This is a minor quality of life suggestion from the end user perspective.

When I install dlint on a fresh Python virtual environment, the installation also comes with the tests package

Python 3.10.12
miikama$ python3 -m venv env
miikama$ source env/bin/activate
(env) miikama$ ls env/lib/python3.10/site-packages/
_distutils_hack           pip                   pkg_resources  setuptools-59.6.0.dist-info
distutils-precedence.pth  pip-22.0.2.dist-info  setuptools
(env) miikama$ python3 -m pip install dlint
Collecting dlint
  Downloading dlint-0.14.1-py3-none-any.whl (77 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77.7/77.7 KB 1.4 MB/s eta 0:00:00
Collecting flake8>=3.6.0
  Downloading flake8-7.0.0-py2.py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.6/57.6 KB 6.1 MB/s eta 0:00:00
Collecting pycodestyle<2.12.0,>=2.11.0
  Downloading pycodestyle-2.11.1-py2.py3-none-any.whl (31 kB)
Collecting pyflakes<3.3.0,>=3.2.0
  Downloading pyflakes-3.2.0-py2.py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 KB 7.2 MB/s eta 0:00:00
Collecting mccabe<0.8.0,>=0.7.0
  Downloading mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Installing collected packages: pyflakes, pycodestyle, mccabe, flake8, dlint
Successfully installed dlint-0.14.1 flake8-7.0.0 mccabe-0.7.0 pycodestyle-2.11.1 pyflakes-3.2.0
(env) miikama$ ls env/lib/python3.10/site-packages/
__pycache__               flake8                  pip-22.0.2.dist-info          pyflakes-3.2.0.dist-info
_distutils_hack           flake8-7.0.0.dist-info  pkg_resources                 setuptools
distutils-precedence.pth  mccabe-0.7.0.dist-info  pycodestyle-2.11.1.dist-info  setuptools-59.6.0.dist-info
dlint                     mccabe.py               pycodestyle.py                **tests**
dlint-0.14.1.dist-info    pip                     pyflakes
(env) miikama$ pip show dlint
Name: dlint
Version: 0.14.1
Summary: Dlint is a tool for encouraging best coding practices and helping ensure Python code is secure.
Home-page: https://github.com/dlint-py/dlint
Author:
Author-email:
License: BSD-3-Clause
Location: env/lib/python3.10/site-packages
Requires: flake8
Required-by:

It would be nice if dlint did not introduce a package tests into the global python distribution site-packages directory. Because, afterwards, if run from tests import ... this package is sometimes picked up even by accident.

My suggestion: Do not package tests folder with dlint package to prevent the introduction of the directory env/lib/python3.10/site-packages/tests/ after installation of dlint package

Have a great day! :)

Whitelist yaml detection when using SafeLoader

The following pyyaml calls should be safe:

yaml.load(..., Loader=yaml.SafeLoader)
yaml.load(..., Loader=yaml.CSafeLoader)

I believe this is equivalent to using safe_load, but I've encountered a few false positives in the wild using this code.

Add quantifiers in sequence to redos detection

Per Preventing Regular Expression Denial of Service (ReDoS), Dlint currently supports alternation and nested quantifiers. Redos can also occur via quantifiers in sequence. Let's add support for detecting this redos avenue to Dlint.

Stop recommending defusedxml instead of lxml / warn about specific lxml misuse

defusedxml.lxml was never production-grade, and is now deprecated, it has multiple incompatibilities with lxml "proper" and breaks code
lxml had options around sensitive XML features, and has added more after defused outlined issues:
- resolve_entities defaults to True, probably is what allows quadratic blowup and local entity expansion attacks (that lxml is not sujbect to billion laughs I expect means it doesn't do recursive entity expansion), it might be a bit brutal to require it being disabled however it's an option and lxml's FAQ provides a recipe for restricted entity expansion
- no_network defaults to True (no network lookup) and protects against external entity expansion and DTD retrieval, disabling it should probably be flagged
- huge_tree protects against xml bombs by default, enabling it should probably be flagged
- lxml's xinclude support is opt-in (also), use of these should probably be flagged

Finally xpath and xslt are a bit more complicated as they're "legit and safe" in the same sense e.g. database APIs are, running a "static" query should be safe (and lxml's xpath API supports parametrisation) but untrusted xpath and xslt injection / untrusted execution has similar issues to sql.

Add linter to detect input asking for password

Bad:

input("Password: ")

Good:

getpass.getpass("Password: ")

See getpass.

We can probably just look for input calls that contain the string literal 'password'.

Hashlib linter should not warn when `usedforsecurity=False` is set

This is a built-in flag that affirms the code is not using md5 for security-related purposes. In my code's case, I'm manually calculating a checksum as required by the AWS S3 API.

Flake8 v6 compatibility

Would it be possible to remove the upper bound on flake8 requirement?

https://iscinumpy.dev/post/bound-version-constraints/

If not, can we get that bumped up to flake8<7 please?

Update DUO108 documentation

DUO108 is outdated in the following ways:

it mentions Python 2, deprecated by dlint
it says it only runs for Python 2, but that doesn't seem to be the case
it suggests use of raw_input which is also deprecated in py3

Anything else?

Support for flake8==5.x.x

Flake8 5 came out a couple of days ago.
Does anything here need to change other than requirements.txt?

False positive in redos detection when re.DOTALL missing

The following expression doesn't ReDoS, but Dlint detects it:

re.search(r'(\n.*)+a', '\n' * 64 + 'b')

However, this expression does ReDoS:

re.search(r'(\n.*)+a', '\n' * 64 + 'b', re.DOTALL)

Fixing this requires a large amount of work for little gain in reducing false positives. The first example doesn't seem very common. We don't currently analyze the flags passed to re functions, so adding this functionality would take considerable work.

Support flake8>=4

Please upgrade dlint to support flake8>=4. There doesn't appear to be any breaking changes that will affect dlint.

https://flake8.pycqa.org/en/latest/release-notes/4.0.0.html#backwards-incompatible-changes

redos detection misses issues if the regex is provided via a variable

dlint detects redos issues only if the regex is hardcoded into the place where the re function is called. If the regex is stored in a constant, it doesn't catch it.

For example, this code would be flagged:

    text = re.sub(r"""(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:\'\'.,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@)))""", "", text)

But this would not:

URL_REGEX = r"""(?i)\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:\'\'.,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@)))"""  # noqa E501

text = re.sub(URL_REGEX, "", text)

In the original code where I discovered this, URL_REGEX was defined in a different module, then imported. It would be great to handle this as well (i.e.. the regex is in a variable in the same or in a different module than the re. call).

Handle kwargs passed as positional arguments

In the BadKwargUseLinter helper we can observe the following behavior:

def func(a, b='foo'):
    # ...
    return a * b

func(1, b='bar')  # Caught with BadKwargUseLinter
func(2, 'bar')  # Uncaught with BadKwargUseLinter

That is, kwargs passed as positional arguments will not be caught by BadKwargUseLinter. We could remedy this by supporting an additional piece of configuration in the BadKwargUseLinter.kwargs property. If we took another, optional configuration option like position we could detect this behavior.

This is somewhat of a fringe use-case. It may not be worth it to try and maintain the position of function kwargs since they could easily change out from underneath us.

Publish sources on PyPi

Installing the package using pip with the --no-binary flag fails, because no sdist is available on PyPi

$ python -m pip install dlint --no-binary :all:
ERROR: Could not find a version that satisfies the requirement dlint (from versions: none)
ERROR: No matching distribution found for dlint

Add linter for insecure JWT usage in python-jose

TODO: find all examples of insecure usage

https://github.com/mpdavis/python-jose

Drop Python 3.6 support

Python 3.6 was deprecated 8 months ago so it would be good to drop it. Locations to update:

setup.py 3.6 mention
look for any usage of sys.version_info that needs to be updated accordingly

Exception during ReDoS detection with malformed expression

Malformed expressions currently cause Dlint to raise an exception:

$ pipenv run python -m dlint.redos -p '(foo'
Traceback (most recent call last):
...
sre_constants.error: missing ), unterminated subpattern at position 0

Since we'll be running Dlint across many files and don't want it to crash we should handle this error gracefully. We can simply ignore malformed expressions.

pkgutil.iter_modules is evaluated once per module

Under python 3.11 and flake8 6.0 with flint 0.14.0 pkgutil.iter_modules is called via get_plugin_linter_classes once per module. On my Mac it takes over 100ms

In [2]: %timeit list(pkgutil.iter_modules())
122 ms ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

If running flake8 over several hundred files this adds up to 10s of seconds of extra wait time.

Add Python 3.9 support

Create a community Github Action for Dlint

Similar to these community actions: https://github.com/sdras/awesome-actions

We should build an action for Dlint. Bonus points for running Dlint against Dlint :)

Optimize which linters are run against the AST

Currently, when running the flake8 plugin, we run every linter against the AST that flake8 passes us and flake8 does rule filtering of results based on what's enabled. So we run all linters despite what's set via --select or --ignore. Instead we should only run linters if they're enabled. This will speed up Dlint's run time.

Remove python2 compatibility code

From a brief perusal of the codebase, it looks like there's still some code designed for python2 compatibility (such as unnecessary __future__ imports). It would be good to remove this, to avoid confusion for contributors.

Add linter for insecure crypto use in authlib library

The Python Authlib library provides various authentication functionality. There are some potential insecurities surrounding its crypto usage. These include:

JsonWebEncryption(algorithms=['RSA1_5'])
JsonWebSignature(algorithms=['RS256'])
JsonWebSignature(algorithms=['RS384'])
JsonWebSignature(algorithms=['RS512'])

These algorithms make use of RSA PKCSv1.5 which has some security considerations. Let's write a linter to check for this usage.

Add cache_info to benchmarking information in dlint.namespace

functools.lru_cache was recently added to dlint.namespace which provided a great speed up. We should output functools.lru_cache.cache_info information when benchmarking.

We should be able to:

Run the benchmarking code over a Python file
Output linter.namespace.illegal_module_imported.cache_info
Output linter.namespace.name_imported.cache_info
Consider caching asname_to_name as well

Outputting cache_info will tell us if we're efficiently caching information, and can allow for greater profiling and speed ups.

Performance issues on certain large files

E.g. Django tests/admin_views/tests.py

Dlint:

python -m flake8 --select=DUO --isolated django/tests/admin_views/tests.py
...
real	0m27.674s
user	0m27.638s
sys	0m0.037s

Regular Flake8:

python -m flake8 --isolated django/tests/admin_views/tests.py
...
real	0m1.715s
user	0m1.679s
sys	0m0.036s

Add linter for broken function level authorization

Per the OWASP API Security Top 10, broken function level authorization is a big security concern. Adding a linter to detect this would be very useful. Most Python web application frameworks use decorators on function-level API routes (e.g. rest_framework.decorators.api_view in Django REST framework, flask_login.login_required in Flask-Login).

One way I can envision implementing this would be looking for decorator anomalies in Python files that look like they contain API routes. E.g.

@api.route("/users")
@login_required
def users(request):
    ...

@api.route("/groups")
@login_required
def groups(request):
    ...

@api.route("/settings")
def settings(request):
    # Oops, did we forget @login_required?
    ...

@api.route("/jobs")
@login_required
def jobs(request):
    ...

If XX% of API routes in a file are missing what looks like an authentication decorator, we can flag the function missing the decorator. Another common one for authorization might look something like:

@app.route("/users", roles=[User.Admin])
def users(request):
    ...

@api.route("/groups", roles=[User.Admin])
def groups(request):
    ...

@api.route("/settings", roles=[User.Regular])
def settings(request):
    # Oops, can all users access this sensitive endpoint?
    ...

This may seem trivial, but it gets more difficult as you have many different authentication methods, authorization schemes, and user roles.

This will probably involve some of the following:

Looking for common API route decorators and systems used by major Python web frameworks.
Using this information to determine if we're in a API route module.
Determining what "unusual" looks like in this case (e.g. missing login_required).
Performing heuristics, possibly with a configurable threshold, to make the judgement whether a finding is in fact unusual.

There's also some low-hanging fruit here, like just searching for existing "security-off" switches for web framework routes, like:

django.views.decorators.csrf.csrf_exempt
flask_wtf.csrf.exempt
rest_framework.permissions.AllowAny or permission_classes = []
Likely many more in well-known third-party packages...

DUO121: Why is tempfile.mkstemp (with "s") being avoided?

Hi!

Section Correct code at https://github.com/dlint-py/dlint/blob/master/docs/linters/DUO121.md#correct-code seems overly complicated to me. Could you elaborate why tempfile.mkstemp is being avoided?

Thanks and best, Sebastian

Add category alternation detection support to redos linter

os.EX_OK not available on Windows

FYI: os.EX_OK isn't available on Windows and fails when running flake8 --print-dlint-linters

Not sure what the appropriate substitute is, but I'd be happy to test any changes

dlint/dlint/extension.py

Line 46 in 828a156

sys.exit(os.EX_OK)

Consider whitelisting code execution linters if argument is constant string

Many of Dlint's checks looking for code execution bugs due to user input aren't insecure if its argument is a constant string. E.g. eval("2+2"). There are a few linters where this is the case:

$ rg 'constant string'
docs/linters/DUO120.md
36:* Code may be safe if data passed to `marshal` is a constant string

docs/linters/DUO119.md
36:* Code may be safe if data passed to `shelve` is a constant string

docs/linters/DUO106.md
41:* Code may be safe if data passed to `os.system` is a constant string

docs/linters/DUO103.md
36:* Code may be safe if data passed to `pickle` is a constant string

docs/linters/DUO104.md
35:* Code may be safe if data passed to `eval` is a constant string

docs/linters/DUO105.md
42:* Code may be safe if data passed to `exec` is a constant string

docs/linters/DUO110.md
43:* Code may be safe if data passed to `compile` is a constant string and limits data size

We should consider adding logic to these linters to prevent false positives when the argument is a constant string.

Deprecate DUO101

DUO101 is Python 3.3 only, and these versions are no longer supported by dlint.

Other than removing the md file in docs and the linter itself plus its tests, what are the other deprecation procedures?

Add linter for detecting open files that are never closed

Most file IO should be using with statements to automatically close files, but there still may be instances of manual open and close calls (or lack thereof). See Reading and Writing Files.

There are lots of ways a file can be opened:

open
os.open
io.open (alias for builtin open)
tempfile.TemporaryFile|NamedTemporaryFile|SpooledTemporaryFile
tarfile.open
ZipFile.open
Others?

My first idea for an implementation is tracking variable instantiation of the above methods, then checking for a lack of a close call in the same scope.

Also would be good to check for a lack of closing a connection after opening one (e.g. in SQLAlchemy). There's probably lots of opportunity for this in DB connection libraries.

Remove Python 2.7 support

https://pythonclock.org/

It's happening... To continue supporting only official supported versions of Python we will be removing 2.7 support in January 2020. This will allow the project to remove 2-3 compatibility code (e.g. __future__ imports, etc).

Before dropping 2.7 support there will be one final release of Dlint that can be used with 2.7. This means future enhancements, bug fixes, etc will not be released as 2.7-compatible. Because we haven't released a major version of Dlint yet we can simply bump the minor version (if we'd already released a major version this would probably warrant a semver major version bump).

Add linter for insecure xmltodict use

The xmltodict library is a widely used XML parsing module. We should check for insecure use of this library. A couple checks come to mind:

The most obvious, and easiest, would be checking for falsey values in the disable_entities kwarg.
A more interesting and in-depth check would be checking if defusedexpat is installed. The library checks for this like so:

try:
    from defusedexpat import pyexpat as expat
except ImportError:
    from xml.parsers import expat

This means if it's not installed then the library is wide open to various XML attacks similar to those prevented in defusedxml. Further, the defusedexpat library itself appears to be unmaintained, so there may be some insecurities we could search for there as well.

Add linter for XML calls allowing external entities (including DTD)

Per the Python documentation:

Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method setFeature() on the parser object and argument feature_external_ges.

We should look for explicit enabling of the following features:

Enabling these features allows for XML XXE including DTD retrieval. We should detect usage of these features.

False positive in redos detection when nested quantifier mutually exclusive

"A group that contains a token with a quantifier must not have a quantifier of its own unless the quantified token inside the group can only be matched with something else that is mutually exclusive with it." (Nested Quantifiers)

Dlint does not currently eliminate safe regular expressions that have nested quantifiers but they're mutually exclusive. Consider the example from the above link:

$ python -m dlint.redos -p '(x\w{1,10})+y'
('(x\\w{1,10})+y', True)

Dlint finds the nested quantifier. But it flags the corrected code as well:

$ python -m dlint.redos -p '(x[a-wyz0-9_]{1,10})+y'
('(x[a-wyz0-9_]{1,10})+y', True)

This example is okay because there's no character overlap inside the nested quantifier. We should fix this false positive.

Remove use of deprecated ast classes and methods

Per the Python 3.8 release notes:

ast classes Num, Str, Bytes, NameConstant and Ellipsis are considered deprecated and will be removed in future Python versions. Constant should be used instead. (Contributed by Serhiy Storchaka in bpo-32892.)

ast.NodeVisitor methods visit_Num(), visit_Str(), visit_Bytes(), visit_NameConstant() and visit_Ellipsis() are deprecated now and will not be called in future Python versions. Add the visit_Constant() method to handle all constant nodes. (Contributed by Serhiy Storchaka in bpo-36917.)

This isn't immediately necessary, but it's worth tracking. It's not clear when this change will happen, so we don't need a definite timeline. If we can wait until Python 2.7 support is dropped (#17) then this change will be a bit easier since many of these classes/methods are used in 2.7. For example, Dlint makes use of ast.Str and ast.NameConstant.

Add 'with' variable instantiation detection to BadNameAttributeUseLinter

I.e. Context managers.

The BadNameAttributeUseLinter helper currently supports normal variable instantiation of an object, e.g.

def func():
    foo = Foo()
    bar = foo.bad_function()  # Caught by BadNameAttributeUseLinter
    print(bar)

However, the helper does not support instantiation via the with statement, e.g.

def func():
    with Foo() as foo:
        bar = foo.bad_function()  # Not caught by BadNameAttributeUseLinter, yet
        print(bar)

This pattern is not as common as normal variable instantiation, however, it is worth detecting. One of our initial reasons for adding this helper was to catch insecure behavior in tarfile and zipfile, and both of these libraries have a common pattern of instantiation via the with statement.

Let's add context manager variable instantiation support to BadNameAttributeUseLinter.

Add 'Dlint verbose' mode as a flake8 formatter option

Flake8 allows for custom formatters: Developing a Formatting Plugin for Flake8.

I think there's value in having an output mode where things are very dense, and there's one finding per line. This is the current formatter for flake8. It'd also be great if there was a verbose, multi-line mode that included additional information. E.g. the physical line that fired the rule and/or some amount of surrounding lines, a link to the rule's documentation, suggestions for a fix, etc.

I'm envisioning that we develop a flake8 custom formatter that keys off a rule's code and "pretty prints" some or all of the above information to the output buffer.

Add linter for insecure JWT usage in pyjwt

TODO: find all insecure example usage

https://github.com/jpadilla/pyjwt