lid's Introduction

License Identifier

The purpose of this program, license_identifier, is to scan the source code files and identify the license text region and the type of license.

License

Copyright (c) 2019, The Linux Foundation. All rights reserved.

SPDX-License-Identifier: BSD-3-Clause

See License.txt for full license text.

Installation

Installation for end users and client applications

If you wish to install license_identifier as an end user, or if you are developing an application that depends on license_identifier, please install it as follows:

# Set up a virtualenv
virtualenv ENV
source ENV/bin/activate

# Get the latest versions of pip and setuptools:
pip install -U setuptools pip

pip install git+https://github.com/codeauroraforum/lid.git

At this point, you can test the installation by running, for example:

./ENV/bin/license-identifier -I path/to/source/files

Note for developers integrating this module into their own code: the program reads all the license files at startup, which takes a few seconds. For efficiency, create a single LicenseIdentifier instance and reuse it, calling its analyze_input_path method for each path.

Running under pypy for improved performance

You need a recent version of pypy (5.4.1 or later). Only newer Ubuntu releases (16.10 onwards) package a sufficiently new version; otherwise, install pypy from http://pypy.org. For example, to install from the pypy.org binary:

mkdir /opt/pypy
wget -qO - https://bitbucket.org/pypy/pypy/downloads/pypy2-v5.6.0-linux64.tar.bz2 | tar -xvj -C /opt/pypy --strip-components=1
ln -s /opt/pypy/bin/pypy /usr/local/bin/pypy

Once pypy is installed on the system, the only change to the process above is to create the virtualenv specifying the correct interpreter:

# Set up a virtualenv
virtualenv -p pypy ENV
source ENV/bin/activate

Alternatively, if you have pypy installed locally, provide the full path to the interpreter:

# Set up a virtualenv
virtualenv -p /path_to_pypy_install/bin/pypy ENV
source ENV/bin/activate

Then follow the remaining instructions above to install LiD and dependencies into the environment.

You can also use the provided Dockerfile to spin up a container with the correct dependencies installed.

Installation for project maintainers

If you wish to install license_identifier for development and testing, please follow the instructions in this section.

Please use virtualenv:

virtualenv ENV
source ENV/bin/activate
pip install -U setuptools pip  # get the latest versions of pip and setuptools

To install dependencies:

make deps

To update the licenses from the web:

make update-licenses  # OPTIONAL

To generate the default license library as a pickle file:

make pickle

To run tests:

tox

Usage

usage: license-identifier -I '/your/input/file/dir_or_file' -F 'easy_read'

optional arguments:
-T, --threshold         Set the similarity threshold (from 0 to 1; default 0.04)
-L, --license_folder    Specify the directory containing the license text files
-I, --input_path        Specify the file or directory to scan (directories are scanned recursively)
-F, --output_format     Specify the output format ('csv' or 'easy_read')
-O, --output_file_path  Specify the output directory and file name prefix; the user name, date, time and '.csv' are appended automatically (required for the 'csv' output format)
-P, --pickle_file_path  Specify the file in which the n-gram objects are stored for future runs

There are four main modes:

# 1. Use the default pickled license library file (recommended)
license-identifier -I /path/to/source/code

# 2. Use a particular pickled license library file
license-identifier -P /path/to/pickled_licenses -I /path/to/source_code

# 3. Use a license directory without building a pickled file (please make sure license files have .txt extensions)
license-identifier -L /path/to/license_directory -I /path/to/source_code

# 4. Build a pickled file from the specified license directory
license-identifier -L /path/to/license_directory -P /path/to/output_pickled_licenses
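When driving the command line tool from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of LiD: build_lid_command is a hypothetical helper, and it uses only the -I, -F, -O, -T and -P options described in the usage text above.

```python
def build_lid_command(input_path, output_format="easy_read",
                      output_prefix=None, threshold=None, pickle_file=None):
    """Return an argv list for a license-identifier scan (hypothetical helper)."""
    if output_format not in ("csv", "easy_read"):
        raise ValueError("output_format must be 'csv' or 'easy_read'")
    if output_format == "csv" and output_prefix is None:
        # The usage text states that -O is required for the csv format.
        raise ValueError("csv output requires output_prefix (-O)")
    cmd = ["license-identifier", "-I", input_path, "-F", output_format]
    if output_prefix is not None:
        cmd += ["-O", output_prefix]
    if threshold is not None:
        if not 0 <= threshold <= 1:
            raise ValueError("threshold must be between 0 and 1")
        cmd += ["-T", str(threshold)]
    if pickle_file is not None:
        cmd += ["-P", pickle_file]
    return cmd


cmd = build_lid_command("/path/to/source_code", "csv",
                        output_prefix="/tmp/lid_scan")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once license-identifier is on the PATH of the active virtualenv.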

Integration

To call LiD, first instantiate a LicenseIdentifier object, and then call one of the "analyze_" methods on a file/directory path.

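# Assumes the license_identifier module is already imported and
# path_to_files is a file or directory path.
lid = license_identifier.LicenseIdentifier(
    threshold=0.07,
    run_in_parallel=False)
results = lid.analyze_input_path(path_to_files)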

The results are a list of named tuples for each file, with each named tuple representing a detected license in that file. The named tuple contains the following fields:

input_fp - input file path
matched_license - matched license type
score - score using the whole input text
start_line_ind - start line number
end_line_ind - end line number
start_offset - start byte offset
end_offset - end byte offset
region_score - score using only the license text portion
found_region - found license text
original_region - matched license text without context
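As an illustration of consuming these results, the sketch below groups matched licenses by input file. It does not import LiD: LicenseMatch is a hypothetical stand-in that copies the field names described above, the sample values are made up, and the 0.5 cut-off on region_score is an arbitrary assumption. It also assumes the matches arrive as a flat iterable of tuples.

```python
from collections import namedtuple

# Hypothetical stand-in for LiD's result tuple; only the field names
# come from the README, the type name does not.
LicenseMatch = namedtuple("LicenseMatch", [
    "input_fp", "matched_license", "score", "start_line_ind", "end_line_ind",
    "start_offset", "end_offset", "region_score", "found_region",
    "original_region",
])


def summarize(matches, min_region_score=0.5):
    """Map each input file to the licenses whose region_score passes the cut-off."""
    summary = {}
    for match in matches:
        if match.region_score >= min_region_score:
            summary.setdefault(match.input_fp, []).append(match.matched_license)
    return summary


# Made-up sample data, for illustration only.
sample = [
    LicenseMatch("a.c", "BSD-3-Clause", 0.12, 0, 27, 0, 1500, 0.98, "...", "..."),
    LicenseMatch("b.py", "MIT", 0.05, 3, 20, 40, 1100, 0.31, "...", "..."),
]
print(summarize(sample))  # {'a.c': ['BSD-3-Clause']}
```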

Adding Licenses

To add more licenses, create a text file containing the license text and save it in the ./data/license_dir/custom folder. Then rebuild the n-gram license library by running make pickle.
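The file-creation step can be scripted. This is a minimal sketch under stated assumptions: add_custom_license is a hypothetical helper, repo_root is assumed to be a checkout of the repository, and the .txt extension follows the note on license directories in the Usage section. The demonstration writes into a temporary directory rather than a real checkout.

```python
import os
import tempfile


def add_custom_license(repo_root, name, text):
    """Write a license text into data/license_dir/custom (hypothetical helper)."""
    custom_dir = os.path.join(repo_root, "data", "license_dir", "custom")
    os.makedirs(custom_dir, exist_ok=True)
    path = os.path.join(custom_dir, name + ".txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


# Demonstration against a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as fake_root:
    created = add_custom_license(fake_root, "My-License", "Example license text\n")
    print(os.path.relpath(created, fake_root))
```

After adding a file to a real checkout, run make pickle so the new license is included in the library.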

lid's People

Contributors

craigez, dependabot[bot], jesseporter, michellesenar, rashmichitrakar, zbristow

lid's Issues

CSV output in Python 3 mixes bytes and strings

When using -Fcsv on Python 3, the resulting file is a mix of bytestrings and regular strings:

b'./LICENSE.md',Apache-2.0,0.9009905536934706,4,0,176,0,10143,1.0,"b'\r\n                                 Apache License\r\n ...'"
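Output like this is what Python 3's csv module produces when a row contains bytes objects, because non-string fields are converted with str(). The sketch below shows one possible way to normalize fields before writing. It is not the project's fix: to_text and write_rows are hypothetical helpers, and UTF-8 is an assumed encoding.

```python
import csv
import io


def to_text(value, encoding="utf-8"):
    """Decode bytes to str; return any other value unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding, errors="replace")
    return value


def write_rows(rows, stream):
    """Write rows as CSV after decoding any bytes fields."""
    writer = csv.writer(stream)
    for row in rows:
        writer.writerow([to_text(field) for field in row])


buf = io.StringIO()
write_rows([(b"./LICENSE.md", "Apache-2.0", 1.0, b"Apache License\r\n")], buf)
print(buf.getvalue())
```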

Install instructions are broken

It says to do:

pip install --process-dependency-links git+https://github.qualcomm.com/qosp/lid.git

but a) --process-dependency-links is not accepted by pip and b) github.qualcomm.com is not available.