kucherenko / jscpd

Copy/paste detector for programming source code.

License: MIT License

JavaScript 6.87% HTML 1.49% TypeScript 48.90% APL 6.98% Brainfuck 2.19% C 0.95% C++ 3.12% C# 3.36% Objective-C 2.58% Java 5.59% Scala 0.59% CoffeeScript 0.61% CSS 0.59% D 3.16% Dart 1.95% Erlang 1.97% Go 0.37% Haskell 3.90% Haxe 0.82% Perl 4.01%
detector quality copy-paste duplicates cpd code-quality duplications detect-duplications clones-detection

jscpd's Introduction

jscpd

Copy/paste detector for programming source code, supports 150+ formats.

Copy/paste is a common source of technical debt in many projects. jscpd finds duplicated blocks in more than 150 programming languages and digital document formats. It uses the Rabin-Karp algorithm to search for duplications.
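To make the Rabin-Karp idea concrete, here is a minimal sketch of rolling-hash duplicate detection over a token stream. It is not jscpd's implementation: the function names, `BASE`, `MOD` and the token encoding are assumptions chosen for illustration only.

```typescript
// Illustrative sketch only; names and constants are assumptions, not jscpd internals.
const BASE = 257;
const MOD = 1000003; // small prime so every product stays far below 2^53

// Map a token to a small non-negative integer.
function tokenCode(token: string): number {
  let h = 0;
  for (let i = 0; i < token.length; i++) {
    h = (h * 31 + token.charCodeAt(i)) % MOD;
  }
  return h;
}

// Compare two windows token by token to rule out hash collisions.
function sameWindow(tokens: string[], a: number, b: number, size: number): boolean {
  for (let i = 0; i < size; i++) {
    if (tokens[a + i] !== tokens[b + i]) return false;
  }
  return true;
}

// Return pairs of start indices whose windows of `windowSize` tokens are identical.
function findDuplicateWindows(tokens: string[], windowSize: number): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  if (windowSize <= 0 || tokens.length < windowSize) return pairs;
  const codes = tokens.map(tokenCode);

  // BASE^(windowSize - 1) mod MOD, used to drop the outgoing token.
  let highPow = 1;
  for (let i = 1; i < windowSize; i++) highPow = (highPow * BASE) % MOD;

  // Hash of the first window.
  let hash = 0;
  for (let i = 0; i < windowSize; i++) hash = (hash * BASE + codes[i]) % MOD;

  const seen = new Map<number, number[]>(); // hash -> earlier start indices
  for (let start = 0; ; start++) {
    const earlier = seen.get(hash) || [];
    for (const prev of earlier) {
      if (sameWindow(tokens, prev, start, windowSize)) pairs.push([prev, start]);
    }
    earlier.push(start);
    seen.set(hash, earlier);

    const next = start + windowSize;
    if (next >= tokens.length) break;
    // Roll the hash: remove the outgoing token, then append the incoming one.
    hash = (hash - ((codes[start] * highPow) % MOD) + MOD) % MOD;
    hash = (hash * BASE + codes[next]) % MOD;
  }
  return pairs;
}

const duplicatePairs = findDuplicateWindows("a b c x a b c".split(" "), 3);
console.log(duplicatePairs);
```

For the sample input, the windows starting at indices 0 and 4 are both `a b c`, so the single reported pair is `[0, 4]`. The explicit `sameWindow` check matters because two different windows can share a hash value.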

Packages of jscpd

name                    description
jscpd                   main package for jscpd (CLI and detection API included)
@jscpd/core             core detection algorithm; can be used to detect duplication in different environments; its only dependency is eventemitter3
@jscpd/finder           detector of duplication in files
@jscpd/tokenizer        tool for tokenizing programming source code
@jscpd/leveldb-store    LevelDB store, used for big repositories; slower than the default store
@jscpd/html-reporter    HTML reporter for jscpd
@jscpd/badge-reporter   badge reporter for jscpd

Installation

$ npm install -g jscpd

Usage

$ npx jscpd /path/to/source

or

$ jscpd /path/to/code

or

$ jscpd --pattern "src/**/*.js"

More information about cli here.

Programming API

To integrate copy/paste detection into your application, you can use the programming API:

jscpd Promise API

import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';

const clones: Promise<IClone[]> = jscpd(process.argv);

jscpd async/await API

import {IClone} from '@jscpd/core';
import {jscpd} from 'jscpd';
(async () => {
  const clones: IClone[] = await jscpd(['', '', __dirname + '/../fixtures', '-m', 'weak', '--silent']);
  console.log(clones);
})();

detectClones API

import {detectClones} from "jscpd";

(async () => {
  const clones = await detectClones({
    path: [
      __dirname + '/../fixtures'
    ],
    silent: true
  });
  console.log(clones);
})()

detectClones with persist store

import {detectClones} from "jscpd";
import {IMapFrame, MemoryStore} from "@jscpd/core";

(async () => {
  const store = new MemoryStore<IMapFrame>();

  await detectClones({
    path: [
      __dirname + '/../fixtures'
    ],
  }, store);

  await detectClones({
    path: [
      __dirname + '/../fixtures'
    ],
    silent: true
  }, store);
})()

For deep customisation of the detection process, you can build your own tool with @jscpd/core, @jscpd/finder and @jscpd/tokenizer.

Start contribution

  • Fork the repo kucherenko/jscpd
  • Clone forked version (git clone https://github.com/{your-id}/jscpd)
  • Install dependencies (yarn install)
  • Add your changes
  • Add tests and check it with yarn test
  • Create PR

Who uses jscpd

  • GitHub Super Linter is a combination of multiple linters to install as a GitHub Action
  • Code-Inspector is a code analysis and technical debt management service.
  • Mega-Linter is a 100% open-source linters aggregator for CI (GitHub Action & other CI tools) or to run locally
  • Codacy automatically analyzes your source code and identifies issues as you go, helping you develop software more efficiently with fewer issues down the line.
  • Natural is a general natural language facility for nodejs. It offers a broad range of functionalities for natural language processing.

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

MIT © Andrey Kucherenko

jscpd's People

Contributors

alexhladin, avgerin0s, darthwade, dependabot[bot], dmi3y, gitter-badger, greenkeeper[bot], hata6502, jsoref, juanj, killermoehre, kucherenko, lo1tuma, loveky, mannyluvstacos, massongit, metalbass, mickdekkers, milahu, nvuillam, pustovitdmytro, qyz, sebastienelet, sevans-ge, snyk-bot, sobolevn, soullivaneuh, waffle-with-pears, whtsky, xcatliu

jscpd's Issues

Unknown mode: [object Object]

Hi,

I am getting the error below. It used to work fine before. Any help would be appreciated.

My config file

.cpd.yaml

path:
  - path of the source
languages:
  - javascript
  - htmlmixed # html mixed source like knockout.js templates
exclude:
  - "**/*.min.js"
  - "**/*.mm.js"
reporter: json

Error

C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\codemirror\addon\runmode\runmode.node.js:100
if (!mfactory) throw new Error("Unknown mode: " + spec);
^
Error: Unknown mode: [object Object]
at Object.exports.getMode (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\codemirror\addon\runmode\runmode.node.js:100:24)
at evalmachine.:19:28
at Object.exports.getMode (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\codemirror\addon\runmode\runmode.node.js:101:10)
at Object.exports.runMode (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\codemirror\addon\runmode\runmode.node.js:106:22)
at TokenizerCodeMirror.tokenize (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\tokenizer\TokenizerCodeMirror.coffee:41:16)
at TokenizerCodeMirror.tokenize (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\tokenizer\TokenizerCodeMirror.coffee:1:1)
at Strategy.detect (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\strategy.coffee:29:47)
at Detector.start (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\detector.coffee:10:15)
at jscpd.run (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\jscpd.coffee:106:26)
at Object. (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\cli\cli.coffee:34:10)
at after (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\cli\cli.js:1009:18)
at Object.cli.main (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\cli\cli.js:1014:9)
at Object. (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\cli\cli.coffee:27:5)
at Object. (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\src\cli\cli.coffee:1:1)
at Module._compile (module.js:456:26)
at Object.loadFile (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_modules\coffee-script\lib\coffee-script\register.js:16:19)
at Module.load (C:\Users\c0agav1\AppData\Roaming\npm\node_modules\jscpd\node_m

Sass comments are not ignored

Sass silent comments (//) are not ignored by jscpd.

Steps to reproduce

Create two .scss files

1.scss

// Copyright (c) 2015, salesforce.com, inc. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
// Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
// Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
// Neither the name of salesforce.com, inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

foo { margin: 0; }

2.scss

// Copyright (c) 2015, salesforce.com, inc. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
// Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
// Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
// Neither the name of salesforce.com, inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

bar { padding: 0; }

Run jscpd

jscpd --files "*.*" --languages css --skip-comments

Expected

Found 0 exact clones with 0 duplicated lines in 0 files

Actual

info:    Found 1 exact clones with 7 duplicated lines in 2 files

    - 1.scss: 1-8
     2.scss: 1-8

Files without an extension cause JSCPD to fail

If a file without an extension (for example, crontab) is picked up by jscpd, the build fails.

Stack trace:

>> Error: Cannot read property '1' of null
Warning: Cannot read property '1' of null Use --force to continue.
TypeError: Cannot read property '1' of null
  at TokenizerFactory.makeTokenizer (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt-jscpd-tweaked/node_modules/jscpd/src/tokenizer/TokenizerFactory.coffee:28:5)
  at Strategy.detect (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt-jscpd-tweaked/node_modules/jscpd/src/strategy.coffee:16:5)
  at Detector.start (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt-jscpd-tweaked/node_modules/jscpd/src/detector.coffee:9:5)
  at jscpd.run (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt-jscpd-tweaked/node_modules/jscpd/src/jscpd.coffee:105:26)
  at Object.<anonymous> (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt-jscpd-tweaked/tasks/jscpd.js:63:16)
  at Object.<anonymous> (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/grunt/task.js:264:15)
  at Object.thisTask.fn (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/grunt/task.js:82:16)
  at Object.<anonymous> (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/util/task.js:301:30)
  at Task.runTaskFn (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/util/task.js:251:24)
  at Task.<anonymous> (/Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/util/task.js:300:12)
  at /Users/pezcuckow/Nginx/html/GroupVitals/ci/node_modules/grunt/lib/util/task.js:227:11
  at process._tickCallback (node.js:419:13)

An in-range update of jscpd is breaking the build 🚨

Version 0.6.10 of jscpd just got published.

Branch Build failing 🚨
Dependency jscpd
Current Version 0.6.9
Type devDependency

This version is covered by your current version range and after updating it in your project the build failed.

As jscpd is “only” a devDependency of this project it might not break production or downstream projects, but “only” your build or test tools – preventing new deploys or publishes.

I recommend you give this issue a high priority. I’m sure you can resolve this 💪


Status Details
  • bitHound - Dependencies No failing dependencies. Details

  • bitHound - Code No failing files. Details

  • continuous-integration/travis-ci/push The Travis CI build failed Details

  • coverage/coveralls First build on greenkeeper/jscpd-0.6.10 at 91.932% Details

Commits

The new version differs by 2 commits .

  • d0cd0d2 update(package.json) update version to 0.6.10
  • a849916 feat(#15) add --config option to cli runner

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

An in-range update of js-yaml is breaking the build 🚨

Version 3.8.0 of js-yaml just got published.

Branch Build failing 🚨
Dependency js-yaml
Current Version 3.7.0
Type dependency

This version is covered by your current version range and after updating it in your project the build failed.

As js-yaml is a direct dependency of this project this is very likely breaking your project right now. If other packages depend on you it’s very likely also breaking them.
I recommend you give this issue a very high priority. I’m sure you can resolve this 💪


Status Details
  • bitHound - Code No failing files. Details

  • bitHound - Dependencies No failing dependencies. Details

  • continuous-integration/travis-ci/push The Travis CI build failed Details

  • coverage/coveralls First build on greenkeeper/js-yaml-3.8.0 at 91.932% Details

Commits

The new version differs by 8 commits .

  • dddcb5f 3.8.0 released
  • 686fa0b Merge pull request #327 from SmartBear/master
  • 9bd2c10 Report "duplicated mapping key" errors at start of block
  • 1918a7f fix formatting of the example
  • ee14eab clarify usage of style option
  • f6bafed Support node 6.+ Buffer API when available
  • d37ec86 Deps bump (eslint, esprima)
  • 1497178 Travis: update tested node.js versions to 4, 6 & 7

See the full diff.

Not sure how things should work exactly?

There is a collection of frequently asked questions and of course you may always ask my humans.


Your Greenkeeper Bot 🌴

Feature: XML Stylesheet

XML output is great, no formatting needed but I'd like if it had a stylesheet attached, or allowed me to specify one to the XML reporter as an option.

Path from config file does not work

Given the config file:

path:
  - "src"
debug: true

The output path is path = undefined/src. It seems like the rest of the program treats this as a no-op and scans the entire tree from the cwd.

Ignore lines that match a pattern

For example, I'd like to ignore require statements and module.export statements since those are repetitive by nature.

Example, the following code is duplicated in two of my files, but I think that's perfectly fine:

var
    _ = require('underscore'),
    Backbone = require('backbone'),
    LayoutView = require('LayoutView'),
    Moment = require('moment'),

    appVent = require('appVent'),
    format = require('format'),
    pluginAPI = require('pluginAPI'),

    personalAccountAppsTemplate = require('personalAccountAppsTemplate'),
    personalAccountAppsListEmptyTemplate = require('personalAccountAppsListEmptyTemplate'),
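One way such a filter could work is as a preprocessing step that blanks matching lines before detection, so the remaining line numbers stay stable. The sketch below is an assumption about how this might be done, not an existing jscpd option; `blankIgnoredLines` and the two patterns are hypothetical.

```typescript
// Hypothetical preprocessing helper; not part of jscpd.
// Lines matching any pattern are replaced with empty lines so that
// line numbers reported for the remaining code do not shift.
function blankIgnoredLines(source: string, patterns: RegExp[]): string {
  return source
    .split("\n")
    .map((line) => (patterns.some((pattern) => pattern.test(line)) ? "" : line))
    .join("\n");
}

// Example patterns (assumed): require calls and module.exports assignments.
// Avoid the `g` flag here, because RegExp.test keeps state between calls with it.
const ignorePatterns = [/\brequire\(/, /\bmodule\.exports\b/];

const sample = "var _ = require('underscore');\nvar x = 1;\nmodule.exports = x;";
const filtered = blankIgnoredLines(sample, ignorePatterns);
console.log(JSON.stringify(filtered));
```

Blanking rather than deleting lines is a design choice: a detector that skips empty lines would no longer see the repetitive statements, while any reported line ranges still refer to the original file.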

Feature request: Create a netbeans plugin.

I use NetBeans for my daily work. I can use this plugin with Grunt for JS, but I also work with many other languages such as C/C++, Java and PHP. What I would like is a NetBeans plugin with configurable options that can detect duplicated code for one or more languages, inside one project or across multiple projects. That would be great.

Invalid yaml config upgrading from 0.4.0 to 0.4.1

This config used to work fine with v0.4.0 until v0.4.1 was released just now:

path:
  - .
languages:
  - javascript
  - coffeescript
exclude:
  - "node_modules/**"
  - "test/**"
  - "public/lib/**"
output: "cpd.xml"
File .../src/.cpd.yml not found in current directory, or it is broken

XML, XSLT support

Can I use it on XSLT or XML files?
htmlmixed does not seem to work.

Thanks for any hint.

Twig Support

Twig support would be awesome. Is there a way you can add it?

Getting more done in GitHub with ZenHub

Hola! @kucherenko has created a ZenHub account for the kucherenko organization. ZenHub is the only project management tool integrated natively in GitHub – created specifically for fast-moving, software-driven teams.


How do I use ZenHub?

To get set up with ZenHub, all you have to do is download the browser extension and log in with your GitHub account. Once you do, you’ll get access to ZenHub’s complete feature-set immediately.

What can ZenHub do?

ZenHub adds a series of enhancements directly inside the GitHub UI:

  • Real-time, customizable task boards for GitHub issues;
  • Multi-Repository burndown charts, estimates, and velocity tracking based on GitHub Milestones;
  • Personal to-do lists and task prioritization;
  • Time-saving shortcuts – like a quick repo switcher, a “Move issue” button, and much more.

Add ZenHub to GitHub

Still curious? See more ZenHub features or read user reviews. This issue was written by your friendly ZenHub bot, posted by request from @kucherenko.

ZenHub Board

.cpd.yaml not working (bad merge logic)

If you specify languages in the .yaml file, they are ignored and overridden by the CLI params; since the CLI params have defaults, the config value never takes effect.

See the comments below:

prepareOptions = function(options, config) {
  // options comes from the CLI; its languages entry is the full array of all languages
  var key, optionsNew, value;
  // this is OK: it merges the defaults with the config file, and the config file wins
  optionsNew = _.extend(optionsPreprocessor["default"], config);

  for (key in options) {
    value = options[key];
    if (!(value === null)) {
      optionsNew[key] = value; // !!! BUG: this overwrites the config with the CLI-specified defaults
    }
  }
  if (typeof optionsNew.languages === 'string') {
    optionsNew.languages = optionsNew.languages.split(',');
  }

  optionsNew.extensions = TokenizerFactory.prototype.getExtensionsByLanguages(optionsNew.languages);
  return optionsNew;
};

To fix this, remove the defaults from the CLI definition?

"languages": ['g', "list of languages which scan for duplicates, separated with comma", "string", Object.keys(TokenizerFactory.prototype.LANGUAGES).join(',')],

and keep them only in

optionsPreprocessor["default"] = {
  languages: Object.keys(TokenizerFactory.prototype.LANGUAGES),
  verbose: false,
  debug: false,
  files: null,
  exclude: null,
  "min-lines": 5,
  "min-tokens": 70
};
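The precedence proposed above (defaults, then config file, then only the CLI values the user actually passed) could be sketched as follows. `mergeOptions` and the sample values are hypothetical, and the sketch assumes the CLI parser reports unset flags as `null`, as the `value === null` check in the snippet suggests.

```typescript
type Options = Record<string, unknown>;

// Hypothetical helper: defaults < config file < explicitly passed CLI values.
function mergeOptions(defaults: Options, config: Options, cli: Options): Options {
  // Copy into a fresh object so the defaults are never mutated.
  const merged: Options = { ...defaults, ...config };
  for (const [key, value] of Object.entries(cli)) {
    // Only CLI values the user actually supplied should win.
    if (value !== null && value !== undefined) {
      merged[key] = value;
    }
  }
  return merged;
}

const defaultOptions: Options = {
  languages: ["javascript", "css"],
  "min-lines": 5,
  "min-tokens": 70,
};
const configFile: Options = { languages: ["javascript"] };
// Assumed shape: flags the user did not pass are null.
const cliOptions: Options = { languages: null, "min-lines": 10 };

const mergedOptions = mergeOptions(defaultOptions, configFile, cliOptions);
console.log(mergedOptions);
```

Here `languages` comes from the config file, `min-lines` from the CLI, and `min-tokens` from the defaults. Unlike `_.extend(optionsPreprocessor["default"], config)`, which modifies its first argument, the sketch leaves the defaults object untouched.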

Erroneous matches in large project

I have a large javascript project, over 300k LOC. jscpd has been useful in finding a lot of cp. However, it erroneously reports cp for a few files that do not have any similarities. I can't create an example right now. But, I'm wondering if you have any suggestion for where I might start looking for the bug. Thank you.

More responsive output

I have a Less file with:

.some-class-1 {
  .mixin();
}


.some-class-2 {
  .mixin();
}

My tokens option is 10.

When I run jscpd, I get this output:

- my.less  1-4
  my.less  5-8

.some-class-1 {
  .mixin();
}

It is difficult to see which code is duplicated.

Maybe the output could mark the tokens that are not equal:

.some-clas...
  .mixin();
}

Or use ANSI colors to highlight the tokens?

JSX files

We are using jscpd for JS files, is there a way to get it to work with React's JSX files as well?

setting limit through config file is not working

given .cpd.yaml:

limit: 10

jscpd --debug outputs:

info:    ----------------------------------------
info:    Options:
info:    languages = javascript,typescript,jsx,haxe,coffeescript,ruby,php,python,css,sass,java,csharp,go,clike,htmlmixed,yaml,erlang,swift,xml,puppet,twig,vue
info:    verbose = false
info:    debug = true
info:    files = null
info:    exclude = null
info:    min-lines = 5
info:    min-tokens = 70
info:    limit = 50
info:    config_file = .cpd.yaml
info:    reporter = xml
info:    extensions = js,es,es6,ts,tsx,jsx,hx,hxml,coffee,rb,php,phtml,py,less,css,scss,java,cs,go,cpp,c,m,h,html,htm,yaml,yml,erl,erlang,swift,xml,xsl,xslt,pp,puppet,twig,vue
info:    path = ...
info:    patterns = **/*.+(js|es|es6|ts|tsx|jsx|hx|hxml|coffee|rb|php|phtml|py|less|css|scss|java|cs|go|cpp|c|m|h|html|htm|yaml|yml|erl|erlang|swift|xml|xsl|xslt|pp|puppet|twig|vue)
info:    ----------------------------------------

while jscpd --debug --limit 10 outputs:

info:    ----------------------------------------
info:    Options:
info:    languages = javascript,typescript,jsx,haxe,coffeescript,ruby,php,python,css,sass,java,csharp,go,clike,htmlmixed,yaml,erlang,swift,xml,puppet,twig,vue
info:    verbose = false
info:    debug = true
info:    files = null
info:    exclude = null
info:    min-lines = 5
info:    min-tokens = 70
info:    limit = 10
info:    config_file = .cpd.yaml
info:    reporter = xml
info:    extensions = js,es,es6,ts,tsx,jsx,hx,hxml,coffee,rb,php,phtml,py,less,css,scss,java,cs,go,cpp,c,m,h,html,htm,yaml,yml,erl,erlang,swift,xml,xsl,xslt,pp,puppet,twig,vue
info:    path = ...
info:    patterns = **/*.+(js|es|es6|ts|tsx|jsx|hx|hxml|coffee|rb|php|phtml|py|less|css|scss|java|cs|go|cpp|c|m|h|html|htm|yaml|yml|erl|erlang|swift|xml|xsl|xslt|pp|puppet|twig|vue)
info:    ----------------------------------------

Other options seem to work as expected.

How to configure extensions for a language?

This line seems to rule out setting file extensions from the config file:

optionsNew.extensions = TokenizerFactory.prototype.getExtensionsByLanguages(optionsNew.languages);

We need an option to parametrize the language->extension map.

Thanks
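A parametrized map might look like the sketch below, where user-supplied entries replace the built-in ones per language. Everything here is hypothetical: `resolveExtensions`, `BUILT_IN_EXTENSIONS` and the `overrides` parameter are illustration only, and the built-in entries are just two examples, not jscpd's full table.

```typescript
type ExtensionMap = Record<string, string[]>;

// Assumed built-in entries, for illustration only.
const BUILT_IN_EXTENSIONS: ExtensionMap = {
  javascript: ["js", "es", "es6"],
  twig: ["twig"],
};

// Hypothetical replacement for a fixed lookup: overrides from the config file
// replace the built-in extension list of a language.
function resolveExtensions(languages: string[], overrides: ExtensionMap = {}): string[] {
  const map: ExtensionMap = { ...BUILT_IN_EXTENSIONS, ...overrides };
  const result: string[] = [];
  for (const language of languages) {
    for (const extension of map[language] || []) {
      // Keep each extension once, in first-seen order.
      if (result.indexOf(extension) === -1) result.push(extension);
    }
  }
  return result;
}

console.log(resolveExtensions(["javascript", "twig"]));
console.log(resolveExtensions(["javascript"], { javascript: ["js", "mjs"] }));
```

Replacing a language's list wholesale, rather than appending to it, keeps the behaviour predictable: the config file states exactly which extensions count for that language.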

Node version warning

Installing jscpd plugin on newest node creates a warning:

npm WARN engine [email protected]: wanted: {"node":"~0.10.x"} (current: {"node":"0.12.7","npm":"2.14.2"})

Please update package.json file (and plugin if needed) so it's compatible.

Out of memory error

dougwade main/corvair ‹dbw-test-gulp-generate-components*› » jscpd -e target/ -e node_modules/
info:    jscpd - copy/paste detector for programming source code, developed by Andrey Kucherenko
info:    Preprocessors running time: durationMs=12123
info:    Scanning 37213 files for duplicates...

<--- Last few GCs --->

  715581 ms: Mark-sweep 1378.7 (1441.2) -> 1378.7 (1441.2) MB, 3159.8 / 0 ms [allocation failure] [GC in old space requested].
  718731 ms: Mark-sweep 1378.7 (1441.2) -> 1378.7 (1441.2) MB, 3150.1 / 0 ms [last resort gc].
  721964 ms: Mark-sweep 1378.7 (1441.2) -> 1378.7 (1441.2) MB, 3233.3 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x2f3b9cac9e59 <JS Object>
    2: generateMap [/usr/local/lib/node_modules/jscpd/lib/tokenizer/TokenizerCodeMirror.js:~84] [pc=0xfd58aa1a903] (this=0x5da95c5ee79 <a TokenizerCodeMirror with map 0x1606a80801>)
    3: detect [/usr/local/lib/node_modules/jscpd/lib/strategy.js:39] [pc=0xfd58a972ec7] (this=0x5da95c5efe1 <a Strategy with map 0xf0a72e716b1>,map=0x5da95c5ef61 <a Map with map 0xf0a72e72159>,file=0x255070004ad9 <S...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
[1]    3952 abort      jscpd -e target/ -e node_modules
dougwade main/corvair ‹dbw-test-gulp-generate-components*› » node -v                                                                                                        134 ↵
v6.2.2

'js' isn't recognized as language identifier

This command line from README doesn't work:

jscpd --path my_project/ --languages js

this does:

jscpd --path my_project/ --languages javascript

You should either update the docs or the code

Walk back through the history and collect stats

I was curious about the trend of duplication in a repo. I put together this half-baked shell script to walk through the history of a repo, run jscpd once a month, and output the results.

# how many months to go back
m=0

# first day of this month (date -v is the BSD/macOS syntax)
d=$(date -v1d "+%Y-%m-%d")

# date of the first commit
end=$(git log --reverse --pretty=format:"%cd" --date=short | head -1 )

# while the date is after the first commit
while [[ $d > $end ]]; do

  # go back n months
  d=$(date -v1d -v "-${m}m" "+%Y-%m-%d")

  # find the latest commit from that date
  commit=$(git log --until="$d" -n 1 --pretty=format:"%H")

  # check out the commit, run jscpd, add summary of output to results.txt
  if [[ -n $commit ]]; then
    git checkout "$commit"
    summary=$(jscpd -f "src/**/*.js" -e  --languages javascript | tail -5 | head -1)
    echo "$d $summary" >> results.txt
  fi

  m=$((m+1))

done

Here's an example of the output it produces (written to results.txt):

2017-01-01 7.01% (1541 lines) duplicated lines out of 21979 total lines of code.
2016-12-01 7.23% (1582 lines) duplicated lines out of 21871 total lines of code.
2016-11-01 7.36% (1564 lines) duplicated lines out of 21247 total lines of code.
2016-10-01 8.31% (1550 lines) duplicated lines out of 18644 total lines of code.
2016-09-01 8.75% (1489 lines) duplicated lines out of 17012 total lines of code.
2016-08-01 9.59% (1518 lines) duplicated lines out of 15835 total lines of code.
2016-07-01 12.70% (2016 lines) duplicated lines out of 15874 total lines of code.
2016-06-01 13.01% (1874 lines) duplicated lines out of 14399 total lines of code.
2016-05-01 13.82% (1802 lines) duplicated lines out of 13038 total lines of code.
2016-04-01 2.28% (233 lines) duplicated lines out of 10218 total lines of code.
2016-03-01 4.53% (483 lines) duplicated lines out of 10666 total lines of code.
2016-02-01 0.81% (71 lines) duplicated lines out of 8816 total lines of code.
2016-01-01 0.80% (50 lines) duplicated lines out of 6268 total lines of code.
2015-12-01 0.23% (11 lines) duplicated lines out of 4710 total lines of code.
2015-11-01 0.27% (11 lines) duplicated lines out of 4087 total lines of code.
2015-10-01 2.31% (153 lines) duplicated lines out of 6634 total lines of code.
2015-09-01 0.00% (0 lines) duplicated lines out of 1644 total lines of code.
2015-08-01 0.00% (0 lines) duplicated lines out of 1644 total lines of code.
2015-07-01 0.00% (0 lines) duplicated lines out of 654 total lines of code.
2015-06-01 0.00% (0 lines) duplicated lines out of 517 total lines of code.
2015-05-01 0.00% (0 lines) duplicated lines out of 174 total lines of code.
2015-04-01 0.00% (0 lines) duplicated lines out of 156 total lines of code.
2015-03-01 0.00% (0 lines) duplicated lines out of 92 total lines of code.
2015-02-01 0.00% (0 lines) duplicated lines out of 92 total lines of code.
2015-01-01 0.00% (0 lines) duplicated lines out of 92 total lines of code.

At this point, I've satisfied my curiosity, but I want to save that script somewhere in case I decide to use it again or develop it further. I figured here is as good a place as any. Feel free to close this issue. Or, if you like, consider it a feature request to develop a more robust historical reporting command.

Got an error while using the output option in the .cpd.yaml file

I got the following error when I use the output option in the configuration file .cpd.yaml:

fs.js:427
  return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
                 ^
TypeError: path must be a string
  at Object.fs.openSync (fs.js:427:18)
  at Object.fs.writeFileSync (fs.js:966:15)
  at Report.generate ([...]\node_modules\jscpd\src\report.coffee:31:8)
  at jscpd.run ([...]\node_modules\jscpd\src\jscpd.coffee:112:23)
  at Object.<anonymous> ([...]\node_modules\jscpd\src\cli\cli.coffee:33:10)
  at after ([...]\node_modules\jscpd\node_modules\cli\cli.js:1003:18)
  at Object.cli.main ([...]\node_modules\jscpd\node_modules\cli\cli.js:1008:9)
  at Object.<anonymous> ([...]\node_modules\jscpd\src\cli\cli.coffee:26:5)
  at Object.<anonymous> ([...]\node_modules\jscpd\src\cli\cli.coffee:1:1)
  at Module._compile (module.js:456:26)
  at Object.loadFile ([...]\node_modules\jscpd\node_modules\coffee-script\lib\coffee-script\register.js:16:19)
  at Module.load ([...]\node_modules\jscpd\node_modules\coffee-script\lib\coffee-script\register.js:45:36)
  at Function.Module._load (module.js:312:12)
  at Module.require (module.js:364:17)
  at require (module.js:380:17)
  at Object.<anonymous> ([...]\node_modules\jscpd\bin\jscpd:4:1)
  at Module._compile (module.js:456:26)
  at Object.Module._extensions..js (module.js:474:10)
  at Module.load (module.js:356:32)
  at Function.Module._load (module.js:312:12)
  at Function.Module.runMain (module.js:497:10)
  at startup (node.js:119:16)
  at node.js:901:3

I have tried to use relative paths:
"jscpd-report.xml"
or:
"./jscpd-report.xml"
and absolute paths like (on Windows):
"C:\Users\myuser\jscpd-report.xml"
and absolute paths without backslash:
"C:/Users/myuser/jscpd-report.xml"

All of them with the same result: TypeError: path must be a string.

Here, you can find the .cpd.yaml file used:

languages:
  - javascript
files:
  - "**/*.js"
exclude:
  - "**/*.min.js"
output:
  - "jscpd-report.xml"
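In the config above, output is written as a YAML list, so a YAML parser yields an array rather than a string. That may be what "path must be a string" is complaining about, though the issue does not confirm it; writing output as a plain scalar would be the first thing to try. On the tool side, a defensive normalization could look like this sketch, where `normalizeOutput` is a hypothetical helper and not part of jscpd:

```typescript
// Hypothetical helper: accept a string or a one-element list for `output`
// and always hand a plain string (or null) to the file-writing code.
function normalizeOutput(value: unknown): string | null {
  if (value === null || value === undefined) return null;
  if (typeof value === "string") return value;
  if (Array.isArray(value) && value.length === 1 && typeof value[0] === "string") {
    return value[0];
  }
  throw new TypeError("output must be a single file path");
}

console.log(normalizeOutput(["jscpd-report.xml"]));
console.log(normalizeOutput("jscpd-report.xml"));
```

Failing early with a message that names the option is easier to act on than a TypeError raised deep inside fs.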

Unknown option --reporter

Installing through npm and running your example in the folder where my files are:

jscpd --files **/*.js --exclude **/*.min.js --reporter json --output report.json

gives me the error

ERROR: Unknown option --reporter

Any ideas why?

What priority? Config or console options?

Right now the config file overrides console options.
I think it would be more usable if console options overrode the config.
Example:

  • I keep my default values for tokens and lines in the config file.
  • If I want to try some other values once, I have to edit the config.

It is a slow workflow ))

Compile coffeescript before publishing to npm

If the jscpd code would be compiled to javascript before publishing it to npm then you could make the coffee-script dependency a devDependency.

This would be especially helpful if you want to use jscpd programmatically (see #38).

Add a licence file

Great project! To allow for greater adoption, I think a LICENSE file should be added. I would be happy to submit a pull request if you can let me know what license you would like to use.

Allow programmatic usage

Hi,

I would like to use jscpd programmatically, so that I can provide the file content instead of letting jscpd read the file from the file system.

The background is that I want to check files retrieved via the GitHub API.


The API could look like this:

var Detector = require('jscpd').Detector;

var detector = new Detector();

detector.addFile('foo.js', 'console.log("hello world")');
detector.addFile('bar.js', 'console.log("hello world")');

var result = detector.run();
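
To make the proposal concrete, here is a minimal sketch of how such an interface might behave. It is an assumption-laden illustration, not jscpd's detection algorithm: the `Detector` class below is hypothetical, the `minLines` option and the shape of the result objects are invented for the example, and it only matches windows of identical trimmed lines.

```javascript
// Hypothetical in-memory detector; jscpd does not expose this class.
class Detector {
  constructor({ minLines = 1 } = {}) {
    this.minLines = minLines;
    this.files = [];
  }

  // Register a file by name and content; nothing is read from disk.
  addFile(name, content) {
    const lines = content.split('\n').map((line) => line.trim());
    this.files.push({ name, lines });
  }

  // Report each window of `minLines` lines that was already seen earlier.
  run() {
    const seen = new Map(); // window text -> location of first occurrence
    const clones = [];
    for (const { name, lines } of this.files) {
      for (let i = 0; i + this.minLines <= lines.length; i++) {
        const key = lines.slice(i, i + this.minLines).join('\n');
        if (key.trim() === '') continue; // ignore blank windows
        const first = seen.get(key);
        if (first) {
          clones.push({ first, second: { name, line: i + 1 } });
        } else {
          seen.set(key, { name, line: i + 1 });
        }
      }
    }
    return clones;
  }
}

const detector = new Detector();
detector.addFile('foo.js', 'console.log("hello world")');
detector.addFile('bar.js', 'console.log("hello world")');

const result = detector.run();
console.log(result);
```

Here `result` holds one entry pairing line 1 of foo.js with line 1 of bar.js. A real implementation would compare tokens rather than raw lines, but the calling code would look the same.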

Running from config file takes much longer than running with command line options

I'm trying to move from running on the command line to running with a config file. I've found that to run in debug with the command line, it takes about 1 second to figure out which files are in the scan. Using a config file OTOH takes forever. The options seem to be the same:

Using the command line:

jscpd --path src --languages=javascript --exclude='**/*main.js','**/*build.js','**/webspec/**' --files="**/*.js" --reporter=json --debug=true
info:    jscpd - copy/paste detector for programming source code, developed by Andrey Kucherenko
warn:    File src/.cpd.yaml not found in current directory, or it is broken
warn:    File src/.cpd.yml not found in current directory, or it is broken
{ languages: [ 'javascript' ],
  verbose: null,
  debug: true,
  files: '**/*.js',
  exclude: '**/*main.js,**/*build.js,**/webspec/**',
  path: 'src',
  reporter: 'json',
  'min-lines': 5,
  'min-tokens': 70,
  'skip-comments': null,
  output: null,
  'xsl-href': null,
  extensions: [ 'js', 'es', 'es6' ] }
info:    ----------------------------------------
info:    Options:
info:    languages = javascript
info:    verbose = null
info:    debug = true
info:    files = **/*.js
info:    exclude = **/*main.js,**/*build.js,**/webspec/**
info:    path = src
info:    reporter = json
info:    min-lines = 5
info:    min-tokens = 70
info:    skip-comments = null
info:    output = null
info:    xsl-href = null
info:    extensions = js,es,es6
info:    ----------------------------------------

Using an options file takes 60 seconds and ignores the src parameter (starts to scan node_modules, etc):

jscpd 
info:    jscpd - copy/paste detector for programming source code, developed by Andrey Kucherenko
info:    Used config from /........./.cpd.yaml
{ languages: [ 'javascript' ],
  verbose: null,
  debug: true,
  files: '**/*.js',
  exclude: '**/*main.js,**/*build.js,**/webspec/**',
  'min-lines': 5,
  'min-tokens': 70,
  'skip-comments': null,
  output: null,
  reporter: 'json',
  'xsl-href': null,
  path: 'src',
  extensions: [ 'js', 'es', 'es6' ] }
info:    ----------------------------------------
info:    Options:
info:    languages = javascript
info:    verbose = null
info:    debug = true
info:    files = **/*.js
info:    exclude = **/*main.js,**/*build.js,**/webspec/**
info:    min-lines = 5
info:    min-tokens = 70
info:    skip-comments = null
info:    output = null
info:    reporter = json
info:    xsl-href = null
info:    path = src
info:    extensions = js,es,es6
info:    ----------------------------------------

crypto package clashes with Node's internal crypto package

It seems that the crypto package you use to create md5 hashes clashes with the module that ships with Node.js.

Since npm (>= v3.0.0, I think) installs all packages at the highest level possible, the crypto package you use ends up in the root node_modules directory.

At least on our dev systems with devDependencies installed (gulp-jscpd, which uses jscpd), we can't use Node's own crypto module.

Would it be possible to use another package to create the md5 hashes (e.g. md5)?

I have no idea how to solve this issue other than not using the jscpd code checker. Or is there a way to require the Node internal package (I didn't find any)?

Regards
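
For reference, a minimal sketch of creating an md5 hash with the built-in module only. It assumes a Node.js release that supports the `node:` specifier prefix, which is newer than the versions this issue was written against. That prefix always resolves to the built-in module and never to a package in node_modules. The `md5` helper name is illustrative.

```javascript
// The `node:` prefix forces resolution to the built-in module.
const { createHash } = require('node:crypto');

// Illustrative helper: hex-encoded md5 digest of a string.
function md5(text) {
  return createHash('md5').update(text, 'utf8').digest('hex');
}

console.log(md5('hello world')); // prints 5eb63bbbe01eeed093cb22bb8f5acdc3
```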
