metricsgrimoire / cvsanaly

The CVSAnalY tool extracts information out of source code repository logs and stores it into a database.

Home Page: http://metricsgrimoire.github.com/CVSAnalY/

License: GNU General Public License v2.0

Python 100.00%

cvsanaly's People

Contributors

andygrunwald, bolche, canasdiaz, dicortazar, gpoo, jgbarah, linzhp, maelick, rodrigokuroda, sduenas

cvsanaly's Issues

Error while running CVSAnaly with opensuse kernel-source

I get the following error when trying to analyse the kernel-source repository
(cloned with git clone git://gitorious.org/opensuse/kernel-source.git):

zoumpis@linux-sn3j:~/.cvsanaly2/cache> cvsanaly2 --db-user root --db-password root --db-database cvsanaly_kernel_sources /home/zoumpis/git/kernel-source/
Parsing log for /home/zoumpis/git/kernel-source/ (git)
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/site-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/DBTempLog.py", line 131, in __writer
cursor.executemany (statement ("INSERT into _temp_log (rev, date, object) values (?, ?, ?)", self.db.place_holder), commits)
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 223, in executemany
r = self._query('\n'.join([query[:p], ',\n'.join(q), query[e:]]))
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 316, in _query
rowcount = self._do_query(q)
File "/usr/lib64/python2.7/site-packages/MySQLdb/cursors.py", line 280, in _do_query
db.query(q)
OperationalError: (2006, 'MySQL server has gone away')

FYI

zoumpis@linux-sn3j:~> cvsanaly2 -V
2.1.0
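
Error 2006 is commonly reported when a single statement exceeds the server's max_allowed_packet setting or when the connection times out; the traceback above does not confirm which applies here. The following is a minimal sketch of one possible workaround, assuming the cause is the size of the multi-row INSERT that executemany builds. The helpers `batches` and `insert_in_batches` and the batch size are hypothetical and not part of CVSAnalY, and sqlite3 stands in for MySQLdb so the example is self-contained.

```python
import sqlite3
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_in_batches(cursor, statement, rows, size=500):
    """Run executemany once per batch and return the number of rows sent."""
    sent = 0
    for chunk in batches(rows, size):
        cursor.executemany(statement, chunk)
        sent += len(chunk)
    return sent


# sqlite3 stands in for the MySQL connection used in the traceback (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE _temp_log (rev TEXT, date TEXT, object BLOB)")
commits = (("r%d" % i, "2013-01-01", b"x") for i in range(1200))  # made-up rows
total = insert_in_batches(
    cur, "INSERT INTO _temp_log (rev, date, object) VALUES (?, ?, ?)", commits
)
conn.commit()
```

Smaller batches keep each generated statement short, at the cost of more round trips to the server.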

Files which are never created or modified

Many entries in the files table are never created. For example, in Tomboy I have 5143 entries in the files table, but only 4514 are actually referenced in actions (i.e. 629 are never touched). Moreover, only 3202 have been created (added or copied) at least once. With a bigger repository like Evolution the gap becomes enormous: of 4941692 file entries, only 19672 have been created!

Here are the queries I've used:

  • SELECT COUNT(DISTINCT f.id) FROM files f, repositories r WHERE f.repository_id = r.id AND r.name = ?;
  • SELECT COUNT(DISTINCT f.id) FROM files f, repositories r, actions a WHERE a.file_id = f.id AND a.type IN ("a", "c") AND f.repository_id = r.id AND r.name = ?;

I have tried to find the source of the problem by crawling through the code, but I have not found it yet. In general, too many entries are created in the files table (like the enormous number of entries for Evolution), and my intuition is that this is related to branches. For example, if a file is created in the master branch, then a new branch is created and the file is modified in this new branch, a new file entry is created (and also one for each of the parent directories).

This might be related to issue #3, as I have also seen 5 files in Tomboy for which two entries are associated with the same commits. For one of them, the file is renamed in one branch but was created in another, so new entries are created in files and file_links, and then a second file_links entry is created for the rename action.
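
The two counts can be combined into a single query that lists the orphaned entries directly. This is only a sketch: it assumes nothing beyond the columns that appear in the queries above (files.id, files.repository_id, repositories.name, actions.file_id, actions.type). The function name and the sample rows are made up, and sqlite3 stands in for MySQL.

```python
import sqlite3


def files_never_created(conn, repo_name):
    """Return ids of files in `repo_name` that have no add or copy action."""
    query = """
        SELECT f.id
        FROM files f
        JOIN repositories r ON f.repository_id = r.id
        WHERE r.name = ?
          AND NOT EXISTS (
              SELECT 1 FROM actions a
              -- UPPER() makes the match case-insensitive in SQLite as well
              WHERE a.file_id = f.id AND UPPER(a.type) IN ('A', 'C')
          )
        ORDER BY f.id
    """
    return [row[0] for row in conn.execute(query, (repo_name,))]


# Tiny made-up data set: file 1 is added, file 2 is only modified,
# file 3 is never touched by any action.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE repositories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE files (id INTEGER PRIMARY KEY, repository_id INTEGER);
    CREATE TABLE actions (id INTEGER PRIMARY KEY, file_id INTEGER, type TEXT);
    INSERT INTO repositories VALUES (1, 'demo');
    INSERT INTO files VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO actions VALUES (1, 1, 'A'), (2, 2, 'M');
""")
orphans = files_never_created(conn, "demo")
```

Listing the ids rather than counting them makes it easier to inspect which paths and branches the orphaned entries belong to.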

Content-Extension: AttributeError: 'NoneType' object has no attribute 'strip'

If I run CVSAnalY on the git repo of CVSAnalY with this config:

db_driver = 'mysql'
db_user = 'root'
db_password = None
db_hostname = 'localhost'

extensions = ['Content']

and this command: ./cvsanaly2 -d cvsanaly_before -f ./config .

I get the following errors:

./cvsanaly2 -d cvsanaly_before -f ./config .
Parsing log for . (git)
Warning: Detected empty branch 'travisci-readme', it'll be ignored
Warning: Detected empty branch 'pep8-pycvsanaly2', it'll be ignored
Warning: Detected empty branch 'travisci', it'll be ignored
Warning: Detected empty branch 'pep8-tests', it'll be ignored
Warning: Detected empty branch 'pep8-root', it'll be ignored
Warning: Detected empty branch 'libresoft-utils', it'll be ignored
Database looks empty, removing cache file /Users/andygrunwald/.cvsanaly2/cache/[email protected]:andygrunwald_CVSAnalY.git
Executing extensions
Executing extension FileTypes
Executing extension Content
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Volumes/HDD/Development/CVSAnalY.git/pycvsanaly2/extensions/Jobs.py", line 49, in _job_thread
    job.run(repo, repo_uri)
  File "/Volumes/HDD/Development/CVSAnalY.git/pycvsanaly2/extensions/Content.py", line 61, in run
    self.path = self.path.strip('/')
AttributeError: 'NoneType' object has no attribute 'strip'

Exception in thread Thread-4:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Volumes/HDD/Development/CVSAnalY.git/pycvsanaly2/extensions/Jobs.py", line 49, in _job_thread
    job.run(repo, repo_uri)
  File "/Volumes/HDD/Development/CVSAnalY.git/pycvsanaly2/extensions/Content.py", line 61, in run
    self.path = self.path.strip('/')
AttributeError: 'NoneType' object has no attribute 'strip'

RepositoryHandler not found

Upon running python setup.py install, I received this error:

"error: Could not find suitable distribution for Requirement.parse('repositoryhandler>=0.3')"

It was fixed when I installed repositoryhandler manually. One solution would be to list repositoryhandler as a prerequisite or to install it automatically.

CommitsLOCDet duplicates data in several executions

If you run the CommitsLOCDet extension more than once against the same database, the data is duplicated. The extension should remove the previous data first, perhaps by dropping the tables and creating new ones.
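
As an alternative to dropping the tables, the rows of the repository being analysed could be deleted before the new ones are written. A minimal sketch of that idea follows. The table name `line_counts` and the function `store_line_counts` are hypothetical, the scmlog columns are assumed from the debug output seen elsewhere in these issues, and sqlite3 stands in for MySQL.

```python
import sqlite3


def store_line_counts(conn, repo_id, counts):
    """Replace the stored counts for one repository so reruns do not duplicate rows.

    `counts` is a list of (commit_id, added, removed) tuples.
    """
    with conn:  # one transaction: the delete and the inserts commit together
        conn.execute(
            "DELETE FROM line_counts WHERE commit_id IN "
            "(SELECT id FROM scmlog WHERE repository_id = ?)",
            (repo_id,),
        )
        conn.executemany(
            "INSERT INTO line_counts (commit_id, added, removed) VALUES (?, ?, ?)",
            counts,
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scmlog (id INTEGER PRIMARY KEY, repository_id INTEGER);
    CREATE TABLE line_counts (commit_id INTEGER, added INTEGER, removed INTEGER);
    INSERT INTO scmlog VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO line_counts VALUES (3, 7, 0);
""")
for _ in range(2):  # simulate running the extension twice on repository 1
    store_line_counts(conn, 1, [(1, 10, 2), (2, 4, 4)])
rows = conn.execute("SELECT COUNT(*) FROM line_counts").fetchone()[0]
```

Deleting per repository keeps the data of other repositories stored in the same database, which dropping the tables would not.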

Error running libcvsanaly2 generation script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/generations.py", line 3, in <module>
    import Numeric
ImportError: No module named Numeric

Run command

python generations.py --db-output-user root --db-output-password admin --db-output-hostname localhost --db-output-database git_scm_log --db-driver mysql --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment

Changed code metric

Hi,
Is there an extension or a complementary tool for CVSAnalY to calculate the changed code metric (a.k.a. code churn - added, removed, and changed lines) for each file revision?
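
Independently of CVSAnalY, git itself can report added and removed lines per file with `git log --numstat`. The sketch below parses that output. The function name `parse_numstat`, the sample hash, and the sample numbers are made up for illustration; it is not an existing CVSAnalY extension.

```python
import re
from collections import defaultdict

# A numstat line looks like "<added>\t<removed>\t<path>"; binary files use "-".
NUMSTAT_LINE = re.compile(r"^(\d+|-)\t(\d+|-)\t(.+)$")


def parse_numstat(log_text):
    """Map each commit hash to {path: (added, removed)} from `git log --numstat` text."""
    churn = defaultdict(dict)
    current = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            current = line.split()[1]
            continue
        match = NUMSTAT_LINE.match(line)
        if match and current is not None:
            added, removed, path = match.groups()
            if added == "-":  # binary file: git reports no line counts
                continue
            churn[current][path] = (int(added), int(removed))
    return dict(churn)


# Made-up sample in the default `git log --numstat` layout.
sample = (
    "commit 0123abc\n"
    "Author: Someone <someone@example.org>\n"
    "Date:   Tue Feb 11 22:09:26 2014 -0800\n"
    "\n"
    "    modify aaa/otherthing\n"
    "\n"
    "3\t1\taaa/otherthing\n"
    "-\t-\timage.png\n"
)
result = parse_numstat(sample)
```

Note that numstat only distinguishes added and removed lines; a changed line shows up as one removal plus one addition.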

Warning: Detected empty branch 'master', it'll be ignored

If you extract test/input.tar.gz and start an analysis on it via cvsanaly2, the output says that the master branch is empty:

./cvsanaly2 --config-file ./config --db-database cvsanaly_before /Users/andygrunwald/Development/tmp/input
Parsing log for /Users/andygrunwald/Development/tmp/input (git)
Warning: Detected empty branch 'master', it'll be ignored
Executing extensions
Executing extension FileTypes
Executing extension Metrics
...

As I understand it, a master branch can't be empty if it contains commits, can it?
Or am I wrong?

Error message wrongly raised: "Invalid extension Weeks"

When CVSAnalY is installed from scratch and not all of its libraries and dependencies are installed, the error "Invalid extension Weeks" (or Months) is raised for the "Months" and "Weeks" extensions.

To reproduce this bug, do not install python-pysqlite2, the library that connects Python and SQLite. Then run CVSAnalY with the following options: "cvsanaly2 --db-user=xxx --db-password=xxx --db-database=xxx --no-parse --extensions=Weeks". This will not work.

Thus, there is an error and one improvement:

  • Error: First, change the error message (it makes no sense to report "Invalid extension" when the actual error is a different one)
  • Improvement: Add this library to the list of dependencies
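
One way to keep the two failure modes apart is sketched below. It is an assumption about how a loader could be structured, not CVSAnalY's actual extension manager: the exception classes, `load_extension`, and the registry contents are all hypothetical.

```python
import importlib


class InvalidExtension(Exception):
    """The requested extension name is not known."""


class MissingDependency(Exception):
    """The extension exists but a module it needs cannot be imported."""


def load_extension(name, registry):
    """Import the module registered for `name`, keeping the two failure modes apart.

    `registry` maps extension names to importable module names.
    """
    if name not in registry:
        raise InvalidExtension("Invalid extension %s" % name)
    try:
        return importlib.import_module(registry[name])
    except ImportError as exc:
        # Report the real cause instead of pretending the name is unknown.
        raise MissingDependency(
            "Extension %s cannot be loaded, missing dependency: %s" % (name, exc)
        )


# Hypothetical registry: "Json" resolves to a stdlib module, while "Weeks"
# points at a module name that is assumed not to be installed.
registry = {"Json": "json", "Weeks": "module_that_is_not_installed_xyz"}
loaded = load_extension("Json", registry)
```

With this split, a user who forgot to install a library sees which import failed rather than a message suggesting a typo in the extension name.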

Cache file seems to be mandatory

We're having errors with a corrupt cache file. A cache file shouldn't be mandatory: if it is not present, cvsanaly should create or update it, and it should never be a hard requirement.

pycvsanaly2.DBContentHandler.CacheFileMismatch:
Cache file /home/owl/.cvsanaly2/cache/https:__github.com_openstack-infra_devstack-gate.git 
is not up to date or it's corrupt: Commit id mismatch for revision cbb031a3fd0cb0b30d4e901a4a5313b1e91c333a (File Cache:25796, Database: 74265). 
It's not possible to continue, the cache file should be removed and the database cleaned up

Make it possible to run CVSAnalY in parallel

With the current version of CVSAnalY, only one instance of the tool can run against a given database at a time.
You can, however, analyze several repositories in one database.

For large analysis suites and message-driven, scalable systems, it would be useful to run several cvsanaly processes at the same time.
I have a real-world use case for this.

One problem is the creation of a temp table with a fixed name. We should add a random postfix to the table name to enable parallel processing.
I do not know if there are more bottlenecks. This has to be researched.
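
The random postfix could look like the following sketch. The prefix `_temp_log` is the table name visible in the tracebacks of other issues; the function `temp_table_name` is hypothetical.

```python
import re
import uuid


def temp_table_name(prefix="_temp_log"):
    """Return a table name with a random 32-character hexadecimal postfix."""
    return "%s_%s" % (prefix, uuid.uuid4().hex)


first = temp_table_name()
second = temp_table_name()
```

With the default prefix the result is 42 characters long, which stays below MySQL's 64-character limit for table names.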

Mercurial support

Is there any support for Mercurial repositories, or are there plans to implement it?

Crash while building filepaths

Running cvsanaly against the PETALS repository, we get the following error:

cvsanaly2 -u root -p root -d petals_cvsanaly --extensions=Metrics --metrics-all https://anonymous:[email protected]/svnroot/trunk/product/dev/prod/petals/

Traceback (most recent call last):
  File "/usr/local/bin/cvsanaly2", line 5, in <module>
    pkg_resources.run_script('cvsanaly2==2.1.0', 'cvsanaly2')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/EGG-INFO/scripts/cvsanaly2", line 37, in <module>
    retval = pycvsanaly2.main.main (sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 334, in main
    emg.run_extensions (repo, path or uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/ExtensionsManager.py", line 84, in run_extensions
    self.run_extension (name, extension, repo, uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/ExtensionsManager.py", line 56, in run_extension
    extension.run (repo, uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/Metrics.py", line 900, in run
    relative_path = fr.get_path ()
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/FileRevs.py", line 99, in get_path
    relative_path = self.fp.get_path (file_id, commit_id, self.repoid).strip ("/")
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/FilePaths.py", line 150, in get_path
    path = self.__build_path (file_id, adj)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/FilePaths.py", line 138, in __build_path
    id = adj.adj[id]
KeyError: 2638L

Error when using MySQL >= 5.5 with tables created as MyISAM by default

By default, CVSAnalY creates its main tables (actions, scmlog, etc.) with the MyISAM engine.

Extensions such as FileTypes, however, use the system's default engine. With MySQL >= 5.5 this does not work properly, and an error like this one is raised:

ERROR 1005 (HY000): Can't create table 'XXX' (errno: 150).

Thus, the extensions need to specify the engine explicitly.
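
InnoDB has been the default storage engine since MySQL 5.5, and errno 150 indicates a foreign key constraint problem, so the failure presumably comes from an extension table ending up on a different engine than the tables it references. A sketch of naming the engine explicitly when building the DDL is shown below. The helper `create_table_sql` and the column list are hypothetical; only the table name file_types is taken from these issues.

```python
def create_table_sql(name, columns, engine="MyISAM"):
    """Build a CREATE TABLE statement that names the storage engine explicitly."""
    body = ",\n    ".join("%s %s" % (col, ctype) for col, ctype in columns)
    return "CREATE TABLE %s (\n    %s\n) ENGINE=%s" % (name, body, engine)


# The column definitions are illustrative, not the real file_types schema.
sql = create_table_sql(
    "file_types",
    [("id", "INTEGER PRIMARY KEY"), ("file_id", "INTEGER"), ("type", "VARCHAR(255)")],
)
print(sql)
```

Passing the engine as a parameter would let all tables, core and extension alike, be switched to another engine in one place.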

Using a Git library?

CVSAnalY uses repositoryhandler to parse the output of the git command. My impression is that the output format differs slightly between git versions, which may break CVSAnalY. This post lists several Python Git modules. They may depend less on the output format of the git command.

Metrics extension fails on input.tar.gz analysis

If you extract test/input.tar.gz and run a cvsanaly analysis on it with the Metrics extension enabled, the following error is shown:

./cvsanaly2 --config-file ./config --db-database cvsanaly_before /Users/andygrunwald/Development/tmp/input
Parsing log for /Users/andygrunwald/Development/tmp/input (git)
Warning: Detected empty branch 'master', it'll be ignored
Executing extensions
Executing extension FileTypes
Executing extension Metrics
Error obtaining aaa/otherthing.renamed@51a3b654f252210572297f47597b31527c475fb8. Command ['git', 'show', u'51a3b654f252210572297f47597b31527c475fb8:aaa/otherthing.renamed'] returned 128 (fatal: Path 'aaa/otherthing.renamed' exists on disk, but not in '51a3b654f252210572297f47597b31527c475fb8'.
)

GNOME gbook cache file mismatch

I experienced the following error when extracting GNOME's gbook using the latest CVSAnalY version:

> git clone git://git.gnome.org/archive/gbook
> cd gbook
> cvsanaly2 -u <user> -p <pass> -d <db>
Parsing log for gbook (git)
Traceback (most recent call last):
  File "/usr/local/bin/cvsanaly2", line 5, in <module>
    pkg_resources.run_script('cvsanaly2==2.1.0', 'cvsanaly2')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 505, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1245, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/EGG-INFO/scripts/cvsanaly2", line 37, in <module>
    retval = pycvsanaly2.main.main (sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 344, in main
    parser.end ()
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Parser.py", line 61, in end
    self.handler.end ()        
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/DBProxyContentHandler.py", line 67, in end
    self.db_handler.repository (self.repo_uri)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/DBContentHandler.py", line 137, in repository
    raise CacheFileMismatch (msg)
pycvsanaly2.DBContentHandler.CacheFileMismatch: Cache file ~/.cvsanaly2/cache/git:__git.gnome.org_archive_gbook is not up to date or it's corrupt: Cache file cannot be foundIt's not possible to continue, the database should be cleaned up

Error running libcvsanaly2 activity script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/activity.py", line 52, in <module>
    gfx.render ("activity.png", 600, 400, dataset, options)
  File "~/projects/git/libcvsanaly2/libcvsanaly2/Graphs/PychaGraph.py", line 37, in render
    gfx = self.chart (surface, options)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/line.py", line 25, in __init__
    super(LineChart, self).__init__(surface, options, debug)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 56, in __init__
    self.options.merge(options)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 785, in merge
    self[key].merge(other[key])
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 782, in merge
    for key, value in other.items():
AttributeError: 'str' object has no attribute 'items'

Run command

python activity.py --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment

Error running libcvsanaly2 ltools script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/ltools.py", line 50, in <module>
    gfx.render ("ltools-top-commits.png", 500, 400, dataset, options)
  File "~/projects/git/libcvsanaly2/libcvsanaly2/Graphs/PychaGraph.py", line 37, in render
    gfx = self.chart (surface, options)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/bar.py", line 26, in __init__
    super(BarChart, self).__init__(surface, options, debug)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 56, in __init__
    self.options.merge(options)
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 785, in merge
    self[key].merge(other[key])
  File "/usr/local/lib/python2.7/dist-packages/pycha-0.6.0-py2.7.egg/pycha/chart.py", line 782, in merge
    for key, value in other.items():
AttributeError: 'str' object has no attribute 'items'

Run command

python ltools.py --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment

Error with git version "1.8.4.rc3"

When running current (fresh from git repo) CVSAnalY with git version "1.8.4.rc3" (current in Debian testing) I get:

Parsing log for . (git)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/jgb/src/MetricsGrimoire/CVSAnalY/pycvsanaly2/Log.py", line 62, in _logreader
    repo.log (self.uri or repo.get_uri ())
  File "/home/jgb/src/MetricsGrimoire/RepositoryHandler/repositoryhandler/backends/git.py", line 259, in log
    major, minor, micro, extra = self._get_git_version()
  File "/home/jgb/src/MetricsGrimoire/RepositoryHandler/repositoryhandler/backends/git.py", line 107, in _get_git_version
    self.git_version = tuple([int(i) for i in version.split('.')])
ValueError: invalid literal for int() with base 10: 'rc3\n'

Running git --version I get:

git version 1.8.4.rc3

My guess is that CVSAnalY doesn't like characters as a part of the git version id...
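
A more tolerant parser could separate the numeric components from any trailing suffix instead of converting every component with int(). The sketch below is an illustration only; `parse_git_version` is a hypothetical helper, not the RepositoryHandler code, and it assumes the output starts with the words "git version".

```python
def parse_git_version(output):
    """Split `git --version` output into (numeric components, suffix string)."""
    # Assumes the form "git version <version> ..."; the third token is the version.
    version = output.split()[2]
    numbers, extra = [], []
    for part in version.split("."):
        if part.isdigit() and not extra:
            numbers.append(int(part))
        else:
            # From the first non-numeric component on, keep everything as suffix.
            extra.append(part)
    return tuple(numbers), ".".join(extra)


numbers, suffix = parse_git_version("git version 1.8.4.rc3\n")
```

Comparisons such as "is this git at least 1.8" can then use the numeric tuple alone, while the suffix remains available if needed.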

Error running libcvsanaly2 commits-types script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/commits-types.py", line 15, in <module>
    result = store.execute (query)
  File "/usr/lib/python2.7/dist-packages/storm/store.py", line 108, in execute
    return self._connection.execute(statement, params, noresult)
  File "/usr/lib/python2.7/dist-packages/storm/databases/mysql.py", line 106, in execute
    return Connection.execute(self, statement, params, noresult)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 238, in execute
    raw_cursor = self.raw_execute(statement, params)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 322, in raw_execute
    self._check_disconnect(raw_cursor.execute, *args)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 371, in _check_disconnect
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.ProgrammingError: (1146, "Table 'git_scm_log.file_types' doesn't exist")

Run command

python commits-types.py --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment

Error when running some git repo

Hi,

When I tried to analyze a git repository, I faced the following error.


Parsing log for webkit-efl/ (git)
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/DBTempLog.py", line 140, in __writer
cursor.executemany (statement ("INSERT into _temp_log (rev, date, object) values (?, ?, ?)", self.db.place_holder), commits)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 223, in executemany
r = self._query('\n'.join([query[:p], ',\n'.join(q), query[e:]]))
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 316, in _query
rowcount = self._do_query(q)
File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 280, in _do_query
db.query(q)
OperationalError: (2006, 'MySQL server has gone away')


How can I analyze this repository? (I used the latest version of CVSAnalY2, and other repositories are analyzed fine.)
Git URL : review.tizen.org/framework/web/webkit-efl.git

Create a new release of CVSAnaly: 2.2.0 / 3.0.0

Hey,

the last release (2.1.0) was a long time ago (ca. 2 years).
I would like to release a new (stable) version and upload it to PyPI (see #76).
I would like to suggest the same for RepositoryHandler.

I prefer to use a stable version in production rather than the current master.

What do you think?
What steps are necessary to release a new version? Raising the version number, tagging in git, and uploading to PyPI?
I can take care of it.

Problem parsing repository from MediaWiki

acs@lenovix:/tmp$ git clone https://gerrit.wikimedia.org/r/p/mediawiki/extensions/cldr
acs@lenovix:~/devel/CVSAnalY$ ./cvsanaly2 -g -u root -p xxxx -d acs_cvsanaly_mediawiki_1598 /tmp/cldr/
DBG: SELECT max(id) from tag_revisions
Parsing log for /tmp/cldr/ (git)
['git', 'log', '--all', '--topo-order', '--pretty=fuller', '--parents', '--name-status', '-M', '-C', '--decorate=full', 'origin']
DBG: Commit 5eeb30800653f70656414b4a4c472586686a5fce tagged as '2013.05'
Traceback (most recent call last):
File "./cvsanaly2", line 37, in <module>
retval = pycvsanaly2.main.main (sys.argv[1:])
File "/home/acs/devel/CVSAnalY/pycvsanaly2/main.py", line 328, in main
reader.start (new_line, (parser, writer))
File "/home/acs/devel/CVSAnalY/pycvsanaly2/Log.py", line 92, in start
self._read_from_repository (new_line_cb, user_data)
File "/home/acs/devel/CVSAnalY/pycvsanaly2/Log.py", line 78, in _read_from_repository
new_line_cb (line, user_data)
File "/home/acs/devel/CVSAnalY/pycvsanaly2/main.py", line 320, in new_line
parser.feed (line)
File "/home/acs/devel/CVSAnalY/pycvsanaly2/Parser.py", line 56, in feed
self._parse_line (line)
File "/home/acs/devel/CVSAnalY/pycvsanaly2/GitParser.py", line 202, in _parse_line
self.branch.set_tail (git_commit)
AttributeError: 'NoneType' object has no attribute 'set_tail'

Error when running "Months" extension for more than one project in the same database

The Months extension basically takes the minimum and maximum date of a project and inserts one row per month between them.

However, when this extension is run for several repositories in the same database (imagine the list of projects at Metrics Grimoire), it crashes because it tries to fill the months table again for each project. The primary key is violated, since in most cases the date ranges overlap.

So I am not sure whether it is supposed to work this way, or whether the documentation should clearly state that the extension must be run only once, at the end of the analysis of the whole list of projects...

My view is that the extension itself should check whether the "months" table already exists and fill in extra months only if needed.
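
The "fill only the missing months" idea can be sketched as follows. The functions `month_range` and `months_to_insert` are hypothetical and the stored rows are made up; months are represented as (year, month) pairs, which is an assumption and not necessarily how the months table stores them.

```python
from datetime import date


def month_range(start, end):
    """Return (year, month) pairs from `start` to `end`, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def months_to_insert(existing, start, end):
    """Return only the months in the project's range that are not stored yet."""
    known = set(existing)
    return [m for m in month_range(start, end) if m not in known]


stored = [(2012, 11), (2012, 12), (2013, 1)]  # rows left by an earlier project
new_rows = months_to_insert(stored, date(2012, 12, 15), date(2013, 3, 2))
```

Inserting only `new_rows` avoids the primary key collision when the date ranges of two projects overlap.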

Comments?

Useful Query in Queries.md is incorrect

What is the intention of the queries found in section 44?

The explanation, "Project size in LOC / SLOC per unit of time", leads me to believe that the goal of the query is to find the TOTAL LOC/SLOC of the project over time. The provided query only produces the LOC/SLOC of the files committed in a given period and does not take into account that files may be created and deleted in that period, which I do not think is correct.

2 Errors when running CVSAnalY

I need your help...

  • for GIT

Git URL : review.tizen.org/framework/uifw/elementary.git


Parsing log for framework/uifw/elementary (git)
Traceback (most recent call last):
File "/usr/local/bin/cvsanaly2", line 5, in <module>
pkg_resources.run_script('cvsanaly2==2.1.0', 'cvsanaly2')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/EGG-INFO/scripts/cvsanaly2", line 37, in <module>
retval = pycvsanaly2.main.main (sys.argv[1:])
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 328, in main
reader.start (new_line, (parser, writer))
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Log.py", line 92, in start
self._read_from_repository (new_line_cb, user_data)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Log.py", line 78, in _read_from_repository
new_line_cb (line, user_data)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 320, in new_line
parser.feed (line)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Parser.py", line 56, in feed
self._parse_line (line)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/GitParser.py", line 202, in _parse_line
self.branch.set_tail (git_commit)
AttributeError: 'NoneType' object has no attribute 'set_tail'


  • for SVN

SVN URL : http://svn.enlightenment.org/svn/e/trunk


Parsing log for http://svn.enlightenment.org/svn/e/trunk (svn)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Log.py", line 62, in _logreader
repo.log (self.uri or repo.get_uri ())
File "/usr/local/lib/python2.7/dist-packages/repositoryhandler/backends/svn.py", line 278, in log
self._run_command(command, LOG)
File "/usr/local/lib/python2.7/dist-packages/repositoryhandler/backends/svn.py", line 133, in _run_command
Repository._run_command(self, command, type, input)
File "/usr/local/lib/python2.7/dist-packages/repositoryhandler/backends/__init__.py", line 168, in _run_command
raise RepositoryCommandError(e.cmd, e.returncode, e.error)
RepositoryCommandError: Command '['svn', '-v', 'log', 'http://svn.enlightenment.org/svn/e/trunk']' returned non-zero exit status 1


Failed to generate Metrics on git repository

Hello, I am trying to generate all metrics for an Eclipse git project.

I am executing the following command:

cvsanaly2 -u root -p password --metrics-all

This returns the following output:

Parsing log for /root/org.eclipse.e4.tools (git)
Warning: Detected empty branch 'pwebster/start_421', it'll be ignored
Warning: Detected empty branch 'integration', it'll be ignored
Warning: Detected empty branch 'R4_1_maintenance', it'll be ignored
Executing extensions

This executes in around 5 seconds and when I check my DB there does not seem to be any metric data being generated.

Am I using the wrong command or is this broken? I have looked for documentation on this and there doesn't seem to be much.

Thanks in advance for your help!

[Git] Timezone of author is not respected

While testing pull requests by @linzhp, I noticed that the time zone of commits is not respected.
The git log of input.tar.gz in the test/ directory shows something like:

commit 456a68ee1407a77f3e804a30dff245bb6c6b872f
Merge: ce8e0b8 51a3b65
Author: Zhongpeng Lin (林中鹏) <[email protected]>
Date:   Tue Feb 11 22:10:39 2014 -0800

    Merge branch 'lzp'

    Conflicts:
        aaa/otherthing

commit 51a3b654f252210572297f47597b31527c475fb8
Author: Zhongpeng Lin (林中鹏) <[email protected]>
Date:   Tue Feb 11 22:09:26 2014 -0800

    modify aaa/otherthing

commit ce8e0b86a1e9877f42fe9453ede418519115f367
Author: Zhongpeng Lin (林中鹏) <[email protected]>
Date:   Tue Feb 11 22:07:49 2014 -0800

    rename aaa/otherthing

commit 589bb080f059834829a2a5955bebfd7c2baa110a
Author: Eduardo Morais <[email protected]>
Date:   Tue Aug 14 15:04:01 2012 -0300

    Create "deeply" nested file
...

The scmlog table only stores 2014-02-11 22:10:39 (see the most recent commit), without the timezone.

In my opinion, the timezone should be stored, too.
This information can be used to analyze how distributed the development of a repository is.
What do you think?
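
To illustrate what could be stored, the sketch below parses a date in the format shown in the git log above and returns the local timestamp together with the UTC offset. The function `parse_git_date` is hypothetical, and keeping the offset as a separate number of seconds is only one possible design, not a description of the scmlog schema.

```python
from datetime import datetime


def parse_git_date(text):
    """Parse a default-format `git log` date into (naive local time, UTC offset in seconds)."""
    # %a and %b assume English day and month names, as git prints them.
    aware = datetime.strptime(text, "%a %b %d %H:%M:%S %Y %z")
    offset = int(aware.utcoffset().total_seconds())
    return aware.replace(tzinfo=None), offset


local_time, offset = parse_git_date("Tue Feb 11 22:10:39 2014 -0800")
```

Storing the offset next to the existing timestamp would keep current queries on the date column working while adding the time zone information.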

AttributeError: MetricsJob instance has no attribute 'measures'

I got this error when running CVSAnalY with Metrics on a repository:

File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/Metrics.py", line 665, in get_measures
return self.measures
AttributeError: MetricsJob instance has no attribute 'measures'

Error in CommitsLOC extension

This error occurs while the extensions are executed, specifically CommitsLOC.

$ cvsanaly2 --debug --writable-path=aries --save-logfile=aries/logfile --db-user=USER --db-password=PASS 
  --db-database=aries --metrics-all --extensions=CommitsLOC,CommitsLOCDet,FileTypes,Months,Patches,Weeks 
  https://svn.apache.org/repos/asf/aries/
DBG: INSERT INTO repositories (id, uri, name, type) values (%s, %s, %s, %s)
Parsing log for https://svn.apache.org/repos/asf/aries (svn)
['svn', '-v', 'log', 'https://svn.apache.org/repos/asf/aries']
DBG: SVN Parser: File /aries/trunk/subsystem/subsystem-core has been copied to /aries/tags/org.apache.aries.subsystem.core-1.1.0
DBG: SVN Parser: File /aries/trunk/subsystem/subsystem-core has been copied to /aries/tags/org.apache.aries.subsystem.core-1.0.2

[... omitted logs ...]

Executing extensions
Executing extension Patches
DBG: SELECT id from repositories where uri = %s
Executing extension Months
Executing extension CommitsLOC
DBG: SELECT id from repositories where uri = %s
DBG: SELECT id, rev, composed_rev from scmlog where repository_id = %s
['svn', 'diff', '-r', '817816:817817', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '819516:819517', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '819894:819895', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '820086:820087', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '820169:820170', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '820226:820227', 'https://svn.apache.org/repos/asf']
DBG: INSERT INTO commits_lines (id, commit_id, added, removed) values (%s, %s, %s, %s)
['svn', 'diff', '-r', '820270:820271', 'https://svn.apache.org/repos/asf']

Error running extension Patches: 'Cursor' object has no attribute 'execut'

However, the svn diff command itself executes successfully:

svn diff -r 820270:820271 https://svn.apache.org/repos/asf
[... omitted a very large output  ...]

I note that the svn diff output is very large, which may be causing this error.

After the error occurred, I pressed Ctrl+C to exit the process, and another error appeared:

^CTraceback (most recent call last):
  File "/usr/local/bin/cvsanaly2", line 5, in <module>
    pkg_resources.run_script('cvsanaly2==2.1.0', 'cvsanaly2')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 528, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1394, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/EGG-INFO/scripts/cvsanaly2", line 37, in <module>
    retval = pycvsanaly2.main.main (sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 382, in main
    emg.run_extensions(repo, path or uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/ExtensionsManager.py", line 105, in run_extensions
    self.run_extension(name, extension, repo, uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/ExtensionsManager.py", line 66, in run_extension
    extension.run(repo, uri, db)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/CommitsLOC.py", line 305, in run
    (added, removed) = counter.get_lines_for_revision(revision)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/extensions/CommitsLOC.py", line 115, in get_lines_for_revision
    printerr("Error running svn diff command: %s", (str(e)))
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/utils.py", line 111, in printerr
    str = str % tuple(to_utf8(arg) for arg in args)
TypeError: not all arguments converted during string formatting
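
The final TypeError above looks consistent with a common Python pitfall: in the call shown in the traceback, `(str(e))` is a parenthesized string rather than a one-element tuple, so `tuple(... for arg in args)` iterates over its characters. The sketch below reproduces this with a hypothetical stand-in for `printerr`; the real `pycvsanaly2.utils.printerr` may differ, and `to_utf8` is replaced by `str` here as an assumption.

```python
def printerr(fmt, args=None):
    """Hypothetical stand-in for pycvsanaly2.utils.printerr (assumed shape)."""
    if args is not None:
        # Mirrors the line in the traceback: every element of args becomes
        # one value for the format string.
        fmt = fmt % tuple(str(arg) for arg in args)
    return fmt


def demo():
    error_text = "connection reset"  # invented message, for illustration only

    # One-element tuple (note the trailing comma): formats as intended.
    ok = printerr("Error running svn diff command: %s", (error_text,))

    # Parenthesized string: iterated character by character, so the single
    # %s receives many arguments and formatting fails.
    try:
        printerr("Error running svn diff command: %s", (error_text))
        failure = None
    except TypeError as exc:
        failure = str(exc)
    return ok, failure


if __name__ == "__main__":
    print(demo())
```

If this is indeed the cause, passing `(str(e),)` with a trailing comma at the call site in CommitsLOC.py would let the original svn diff error message be printed instead of being masked.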

File with two paths in the same commit

While analyzing the CVSAnalY repository, I found a file that has two different paths in the same commit. The action applied to that file was a MOVE. I think this is an error because the path should be stored only once per file.

I haven't checked this behaviour in other repositories.

$ git clone git://github.com/MetricsGrimoire/CVSAnalY.git /tmp/cvsanaly
$ cvsanaly2 -g -u ***** -p ***** -d cvsanaly_test /tmp/cvsanaly
mysql> select * from file_links where file_id = 204  and commit_id <= 453 order by commit_id;
+-----+-----------+---------+-----------+---------------------------------------+
| id  | parent_id | file_id | commit_id | file_path                             |
+-----+-----------+---------+-----------+---------------------------------------+
| 249 |       176 |     204 |        85 | pycvsanaly/FindProgram.py             |
| 250 |       200 |     204 |        85 | pycvsanaly/libcvsanaly/FindProgram.py |
+-----+-----------+---------+-----------+---------------------------------------+

mysql> select * from actions where commit_id = 85 and file_id = 204;
+-----+------+---------+-----------+-----------+
| id  | type | file_id | commit_id | branch_id |
+-----+------+---------+-----------+-----------+
| 453 | V    |     204 |        85 |         2 |
+-----+------+---------+-----------+-----------+
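
To check whether other files or repositories show the same behaviour, a grouping query over `file_links` can list every (file, commit) pair with more than one path. This is a minimal sketch that assumes the `file_links` columns shown above and uses an in-memory SQLite database instead of MySQL so it runs standalone; the third sample row is invented for contrast.

```python
import sqlite3


def files_with_multiple_paths(conn):
    """Return (file_id, commit_id, n_paths) rows with more than one path."""
    query = """
        SELECT file_id, commit_id, COUNT(*) AS n_paths
        FROM file_links
        GROUP BY file_id, commit_id
        HAVING COUNT(*) > 1
        ORDER BY commit_id, file_id
    """
    return conn.execute(query).fetchall()


def build_sample_db():
    """Create an in-memory table shaped like the file_links output above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE file_links (id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " file_id INTEGER, commit_id INTEGER, file_path TEXT)"
    )
    rows = [
        (249, 176, 204, 85, "pycvsanaly/FindProgram.py"),
        (250, 200, 204, 85, "pycvsanaly/libcvsanaly/FindProgram.py"),
        (251, 176, 205, 85, "pycvsanaly/Other.py"),  # invented row, not from the report
    ]
    conn.executemany("INSERT INTO file_links VALUES (?, ?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    print(files_with_multiple_paths(build_sample_db()))
```

The same SELECT statement should also be usable directly at the mysql prompt against a real CVSAnalY database.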

Add support for a table prefix

If you use CVSAnalY in a database where other tables already exist, the table listing becomes confusing,
because you can't pick out the CVSAnalY tables at a glance.

Second, if you already have a table with the same name (e.g. scmlog), you get a table name conflict.

Because of these problems, a configurable table prefix might be useful.
The prefix would be added to every table and view.

E.g.:
Table-Prefix cvsanaly_
Tablenames: scmlog => cvsanaly_scmlog, repositories => cvsanaly_repositories, ...
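
A rough sketch of how such a prefix could be applied is shown below. Nothing here is existing CVSAnalY code: `prefixed`, `create_tables`, and `list_tables` are hypothetical helpers, the table definitions are reduced to a single column, and SQLite stands in for the real database.

```python
import re
import sqlite3

# Two of the table names mentioned above; the real schema has more.
BASE_TABLES = ("scmlog", "repositories")


def prefixed(table, prefix=""):
    """Return the table name with the configured prefix applied."""
    # Table names cannot be bound as SQL parameters, so restrict the prefix
    # to safe identifier characters before interpolating it.
    if not re.fullmatch(r"[A-Za-z0-9_]*", prefix):
        raise ValueError("invalid table prefix: %r" % prefix)
    return prefix + table


def create_tables(conn, prefix=""):
    """Create the (simplified) tables under the given prefix."""
    for table in BASE_TABLES:
        conn.execute(
            "CREATE TABLE %s (id INTEGER PRIMARY KEY)" % prefixed(table, prefix)
        )


def list_tables(conn):
    """List all table names in the SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # A pre-existing table that would otherwise conflict with CVSAnalY's scmlog.
    conn.execute("CREATE TABLE scmlog (id INTEGER PRIMARY KEY)")
    create_tables(conn, prefix="cvsanaly_")
    print(list_tables(conn))
```

With the prefix set, the pre-existing scmlog table and the prefixed tables coexist, and the prefixed tables sort together in the listing.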

Extensions don't seem to work with svn

Hello,

I'm using CVSAnalY to parse the logs of a Subversion repository, and I tried to run different extensions (CommitsLOC and Metrics). Both run indefinitely, with no error message. I also tried subsets of my repository, and they run continuously regardless of the repository's size.

Problem with added and removed lines by authors in project Tempest

I found a problem when analyzing the Tempest project. The author with the most commits appears with no added or removed lines, but the commits on GitHub show that this author does add and remove lines.

Info:

mysql> use openstack_tempest_cvsanaly;

mysql> select * from people where id = 8;
+----+---------+------------------------------+
| id | name    | email                        |
+----+---------+------------------------------+
|  8 | Jenkins | [email protected]            |
+----+---------+------------------------------+

mysql> SELECT count(*) FROM commits_lines WHERE id = ANY (SELECT id FROM scmlog WHERE author_id = 8);
+----------+
| count(*) |
+----------+
|      771 |
+----------+

mysql> SELECT SUM(added) FROM commits_lines WHERE id = ANY (SELECT id FROM scmlog WHERE author_id = 8);
+------------+
| SUM(added) |
+------------+
|          0 |
+------------+

mysql> SELECT SUM(removed) FROM commits_lines WHERE id = ANY (SELECT id FROM scmlog WHERE author_id = 8);
+--------------+
| SUM(removed) |
+--------------+
|            0 |
+--------------+

A commit on GitHub by this author:

Merge "Test that Heat max_template_size is applied"
Jenkins authored 5 days ago
Showing 2 changed files with 48 additions and 0 deletions.
URL: openstack/tempest@fcbca58
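
One thing that may be worth ruling out: the queries above match `commits_lines.id` against `scmlog.id`, whereas the `INSERT INTO commits_lines (id, commit_id, added, removed)` statement logged in another report on this page suggests that the commit reference is stored in `commit_id`. The sketch below uses an in-memory SQLite database with invented rows (not the Tempest data) to show how the two filters can select different rows; whether this explains the zeros reported here is not confirmed.

```python
import sqlite3


def build_sample_db():
    """Create tiny scmlog and commits_lines tables with invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER)")
    conn.execute(
        "CREATE TABLE commits_lines (id INTEGER PRIMARY KEY, commit_id INTEGER,"
        " added INTEGER, removed INTEGER)"
    )
    # Author 9 wrote commit 1; author 8 wrote commits 2 and 3.
    conn.executemany("INSERT INTO scmlog VALUES (?, ?)", [(1, 9), (2, 8), (3, 8)])
    # commits_lines.id is its own sequence and need not equal commit_id.
    conn.executemany(
        "INSERT INTO commits_lines VALUES (?, ?, ?, ?)",
        [(1, 2, 48, 0), (2, 3, 5, 3), (3, 1, 0, 0)],
    )
    return conn


def added_lines(conn, author_id, column):
    """Sum added lines for an author, filtering commits_lines on `column`."""
    if column not in ("id", "commit_id"):
        raise ValueError(column)
    query = (
        "SELECT COALESCE(SUM(added), 0) FROM commits_lines "
        "WHERE %s IN (SELECT id FROM scmlog WHERE author_id = ?)" % column
    )
    return conn.execute(query, (author_id,)).fetchone()[0]


if __name__ == "__main__":
    conn = build_sample_db()
    print("filter on id:       ", added_lines(conn, 8, "id"))
    print("filter on commit_id:", added_lines(conn, 8, "commit_id"))
```

In this invented data set the two filters return different sums for the same author, so rerunning the original queries with `commit_id` in place of `id` may be a quick way to narrow the problem down.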

Error Path does not exist with Metrics extension

I tried the Metrics extension with a simple test repository (the test repository from the issue 64 thread: https://github.com/MetricsGrimoire/CVSAnalY/blob/master/tests/input.tar.gz ) and got this error:

Error obtaining bbb/something@c0d66f9. Command ['git', 'show', u'c0d66f92a95e31c77be08dc9d0f11a16715d1885:bbb/something'] returned 128 (fatal: Path 'bbb/something' does not exist in 'c0d66f92a95e31c77be08dc9d0f11a16715d1885'
)

Using Metrics with a large repository caused lots of similar errors.

Error running libcvsanaly2 tops script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/tops.py", line 41, in <module>
    result = store.execute (query)
  File "/usr/lib/python2.7/dist-packages/storm/store.py", line 108, in execute
    return self._connection.execute(statement, params, noresult)
  File "/usr/lib/python2.7/dist-packages/storm/databases/mysql.py", line 106, in execute
    return Connection.execute(self, statement, params, noresult)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 238, in execute
    raw_cursor = self.raw_execute(statement, params)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 322, in raw_execute
    self._check_disconnect(raw_cursor.execute, *args)
  File "/usr/lib/python2.7/dist-packages/storm/database.py", line 371, in _check_disconnect
    return function(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1054, "Unknown column 's.committer' in 'where clause'")

Run command

python tops.py --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment

Error running libcvsanaly2 fm3 script

Output error

Traceback (most recent call last):
  File "~/projects/git/libcvsanaly2/scripts/fm3.py", line 7, in <module>
    from libcvsanaly2.Exporter import Exporter
ImportError: No module named Exporter

Run command

python fm3.py --db-output-user root --db-output-password admin --db-output-hostname localhost --db-output-database git_scm_log --db-driver mysql --db-user root --db-password admin --db-hostname localhost --db-database git_scm_log --db-driver mysql

Environment
