dreamyguy / gitlogg
💾 🧮 🤯 Parse the 'git log' of multiple repos to 'JSON'
License: MIT License
I have two ideas to improve gitlogg's code quality (bash part):
If you think this is helpful, I am more than happy to implement both these changes :-)
When parsing some repos, the formatting went bonkers. I've included a gist here, with most of the normal lines removed, to show the difference:
Could you help me figure out how to fix the parsing?
Thank you!
I have found that at some point, the script gitlogg-generate-log.sh replaces new lines (\n) with the ò character.
The very first repository I tried this tool on was https://github.com/freeCodeCamp/freeCodeCamp, which actually has one commit from a person with this ò character in their name, so the whole script breaks at the point where it tries to parse that commit 😄
It would be nice if you could come up with a better solution and ideally omit this step of mysterious character replacement, to make the tool more robust. The comment next to it, saying "convert newlines/line-breaks to a character, so we can manipulate it without much trouble", does not quite apply in this case 🤣
Sorry for not creating a PR, but I really suck at coding in bash 😕
Even though the user gets a complaint about the missing path, the script doesn't fail gracefully.
The JSON generation script is triggered and yet another error comes up, this time from node, which is missing the data it expects.
To prevent this, make sure you:
A. Place the gitlogg folder where it is expected (Simple Mode)
or
B. Edit the absolute path to the folder that contains the repositories you want to parse to JSON (Advanced Mode)
It's explained in detail in the README.md file.
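A minimal sketch of how the run could stop before the JSON step when the path is missing. The helper name require_dir and the variable repos_path are assumptions for illustration, not names taken from gitlogg's scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if "$1" is an existing directory,
# otherwise print a message to stderr and return a non-zero status.
require_dir() {
  if [ ! -d "$1" ]; then
    printf 'Error: directory not found: %s\n' "$1" >&2
    return 1
  fi
}

# Usage in a driver script (repos_path is an assumed variable name):
#   require_dir "$repos_path" || exit 1
#   ...only then generate the log and start the JSON step
```

Exiting at this point means node is never started with an empty or missing input file.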
It turns out § was a poor choice to replace \n and \r, as its presence in strings created through user input does break the output.
While attempting to parse 456 repositories I came across many occurrences of the usual delimiters in the subject placeholder (commit message): |, ^, ~ and a few others. These are punctuation characters that rank pretty low on usage, according to http://www.wired.com/2013/08/the-rarity-of-the-ampersand/.
A possible solution would be to pick a very seldom used character from a reference such as https://en.wikipedia.org/wiki/Letter_frequency.
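Another option, sketched here as an assumption rather than as what gitlogg does, is to avoid printable characters altogether and delimit with ASCII control characters: the unit separator (0x1f) between fields and the record separator (0x1e) after each commit. git log can emit arbitrary bytes through its %xNN format placeholder. The helper names emit_log and print_authors are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print hash, author name and subject for the repository
# at "$1", with 0x1f between fields and 0x1e after every commit.
emit_log() {
  git -C "$1" log --pretty=format:'%H%x1f%an%x1f%s%x1e'
}

# Hypothetical helper: read such a stream on stdin and print each author name.
# Records with fewer than three fields are skipped.
print_authors() {
  awk 'BEGIN { RS = "\036"; FS = "\037" } NF >= 3 { print $2 }'
}

# Usage (assumes ./some-repo is a git repository):
#   emit_log ./some-repo | print_authors
```

Because the record separator is no longer the newline, line breaks inside a field would not need to be replaced by a stand-in character at all.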
gitlogg-parse-json.js is broken.
sh gitlogg-generate-log.sh
-e Generating git log for all repositories located at '../travistorrent-tools'. This might take a while!
-e The file ./gitlogg.tmp generated in: 0s
Generating JSON output...
[stdin]:40
author_name = item[16].replace(/"/g, "'"),
^
TypeError: Cannot read property 'replace' of undefined
at [stdin]:40:25
at Array.reduce (native)
at [stdin]:23:4
at Object.exports.runInThisContext (vm.js:54:17)
at Object.<anonymous> ([stdin]-wrapper:6:22)
at Module._compile (module.js:409:26)
at node.js:579:27
at nextTickCallbackWith0Args (node.js:420:9)
at process._tickCallback (node.js:349:13)
Minimally demonstrative example: Analyse repository TestRoots/travistorrent-tools.
Thanks for looking into this :-), I liked the way you prepare the git log output in gitlogg-generate-log.sh. Should this problem be related to git log behaving strangely (printing/not printing empty lines), I might have a clean fix with awk.
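The stack trace shows item[16] being undefined, which suggests that at least one line of gitlogg.tmp has fewer than 17 fields. A rough way to locate such lines with awk is sketched below. The helper name report_short_lines is hypothetical, and the delimiter is passed in as a parameter because this report does not say which one the script uses.

```shell
#!/bin/sh
# Hypothetical helper: list the lines of file "$1" that have fewer than "$3"
# fields when split on the single-character delimiter "$2".
# Output format: <line number>:<field count>
report_short_lines() {
  awk -F "$2" -v min="$3" 'NF < min { print NR ":" NF }' "$1"
}

# Usage (the delimiter shown is a placeholder, not gitlogg's actual one):
#   report_short_lines gitlogg.tmp ';' 17
```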
On some systems (for example Linux distributions that ship GNU coreutils), tail has no -r option to reverse the output.
It can be solved by using the command tac instead of tail -r.
Edit: Actually you can get the log in reverse order by using the --reverse option of the git log command.
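If the reversal has to stay in the shell pipeline, a small wrapper can pick whichever tool is present. This is a sketch, and the function name reverse_lines is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: reverse the order of the lines read from stdin.
# Prefers tac (GNU coreutils) and falls back to tail -r (BSD, macOS).
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac
  else
    tail -r
  fi
}

# Alternatively, let git do the ordering (oldest commit first):
#   git log --reverse
```

Using git log --reverse is probably the simpler route, since it removes the dependency on either tool.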
The documentation says it should be set in gitlogg.sh, but it is overridden in gitlogg-generate-log.sh?
As it is now, the script that parses JSON does not throw any error if directories passed as git repositories aren't git repositories. That results in an unusable JSON file.
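One way to catch this early, sketched under the assumption that the list of directories is available before the log is generated, is to ask git itself whether each directory belongs to a repository. The helper names is_git_repo and filter_repos are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if "$1" is inside a git repository.
# Note: this also accepts subdirectories of a repository.
is_git_repo() {
  git -C "$1" rev-parse --is-inside-work-tree >/dev/null 2>&1
}

# Hypothetical helper: print the arguments that are git repositories,
# one per line, and warn on stderr about the rest.
filter_repos() {
  for dir in "$@"; do
    if is_git_repo "$dir"; then
      printf '%s\n' "$dir"
    else
      printf 'Skipping %s: not a git repository\n' "$dir" >&2
    fi
  done
}
```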
Feature request: Allow universal run path for primary script
With a default install, gitlogg expects to be running in the root directory of the project, with a relative path above the scripts directory:
> ./gitlogg.sh
bash: ./scripts/gitlogg-generate-log.sh: No such file or directory
In this issue I propose that gitlogg development invest in a universal CLI tool. This requires implementing some degree of path recognition or explicit sourcing, so that gitlogg can be run from anywhere a user would like.
Suggestion 1: command-line arg specified path that explicitly sources the location of the helper scripts
Suggestion 2: install gitlogg in a userland location such as /usr/local/ with explicit paths to that install location
Suggestion 3: packaging of the helper scripts and libraries into a single cli tool
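As a sketch of the "path recognition" idea, the entry script could resolve its own directory and build the helper paths from it instead of relying on the current working directory. The function name script_dir and the variable GITLOGG_HOME are assumptions. The scripts/ layout is taken from the error message above.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute directory that contains the
# script path "$1". Symbolic links are not resolved.
script_dir() {
  (cd "$(dirname "$1")" && pwd)
}

# Usage inside a hypothetical entry point:
#   GITLOGG_HOME="$(script_dir "$0")"
#   "$GITLOGG_HOME/scripts/gitlogg-generate-log.sh"
```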
...because that's the limit for strings in V8, a setting that's inherited by NodeJS (the runtime environment interprets JavaScript using Google's V8 JavaScript engine).
Here is an issue about the fuzzy error message at NodeJS's runtime, but however nice a more meaningful message would be, the V8 limitation would still be there.
This is pretty lame as the whole point of Gitlogg is to parse git log from multiple repositories to JSON, however big they are.
Does anyone know a way to bypass that limitation, or parse the information on gitlogg.tmp more effectively, through a smarter stream?
I came across this problem while attempting to parse the git log for https://github.com/LibreOffice/core. I read about the error and went on deleting a bunch of lines until I got it to work. The 268 MB file-size limitation was confirmed...
308,374 lines, 478.8 MB: failed
174,000 lines, 268.8 MB: failed
173,500 lines, 267.9 MB: worked
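Until the parser reads its input as a stream, one possible workaround is to split gitlogg.tmp into chunks that each stay below the limit and parse them separately. This is a sketch built on assumptions: that every commit occupies exactly one line of gitlogg.tmp, and that several smaller JSON files are acceptable as output. The helper name split_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split file "$1" into chunks of at most "$2" lines,
# written as "$3"aa, "$3"ab, ... by the standard split utility.
split_log() {
  split -l "$2" "$1" "$3"
}

# Usage (150000 is an assumed line count; the safe value depends on how
# long the lines of a given repository are):
#   split_log gitlogg.tmp 150000 gitlogg.part.
```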
This error breaks the git log generation and causes the script to jump to the next repository:
sed: RE error: illegal byte sequence
gitlogg.tmp's output doesn't necessarily break but commits are omitted, and pandas don't like that. Never say no to pandas. 🐼
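This message typically comes from BSD sed (as shipped with macOS) when the input contains bytes that are not valid in the current locale's encoding. A common workaround is to run sed in the C locale so that it works on raw bytes. The wrapper name sed_bytes below is hypothetical, and whether this fits gitlogg's own sed expressions is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: run sed in the C locale so it treats input as raw
# bytes instead of rejecting sequences that are invalid in the current locale.
sed_bytes() {
  LC_ALL=C sed "$@"
}

# Usage: replace "a" with "x" on a line containing the invalid UTF-8 byte 0xff.
#   printf 'a\377b\n' | sed_bytes 's/a/x/'
```

The trade-off is that multi-byte characters are then matched byte by byte, so expressions that target a non-ASCII character need extra care.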
Even though the cli script manages to loop through the folders passed as the -d argument, the output being streamed into each repo belongs to the gitlogg repository.
In other words, when I add for instance the repository https://github.com/torvalds/linux as one of those under the directory I pass as the -d argument, I get the following output:
{
"repository": "linux",
"commit_nr": 1,
"commit_hash": "c7a397928f814f29028bccb281de60066395eaa1",
"commit_hash_abbreviated": "c7a3979",
"tree_hash": "e38dac0e625f63e877baa329204511ae490cd944",
"tree_hash_abbreviated": "e38dac0",
"parent_hashes": [],
"parent_hashes_abbreviated": [],
"author_name": "Wallace Sidhrée",
"author_name_mailmap": "Wallace Sidhrée",
"author_email": "[email protected]",
"author_email_mailmap": "[email protected]",
...
}
The script catches the linux repository name (which is the name of the directory the repo is under), but the content of the output belongs to the gitlogg repo - in this case the first commit.
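One possible cause, which is only a guess since the loop itself is not shown here, is that git log runs in the current working directory instead of in each repository. Passing the directory to git with -C avoids depending on the working directory. The helper name log_each_repo is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: for every directory directly under "$1", print its
# name and the hash of its latest commit. Directories where git log fails
# (for example non-repositories) are skipped.
log_each_repo() {
  for dir in "$1"/*/; do
    [ -d "$dir" ] || continue
    name="$(basename "$dir")"
    # -C makes git operate on "$dir" instead of the current directory.
    hash="$(git -C "$dir" log -1 --pretty=format:%H 2>/dev/null)" || continue
    printf '%s %s\n' "$name" "$hash"
  done
}
```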
Hello, I'm trying to follow simple mode and generate something
git checkout v0.1.4
./gitlogg.sh
Generating git log for all repositories located at '../repos/*/'. This might take a while!
The file ./gitlogg.tmp generated in: 1s
Error: Couldn't find preset "es2015" relative to directory "/Users/user/folder/gitlogg"
    at /usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:395:17
    at Array.map (native)
    at OptionManager.resolvePresets (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:387:20)
    at OptionManager.mergePresets (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:370:10)
    at OptionManager.mergeOptions (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:330:14)
    at OptionManager.addConfig (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:232:10)
    at OptionManager.findConfigs (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:436:16)
    at OptionManager.init (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/options/option-manager.js:484:12)
    at File.initOptions (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/index.js:223:65)
    at new File (/usr/local/lib/node_modules/babel-cli/node_modules/babel-core/lib/transformation/file/index.js:140:24)
node -v v4.4.3
OSX 10.11.5
Is there a way to include git notes (https://git-scm.com/docs/git-notes) in the resulting JSON? If you run git log --show-notes=name-of-your-notes you'll get some extra information.
commit 2e711912549af4efb9d1a8d13f6e325fefbc8a0e
Author: George Mihailov <[email protected]>
Date: Wed Jun 8 11:33:07 2016 -0700
Minor improvements
Notes (name-of-your-notes):
Here are some notes about the commit
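For a custom log format, git provides the %N placeholder, which expands to the commit notes. The sketch below uses --notes=<ref>, the current spelling of --show-notes. The helper name log_with_notes and the control-character delimiters are assumptions for illustration, not part of gitlogg.

```shell
#!/bin/sh
# Hypothetical helper: print hash and notes for each commit of the repository
# at "$1", reading notes from the notes ref named "$2".
# %N expands to the commit notes; 0x1f separates fields, 0x1e ends a record.
log_with_notes() {
  git -C "$1" log --notes="$2" --pretty=format:'%H%x1f%N%x1e'
}

# Usage (ref name taken from the example above):
#   log_with_notes ./some-repo name-of-your-notes
```

Notes can span several lines, so whatever parses this output would need to tolerate line breaks inside the field.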