
Comments (21)

OrenRysn commented on July 30, 2024

After breaking down the git log and parsing step by step, I discovered that there was actually a carriage return (^M) character in one of the Git commit messages being parsed, which was adding an extra newline and throwing all subsequent parsing off as a result.

Modifying the parsing in gitlogg-generate-log.sh to include a carriage return deletion allows me to get past the issue.

        git log --all --no-merges --shortstat --reverse --pretty=format:'commits\trepository\t'"${PWD##*/}"'\tcommit_hash\t%H\tcommit_hash_abbreviated\t%h\ttree_hash\t%T\ttree_hash_abbreviated\t%t\tparent_hashes\t%P\tparent_hashes_abbreviated\t%p\tauthor_name\t%an\tauthor_name_mailmap\t%aN\tauthor_email\t%ae\tauthor_email_mailmap\t%aE\tauthor_date\t%ad\tauthor_date_RFC2822\t%aD\tauthor_date_relative\t%ar\tauthor_date_unix_timestamp\t%at\tauthor_date_iso_8601\t%ai\tauthor_date_iso_8601_strict\t%aI\tcommitter_name\t%cn\tcommitter_name_mailmap\t%cN\tcommitter_email\t%ce\tcommitter_email_mailmap\t%cE\tcommitter_date\t%cd\tcommitter_date_RFC2822\t%cD\tcommitter_date_relative\t%cr\tcommitter_date_unix_timestamp\t%ct\tcommitter_date_iso_8601\t%ci\tcommitter_date_iso_8601_strict\t%cI\tref_names\t%d\tref_names_no_wrapping\t%D\tencoding\t%e\tsubject\t%s\tsubject_sanitized\t%f\tcommit_notes\t%N\tstats\t' |
          sed '/^[ \t]*$/d' |               # remove all newlines/line-breaks, including those with empty spaces
+          tr -d '\r' |                      # Delete carriage returns
          tr '\n' 'ò' |                     # convert newlines/line-breaks to a character, so we can manipulate it without much trouble
          tr '\r' 'ò' |                     # convert carriage returns to a character, so we can manipulate it without much trouble
          sed 's/tòcommits/tòòcommits/g' |  # because some commits have no stats, we have to create an extra line-break to make `paste -d ' ' - -` consistent
          tr 'ò' '\n' |                     # bring back all line-breaks
          sed '{
              N
              s/[)]\n\ncommits/)\
          commits/g
          }' |                              # some rogue mystical line-breaks need to go down to their knees and beg for mercy, which they're not getting
          paste -d ' ' - -                  # collapse lines so that the `shortstat` is merged with the rest of the commit data, on a single line
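
To illustrate why a stray carriage return ends up as an extra line, here is a minimal JavaScript sketch that mimics only the three `tr` steps of the pipeline above on a made-up record. The `roundTrip` function, its `stripCarriageReturns` flag and the sample string are hypothetical and not part of gitlogg.

```javascript
// Hypothetical helper: mimics the placeholder round trip of the pipeline
// (tr '\n' 'ò', tr '\r' 'ò', then tr 'ò' '\n') on a single string.
function roundTrip(text, stripCarriageReturns) {
  // Optional extra step, equivalent to the added `tr -d '\r'`.
  const input = stripCarriageReturns ? text.replace(/\r/g, '') : text;
  return input
    .replace(/\n/g, 'ò')  // newlines become the placeholder
    .replace(/\r/g, 'ò')  // carriage returns become the same placeholder
    .replace(/ò/g, '\n'); // every placeholder comes back as a newline
}

// Made-up record: one commit line with a CR inside the subject,
// followed by its shortstat line.
const record = 'commits\\tsubject\\tFix build\r on Windows\\tstats\\t\n 1 file changed';

const withoutFix = roundTrip(record, false).split('\n');
const withFix = roundTrip(record, true).split('\n');

console.log(withoutFix.length); // 3: the CR has turned into a line break
console.log(withFix.length);    // 2: commit line plus shortstat line
```

Without the deletion step, the carriage return is mapped to the same placeholder as a newline, so it comes back as a real line break when the placeholders are restored.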

from gitlogg.

dreamyguy commented on July 30, 2024

Hi there @Inventitech, I missed this one completely, been away for a while.

Let me look into this and see if I can solve it within the gitlogg-parse-json.js.

Through this project I've come to realise that git log's output is surprisingly inconsistent across multiple repositories, but I've managed to isolate different issues by testing with 490+ repos so far.

Cheers for the heads-up!

dreamyguy commented on July 30, 2024

Hi there again, I have some good news! ✨

I got the json output for TestRoots/travistorrent-tools rendered correctly, and it's on this gist.

[Image: screen shot 2016-11-10 at 00 14 25]

It ran pretty fast too. 🚀

I could not reproduce the error you've reported with gitlogg-parse-json.js. I used what I call the "Simple Mode" (https://github.com/dreamyguy/gitlogg#simple-mode), by simply doing a git clone within the repos/ folder, which should be created at the repo's root. The repos directory is listed in .gitignore, so whatever is placed there won't be tracked by git.

Inventitech commented on July 30, 2024

Hey @dreamyguy, Thanks for the reply.

Interesting. I suspect there is some inconsistency in the intermediate representation extracted from git log. Which version of Git are you running?

➜  gitlogg git:(master) git --version
git version 2.9.3

Does parsing my gitlogg.tmp work for you? If it does, the error lies elsewhere.

OrenRysn commented on July 30, 2024

Hey @dreamyguy! I'm actually running your gitlogg tool as well. Looking to utilize it to provide some clean, useful information across multiple repositories.

I'm seeing the same issue, but I've yet to debug the root cause. I've added a loop in gitlogg-parse-json.js to print every value in the item array:

var output = fs.readFileSync('gitlogg.tmp', 'utf8')
  .trim()
  .split('\n')
  .map(line => line.split('\\t'))
  .reduce((commits, item) => {

+    var i = 0;
+    for (i = 0; i < item.length; i++) {
+        console.log(chalk.blue('item[' + i + '] = ' + item[i]))
+    }
+
    // vars based on sequential values ( sanitise " to ' on fields that accept user input )

At some point while running gitlogg.sh, item[68], which normally holds the value for "stats", comes up empty, and the following item[0], which should contain only the string "commits", also contains the stats.

Expected (from the same output, just before the issue):

item[65] = commit_notes
item[66] =
item[67] = stats
item[68] = 1 file changed, 1 insertion(+), 2 deletions(-)
item[0] = commits
item[1] = repository

Actual:

item[65] = commit_notes
item[66] =
item[67] = stats
item[68] =
item[0] = 2 files changed, 11 insertions(+), 41 deletions(-) commits
item[1] = repository

Trying to determine if there's an issue with the parsing logic that removes/converts newlines/line-breaks.

Edit: To clarify, the first occurrence of this issue does not itself cause the error. From that point on, however, every subsequent group of items has its stats value placed in item[0] instead of item[68], which leaves one last line at the bottom of gitlogg.tmp containing just the stats value.

The parser then tries to parse that final line, which only has data in item[0]; item[1] through item[68] are undefined, and that throws the error.
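
This knock-on effect can be reproduced with a small JavaScript sketch, assuming the final `paste -d ' ' - -` step simply joins lines two at a time. `pasteInPairs` and the shortened sample records are hypothetical; real records carry 69 values.

```javascript
// Hypothetical stand-in for `paste -d ' ' - -`: joins lines two at a time.
function pasteInPairs(lines) {
  const joined = [];
  for (let i = 0; i < lines.length; i += 2) {
    const second = i + 1 < lines.length ? lines[i + 1] : '';
    joined.push(lines[i] + ' ' + second);
  }
  return joined;
}

// Shortened, made-up records: a commit line followed by its stats line.
// The first commit line has been split in two by a stray line break.
const lines = [
  'commits\\trepository\\tsubject\\tFix',
  'build\\tstats\\t',
  '1 file changed',
  'commits\\trepository\\tsubject\\tNext\\tstats\\t',
  '2 files changed',
];

const shifted = pasteInPairs(lines);

// Print each joined line as the parser would see it after split('\\t').
shifted.forEach(line => console.log(JSON.stringify(line.split('\\t'))));
```

In this sketch the first record ends with an empty stats value, the second record starts with the previous record's stats, and the last line holds stats only.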

Edit 2: Autofilled the wrong username at the top. Meant to direct to @dreamyguy

dreamyguy commented on July 30, 2024

Hi @Inventitech, thanks for posting the gitlogg.tmp output. It broke on the 5th line. :(

As @OrenRysn indirectly pointed out, gitlogg-parse-json.js is completely dependent on gitlogg-generate-log.sh to be able to parse gitlogg.tmp correctly, and if that fails the parsing will break.

Nearly all problems I've had with gitlogg-generate-log.sh so far were caused by unexpected characters finding their way into the git log. The carriage return ^M within placeholders was new to me, but I see now that this is possible and should be solved.

I'll look into this asap, and cheers for the feedback!

dreamyguy commented on July 30, 2024

@Inventitech and by the way, we have the same git version, 2.9.3.

dreamyguy commented on July 30, 2024

@OrenRysn only to comment on your first post, the index in the parser gets messed up when the output of gitlogg.tmp is broken, like the one mentioned above. It takes a single unexpected character on a single commit to break the whole structure of the temporary file, unfortunately. Characters that trigger a line-break are the absolute worst.

I'll see what I can do, and thanks again for the heads-up.

dreamyguy commented on July 30, 2024

Hi guys, I did 3 commits that will hopefully help.

The first 2 didn't really correct anything, but they help show what's happening on the console (how many repos will be parsed and which repo is getting its git log generated), live.

All credit for the 3rd commit goes to @OrenRysn 🏆 , but I did choose to replace carriage return with space instead of deleting it.

@Inventitech I'm really struggling to reproduce your problem. I haven't managed to get a broken output on gitlogg.tmp, even before my commits. I tried with your repo by itself (I pulled the latest changes) and mixed with 476 other repos, and it worked every time.

Could you pull my latest changes and try again?

dreamyguy commented on July 30, 2024

Another update, with a few more commits since this issue was opened.

@Inventitech since I've been at it, I have successfully generated gitlogg.tmp from TestRoots/travistorrent-tools, by itself and alongside other repos, without a glitch. As long as that goes well, gitlogg-parse-json.js will do its job.

Try running gitlogg exactly as described here, with the latest changes, and let me know how it goes.

Inventitech commented on July 30, 2024

Error still persists :(

Generating git log for the one repository located at '../repos/*/'. This might take a while!
Outputting travistorrent-tools
The file ./gitlogg.tmp generated in: 1s
Generating JSON output...
[stdin]:42
  author_name = item[17].replace(/"/g, "'"),
                        ^

TypeError: Cannot read property 'replace' of undefined
    at [stdin]:42:25
    at Array.reduce (native)
    at [stdin]:23:4
    at Object.exports.runInThisContext (vm.js:54:17)
    at Object.<anonymous> ([stdin]-wrapper:6:22)
    at Module._compile (module.js:409:26)
    at node.js:579:27
    at nextTickCallbackWith0Args (node.js:420:9)
    at process._tickCallback (node.js:349:13)

dreamyguy commented on July 30, 2024

That's really too bad @Inventitech. Is your gitlogg.tmp still breaking at the same line?

I have to find a way to validate gitlogg.tmp before running the js parser, otherwise one gets the impression that the problem is with the javascript.
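
One possible shape for such a check, as a minimal sketch rather than anything shipped with gitlogg: it assumes, based on the output quoted earlier in this thread, that a well-formed record has 69 values (item[0] to item[68]), starts with the label "commits", and is separated by a literal backslash followed by t, as in the parser's split. The name `validateTmp` and the sample data are hypothetical.

```javascript
// Assumptions taken from the output quoted in this thread: 69 values per
// record, first value "commits", literal backslash + t as the separator.
const EXPECTED_FIELDS = 69;
const SEPARATOR = '\\t';

// Hypothetical validator: returns one entry per suspicious line.
// An empty array means every line looks parseable.
function validateTmp(contents) {
  const problems = [];
  contents.trim().split('\n').forEach((line, index) => {
    const item = line.split(SEPARATOR);
    if (item[0] !== 'commits') {
      problems.push({ line: index + 1, reason: 'first value is not "commits"' });
    } else if (item.length !== EXPECTED_FIELDS) {
      problems.push({ line: index + 1, reason: 'expected 69 values, got ' + item.length });
    }
  });
  return problems;
}

// Made-up sample: one well-formed record and one stats-only line.
const goodLine = ['commits'].concat(new Array(68).fill('x')).join(SEPARATOR);
const sample = goodLine + '\n' + '2 files changed, 11 insertions(+), 41 deletions(-)';

console.log(validateTmp(sample)); // reports line 2 only
```

Running a check like this before the parser would point at the first broken line of the temporary file instead of surfacing as a TypeError inside the JavaScript.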

I have prepared a gist with travistorrent-tools's gitlogg.json output, so you can play with it until a solution is found. 🎯

Just out of curiosity, what system are you on? I'm on OSX 10.11.6, El Capitan.

BTW I've pushed a new release today, which makes the initial setup simpler and less error-prone - hopefully. The README has been updated accordingly.

I'd try starting from scratch with a new git clone and take it from there.

Inventitech commented on July 30, 2024

I'm a Linux dude.

If I remember correctly, what breaks it is that occasionally, you just get a stats output for a whole range of commits, not a single commit. I'll have time to look into this in more depth in about two to three weeks.

dreamyguy commented on July 30, 2024

@Inventitech while testing your pull-request with humongous git repositories, I came across a few problems around gitlogg-generate-log.sh, problems that break gitlogg.tmp's output.

I'll be creating issues on them when time allows.

dreamyguy commented on July 30, 2024

I can now say, hand on heart, that this issue is resolved with the v0.1.9 release. 🎆

I've tested all these repos, in a single run of npm run gitlogg.

Note that the commit counts are not up to date, since I cloned these repos many weeks ago.

[634,959]  linux
[424,810]  android_kernel_yu_msm8916
[399,784]  core
[106,402]  odoo
[ 96,062]  nixpkgs
[ 70,822]  homebrew-core
[ 60,224]  rails
[ 45,091]  git
[ 23,494]  django
[  8,529]  react-native
[  7,590]  react
[  7,040]  beets
[    539]  travistorrent-tools
[    523]  fbctf
[    182]  flexbox-layout
[    118]  open-color
[     85]  color-consolidator
[     32]  sidhree-com
---------
1,886,349  total commit count

Stats:

gitlogg.tmp generation:     2,266 s ~= 37.76667 mins
gitlogg.json parsing:       137,331 ms ~= 2.28885 mins

gitlogg.tmp file size:      2,658,113,356 bytes ~= 2.65 GB
gitlogg.json file size:     1,515,488,035 bytes ~= 1.51 GB

gitlogg.tmp nr. lines:      1,809,164
gitlogg.json nr. lines:     1,809,168

There are fewer lines in the output file than the total number of commits, but that's because I exclude merges with git's --no-merges option.

Do try v0.1.9 out! 🏆

Inventitech commented on July 30, 2024

Still no luck, sir :(

➜  gitlogg git:(dd44796) ✗ sh scripts/gitlogg.sh -n 1
Generating git log for all 4 repositories located at './_repos/*/'. This might take a while!
Outputting ghtorrent-update
Outputting gi
Outputting gitlogg
Outputting UnifiedASATVisualizer
The file _tmp/gitlogg.tmp generated in: 5s
Parsing JSON output...
Something went wrong, _output/gitlogg.json could not be written / saved
[stdin]:162
  var time_array = author_date.split(' '),
                              ^

TypeError: Cannot read property 'split' of undefined
    at Transform.parser._transform ([stdin]:162:31)
    at Transform._read (_stream_transform.js:167:10)
    at Transform._write (_stream_transform.js:155:12)
    at doWrite (_stream_writable.js:300:12)
    at writeOrBuffer (_stream_writable.js:286:5)
    at Transform.Writable.write (_stream_writable.js:214:11)
    at LineStream.ondata (_stream_readable.js:542:20)
    at emitOne (events.js:77:13)
    at LineStream.emit (events.js:169:7)
    at readableAddChunk (_stream_readable.js:153:18)

dreamyguy commented on July 30, 2024

@Inventitech I ran into this TypeError: Cannot read property 'split' of undefined all the time while using Gitlogg with xargs. The problem was that one commit got mixed with another on a single line, which messed up the indexes.

Please do test this on a fresh clone of Gitlogg, so you're sure to have the very latest version, v0.1.9, which no longer has the parallelization changes.

I think you're still running an older version, since you created a pull-request two hours ago with code from v0.1.8. I released v0.1.9 quite late yesterday, so it's very new.

Inventitech commented on July 30, 2024

No, as you can see above (gitlogg git:(dd44796)), I am on v0.1.9. Also, I used -n 1, which disables parallelism.

dreamyguy commented on July 30, 2024

Are these the repos you tested with?

https://github.com/gousiosg/ghtorrent-update
https://github.com/dspinellis/gi
https://github.com/dreamyguy/gitlogg
https://github.com/ClintonCao/UnifiedASATVisualizer

Note that parallel processing is completely removed in v0.1.9, so you can omit the CLI option.

Inventitech commented on July 30, 2024

Yes (regarding repositories).

dreamyguy commented on July 30, 2024

@Inventitech I just tested these repos and got no error.

To be 100% sure we're taking the exact same steps, I've put this one-liner together. It's the same line I've used to test:

mkdir gitlogg-test && cd gitlogg-test && git clone https://github.com/dreamyguy/gitlogg.git && cd gitlogg && npm run setup && cd _repos && git clone --bare https://github.com/gousiosg/ghtorrent-update.git && git clone --bare https://github.com/dspinellis/gi.git && git clone --bare https://github.com/dreamyguy/gitlogg.git && git clone --bare https://github.com/ClintonCao/UnifiedASATVisualizer.git && npm run gitlogg

Run the line and let me know how it goes.

If you don't get it to work, let me know the specs of your OS and I'll open another issue that's specific to Linux, for I can't reproduce it on OSX.
