Speed problems about the_platinum_searcher (OPEN)

winks commented on August 28, 2024
Speed problems

from the_platinum_searcher.

Comments (11)

winks commented on August 28, 2024

Quick addendum, as I just noticed I ran all of them with -i
$ time ack-grep -i time_t php-src | wc -l
358
ack-grep -i time_t php-src 1.61s user 0.16s system 99% cpu 1.778 total
wc -l 0.00s user 0.00s system 0% cpu 1.778 total

$ time pt time_t php-src | wc -l
380
pt time_t php-src 2.10s user 0.25s system 218% cpu 1.077 total
wc -l 0.00s user 0.00s system 0% cpu 1.076 total

$ time ag time_t php-src | wc -l
380
ag time_t php-src 0.39s user 0.25s system 211% cpu 0.305 total
wc -l 0.00s user 0.00s system 0% cpu 0.304 total

$ time ack-grep time_t php-src | wc -l
351
ack-grep time_t php-src 1.56s user 0.11s system 99% cpu 1.677 total
wc -l 0.00s user 0.00s system 0% cpu 1.676 total

shantanugoel commented on August 28, 2024

Can you try the pt download mentioned on the main page by monochromegane? Your Go version seems to be old, and the regex engine may have improved between the two versions. (-i makes pt use the regex engine to do the matching.)

winks commented on August 28, 2024

I tried downloading https://drone.io/github.com/monochromegane/the_platinum_searcher/files/artifacts/bin/linun_amd64/pt, and both this binary and my own new build (built with go1.2.2) still give these numbers:

$ time pt time_t php-src | wc -l
380
pt time_t php-src 1.96s user 0.22s system 218% cpu 1.002 total
wc -l 0.00s user 0.00s system 0% cpu 1.001 total

$ time pt2 time_t php-src | wc -l
380
pt2 time_t php-src 1.84s user 0.30s system 218% cpu 0.976 total
wc -l 0.00s user 0.00s system 0% cpu 0.976 total

Just to clarify: this is not a problem at all. I was just curious about the "It's faster than ag!!" line in the README, and I honestly wonder why pt would be so much slower on my machine.

shantanugoel commented on August 28, 2024

Actually, I didn't write the README, as I don't own this project. I'm just trying to contribute to it and find out how to make it faster. I mainly use pt on Windows, where ag is dog slow. I work on Linux as well (and do my development there) but never compared pt with ag on Linux.
I'm planning to profile the pt code to find out where the bottlenecks are. I also have some changes in the pipeline that will hopefully improve performance further.

monochromegane commented on August 28, 2024

Thanks for this issue, @winks, and thanks for the reply, @shantanugoel.

In my environment (Mac OS X 10.9.2), pt is faster than ag.

$ time pt time_t php-src | wc -l
     380
pt time_t php-src  2.41s user 0.73s system 237% cpu 1.319 total
wc -l  0.00s user 0.00s system 0% cpu 1.318 total

$ time ag time_t php-src | wc -l
     380
ag time_t php-src  5.16s user 0.45s system 111% cpu 5.021 total
wc -l  0.00s user 0.00s system 0% cpu 5.020 total

I think "It's faster than ag!!" is right.

But if the -i option is given, pt is slow, as you say.
pt uses a regex pattern in this case.

$ time pt -i time_t php-src | wc -l
     391
pt -i time_t php-src  23.00s user 0.64s system 388% cpu 6.076 total
wc -l  0.00s user 0.00s system 0% cpu 6.076 total

$ time ag -i time_t php-src | wc -l
     391
ag -i time_t php-src  5.27s user 0.45s system 113% cpu 5.030 total
wc -l  0.00s user 0.00s system 0% cpu 5.029 total

I will find out what makes it slow.

shantanugoel commented on August 28, 2024

I was able to replicate this on my Linux box as well. I think it mostly boils down to the regex engine: the Go regex engine seems to be slow.

shiena commented on August 28, 2024

Hi all,
I was able to speed this up by changing how the -i option does its comparison.
However, you will no longer be able to use regular expressions.
@monochromegane Is this correct?

$ /usr/bin/time ./pt_cur -i time_t php-src > cur
        4.00 real        21.00 user         0.84 sys
$ /usr/bin/time ./pt_mod -i time_t php-src > mod
        1.39 real         5.54 user         0.91 sys
$ wc -l cur mod 
     391 cur
     391 mod
     782 total
$ diff <(sort cur) <(sort mod)

diff --git a/search/match/match.go b/search/match/match.go
index 31148fe..f488015 100644
--- a/search/match/match.go
+++ b/search/match/match.go
@@ -83,7 +84,7 @@ func (self *Match) setUpNewMatch(num int, s string) (*Match, bool) {

 func (self *Match) IsMatch(pattern *pattern.Pattern, num int, s string) (*Match, bool) {
        if pattern.IgnoreCase {
-               if pattern.Regexp.MatchString(s) {
+               if strings.Contains(strings.ToUpper(s), strings.ToUpper(pattern.Pattern)) {
                        return self.setUpNewMatch(num, s)
                }
        } else if strings.Contains(s, pattern.Pattern) {
diff --git a/search/pattern/pattern.go b/search/pattern/pattern.go
index 47ca9e1..ed8891e 100644
--- a/search/pattern/pattern.go
+++ b/search/pattern/pattern.go
@@ -24,7 +24,7 @@ func NewPattern(pattern, filePattern string, smartCase, ignoreCase bool) (*Patte
        var regIgnoreCase *regexp.Regexp
        var ignoreErr error
        if ignoreCase {
-               regIgnoreCase, ignoreErr = regexp.Compile(`(?i)(` + pattern + `)`)
+               regIgnoreCase, ignoreErr = regexp.Compile(`(?i)(\Q` + pattern + `\E)`)
        }

        var regFile *regexp.Regexp
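
The idea behind the patch can be sketched on its own, outside pt. The names containsFold and literalFoldRegexp below are hypothetical. One assumption worth flagging: instead of wrapping the pattern in `\Q...\E` as the patch does, the sketch uses regexp.QuoteMeta, which also copes with a pattern that itself contains `\E`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// containsFold reports whether line contains needle, ignoring case.
// It mirrors the patched comparison: upper-case both sides, then do a
// plain substring search. No regular expression syntax is interpreted.
func containsFold(line, needle string) bool {
	return strings.Contains(strings.ToUpper(line), strings.ToUpper(needle))
}

// literalFoldRegexp builds a case-insensitive regexp that matches needle
// literally. regexp.QuoteMeta escapes all regexp metacharacters in needle.
func literalFoldRegexp(needle string) (*regexp.Regexp, error) {
	return regexp.Compile("(?i)" + regexp.QuoteMeta(needle))
}

func main() {
	re, err := literalFoldRegexp("time_t")
	if err != nil {
		panic(err)
	}
	samples := []string{"static TIME_T now;", "time.t", "typedef long time_t;"}
	for _, line := range samples {
		fmt.Printf("%q contains=%v regexp=%v\n",
			line, containsFold(line, "time_t"), re.MatchString(line))
	}
}
```

For ASCII patterns such as time_t the two functions should agree. They may disagree on a few non-ASCII characters, because `(?i)` uses Unicode case folding while strings.ToUpper maps each character to a single upper-case form.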

monochromegane commented on August 28, 2024

Thanks @shiena. It is an interesting patch.
But it has a problem, as you say. And there is still a regexp used to print matches in color.

I think I will remove regex from the basic implementation, and add an -e option for users who want to use regexp.

refs: #27 (comment)

monochromegane commented on August 28, 2024

I have implemented an -e option that parses PATTERN as a regexp.
The -i option is now faster, using @shiena's approach.
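
A minimal sketch of how such a split between literal and regexp matching could look, assuming a hypothetical newMatcher helper where useRegexp stands in for -e and ignoreCase for -i. This is an illustration of the design, not pt's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// newMatcher returns a predicate that reports whether a line matches.
// Hypothetical helper, not pt's API: useRegexp stands in for -e and
// ignoreCase for -i.
func newMatcher(pattern string, useRegexp, ignoreCase bool) (func(string) bool, error) {
	if useRegexp {
		expr := pattern
		if ignoreCase {
			expr = "(?i)" + expr
		}
		re, err := regexp.Compile(expr)
		if err != nil {
			return nil, err
		}
		return re.MatchString, nil
	}
	if ignoreCase {
		upper := strings.ToUpper(pattern) // fold the pattern once, not per line
		return func(line string) bool {
			return strings.Contains(strings.ToUpper(line), upper)
		}, nil
	}
	return func(line string) bool {
		return strings.Contains(line, pattern)
	}, nil
}

func main() {
	literal, err := newMatcher("time.t", false, true)
	if err != nil {
		panic(err)
	}
	asRegexp, err := newMatcher("time.t", true, true)
	if err != nil {
		panic(err)
	}
	line := "static TIME_T now;"
	fmt.Println("literal:", literal(line), "regexp:", asRegexp(line))
}
```

With this split, the regexp engine is only paid for when the user explicitly asks for it, and the default path stays a plain substring search.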

monolithpl commented on August 28, 2024

See these benchmarks comparing ag, pt, grep and sift on Windows, where grep turns out to be the fastest: https://github.com/monolithpl/frequency-count-benchmark/blob/master/README.md

charlievieth commented on August 28, 2024

Take a look at the fastWalk function in golang/x/tools/imports, specifically the Unix implementation. Checking whether paths should be included is some of the hottest code in ag, and I wouldn't be surprised if that's the case here.

The Go stdlib is not fast when it comes to stat'ing files. For example, the code below is about twice as slow as find "$GOPATH/src" -name \*.go:

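package main

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

// walkFn prints every path that ends in ".go". filepath.Walk calls os.Lstat
// on each entry to build info, even though info is never used here.
func walkFn(path string, info os.FileInfo, err error) error {
    if err != nil {
        return nil // skip entries that cannot be read and keep walking
    }
    if strings.HasSuffix(path, ".go") {
        fmt.Println(path)
    }
    return nil
}

func main() {
    // walkFn never returns an error, so Walk's result is always nil here.
    _ = filepath.Walk(filepath.Join(os.Getenv("GOPATH"), "src"), walkFn)
}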
