gowarc's People

Contributors

avokadoen, dan-bishopfox, dependabot[bot], johnerikhalse, maeb

gowarc's Issues

Bug when running index with specified path

Issue description

Running warc index ./testdata/IAH-20080430204825-00000-blackbook.warc results in a runtime error: the program dereferences a nil pointer.

Steps to reproduce the issue

  1. Clone the repo from the master branch
  2. Build the project
  3. Run ./warc index ./testdata/IAH-20080430204825-00000-blackbook.warc in the repo root

What's the expected result?

  • gowarc indexes the WARC file, or prints an error message describing the user error

What's the actual result?

dump:

Using config file: /home/aksel/Projects/gowarc/config.yaml
Format: { <nil>}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xa124b5]

goroutine 1 [running]:
github.com/nlnwa/gowarc/cmd/warc/cmd/index.runE(0xc0000d4dc0, 0x0, 0x0)
        /home/aksel/Projects/gowarc/cmd/warc/cmd/index/index.go:88 +0xd5
github.com/nlnwa/gowarc/cmd/warc/cmd/index.NewCommand.func1(0xc00012f900, 0xc000122060, 0x1, 0x1, 0x0, 0x0)
        /home/aksel/Projects/gowarc/cmd/warc/cmd/index/index.go:77 +0x6f
github.com/spf13/cobra.(*Command).execute(0xc00012f900, 0xc000122020, 0x1, 0x1, 0xc00012f900, 0xc000122020)
        /home/aksel/go/pkg/mod/github.com/spf13/[email protected]/command.go:826 +0x47c
github.com/spf13/cobra.(*Command).ExecuteC(0xc00012ef00, 0xc000000180, 0xc00018ff78, 0x411905)
        /home/aksel/go/pkg/mod/github.com/spf13/[email protected]/command.go:914 +0x30b
github.com/spf13/cobra.(*Command).Execute(...)
        /home/aksel/go/pkg/mod/github.com/spf13/[email protected]/command.go:864
main.main()
        /home/aksel/Projects/gowarc/cmd/warc/main.go:26 +0x2b

Investigate heisenbug with db AddBatch()

Issue description

A heisenbug which causes gowarc to panic at db AddBatch()

Steps to reproduce the issue

Currently unknown, but indexing many WARC files increases the odds

What's the expected result?

All WARC files are either indexed or reported with an error, e.g. invalid file content

What's the actual result?

Some WARC files cause gowarc to panic at random

Additional details / screenshot

Panic stack trace

panic: runtime error: invalid memory address or nil pointer dereference                       
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xa0a59c]                       
goroutine 23 [running]:                                                                       
github.com/nlnwa/gowarc/pkg/index.(*Db).AddBatch.func1(0xc517e22980, 0x450001, 0xc517e22980)  
        /build/pkg/index/db.go:253 +0x8c                                                      
github.com/dgraph-io/badger/v2.(*DB).Update(0xc0001ce800, 0xc78fa4bf50, 0x0, 0x0)             
        /go/pkg/mod/github.com/dgraph-io/badger/[email protected]/txn.go:696 +0x94                    
github.com/nlnwa/gowarc/pkg/index.(*Db).AddBatch(0xc0001d7ab0, 0xc4fd83e000, 0x2711, 0x2711)  
        /build/pkg/index/db.go:251 +0x100                                                     
github.com/nlnwa/gowarc/pkg/index.NewIndexDb.func1(0xc0001d7ab0, 0xc0000e8e40)                
        /build/pkg/index/db.go:79 +0x71                                                       
created by github.com/nlnwa/gowarc/pkg/index.NewIndexDb                                       
        /build/pkg/index/db.go:77 +0x31a  

Create a tagged alpha release

Create a tagged alpha release so that other Go projects can start using gowarc as a dependency without relying on commit hashes.

Recursive indexing

The index command is out of scope for this issue; a separate issue should cover unifying the indexing logic of the index and serve commands.

Expose port in the cli on serve

The following changes should be done:

  • Allow the user to change the port through the CLI.
    • Through an argument to serve
    • Through the config file (the argument takes precedence)
  • Print a clickable URL for the endpoint on serve
  • Describe the config port variable in the readme
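
The precedence rule above (argument over config file) can be sketched as follows. This is an illustration under stated assumptions: it uses the standard library flag package rather than the project's CLI framework, and the flag name -port, the resolvePort helper, and the default value are all hypothetical.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultPort is a hypothetical fallback used when neither source sets a port.
const defaultPort = 9999

// resolvePort returns the port to serve on: an explicitly passed -port flag
// wins, then a non-zero config file value, then the default.
func resolvePort(args []string, configPort int) (int, error) {
	fs := flag.NewFlagSet("serve", flag.ContinueOnError)
	port := fs.Int("port", 0, "port to listen on")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}

	// Visit only walks flags that were actually set on the command line,
	// which distinguishes "-port 0" from "no -port given".
	flagSet := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "port" {
			flagSet = true
		}
	})

	switch {
	case flagSet:
		return *port, nil
	case configPort != 0:
		return configPort, nil
	default:
		return defaultPort, nil
	}
}

func main() {
	port, err := resolvePort([]string{"-port", "8080"}, 9000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Most terminals turn a full URL into a clickable link.
	fmt.Printf("Serving on http://localhost:%d\n", port)
}
```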

Feature: Integrate new CI features from

Is your feature request related to a problem? Please describe.
Linting of incoming code changes

Describe the solution you'd like
Update CI to mirror changes in gowarcserver PR 25

Describe alternatives you've considered

Additional context

Support for warc 0.17

Is your feature request related to a problem? Please describe.
Our current testdata set contains a WARC file with version 0.17, which is not supported in the refactor.

Misleading comment

The following comment:

// WarcRecord will always be nil if error is returned.

seems to be inaccurate because
wf.currentRecord, recordOffset, validation, err = wf.warcReader.Unmarshal(wf.bufferedReader)

leads to

gowarc/unmarshaler.go

Lines 196 to 202 in 614c93d

err = record.parseBlock(bufio.NewReader(content), validation)
if err != nil {
	return record, offset, validation, err
}
err = record.ValidateDigest(validation)
return record, offset, validation, err

which returns a non-nil warcRecord together with the error.
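
The mismatch can be reproduced in isolation. The sketch below is not gowarc code: record, unmarshal, and next are hypothetical stand-ins that mimic the return pattern quoted above and show one possible way to make the comment true, by dropping the record whenever an error is returned. Whether callers actually want the partial record is a design decision for the maintainers.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical stand-in for a parsed WARC record.
type record struct{ id string }

var errDigest = errors.New("digest mismatch")

// unmarshal mimics the quoted behaviour: on a late failure it returns the
// partially processed record together with the error.
func unmarshal(valid bool) (*record, error) {
	rec := &record{id: "urn:uuid:example"}
	if !valid {
		return rec, errDigest
	}
	return rec, nil
}

// next hands out a record only when err is nil, so a comment such as
// "record will always be nil if error is returned" holds for its callers.
func next(valid bool) (*record, error) {
	rec, err := unmarshal(valid)
	if err != nil {
		return nil, err
	}
	return rec, nil
}

func main() {
	rec, err := unmarshal(false)
	fmt.Println("unmarshal: record non-nil:", rec != nil, "err:", err)

	rec, err = next(false)
	fmt.Println("next: record non-nil:", rec != nil, "err:", err)
}
```

The alternative fix is to leave the code as it is and correct the comment so that callers know to check the error before using the record.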

No way of setting log level

Description

logrus allows changing the log level with log.SetLevel(level) (see the logrus documentation).

gowarc should expose this feature through the config file and command argument in root to allow users to change it for each session.

Changes

  • Create new config file variable for log level with a sane default
  • Explain new variable in the readme
  • New command argument in root that overrides the config file variable, e.g. -log-level=warn
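
A rough sketch of how the level could be chosen before it is handed to logrus. It is standard library only so that it runs on its own; chooseLevel, the "info" default, and the parameter names are assumptions, and the accepted names are the level names logrus uses. In the real program the resulting string would presumably go through logrus.ParseLevel and log.SetLevel.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultLevel is a hypothetical "sane default" for the new config variable.
const defaultLevel = "info"

// knownLevels lists the level names logrus accepts.
var knownLevels = map[string]bool{
	"panic": true, "fatal": true, "error": true, "warn": true,
	"warning": true, "info": true, "debug": true, "trace": true,
}

// chooseLevel returns the first non-empty value among the command argument
// and the config file value, normalized to lower case, or the default when
// both are empty. Unknown names are rejected with an error.
func chooseLevel(flagValue, configValue string) (string, error) {
	for _, candidate := range []string{flagValue, configValue} {
		if candidate == "" {
			continue
		}
		level := strings.ToLower(strings.TrimSpace(candidate))
		if !knownLevels[level] {
			return "", fmt.Errorf("unknown log level %q", candidate)
		}
		return level, nil
	}
	return defaultLevel, nil
}

func main() {
	level, err := chooseLevel("WARN", "debug")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("log level:", level)
}
```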

Handle deleting files while serving

Deleted files are not handled. There are currently two issues with this:

  • Deleting a file/directory before gowarc has indexed it results in a panic (see log below). This might be very hard to fix and possibly not worth it, as it would be a serious user error
  • The index for the file will remain in the db

Preferably there should also be a command to remove index entries for deleted files from the db

Panic log:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0xa0f2bd]

goroutine 226 [running]:
github.com/nlnwa/gowarc/pkg/index.indexFile(0xc000176e00, 0xc0003ac090, 0x25)
        /home/aksel/Projects/gowarc/pkg/index/indexworker.go:108 +0xfd
github.com/nlnwa/gowarc/pkg/index.(*indexWorker).worker(0xc00025e810, 0x7)
        /home/aksel/Projects/gowarc/pkg/index/indexworker.go:75 +0x108
created by github.com/nlnwa/gowarc/pkg/index.NewIndexWorker
        /home/aksel/Projects/gowarc/pkg/index/indexworker.go:58 +0x13a
