Comments (29)

ChrisCates commented on June 30, 2024

@iamonuwa, great, yes! So in https://github.com/ChrisCates/CommonCrawler/blob/master/README.md I've specified a configuration that I'd like for us to use.

If you have any questions about the proposed command line interface, let me know.
I'll be back on Friday to discuss more, as I need to focus on other things for the rest of this week.

from commoncrawler.

ChrisCates commented on June 30, 2024

@iamonuwa, those are configurations for using it as a binary.

An example of usage:

commoncrawler start --base-uri https://commoncrawler.com

That would use a different base path for where CommonCrawl files are stored. It should update the Config struct as well.

The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.
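
As a rough illustration of how the flags could feed the Config struct, here is a minimal sketch using Go's standard `flag` package. The flag names come from the commands quoted in this thread; the `Config` field names, the `parseConfig` helper, and the default values are assumptions made for the example, not the repository's actual code.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Config holds the options discussed in this thread.
// The field names are assumptions; the real Config struct may differ.
type Config struct {
	BaseURI    string
	WetPaths   string
	DataFolder string
	Start      int
	Stop       int
}

// parseConfig (hypothetical helper) parses the flags of one subcommand
// and returns a Config with any overrides applied on top of the defaults.
func parseConfig(name string, args []string) (Config, error) {
	cfg := Config{}
	fs := flag.NewFlagSet(name, flag.ContinueOnError)
	// Defaults below are illustrative values taken from the example commands.
	fs.StringVar(&cfg.BaseURI, "base-uri", "https://commoncrawl.s3.amazonaws.com/", "base URI where Common Crawl files are stored")
	fs.StringVar(&cfg.WetPaths, "wet-paths", "wet.paths", "file listing WET paths")
	fs.StringVar(&cfg.DataFolder, "data-folder", "output/crawl-data", "folder for downloaded data")
	fs.IntVar(&cfg.Start, "start", 0, "start index")
	fs.IntVar(&cfg.Stop, "stop", 5, "stop index")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// Same arguments as: commoncrawler start --base-uri https://commoncrawler.com
	// A real binary would pass os.Args[2:] instead of a fixed slice.
	cfg, err := parseConfig("start", []string{"--base-uri", "https://commoncrawler.com"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Keeping the parsing in a function that returns a plain struct would also let library users build a `Config` directly without going through the CLI.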

ChrisCates commented on June 30, 2024

@iamonuwa, absolutely, that is expected. Just ensure that functionality is relatively the same and it works as intended.

gitcoinbot commented on June 30, 2024

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


This issue now has a funding of 0.5 ETH (67.12 USD @ $134.25/ETH) attached to it as part of the AccessibleSoftware fund.

ChrisCates commented on June 30, 2024

@zyfrank, great, let me know if you have any questions. 💯
I'll be posting more work later!~

zyfrank commented on June 30, 2024

@ChrisCates, I did a first investigation. I think what I can do is:

  1. Use cobra to enhance the config.

  2. Have two commands: the first is download (which downloads and unzips the files), and the second is analyze. That way you can download at one time and run the analysis at another.

What is your opinion?

ChrisCates commented on June 30, 2024

@zyfrank, the CLI tool only needs to be able to download any Common Crawl file (and navigate files by historical date) and unzip it.

The analyze tool is just a demo and will be moved to goveralls, which I will assign as a separate task and bounty in issue #1.
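
To make the download-and-unzip requirement concrete, here is a minimal sketch in Go. It assumes that a file's URL is the base URI joined with a path from the paths listing and that the files are gzip-compressed; the helper names (`fileURL`, `gunzipBytes`, `gzipBytes`) and the example path are hypothetical and not taken from the repository.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// fileURL joins the base URI and a listing path with exactly one slash.
func fileURL(baseURI, path string) string {
	return strings.TrimRight(baseURI, "/") + "/" + strings.TrimLeft(path, "/")
}

// gunzipBytes decompresses gzip data held in memory.
// A real downloader would more likely stream the HTTP response body to disk.
func gunzipBytes(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes exists only so the example can create sample input.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// The path below is a made-up placeholder, not a real Common Crawl file.
	fmt.Println(fileURL("https://commoncrawl.s3.amazonaws.com/", "crawl-data/EXAMPLE/file.warc.wet.gz"))

	compressed, err := gzipBytes([]byte("sample WET content"))
	if err != nil {
		fmt.Println("compress failed:", err)
		return
	}
	plain, err := gunzipBytes(compressed)
	if err != nil {
		fmt.Println("gunzip failed:", err)
		return
	}
	fmt.Println(string(plain))
}
```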

ChrisCates commented on June 30, 2024

@zyfrank, you can use this as a reference: https://commoncrawl.s3.amazonaws.com/ for navigating files in common crawl.

If this goes well, I will be adding another bounty for 1 ETH on the goveralls issue (#1) next week, if you'd like to take it.

gitcoinbot commented on June 30, 2024

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work for 0.5 ETH (68.81 USD @ $137.61/ETH) has been submitted by:

  1. @zyfrank

@ChrisCates please take a look at the submitted work:


zyfrank commented on June 30, 2024

It seems Travis has an authentication error.

rauchp commented on June 30, 2024

Is this issue already closed or should someone still work on it?

ChrisCates commented on June 30, 2024

Hi @pedrojor2,

If @zyfrank is up for retrying, I think he should still get a chance.
If not, I'm happy for you to try.

I actually do want to refactor this repository so that the formatting is better and it is easier to use. I'm not sure @zyfrank completely understood my intention of building it into a CLI executable.

I am looking to allocate a couple of hours this Friday.

gitcoinbot commented on June 30, 2024

Issue Status: 1. Open 2. Started 3. Submitted 4. Done


Work has been started.

These users each claimed they can complete the work by 7 months, 4 weeks ago.
Please review their action plans below:

1) josprachi has started work.

I am learning Golang. I want to work on this issue.

2) jay-dee7 has started work.

I've been working with Go for 2 years now and am also an expert in Docker containers and tooling.

Learn more on the Gitcoin Issue Details page.

josprachi commented on June 30, 2024

Hi, I need help. When I tried to run it, I got this error:
go run: cannot run *_test.go files (src/analyze_test.go)
Please advise.

ChrisCates commented on June 30, 2024

Hi @josprachi.
Could you tell me what OS and version of Go you're using?
I will whip up a Go container as per: #10

josprachi commented on June 30, 2024

Hello @ChrisCates, I am using elementary OS:
Linux 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019

ChrisCates commented on June 30, 2024

josprachi commented on June 30, 2024

Hi, I am able to run Docker now.

vreddhi commented on June 30, 2024

@ChrisCates Do you want me to take this?

ChrisCates commented on June 30, 2024

I submitted a bounty for #10.
Once that is complete, we can discuss next steps.

iamonuwa commented on June 30, 2024

@ChrisCates let's discuss this

iamonuwa commented on June 30, 2024

What does each of these commands do?

commoncrawler --base-uri https://commoncrawl.s3.amazonaws.com/
commoncrawler --wet-paths wet.paths
commoncrawler --data-folder output/crawl-data
commoncrawler --start 0
commoncrawler --stop 5
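
The thread does not spell out the semantics of these flags. One plausible reading is that `--wet-paths` names a file with one path per line, and `--start` and `--stop` pick a range of those lines to download. The sketch below illustrates that reading only; the `selectPaths` helper, the half-open range (`stop` excluded), and the sample listing are all assumptions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// selectPaths (hypothetical helper) returns the non-empty lines of a paths
// listing whose zero-based index i satisfies start <= i < stop.
// Whether stop is inclusive in the real tool is not confirmed by the thread.
func selectPaths(listing string, start, stop int) []string {
	var paths []string
	sc := bufio.NewScanner(strings.NewReader(listing))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line != "" {
			paths = append(paths, line)
		}
	}
	// Clamp the range so out-of-bounds values do not panic.
	if start < 0 {
		start = 0
	}
	if stop > len(paths) {
		stop = len(paths)
	}
	if start >= stop {
		return nil
	}
	return paths[start:stop]
}

func main() {
	// Placeholder listing; real entries would come from the wet.paths file.
	listing := "first.warc.wet.gz\nsecond.warc.wet.gz\nthird.warc.wet.gz\n"
	for _, p := range selectPaths(listing, 0, 2) {
		fmt.Println(p)
	}
}
```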

iamonuwa commented on June 30, 2024

Do you wish to build a full CLI tool from this project?

ChrisCates commented on June 30, 2024

@iamonuwa I've just added: #13

iamonuwa commented on June 30, 2024

The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.

It will affect the project structure a bit, but I will try to capture the expected result.

zoek1 commented on June 30, 2024

@ChrisCates, is this bounty still active?

ChrisCates commented on June 30, 2024

We will be revisiting all bounties on this repository at a later date.
Sorry that it's been inactive for a considerable amount of time.

jay-dee7 commented on June 30, 2024

@ChrisCates, is it still active? I would love to work on it.

SeanDunford commented on June 30, 2024

Any updates?
