Comments (29)
@iamonuwa, great, yes! So in https://github.com/ChrisCates/CommonCrawler/blob/master/README.md I've specified a configuration that I'd like for us to use.
If you have any questions about the proposed command line interface, let me know.
I'll be back on Friday to discuss more, as I need to focus on other things today and for the rest of this week.
@iamonuwa, those are configurations for using it as a binary.
An example of usage:
commoncrawler start --base-uri https://commoncrawler.com
That would use a different base path for where CommonCrawl files are stored. This should update the Config struct as well.
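A minimal sketch of how a `start` subcommand could fold `--base-uri` into a Config struct, using Go's standard `flag` package. The field names, the `parseStart` helper, and the defaults (borrowed from the README commands quoted later in this thread) are hypothetical, not the project's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// Config holds runtime settings for the CLI. Field names are hypothetical.
type Config struct {
	BaseURI    string
	WetPaths   string
	DataFolder string
	Start      int
	Stop       int
}

// defaultConfig returns assumed defaults, mirroring the README examples.
func defaultConfig() Config {
	return Config{
		BaseURI:    "https://commoncrawl.s3.amazonaws.com/",
		WetPaths:   "wet.paths",
		DataFolder: "output/crawl-data",
		Start:      0,
		Stop:       5,
	}
}

// parseStart parses the arguments after the "start" subcommand. Any flag
// the user passes overwrites the matching default in the returned Config.
func parseStart(args []string) (Config, error) {
	cfg := defaultConfig()
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.StringVar(&cfg.BaseURI, "base-uri", cfg.BaseURI, "base URI where CommonCrawl files are stored")
	fs.StringVar(&cfg.WetPaths, "wet-paths", cfg.WetPaths, "file listing WET paths")
	fs.StringVar(&cfg.DataFolder, "data-folder", cfg.DataFolder, "output folder for downloaded data")
	fs.IntVar(&cfg.Start, "start", cfg.Start, "first entry to process")
	fs.IntVar(&cfg.Stop, "stop", cfg.Stop, "entry to stop at")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	// Equivalent to: commoncrawler start --base-uri https://commoncrawler.com
	// A real binary would pass os.Args[2:] after checking os.Args[1] == "start".
	cfg, err := parseStart([]string{"--base-uri", "https://commoncrawler.com"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Go's `flag` package accepts both `-base-uri` and `--base-uri`, so the README-style double-dash flags work without an extra dependency.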
The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.
@iamonuwa, absolutely, that is expected. Just ensure that functionality is relatively the same and it works as intended.
Issue Status: 1. Open 2. Started 3. Submitted 4. Done
This issue now has a funding of 0.5 ETH (67.12 USD @ $134.25/ETH) attached to it as part of the AccessibleSoftware fund.
- If you would like to work on this issue you can 'start work' on the Gitcoin Issue Details page.
- Want to chip in? Add your own contribution here.
- Questions? Check out Gitcoin Help or the Gitcoin Slack
- $50,186.80 more funded OSS Work available on the Gitcoin Issue Explorer
@zyfrank, great, let me know if you have any questions. 💯
I'll be posting more work later!~
@ChrisCates, I've done a first investigation. I think what I can do is:
- use cobra to enhance the config
- add two commands: the first is download (which downloads and unzips files), the second is analyze. That way you can download at one time and analyze at another.
What is your opinion?
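The proposed download/analyze split could be wired up roughly as follows. This sketch uses the standard library's `flag.NewFlagSet` instead of cobra so it stays dependency-free; the command bodies are placeholders, and the `--data-folder` default is assumed from the README.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// run dispatches to a subcommand. It uses the standard library's flag
// package as a stand-in for cobra.
func run(args []string) (string, error) {
	if len(args) == 0 {
		return "", errors.New("expected a subcommand: download or analyze")
	}
	switch args[0] {
	case "download":
		fs := flag.NewFlagSet("download", flag.ContinueOnError)
		dataFolder := fs.String("data-folder", "output/crawl-data", "where to store downloaded files")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		// Placeholder: fetch and unzip files into *dataFolder.
		return "download into " + *dataFolder, nil
	case "analyze":
		fs := flag.NewFlagSet("analyze", flag.ContinueOnError)
		dataFolder := fs.String("data-folder", "output/crawl-data", "folder of previously downloaded files")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		// Placeholder: analyze files already present in *dataFolder.
		return "analyze " + *dataFolder, nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	// A real binary would call run(os.Args[1:]).
	for _, args := range [][]string{
		{"download", "--data-folder", "crawl"},
		{"analyze", "--data-folder", "crawl"},
	} {
		out, err := run(args)
		fmt.Println(out, err)
	}
}
```

With cobra, each case would instead become its own `*cobra.Command` registered on a root command via `AddCommand`, and each subcommand would get its own flags and help text for free.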
@zyfrank, for the CLI tool, it only needs to be able to download any Common Crawl file (and also navigate files by historical date), plus unzip them.
The analyze tool is just used as a demo and will be moved to goveralls, which I will assign to a separate task and bounty in issue #1.
@zyfrank, you can use this as a reference for navigating files in Common Crawl: https://commoncrawl.s3.amazonaws.com/
If this goes well, I will be adding another bounty for 1 ETH on the goveralls issue (#1) next week if you'd like to take it.
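The bucket URL above returns an S3 ListObjects XML document, so one way to navigate crawls by date is to list the crawl-data/ prefixes and filter on the crawl ID. This sketch assumes the bucket still permits anonymous listing with the standard S3 `prefix` and `delimiter` query parameters (e.g. `?prefix=crawl-data/&delimiter=/`) and that crawl IDs look like `CC-MAIN-<year>-<number>`; the helper names are hypothetical and the sample XML in `main` is made up.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// listBucketResult mirrors the parts of an S3 ListObjects XML response that
// matter for browsing: object keys and "folder" prefixes. Tags without a
// namespace match elements in any namespace.
type listBucketResult struct {
	Contents []struct {
		Key string `xml:"Key"`
	} `xml:"Contents"`
	CommonPrefixes []struct {
		Prefix string `xml:"Prefix"`
	} `xml:"CommonPrefixes"`
}

// parseListing returns the object keys and common prefixes in an S3 listing.
func parseListing(data []byte) (keys, prefixes []string, err error) {
	var res listBucketResult
	if err := xml.Unmarshal(data, &res); err != nil {
		return nil, nil, err
	}
	for _, c := range res.Contents {
		keys = append(keys, c.Key)
	}
	for _, p := range res.CommonPrefixes {
		prefixes = append(prefixes, p.Prefix)
	}
	return keys, prefixes, nil
}

// crawlsForYear keeps crawl prefixes such as "crawl-data/CC-MAIN-2019-13/"
// that belong to the given year.
func crawlsForYear(prefixes []string, year int) []string {
	want := fmt.Sprintf("crawl-data/CC-MAIN-%d-", year)
	var out []string
	for _, p := range prefixes {
		if strings.HasPrefix(p, want) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Made-up, trimmed response of the shape S3 returns for a prefix listing.
	sample := `<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>commoncrawl</Name>
  <CommonPrefixes><Prefix>crawl-data/CC-MAIN-2018-51/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>crawl-data/CC-MAIN-2019-09/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>crawl-data/CC-MAIN-2019-13/</Prefix></CommonPrefixes>
</ListBucketResult>`
	_, prefixes, err := parseListing([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range crawlsForYear(prefixes, 2019) {
		fmt.Println(p)
	}
}
```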
Issue Status: 1. Open 2. Started 3. Submitted 4. Done
Work for 0.5 ETH (68.81 USD @ $137.61/ETH) has been submitted by:
@ChrisCates please take a look at the submitted work:
- Learn more on the Gitcoin Issue Details page
- Want to chip in? Add your own contribution here.
- Questions? Check out Gitcoin Help or the Gitcoin Slack
- $51,738.31 more funded OSS Work available on the Gitcoin Issue Explorer
Seems Travis has an authentication error.
Is this issue already closed or should someone still work on it?
Hi @pedrojor2,
If @zyfrank is up for retrying, I think he should still get a chance.
If not, happy for you to try.
I actually do want to refactor this repository, simply so that the formatting is better and it's easier to use. I'm not sure @zyfrank completely understood my intention of building it into a CLI executable.
I am looking to allocate a couple of hours this Friday.
Issue Status: 1. Open 2. Started 3. Submitted 4. Done
Work has been started.
These users each claimed they can complete the work by 7 months, 4 weeks ago.
Please review their action plans below:
1) josprachi has started work.
I am learning Golang. I want to work on this issue.
2) jay-dee7 has started work.
I've been working with Go for 2 years now and am also an expert in Docker containers and tooling.
Learn more on the Gitcoin Issue Details page.
Hi, I need help. When I try to run it, I get an error:
go run: cannot run *_test.go files (src/analyze_test.go)
Please guide me.
Hi @josprachi.
Could you tell me what OS and version of Go you're using?
I will whip up a Go container as per #10.
Hello @ChrisCates, I am using the following elementary OS:
Linux 4.15.0-47-generic #50~16.04.1-Ubuntu SMP Fri Mar 15 16:06:21 UTC 2019
Hi, I am able to run Docker now.
@ChrisCates Do you want me to take this?
I submitted a bounty for #10.
Once that is complete, we can discuss next steps.
@ChrisCates let's discuss this
What does each of these commands do?
commoncrawler --base-uri https://commoncrawl.s3.amazonaws.com/
commoncrawler --wet-paths wet.paths
commoncrawler --data-folder output/crawl-data
commoncrawler --start 0
commoncrawler --stop 5
Do you wish to build a full cli tool from this project?
@iamonuwa I've just added: #13
The intended functionality should work both as a library and as a CLI tool when compiled. I will be preparing an issue (with bounty) for making it fully usable as a library. So for now, just focus on it being a CLI tool.
It will affect the project structure a bit, but I will try to capture the expected result.
@ChrisCates, is this bounty still active?
We will be revisiting all bounties on this repository at a later date.
Sorry that it's been inactive for a considerable amount of time.
@ChrisCates, is it still active? I would love to work on it.
Any updates?
Related Issues (12)
- Test Coverage with CodeCov
- More detailed logging upon failure(s)
- Error during parse binary warc package
- Docker Container
- Binary is cURLable from web
- Full usage as a library
- Windows Docker CI Build
- what functionality is ready to try / test out
- Preparing CommonCrawl .wet files via IPFS
- Electron GUI
- Having trouble building