Comments (7)
All very good points. In the end, the underlying use case and the technical questions for personal users may be quite different from those for orgs, rather than just a matter of conforming to the API…
from sourced-ce.
As for whether it's possible, I'd say yes.
I'd maybe replace "org" with "owner", which could be either a "user" or an "org"; that way we would also avoid problems if the user becomes an org at some point.
But:
- with "org", we fetch metadata from its members;
- with "user", we won't fetch that metadata.
And I'm not sure what the purpose of getting the org members is.
If the purpose is to assign the activity in the repos to the org's members, then some activity won't be assigned (because it will belong to GitHub users who aren't members of that org, so they won't be imported; for example, an issue opened in `bblfsh` by a non-`bblfsh` member won't be assigned to any user in our DB).
If we need to get the info about all the users contributing to a repo (like in the example above), we should also:
1. fetch all GitHub users who have activity in that repo and are not members of the repo's org,
2. try to find GitHub users from the repo's commits (to be able to assign commits to users, not only GitHub activity).
If we also import repos from users, as suggested by this issue, the activity in their repos won't be assigned to any user other than the imported one, unless we also do (1) and (2).
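To make the gap concrete, here is a minimal sketch of the assignment problem described above. It is not how sourced-ce stores or matches data; the `Activity` type, its fields, the repo name, and the logins are all hypothetical and only illustrate that importing org members alone leaves outside contributors' activity unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    kind: str    # e.g. "issue", "pull_request", "commit" (hypothetical labels)
    repo: str    # hypothetical repo name
    author: str  # GitHub login of whoever produced the activity


def assign_activity(activities, known_users):
    """Split activities into those whose author was imported and the rest."""
    assigned, unassigned = [], []
    for activity in activities:
        if activity.author in known_users:
            assigned.append(activity)
        else:
            unassigned.append(activity)
    return assigned, unassigned


activities = [
    Activity("issue", "bblfsh/example-repo", "member-a"),
    Activity("issue", "bblfsh/example-repo", "outsider-b"),
    Activity("commit", "bblfsh/example-repo", "outsider-b"),
]

# Importing only org members: the outsider's issue and commit stay unassigned.
org_members = {"member-a"}
assigned, unassigned = assign_activity(activities, org_members)

# Doing (1) and (2): also import every author seen in the repo's activity.
all_authors = org_members | {a.author for a in activities}
assigned_all, unassigned_all = assign_activity(activities, all_authors)
```

With only the members imported, two of the three activities end up unassigned; once the authors found in the repo are imported too, nothing is left over.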
from sourced-ce.
@marnovo even if technically it's not that different from orgs, the results might be very unexpected for users, and we should do something about it. Problems I see:
- half (or more) of the repos that I, or any other dev at src-d, have are forks. The same goes for external devs. The problem with forks: nobody updates master. Most of our charts rely on HEAD, so those repos would produce results only up to the moment they were forked
- there are no issues or pull requests in forks, so all metadata charts would become useless
As a solution for the `user` command, I would propose resolving forks and downloading the code/metadata of the original repo. In some cases (example) it would make more sense to download the fork, but such cases are exceptions.
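A rough sketch of what "resolving forks" could look like, assuming each repo is a dict shaped like the single-repository response of the GitHub REST API, where a fork has `fork: true` and a `parent` object with a `full_name`. The function names and the sample repos are made up for illustration; this is not sourced-ce code.

```python
def resolve_fork(repo):
    """Return the full name of the repo to analyze: the parent for a fork, else the repo itself."""
    parent = repo.get("parent") or {}
    if repo.get("fork") and parent.get("full_name"):
        return parent["full_name"]
    return repo["full_name"]


def resolve_all(repos):
    """Resolve every repo, keeping order and dropping duplicates."""
    resolved = []
    for repo in repos:
        name = resolve_fork(repo)
        if name not in resolved:
            resolved.append(name)
    return resolved


# Hypothetical user with one original repo and two forks of the same upstream.
user_repos = [
    {"full_name": "alice/tool", "fork": False},
    {"full_name": "alice/upstream", "fork": True,
     "parent": {"full_name": "someorg/upstream"}},
    {"full_name": "alice/upstream-copy", "fork": True,
     "parent": {"full_name": "someorg/upstream"}},
]

resolved = resolve_all(user_repos)
```

A fork whose parent information is missing falls back to the fork itself, which is one possible way to handle the exceptional cases mentioned above.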
from sourced-ce.
I wouldn't do it automatically, but maybe with options: `--use-parent` to use the parent repo instead, `--add-parent` to fetch both the fork and its parent, or even `--no-forks` to ignore forks entirely, as requested by @warenlg at #109.
Or also `--exclude`, passing a list of repos to be ignored (in case of repos causing known failures, or for whatever other reason).
This way everything would be more explicit, which I think would be better and more flexible.
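Just to illustrate how these options could combine, here is a small sketch. The flags are only the ones proposed in this thread, not existing sourced-ce options, and the precedence between them (`no_forks` first, then `use_parent`, then `add_parent`) is my own assumption. Repos are plain dicts with `full_name`, `fork`, and an optional `parent`, as in the GitHub API; the names are hypothetical.

```python
def select_repos(repos, use_parent=False, add_parent=False,
                 no_forks=False, exclude=()):
    """Pick which repos to import, given the fork-handling options proposed above."""
    excluded = set(exclude)
    selected = []
    for repo in repos:
        name = repo["full_name"]
        is_fork = bool(repo.get("fork"))
        parent = (repo.get("parent") or {}).get("full_name")
        if not is_fork:
            candidates = [name]
        elif no_forks:
            candidates = []              # --no-forks: skip forks entirely
        elif use_parent and parent:
            candidates = [parent]        # --use-parent: parent instead of fork
        elif add_parent and parent:
            candidates = [name, parent]  # --add-parent: fork and parent
        else:
            candidates = [name]          # default: import the fork as it is
        for candidate in candidates:
            if candidate not in excluded and candidate not in selected:
                selected.append(candidate)
    return selected


repos = [
    {"full_name": "alice/tool", "fork": False},
    {"full_name": "alice/superset", "fork": True,
     "parent": {"full_name": "apache/incubator-superset"}},
]

default_selection = select_repos(repos)
parent_selection = select_repos(repos, use_parent=True)
both_selection = select_repos(repos, add_parent=True)
no_fork_selection = select_repos(repos, no_forks=True)
filtered_selection = select_repos(repos, exclude=["alice/tool"])
```

Keeping the default as "import the fork as it is" means nothing changes unless the user explicitly asks for it, which is the point of making this opt-in.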
from sourced-ce.
I'd love to have this feature, and I also think it would greatly increase the chance of people trying the tool.
BTW, regarding forks, I agree that there could be different needs depending on the user. But in general I think that it's either `--ignore-forks` or not. If the user is interested in resolving forks to the original repo, then maybe it's more straightforward to just initialize `sourced-ce` with the owner (whether it is an org or a user) of that original repo, and maybe provide some filtering capabilities such as `init orgs apache --repositories=incubator-superset`.
Also, the repositories most likely to be forked are popular ones, and I think that including popular repos together with mine would just hide a lot of insights by adding a lot of noise.
from sourced-ce.
I agree with Marvin on most of the points. Though I would like to point out that many GitHub users (I don't have numbers, but most probably a majority) have no real repositories of their own, only forks and dumps of some code (for a school, a workshop, or something like that). So analyzing only the profile doesn't make sense for them at all. Exploring the information about the repositories they contributed to, on the other hand, can be interesting.
from sourced-ce.
> Though I would like to point out that many GitHub users (I don't have numbers, but most probably a majority) have no real repositories of their own, only forks and dumps of some code (for a school, a workshop, or something like that).

I don't know whether it's the majority of users, but you're absolutely right about this type of user; I hadn't thought about it. I'm just wondering how likely this type of user is to use a tool like this for their forked repos, but that is a different point.
from sourced-ce.