ros_gh's Introduction

ros_gh

ros_gh is a tool that applies algorithms to recommend users who can answer a given question. It works in three steps: collect data from GitHub and ROS Answers, match identities, and apply algorithms.

How to use

maven

<dependencies>
    <dependency>
        <groupId>com.elbraulio</groupId>
        <artifactId>ros_gh</artifactId>
        <version>{version}</version>
    </dependency>
</dependencies>
<!-- for ros_gh -->
<repositories>
	<repository>
	    <id>jitpack.io</id>
	    <url>https://jitpack.io</url>
	</repository>
</repositories>

gradle

dependencies {
        implementation 'com.elbraulio:ros_gh:{version}'
}
allprojects {
	repositories {
		...
		maven { url 'https://jitpack.io' }
	}
}

Collect data from GitHub and ROS Answers

ros_gh provides tools for fetching data from GitHub and ROS Answers. You can fetch from each source separately.

GitHub

Here we use jcabi-github to get information from GitHub and CanRequest to handle GitHub's API rate limit. These are the steps for collecting data from GitHub:

  1. Get a token from your GitHub account.
  2. Choose a distribution file to extract information from its package repositories.
  3. Use the script below to collect the data. In this example we fetch data from the indigo distribution. This repository includes these distribution files as JSON files; all of them come from this repo and can be used.
@Test
public void ghInfo() throws InterruptedException, IOException {
    final String token = "secret_token";
    final String path = "src/test/java/resources/github/indigo.json";
    final Github github = new RtGithub(token);
    final CanRequest canRequest = new CanRequest(60);
    for (RosPackage rosPackage : new FromJsonFile(path).repoList()) {
        if (!rosPackage.source().isEmpty()) {
            final GhRepo ghRepo = rosPackage.asRepo(github);
            canRequest.waitForRate();
            final GhUser ghUser = new FetchGhUser(
                github, ghRepo.owner()
            ).ghUser();
            canRequest.waitForRate();
            final List<GhColaborator> colaborators = new Colaborators(
                ghRepo.fullName(), canRequest, github
            ).colaboratorList();
            System.out.println("repo: " + ghRepo.name());
            System.out.println("owner: " + ghUser.login());
            System.out.println("colaborator: " + colaborators.size());
            System.out.println("-----------------------------------");
        }
    }
}
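
The internals of CanRequest are not shown in this README. As a rough illustration of the idea behind waiting for a rate limit, here is a minimal fixed-interval gate. The class RateGate, its constructor and its methods are hypothetical and are not part of ros_gh; the sketch also assumes that the limit is expressed in requests per minute, which the README does not state for CanRequest.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: allows at most one request per fixed interval. */
final class RateGate {
    private final long intervalMillis;
    private final LongSupplier clock;
    private long nextAllowed;

    /** The clock is injected so the gate can be exercised without real waiting. */
    RateGate(int requestsPerMinute, LongSupplier clock) {
        this.intervalMillis = 60_000L / requestsPerMinute;
        this.clock = clock;
        this.nextAllowed = clock.getAsLong();
    }

    /** Reserves the next slot and returns how many milliseconds to wait for it. */
    synchronized long reserve() {
        final long now = clock.getAsLong();
        final long wait = Math.max(0L, nextAllowed - now);
        nextAllowed = Math.max(now, nextAllowed) + intervalMillis;
        return wait;
    }

    /** Blocks the calling thread until the reserved slot is reached. */
    void waitForRate() throws InterruptedException {
        final long wait = reserve();
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

With a real clock this would be created as new RateGate(60, System::currentTimeMillis) and waitForRate() would be called before each API request.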

ROS Answers

Here we use jsoup as the scraper. To get all the user information, including questions and answers, you will probably want to fetch all user profiles first and then all questions.

User profiles

@Test
public void rosUserProfile() throws IOException {
    final String url = "https://answers.ros.org";
    final Document usersPage = Jsoup.connect(url + "/users/").get();
    final int initialPage = 1;
    final int lastPage = new LastRosUserPage(usersPage).value();
    final Iterator<String> usersLinks = new IteratePagedContent<>(
        new IterateDomPages(
            new RosUserPagedDom(),
            initialPage,
            lastPage,
            new IterateByUserLinks()
        )
    );
    while (usersLinks.hasNext()) {
        final String userLink = usersLinks.next();
        System.out.println(
            new RosDomUser(Jsoup.connect(url + userLink).get())
        );
    }
}

Questions, answers and comments

ROS Answers runs on Askbot, so it has an API that can be used to read a question's content, but the API does not provide any information about the content of answers. Therefore we also use a scraper that reads the DOM pages to get information about answers.

@Test
public void rosQuestions() throws IOException {
    final Iterator<JsonArray> iterable = new IterateApiQuestionPage();
    while (iterable.hasNext()) {
        final JsonArray questionArray = iterable.next();
        for (int i = 0; i < questionArray.size(); i++) {
            final ApiRosQuestion questionApi = new DefaultApiRosQuestion(
                questionArray.getJsonObject(i)
            );
            final RosDomQuestion questionDom = new DefaultRosDomQuestion(
                questionApi.id()
            );
            System.out.println("From API (title): " + questionApi.title());
            System.out.println("From API (url): " + questionApi.url());
            System.out.println("From DOM (votes): " + questionDom.votes());
        }
    }
}

Implementing your own Algorithm

Access to data

working on it ...

Extending the base class

We provide some useful tools for researchers, such as pre-made SQL queries and basic health checks. All you have to do is extend some abstract classes. For example, here is a pseudo-implementation of the recommendation algorithm DevRec, proposed by Zhang et al. in this publication.

class Devrec extends AbstractAlgorithm {
    // Devrec initialization ...
    /**
     * Here we execute the algorithm and get the results.
     */
    @Override
    protected List<Aspirant> feed(Question question) {
        List<Aspirant> aspirants = new LinkedList<>();
        // use DB to get all users
        for (User user : DB.getAllUsers()) {
            final Topic topic = question.topic();
            // calculate KA from Tuu for a specific topic
            Number ka = new Ka(this.topicsRelation, topic);
            // get the project related to the question's topic
            final Project project = this.topicProjectRelation.get(topic);
            // calculate DA from a specific project
            Number da = new Da(this.projectsRelation, project);
            aspirants.add(new DevrecAspirant(ka.doubleValue(), da.doubleValue(), user));
        }
        // return all users without any specific order 😮
        return aspirants;
    }
}

You might be wondering why the aspirants are returned unordered. It is to spare you from sorting them yourself: you only need to call devrec.aspirants() and you will get the aspirants sorted by rank. How does it work? DevrecAspirant implements the Aspirant interface; see this code:

class DevrecAspirant implements Aspirant {

    // useful and important things ...

    /**
    * easy 😎 ...
    */
    @Override
    public double rank() {
        return this.ka * 0.75 + this.da * 0.25;
    }
}

Now you see how we sort your data before we give it back to you. All of the abstract classes and interfaces that you extend or implement help you focus on the one thing that matters to you: the algorithm's implementation 🤓.
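
The sorting step itself is not shown in this README. The following is a minimal sketch of how a base class could order aspirants by rank. RankedAspirant and RankSort are hypothetical names used only for illustration (RankedAspirant stands in for the Aspirant interface), and the sketch assumes that the highest rank comes first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the Aspirant interface: anything that has a rank. */
interface RankedAspirant {
    double rank();
}

/** Hypothetical helper showing the sorting step a base class could perform. */
final class RankSort {
    private RankSort() { }

    /** Returns a new list ordered from highest to lowest rank; the input is not modified. */
    static <T extends RankedAspirant> List<T> sortedByRank(List<T> unordered) {
        final List<T> sorted = new ArrayList<>(unordered);
        // Assumption: a higher rank means a better candidate, so sort descending.
        sorted.sort(Comparator.comparingDouble(RankedAspirant::rank).reversed());
        return sorted;
    }
}
```

Because the ordering depends only on rank(), an implementation such as feed can return its list in any order and leave the sorting to the shared code.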

Health Checks

working on it ...

Resolve

Working on it …

ros_gh's People

Contributors

dependabot[bot], elbraulio, pestefo

Forkers

pestefo

ros_gh's Issues

move Devrec example into artifact domain

Currently Devrec is outside the artifact domain com.elbraulio.rosgh. That is confusing, because you import it as

import examples.Devrec

when it should be

import com.elbraulio.rosgh.example.Devrec

Is it possible to run the scraper to get an updated sample and also more data about ROS users (e.g. karma, last_seen_at, etc.)?

I need more data from the ROS Answers' user, specifically:

  1. karma

  2. joined_at

  3. last_seen_at

  4. location

  5. has_avatar (or the url to the avatar and NULL if it has the default)

  6. description

  7. real name (if it exists)

  8. age

  9. badges (and its count)

I'm particularly interested in the first four, so if the rest of the list requires more effort I'd already be happy with having just those four (karma, joined_at, last_seen_at, location).

Some tag names in 'ros_tag' table are cut

There are 1277 tags that look truncated, e.g.:

id    name
5     turtlebot_dash...
9     turtlebot_cali...
206   message_genera...
263   installation_e...
288   camera_calibra...
305   sicktoolbox_wr...
306   xv_11_laser_dr...
348   trajectory_fil...

You can get the complete list with this query:

select *
from ros_tag	
where name like '%...'

wrong tag counting in devrec

Counting how many times a given tag is related to all users gives different results when it is queried per user with FetchTagCount than when it is computed with a single query:

select sum(count) as count
from ros_user_tag
where ros_user_tag.ros_user_id in (select ros_user_id from linked_users) and
      ros_user_tag.ros_tag_id = ?;

release 0.1-beta.1

fix bugs and devrec implementation.

  • create issue release.
  • create new branch
  • update pom version.
  • make a pull request.
  • close milestone when PR is accepted.
  • make a release on github with pom version.
  • check Travis and JitPack build results.

Logs for data extraction

As with builds, we want to know whether the extraction process succeeded or failed. Logs are important for identifying errors and possible cases of missing data.

devrec implementations as example

We want to replicate Devrec as described in this paper. It will be implemented on the examples branch using tools committed on the master branch. There is an important difference between the implementation by Zhang et al. and ours: we are looking for someone to answer a question rather than to participate in a project.

All the following quotes were extracted from the original paper.

Data extraction

  • we use this data already extracted with ros_gh and available here.

Developer Recommendation Based on Social Coding Activities

  • UP Connector: This part is to create the association matrix of users and projects based on the activities in GitHub. Here we get a two-value matrix Ru−p, where 1 stands for participation and 0 stands for the opposite.

  • User Connector: This part is to calculate the association between users based on the user project association matrix using Jaccard algorithm.

  • Match Engine: In this part, we calculate the association between users and projects according to the user association matrix Ru−u. If we use UAp⟨u1,u2,...,un⟩ to represent users that have already participated in the target project p, we can obtain the match score of each user towards project p using:

[Image: screenshot of the match score formula from the paper]
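
The User Connector step quoted above relies on Jaccard similarity between the rows of the binary user-project matrix. The following is a minimal sketch of that step only; the match score formula is not reproduced here. JaccardSketch and its method names are hypothetical, and treating two users with no projects at all as having similarity 0 is an assumption of this sketch.

```java
/** Hypothetical sketch of the User Connector step on a 0/1 user-project matrix. */
final class JaccardSketch {
    private JaccardSketch() { }

    /** Jaccard similarity between users i and j: shared projects / projects of either. */
    static double jaccard(int[][] rup, int i, int j) {
        int both = 0;
        int either = 0;
        for (int p = 0; p < rup[i].length; p++) {
            if (rup[i][p] == 1 && rup[j][p] == 1) {
                both++;
            }
            if (rup[i][p] == 1 || rup[j][p] == 1) {
                either++;
            }
        }
        // Assumption: users without any project are not similar to anyone.
        return either == 0 ? 0d : (double) both / either;
    }

    /** Builds the full user-user association matrix from the user-project matrix. */
    static double[][] userUser(int[][] rup) {
        final int users = rup.length;
        final double[][] ruu = new double[users][users];
        for (int i = 0; i < users; i++) {
            for (int j = 0; j < users; j++) {
                ruu[i][j] = jaccard(rup, i, j);
            }
        }
        return ruu;
    }
}
```

For example, two users who participate in three distinct projects between them and share exactly one of those projects get a similarity of 1/3.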

Developer Recommendation Based on Knowledge Sharing Activities

  • Relation Creator: In this part, we calculate the user tag association matrix. Here we use TF-IDF method. If we use U{u1,u2,...,un} to represent users in StackOverflow, Tu = {t1,t2,...,tn} to represent the tags that related to user u, and C(t,u) to represent the number of times tag t relates to user u. Then we can calculate user tag association matrix using

[Image: screenshot of the user tag association formula from the paper]

  • User Connector: After obtaining the user tag association matrix Ru−t, we calculate the association of users using Vector Space Similarity algorithm.

  • Match Engine: The same as the match engine part in DA-based approach.
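
The Relation Creator step quoted above uses TF-IDF, but the paper's exact formula is not reproduced in this text. The sketch below therefore uses the common textbook variant, which may differ from the paper's definition: term frequency is C(t,u) divided by the total tag count of user u, and inverse document frequency is the natural logarithm of the number of users divided by the number of users related to tag t. TfIdfSketch and its method name are hypothetical.

```java
/** Hypothetical sketch of a user-tag association matrix using textbook TF-IDF. */
final class TfIdfSketch {
    private TfIdfSketch() { }

    /**
     * counts[u][t] is C(t,u), the number of times tag t relates to user u.
     * Returns a matrix with weights[u][t] = tf(t,u) * idf(t).
     */
    static double[][] weights(int[][] counts) {
        final int users = counts.length;
        final int tags = users == 0 ? 0 : counts[0].length;
        // Number of users that are related to each tag at least once.
        final int[] usersWithTag = new int[tags];
        for (int[] row : counts) {
            for (int t = 0; t < tags; t++) {
                if (row[t] > 0) {
                    usersWithTag[t]++;
                }
            }
        }
        final double[][] rut = new double[users][tags];
        for (int u = 0; u < users; u++) {
            int total = 0;
            for (int c : counts[u]) {
                total += c;
            }
            for (int t = 0; t < tags; t++) {
                if (total == 0 || usersWithTag[t] == 0) {
                    continue; // no data: the weight stays 0 and division by zero is avoided
                }
                final double tf = (double) counts[u][t] / total;
                final double idf = Math.log((double) users / usersWithTag[t]);
                rut[u][t] = tf * idf;
            }
        }
        return rut;
    }
}
```

In this variant a tag that is related to every user gets weight 0 for all of them, since it does not help to tell users apart.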

RuuKA math error

The correct math is:

    private double vectorSpace(int i, int j, double[][] rut) {
        double sum = 0d;
        double length = 0d;
        for (int k = 0; k < rut[i].length; k++) {
            sum += rut[i][k] * rut[j][k];
            length += rut[i][k];
        }
        return sum/length;
    }

examples must be excluded from the project scope

It is not suitable to keep examples within the project, because they have no unit tests and they increase the project's complexity. The tools must be provided by the project, but without examples hidden inside ignored tests. These packages must be excluded from the main project and can be included on an examples branch:

  • launcher

  • GithubInfoTest

  • ignored test from FetchUsersPageListTest

  • FetchAnswersTest

  • IteratePagedContentTest

  • ParticipantsTest

devrec rank

With small doubles the rank returns NaN, so NaN should be replaced with 0.

Double.isNaN(rank) ? 0d : rank;

Also, Aspirant is returning the wrong ranking; it must be

this.ka*0.75 + this.da*0.25;
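
The two fixes in this issue can be combined in one method. The sketch below is a minimal illustration, not the project's actual code: GuardedRank is a hypothetical class, and the 0/0 division mentioned in the comment is only one assumed way in which a NaN could reach the rank.

```java
/** Hypothetical sketch combining the weighted rank with the NaN guard. */
final class GuardedRank {
    static final double KA_WEIGHT = 0.75;
    static final double DA_WEIGHT = 0.25;

    private GuardedRank() { }

    /** Weighted rank of KA and DA scores; NaN results are replaced with 0. */
    static double rank(double ka, double da) {
        final double rank = ka * KA_WEIGHT + da * DA_WEIGHT;
        // A NaN input (for example from a 0d / 0d division upstream) makes the
        // whole sum NaN, so the guard is applied to the final value.
        return Double.isNaN(rank) ? 0d : rank;
    }
}
```

Guarding the final value matters for sorting as well, because comparisons against NaN do not give a meaningful order.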

maven example on Readme is wrong

it should be

<dependencies>
    <dependency>
        <groupId>com.elbraulio</groupId>
        <artifactId>ros_gh</artifactId>
        <version>{version}</version>
    </dependency>
</dependencies>

<repositories>
	<repository>
	    <id>jitpack.io</id>
	    <url>https://jitpack.io</url>
	</repository>
</repositories>

accuracy light

Check the first aspirant who matches at least half of the tags from the question.

Also rename DefaultAccuracy to StricAcuracy
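
One possible reading of the check described in this issue is sketched below: walk the ranked aspirants and return the position of the first one whose tags cover at least half of the question's tags. LightAccuracy and firstMatch are hypothetical names, and both the tag representation (sets of strings) and the return value of -1 when nobody matches are assumptions of this sketch.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the "light" accuracy check. */
final class LightAccuracy {
    private LightAccuracy() { }

    /**
     * Returns the index of the first aspirant (in ranked order) whose tags
     * include at least half of the question's tags, or -1 if there is none.
     */
    static int firstMatch(List<Set<String>> rankedAspirantTags, Set<String> questionTags) {
        for (int i = 0; i < rankedAspirantTags.size(); i++) {
            int shared = 0;
            for (String tag : questionTags) {
                if (rankedAspirantTags.get(i).contains(tag)) {
                    shared++;
                }
            }
            // Integer form of shared / questionTags.size() >= 0.5.
            if (2 * shared >= questionTags.size()) {
                return i;
            }
        }
        return -1;
    }
}
```

The returned index could then be compared against a cutoff (for example, the top N recommendations) to decide whether the recommendation counts as a hit.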
