ros_gh

ros_gh is a tool that applies algorithms to recommend users who can answer a given question. It works in three steps: collect data from GitHub and ROS Answers, match identities, and apply algorithms.

How to use

maven

<dependencies>
    <dependency>
        <groupId>com.elbraulio</groupId>
        <artifactId>ros_gh</artifactId>
        <version>{version}</version>
    </dependency>
</dependencies>
<!-- for ros_gh -->
<repositories>
	<repository>
	    <id>jitpack.io</id>
	    <url>https://jitpack.io</url>
	</repository>
</repositories>

gradle

dependencies {
        implementation 'com.elbraulio:ros_gh:{version}'
}
allprojects {
	repositories {
		...
		maven { url 'https://jitpack.io' }
	}
}

Collect data from GitHub and ROS Answers

ros_gh provides tools for fetching data from GitHub and ROS Answers. You can get them separately.

GitHub

Here we use jcabi-github to get information from GitHub and CanRequest to handle GitHub's API rate limit. The steps for collecting data from GitHub are:

1. Get a token from your GitHub account.
2. Choose a distribution file to extract information from its package repositories.
3. Use the script below to collect the data. In this example we fetch data from the indigo distribution. This repository includes these distribution files as JSON files; all of them were taken from this repo and can be used.

@Test
public void ghInfo() throws InterruptedException, IOException {
    final String token = "secret_token";
    final String path = "src/test/java/resources/github/indigo.json";
    final Github github = new RtGithub(token);
    final CanRequest canRequest = new CanRequest(60);
    for (RosPackage rosPackage : new FromJsonFile(path).repoList()) {
        if (!rosPackage.source().isEmpty()) {
            final GhRepo ghRepo = rosPackage.asRepo(github);
            canRequest.waitForRate();
            final GhUser ghUser = new FetchGhUser(
                github, ghRepo.owner()
            ).ghUser();
            canRequest.waitForRate();
            final List<GhColaborator> colaborators = new Colaborators(
                ghRepo.fullName(), canRequest, github
            ).colaboratorList();
            System.out.println("repo: " + ghRepo.name());
            System.out.println("owner: " + ghUser.login());
            System.out.println("colaborator: " + colaborators.size());
            System.out.println("-----------------------------------");
        }
    }
}
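
How CanRequest handles the rate limit internally is not shown in this README. As a rough illustration of the general idea of waiting before each request, here is a minimal sketch that spaces calls at a fixed minimum interval. The class `MinIntervalLimiter`, its constructor arguments, and its methods are hypothetical names made up for this example; they are not part of ros_gh or jcabi-github, and the real CanRequest may work differently.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not part of ros_gh): allows at most
 * `callsPerMinute` calls per minute by spacing them evenly.
 * Not thread-safe; intended for a single collecting thread.
 */
final class MinIntervalLimiter {
    private final long intervalMillis;
    private final LongSupplier clock;
    private long nextAllowed = Long.MIN_VALUE;

    MinIntervalLimiter(int callsPerMinute, LongSupplier clock) {
        if (callsPerMinute <= 0) {
            throw new IllegalArgumentException("callsPerMinute must be > 0");
        }
        this.intervalMillis = 60_000L / callsPerMinute;
        this.clock = clock;
    }

    /** Reserves the next slot and returns how many ms the caller must wait. */
    long reserve() {
        final long now = this.clock.getAsLong();
        final long start = Math.max(now, this.nextAllowed);
        this.nextAllowed = start + this.intervalMillis;
        return start - now;
    }

    /** Blocks until the reserved slot is reached. */
    void waitForSlot() throws InterruptedException {
        final long wait = reserve();
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

With a real clock this would be created as `new MinIntervalLimiter(60, System::currentTimeMillis)` and `waitForSlot()` called before each API request. Injecting the clock keeps the timing logic testable without sleeping.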

ROS Answers

Here we use jsoup as a scraper. To get all user information, including questions and answers, you might first want to fetch all user profiles and then all questions.

User profiles

@Test
public void rosUserProfile() throws IOException {
    final String url = "https://answers.ros.org";
    final Document usersPage = Jsoup.connect(url + "/users/").get();
    final int initialPage = 1;
    final int lastPage = new LastRosUserPage(usersPage).value();
    final Iterator<String> usersLinks = new IteratePagedContent<>(
        new IterateDomPages(
            new RosUserPagedDom(),
            initialPage,
            lastPage,
            new IterateByUserLinks()
        )
    );
    while (usersLinks.hasNext()) {
        final String userLink = usersLinks.next();
        System.out.println(
            new RosDomUser(Jsoup.connect(url + userLink).get())
        );
    }
}
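
The example above walks the user pages from the first to the last and yields the links found on each one. As a minimal sketch of that flattening pattern in plain Java, assuming a hypothetical `pageLoader` function that returns the items of one page: `PagedIterator` is an illustrative name and is not ros_gh's `IteratePagedContent`, whose implementation may differ.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.IntFunction;

/** Hypothetical sketch: iterates the items of pages firstPage..lastPage lazily. */
final class PagedIterator<T> implements Iterator<T> {
    private final IntFunction<List<T>> pageLoader;
    private final int lastPage;
    private int nextPage;
    private Iterator<T> current = Collections.emptyIterator();

    PagedIterator(int firstPage, int lastPage, IntFunction<List<T>> pageLoader) {
        this.nextPage = firstPage;
        this.lastPage = lastPage;
        this.pageLoader = pageLoader;
    }

    @Override
    public boolean hasNext() {
        // Load the next page only when the current one is exhausted,
        // skipping empty pages along the way.
        while (!this.current.hasNext() && this.nextPage <= this.lastPage) {
            this.current = this.pageLoader.apply(this.nextPage++).iterator();
        }
        return this.current.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return this.current.next();
    }
}
```

Loading pages lazily means a page is only requested once the previous one has been consumed, which keeps the number of requests to the site as low as possible.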

Questions, answers and comments

ROS Answers is powered by Askbot, so it has an API that can be used to read a question's content, but the API does not provide any information about the content of answers. Therefore we also use a scraper that reads DOM pages to get information about answers.

@Test
public void rosQuestions() throws IOException {
    final Iterator<JsonArray> iterable = new IterateApiQuestionPage();
    while (iterable.hasNext()) {
        final JsonArray questionArray = iterable.next();
        for (int i = 0; i < questionArray.size(); i++) {
            final ApiRosQuestion questionApi = new DefaultApiRosQuestion(
                questionArray.getJsonObject(i)
            );
            final RosDomQuestion questionDom = new DefaultRosDomQuestion(
                questionApi.id()
            );
            System.out.println("From API (title): " + questionApi.title());
            System.out.println("From API (url): " + questionApi.url());
            System.out.println("From DOM (votes): " + questionDom.votes());
        }
    }
}

Implementing your own Algorithm

Access to data

working on it ...

Extending the base class

We provide some useful tools for researchers, such as pre-made SQL queries and basic health checks. The only thing you have to do is extend some abstract classes. For example, here is a pseudo-implementation of DevRec, a recommendation algorithm proposed by Zhang et al. in this publication.

class Devrec extends AbstractAlgorithm {
    // Devrec initialization ...

    /**
     * Here we execute the algorithm and get the results.
     */
    @Override
    protected List<Aspirant> feed(Question question) {
        List<Aspirant> aspirants = new LinkedList<>();
        // use DB to get all users
        for (User user : DB.getAllUsers()) {
            final Topic topic = question.topic();
            // calculate KA from Tuu for a specific topic
            Number ka = new Ka(this.topicsRelation, topic);
            // get the project related to the question's topic
            final Project project = this.topicProjectRelation.get(topic);
            // calculate DA from a specific project
            Number da = new Da(this.projectsRelation, project);
            aspirants.add(
                new DevrecAspirant(ka.doubleValue(), da.doubleValue(), user)
            );
        }
        // return all users without any specific order 😮
        return aspirants;
    }
}

You might be wondering why the aspirants are returned unordered. It is to spare you from doing that yourself. You only need to call devrec.aspirants() and you will get the aspirants sorted by rank. How does it work? It's easy: DevrecAspirant implements the Aspirant interface, as this code shows:

class DevrecAspirant implements Aspirant {

    // useful and important things ...

    /**
    * easy 😎 ...
    */
    @Override
    public double rank() {
        return this.ka * 0.75 + this.da * 0.25;
    }
}

Now you see how we sort your data before we give it back to you. All of these abstract classes and interfaces that you will extend or implement will help you to focus on the only important thing to you: the Algorithm’s implementation 🤓.
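
The sorting step itself is not shown in this README. As a minimal sketch of how a list of aspirants could be ordered by `rank()`, assuming the highest rank comes first: the `Aspirant` interface below is a stand-in that declares only `rank()` (the real interface may declare more), and `RankSort` is a hypothetical helper, not a ros_gh class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal stand-in for the Aspirant interface described above. */
interface Aspirant {
    double rank();
}

/** Hypothetical helper: returns a copy sorted by rank, highest first (assumed order). */
final class RankSort {
    private RankSort() {
    }

    static <A extends Aspirant> List<A> byRankDescending(List<A> aspirants) {
        // Copy first so the caller's unordered list is left untouched.
        final List<A> sorted = new ArrayList<>(aspirants);
        sorted.sort(Comparator.comparingDouble(Aspirant::rank).reversed());
        return sorted;
    }
}
```

Because the ordering depends only on `rank()`, an algorithm can change its scoring formula without touching any sorting code.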

Health Checks

working on it ...

Resolve

Working on it …

id	name
5	turtlebot_dash...
9	turtlebot_cali...
206	message_genera...
263	installation_e...
288	camera_calibra...
305	sicktoolbox_wr...
306	xv_11_laser_dr...
348	trajectory_fil...

DevRec implementation as an example

We want to replicate DevRec as described in this paper. It will be implemented on the examples branch using tools committed on the master branch. There is one important difference between the implementation of Zhang et al. and ours: we are looking for someone to answer a question rather than to participate in a project.

All the following quotes were extracted from the original paper.

Data extraction

We use data already extracted with ros_gh and available here.

Developer Recommendation Based on Social Coding Activities

UP Connector: This part is to create the association matrix of users and projects based on the activities in GitHub. Here we get a two-value matrix R_{u-p}, where 1 stands for participation and 0 stands for the opposite.
User Connector: This part is to calculate the association between users based on the user project association matrix using Jaccard algorithm.
Match Engine: In this part, we calculate the association between users and projects according to the user association matrix R_{u-u}. If we use UA_p⟨u_1, u_2, ..., u_n⟩ to represent users that have already participated in the target project p, we can obtain the match score of each user towards project p using:
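
As a minimal sketch of the User Connector and Match Engine steps, the code below computes the Jaccard similarity between two users from the sets of projects they participated in (the rows of R_{u-p} that hold a 1). The match score is taken here to be the sum of a candidate's similarities to the users in UA_p; that aggregation is an assumption made for illustration and may differ from the paper's exact formula. `JaccardSketch` and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the DA-based steps; not the paper's reference code. */
final class JaccardSketch {
    private JaccardSketch() {
    }

    /** Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        final Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        final int union = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / union;
    }

    /**
     * Assumed aggregation: sums the candidate's similarity to every user
     * who already participated in the target project.
     */
    static double matchScore(Set<String> candidate, List<Set<String>> participants) {
        double score = 0.0;
        for (Set<String> participant : participants) {
            score += jaccard(candidate, participant);
        }
        return score;
    }
}
```

For example, a user with projects {p1, p2} and a user with projects {p2, p3} share one project out of three distinct ones, so their similarity is 1/3.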

Developer Recommendation Based on Knowledge Sharing Activities

Relation Creator: In this part, we calculate the user tag association matrix. Here we use TF-IDF method. If we use U = {u_1, u_2, ..., u_n} to represent users in StackOverflow, T_u = {t_1, t_2, ..., t_n} to represent the tags that related to user u, and C(t, u) to represent the number of times tag t relates to user u. Then we can calculate user tag association matrix using

User Connector: After obtaining the user tag association matrix R_{u-t}, we calculate the association of users using Vector Space Similarity algorithm.
Match Engine: The same as the match engine part in DA-based approach.
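
As a minimal sketch of the Relation Creator and User Connector steps, the code below builds TF-IDF weights from the counts C(t, u) and compares two users with cosine similarity. Two assumptions are made: the TF-IDF variant is the common one, tf = C(t, u) / Σ C(t', u) and idf = ln(N / n_t) with N users and n_t users related to tag t, which may differ from the paper's exact formula; and Vector Space Similarity is taken to mean cosine similarity. `TagSimilaritySketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the KA-based steps; not the paper's reference code. */
final class TagSimilaritySketch {
    private TagSimilaritySketch() {
    }

    /** counts.get(i) maps tag t to C(t, u_i); returns one TF-IDF row per user. */
    static List<Map<String, Double>> tfIdf(List<Map<String, Integer>> counts) {
        // n_t: number of users related to each tag.
        final Map<String, Integer> usersPerTag = new HashMap<>();
        for (Map<String, Integer> user : counts) {
            for (String tag : user.keySet()) {
                usersPerTag.merge(tag, 1, Integer::sum);
            }
        }
        final List<Map<String, Double>> rows = new ArrayList<>();
        for (Map<String, Integer> user : counts) {
            int total = 0;
            for (int count : user.values()) {
                total += count;
            }
            final Map<String, Double> row = new HashMap<>();
            for (Map.Entry<String, Integer> entry : user.entrySet()) {
                if (total == 0) {
                    break; // no tag activity: leave the row empty
                }
                final double tf = (double) entry.getValue() / total;
                final double idf = Math.log(
                    (double) counts.size() / usersPerTag.get(entry.getKey())
                );
                row.put(entry.getKey(), tf * idf);
            }
            rows.add(row);
        }
        return rows;
    }

    /** Cosine similarity of two sparse vectors; 0 if either has zero length. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            normA += entry.getValue() * entry.getValue();
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0.0);
        }
        for (double value : b.values()) {
            normB += value * value;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }
}
```

Note that with this idf variant a tag shared by every user gets weight 0, so it does not contribute to the similarity between users.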
