kowainik / issue-wanted

๐Ÿท Web application to help beginners to start contributing into Haskell projects

Home Page: https://kowainik.github.io/posts/gsoc2019

License: Mozilla Public License 2.0

Haskell 82.18% Makefile 0.37% TSQL 0.14% Dockerfile 0.11% PLpgSQL 1.95% JavaScript 5.01% HTML 0.81% Elm 9.14% CSS 0.29%
haskell github-issues three-layer-architecture web-application backend gsoc-2019

issue-wanted's People

Contributors

chshersh, rashadg1030, sshine, vrom911, willbasky

issue-wanted's Issues

Implement SQL join statements

Since we have join tables, I was thinking we should implement some join statements in the sql/join.sql file. Is this a good idea? I'm currently doing some research on SQL joins.
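
For reference, here is a minimal sketch of one such join, written with postgresql-simple (which the FromRow/ToRow derivations elsewhere on this page suggest). The join tables themselves aren't shown on this page, so this joins the issues and repos tables from the schema quoted in the next issue instead; the function name is a placeholder.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes       #-}

module JoinSketch where

import Data.Text (Text)
import Database.PostgreSQL.Simple (Connection, query_)
import Database.PostgreSQL.Simple.SqlQQ (sql)

-- | Fetch each issue's number and title together with its repository's owner
-- and name, joining on the (owner, name) pair shared by both tables.
issuesWithRepos :: Connection -> IO [(Int, Text, Text, Text)]
issuesWithRepos conn = query_ conn [sql|
    SELECT issues.number, issues.title, repos.owner, repos.name
    FROM issues
    JOIN repos
      ON issues.repo_owner = repos.owner
     AND issues.repo_name  = repos.name
|]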

Remove URL field from Issue schema

I noticed something important:

CREATE TABLE IF NOT EXISTS repos
( id         SERIAL PRIMARY KEY  
, owner      TEXT   NOT NULL       
, name       TEXT   NOT NULL
, descr      TEXT   NOT NULL
, categories TEXT   ARRAY
);

CREATE TABLE IF NOT EXISTS issues
( id         SERIAL PRIMARY KEY 
, number     INT    NOT NULL
, title      TEXT   NOT NULL
, body       TEXT   NOT NULL
, repo_owner TEXT   NOT NULL
, repo_name  TEXT   NOT NULL
, url        TEXT   NOT NULL
, labels     TEXT   ARRAY
);

The repos table doesn't have a url field because we agreed we could construct the URL on the frontend from the owner and name.

The issues table does have a url field, but we could remove it just like in repos.

-- | Data type representing a GitHub issue.
data Issue = Issue
    { issueId        :: Id Issue
    , issueNumber    :: Int
    , issueTitle     :: Text
    , issueBody      :: Text
    , issueRepoOwner :: RepoOwner
    , issueRepoName  :: RepoName
    , issueUrl       :: Text
    , issueLabels    :: SqlArray Text
    } deriving stock (Generic, Show, Eq)
      deriving anyclass (ToJSON, FromRow, ToRow)

-- | Data type representing a GitHub repository.
data Repo = Repo 
    { repoId         :: Id Repo
    , repoOwner      :: RepoOwner
    , repoName       :: RepoName
    , repoDescr      :: Text
    , repoCategories :: SqlArray Text
    } deriving stock (Generic, Show, Eq)
      deriving anyclass (ToJSON, FromRow, ToRow)

Notice that the Repo type doesn't have a url field either. Which approach is better: constructing the URL on the frontend, constructing it on the backend, or storing it in the database?
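
If we go with constructing it on the backend, the helper is tiny. A sketch, assuming RepoOwner and RepoName are newtypes over Text; their real definitions aren't shown above, so the stand-ins below are assumptions.

{-# LANGUAGE OverloadedStrings #-}

module RepoUrlSketch where

import Data.Text (Text)
import qualified Data.Text as T

-- Stand-ins for the project's RepoOwner/RepoName newtypes (assumed shape).
newtype RepoOwner = RepoOwner { unRepoOwner :: Text }
newtype RepoName  = RepoName  { unRepoName  :: Text }

-- | Build the repository URL from the owner and name, so it never has to be stored.
repoUrl :: RepoOwner -> RepoName -> Text
repoUrl (RepoOwner owner) (RepoName name) =
    "https://github.com/" <> owner <> "/" <> name

-- | An issue URL can be derived the same way, given the issue number.
issueUrl :: RepoOwner -> RepoName -> Int -> Text
issueUrl owner name number = repoUrl owner name <> "/issues/" <> T.pack (show number)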

Choice of web-framework

There are several web-frameworks to choose from.

I have not tested web-frameworks for two years, but the ones I came to like back then were:

  • snap for being quite modular and not too magical.
  • scotty for being minimalist.
  • servant for being type-driven.

The drawbacks I can think of are: snap may require heavier lock-in and investment in the framework (writing snaplets when necessary); scotty may require us to learn web-framework organization the hard way if the site grows beyond its original intent; and servant would, I think, require a web-app-like front-end and give less of a website feel (a tiny servant sketch follows below).

Maybe this issue hasn't been posted because there is an implicit answer already?
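
To make the "type-driven" point concrete, this is roughly what a single endpoint looks like in servant. A generic sketch, not code from this repository; it assumes the servant-server and wai packages.

{-# LANGUAGE DataKinds     #-}
{-# LANGUAGE TypeOperators #-}

module ServantSketch where

import Data.Proxy (Proxy (..))
import Network.Wai (Application)
import Servant

-- | The whole API is a type; the compiler checks the handlers against it.
type IssuesApi = "issues" :> Capture "issueId" Int :> Get '[JSON] Int

-- | A placeholder handler that just echoes the captured id back.
server :: Server IssuesApi
server issueId = pure issueId

app :: Application
app = serve (Proxy :: Proxy IssuesApi) server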

Decide on the Haskell database library

Possible candidates:

  • postgresql-simple
  • squeal
  • opaleye
  • esqueleto

@rashadg1030 needs to decide which of these libraries he prefers to work with, but we can share our thoughts on them and discuss the options.
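
For context on the first candidate: postgresql-simple keeps queries as plain SQL strings and decodes rows through FromRow, which is the class the Issue and Repo types earlier on this page already derive. A minimal sketch:

{-# LANGUAGE OverloadedStrings #-}

module DbSketch where

import Data.Text (Text)
import Database.PostgreSQL.Simple (Connection, query_)

-- | Queries are raw SQL; result rows decode via FromRow instances
-- (here the built-in instance for pairs of Text).
repoNames :: Connection -> IO [(Text, Text)]
repoNames conn = query_ conn "SELECT owner, name FROM repos"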

Change name of owner column in issues table

CREATE TABLE IF NOT EXISTS issues
( id        SERIAL PRIMARY KEY 
, number    INT    NOT NULL
, title     TEXT   NOT NULL
, body      TEXT
, url       TEXT   NOT NULL
, owner     TEXT   NOT NULL
, repo_name TEXT   NOT NULL
, labels    TEXT   ARRAY
);

I was wondering: should we rename the owner column to repo_owner?

Upgrade to GHC-8.6.5

  • Upgrade the cabal file (base dependency, use cabal-version: 2.4, use common stanzas; see the fragment below)
  • Upgrade stack.yaml (bump up lts, libraries versions)
  • Update .travis.yml file

As a reference, you can see changes in the three-layer repository.
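
For reference, the common-stanza feature mentioned above looks roughly like this in a cabal-version: 2.4 file. This is an illustrative fragment only; the version bounds and module name are assumptions, not the project's actual .cabal contents.

cabal-version:       2.4
name:                issue-wanted
version:             0.0.0.0

common common-options
  build-depends:     base >= 4.12 && < 4.13
  ghc-options:       -Wall
  default-language:  Haskell2010

library
  import:            common-options
  hs-source-dirs:    src
  exposed-modules:   IW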

Prototypes or wireframes for frontend

This issue is about creating prototypes or wireframes for the frontend. They would help streamline the backend to match the project's requirements. Do these wireframes already exist, or are they in development? If not, would it be desirable for me to start implementing them with the guidance of the contributors?

Preferred issue update channels - webhooks?

Hi all, I'm looking at trying this for Google Summer of Code. Currently, the fetch functions in IssueWanted.Search let us poll GitHub's API for issues regularly. Is this the permanent plan for keeping the cache up to date?

I ask because I've been prodding around the github library and was wondering about the value of using webhooks to subscribe to particularly active repos (e.g., stack) to give more live updates on issues with valuable tags.

@chshersh Do you have any specific recommendations on how we want to keep items synced?

Frontend stack decision

We know for sure that we'd like to use a functional language for the frontend. What suits our needs best: PureScript, Elm, something else? Pros and cons?

GraphQL integration and GitHub APIs: v3 or v4?

Which version of the GitHub API are we going to use?
GitHub provides access both as REST (v3) and GraphQL (v4). Considering the amount of data we need to scrape for issues and cabal files, and the need to keep our database synced with repos, would it be beneficial to integrate GraphQL in the backend as well as the frontend? A single GraphQL call can replace multiple REST calls, reducing latency for both the database and the frontend and keeping GitHub's rate limits in check.

Admin page discussion

Maybe we should have an admin page. Let's discuss what could be on it. I can think of the following:

  • Force sync of repository
  • Force sync of user
  • Blacklist user (if spammer)
  • Blacklist repository (fake repo to abuse achievement system)
  • Add/Remove/Edit tags

Add async worker for populating DB

I was wondering if it is a good time to work on the async worker that will populate our DB (once we get a better idea of the data we need, I guess). I've implemented an endpoint for our server that touches the DB and everything, so I think this would be a good next step.

One more thing about file structure: we have a file that holds the GitHub query functions at src/IW/Server/Search.hs. Is it fine for it to stay in the Server folder as it is now, or should it go into another folder called Async or Worker, e.g. src/IW/Worker/Search.hs? I'm not sure, because technically Search.hs does function on the server, but I think the files in src/IW/Server should be related to the issue-wanted API.
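
Wherever the module ends up, the worker itself can stay small. A rough sketch, assuming the async package and a fixed polling interval; fetchAndStoreIssues is a placeholder for the real sync step.

module WorkerSketch where

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async)
import Control.Monad (forever)

-- | Placeholder for the real sync step (fetch from GitHub, upsert into the DB).
fetchAndStoreIssues :: IO ()
fetchAndStoreIssues = putStrLn "syncing issues..."

-- | Run the sync step forever on a separate thread, sleeping between rounds.
-- threadDelay takes microseconds, hence the multiplication.
startIssueWorker :: Int -> IO (Async ())
startIssueWorker delaySeconds = async $ forever $ do
    fetchAndStoreIssues
    threadDelay (delaySeconds * 1000000)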

Update the schema for more efficiency and simplicity

I did some research on how to better structure the database for our use case. We need to make a few adjustments to make it fast; at the same time, they will make our lives easier.

  1. The foreign key in the issues table should reference repos.name rather than id. It's completely okay to have textual foreign keys; just add an INDEX on the repos.name column (later we can see which columns we actually query, so we can add more indexes for performance).
  2. Remove the categories, labels, repos_categories, and issues_labels tables. Instead, we will store the values in a separate column using PostgreSQL arrays (see the sketch after this list). This approach is much faster than joining tables, and such arrays support all the operations we need for filtering. It also makes updating labels simple: just write the new labels in place into that array.
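
A sketch of what the two points look like in practice, using postgresql-simple (suggested by the FromRow derivations earlier on this page); the index name and function names are placeholders.

{-# LANGUAGE OverloadedStrings #-}

module SchemaSketch where

import Data.Text (Text)
import Database.PostgreSQL.Simple (Connection, Only (..), execute_, query)

-- | Point 1: index the textual key used to link issues to repos.
addRepoNameIndex :: Connection -> IO ()
addRepoNameIndex conn = () <$ execute_ conn
    "CREATE INDEX IF NOT EXISTS repos_name_idx ON repos (name)"

-- | Point 2: with labels stored as a PostgreSQL array, filtering needs no join;
-- the array operators replace the old issues_labels join table.
issuesWithLabel :: Connection -> Text -> IO [Only Text]
issuesWithLabel conn label =
    query conn "SELECT title FROM issues WHERE ? = ANY(labels)" (Only label)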

Database decision

We should decide which database we want to use for issue-wanted. We don't need something particularly powerful or secure, but we also don't want 💩

Possible candidates:

  • PostgreSQL
  • SQLite
  • RocksDB
  • acid-state
  • Raw files

Something else?

What we need to store (an approximation of our database schema):

  • List of issues for beginners for every Haskell GitHub repository
  • GitHub issue metadata (name, tags?, creation date, something else (to sort issues))
  • Categories for every project
  • User metadata (GitHub nickname, achievements)

It looks like some SQL DB is the way to go... But in that case we also need to choose a library...

Bring `three-layer` architecture to issue-wanted

See the three-layer repository for example.

Specifically, we need to bring the following things:

  • Our custom monad with the environment (see the Lib/App directory; a stripped-down sketch follows this list)
  • Error data types with helper functions (see Lib/App/Error.hs module)
  • Effects.Log module with logging
  • TOML configuration
  • Makefile for running commands smoothly
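
The core of this approach is a ReaderT-over-IO application monad. A stripped-down sketch; the real Env would also carry the DB pool, logger, and TOML-derived config, and the single field below is only a placeholder.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

module AppSketch where

import Control.Monad.IO.Class (MonadIO)
import Control.Monad.Reader (MonadReader, ReaderT, runReaderT)

-- | Application environment (placeholder field; the real one holds more).
data Env = Env
    { envDbConnectionString :: String
    }

-- | The application monad: ReaderT over IO, as in the three-layer repository.
newtype App a = App
    { unApp :: ReaderT Env IO a
    } deriving (Functor, Applicative, Monad, MonadIO, MonadReader Env)

runApp :: Env -> App a -> IO a
runApp env app = runReaderT (unApp app) env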

Document Postgres setup

Running stack build, I get the following error:

--  While building package postgresql-libpq-0.9.4.2 using:
      /tmp/stack-9b8112fa643e992a/postgresql-libpq-0.9.4.2/.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/setup/setup --builddir=.stack-work/dist/x86_64-linux/Cabal-2.4.0.1 configure --with-ghc=/home/simon/.stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc --with-ghc-pkg=/home/simon/.stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg --user --package-db=clear --package-db=global --package-db=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/pkgdb --libdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/lib --bindir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/bin --datadir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/share --libexecdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/libexec --sysconfdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/etc --docdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --htmldir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --haddockdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --dependency=Cabal=Cabal-2.4.1.0-9MZFDeNrcJI10bcroa6pq8 --dependency=base=base-4.12.0.0 --dependency=bytestring=bytestring-0.10.8.2 --dependency=unix=unix-2.7.2.2
    Process exited with code: ExitFailure 1

I spent a little time figuring out that on Ubuntu I need to apt install libpq-dev to compile this.

For running Postgres, I need to

$ sudo apt install postgresql postgresql-contrib
$ sudo service postgresql start
$ sudo -u postgres psql
postgres=# create database "issue-wanted";
postgres=# create user simon;
postgres=# grant all privileges on database "issue-wanted" to simon;

I modified pg_hba.conf with the lines

local all simon          trust
host all simon 0.0.0.0/0 trust

(This is a little unsafe, I realize, but for some reason 127.0.0.1/8 didn't cut it.)

and changed user=simon in config.toml and added listen_addresses = '127.0.0.1' in /etc/postgresql/10/main/postgresql.conf.

I then restarted Postgres and initialized the database manually:

$ sudo service postgresql restart
$ psql issue-wanted < sql/schema.sql
$ psql issue-wanted < sql/seed.sql
$ stack exec issue-wanted

At this point the /issues endpoint is responding positively!

Perhaps we should document some of this in README.md?

Discuss users UX

I would like users to be able to see their achievements based on their open-source contributions. This is the biggest motivation for people to do something. So, ideally, we should let users log into our application through GitHub.

Also, I'm not sure that we should synchronise contributors' past activity... Let's track everything starting from server start.

So I propose the following sync scheme:

  • Sync all activity for a user starting from server start and calculate all achievements
  • Sync achievements for users no more often than every 4 hours (?) (or use a smarter strategy, depending on user activity?...)
  • If a user doesn't do anything for 30 days, stop syncing them (to avoid wasting our resources)
  • Have a force-sync endpoint in the API

We should also assign points to every achievement and have a ranking table...

Setup testing suite

I would like to work on setting up the testing suite before I start making any serious changes to the code. Should I refer to the three-layer example tests? If so, will we need hedgehog for testing or is hspec enough?
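
If hspec turns out to be enough to start with, the skeleton is tiny. A generic example, not tied to any module in this repo:

module Main (main) where

import Test.Hspec (describe, hspec, it, shouldBe)

main :: IO ()
main = hspec $
    describe "issue-wanted placeholder specs" $
        it "sanity check: addition works" $
            (1 + 1 :: Int) `shouldBe` 2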

Issue-wanted endpoint URLs

What should the URLs for issue-wanted look like? So far, we have one endpoint:
~/issues/:issueId

Some other ones I think we need (sketched as an API type below):

~/issues/         -- returns all issues
~/issues/:label   -- returns all issues with the given label 
~/repos/          -- returns all repos
~/repos/:repoId   -- returns a repo with the Id
~/repos/:category -- returns all repos with the given category

Any more suggestions?
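
Assuming servant ends up being the web framework (not decided above), the list could be expressed roughly as the following API type. Representing label and category as query parameters rather than path captures is my own assumption, made to avoid clashing with the :issueId and :repoId captures; the JSON result types are placeholders.

{-# LANGUAGE DataKinds     #-}
{-# LANGUAGE TypeOperators #-}

module ApiSketch where

import Data.Text (Text)
import Servant.API

-- Placeholder result types; the real ones would be the Issue and Repo data types.
type IssueJson = Text
type RepoJson  = Text

type IssueWantedApi =
         "issues" :> QueryParam "label" Text :> Get '[JSON] [IssueJson]
    :<|> "issues" :> Capture "issueId" Int :> Get '[JSON] IssueJson
    :<|> "repos"  :> QueryParam "category" Text :> Get '[JSON] [RepoJson]
    :<|> "repos"  :> Capture "repoId" Int :> Get '[JSON] RepoJson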

Update the README file

I propose to add a better description in the README using the abstract of my GSoC proposal. Is this a good idea?

Describe SQL schema for storing data

What needs to be done here:

  • The schema should be written in raw SQL files in the sql/ directory in the project root. This directory should contain two files for now: schema.sql with the schema and drop.sql that removes the schema (useful for testing)
  • IW.Db.Schema module with helper functions (see three-layer for the example; one possible shape is sketched below)
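
One possible shape for those helpers, assuming postgresql-simple and reading the SQL files at runtime; the module and function names are placeholders.

{-# LANGUAGE OverloadedStrings #-}

module DbSchemaSketch where

import Data.String (fromString)
import Database.PostgreSQL.Simple (Connection, execute_)

-- | Run the whole schema file against the given connection.
-- Useful both for first-time setup and for tests (paired with drop.sql).
setupSchema :: Connection -> IO ()
setupSchema conn = do
    schema <- readFile "sql/schema.sql"
    _ <- execute_ conn (fromString schema)
    pure ()

-- | Drop everything; intended for the test suite.
teardownSchema :: Connection -> IO ()
teardownSchema conn = do
    dropSql <- readFile "sql/drop.sql"
    _ <- execute_ conn (fromString dropSql)
    pure ()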

Add `Dockerfile`

After adding the Makefile, I realized that the project uses Docker to run the PostgreSQL database. Is the Dockerfile going to be the same as the one in the three-layer repo?

Describe all API endpoints we need from GitHub

Basically we need:

  • List of all Haskell repositories (a Haskell sketch of this call is given after this list)
curl -H 'Accept: application/vnd.github.preview.text-match+json' 'https://api.github.com/search/repositories?q=language:haskell&order=desc'
  • For each repository: list of issues by given label (help wanted, good first issue)
    • TODO: insert Rest API method here
  • For each user: each PR for this user
    • TODO: insert Rest API method here
  • For each PR: which issue it closes
    • TODO: insert Rest API method here
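
For the first bullet at least, here is a hedged sketch of making that same search call from Haskell, using http-conduit's Network.HTTP.Simple rather than the github package (whose exact API isn't assumed here). GitHub requires a User-Agent header on API requests.

{-# LANGUAGE OverloadedStrings #-}

module GitHubSketch where

import qualified Data.ByteString.Lazy as BL
import Network.HTTP.Simple (getResponseBody, httpLBS, parseRequest, setRequestHeader)

-- | Fetch the first page of Haskell repositories from the GitHub search API
-- and return the raw JSON body.
searchHaskellRepos :: IO BL.ByteString
searchHaskellRepos = do
    request <- parseRequest
        "https://api.github.com/search/repositories?q=language:haskell&order=desc"
    let request' = setRequestHeader "User-Agent" ["issue-wanted"] request
    getResponseBody <$> httpLBS request'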

GitHub API Query Functions

I've made a fork of the project and have been testing the GitHub API query functions located in issue-wanted/src/IssueWanted/Search.hs. None of the functions return errors, which is great, but I'm not sure they are returning the right results. For example, the function fetchHaskellReposGFI, which returns all Haskell repositories with "good-first-issue" labels, gives a result count of 124, while fetchGoodFirstIssue, which is supposed to return all issues with Haskell language and the "good-first-issue" label, gives a result count of only 8. This seems odd to me, but I'm not sure. There may be a problem with the query strings passed to searchRepos or searchIssues; I can't be sure until I look more into the GitHub API documentation. I just wanted to get someone else's opinion on this.
