kowainik / issue-wanted
Web application to help beginners start contributing to Haskell projects
Home Page: https://kowainik.github.io/posts/gsoc2019
License: Mozilla Public License 2.0
Since we have join tables, I was thinking we should implement some join statements in the sql/join.sql file. Is this a good idea? I'm currently doing some research on SQL joins.
We want to store some information in our database about all Haskell repositories. We don't need to store everything about a repo, only the interesting parts. The schema of the repos table should be created using the squeal library.
I noticed something important:
CREATE TABLE IF NOT EXISTS repos
( id SERIAL PRIMARY KEY
, owner TEXT NOT NULL
, name TEXT NOT NULL
, descr TEXT NOT NULL
, categories TEXT ARRAY
);
CREATE TABLE IF NOT EXISTS issues
( id SERIAL PRIMARY KEY
, number INT NOT NULL
, title TEXT NOT NULL
, body TEXT NOT NULL
, repo_owner TEXT NOT NULL
, repo_name TEXT NOT NULL
, url TEXT NOT NULL
, labels TEXT ARRAY
);
For the repos table we don't have a url field because we agreed we could construct the URL on the frontend from the owner and name.
The issues table does have a url field, but we could remove it as we did for repos.
-- | Data type representing a GitHub issue.
data Issue = Issue
{ issueId :: Id Issue
, issueNumber :: Int
, issueTitle :: Text
, issueBody :: Text
, issueRepoOwner :: RepoOwner
, issueRepoName :: RepoName
, issueUrl :: Text
, issueLabels :: SqlArray Text
} deriving stock (Generic, Show, Eq)
deriving anyclass (ToJSON, FromRow, ToRow)
-- | Data type representing a GitHub repository.
data Repo = Repo
{ repoId :: Id Repo
, repoOwner :: RepoOwner
, repoName :: RepoName
, repoDescr :: Text
, repoCategories :: SqlArray Text
} deriving stock (Generic, Show, Eq)
deriving anyclass (ToJSON, FromRow, ToRow)
Notice the Repo type doesn't have url either. Which approach is better: constructing the URL on the frontend, on the backend, or storing it in the database?
We need a roundtrip property test to ensure the ToRow and FromRow instances of Repo are correct.
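A sketch of such a roundtrip test with Hedgehog's `tripping`, using a simplified stand-in Repo and hypothetical row conversion helpers; the real test would exercise the actual ToRow/FromRow instances against the full record:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
import Data.Text (Text)

-- Simplified stand-in for the real Repo type.
data Repo = Repo { repoOwner :: Text, repoName :: Text }
  deriving (Show, Eq)

-- Hypothetical row conversion helpers standing in for ToRow/FromRow.
toRow' :: Repo -> [Text]
toRow' (Repo owner name) = [owner, name]

fromRow' :: [Text] -> Maybe Repo
fromRow' [owner, name] = Just (Repo owner name)
fromRow' _             = Nothing

genRepo :: Gen Repo
genRepo = Repo
  <$> Gen.text (Range.linear 1 20) Gen.alphaNum
  <*> Gen.text (Range.linear 1 20) Gen.alphaNum

-- tripping checks that fromRow' (toRow' repo) == Just repo.
prop_repoRoundtrip :: Property
prop_repoRoundtrip = property $ do
  repo <- forAll genRepo
  tripping repo toRow' fromRow'

main :: IO ()
main = do
  _ <- check prop_repoRoundtrip
  pure ()
```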
There are several web frameworks to choose from.
I have not tested web frameworks for two years, but the ones I came to like back then were:
- snap, for being quite modular and not too magical
- scotty, for being minimalist
- servant, for being type-driven
The drawbacks I can think of are: snap may require heavier lock-in / investment in the framework (writing snaplets when necessary), scotty may require us to learn web-framework organization the hard way if the site grows beyond its original intent, and servant would, I think, require a web-app-like frontend and give less of a website feel.
Maybe this issue hasn't been posted because there is an implicit answer already?
Possible candidates:
postgresql-simple
squeal
opaleye
esqueleto
@rashadg1030 needs to decide which of these libraries he prefers to work with. But we can share our thoughts on them and discuss the options.
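As one data point for the comparison, a minimal postgresql-simple sketch; the connection string and the repos table columns here are assumptions based on the schema discussed above:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Database.PostgreSQL.Simple

main :: IO ()
main = do
  -- Connection string is an assumption; adjust to the local setup.
  conn <- connectPostgreSQL "dbname=issue-wanted"
  -- Fetch (owner, name) pairs from the repos table sketched above.
  repos <- query_ conn "SELECT owner, name FROM repos" :: IO [(String, String)]
  mapM_ print repos
```

postgresql-simple keeps the queries as plain SQL strings; squeal and opaleye would instead type-check the query against the schema, at the cost of a steeper learning curve.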
CREATE TABLE IF NOT EXISTS issues
( id SERIAL PRIMARY KEY
, number INT NOT NULL
, title TEXT NOT NULL
, body TEXT
, url TEXT NOT NULL
, owner TEXT NOT NULL
, repo_name TEXT NOT NULL
, labels TEXT ARRAY
);
I was wondering if we should change the name of the owner column to repo_owner?
How will this work?
- bump the base dependency, use cabal-version: 2.4, use common stanzas
- update stack.yaml (bump up the lts and library versions)
- update the .travis.yml file
As a reference, you can see the changes in the three-layer repository.
This issue is in reference to the creation of prototypes or wireframes for the frontend. It would aid in streamlining the backend as per the project's requirements. Do these wireframes exist, or are they in the process of development? If not, would it be desirable for me to start implementing them with the guidance of the respected contributors?
Hi all, I'm looking at trying this for Google Summer of Code. Currently, via the fetch functions in IssueWanted.Search, we can poll GitHub's API for issues regularly. Is this the permanent plan for keeping the cache up to date?
I ask because I've been prodding around the github library, and was wondering about the value of using webhooks to subscribe to particularly active repos (e.g. stack) to get more live updates on issues with valuable tags.
@chshersh Do you have any specific recommendations on how we want to keep items synced?
After #8 is done
We know for sure that we'd like to use some functional language for the frontend. What suits our needs best: PureScript, Elm, or something else? Pros and cons?
Which version of the GitHub API are we going to use?
GitHub provides access both in the form of REST (v3) and GraphQL (v4). Considering the amount of data we need to scrape for issues and cabal files, and the need to keep our database synced with repos, would it be beneficial to integrate GraphQL in the backend as well as the frontend? A single GraphQL call can replace multiple REST calls, thus reducing latency for both the database and the frontend, and keeping the rate limits imposed by GitHub under control.
The issueRepoOwner and issueRepoName fields in the Issue type need to be changed to their corresponding newtypes.
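A sketch of what those newtypes could look like, assuming Text underneath; field names and deriving choices here are assumptions, and the real versions would likely also derive the database field instances:

```haskell
{-# LANGUAGE DerivingStrategies         #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Text (Text)

-- Sketch only: accessor names are assumptions mirroring the
-- record style of the Issue and Repo types above.
newtype RepoOwner = RepoOwner { unRepoOwner :: Text }
  deriving stock   (Show)
  deriving newtype (Eq)

newtype RepoName = RepoName { unRepoName :: Text }
  deriving stock   (Show)
  deriving newtype (Eq)
```

Wrapping both fields in distinct newtypes means the compiler rejects code that passes an owner where a repo name is expected, which is easy to do when both are plain Text.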
Maybe we should have an admin page. Let's discuss what could be on it. I can think of the following:
I will describe later what functions to add.
I was wondering if it is a good time to work on the async worker that will populate our DB? Once we get a better idea of the data we need, I guess. I've implemented an endpoint for our server that touches the DB and everything, so I think this would be a good next step. And one thing about file structure: we have a file that holds the GitHub query functions at the path src/IW/Server/Search.hs. Is it fine for it to stay in the Server folder as it is now, or should it go into another folder called Async or Worker? For example, it could be src/IW/Worker/Search.hs. I'm not sure, because technically Search.hs does function on the server, but I think files in src/IW/Server should be related to the issue-wanted API.
I did some research on how to structure the database better for our use case. We need to make some adjustments so it works fast. At the same time, it will make our lives easier.
- The foreign key of the issues table should be not the id but repo.name. It's completely okay to have textual foreign keys. Just add an INDEX on the repos.name column (later we can see what columns we are using, so we can add more indexes for performance).
- The categories, labels, repos_categories and issues_labels join tables won't be needed. Instead, we will store them as separate columns using PostgreSQL arrays. This approach is much, much faster than joining tables, and such arrays support all the operations we need for filtering. This also means that updating labels is now a simple task: just write the new labels in place in that array.

We should decide which database we want to use for issue-wanted. We don't need something strong and secure, but we also don't want 💩
Possible candidates:
acid-state
Something else?
What we need to store (approximation of our database scheme):
Looks like some SQL DB is the way to go... But in that case we also need to choose a library...
See the three-layer repository for an example.
Specifically, we need to bring the following things:
- the Lib/App directory
- the Lib/App/Error.hs module
- an Effects.Log module with logging

Doing stack build, I get the following error:
-- While building package postgresql-libpq-0.9.4.2 using:
/tmp/stack-9b8112fa643e992a/postgresql-libpq-0.9.4.2/.stack-work/dist/x86_64-linux/Cabal-2.4.0.1/setup/setup --builddir=.stack-work/dist/x86_64-linux/Cabal-2.4.0.1 configure --with-ghc=/home/simon/.stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc --with-ghc-pkg=/home/simon/.stack/programs/x86_64-linux/ghc-8.6.5/bin/ghc-pkg --user --package-db=clear --package-db=global --package-db=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/pkgdb --libdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/lib --bindir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/bin --datadir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/share --libexecdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/libexec --sysconfdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/etc --docdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --htmldir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --haddockdir=/home/simon/.stack/snapshots/x86_64-linux/lts-13.26/8.6.5/doc/postgresql-libpq-0.9.4.2 --dependency=Cabal=Cabal-2.4.1.0-9MZFDeNrcJI10bcroa6pq8 --dependency=base=base-4.12.0.0 --dependency=bytestring=bytestring-0.10.8.2 --dependency=unix=unix-2.7.2.2
Process exited with code: ExitFailure 1
I spent a little time figuring out that on Ubuntu I need to apt install libpq-dev
to compile this.
For running Postgres, I need to
$ sudo apt install postgresql postgresql-contrib
$ sudo service postgres start
$ sudo -u postgres psql
postgres=# create database "issue-wanted";
postgres=# create user simon;
postgres=# grant all privileges on database "issue-wanted" to simon;
I modified pg_hba.conf
with the lines
local all simon trust
host all simon 0.0.0.0/0 trust
(This is a little unsafe, I realize, but for some reason 127.0.0.1/8 didn't cut it.)
I also changed user=simon in config.toml and added listen_addresses = '127.0.0.1' in /etc/postgresql/10/main/postgresql.conf.
I then restarted Postgres and initialized the database manually:
$ sudo service postgres restart
$ psql issue-wanted < sql/schema.sql
$ psql issue-wanted < sql/seed.sql
$ stack exec issue-wanted
At this point the /issues endpoint is responding positively!
Perhaps we should document some of this in README.md?
I would like users to have the ability to see their achievements based on their open-source contributions. This is the biggest motivation for people to do something. So we should have the ability to log in to our application through GitHub (in the perfect case).
Also, I'm not sure that we should synchronise contributors' past activity... Let's track everything starting from server start.
So I propose the following sync scheme:
We should also assign points to every achievement and have a ranking table...
I would like to work on setting up the testing suite before I start making any serious changes to the code. Should I refer to the three-layer example tests? If so, will we need hedgehog for testing, or is hspec enough?
These columns should exist only inside the database. They should be set to NOW() during creation, and updated_at should be updated automatically when we update the row. These columns will be useful later when we want to perform cleanup of our database.
We can use the GitHub API to retrieve the contents of a repo's Cabal file. For example, for pandoc: https://api.github.com/repos/jgm/pandoc/contents/pandoc.cabal
We can then use https://www.haskell.org/cabal/release/cabal-latest/doc/API/Cabal/Distribution-PackageDescription-Parsec.html to parse the file once we retrieve it.
Am I missing anything? Any suggestions?
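A minimal sketch of the parsing step with the Cabal library's parseGenericPackageDescriptionMaybe, assuming the file contents have already been fetched (note the contents API returns the file base64-encoded inside a JSON envelope, so it would need decoding first):

```haskell
import qualified Data.ByteString as BS
import Distribution.PackageDescription.Parsec (parseGenericPackageDescriptionMaybe)

main :: IO ()
main = do
  -- Assumes the .cabal file was already fetched and saved locally.
  contents <- BS.readFile "pandoc.cabal"
  case parseGenericPackageDescriptionMaybe contents of
    Just _pkg -> putStrLn "parsed the package description successfully"
    Nothing   -> putStrLn "failed to parse the .cabal file"
```

From the resulting GenericPackageDescription we could then pull out the fields we care about, such as the package name, synopsis, and category.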
What should the URLs for issue-wanted look like? So far, we have one endpoint:
~/issues/:issueId
Some other ones I think we need:
~/issues/ -- returns all issues
~/issues/:label -- returns all issues with the given label
~/repos/ -- returns all repos
~/repos/:repoId -- returns a repo with the Id
~/repos/:category -- returns all repos with the given category
Any more suggestions?
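If Servant were chosen (still an open question per the framework discussion above), the routes could be sketched as an API type; Issue and Repo are placeholders here:

```haskell
{-# LANGUAGE DataKinds     #-}
{-# LANGUAGE TypeOperators #-}

import Data.Text (Text)
import Servant.API

-- Placeholder types standing in for the real Issue and Repo.
data Issue
data Repo

type API =
       "issues" :> Get '[JSON] [Issue]                             -- ~/issues/
  :<|> "issues" :> Capture "issueId" Int :> Get '[JSON] Issue      -- ~/issues/:issueId
  :<|> "issues" :> QueryParam "label" Text :> Get '[JSON] [Issue]  -- ~/issues?label=...
  :<|> "repos"  :> Get '[JSON] [Repo]                              -- ~/repos/
  :<|> "repos"  :> Capture "repoId" Int :> Get '[JSON] Repo        -- ~/repos/:repoId
```

One thing this sketch surfaces: ~/issues/:issueId and ~/issues/:label are textually ambiguous as path segments (as are ~/repos/:repoId and ~/repos/:category), which is why the label filter is shown as a query parameter here.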
I propose to add a better description in the README using the abstract of my GSoC proposal. Is this a good idea?
All modules currently have the IssueWanted prefix in the library. Let's choose a shorter prefix before it's too late.
We will need to add functions that convert the github library types like Issue and Repo to our own types.
We're going to use the github package for GitHub bindings. This function might be useful:
I propose to put this function under the IssueWanted.Search module.
What needs to be done here:
- a sql/ directory in the project root; it should contain two files for now: schema.sql with the schema and drop.sql for removing the schema (useful for testing)
- an IW.Db.Schema module with helper functions (see three-layer for an example)
Refers to #33
After adding the Makefile, I realized that the project uses Docker to run the PostgreSQL database. Is the Dockerfile going to be the same as the one in the three-layer repo?
Basically we need:
- curl -H 'Accept: application/vnd.github.preview.text-match+json' https://api.github.com/search/repositories?q=language:haskell&order=desc
- labels (help wanted, good first issue)
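A sketch of that request in Haskell using http-conduit; the Accept header value is taken from the curl command above, and the User-Agent value is an assumption (GitHub rejects requests without one):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple

main :: IO ()
main = do
  request <- parseRequest
    "https://api.github.com/search/repositories?q=language:haskell&order=desc"
  let request' = setRequestHeaders
        [ ("Accept", "application/vnd.github.preview.text-match+json")
        , ("User-Agent", "issue-wanted")  -- any non-empty value works
        ] request
  response <- httpLBS request'
  -- Raw JSON body; the github package would give typed results instead.
  print (getResponseStatusCode response)
```

Filtering by the help wanted and good first issue labels would then happen either via the search query itself (label: qualifiers on the issues search endpoint) or after decoding the response.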
We need unit tests to make sure our SQL query statements work correctly.
I've made a fork of the project and have been testing the GitHub API query functions located in issue-wanted/src/IssueWanted/Search.hs. None of the functions return errors, which is great, but I'm not sure if they are returning the right results. For example, the function fetchHaskellReposGFI, which returns all Haskell repositories with "good-first-issue" labels, gives a result count of 124. The function fetchGoodFirstIssue, which is supposed to return all issues with the Haskell language and the label "good-first-issue", only gives a result count of 8. This seems odd to me, but I'm not sure. There may be a problem with the query strings passed to searchRepos or searchIssues, but I can't be sure until I look more into the GitHub API documentation. I just wanted to get someone else's opinion on this.
Start setting up the file structure for the async worker code once we've figured out the database.