open-next / losh-reporter

A locally executable script + Apache Jena + queries to generate (mostly statistical) reports on LOSH data in Markdown, with PDF & HTML exports.

License: GNU General Public License v3.0
Create a CLI command to run the reporter; include a little help.
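A minimal sketch of what such a CLI could look like, using Python's `argparse`. The flag and sub-command names mirror the invocation used elsewhere in this thread (`--log_to`, `--out`, `generate example html`); everything else is an assumption, not the actual `reporter/cli.py` implementation:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch; mirrors the flags seen in this thread.
    parser = argparse.ArgumentParser(
        prog="losh-reporter",
        description="Generate (mostly statistical) reports on LOSH data.")
    parser.add_argument("--log_to", choices=["console", "file"],
                        default="console", help="where to write log output")
    parser.add_argument("--out", default="output",
                        help="output directory for the generated report")
    sub = parser.add_subparsers(dest="command", required=True)
    gen = sub.add_parser("generate", help="generate a report")
    gen.add_argument("report", help="name of the report, e.g. 'example'")
    gen.add_argument("format", choices=["md", "html", "pdf"],
                     help="output format")
    return parser
```

Running `python cli.py --help` (and `python cli.py generate --help`) would then print the generated usage text for free.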
Moe and I tried, in parallel, to get the reporter running on our two machines, following the README of konekto/LOSH-Reporter/dev:
$ git clone git@github.com:konekto/LOSH-Reporter.git
$ cd LOSH-Reporter
$ # We had pandoc already installed, and installed poetry with:
$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python -
$ poetry --version
Poetry version 1.1.13
$ poetry shell
RuntimeError
Poetry could not find a pyproject.toml file in /bla/LOSH-Reporter or its parents
$ # Because we were on the `master` branch, so:
$ git checkout dev
$ poetry shell
source /home/bla/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/bin/activate.fish
Argument is not a number: ''
$ # <- error because I was in the fish shell; switching to BASH ...
$ bash
$ poetry shell
$ # Good this time!
$ poetry run python reporter/cli.py --log_to console --out output generate example html
...
please set environment variables for apache fuseki
...
$ # There was a mention in the error message about the file '.env.example', nice! :-)
$ cat .env.example
FUSEKI_URL=http://localhost:3030
FUSEKI_DATASET_NAME=loshrdf
$ export FUSEKI_URL=http://localhost:3030
$ export FUSEKI_DATASET_NAME=loshrdf
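For reference, the reporter presumably assembles its SPARQL endpoint from these two variables (the traceback further down shows it POSTing to `/loshrdf/sparql`). A stdlib sketch — the helper name is made up; the real code reads the variables via python-dotenv:

```python
import os

def fuseki_sparql_endpoint() -> str:
    # Hypothetical helper; the effect matches what '.env.example' suggests.
    url = os.environ["FUSEKI_URL"]               # e.g. http://localhost:3030
    dataset = os.environ["FUSEKI_DATASET_NAME"]  # e.g. loshrdf
    return f"{url}/{dataset}/sparql"
```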
$ # and we try again ...
$ poetry run python reporter/cli.py --log_to console --out output generate example html
Traceback (most recent call last):
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/util/connection.py", line 95, in create_connection
raise err
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/util/connection.py", line 85, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/lib/python3.9/http/client.py", line 1285, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.9/http/client.py", line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.9/http/client.py", line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.9/http/client.py", line 1040, in _send_output
self.send(msg)
File "/usr/lib/python3.9/http/client.py", line 980, in send
self.connect()
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connection.py", line 205, in connect
conn = self._new_conn()
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc234802250>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/requests/adapters.py", line 440, in send
resp = conn.urlopen(
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/urllib3/util/retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=3030): Max retries exceeded with url: /loshrdf/sparql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc234802250>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/USER/Projects/OSEG/repos/LOSH-Reporter/reporter/cli.py", line 59, in <module>
main()
File "/home/USER/Projects/OSEG/repos/LOSH-Reporter/reporter/cli.py", line 52, in main
report_generator.generate(args)
File "/home/USER/Projects/OSEG/repos/LOSH-Reporter/reporter/generator.py", line 97, in generate
results = request(query)
File "/home/USER/Projects/OSEG/repos/LOSH-Reporter/reporter/core/requester.py", line 48, in request
response = requests.request('POST', url, data={'query': query})
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/home/USER/.cache/pypoetry/virtualenvs/losh-reporter--EO10kDI-py3.9/lib/python3.9/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=3030): Max retries exceeded with url: /loshrdf/sparql (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc234802250>: Failed to establish a new connection: [Errno 111] Connection refused'))
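The `ConnectionRefusedError` at the bottom of the chain simply means nothing is listening on port 3030 yet. A quick stdlib check (a hypothetical helper, not part of the reporter) can verify that before running the CLI:

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False

# is_reachable("localhost", 3030) stays False until Fuseki is up
```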
$ # Ok... still something missing, maybe? *reading in README ...* ah, there is something about Fuseki (under # Development) ...
$ cd fuseki
$ docker-compose up
$ # Fails, because Docker is not set up to run as non-root, so:
$ sudo docker-compose up
...
Status: Downloaded newer image for secoresearch/fuseki:latest
Creating fuseki_fuseki_1 ... done
Attaching to fuseki_fuseki_1
fuseki_1 | ###################################
fuseki_1 | Initializing Apache Jena Fuseki
fuseki_1 |
fuseki_1 |
fuseki_1 | ###################################
fuseki_1 | sed: can't create temp file '/fuseki-base/configuration/assembler.ttlXXXXXX': Permission denied
fuseki_1 | sed: can't create temp file '/fuseki-base/configuration/assembler.ttlXXXXXX': Permission denied
fuseki_fuseki_1 exited with code 1
$ # Oops, what's that? Let's see ...
$ cd ..
$ grep -r XXXXXX
$ # no output; here we gave up for now
When I execute parts of the file, I get error messages; see below:

For `In [1]`:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 import reporter.core.requester as req
2 import reporter.generator as gen
3 import reporter.templater as templater
File ~/Documents/git/IPK/LOSH-Reporter/reporter/core/requester.py:8, in <module>
6 import os
7 import json
----> 8 from dotenv import load_dotenv
9 import requests
11 from reporter.core.errors import NoQueryProvided, RequestError, NoEnvironmentVariableProvided
ModuleNotFoundError: No module named 'dotenv'
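`No module named 'dotenv'` usually means the notebook kernel is not running inside the poetry virtualenv (where `python-dotenv` is installed). Two quick stdlib checks help diagnose this; the module names below are just examples:

```python
import sys
from importlib.util import find_spec

# Which interpreter is the kernel actually using? Inside the poetry env this
# should point into ~/.cache/pypoetry/virtualenvs/...
print(sys.executable)

def is_importable(module_name: str) -> bool:
    """Check whether a module can be imported, without importing it."""
    return find_spec(module_name) is not None
```

Starting Jupyter via `poetry run jupyter notebook` (or registering the poetry venv as a kernel) should make the `dotenv` import succeed.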
For `In [2]`:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [2], in <cell line: 23>()
21 y_selector="repoHosts.count"
22 acc = 5
---> 23 x, y = fetch_xy(x_selector, y_selector)
24 x, y = accum_below_abs(x, y, acc)
25 fig_01, ax_01 = piechart_create(
26 x,
27 y,
28 title="repo-hosts-percent",
29 label="repo-host",
30 legend_vals=True)
NameError: name 'fetch_xy' is not defined
For `In [3]`:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [1], in <cell line: 6>()
4 y_selector="repoHosts.count"
5 acc = 5
----> 6 x, y = fetch_xy(x_selector, y_selector)
7 x, y = accum_below_abs(x, y, acc)
8 fig, ax = barchart_create(x, y, title="repo-hosts", label="count")
NameError: name 'fetch_xy' is not defined
This continues for the others; `fetch_xy` seems to be the issue :)
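The `NameError` shows the notebook cells assume a `fetch_xy()` helper that is never defined or imported. Judging from selectors like `"repoHosts.count"`, it presumably queries the endpoint and extracts two parallel lists. A purely illustrative sketch of the extraction half, operating on already-fetched rows (the selector names, row shape, and function names are all assumptions about the intended behaviour):

```python
def dig(row: dict, dotted: str):
    """Follow a dotted selector like 'repoHosts.count' into a nested dict."""
    value = row
    for key in dotted.split("."):
        value = value[key]
    return value

def extract_xy(rows: list, x_selector: str, y_selector: str):
    """Return parallel x/y lists, one entry per result row."""
    xs = [dig(row, x_selector) for row in rows]
    ys = [dig(row, y_selector) for row in rows]
    return xs, ys

# Example rows shaped the way the selectors in the notebook suggest:
rows = [
    {"repoHosts": {"name": "github.com", "count": 120}},
    {"repoHosts": {"name": "wikifactory.com", "count": 45}},
]
x, y = extract_xy(rows, "repoHosts.name", "repoHosts.count")
# x == ["github.com", "wikifactory.com"], y == [120, 45]
```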
Thousand thanks to @MIE5R0 and Jan from TUB for the super rich feedback on the report! I'm attaching their commented version of the first report to this issue for reference.
Here's a ToDo list of points to resolve that I extracted from their comments and from a video call with them. All points are for @moedn unless otherwise stated
Ergänzungen LOSH-Report.docx
losh-report-08-2022_commented Robert_Jan.pdf
@hoijui quickly went through the current version → looks neat! Here are a few things I discovered; let's use this issue as a ToDo list and create dedicated issues on the individual items if needed
report.ipynb

- see #11 @hoijui
- In 2 and In 3 (and possibly also other ones) @hoijui
- Implement the templating logic with jinja2 to generate the md file
A `reporter` package holds the logic; the queries are stored in a separate folder.
→ Would it be possible to extend the fetchers (following a yet-to-be-written specification from my side) so that they would fetch more data and store it into RDF? We would, however, not change anything in the Wikibase module → the data would appear solely in RDF (getting this data to the frontend is subject to a larger change within the LOSH system). I could simply define the keys and update the ontology; however, I'd like to have your opinion on this first :)
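For illustration only: one very light-weight way fetchers could emit such extra data as RDF is plain N-Triples lines, with no library needed. The subject and predicate IRIs below are made up and would come from the updated ontology:

```python
def ntriple(subject_iri: str, predicate_iri: str, value) -> str:
    """Serialize one statement as an N-Triples line with a literal object."""
    if isinstance(value, int):
        obj = f'"{value}"^^<http://www.w3.org/2001/XMLSchema#integer>'
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{subject_iri}> <{predicate_iri}> {obj} ."

# Hypothetical IRIs, purely to show the shape of the output:
line = ntriple(
    "https://losh.example.org/project/42",
    "https://losh.example.org/ont/starCount",
    120,
)
```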
Implement plotting for the templates
See: https://swcarpentry.github.io/python-novice-gapminder/09-plotting/
The reports will be used as PDF or HTML documents.
Extend the reporter to use pandoc to export both formats.
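A sketch of how the pandoc invocation could be assembled; the file names are placeholders, and this assumes `pandoc` is on the PATH (and, for PDF, that a LaTeX engine is available):

```python
import subprocess

def pandoc_command(md_file: str, out_file: str) -> list:
    """Build a pandoc command for the Markdown report; pandoc infers the
    output format (html/pdf) from the output file extension."""
    return ["pandoc", "--standalone", md_file, "--output", out_file]

def export_report(md_file: str) -> None:
    # Hypothetical wrapper: export both formats from the same source file.
    for out in (md_file.replace(".md", ".html"),
                md_file.replace(".md", ".pdf")):
        subprocess.run(pandoc_command(md_file, out), check=True)
```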
To enable richer reports, it would be super helpful to add some wikifactory (WIF) -specific fields
I've got the following reply from WIF:
Hi Hi Moe! :)
I have attached a yaml file with all the fields associated to the projects that can be obtained from the Wikifactory GraphQl API.
This API can be accessed by going to https://wikifactory.com/api/graphql
However, I believe it is worth knowing how I got that list of fields.
If you go to the API URL, on the right side you will see a "Docs" button.
It opens a "Documentation Explorer" view, that allows you to inspect the data schema provided by Wikifactory. Among others, you can search for "Project" (notice the capital P). You should then see something similar to the attached screenshot, with all the fields shown in an interactive way.
Is that the information you were looking for?
If that is the case, let me know if you need help to operate with the API.
Otherwise, please do let me know what you are missing, and I will try to better help you.
Cheers!
Their YAML-Export:
FeaturedIn: String
id: ID!
type: String
slug: String
creatorId: Int!
createdInRegion: String
spaceId: Int
dateCreated: DateTime
lastUpdated: DateTime
whitelabel: String
whiteLabelOnlyContent: Boolean
lastCommentedAt: DateTime
lastActivityAt: DateTime!
commentsCount: Int
likesCount: Int
followersCount: Int
score: Float!
pageviewsCount: Int!
publicRead: Boolean!
registeredRead: Boolean!
contentPtrId: ID!
name: String
description: String
imageId: Int
license: License
contributionCount: Int
archiveDownloadCount: Int!
contextId: Int!
projectType: project_type!
importStatus: ImportStatus
importJobId: String
slackThreadTs: String
slackContributionThreadTs: JSONString!
isExactForkCopy: String
hasImage: String
starCount: Int
headContribution: String
canAppearOnHome: String
hasContributions: String
featuredIn: String
image: File
phase: ProjectPhase
context: Context
creator: User
space: Space
followers(sortBy: String, filterBy: [[String]], contains: [String], notContains: [[String]], whitelabels: [String], before: String, after: String, first: Int, last: Int): ProfileConnection
tags: [Tag]
collections: [Collection]
comments(sortBy: String, origin: ID, before: String, after: String, first: Int, last: Int): CommentConnection
contributions(projectId: ID, sortBy: String, before: String, after: String, first: Int, last: Int): ContributionConnection
contentType: String
parentSlug: String
isPrivate: Boolean
snippet: String
inviteLink: InviteLink
socialAccounts: [Social]
forum: Forum
followingCount: Int
canUpdate: Boolean
canDelete: Boolean
content: Content
parentContent: Content
inSpace: Space
avatar: File
imageFallbackChar: String
title: String
commenters: [Profile]
pageViews: Int
descriptionSnippet: String
private: Boolean
forkedFrom: DiffInfo
pendingOperations: [ContribOp]
conflicts: [Conflict]
conflictsParent: String
contributionUpstream: Contribution
lastZipGenerated: Boolean
tracker: Tracker
creatorProfile: Profile
isStarred: Boolean
forkCount: Int
contribution(version: String): Contribution
contributors(sortBy: String, filterBy: [[String]], contains: [String], notContains: [[String]], whitelabels: [String], before: String, after: String, first: Int, last: Int): ProfileConnection
fileHistory(uuid: String, before: String, after: String, first: Int, last: Int): OpsConnection
This is a copy, from the report, of all sections that couldn't be filled with data, including the developer's comments on this:
## File Types
- search for MIME types of associated source and export files
- also state for how many projects source files are even defined
**NOTE: From here on, there is no more data, because the required fields do not exist.**
# Platform-specific Insights
## OSHWA
The Open Source Hardware Association (OSHWA) runs a certification program for OSH. Certified projects are officially deemed to be fully compliant with the OSHWA definition of OSH ([ref](https://www.oshwa.org/definition/)).
@fig:oshwa-cert-cumul shows the historical development of certificates issued in the past years
# <!-- TODO Robin (NO-CAN-DO: No creation-time data) line plot of cumulated certifications over time (x-axis could be e.g. per quarter, but that's totally optional) -->
# #fig:oshwa-cert-cumul
@fig:oshwa-cert-rate and @fig:oshwa-cert-growth show the derived certification rate and relative growth.
# <!-- TODO Robin (NO-CAN-DO: No creation-time data) line plot of (certifications per year) over time -->
# #fig:oshwa-cert-rate
# <!-- TODO Robin (NO-CAN-DO: No creation-time data) horizontal bar chart with growth of (certifications per year) relative to the year before-->
# #fig:oshwa-cert-growth
A total of {number-of-biggest-OSHWA-licensors-adding-up-to-50%} creators combine to certify ~50 % of all currently OSHWA-certified projects. So {number-of-biggest-OSHWA-licensors-adding-up-to-50% / oshwa-unique-licensors-total} % of creators make up half of OSHWA's database. On the other end, we have {number-of-OSHWA-licensors-with-only-1-certificate} holders of a single certificate ( % of all creators). This calculates to a median of nearly {OSHWA-median-certificates-per-licensor} certificates per participating individual or organization.
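The placeholder figures above reduce to one small computation over the list of certificates per licensor; a sketch with made-up example data:

```python
from statistics import median

def licensors_covering_half(certs_per_licensor: list) -> int:
    """Smallest number of top licensors whose certificates sum to >= 50 %."""
    counts = sorted(certs_per_licensor, reverse=True)
    half, running = sum(counts) / 2, 0
    for i, count in enumerate(counts, start=1):
        running += count
        if running >= half:
            return i
    return len(counts)

counts = [30, 10, 5, 2, 1, 1, 1]            # made-up certificate counts
top = licensors_covering_half(counts)       # 1 licensor covers 30 of 50
single = sum(1 for c in counts if c == 1)   # holders of a single certificate
med = median(counts)                        # median certificates per licensor
```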
@fig:oshwa-cert-rate illustrates the distribution of certifications among creators.
# <!-- TODO Robin vertical bar chart with certifications per licensor-->
# #fig:oshwa-cert-rate
Since OSHWA also publishes the location of the certified OSH projects (or rather of the corresponding team or organisation, presumably), we can also see how these certifications distribute among countries, as shown in @fig:oshwa-cert-country and @fig:oshwa-cert-country-map.
# <!-- TODO Robin (NO-CAN-DO: No geo-location data) vertical bar chart with certifications per country -->
# #fig:oshwa-cert-country
# <!-- TODO Robin (NO-CAN-DO: No geo-location data) world map with certifications per country (see https://i0.wp.com/oshdata.wpcomstaging.com/wp-content/uploads/2020/09/cert-density-by-country.png?resize=768%2C467&ssl=1)-->
# #fig:oshwa-cert-country-map.
<!--- NOTE: the world map is totally optional; if that's too hard to implement → no biggy -->
## Wikifactory
Wikifactory is an online platform dedicated to OSH. It is designed to meet the specific needs, and offer the specialised services, that arise in the development (and production) process of OSH. The platform itself is not open source, but free to use (there are, however, some cool premium features).
It is important to mention that lots of projects on Wikifactory are not open source and hence are not considered in the following statistics.
@fig:wif-proj-hist-cumul illustrates the historical growth of OSH projects on Wikifactory; @fig:wif-proj-tag-cloud shows the most popular tags of those projects in a word cloud.
# <!-- TODO Robin (NO-CAN-DO: No creation-time data) line plot of cumulated project creations over time (`dateCreated`) (x-axis could be e.g. per quarter, but that's totally optional) -->
# #fig:wif-proj-hist-cumul
# <!-- TODO Robin (NO-CAN-DO: No tag data) word cloud of most used tags -->
# #fig:wif-proj-tag-cloud
The following sections aim to give you a feeling for the OSH projects hosted on that platform :)
### Project Locations
Since some users on Wikifactory also specify the location of their project, we can also see how those projects distribute among countries, as shown in @fig:wif-proj-country and @fig:wif-proj-country-map.
# <!-- TODO Robin (NO-CAN-DO: No geo-location data) vertical bar chart with projects per country (`createdInRegion`) -->
# #fig:wif-proj-country
# <!-- TODO Robin (NO-CAN-DO: No geo-location data) world map with projects per country (see https://i0.wp.com/oshdata.wpcomstaging.com/wp-content/uploads/2020/09/cert-density-by-country.png?resize=768%2C467&ssl=1)-->
# #fig:wif-proj-country-map
<!--- NOTE: the world map is totally optional; if that's too hard to implement → no biggy -->
### Most Downloaded
It may be safe to say that OSH projects generally aim to be replicated and therewith have relevant applications in practice, improving people's lives (in whichever way).
An indicator for the practical replication of a project is the number of downloads of its technical documentation.
On average, an OSH project on Wikifactory is downloaded {average-archiveDownloadCount-WIF} times; [{name-of-WIF-project-with-most-downloads}]({repoURL-of-WIF-project-with-most-downloads}) is currently the most downloaded project, now totalling {archiveDownloadCount-of-project-with-most-downloads-WIF} downloads.
@fig:wif-downloads-dist shows the distribution of downloads per project, @fig:wif-downloads-top20 the top 20 most downloaded OSH projects on Wikifactory.
# <!-- TODO Robin (NO-CAN-DO: No view data) horizontal bar chart with distribution of downloads per project-->
# #fig:wif-downloads-dist
# <!-- TODO Robin (NO-CAN-DO: No view data) vertical bar chart with top 20 projects with most downloads-->
# #fig:wif-downloads-top20
### Most Viewed
The popularity of open source projects depends on many factors, not only the awesomeness of its technical solution. Especially for commercialised projects, this may be an important performance indicator; and of course, popular projects may have an easier game when it comes to actual community building.
An indicator for how much attention a project receives is its number of views.
On average, an OSH project on Wikifactory is viewed {average-pageviewsCount-WIF} times; [{name-of-WIF-project-with-most-views}]({repoURL-of-WIF-project-with-most-views}) is currently the most viewed project, now totalling {pageviewsCount-of-project-with-most-views-WIF} views.
@fig:wif-views-dist shows the distribution of total views per project, @fig:wif-views-top20 the top 20 most viewed OSH projects on Wikifactory.
# <!-- TODO Robin (NO-CAN-DO: No view data) horizontal bar chart with distribution of total views per project-->
# #fig:wif-views-dist
# <!-- TODO Robin (NO-CAN-DO: No view data) vertical bar chart with top 20 projects with most views-->
# #fig:wif-views-top20
### Most Contributions
Development on Wikifactory is version-controlled per "contribution" made (roughly equivalent to "commits" on git-based systems).
An indicator for how much development work has been carried out on a single project is its number of contributions.
On average, an OSH project on Wikifactory consists of {average-contributionCount-WIF} contributions; [{name-of-WIF-project-with-most-contributions}]({repoURL-of-WIF-project-with-most-contributions}) is currently the project with the most contributions, now totalling {contributionCount-of-project-with-most-contributions-WIF} contributions.
@fig:wif-contributions-dist shows the distribution of contributions per project, @fig:wif-contributions-top20 the top 20 OSH projects with the most contributions on Wikifactory.
# <!-- TODO Robin (NO-CAN-DO: No contributors data) horizontal bar chart with distribution of contributions per project-->
# #fig:wif-contributions-dist
# <!-- TODO Robin (NO-CAN-DO: No contributors data) vertical bar chart with top 20 projects with most contributions-->
# #fig:wif-contributions-top20
### Most Contributors
Community is an essential factor for successful open source projects.
An indicator for the breadth of a single project's developer base is its number of contributors.
On average, an OSH project on Wikifactory has {average-contributors-WIF} contributors; [{name-of-WIF-project-with-most-contributors}]({repoURL-of-WIF-project-with-most-contributors}) is currently the project with the most contributors, now totalling {contributors-of-project-with-most-contributors-WIF} contributors.
@fig:wif-contributors-dist shows the distribution of contributors per project, @fig:wif-contributors-top20 the top 20 OSH projects with the most contributors on Wikifactory.
# <!-- TODO Robin (NO-CAN-DO: No contributors data) horizontal bar chart with distribution of contributors per project-->
# #fig:wif-contributors-dist
# <!-- TODO Robin (NO-CAN-DO: No contributors data) vertical bar chart with top 20 projects with most contributors-->
# #fig:wif-contributors-top20
Implement query logic for the reporter for Apache Fuseki
curl http://localhost:3030/vcard/ -X POST --data 'query=PREFIX+vCard%3A++++++%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2Fvcard-rdf%2F3.0%23%3E%0A%0ASELECT+%3Fy+%3FgivenName%0AWHERE%0A+%7B+%3Fy+vCard%3AFamily+%22Smith%22+.%0A+++%3Fy+vCard%3AGiven++%3FgivenName+.%0A+%7D' -H 'Accept: application/sparql-results+json,*/*;q=0.9'
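Decoded, that URL-encoded payload is an ordinary SPARQL query; the stdlib makes the round-trip easy (the reporter itself does the same POST via `requests` in `reporter/core/requester.py`):

```python
from urllib.parse import unquote_plus, urlencode

# The exact payload from the curl example above:
encoded = ("PREFIX+vCard%3A++++++%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2F"
           "vcard-rdf%2F3.0%23%3E%0A%0ASELECT+%3Fy+%3FgivenName%0AWHERE%0A"
           "+%7B+%3Fy+vCard%3AFamily+%22Smith%22+.%0A+++%3Fy+vCard%3AGiven"
           "++%3FgivenName+.%0A+%7D")
query = unquote_plus(encoded)
print(query)
# PREFIX vCard:      <http://www.w3.org/2001/vcard-rdf/3.0#>
#
# SELECT ?y ?givenName
# WHERE
#  { ?y vCard:Family "Smith" .
#    ?y vCard:Given  ?givenName .
#  }

# And going the other way when building a request body:
body = urlencode({"query": query})
```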