codeforamerica / cfapi


This project forked from chihacknight/civic-json-worker


The Code for America API. Tracks and motivates activity and participation across the civic technology movement.

Home Page: http://codeforamerica.org/api

License: MIT License

Python 90.89% Mako 0.11% CSS 0.22% HTML 8.67% PLpgSQL 0.12%


cfapi's Issues

Record timestamp when a Brigade first appears

record timestamp when a Brigade first appears // how is babby formed

This is when the six-month clock begins.

Create an easy way to add projects

Description

Currently, a Brigade has to choose between the API reading from one GitHub organization URL like https://github.com/sfbrigade or from a hand-curated list of GitHub repo URLs like https://docs.google.com/spreadsheet/pub?key=0AlcMlEZm_nJ0dEt4d29jYjdfcWNPYVo5T1BOWk5lTGc&output=csv

We should make it easier to do both.

First version

A Captain could point us to one list containing both organization urls and project repo urls.

Second version

An easy-to-use form that takes in either organization URLs or project repo URLs and lets you select a Brigade to list them with. We could use a Google Form with custom styling, like we did for http://codeforamerica.github.io/story-bucket/

Change Type column to be a list of types

Description

Organizations should be able to have more than one Type. Change the documentation to tell people they can put in a comma separated list of types.

Example

Code for Germany should be Type = "Brigade, Code for All"
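
A minimal sketch of how a comma-separated Type cell could be split into a list. The function name `parse_types` is hypothetical, and it assumes the spreadsheet hands us the cell as a plain string.

```python
def parse_types(cell):
    """Split a comma-separated Type cell into a clean list of types."""
    # Strip whitespace around each entry and drop empty entries
    # left behind by trailing or doubled commas.
    return [part.strip() for part in cell.split(",") if part.strip()]
```

For the example above, `parse_types("Brigade, Code for All")` gives `["Brigade", "Code for All"]`.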

Stories is causing errors in production

Figure out what's going on.

Add requirement that Brigade names contain only `[0-9A-Za-z ]`

Follow-on to #40 .
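
One way this check might look, as a sketch. The function name `is_valid_brigade_name` is hypothetical; the character class is the one from the issue title.

```python
import re

# Only digits, ASCII letters, and spaces are allowed, per the issue title.
VALID_NAME = re.compile(r"[0-9A-Za-z ]+")


def is_valid_brigade_name(name):
    """Return True if the whole name uses only the allowed characters."""
    return VALID_NAME.fullmatch(name) is not None
```

Note that a rule this strict would reject names like OK Lab Münster.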

More resilient process for updating organizations

Description

The current run_update.py grabs all the info about all the organizations and then saves them to the database. This means nothing gets updated on an error or if we get rate limited.

A better run_update flow is to:

Get an organization's stories, save them to the db.
Get an organization's events, save them to the db.
Get an organization's projects, save them to the db.
Repeat with a new organization.

If we are able to clearly log which organization is causing an error, then we choose the next org at random.
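
The flow above could be sketched roughly like this. It is not the actual run_update.py code: `update_all`, `fetchers`, and `save` are hypothetical names, and the sketch assumes each fetcher raises an exception when it fails or gets rate limited.

```python
import logging
import random


def update_all(organizations, fetchers, save):
    """Update organizations one at a time so one failure cannot block the rest.

    organizations: list of organization names.
    fetchers: dict mapping "stories", "events", "projects" to a function
        that takes an organization name and returns a list of items.
    save: function(org, kind, items) that writes one batch to the db.
    Returns the names of the organizations that failed.
    """
    failed = []
    remaining = list(organizations)
    # Random order, so a single bad org does not always starve the same ones.
    random.shuffle(remaining)
    for org in remaining:
        try:
            for kind in ("stories", "events", "projects"):
                # Save each batch as soon as we have it.
                save(org, kind, fetchers[kind](org))
        except Exception:
            # Log which organization caused the error, then move on.
            logging.exception("Update failed for %s", org)
            failed.append(org)
    return failed
```

Because each batch is saved right away, an error partway through only loses the work for the organization that failed.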

organizations.geojson failing

Description

The geojson endpoint is failing. We're getting no info on the Brigade website. We should fix the geojson endpoint so that it doesn't fail when it doesn't gather every Brigade's info.

Clues

I manually ran the run_update.py script again, just after it had run already. We got throttled by GitHub.

Add links to upcoming_events and past_events in the Organization

Description

All the possible endpoints should be followable from within the API itself. Fielding called it HATEOAS.

Files

app.py

Tasks

Add in "upcoming_events" : "http://codeforamerica.org/api/organizations/Beta_NYC/upcoming_events"
Add in "past_events" : "http://codeforamerica.org/api/organizations/Beta_NYC/past_events"

Produce project_details.json files

Description

The civic-json-worker began by producing flat files. We've since started using a database and API instead. Producing and versioning the original project_details.json files still opens up lots of interesting possibilities, though. These are described more in the discussion of #9.

I've pulled the production of these files and their uploading to S3 into a separate script. It currently doesn't work. I just copied and pasted all the relevant code over, and it needs structure and cleanup.

Tasks

Rewrite civic_json_transform.py to actually produce json files.
Have these files upload to S3.
Set up Heroku Scheduler to run this script once a day.
Figure out how to version them ...

Modify code to work for any city/group

@jpvelez and I started a discussion on this point over here and it's probably worth making note of this over here, too since it seems like that's at least part of the aim here.

Create a /brigades endpoint

Description

Create a /brigades endpoint. Something like:

{
  "name" : "Code for Kansas City",
  "website" : "http://codeforkc.org",
  "events_url" : "http://www.meetup.com/kcbrigade/",
  "next_event" : { ... },
  "all_events" : [ ... ],
  "rss" : " ... ",
  "latest_story" : { ... },
  "all_stories" : [ ... ],
  "projects_list_url" : " ... ",
  "two_active_projects" : [ ... ],
  "all_projects" : [ ... ],
  "location" : {
      "latitude": 38.8721061,
      "longitude": -94.6079577,
      "name": "Kansas City, KS & MO"
      },
  "core_roles" : {
      "captain" : { ... },
      "delivery_lead" : { ... },
      "storyteller" : { ... },
      "community_organizer" : { ... }
      },
  "members" : [ ... ]
}

A few questions:

Should we call it /brigades or /orgs? I want to include Fellowship projects and maybe others.
What's a good way to migrate this data over from the current Brigade site? Possibly more than once, as new members will keep joining.
Any other ways to structure this that help how the Brigade admin team keeps track of things?

Change timestamps on events to correct or no timezone

Event timestamps are currently all GMT, e.g. at http://civic-tech-movement.codeforamerica.org/api/organizations/Open%20Oakland:

"start_time": "Wed, 25 Mar 2015 01:30:00 GMT"
"start_time": "Wed, 18 Mar 2015 01:30:00 GMT"

These should be changed in one of two ways. If we know the local timezone, we should express them in local time with the correct timezone. If we can’t determine the local timezone, we should instead express them as unix epoch values for ease of parsing and use.
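
A rough sketch of both options, assuming Python 3. The function name `convert_start_time` is hypothetical, and how we would learn an organization's timezone is left open.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def convert_start_time(gmt_string, tz=None):
    """Convert an HTTP-style GMT timestamp string.

    With a tzinfo object, return an ISO 8601 string in that local time.
    Without one, fall back to an integer unix epoch value.
    """
    parsed = parsedate_to_datetime(gmt_string)
    if parsed.tzinfo is None:
        # Treat a timestamp with no usable zone as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if tz is None:
        return int(parsed.timestamp())
    return parsed.astimezone(tz).isoformat()
```

On Python 3.9 or later, the `tz` argument could be something like `zoneinfo.ZoneInfo("America/Los_Angeles")` for Open Oakland, which would handle daylight saving time for us.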

Replace Celery with Scheduler

Description

We'll be hosting this app on Heroku. They offer a Scheduler add-on that works similarly to cron jobs. We'll replace the Celery code with this Scheduler instead.

Tasks

Remove celery.sh and celeryconfig.py
Remove the celery dependency in tasks.py
Set up Heroku Scheduler to run the update_projects function in tasks.py

Have the collections be returned in order by name or id

Description

Currently, the API returns its objects in a random order. For example, /organizations is giving us

{
  "num_results": 54, 
  "objects": [
    {
      "city": "Ann Arbor, MI", 
      "id": 1, 
      ...
    }, 
    {
      "city": "Anchorage, AK", 
      "id": 5, 
      ...
    }, 
    {
      "city": "New York, NY", 
      "id": 2, 
      ...
    }

We should return these in order by either id or alphabetically.
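
As a sketch of the ordering itself, independent of flask-restless. The helper name `ordered` is hypothetical, and it works on plain dicts like the ones in the response above.

```python
def ordered(objects, key="id"):
    """Return the objects sorted by one of their fields, such as id or city."""
    return sorted(objects, key=lambda obj: obj[key])
```

In the real app it would probably be better to do the ordering in the database query rather than in Python.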

Files

app.py

Research

Review Preprocessors in flask-restless for how to do this.

Fix URLs in JSON links

Now:

  "all_events": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/events", 
  "all_projects": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/projects", 
  "all_stories": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/stories",

Need to encode brigade names to make the URLs valid, and should remove or change hostname in URL to correctly reflect local testing environments or possible domain name change.
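
A minimal sketch of building encoded, host-free links with the standard library. The function name `organization_links` and the `prefix` default are assumptions, not existing code.

```python
from urllib.parse import quote


def organization_links(name, prefix="/api/organizations"):
    """Build relative, percent-encoded links for one organization."""
    # safe="" makes sure characters like "/" in a name get encoded too.
    base = "{}/{}".format(prefix, quote(name, safe=""))
    return {
        "all_events": base + "/events",
        "all_projects": base + "/projects",
        "all_stories": base + "/stories",
    }
```

Leaving the hostname out means the same links work in local testing and survive a domain name change.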

Could be:

  "all_events": "/api/organizations/Code%20for%20America/events", 
  "all_projects": "/api/organizations/Code%20for%20America/projects", 
  "all_stories": "/api/organizations/Code%20for%20America/stories",

Pull projects from a Brigade's GitHub account

To make it even easier for a Brigade to get their projects listed in our api, we could just ask each Brigade for their official GitHub repo and pull in all the projects from there.

They could still also supply a list to add in Brigade projects which live in other people's repos.

There are a few downsides to this though. A Brigade wouldn't have granular control over what was included. They couldn't easily tell us to not include certain projects. Also, not every Brigade has an official repo, though I will be encouraging them all to make one.

To Dos

Add a GitHub org name column to the Brigade Information sheet.
Use the Organization Repos GitHub endpoint to get all those projects.
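
A small sketch of turning a Brigade's GitHub organization URL into the API URL for listing its repos. The function name `org_repos_endpoint` is hypothetical; the `/orgs/{org}/repos` path is GitHub's list-organization-repositories endpoint. The sketch only builds the URL and makes no request.

```python
from urllib.parse import urlparse


def org_repos_endpoint(github_url):
    """Map https://github.com/<org> to the GitHub API URL for its repos."""
    path = urlparse(github_url).path.strip("/")
    org = path.split("/")[0]
    if not org:
        raise ValueError("No organization name in %r" % github_url)
    return "https://api.github.com/orgs/{}/repos".format(org)
```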

Weird warnings in run_update.py

INFO:root:Gathering all of Code for Durham's stories.
INFO:root:Gathering all of Code for Durham's events.
/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py:207: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return x == y
/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/session.py:1394: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif state.key != instance_key:
INFO:root:Gathering all of OK Lab Münster's stories.

Multiple lists of Organizations

Description

Currently, we use one master list of Organizations. We should instead have multiple lists of organizations of different types. We could have a list of all the Fellowship teams, a list of all of our non-profit partners, a list of all of our funders, a list of all the Code for All groups. Each of these will be in a different GDoc with different editing permissions.

Files

run_update.py

Tasks

Have run_update.py read from a list of gdoc links, instead of just the one.
Create Google Spreadsheets for each type of organization we want to include in the API.
The different types of orgs would be indicated in the type column at /organizations.

Conform to schema.org standards?

@noneck raised this in a comment here
BetaNYC/civic.json#5 (comment)

so I just wanted to open an issue, since I'm worried that this implementation and the data standard discussion are starting to become detached.

Add in Eventbrite Support

Description

Some Brigades use Eventbrite to schedule their events. We should include them in the Calendar as well.

How to

I haven't been able to find a way to use the Eventbrite API to get an organizer's events. They would need to give us permission somehow.

The workaround is to use the search API. You can search by the organizer's name that way.
For example:
https://www.eventbrite.com/json/event_search?organizer=open-city&app_key=UW4C24YXU2XK3G6AD6
returns

{
  events: [
    event: {
      title: "Open Gov Hack Night",
      start_date: "2013-03-05 18:00:00",
      status: "Started",
      description: "...",
      end_date: "2013-03-05 22:00:00",
      repeat_schedule: "weekly-1-N,Y,N,N,N,N,N-12/23/2014",
      created: "2012-11-01 08:35:46",
      url: "http://opengovhacknight.eventbrite.com/?aff=SRCH",
      modified: "2014-04-07 06:01:06",
      repeats: "yes"
    }
  ]
}

You'll notice though that the dates on there are way old, but it repeats weekly. We'll need to parse that ourselves. 😦

Notes

Only Open Gov Hack Night uses Eventbrite so far, so no rush yet.

Add a geojson option to Brigades?

Should it be returned as geojson? We could then add all the Brigades to the map as one feature layer instead of adding each marker one by one.

@migurski Says: Maybe use both /organizations.geojson and /organizations.json ?

Requested Enhancements

Search for projects by attribute. @MatthewLoveless wants to get all the projects started in 2014.
Filter out forked projects? https://twitter.com/_jden/status/458665275978969088 @jden

Check for Organization's complete info before working on it

Description

The run_update.py script fails if an organization doesn't have a latitude or longitude.

Reproduce

Someone added Code for Anchorage a second time to the Brigade Information spreadsheet. They added it with only the City column filled out. When run_update.py tried to gather all the info about this second Anchorage, it crashed on trying to enter it into the database with an empty string instead of a float for the latitude and longitude.

To Fix

Check that an Organization's information is complete on the Brigade Information spreadsheet before gathering all its stories, projects, and events.

Files

run_update.py

Tasks

At line 505, when main() loops over each org, add a check that the correct type of data is in each column of an organization.
Break out of the loop if data is missing.
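
A sketch of what the check might look like. The column names "latitude" and "longitude" and the function name `has_valid_location` are assumptions about how the spreadsheet rows are read.

```python
def has_valid_location(row):
    """Return True if the row has usable numeric latitude and longitude."""
    try:
        lat = float(row.get("latitude", ""))
        lon = float(row.get("longitude", ""))
    except (TypeError, ValueError):
        # Empty strings, None, and non-numeric text all end up here.
        return False
    # Also reject values that are numeric but not real coordinates.
    return -90 <= lat <= 90 and -180 <= lon <= 180
```

One open design choice: the loop in main() could skip an incomplete organization and carry on with the next one, instead of breaking out entirely, so that one bad row doesn't stop the others from updating.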

Allow for pretty urls at /organization

In reference to this front end request, we should allow for pretty urls in the api too.

Check out flask-restless preprocessors to see how this works.

Change name of `recent_events` to `upcoming_events`

They’re upcoming events, right?

Test fastest way to get recent activity

Description

In #19 we want to have only the most recent stories, events, and projects be shown on the /organizations endpoint.

Options

I see two ways to do this.

Create three new tables recent_stories, latest_events, updated_projects. These will get populated once an hour by new code in the run_update.py script that looks at all the stories, events, and projects for an organization and chooses the two most recent. Then we create relationships from the Organization table to those tables, instead of the full stories, events, and projects tables.
Use flask-restless' preprocessors to get the two most recent things on each request.

Include “hack” issues in project needs

Open ATX is using the label on their issues:

http://open-austin.github.io/hack-task-aggregator/public/index.html

Add the sweep and keep pattern to stories

All of the other tables in the db follow a sweep and keep pattern. Stories should get the same treatment.

If we don't, then old stories will stick around and prevent deleting of Brigades due to their foreign keys.

Add Events API for future events.

The generic events API method is currently not useful for showing meaningful upcoming events, partly because the Meetup API returns a full year's worth of recurring weekly meetups. Related to issue #36.

Some possible approaches:

Add /api/events/upcoming sorted by ascending date.
Or /api/events/all sorted by ascending date, with /api/events showing only future events.
Or /api/events/past sorted by descending date.
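
Whichever URLs we pick, the split and sort could look something like this sketch. It assumes each event is a dict whose "start_time" is a timezone-aware datetime; `split_events` is a hypothetical name.

```python
from datetime import datetime, timezone


def split_events(events, now=None):
    """Split events into upcoming (soonest first) and past (most recent first)."""
    now = now or datetime.now(timezone.utc)
    upcoming = sorted(
        (e for e in events if e["start_time"] >= now),
        key=lambda e: e["start_time"],
    )
    past = sorted(
        (e for e in events if e["start_time"] < now),
        key=lambda e: e["start_time"],
        reverse=True,
    )
    return upcoming, past
```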

Ask GitHub for a higher limit on the API

Details

The run_update.py script runs once an hour and asks GitHub about every project listed in all the repos of all the Brigades around the world. The number of projects is growing really fast as the Brigades grow. The GitHub API limits authenticated clients to 5,000 requests per hour, though. We'll run up against this constraint very quickly.

We need to ask GitHub for a higher rate limit to accurately update the Civic Tech API.
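
Until then, the script could at least notice when it is out of requests. A sketch, assuming we have the response headers as a dict: GitHub reports the remaining quota and the reset time (in epoch seconds) in the X-RateLimit-Remaining and X-RateLimit-Reset headers. The function name `seconds_until_reset` is hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0 if quota remains."""
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(lowered.get("x-ratelimit-reset", 0))
    now = time.time() if now is None else now
    return max(0, reset_at - now)
```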

Add timestamps to stories

Return valid GeoJSON for /api/organizations.geojson

Current response from http://codeforamerica.org/api/organizations.geojson is invalid GeoJSON.

The response should be wrapped in a FeatureCollection rather than a GeometryCollection.

See http://geojsonlint.com/ for validation
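
A sketch of the expected shape, assuming each organization is a dict with "name", "latitude", and "longitude" keys (the function name `organizations_geojson` is hypothetical). Note that GeoJSON positions are ordered longitude first, then latitude.

```python
def organizations_geojson(organizations):
    """Wrap organizations in a GeoJSON FeatureCollection of Point features."""
    features = []
    for org in organizations:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude].
                "coordinates": [org["longitude"], org["latitude"]],
            },
            "properties": {"name": org["name"]},
        })
    return {"type": "FeatureCollection", "features": features}
```

An empty list still produces a valid FeatureCollection, so the endpoint would not have to fail when some Brigade info is missing.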

Write JSON files someplace else

I seem to remember at some point during the development of this discovering that Heroku doesn't actually allow you to write files to the local filesystem. Am I remembering this correctly @derekeder?

One way around this which we used was to write the JSON files out to an S3 bucket instead of the way that we are handling it in the code currently. The reason we shifted away from it was mainly for convenience (fewer hoops to jump through) and because I didn't know about this scheduler stuff on Heroku (so I was running two separate free apps, one for the web interface and one for the celery worker).

Not sure what the labels mean here so hopefully someone will help me there.

Add ability to update one Brigade

Description

When a new Brigade is added to the Brigade Information sheet, we have to run run_update.py on all of them to get that new group added. We're running low on GitHub requests, so we need to ration.

We should have a command line argument to update specific Brigades.

Files

run_update.py

Tasks

Use argparse to take in command line args
The command should be the Brigade's name. python run_update.py --name Code for San Francisco
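
A sketch of the argument parsing. Since Brigade names contain spaces, this version uses nargs="+" so the name works without shell quoting, as in the command above. The function name `parse_args` is hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse run_update.py command line arguments."""
    parser = argparse.ArgumentParser(description="Update the Civic Tech API.")
    parser.add_argument(
        "--name",
        nargs="+",
        default=None,
        help="Only update the Brigade with this name.",
    )
    args = parser.parse_args(argv)
    # Join the words back into one name; None means update everything.
    args.name = " ".join(args.name) if args.name else None
    return args
```

With this, `python run_update.py --name Code for San Francisco` and `python run_update.py --name "Code for San Francisco"` would both give the same name.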

Add total count attribute to api endpoints

Description

Let's add another attribute to /organizations, /events, and /stories that shows the total count of objects returned.

{ 
    "objects" : [ ... ] ,
    "pages" : {
        "next" : ... ,
        "last" : ...
    } ,
    "total" : 123
}
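
A sketch of building that response from a full list of objects. The `page` and `per_page` query parameter names, the `base_url` default, and the function name `paged_response` are all assumptions.

```python
def paged_response(objects, page=1, per_page=10, base_url="/api/organizations"):
    """Return one page of objects plus paging links and the total count."""
    total = len(objects)
    # Ceiling division; an empty collection still has one (empty) page.
    last = max(1, -(-total // per_page))
    start = (page - 1) * per_page
    pages = {"last": "{}?page={}".format(base_url, last)}
    if page < last:
        pages["next"] = "{}?page={}".format(base_url, page + 1)
    return {
        "objects": objects[start:start + per_page],
        "pages": pages,
        "total": total,
    }
```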

Files

app.py

Add run_update check to status output

If the update process hasn’t successfully run after an hour+, make the status Not OK.
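
Roughly, as a sketch. It assumes we record the time of the last successful update somewhere as a timezone-aware datetime; the function name `update_status` and the "OK" string are assumptions.

```python
from datetime import datetime, timedelta, timezone


def update_status(last_success, now=None, max_age=timedelta(hours=1)):
    """Return "OK" if the last successful update is recent enough."""
    now = now or datetime.now(timezone.utc)
    # Never having succeeded counts as Not OK too.
    if last_success is None or now - last_success > max_age:
        return "Not OK"
    return "OK"
```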

Rewrite API

Description

Flask-Restless has a few limitations we are up against. It depends on each Brigade having a numerical id. You can only return a single Brigade by using its id. http://civic-tech-movement.codeforamerica.org/api/organizations/1

The syntax for using a string instead is really ugly: http://civic-tech-movement.codeforamerica.org/api/organizations?q={"filters":[{"name":"name","op":"eq","val":"Code for San Francisco"}]}

We really only need four endpoints that either return multiple or single responses.
/organizations, /projects, /events, /stories

Let's rewrite it ourselves!

Files

app.py

Tasks

Remove all flask-restless code.
Write new routes for each of the wanted endpoints.
/organizations should return all of the organizations.
/organizations/Code for San Francisco should return just one.
An organization should also have its two latest stories, projects, and events returned as well.

Make API output civic.json files for projects

The first stab at answering the "how do we document civic tech projects" was to define a civic.json file. Repo developers could fill out all the fields manually, and drop them into their projects. We could then amass a list of projects and their details just by searching Github.

This approach made too much work for developers. Instead, we want robots to do the work. Civic.json is still a useful, attractive, and simple metadata standard to describe projects, though.

So civic-json-worker's job is to automate the creation - as much as possible - of these files from Github. It should dump out civic.json, and hopefully this meta-data standard proves useful in other projects.

Civic.json encompasses all the fields the API is currently getting from GitHub, plus a proposed set of extended fields. Over in the civic.json repo, we're figuring out what small set of extended fields we should support, and how to make it as easy as possible to collect these fields. (Discussion going on in the issue tracker.)

Once that's clarified, the API would be modified to support the full spec. That's the idea, anyway.

Update README

For the public launch we should update the README.

Sections we should have:

Description

Projects / Press

How to get your Brigade included

Documentation

Tech

History

Does this project become an API?

Should this project become the full-fledged Civic Tech Movement API that Code for America is planning to build? Or should it remain separate and just feed that API with these project JSON files?

Mongodb vs Postgres

Right now this app is using json files on S3 as the database. We should seriously consider using a database instead. We'll need to if we ever want to create an API.

Mongo makes sense in that we have all the modeling already done by @chriswhong. Also json is best.

Postgres makes sense because there are relationships between Brigades and projects so a relational db will help.

Thoughts?

Remove github backup of files

Since #2 seems to be the case if this app gets hosted on Heroku, the code that we wrote to back up the files to GitHub doesn't really make sense. It's also kinda wacky cause it does the commit with my GitHub account (which makes me look like a rockstar, right @derekeder?)

UTF-8 not working in run_update.py

Description

I tried to use the Polish spelling of Brigade names locally. No go.

To Reproduce

Use the testing gdocs_link in run_update.py

Clues

At the bottom of run_update.py main() where we delete orgs that we didn't find this run, there are some utf-8 issues. Check out line 560. if bad_org.name in organization_names:

With Łódź as the example, bad_org.name is u'\xc5\x81\xc3\xb3d\xc5\xba' while organization_name has it as \xc3\x85\xc2\x81\xc3\x83\xc2\xb3d\xc3\x85\xc2\xba so it doesn't match and gets deleted.
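
Those two byte strings look like a double-encoding problem: the first is the UTF-8 bytes of Łódź read as if each byte were a Latin-1 character, and the second is that mis-decoded text encoded as UTF-8 again. A Python 3 sketch of detecting and undoing it follows; `repair_mojibake` is a hypothetical helper, and the real fix is probably to decode the spreadsheet as UTF-8 once, in one place.

```python
def repair_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1.

    This is a heuristic: text that does not fit the pattern is
    returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 (already fine)
        # or the bytes are not valid UTF-8 (not this kind of damage).
        return text
```

Normalizing both sides this way before the `in organization_names` comparison would at least make the names match.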

Add API paging to /api/events

There are over a thousand events. They might be best returned in descending order by time.

Define user workflow?

Just wanted to start a discussion for how folks are thinking of defining the user workflow here.

To me, the easiest would be for a group to submit a URL for a .json file (via simple web form or POST request) that has the group's metadata, eg:

{
  name: "OpenOakland",
  place: "Oakland, CA, USA",
  project_urls: ["https://github.com/openoakland/oakland_answers",
    "https://github.com/openoakland/adopt-a-drain", ...]
  ...
}

Just curious to start the conversation to flesh this out.

Figure out why Meetup events are wrong

For example, OO has two upcoming events in 2015 but none of the ones listed on the Meetup page.

Add API links

The rest of the API should use internal links like organizations now do.

api_url in projects.
api_url in events.
api_url in linked organizations.
link-based paging.

~~For items with unstable IDs like events, we should either attempt to generate a stable identifier e.g. based on a hash of a link, or omit documenting the ability to link to them for now.~~ Keep the API footprint small and avoid making promises we can’t deliver.

Double quotes breaking CSV parsing in run_update

Bug

Using double quotes in a project description field totally breaks the csv parsing!

"Web application to aggregate tasks across projects that are identified for ""hacking"". The ""Hack Task Aggregator"" is a client (Javascript) application that queries a collection of project repositories, identifies all the tasks marked for ""hacking"", and presents a single page with this information. We produced this so that people who were interested in hacking on an Open Austin (http://www.open-austin.org/) project could see what's available."

Is being parsed out in run_update.py as

{
    ...
    'code_url': ' and presents a single page with this information. We produced this so that people who were interested in hacking on an Open Austin (http://www.open-austin.org/) project could see what\'s available."',
    'description': 'Web application to aggregate tasks across projects that are identified for "hacking"". The ""Hack Task Aggregator"" is a client (Javascript) application that queries a collection of project repositories',
    ...
}

To Reproduce

Comment out the production gdocs_link at the top of run_update.py.
Uncomment the testing gdocs_link.
Run run_update.py.
Watch it break.
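
For comparison, Python's csv module handles doubled double quotes inside a quoted field by default. A sketch, assuming Python 3 and that the sheet arrives as one CSV string with a header row; `parse_rows` is a hypothetical name, not the current run_update.py parsing code.

```python
import csv
import io


def parse_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # csv.DictReader understands quoted fields, embedded commas,
    # and "" as an escaped double quote.
    return list(csv.DictReader(io.StringIO(csv_text)))
```

If the current code splits lines on commas by hand, switching to the csv module might be all this bug needs.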

Modify code to work for any City / Group

@jpvelez and @evz started a discussion on this point over here and it's probably worth making note of this over here, too since it seems like that's at least part of the aim here.

Add error reporting to engine light

Description

When run_update.py throws an error and stops, we should report this to Engine Light.

Tasks

Create a new table of errors with timestamps
Have .well_known/status check for new errors
