codeforamerica / cfapi

This project forked from chihacknight/civic-json-worker

The Code for America API. Tracks and motivates activity and participation across the civic technology movement.

Home Page: http://codeforamerica.org/api

License: MIT License

Python 90.89% Mako 0.11% CSS 0.22% HTML 8.67% PLpgSQL 0.12%

cfapi's Issues

Create an easy way to add projects

Description

Currently, a Brigade has to choose between having the API read from a single GitHub organization URL like https://github.com/sfbrigade or from a hand-curated list of GitHub repo URLs like https://docs.google.com/spreadsheet/pub?key=0AlcMlEZm_nJ0dEt4d29jYjdfcWNPYVo5T1BOWk5lTGc&output=csv

We should make it easier to do both.

First version

A Captain could point us to one list containing both organization urls and project repo urls.

Second version

An easy-to-use form that accepts both organization URLs and project repo URLs and lets the submitter select a Brigade to list them under. We could use a Google Form with custom styling, like we did for http://codeforamerica.github.io/story-bucket/

Change Type column to be a list of types

Description

Organizations should be able to have more than one Type. Change the documentation to tell people they can enter a comma-separated list of types.

Example

Code for Germany should be Type = "Brigade, Code for All"
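
A minimal sketch of how a comma-separated Type cell could be split into a list. The function name parse_types is hypothetical and not part of run_update.py.

```python
def parse_types(type_cell):
    """Split a comma-separated Type cell into a list of clean type names."""
    # Strip whitespace around each entry and drop empty entries.
    return [part.strip() for part in type_cell.split(",") if part.strip()]


print(parse_types("Brigade, Code for All"))
```

An empty cell yields an empty list, so organizations without a Type need no special handling.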

More resilient process for updating organizations

Description

The current run_update.py grabs all the info about all the organizations and then saves them to the database. This means nothing gets updated on an error or if we get rate limited.

A better run_update flow is to:

  1. Get an organization's stories, save them to the db.
  2. Get an organization's events, save them to the db.
  3. Get an organization's projects, save them to the db.
  4. Repeat with a new organization.

If we can clearly log which organization is causing an error, we can then choose the next org at random.
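
The flow above could be sketched roughly as follows. This is an illustration under assumptions, not the real run_update.py: fetchers and save are hypothetical stand-ins for the code that talks to GitHub, Meetup, and the database.

```python
import logging
import random


def update_all(org_names, fetchers, save):
    """Update one organization at a time so a failure only affects that org.

    fetchers maps a kind ("stories", "events", "projects") to a function
    taking an org name and returning a list of items. save(name, kind, items)
    persists one batch. Both are hypothetical stand-ins.
    """
    failed = []
    names = list(org_names)
    random.shuffle(names)  # visit organizations in random order
    for name in names:
        try:
            for kind in ("stories", "events", "projects"):
                # Save each batch as soon as it is fetched.
                save(name, kind, fetchers[kind](name))
        except Exception:
            # Log which organization failed, then move on to the next one.
            logging.exception("Update failed for organization %r", name)
            failed.append(name)
    return failed
```

Because each batch is saved right after it is fetched, an error or a rate limit partway through leaves the earlier organizations updated.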

organizations.geojson failing

Description

The geojson endpoint is failing. We're getting no info on the Brigade website. We should fix the geojson endpoint so that it doesn't fail when it doesn't gather every Brigade's info.

Clues

I manually ran the run_update.py script again, just after it had run already. We got throttled by GitHub.

Add links to upcoming_events and past_events in the Organization

Description

All the possible endpoints should be followable from within the API itself. Fielding called it HATEOAS.

Files

app.py

Tasks

  • Add in "upcoming_events" : "http://codeforamerica.org/api/organizations/Beta_NYC/upcoming_events"
  • Add in "past_events" : "http://codeforamerica.org/api/organizations/Beta_NYC/past_events"

Produce project_details.json files

Description

The civic-json-worker began by producing flat files. We've since started using a database and API instead. There are lots of interesting possibilities made possible by producing and versioning the original project_details.json files though. These possibilities are described more in discussion of #9.

I've pulled the production of these files and their uploading to S3 into a separate script. It currently doesn't work; I just copied and pasted all the relevant code over, and it needs structure and cleanup.

Tasks

  • Rewrite civic_json_transform.py to actually produce json files.
  • Have these files upload to S3.
  • Set up Heroku Scheduler to run this script once a day.
  • Figure out how to version them ...

Create a /brigades endpoint

Description

Create a /brigades endpoint. Something like:

{
  "name" : "Code for Kansas City",
  "website" : "http://codeforkc.org",
  "events_url" : "http://www.meetup.com/kcbrigade/",
  "next_event" : { ... },
  "all_events" : [ ... ],
  "rss" : " ... ",
  "latest_story" : { ... },
  "all_stories" : [ ... ],
  "projects_list_url" : " ... ",
  "two_active_projects" : [ ... ],
  "all_projects" : [ ... ],
  "location" : {
      "latitude": 38.8721061,
      "longitude": -94.6079577,
      "name": "Kansas City, KS & MO"
      },
  "core_roles" : {
      "captain" : { ... },
      "delivery_lead" : { ... },
      "storyteller" : { ... },
      "community_organizer" : { ... }
      },
  "members" : [ ... ]
}

A few questions:

  • Should we call it /brigades or /orgs? I want to include Fellowship projects and maybe others.
  • What's a good way to migrate this data over from the current Brigade site? Possibly more than once, as new members will keep joining.
  • Any other ways to structure this that help how the Brigade admin team keeps track of things?

Change timestamps on events to correct or no timezone

Event timestamps are currently all GMT, e.g. at http://civic-tech-movement.codeforamerica.org/api/organizations/Open%20Oakland:

"start_time": "Wed, 25 Mar 2015 01:30:00 GMT"
"start_time": "Wed, 18 Mar 2015 01:30:00 GMT"

These should be changed in one of two ways. If we know the local timezone, we should express them in local time with the correct timezone. If we can’t determine the local timezone, we should instead express them as unix epoch values for ease of parsing and use.
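
A minimal sketch of both options, using only the standard library. The function name convert_start_time is hypothetical, and the fixed UTC-7 offset is for illustration only; a real lookup would more likely use a named zone such as zoneinfo.ZoneInfo("America/Los_Angeles").

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime


def convert_start_time(gmt_string, local_tz=None):
    """Return ISO 8601 local time if local_tz is known, else a Unix epoch int."""
    moment = parsedate_to_datetime(gmt_string)  # timezone-aware, in UTC
    if local_tz is None:
        return int(moment.timestamp())
    return moment.astimezone(local_tz).isoformat()


# Fixed offset standing in for Pacific Daylight Time (assumption for the demo).
pacific_daylight = timezone(timedelta(hours=-7))
print(convert_start_time("Wed, 25 Mar 2015 01:30:00 GMT", pacific_daylight))
print(convert_start_time("Wed, 25 Mar 2015 01:30:00 GMT"))
```

With the offset above, the first Open Oakland event comes out as the evening of 24 March local time rather than the early morning of 25 March.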

Replace Celery with Scheduler

Description

We'll be hosting this app on Heroku. They offer a Scheduler add-on that works similarly to cron jobs. We'll replace the Celery code with this Scheduler.

Tasks

  • Remove celery.sh and celeryconfig.py
  • Remove the celery dependency in tasks.py
  • Set up Heroku Scheduler to run the update_projects function in tasks.py

Have the collections be returned in order by name or id

Description

Currently, the API returns its objects in a random order. For example, /organizations is giving us

{
  "num_results": 54, 
  "objects": [
    {
      "city": "Ann Arbor, MI", 
      "id": 1, 
      ...
    }, 
    {
      "city": "Anchorage, AK", 
      "id": 5, 
      ...
    }, 
    {
      "city": "New York, NY", 
      "id": 2, 
      ...
    }

We should return these ordered either by id or alphabetically by name.
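
As a rough illustration of the intended result, here is the ordering applied to plain dictionaries. The helper order_objects is hypothetical; in app.py the ordering would more likely be applied in the database query.

```python
def order_objects(objects, key="id"):
    """Return the objects sorted by the given field."""
    return sorted(objects, key=lambda obj: obj[key])


objects = [
    {"city": "Ann Arbor, MI", "id": 1},
    {"city": "Anchorage, AK", "id": 5},
    {"city": "New York, NY", "id": 2},
]
print([obj["id"] for obj in order_objects(objects)])
print([obj["city"] for obj in order_objects(objects, key="city")])
```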

Files

app.py

Research

Fix URLs in JSON links

Now:

  "all_events": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/events", 
  "all_projects": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/projects", 
  "all_stories": "http://civic-tech-movement.codeforamerica.org/api/organizations/Code for America/stories", 

Need to encode brigade names to make the URLs valid, and should remove or change hostname in URL to correctly reflect local testing environments or possible domain name change.
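
A minimal sketch of the encoding step, using urllib.parse.quote from the standard library. The helper name organization_links is hypothetical; the paths are relative so they do not depend on a hostname.

```python
from urllib.parse import quote


def organization_links(name):
    """Build relative, percent-encoded API links for one organization."""
    # safe="" also encodes "/" so a name can never add path segments.
    base = "/api/organizations/" + quote(name, safe="")
    return {
        "all_events": base + "/events",
        "all_projects": base + "/projects",
        "all_stories": base + "/stories",
    }


print(organization_links("Code for America")["all_events"])
```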

Could be:

  "all_events": "/api/organizations/Code%20for%20America/events", 
  "all_projects": "/api/organizations/Code%20for%20America/projects", 
  "all_stories": "/api/organizations/Code%20for%20America/stories", 

Pull projects from a Brigade's GitHub account

To make it even easier for a Brigade to get their projects listed in our api, we could just ask each Brigade for their official GitHub repo and pull in all the projects from there.

They could still also supply a list to add in Brigade projects which live in other people's repos.

There are a few downsides to this though. A Brigade wouldn't have granular control over what was included. They couldn't easily tell us to not include certain projects. Also, not every Brigade has an official repo, though I will be encouraging them all to make one.

To Dos

Weird warnings in run_update.py

INFO:root:Gathering all of Code for Durham's stories.
INFO:root:Gathering all of Code for Durham's events.
/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/sql/type_api.py:207: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return x == y
/app/.heroku/python/lib/python2.7/site-packages/sqlalchemy/orm/session.py:1394: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  elif state.key != instance_key:
INFO:root:Gathering all of OK Lab Münster's stories.

Multiple lists of Organizations

Description

Currently, we use one master list of Organizations. We should instead have multiple lists of organizations of different types. We could have a list of all the Fellowship teams, a list of all of our non-profit partners, a list of all of our funders, and a list of all the Code for All groups. Each of these will be in a different GDoc with different editing permissions.

Files

run_update.py

Tasks

  • Have run_update.py read from a list of gdoc links, instead of just the one.
  • Create Google Spreadsheets for each type of organization we want to include in the API.
  • The different types of orgs would be indicated in the type column at /organizations.

Add in Eventbrite Support

Description

Some Brigades use Eventbrite to schedule their events. We should include them in the Calendar as well.

How to

I haven't been able to find a way to use the Eventbrite API to get an organizer's events. They would need to give us permission somehow.

The workaround is to use the search API. You can use the organizer's name that way.
For example:
https://www.eventbrite.com/json/event_search?organizer=open-city&app_key=UW4C24YXU2XK3G6AD6
returns

{
  events: [
    event: {
      title: "Open Gov Hack Night",
      start_date: "2013-03-05 18:00:00",
      status: "Started",
      description: "...",
      end_date: "2013-03-05 22:00:00",
      repeat_schedule: "weekly-1-N,Y,N,N,N,N,N-12/23/2014",
      created: "2012-11-01 08:35:46",
      url: "http://opengovhacknight.eventbrite.com/?aff=SRCH",
      modified: "2014-04-07 06:01:06",
      repeats: "yes"
    }
  ]
}

You'll notice though that the dates on there are way old, but it repeats weekly. We'll need to parse that ourselves. 😦

Notes

Only Open Gov Hack Night uses Eventbrite so far, so no rush yet.

Add a geojson option to Brigades?

Should it be returned as geojson? We could then add all the Brigades to the map as one feature layer instead of adding each marker one by one.

@migurski Says: Maybe use both /organizations.geojson and /organizations.json ?

Check for Organization's complete info before working on it

Description

The run_update.py script fails if an organization doesn't have a latitude or longitude.

Reproduce

Someone added Code for Anchorage a second time to the Brigade Information spreadsheet. They added it with only the City column filled out. When run_update.py tried to gather all the info about this second Anchorage, it crashed while trying to enter it into the database with an empty string instead of a float for the latitude and longitude.

To Fix

Check that an Organization's information is complete on the Brigade Information spreadsheet before gathering all its stories, projects, and events.

Files

run_update.py

Tasks

  • At line 505, when main() loops over each org, add a check that the correct type of data is in each column of an organization.
  • break out of the loop if data is missing.
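
A sketch of what such a check could look like. The column names "latitude" and "longitude" and the helper has_valid_location are assumptions; the real spreadsheet headers may differ.

```python
def has_valid_location(row):
    """Return True when the row's latitude and longitude parse as floats."""
    try:
        float(row["latitude"])
        float(row["longitude"])
    except (KeyError, TypeError, ValueError):
        # Missing column, None, or an empty or non-numeric string.
        return False
    return True


rows = [
    {"name": "Code for Anchorage", "latitude": "61.2181", "longitude": "-149.9003"},
    {"name": "Code for Anchorage", "latitude": "", "longitude": ""},
]
print([has_valid_location(row) for row in rows])
```

The loop in main() can then decide what to do with an incomplete row before any stories, projects, or events are gathered for it.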

Test fastest way to get recent activity

Description

In #19 we want to have only the most recent stories, events, and projects be shown on the /organizations endpoint.

Options

I see two ways to do this.

  1. Create three new tables recent_stories, latest_events, updated_projects. These will get populated once an hour by new code in the run_update.py script that looks at all the stories, events, and projects for an organization and chooses the two most recent. Then we create relationships from the Organization table to those tables, instead of the full stories, events, and projects tables.
  2. Use flask-restless' preprocessors to get the two most recent things on each request.

Add Events API for future events.

The generic events API method is currently not useful for showing meaningful upcoming events, partially due to the Meetup API returning a full year's worth of recurring weekly meetups. Related to issue #36.

Some possible approaches:

  • Add /api/events/upcoming sorted by ascending date.
  • Or /api/events/all sorted by ascending date, with /api/events showing only future events.
  • Or /api/events/past sorted by descending date.
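
The first approach could be sketched like this, assuming each event is a dictionary with a timezone-aware start_time. The helper upcoming_events and the event shape are assumptions for illustration.

```python
from datetime import datetime, timezone


def upcoming_events(events, now=None):
    """Return only future events, sorted by ascending start time."""
    now = now or datetime.now(timezone.utc)
    future = [event for event in events if event["start_time"] >= now]
    return sorted(future, key=lambda event: event["start_time"])


events = [
    {"name": "Hack Night B", "start_time": datetime(2015, 3, 25, 1, 30, tzinfo=timezone.utc)},
    {"name": "Old Hack Night", "start_time": datetime(2015, 3, 18, 1, 30, tzinfo=timezone.utc)},
    {"name": "Hack Night A", "start_time": datetime(2015, 3, 21, 1, 30, tzinfo=timezone.utc)},
]
cutoff = datetime(2015, 3, 20, tzinfo=timezone.utc)
print([event["name"] for event in upcoming_events(events, now=cutoff)])
```

A past-events variant would flip the comparison and sort with reverse=True.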

Ask GitHub for a higher limit on the API

Details

The run_update.py script runs once an hour and asks GitHub about every project listed in all the repos of all the Brigades around the world. The number of projects is growing really fast as the Brigades grow. The GitHub API has a rate limit of 5,000 requests per hour for authenticated requests, though. We'll run up against this constraint very quickly.

We need to ask GitHub for a higher rate limit to accurately update the Civic Tech API.

Write JSON files someplace else

I seem to remember at some point during the development of this discovering that Heroku doesn't actually allow you to write files to the local filesystem. Am I remembering this correctly @derekeder?

One way around this which we used was to write the JSON files out to an S3 bucket instead of the way that we are handling it in the code currently. The reason we shifted away from it was mainly for convenience (fewer hoops to jump through) and because I didn't know about this scheduler stuff on Heroku (so I was running two separate free apps, one for the web interface and one for the celery worker).

Not sure what the labels mean here so hopefully someone will help me there.

Add ability to update one Brigade

Description

When a new Brigade is added to the Brigade Information sheet, we have to run run_update.py on all of them to get that new group added. We're running low on GitHub requests, so we need to ration.

We should have a command line argument to update specific Brigades.

Files

run_update.py

Tasks

  • Use argparse to take in command line args
  • The argument should be the Brigade's name, quoted so the shell passes it as one value: python run_update.py --name "Code for San Francisco"
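
A minimal argparse sketch for the tasks above. The parse_args wrapper is hypothetical, and how run_update.py would use the value is left open.

```python
import argparse


def parse_args(argv=None):
    """Parse command line arguments for a single-Brigade update."""
    parser = argparse.ArgumentParser(description="Update organizations in the API.")
    parser.add_argument(
        "--name",
        help="Update only the Brigade with this name; omit to update all.",
    )
    return parser.parse_args(argv)


# Passing a list here mimics the shell having split the command line.
args = parse_args(["--name", "Code for San Francisco"])
print(args.name)
```

When --name is omitted, args.name is None, which the script could treat as "update everything".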

Add total count attribute to api endpoints

Description

Let's add another attribute to /organizations, /events, and /stories that shows the total count of objects returned.

{ 
    "objects" : [ ... ] ,
    "pages" : {
        "next" : ... ,
        "last" : ...
    } ,
    "total" : 123
} 
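
A rough sketch of a response builder that produces this shape. The helper paged_response and the "page" query parameter are assumptions, not the current app.py behavior.

```python
def paged_response(objects, page, per_page, base_url):
    """Wrap one page of objects with paging links and a total count."""
    total = len(objects)
    last = max(1, (total + per_page - 1) // per_page)  # ceiling division
    start = (page - 1) * per_page
    pages = {}
    if page < last:
        # Only link forward when there is a later page.
        pages["next"] = "%s?page=%d" % (base_url, page + 1)
        pages["last"] = "%s?page=%d" % (base_url, last)
    return {
        "objects": objects[start:start + per_page],
        "pages": pages,
        "total": total,
    }


response = paged_response(list(range(25)), page=1, per_page=10, base_url="/api/events")
print(response["total"], response["pages"])
```

Note that total counts every matching object, not just the ones on the current page.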

Files

app.py

Rewrite API

Description

Flask-Restless has a few limitations we are up against. It depends on each Brigade having a numerical id. You can only return a single Brigade by using its id. http://civic-tech-movement.codeforamerica.org/api/organizations/1

The syntax for using a string instead is really ugly: http://civic-tech-movement.codeforamerica.org/api/organizations?q={"filters":[{"name":"name","op":"eq","val":"Code for San Francisco"}]}

We really only need four endpoints that either return multiple or single responses.
/organizations, /projects, /events, /stories

Let's rewrite it ourselves!

Files

app.py

Tasks

  • Remove all flask-restless code.
  • Write new routes for each of the wanted endpoints.
  • /organizations should return all of the organizations.
  • /organizations/Code for San Francisco should return just one.
  • An organization should also have its two latest stories, projects, and events returned.

Make API output civic.json files for projects

The first stab at answering the "how do we document civic tech projects" question was to define a civic.json file. Repo developers could fill out all the fields manually and drop the file into their projects. We could then amass a list of projects and their details just by searching GitHub.

This approach made too much work for developers. Instead, we want robots to do the work. Civic.json is still a useful, attractive, and simple metadata standard for describing projects, though.

So civic-json-worker's job is to automate the creation - as much as possible - of these files from Github. It should dump out civic.json, and hopefully this meta-data standard proves useful in other projects.

Civic.json encompasses all the fields the API is currently getting from GitHub, plus a proposed set of extended fields. Over in the civic.json repo, we're figuring out what small set of extended fields we should support, and how to make it as easy as possible to collect these fields. (Discussion is going on in the issue tracker.)

Once that's clarified, the API would be modified to support the full spec. That's the idea, anyway.

Update README

For the public launch we should update the README.

Sections we should have:

Description
Projects / Press
How to get your Brigade included
Documentation
Tech
History

Does this project become an API?

Should this project become the full-fledged Civic Tech Movement API that Code for America is planning to build? Or should it remain separate and just feed that API with these project json files?

Mongodb vs Postgres

Right now this app is using json files on S3 as the database. We should seriously consider using a database instead. We'll need to if we ever want to create an API.

Mongo makes sense in that we have all the modeling already done by @chriswhong. Also json is best.

Postgres makes sense because there are relationships between Brigades and projects so a relational db will help.

Thoughts?

Remove github backup of files

Since #2 seems to be the case if this app gets hosted on Heroku, the code that we wrote to back up the files to GitHub doesn't really make sense. It's also kinda wacky because it does the commit with my GitHub account (which makes me look like a rockstar, right @derekeder?)

UTF-8 not working in run_update.py

Description

I tried to use the Polish spelling of Brigade names locally. No go.

To Reproduce

Use the testing gdocs_link in run_update.py

Clues

At the bottom of run_update.py main() where we delete orgs that we didn't find this run, there are some utf-8 issues. Check out line 560. if bad_org.name in organization_names:

With Łódź as the example, bad_org.name is u'\xc5\x81\xc3\xb3d\xc5\xba' while organization_name has it as \xc3\x85\xc2\x81\xc3\x83\xc2\xb3d\xc3\x85\xc2\xba so it doesn't match and gets deleted.
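
The two values look like the same name encoded once and encoded twice. That reading is an assumption about the cause, not something confirmed above. A short Python 3 sketch that reproduces both byte sequences (the script itself ran on Python 2.7, where the details differ):

```python
name = "\u0141\u00f3d\u017a"  # Łódź

# UTF-8 bytes of the name, matching the first value quoted above.
encoded_once = name.encode("utf-8")

# Misreading those bytes as Latin-1 and encoding again gives the second value.
encoded_twice = encoded_once.decode("latin-1").encode("utf-8")

print(encoded_once)
print(encoded_twice)

# Reversing the extra round trip recovers the original name.
print(encoded_twice.decode("utf-8").encode("latin-1").decode("utf-8") == name)
```

If this is the cause, decoding both sides to unicode exactly once before the comparison would make the names match.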

Define user workflow?

Just wanted to start a discussion for how folks are thinking of defining the user workflow here.

To me, the easiest would be for a group to submit a URL for a .json file (via simple web form or POST request) that has the group's metadata, eg:

{
  name: "OpenOakland",
  place: "Oakland, CA, USA",
  project_urls: ["https://github.com/openoakland/oakland_answers",
    "https://github.com/openoakland/adopt-a-drain", ...]
  ...
}

Just curious to start the conversation to flesh this out.

Add API links

The rest of the API should use internal links like organizations now do.

  • api_url in projects.
  • api_url in events.
  • api_url in linked organizations.
  • link-based paging.

For items with unstable IDs like events, we should either attempt to generate a stable identifier e.g. based on a hash of a link, or omit documenting the ability to link to them for now. Keep the API footprint small and avoid making promises we can’t deliver.
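
One way a link-based identifier could work, as a sketch only. The helper stable_event_id, the 12-character length, and the example URLs are all hypothetical.

```python
import hashlib


def stable_event_id(event_url):
    """Derive a short, repeatable identifier from an event's link."""
    digest = hashlib.sha1(event_url.encode("utf-8")).hexdigest()
    return digest[:12]  # shortened for readability; the length is arbitrary


first = stable_event_id("http://www.meetup.com/example-brigade/events/1/")
again = stable_event_id("http://www.meetup.com/example-brigade/events/1/")
other = stable_event_id("http://www.meetup.com/example-brigade/events/2/")
print(first == again, first == other)
```

The identifier stays the same across update runs as long as the event's link does not change.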

Double quotes breaking CSV parsing in run_update

Bug

Using double quotes in a project description field totally breaks the csv parsing!

"Web application to aggregate tasks across projects that are identified for ""hacking"". The ""Hack Task Aggregator"" is a client (Javascript) application that queries a collection of project repositories, identifies all the tasks marked for ""hacking"", and presents a single page with this information. We produced this so that people who were interested in hacking on an Open Austin (http://www.open-austin.org/) project could see what's available."

Is being parsed out in run_update.py as

{
    ...
    'code_url': ' and presents a single page with this information. We produced this so that people who were interested in hacking on an Open Austin (http://www.open-austin.org/) project could see what\'s available."',
    'description': 'Web application to aggregate tasks across projects that are identified for "hacking"". The ""Hack Task Aggregator"" is a client (Javascript) application that queries a collection of project repositories',
    ...
}

To Reproduce

  1. Comment out the production gdocs_link at the top of run_update.py.
  2. Uncomment the testing gdocs link.
  3. Run run_update.py.
  4. Watch it break.
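
For comparison, Python's csv module treats a doubled double quote inside a quoted field as one literal quote, so splitting on commas by hand is not needed. A small sketch with made-up data; the header row and values are assumptions, and this does not claim to show how run_update.py parses the sheet.

```python
import csv
import io

raw = (
    "name,description,code_url\n"
    '"Hack Task Aggregator","Tasks marked for ""hacking"", in one page.",https://example.org/repo\n'
)

# DictReader handles the quoted commas and the doubled quotes.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["description"])
print(rows[0]["code_url"])
```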

Add error reporting to Engine Light

Description

When run_update.py throws an error and stops, we should report this to Engine Light.

Tasks

  • Create a new table of errors with timestamps
  • Have .well_known/status check for new errors
