Find that charity

An Elasticsearch-powered search engine for finding charities. It provides:

  • Importing data from England and Wales, Scotland and Northern Ireland, ensuring that duplicates are matched to a single record.
  • An Elasticsearch index that can be queried.
  • A reconciliation API for searching for charities, based on an optimised search query.
  • A facility for uploading a CSV of charity names and adding a best-guess charity number to each.
  • HTML pages for searching for a charity.

Installation

  1. Clone the repository
  2. Create a virtual environment (python -m venv env)
  3. Activate the virtual environment (source env/bin/activate on Linux/macOS, or env\Scripts\activate on Windows)
  4. Install the requirements (pip install -r requirements.txt)
  5. Install Elasticsearch
  6. Start Elasticsearch
  7. Create the Elasticsearch index (python data_import/create_elasticsearch.py)

Fetching data

This step fetches data on charities in England and Wales, Scotland and Northern Ireland. Run it with the following command:

python data_import/fetch_data.py --oscr <path/to/oscr/zip/file.zip>

Office of the Scottish Charity Regulator (OSCR)

OSCR data must be downloaded manually from the OSCR website, because the terms and conditions have to be accepted first. Once it is downloaded, pass the path to data_import/fetch_data.py using the --oscr flag.

Charity Commission for England and Wales

Data on charities in England and Wales will be fetched from http://data.charitycommission.gov.uk/. To use a different URL, pass it with the --ccew flag.

The latest .zip file will be downloaded and unzipped, and the .bcp files it contains will be converted to .csv.

Charity Commission for Northern Ireland

Data on charities in Northern Ireland will be fetched from http://www.charitycommissionni.org.uk/charity-search/ (Open Government Licence). To use a different URL, pass it to the --ccni flag when running data_import/fetch_data.py.

The latest .csv file (updated daily) will be downloaded to the data/ directory.

"Other names" for Northern Ireland charities are not included in the downloadable CSV, but they are shown on the CCNI website. These other names are maintained in a separate list, which will be downloaded. To use another file, pass its URL to --ccni_extra.
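Merging the extra names into the CCNI records might look like the sketch below. It assumes, hypothetically, that the records are a dict keyed by CCNI number (e.g. "NI100012") with a "names" list as in the data model, that the extra-names file is a CSV with columns "ccni_number" and "name", and that added names get the type "other name". None of these details are confirmed by the project.

```python
import csv
import io
from collections import defaultdict


def add_ccni_other_names(records, extra_csv_text):
    """Append CCNI "other names" to charity records (sketch).

    Assumes hypothetical CSV columns "ccni_number" and "name".
    """
    extra = defaultdict(list)
    for row in csv.DictReader(io.StringIO(extra_csv_text)):
        extra[row["ccni_number"].strip()].append(row["name"].strip())

    for number, names in extra.items():
        record = records.get(number)
        if record is None:
            continue  # the name refers to a charity not in the CCNI download
        # Compare case-insensitively so trivial variants are not duplicated.
        existing = {n["name"].lower() for n in record["names"]}
        for name in names:
            if name.lower() not in existing:
                record["names"].append(
                    {"name": name, "type": "other name", "source": "ccni"}
                )
                existing.add(name.lower())
    return records
```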

Dual registered charities

A list of dual-registered charities will be downloaded from GitHub. To use another file, pass a URL to --dual.

The list is a CSV file with one line per pair of England and Wales/Scottish charities, in the format:

"Scottish Charity Number","E&W Charity Number","Charity Name (E&W)"
"SC002327","263710","Shelter, National Campaign for Homeless People Limited"

To add more charities, fork the GitHub gist and add a comment to the original gist.
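One way to use this list to match duplicates to one record is sketched below. The CSV parsing uses the column headings shown above; the merge step assumes, hypothetically, that the E&W and Scottish records are dicts keyed by their own regulator's number and use the field names from the data model. It illustrates the idea rather than reproducing import_data.py.

```python
import csv
import io


def load_dual_registrations(csv_text):
    """Map Scottish charity numbers to E&W charity numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Scottish Charity Number"].strip(): row["E&W Charity Number"].strip()
        for row in reader
    }


def merge_dual(ccew_records, oscr_records, dual):
    """Fold each dual-registered Scottish record into its E&W record.

    Returns the OSCR records that could not be merged.
    """
    unmatched = {}
    for sc_number, record in oscr_records.items():
        ew_number = dual.get(sc_number)
        target = ccew_records.get(ew_number) if ew_number else None
        if target is None:
            unmatched[sc_number] = record
            continue
        target["oscr_number"] = sc_number
        # Keep names from both regulators, skipping exact (name, source) repeats.
        seen = {(n["name"], n["source"]) for n in target["names"]}
        for name in record["names"]:
            if (name["name"], name["source"]) not in seen:
                target["names"].append(name)
    return unmatched
```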

Postcode data

You can also add postcode data from https://github.com/drkane/es-postcodes to allow geographic searching. If the postcode Elasticsearch index is hosted on the same host, it can be used during the import_data.py stage.

Importing data

Once the data has been fetched, the files needed are stored in the data/ directory. You can then run python data_import/import_data.py to import it.

By default the script looks for an Elasticsearch instance at localhost:9200; run python data_import/import_data.py --help to see the available options. To use the postcode Elasticsearch index, pass --es-pc-host localhost.
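Once imported, the index can also be queried directly through Elasticsearch's _search endpoint. The sketch below builds a standard query DSL body over the known_as and names.name fields from the data model. The index name "charitysearch", the field boost and the filter on active are assumptions for illustration; check data_import/create_elasticsearch.py for the real index name and mapping.

```python
import json
from urllib.request import Request, urlopen

ES_URL = "http://localhost:9200"  # default host/port mentioned above
INDEX = "charitysearch"  # hypothetical index name


def name_query(name, size=10, active_only=True):
    """Build an Elasticsearch query DSL body that searches charity names."""
    query = {
        "bool": {
            "must": {
                "multi_match": {
                    "query": name,
                    # Boost the display name above the other recorded names.
                    "fields": ["known_as^2", "names.name"],
                }
            }
        }
    }
    if active_only:
        # Assumes "active" is mapped as a boolean field.
        query["bool"]["filter"] = {"term": {"active": True}}
    return {"query": query, "size": size}


def search(name):
    """Run the query against a local index (Elasticsearch must be running)."""
    req = Request(
        "{}/{}/_search".format(ES_URL, INDEX),
        data=json.dumps(name_query(name)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        hits = json.load(resp)["hits"]["hits"]
    return [(h["_id"], h["_source"].get("known_as"), h["_score"]) for h in hits]
```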

Data model

The data is imported into elasticsearch in the following format:

{
  "charity_number": "12355",
  "ccew_number": "12355",
  "oscr_number": "SC1235",
  "ccni_number": "NI100012",
  "active": true,
  "names": [
    {"name": "Charity Name", "type": "registered name", "source": "ccew"}
  ],
  "known_as": "Charity Name",
  "geo": {
    "areas": ["gss_codes"],
    "postcode": "PO54 0DE",
    "latlng": [0.0, 50.0]
  },
  "url": "http://www.url.org.uk/",
  "domain": "url.org.uk",
  "latest_income": 12345,
  "company_number": [
    {"number": "00121212", "source": "ccew"}
  ],
  "parent": "124566",
  "ccew_link": "http://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/SearchResultHandler.aspx?RegisteredCharityNumber=12355&SubsidiaryNumber=0",
  "oscr_link": "http://www.oscr.org.uk/charities/search-scottish-charity-register/charity-details?number=SC1235",
  "ccni_link": "http://www.charitycommissionni.org.uk/charity-details/?regid=100012&subid=0"
}
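The regulator links and the domain field in this model appear to follow directly from the other fields. The sketch below derives them using the URL patterns in the example record; the rules that the "NI" prefix is dropped for regid and that a leading "www." is removed from the domain are inferred from that single example and may not match the import code exactly.

```python
import re
from urllib.parse import urlparse

# URL patterns copied from the example record above.
CCEW_LINK = ("http://apps.charitycommission.gov.uk/Showcharity/RegisterOfCharities/"
             "SearchResultHandler.aspx?RegisteredCharityNumber={}&SubsidiaryNumber=0")
OSCR_LINK = ("http://www.oscr.org.uk/charities/search-scottish-charity-register/"
             "charity-details?number={}")
CCNI_LINK = "http://www.charitycommissionni.org.uk/charity-details/?regid={}&subid=0"


def regulator_links(record):
    """Return the *_link fields for whichever regulator numbers are present."""
    links = {}
    if record.get("ccew_number"):
        links["ccew_link"] = CCEW_LINK.format(record["ccew_number"])
    if record.get("oscr_number"):
        links["oscr_link"] = OSCR_LINK.format(record["oscr_number"])
    if record.get("ccni_number"):
        # "NI100012" -> "100012", as in the example ccni_link.
        links["ccni_link"] = CCNI_LINK.format(re.sub(r"^NI", "", record["ccni_number"]))
    return links


def domain_from_url(url):
    """Derive the domain field, e.g. "http://www.url.org.uk/" -> "url.org.uk"."""
    netloc = urlparse(url).netloc or urlparse("http://" + url).netloc
    host = netloc.lower().split(":")[0]  # drop any port number
    return host[4:] if host.startswith("www.") else host
```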

Server

The server uses the Bottle web framework. Run it with the following command:

python server/server.py --host localhost --port 8080

The server offers the following API endpoints:

  • /reconcile: a reconciliation service API conforming to the OpenRefine reconciliation API specification.

  • /charity/12345: look up information about a particular charity
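A client for the /reconcile endpoint can be sketched as follows. In the OpenRefine reconciliation API, a batch of queries is sent as a JSON object in the queries parameter, keyed by query IDs, and the response uses the same IDs, each with a result list of candidates carrying id, name, score and match. The host and port come from the command above; whether this server accepts POST as well as GET is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

RECONCILE_URL = "http://localhost:8080/reconcile"  # from the server command above


def build_queries(names, limit=3):
    """Key each name by a query ID, as the reconciliation API expects."""
    return {"q{}".format(i): {"query": name, "limit": limit} for i, name in enumerate(names)}


def build_reconcile_request(names, limit=3):
    """Form-encode the batch as a `queries` parameter."""
    return urlencode({"queries": json.dumps(build_queries(names, limit))}).encode("utf-8")


def parse_reconcile_response(payload, count):
    """Return one list of (id, name, score, match) tuples per query, in order."""
    data = json.loads(payload)
    out = []
    for i in range(count):
        results = data.get("q{}".format(i), {}).get("result", [])
        out.append([(r["id"], r["name"], r.get("score"), r.get("match", False)) for r in results])
    return out


def reconcile(names):
    """POST the batch to a running server (the server must be up)."""
    req = Request(RECONCILE_URL, data=build_reconcile_request(names))
    with urlopen(req) as resp:
        return parse_reconcile_response(resp.read().decode("utf-8"), len(names))
```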

Todo

The project is currently a proof of concept and needs some work before it is fully up and running.

Priorities:

  • tests to ensure data is imported correctly
  • server tests
  • use the results of server/recon_test.py to choose the best reconciliation search query for the server (recon_test_7 currently seems best)
  • a threshold for deciding when to use a result and when to discard it
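A possible shape for that threshold is sketched below. It is one heuristic, not the project's chosen method: accept the top candidate only if its score clears a minimum and beats the runner-up by a margin. Elasticsearch relevance scores are not normalised across queries, so a margin may prove more robust than an absolute cut-off; min_score and min_margin are hypothetical parameters that would need tuning with a script such as server/recon_test.py.

```python
def pick_match(results, min_score, min_margin=0.0):
    """Return the best (id, name, score) candidate, or None to discard.

    min_score and min_margin are hypothetical tuning parameters.
    """
    if not results:
        return None
    ranked = sorted(results, key=lambda r: r[2], reverse=True)
    best = ranked[0]
    if best[2] < min_score:
        return None
    # Treat the result as ambiguous if the runner-up scores almost as well.
    if len(ranked) > 1 and best[2] - ranked[1][2] < min_margin:
        return None
    return best
```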

Future development:

  • upload a CSV file and reconcile each row with a charity
  • allow updating a charity with additional possible names
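The CSV upload could work roughly as sketched below: read each row, look up the name, and write the row back with an extra charity_number column. The lookup callable is a placeholder for whatever matching is used (for example, the reconciliation endpoint plus a threshold), and the default column name "name" is an assumption.

```python
import csv
import io


def add_charity_numbers(csv_text, lookup, name_column="name"):
    """Add a best-guess charity_number column to an uploaded CSV (sketch).

    lookup: any callable that takes a name and returns a charity number
    or None (a hypothetical hook, not an existing project function).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or name_column not in reader.fieldnames:
        raise ValueError("CSV has no {!r} column".format(name_column))
    fieldnames = list(reader.fieldnames)
    if "charity_number" not in fieldnames:
        fieldnames.append("charity_number")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Leave the cell blank when no confident match is found.
        row["charity_number"] = lookup(row[name_column]) or ""
        writer.writerow(row)
    return out.getvalue()
```

For example, add_charity_numbers(text, {"Shelter": "263710"}.get) fills in 263710 for rows named Shelter and leaves other rows blank.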

Contributors

drkane, bobharper1