
regardscitoyens / the-law-factory-parser


Data generator for the-law-factory project

Home Page: https://www.lafabriquedelaloi.fr

License: GNU General Public License v3.0

HTML 0.77% Shell 10.79% Perl 7.75% Python 80.69%
parliamentary-data lafabriquedelaloi

the-law-factory-parser's Introduction

the-law-factory-parser


Data generator for the-law-factory project (http://www.LaFabriqueDeLaLoi.fr)

Code used to generate the API available at: http://www.LaFabriqueDeLaLoi.fr/api/

Install the dependencies

You should set up a dedicated virtualenv with Python 3.5+:

virtualenv -p $(which python3) venv
source venv/bin/activate

Using PyPy can significantly boost performance. One easy way to install it and create a virtualenv with it is through pyenv:

pyenv install pypy3.5-6.0.0
pyenv virtualenv pypy3.5-6.0.0 lafabrique
pyenv activate lafabrique

Then with your choice of virtualenv activated, install the dependencies:

sudo apt install libxml2-dev libxslt-dev # necessary for lxml
pip install --upgrade setuptools pip # not necessary but always a good idea
pip install -e .
pip install -Ur requirements.txt # to get the latest version of those dependencies

Generate data for one bill

tlfp-parse <url>

The data is generated in the "data" directory. You can change this default behavior by passing a data path as an extra argument: tlfp-parse <url> <dataDir>.

For example, to generate data about the "Enseignement supérieur et recherche" bill:

tlfp-parse http://www.senat.fr/dossier-legislatif/pjl12-614.html
ls data/pjl12-614/

You can also pass a Senate id directly, for example: tlfp-parse pjl12-614
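
The correspondence between a Senate id and its dossier URL can be sketched in Python. This is only an illustration: `senat_url_from_id` and `senat_id_from_url` are hypothetical helpers, not part of tlfp, and the URL pattern is inferred from the single example URL above, so the parser's own resolution logic may differ.

```python
import re

# URL pattern inferred from the example above; an assumption, not tlfp's own logic.
SENAT_DOSSIER_URL = "http://www.senat.fr/dossier-legislatif/{id}.html"
_URL_RE = re.compile(r"^https?://www\.senat\.fr/dossier-legislatif/([^/]+)\.html$")


def senat_url_from_id(senat_id):
    """Build the dossier URL for a Senate id such as 'pjl12-614'."""
    return SENAT_DOSSIER_URL.format(id=senat_id)


def senat_id_from_url(url):
    """Extract the Senate id from a dossier URL, or return None if it does not match."""
    match = _URL_RE.match(url)
    return match.group(1) if match else None
```

For instance, `senat_id_from_url` applied to the URL used above yields `pjl12-614`, the same id that names the output directory under data/.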

Development options --debug, --enable-cache and --only-promulgated can also be used.

Generate data for many bills

To generate all bills from 2008, you can pipe a list of ids or urls into tlfp-parse-many.

A convenient way to do so is to use senapy:

senapy-cli doslegs_urls --min-year=2008 | tlfp-parse-many data/

See senapy-cli doslegs_urls help for more options. You can also use anpy with anpy-cli doslegs_urls.

Serve bills locally for The Law Factory website

First, you need to build data for all desired bills.

Then generate the files required by the frontend:

python tlfp/generate_dossiers_csv.py data/       # generates home.json and dossiers_promulgues.csv used by the searchbar
python tlfp/tools/assemble_procedures.py data/   # generates dossiers_n.json files used by the Navettes viz

Finally, serve the data directory however you like. For instance, you can serve it on a specific port with a simple HTTP server such as Node.js's http-server, in which case you will need to enable CORS: install http-server with npm and run it in the data directory on a given port (8002 in this example):

npm install -g http-server
cd data && http-server -p 8002 --cors
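
If you would rather not install Node.js, a similar static server with CORS enabled can be sketched with Python's standard library. This is an alternative to http-server, not something the project ships: `CORSRequestHandler` and `make_server` are hypothetical names, and the `directory` argument and `ThreadingHTTPServer` require Python 3.7 or later.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin requests."""

    def end_headers(self):
        # Allow any origin to read the generated files.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def make_server(directory="data", port=8002):
    """Create (but do not start) a server for `directory` on `port`."""
    handler = functools.partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("data", 8002).serve_forever()` then serves the data directory on port 8002 until interrupted.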

Generate git version for a bill

Work In Progress

You can export all your bills as git repositories: python tlfp/tools/make_git_repos.py git_export

Other things you can do

  • parse a Sénat dosleg: senapy-cli parse pjl15-610
  • parse an AN dosleg: anpy-cli parse http://www.assemblee-nationale.fr/13/dossiers/deuxieme_collectif_2009.asp
  • parse all the Sénat doslegs: senapy-cli doslegs_urls | senapy-cli parse_many senat_doslegs/
  • parse all the AN doslegs: anpy-cli doslegs_urls | anpy-cli parse_many an_doslegs/
  • generate a graph of the steps: python tlfp/tools/steps_as_dot.py data/ | dot -Tsvg > steps.svg

You can explore the related projects here

Tests

To run the tests, you can follow the .travis.yml file.

git clone https://github.com/regardscitoyens/the-law-factory-parser-test-cases.git
python tests/test_regressions.py the-law-factory-parser-test-cases

If you modify something, it is best to regenerate the test cases with the --regen flag:

python tests/test_regressions.py the-law-factory-parser-test-cases --regen

To make the tests faster, you can also use the --enable-cache flag. To clear the cache, remove the directory returned by lawfactory_where_is_my_cache. To update the meta-information (e.g. when a new political group was added), you need to remove all the .json files at the root of the test-cases directory.
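
Removing the root .json files can be done by hand, or with a small script along these lines. This is a sketch only: `clear_root_json` is a hypothetical helper, not part of the test suite, and it assumes that only files directly under the directory (not those in subdirectories) should be deleted.

```python
from pathlib import Path


def clear_root_json(test_cases_dir):
    """Delete .json files directly under test_cases_dir, leaving subdirectories untouched."""
    removed = []
    # glob("*.json") is not recursive, so nested test cases are left alone.
    for path in sorted(Path(test_cases_dir).glob("*.json")):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed
```

For example, `clear_root_json("the-law-factory-parser-test-cases")` returns the names of the files it removed, which makes it easy to check what was cleared before regenerating.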

You can also watch for parts of the code not yet covered by the tests:

  • First, install coverage: pip install coverage
  • Then, you can execute bash coverage.sh
  • Then, the report is in htmlcov/index.html

Credits

This work, a collaboration between Regards Citoyens, médialab Sciences Po and CEE Sciences Po, is supported by a public grant overseen by the French National Research Agency (ANR) as part of the "Investissements d'Avenir" program within the framework of the LIEPP center of excellence (ANR11LABX0091, ANR 11 IDEX000502).

More details at https://lafabriquedelaloi.fr/a-propos.html

the-law-factory-parser's People

Contributors

asone, boogheta, davidbgk, davidgayou, fmassot, mdamien, ronnix, rouxrc, seb35, teymour


the-law-factory-parser's Issues

Git commits should refer to individual amendments

Wonderful project, thanks!
As it stands it seems that there is a Git commit for each review by one assembly (i.e. one commit for the first reading, then one commit for the CMP...). Could we get one commit per amendment instead?

Wrong date format occasionally appears in dossiers file

In some cases, dates with a single-digit day are missing the leading 0 required by the yyyy-mm-dd format:

{
enddate: "2012-03-06",
source_url: "http://www.assemblee-nationale.fr/13/ta/ta0885.asp",
date: "2012-03-6",
step: "hemicycle",
has_interventions: false,
directory: "14_l.définitive_assemblee_hemicycle",
debats_order: null,
institution: "assemblee",
nb_amendements: 0,
stage: "l. définitive"
},
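
One way to repair such values is to re-pad the month and day. The following is a minimal sketch, not the fix applied in the project: `normalize_date` is a hypothetical helper, and it assumes the value is always a year-month-day string separated by hyphens.

```python
import re

# Accept one or two digits for month and day, so "2012-03-6" is recognised.
_DATE_RE = re.compile(r"^(\d{4})-(\d{1,2})-(\d{1,2})$")


def normalize_date(value):
    """Return value zero-padded as yyyy-mm-dd; raise ValueError on any other shape."""
    match = _DATE_RE.match(value)
    if match is None:
        raise ValueError("unexpected date format: %r" % value)
    year, month, day = match.groups()
    return "%s-%02d-%02d" % (year, int(month), int(day))
```

Applied to the step above, `normalize_date("2012-03-6")` gives `"2012-03-06"`, which matches the enddate field.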

wrong chmod on .sh files

Hello,

*.sh files are not marked as executable when cloning the repository.

chmod +x *.sh && chmod +x script/*.sh

should fix this issue.

Regards,

generate_data_from_senat_url.sh never ends

Hello,

I'm trying to generate data using generate_data_from_senat_url.sh:

bash generate_data_from_senat_url.sh http://www.senat.fr/dossier-legislatif/ppl12-118.html

I see a perl process taking 100% CPU, but it never ends and nothing is created (except an empty data/ppl12-118/.tmp/dossier.csv file).

Sometimes, it works but it takes more than 5 minutes. Most of that time is spent being idle. I suspect it's waiting for some curl call to finish. Why would the requests take that much time?

Any idea?

Get missing elements from parse_dossier_senat in correct_from_dossier_an

The mismatched URLs are all good in the Senate case but often wrong when corrected, so I commented out the script in eef01cf.

However, this would still be useful for other cases, where the Senate does not even give a line for each step whereas the AN does. Two cases found since 2010:

  • Texte déposé, id missing in dossier Sénat:
## Working on http://www.senat.fr/dossier-legislatif/pjl11-497.html
dossier.csv: missing step 4
http://www.assemblee-nationale.fr/14/dossiers/accord_Serbie_cooperation_policiere.asp
 -> http://www.assemblee-nationale.fr/14/ta/ta0103.asp
## Working on http://www.senat.fr/dossier-legislatif/pjl10-511.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Costa_Rica.asp TA 734
## Working on http://www.senat.fr/dossier-legislatif/pjl10-512.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Liberia.asp TA 738
## Working on http://www.senat.fr/dossier-legislatif/pjl10-513.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Brunei.asp TA 733
## Working on http://www.senat.fr/dossier-legislatif/pjl10-514.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Belize.asp TA 732
## Working on http://www.senat.fr/dossier-legislatif/pjl10-515.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Dominique.asp TA 735
## Working on http://www.senat.fr/dossier-legislatif/pjl10-516.html
 -> http://www.assemblee-nationale.fr/13/dossiers/accord_fiscal_Anguilla.asp TA 730

Handle rectifications texts by the Senate / AN

Extra Feature bonus for mod1

It might be interesting to try displaying the text diff column as a slide-push panel, as on this demo page:

http://tympanus.net/Blueprints/SlidePushMenus/

This would make it possible to use the whole width of the window to display the legislative process diagram and make it less dense. When clicking on an article or amendment, the column would then appear on the side, creating a slight "wow!" effect and making better use of window space.

Insert additional info on first dossiers file

The first dossiers file (dossiers_0_49.json) should contain additional information in order to build the visualization:

  • beginning of the first law in temporal order
  • length in days of the (temporally) longest law
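
The two values requested above could be computed along these lines. This is a sketch under stated assumptions: `summarize_dossiers`, the `beginning` and `end` input fields, and the `min_date` and `max_duration` output keys are all hypothetical names chosen for illustration, and dates are assumed to be yyyy-mm-dd strings.

```python
from datetime import date


def _parse(day):
    """Parse a yyyy-mm-dd string into a date."""
    year, month, dayno = (int(part) for part in day.split("-"))
    return date(year, month, dayno)


def summarize_dossiers(dossiers):
    """Return the earliest beginning and the longest duration in days."""
    # "beginning" and "end" are assumed field names, not confirmed by the project.
    first = min(_parse(d["beginning"]) for d in dossiers)
    longest = max((_parse(d["end"]) - _parse(d["beginning"])).days for d in dossiers)
    return {"min_date": first.isoformat(), "max_duration": longest}
```

The resulting dictionary could then be merged into the first dossiers file so that the frontend can size its time axis without loading every file.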

Complete and debug Interventions data

  • fix ND/NS api to provide also séances with interventions in Section corresponding to law id in ListSeancesByLoi
  • fix ND/NS api to query the list of interventions of a séance filtered by law id (meaning having the tag or being part of the corresponding section)
  • adapt generate_data.sh to call Séance api by law id
  • filter interventions file by date of textes déposé/adopté to avoid duplicates
  • check we get data in commission on NS and ND

Check start date of single steps in dossiers files

In the dossiers files (dossiers_0_49.json etc.), some laws have steps with a start date earlier than the beginning of the law project. See for example ppl09-264, fourth step, or pjl12-505, first step.
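
A check for this kind of inconsistency could look like the sketch below. `steps_before_beginning` is a hypothetical helper, and the `beginning` and `steps` field names are assumptions about the dossier structure; only the per-step `date` field is taken from the step example quoted in the date format issue above.

```python
def steps_before_beginning(dossier):
    """Return the (index, step) pairs whose date precedes the dossier's beginning."""
    beginning = dossier["beginning"]
    return [
        (index, step)
        for index, step in enumerate(dossier["steps"])
        # Zero-padded yyyy-mm-dd strings sort chronologically,
        # so a plain string comparison is enough here.
        if step.get("date") and step["date"] < beginning
    ]
```

Note that the string comparison is only reliable when every date is zero-padded, so dates missing a leading 0 should be normalized first.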

Amendments rollback

Amendments roll back to an (apparently) previous state on TLF. When auto-updating at regular intervals, once in a while the API returns a lot of amendments with the "non voted" state. A few seconds/minutes later everything is back to normal. Here is an example:
