code4puertorico / contratospr-api

Web application to gather, display and make searchable contracts made by the Puerto Rico state government.

Home Page: https://contratospr.com

License: Apache License 2.0

Python 93.38% Shell 5.26% Dockerfile 0.87% HTML 0.49%
civic-tech search indexing puerto-rico

contratospr-api's Introduction

Open Gov Hack Night

Website for Puerto Rico's weekly Open Gov Hack Night

Built in plain HTML, Javascript and CSS

Javascript libraries

Projects and People

The projects and people pages are powered by Github and civic-json-worker, a script we run every 5 minutes that fetches data from the Github API.

The JSON files are backed up every hour in the civic-json-files repository.

contratospr-api's People

Contributors

dependabot-preview[bot], dependabot[bot], froi, jpadilla, rnegron

contratospr-api's Issues

Get a current database dump

I've been asked for a dump of our database with our most current data. The purpose is to use this dump for some data analysis.

@jpadilla any thoughts on how best to fulfill this request?

Find out which deployment is stable and tag it

What is going on

We are having problems with our dependencies when deploying to Heroku. The tests we run locally do not reproduce the issue. While we figure out what is going on, we need to find out which deployed version is the most stable, create a tag, and mark it as a release.

Making a release will let us deploy a specific tag with confidence and will let us roll back to a known version.

For now I've removed automatic deployments from our master branch to ensure that the application will not be affected by our work.

Discovery / investigation

The activity in Heroku shows the following rollbacks:

v193 (rollback to 191)
----> v191 (Update REDIS by heroku-redis)
--------> v190 (Rollback to v185) -> v185 (rollback to v181)
------------> v181 (Update REDIS by heroku-redis)
----------------> v175 (last non rollback or deployment error version) - hash ece048a

I've narrowed it down to these hashes:

  • 9cfbf26 - 8 days ago (Update dependencies)
  • a88184e - 9 days ago (Changing app server to gunicorn)
  • 96ba667 - 12 days ago (Django version bump from Dependabot)
  • 0a66ba9 - 12 days ago (GitHub Action workflow file change)
  • 5c25abf - 12 days ago
  • b8b5ed5 - 5 months ago
  • ece048a - 5 months ago

Add support for fiscal year 2020+

Is your feature request related to a problem? Please describe.

Make the list of fiscal year choices dynamic.

start = 2016
end = get_current_fiscal_year()
year_span = end - start  # computed after end is defined; avoids shadowing the built-in range

Describe the solution you'd like

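import datetime

from django.utils import timezone
from rest_framework import serializers


def get_current_fiscal_year():
    # A fiscal year is named after the calendar year in which it ends (June 30).
    now = timezone.now()
    fiscal_year_end = timezone.make_aware(datetime.datetime(now.year, 6, 30))
    if now > fiscal_year_end:
        current_fiscal_year = now.year + 1
    else:
        current_fiscal_year = now.year
    return current_fiscal_year


class HomeSerializer(serializers.Serializer):
    # The choices are still hardcoded here; this is the list that should become dynamic.
    fiscal_year = serializers.ChoiceField(
        choices=[(2016, "2016"), (2017, "2017"), (2018, "2018"), (2019, "2019")],
        allow_null=False,
        initial=get_current_fiscal_year() - 1,
    )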
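
One possible way to build the choices dynamically is sketched here. It assumes the first available fiscal year stays 2016, as in the hardcoded list, and that a fiscal year runs from July 1 through June 30 and is named after the year in which it ends. `FIRST_FISCAL_YEAR` and `get_fiscal_year_choices` are hypothetical names, and plain `datetime.date` stands in for Django's `timezone` so the snippet runs on its own.

```python
import datetime

FIRST_FISCAL_YEAR = 2016  # assumption: earliest year in the hardcoded list


def get_current_fiscal_year(today=None):
    """Return the fiscal year containing `today` (July 1 starts the next one)."""
    today = today or datetime.date.today()
    return today.year + 1 if today.month >= 7 else today.year


def get_fiscal_year_choices(today=None):
    """Build (value, label) pairs from the first fiscal year up to the current one."""
    end = get_current_fiscal_year(today)
    return [(year, str(year)) for year in range(FIRST_FISCAL_YEAR, end + 1)]
```

The resulting list could be passed as `choices=` to the `ChoiceField`. Note that a list built at class definition time is computed once per process, so a long-running worker would need a restart, or lazily built choices, to pick up a new fiscal year.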

Analysis

Ideally we have a section where we show an analysis of the available data.

- Trends
  - Date range - Presets: PR Fiscal Year (July 1st - June 30th)
  - $$$
  - # Contracts
  - # Contractors
  - Average $ per contract

- Contracts
  - Biggest and smallest contracts
  - Trends

- Service groups / Service
  - Biggest and smallest contracts
  - Trends

- Contractor
  - Biggest and smallest contracts
  - Trends
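
The trend figures listed above (total amount, number of contracts, number of contractors, average per contract) could be computed along these lines. This is a minimal sketch over plain tuples; the `(fiscal_year, contractor, amount)` row shape and the function name are assumptions for illustration, not the project's data model.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_by_fiscal_year(rows):
    """Aggregate (fiscal_year, contractor, amount) rows into per-year metrics."""
    buckets = defaultdict(lambda: {"total": Decimal("0"), "count": 0, "contractors": set()})
    for fiscal_year, contractor, amount in rows:
        bucket = buckets[fiscal_year]
        bucket["total"] += Decimal(amount)
        bucket["count"] += 1
        bucket["contractors"].add(contractor)
    return {
        year: {
            "total": b["total"],
            "contracts": b["count"],
            "contractors": len(b["contractors"]),
            "average": b["total"] / b["count"],
        }
        for year, b in sorted(buckets.items())
    }
```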

Attached example UI from Stripe

screen shot 2018-11-26 at 10 33 58 pm

Searchable PDFs

Is your feature request related to a problem? Please describe.
If you really want to index the whole thing, PDFs have to be searchable.

Describe the solution you'd like
I've got code to OCR PDFs, even ones that are difficult to OCR (e.g. rotated).

Describe alternatives you've considered
The workaround would be to read the PDF (i.e. the contract). Another competitive advantage would be that your search also includes contract text. This opens up the possibility of interesting statistics and document tagging.

Additional context
My intent is to understand what you have got in terms of infrastructure and processes to download PDFs. OCR can be added to that process.

Exclude amendments when aggregating contracts

Describe the bug

Some aggregated contract counts and totals are including amendments.

To Reproduce

Happens in contractor search results for contracts count. Could be wrong in other places too.

Expected behavior

Aggregations need to exclude a contract's amendments.
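
The expected behavior can be illustrated with a small sketch. It assumes each contract record carries a `parent_id` that is `None` for an original contract and set for an amendment; that field name is hypothetical and may not match the real model.

```python
def aggregate_original_contracts(contracts):
    """Return (count, total amount) over contracts that are not amendments."""
    # Assumption: amendments point at their original contract via `parent_id`.
    originals = [c for c in contracts if c.get("parent_id") is None]
    return len(originals), sum(c["amount"] for c in originals)
```

If the Django model exposes an equivalent relation, the same filter would presumably be applied in the queryset before any `Count` or `Sum` aggregation.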

Add API functionality

Is your feature request related to a problem? Please describe.

It would be great to have an API exposing this data. This could potentially help people create their own analysis and visualizations.

Describe the solution you'd like

A simple and concise API that would let users search and obtain the data we've collected programmatically.

Describe alternatives you've considered

None.

Additional context

None

Rip out FilePreviews and Google Cloud Vision fallbacks

Initially text extraction happened via FilePreviews with a fallback of Google Cloud Vision. Local text extraction with pdftotext is good enough for now. Let's remove these additional dependencies to simplify as there is no immediate value add from them.

Data extraction from contract documents

Is your feature request related to a problem? Please describe.

Data extraction from contract documents:

  • community names for contracts executed by the Oficina para el Desarrollo Socioeconómico y Comunitario de Puerto Rico (ODSEC)
  • phone numbers, starting with construction contracts and/or multi-year, multi-million-dollar contracts
    ...

Describe the solution you'd like
Be able to automate the extraction of specific data from contract documents, verify that they are drafted correctly, and categorize them.

Describe alternatives you've considered
NLP (nltk, textblob, etc.) could be used to confirm the data.
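
As a starting point for the phone number case, a regular expression may be enough before reaching for NLP. The sketch below assumes only Puerto Rico's area codes (787 and 939) are of interest and that numbers appear in common separator styles; `extract_phone_numbers` is a hypothetical helper name.

```python
import re

# Assumption: match only the Puerto Rico area codes 787 and 939.
PHONE = re.compile(r"(?<!\d)\(?(787|939)\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)")


def extract_phone_numbers(text):
    """Return phone numbers found in `text`, normalized to NNN-NNN-NNNN."""
    return ["-".join(match.groups()) for match in PHONE.finditer(text)]
```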

AttributeError: 'Response' object has no attribute '_resource_closers' in production logs

Describe the bug
We've been seeing the following error appear in our production logs but not our tests:

2020-06-29T16:13:45.612999+00:00 app[web.1]: [2020-06-29 16:13:45 +0000] [13] [ERROR] Error handling request /v1/contracts/2020-000028-4969339/
2020-06-29T16:13:45.613046+00:00 app[web.1]: Traceback (most recent call last):
2020-06-29T16:13:45.613048+00:00 app[web.1]: File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 134, in handle
2020-06-29T16:13:45.613049+00:00 app[web.1]: self.handle_request(listener, req, client, addr)
2020-06-29T16:13:45.613057+00:00 app[web.1]: File "/usr/local/lib/python3.7/site-packages/gunicorn/workers/sync.py", line 175, in handle_request
2020-06-29T16:13:45.613058+00:00 app[web.1]: respiter = self.wsgi(environ, resp.start_response)
2020-06-29T16:13:45.613058+00:00 app[web.1]: File "/usr/local/lib/python3.7/site-packages/django/core/handlers/wsgi.py", line 133, in __call__
2020-06-29T16:13:45.613058+00:00 app[web.1]: response = self.get_response(request)
2020-06-29T16:13:45.613059+00:00 app[web.1]: File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 76, in get_response
2020-06-29T16:13:45.613059+00:00 app[web.1]: response._resource_closers.append(request.close)
2020-06-29T16:13:45.613060+00:00 app[web.1]: AttributeError: 'Response' object has no attribute '_resource_closers'

We don't know the cause yet but will be looking into it.


CC @jpadilla

Make application work without data

Is your feature request related to a problem? Please describe.

When the application runs for the first time, it expects the database to already contain data. These are not initial fixtures but the "real data" that it needs to scrape. This makes it harder for new developers to run the application.

Describe the solution you'd like

Have the application work with an empty database.

Describe alternatives you've considered

Include fixtures that would populate example data in the newly created database.

Additional context

This will help new contributors to the project get started faster.

Extract text from PDFs of images

Is your feature request related to a problem? Please describe.

There are ~3000 documents which we can't extract text from using pdftotext because each page is an image.

Describe the solution you'd like

Use pdftoppm to extract PDF pages and use tesseract to extract text. We should use this as a fallback for whenever pdftotext fails.
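
The proposed fallback might look roughly like this. It is a sketch that assumes the `pdftotext`, `pdftoppm`, and `tesseract` command-line tools are installed and on the PATH; the function names and the 300 DPI default are illustrative choices, not project code.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftotext_extract(pdf_path):
    """Extract embedded text; the `-` argument sends pdftotext output to stdout."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def ocr_extract(pdf_path, dpi=300):
    """Render each page to PNG with pdftoppm, then OCR each image with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(
            ["pdftoppm", "-png", "-r", str(dpi), str(pdf_path), str(prefix)], check=True
        )
        pages = []
        for image in sorted(Path(tmp).glob("page-*.png")):
            ocr = subprocess.run(
                ["tesseract", str(image), "stdout"], capture_output=True, text=True, check=True
            )
            pages.append(ocr.stdout)
    return "\n".join(pages)


def extract_text(pdf_path, primary=pdftotext_extract, fallback=ocr_extract):
    """Use the primary extractor and fall back to OCR when it fails or yields no text."""
    try:
        text = primary(pdf_path)
    except subprocess.CalledProcessError:
        text = ""
    return text if text.strip() else fallback(pdf_path)
```

Passing the extractors as parameters keeps the fallback decision testable without the external tools installed.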

Duplicate services

Describe the bug

Some services appear to be duplicated.

Expected behavior

Services should be unique by name and group.
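
To find existing duplicates before enforcing uniqueness, something like the following could help. It is a sketch that treats a service as a `(name, group)` pair of strings and assumes that case and whitespace differences should not count as distinct; both helper names are hypothetical.

```python
def service_key(name, group):
    """Normalize a (name, group) pair so trivially different spellings collide."""
    return (" ".join(name.split()).casefold(), " ".join(group.split()).casefold())


def find_duplicate_services(services):
    """Return the normalized keys that occur more than once."""
    seen, duplicates = set(), set()
    for name, group in services:
        key = service_key(name, group)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates
```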

Screenshots

screen shot 2018-11-26 at 10 38 09 pm

Add Tests

Is your feature request related to a problem? Please describe.
The project needs some testing.

Describe the solution you'd like
Start by creating unit tests.

"project_setup" incompatible with python_version > 3.7

Describe the bug

The logic in project_setup requires Python 3.7 to be installed, even though the wording suggests that higher versions would also work. This is not a big problem: either make it clear that Python 3.7 is the only supported version, or change the logic so that it accepts python_version > 3.7.

If the solution is to make clear that only Python 3.7 is supported, people could be warned not to change the requirements in the Pipfile to force it to work with Python 3.8.

To Reproduce

Steps to reproduce the behavior:
Un-install python3.7
install python3.8
setup stops and a message shows up

"Please install Python version 3.7.2 or greater"

Expected behavior
work with any version of python that is higher than 3.7
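
The expected check amounts to a "greater than or equal" comparison rather than an exact match. Expressed in Python it could look like the sketch below; the 3.7.2 minimum is taken from the message above, while `MINIMUM_PYTHON` and `is_supported` are hypothetical names and the real project_setup logic may be written differently.

```python
import sys

MINIMUM_PYTHON = (3, 7, 2)  # from the "3.7.2 or greater" message


def is_supported(version_info=None):
    """Return True for any interpreter at or above the minimum version."""
    version = tuple((version_info or sys.version_info)[:3])
    return version >= MINIMUM_PYTHON
```

Tuple comparison is element-wise, so 3.8.0 and 3.10.1 both pass while 3.7.1 does not.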

Remove CircleCI config

Is your feature request related to a problem? Please describe.

Remove the CircleCI configuration in favor of GitHub Actions. See PR #87

@jpadilla any thoughts?

Describe the solution you'd like

Pretty much the same as above ☝️

Describe alternatives you've considered

The alternative is to not use actions and keep using CircleCI.

Additional context

CircleCI builds don't always run when contributions come from outside the core contributors. With GitHub Actions we do not have this problem, and status checks run on every push.

Memory leak

Describe the bug
A clear and concise description of what the bug is.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
Screen Shot 2019-10-07 at 7 25 07 PM

Additional context
Add any other context about the problem here.

Add an app.json file

We need to add an app.json to be able to use review apps in Heroku.

This will let us test the api before merging into master.

Add entity search

Is your feature request related to a problem? Please describe.

I want to search for entities and see their details, such as their contracts and contract trends.

Describe the solution you'd like

Enter an entity id (the id used by https://consultacontratos.ocpr.gov.pr/) to search for a specific entity or contractor.

Describe alternatives you've considered

  1. Have the search field change its context depending on whether contracts or entities are being viewed.
  2. Create a kind of DSL for advanced searches.
    E.g. entity_id: 3125 would return Autoridad de Carreteras ....
    This requires a mapping from entity_id to source_id.
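
A very small version of such a DSL could split a query into `field: value` filters and leftover free text. This is only a sketch of the idea; `parse_query` is a hypothetical name, and mapping `entity_id` to `source_id` would still have to happen afterwards.

```python
import re

# A filter is a word, a colon, optional whitespace, and a value without spaces.
TOKEN = re.compile(r"(\w+):\s*(\S+)")


def parse_query(query):
    """Split a query into field filters (`field: value`) and free-text terms."""
    filters = {field: value for field, value in TOKEN.findall(query)}
    text = " ".join(TOKEN.sub(" ", query).split())
    return filters, text
```

For example, `parse_query("entity_id: 3125")` yields the filter `{"entity_id": "3125"}` and an empty free-text part.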

Additional context

The comptroller's site is offering data dumps in CSV format. Not being able to filter or search by entity number could be a problem for us and our users.

Add contribution guidelines

Code of Conduct

Issues

Development

  • Fork and clone
  • Install pipenv: pip install pipenv
  • Install dependencies: pipenv install --dev
  • Install pre-commit hooks: pipenv run pre-commit install
  • cp example.env .env
  • Install docker-compose: https://docs.docker.com/compose/install/
  • docker-compose up -d
  • Create super user: docker-compose exec web python manage.py createsuperuser
  • Scrape some contracts: docker-compose exec web python manage.py scrape_contracts --limit 100
  • open http://localhost:8000

Pull requests

Contractor normalization

Is your feature request related to a problem? Please describe.
Several contractors appear "duplicated", with their names written in multiple ways.

E.g.
image

This happens because the data is sent to the comptroller by each entity (municipality or agency), and each one enters the same contractor on its own. Since the entities' systems do not talk to each other and the comptroller's system does not normalize the data either, I think we should take a look at what options we have.

Describe the solution you'd like
None. This should be exploratory for now.
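
As one exploratory direction, contractor names could be reduced to a comparison key so that spelling variants group together. The sketch below strips accents and punctuation, uppercases, and drops a few company suffixes; the suffix list and the function name are assumptions, and real data would likely need fuzzier matching.

```python
import re
import unicodedata

SUFFIXES = {"INC", "CORP", "LLC", "CO"}  # hypothetical list, to be refined against real data


def normalize_contractor_name(name):
    """Build a comparison key: strip accents and punctuation, uppercase, drop suffixes."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.sub(r"[^A-Za-z0-9 ]+", " ", without_accents).upper().split()
    return " ".join(word for word in words if word not in SUFFIXES)
```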

Refine search

This is more of a discovery issue. We should keep refining our search as we come to understand our data better.

This would affect:

  • filters
  • search in general
  • "keyword" search
  • "full text" search
  • field weights

More details will be added in the comments of this issue as they come up.

Contested contracts

This is more of a possible feature you could add. It would be good to have visibility into which contracts are being contested, and possibly a search for contractors that have contested contracts.

It could be displayed with some kind of marker on the contract, and in the contractor list with a count of contested contracts.

Saved Searches/Views/Groups

Is your feature request related to a problem? Please describe.

I'd like a way to link to sort of specialized views.

For example, link to a view that composes multiple contractors as a visualization of "2019, el año de los 'hijos talentosos'".

Improving on a more flexible advanced search could get us there. But also, having a way to group contractors (to solve #48) would allow arbitrary grouping of multiple contractors.

I'll expand on this later.

Project setup fails using backup.dump file

Describe the bug

The backup.dump file that we are including in our setup instructions is causing the Django migrations to fail. The main reason is that the sequences for some tables are not being exported correctly.

I noticed this when I was trying to run migrations and would get a null constraint error. Checking the django_migrations table, I saw the sequence was missing.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the setup steps, it'll fail when trying to run the migrations.

Expected behavior
Migrations should be applied without an error.

Screenshots

Error can be seen here #86 (comment)

Additional context

I think the best way to solve this for future contributors / users is to drop the backup.dump and export the data using the manage.py dumpdata command. The resulting file can then be loaded after all migrations have been applied. This way we let the framework do its job.

Contracts awarded per fiscal year

Just noticed a new feature on https://consultacontratos.ocpr.gov.pr

Screen Shot 2019-05-12 at 2 05 54 PM

Here's a snippet of the CSV downloaded for 2018-2019.

Número de Entidad | Entidad | Número de Contrato | Enmienda | Otorgado En | Vigencia Desde | Vigencia Hasta | Tipo de Servicio | Categoría de Servicio | Cancelado | Cuantía | Contratista
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
4070 | Municipio de Toa Baja | 1994-000264 | AM | 05-01-2019 | 05-01-2019 | 05-01-2020 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 9132 | NELSON OQUENDO DIAZ
4040 | Municipio de Juncos | 1996-000042 | R | 07-19-2018 | 08-01-2018 | 07-31-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 4596 | ISMAEL ALEJANDRO GUTIERREZ
4048 | Municipio de Maricao | 1997-000040 | V | 10-31-2018 | 10-31-2018 | 10-31-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 3720 | JARIL HOLDINGS CO. INC.
4070 | Municipio de Toa Baja | 1997-000208 | BN | 08-01-2018 | 08-01-2018 | 08-01-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 3912 | GILBERTO RAMOS VERA
4070 | Municipio de Toa Baja | 1997-000208 | BO | 09-01-2018 | 09-01-2018 | 09-01-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 5184 | GILBERTO RAMOS VERA
4070 | Municipio de Toa Baja | 1997-000215 | AA | 09-24-2018 | 10-01-2018 | 10-01-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 5676 | CARLOS D. AYALA
4070 | Municipio de Toa Baja | 1997-000310 | V | 10-01-2018 | 10-01-2018 | 10-01-2019 | VIVIENDAS | COMPRA, VENTA Y/O ALQUILER DE INMUEBLES |   | 4740 | ANGEL A. AVILES

Maybe we can use this to our advantage somehow 🤔

ValueError related to Redis URL during project setup

Describe the bug
Following the CONTRIBUTING.md project setup flow doesn't work. There is a conflict between the example.env file and the CELERY_BROKER_URL property in the settings: the setting appends "/0" to the Redis URL from the env file, which already includes a "/0". This results in the following exception:

ValueError: invalid literal for int() with base 10: '0/0'
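
One way to avoid the doubled suffix is to add the database index only when the URL does not already select one. This is a sketch, not the project's settings code; `with_redis_db` is a hypothetical helper and the example URLs are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def with_redis_db(redis_url, db=0):
    """Return redis_url with a database index, adding one only if the path has none."""
    parts = urlsplit(redis_url)
    if parts.path.strip("/"):
        # A database is already selected, e.g. redis://host:6379/0
        return redis_url
    return urlunsplit(parts._replace(path=f"/{db}"))
```

The simpler alternative is to pick a single place for the suffix: either drop "/0" from example.env or stop appending it in the settings.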

To Reproduce
Steps to reproduce the behavior:

  1. Follow the project setup flow, copying example.env into .env.
  2. Try to run docker-compose up.

Expected behavior
The project setup flow should work.
