
ccdb5-api's People

Contributors

adamzarger, alexm118, amymok, chosak, cwdavies, dependabot[bot], higs4281, imuchnik, jeffreymfarley, jslay-excella, rosskarchner, schbetsy, sephcoster, willbarton


ccdb5-api's Issues

Total Record Count shows as 0

I see that there is an attempt to get total_record_count here but when I run it, it turns up zero:

  "_meta": {
    "total_record_count": 0,
    "last_updated": "2017-07-10T00:00:00.000Z",
    "license": "CC0"
  },

Add ability to aggregate by company

It would be very useful to be able to aggregate complaint counts by company, even if limited to the top 10 or 20 companies for a particular filter.

Get List of All Products and Companies

Hi,
Is there a way to get all products and companies using one of the GET endpoints? (I would assume they would have different query parameters, if this is possible.)

Swagger documentation on `/?field` appears to be incorrect.

Per the documentation:

Search by particular field. If this parameter is not specified, default behavior
will return only complaints with narratives.

However, the first hit when you perform a search with no criteria is the complaint with ID 2662569, which does not have a narrative.

Suggest endpoint is not suggesting

It seems the current implementation is not behaving as expected.

Steps to reproduce:

  1. /_suggest/?text=Mort&size=6
  2. Returns
{
  "sgg": [
    {
      "text": "Mort",
      "length": 4,
      "options": [],
      "offset": 0
    }
  ]
}

I would expect "mortgage" to be among the suggestions.

State Aggregations seem to be cut off

When I search with state=WY, I get 166 matches, but aggregations.state does not list WY as one of the buckets.

Is this aggregation being limited? Note the non-zero sum_other_doc_count:

  "aggregations": {
    ... snip ...
    "state": {
      "doc_count": 166148,
      "state": {
        "buckets": [
          {
            "doc_count": 23505,
            "key": "CA"
          },
          {
            "doc_count": 15105,
            "key": "FL"
          },
          {
            "doc_count": 14685,
            "key": "TX"
          },
          ... snip ...
          {
            "doc_count": 245,
            "key": "SD"
          },
          {
            "doc_count": 204,
            "key": "VT"
          },
          {
            "doc_count": 193,
            "key": "ND"
          }
        ],
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 597
      }
    },
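In an Elasticsearch terms aggregation, sum_other_doc_count is the total doc_count of the buckets that were not returned, so a non-zero value means the bucket list is incomplete. A minimal sketch of a client-side check follows. The helper name bucket_coverage and the abbreviated sample are illustrative only and are not part of the API.

```python
def bucket_coverage(agg, name="state"):
    """Summarize how much of a terms aggregation the returned buckets cover."""
    inner = agg[name]
    listed = sum(bucket["doc_count"] for bucket in inner["buckets"])
    other = inner.get("sum_other_doc_count", 0)
    return {
        "listed": listed,        # documents in the buckets that were returned
        "other": other,          # documents in buckets that were left out
        "truncated": other > 0,  # True when some buckets are missing
    }


# Abbreviated sample shaped like the response in this issue (two buckets only).
sample = {
    "doc_count": 166148,
    "state": {
        "buckets": [
            {"doc_count": 23505, "key": "CA"},
            {"doc_count": 193, "key": "ND"},
        ],
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 597,
    },
}

report = bucket_coverage(sample)
```

Here report["truncated"] is True, which is consistent with WY and other small states having been dropped from the list.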

Sort by Date is causing a 400 error

When I try http://localhost:8000/data-research/consumer-complaints/search/api/v1/?field=all&size=25&sort=created_date_asc

I get:

{"error":"Elasticsearch error: search_phase_execution_exception"}

The same happens for sort=created_date_desc

Installation fails with pip 10

Installation of this package with pip 10 fails due to the use of pip.req here in setup.py.

$ pip --version
pip 10.0.1 from ...
$ pip install git+https://github.com/cfpb/ccdb5-api.git@v1.0.5#egg=ccdb5-api
Collecting ccdb5-api from git+https://github.com/cfpb/ccdb5-api.git@v1.0.5#egg=ccdb5-api
  Cloning https://github.com/cfpb/ccdb5-api.git (to revision v1.0.5) to /private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/setup.py", line 32, in <module>
        install_requires = parse_requirements()
      File "/private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/setup.py", line 21, in parse_requirements
        requirements = pip.req.parse_requirements(
    AttributeError: 'module' object has no attribute 'req'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/

See this related discussion about use of pip internals; an alternative approach would be to define requirements in the setup.py install_requires instead of in requirements.txt. See related setuptools documentation here.
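If keeping requirements.txt is preferred, another option is to read the file directly instead of calling pip internals. The following is a rough sketch under that assumption, not the project's actual setup.py. The helper name read_requirements and the demo package pins are made up for illustration.

```python
import os
import tempfile


def read_requirements(path="requirements.txt"):
    """Return requirement specifiers from a plain requirements file."""
    requirements = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and pip options such as "-r" or "-e".
            if not line or line.startswith(("#", "-")):
                continue
            requirements.append(line)
    return requirements


# Demo against a throwaway file; the pins below are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "requirements.txt")
    with open(demo_path, "w") as handle:
        handle.write("# pinned deps\nDjango>=1.11\n\n-r extra.txt\nrequests==2.18.4\n")
    demo = read_requirements(demo_path)
```

In setup.py this could be passed as setup(install_requires=read_requirements()). The sketch does not handle trailing inline comments or environment markers that span several lines.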

Highlighting is not working as expected

Highlight should occur in the field searched.

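  • If the user searches All, occurrences in any field should be highlighted.
  • If the user searches Company, only occurrences in the company field should be highlighted.
  • If the user searches Narratives, only occurrences in the narrative field should be highlighted.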
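The rules above could be expressed as a lookup from the searched field to the fields named in the Elasticsearch highlight clause. This is only a sketch: the mapping, the helper name build_highlight, and the index field names are assumptions rather than the project's implementation.

```python
# Hypothetical mapping from the API's `field` value to index field patterns.
HIGHLIGHT_FIELDS = {
    "all": ["*"],                               # wildcard: any field
    "company": ["company"],                     # assumed index field name
    "narratives": ["complaint_what_happened"],  # assumed index field name
}


def build_highlight(field):
    """Build a highlight clause limited to the field that was searched."""
    patterns = HIGHLIGHT_FIELDS.get(field, ["*"])
    return {"fields": {pattern: {} for pattern in patterns}}


company_clause = build_highlight("company")
all_clause = build_highlight("all")
```

The returned dictionary would go under the "highlight" key of the search body.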

Hits.total for the `/` endpoint does not parse as an integer

There seems to be a discrepancy between the Swagger documentation and the actual response object. The documentation indicates that Hits.total should parse as an integer.

In practice, for the / endpoint, I found that hits.total is an object that looks something like this:

{
  "value": 3198,
  "relation": "eq"
}

Example API call:

www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2021-09-15&date_received_min=2021-09-14
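Elasticsearch 7 and later report hits.total as an object with value and relation keys, while earlier versions returned a bare integer. Until the documentation and response agree, a client can accept both shapes. A minimal sketch follows; the helper name total_hits is hypothetical.

```python
def total_hits(hits):
    """Return the hit count whether hits["total"] is an int or an object."""
    total = hits.get("total", 0)
    if isinstance(total, dict):
        # Newer shape: {"value": 3198, "relation": "eq"}
        return int(total.get("value", 0))
    # Older shape: a bare integer
    return int(total)


new_style = total_hits({"total": {"value": 3198, "relation": "eq"}})
old_style = total_hits({"total": 42})
```

A relation of "gte" means the value is only a lower bound, so callers that need exact counts should check that key as well.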

Sanitize special characters

Trends API seems to redirect to the CFPB website

Data Export Problems

1.) I still cannot request the full size of the dataset. When I pass the full size, the request errors:

/data-research/consumer-complaints/search/api/v1/?date_received_min=2017-01-03&field=all&format=json&no_aggs=true&size=135811&sort=relevance_desc

2.) I get an error when I try to use CSV format

/data-research/consumer-complaints/search/api/v1/?date_received_min=2017-01-03&field=all&format=csv&no_aggs=true&size=10000&sort=relevance_desc

Questions about `catch_es_error` decorator's status code and response body

When doing some testing against a cf.gov environment that could not talk to Elasticsearch, I saw the following error at /data-research/consumer-complaints/search/api/v1/?field=all&size=25&sort=created_date_desc:

Elasticsearch error: <urllib3.connection.HTTPConnection object at 0x7f5558fef650>: Failed to establish a new connection: [Errno 111] Connection refused

This was in a non-debug environment where I would not expect to see a raw exception leaking into a response like that. I thought I'd open an issue in case this is something that could be adjusted.

I'm also wondering if a 400 (see here) is appropriate in this scenario, since it seems more like a 500 situation to me.
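One possible shape for the adjustment: log the exception server-side and return a generic message with a 5xx status. This is a framework-agnostic sketch, not the existing catch_es_error code. SearchBackendError stands in for the Elasticsearch client's exception types, and the (status, body) tuple stands in for a real HTTP response object.

```python
import functools
import logging

logger = logging.getLogger(__name__)


class SearchBackendError(Exception):
    """Stand-in for the Elasticsearch client's exception types."""


def hide_backend_errors(view):
    """Log backend failures and return a generic 500 body instead of the raw error."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        try:
            return view(*args, **kwargs)
        except SearchBackendError:
            # Full details go to the server log, never to the client.
            logger.exception("Search backend request failed")
            return 500, {"error": "The search service is currently unavailable."}
    return wrapper


@hide_backend_errors
def search_view():
    # Simulates the connection failure described in this issue.
    raise SearchBackendError("[Errno 111] Connection refused")


status, body = search_view()
```

A 400 would then be reserved for requests the client can actually fix, such as invalid query parameters.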

Several warnings and errors when running tests in Python 3

When running tox, during the py36-dj111 testing step, the tests pass, but they result in several warnings and errors printed to the testing output.

Here's a sample run of the test suite:

complaint_search/tests/test_view_suggest_company.py:2: RemovedInDjango20Warning: Importing from django.core.urlresolvers is deprecated in favor of django.urls.
  from django.core.urlresolvers import reverse
Creating test database for alias 'default'...
ccdb5_api/urls.py:20: RemovedInDjango20Warning: Passing a 3-tuple to django.conf.urls.include() is deprecated. Pass a 2-tuple containing the list of patterns and app_name, and provide the namespace argument to include() instead.
  url(r'^admin/', include(admin.site.urls)),
ccdb5_api/urls.py:21: RemovedInDjango20Warning: Specifying a namespace in django.conf.urls.include() without providing an app_name is deprecated. Set the app_name attribute in the included module, or pass a 2-tuple containing the list of patterns and app_name instead.
  url(r'^', include('complaint_search.urls', namespace="complaint_search")),
System check identified no issues (0 silenced).
..............................................................tox/py36-dj111/lib/python3.6/site-packages/django/core/handlers/base.py:52: RemovedInDjango20Warning: Old-style middleware using settings.MIDDLEWARE_CLASSES is deprecated. Update your middleware and use settings.MIDDLEWARE instead.
  "instead.", RemovedInDjango20Warning
TransportError(N/A, 'Error')
....TransportError(N/A, 'Error')
....Out of memory
.TransportError(N/A, 'Error')
................................................TransportError(N/A, 'Error')
.......TransportError(N/A, 'Error')
....TransportError(N/A, 'Error')
....
----------------------------------------------------------------------
Ran 133 tests in 3.675s

  lint: commands succeeded
  py36-dj111: commands succeeded
  congratulations :)

Search within "company name" not "company response"

According to the design specs, we need to search within the company field.

Steps to reproduce:

  1. Call the API with the query string ?field=company&search_term=bank
  2. Response is {"field":["\"company\" is not a valid choice."]}

API documentation is inaccurate on date_received_max parameter

It says that

date_received_max
string
(query)
Return results with date < date_received_max (i.e. 2017-03-04)

But in fact it queries with date <= date_received_max, for instance:

?date_received_min=2012-01-01&date_received_max=2012-01-01&format=default&size=50

Returns 14 records with date_received = 2012-01-01

I hope that this behavior won't change and that the documentation will be corrected to match.
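The difference between the documented and observed behavior can be shown with a small comparison. This sketch only models the range check and does not call the API; the function name in_received_range is made up for illustration.

```python
from datetime import date


def in_received_range(received, date_min, date_max):
    """Observed behavior: both bounds are inclusive."""
    return date_min <= received <= date_max


same_day = date(2012, 1, 1)

# Observed: min == max == 2012-01-01 still matches complaints from that day.
inclusive = in_received_range(same_day, same_day, same_day)

# Documented ("date < date_received_max"): the same query could match nothing.
exclusive = same_day <= same_day < same_day
```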

Question about size parameter

The following GET call returns a response with 100 entries as opposed to the requested 150. It seems that any size parameter above 100 is capped at the first 100 entries.

I am not sure whether this is a feature or a bug, but I believe that any meaningful analysis would require more entries.

https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?size=150&no_aggs=true

R chunk:

library(httr)

url <- 'https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/'
query_params <- list(size = 150, no_aggs = 'true')

# Request 150 records; the response contains only 100
response <- GET(url = url, query = query_params)

# Parse the JSON body into R structures
content <- jsonlite::fromJSON(httr::content(response, "text", encoding = 'UTF-8'))

My goal is to create an R API Client package for the complaint database and this issue limits most of its functionality.

The full API client package implementation can be found at https://github.com/joseandresmontes/cfpbR
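If the cap of 100 turns out to be intentional, a client could work around it by requesting results in pages. The sketch below only builds the query parameters and makes no requests. It assumes the API accepts an offset parameter named frm, which should be checked against the Swagger documentation, and the helper name page_params is hypothetical.

```python
def page_params(total, page_size=100, **filters):
    """Yield one query-parameter dict per page of at most page_size results."""
    for offset in range(0, total, page_size):
        params = dict(filters)
        params["size"] = min(page_size, total - offset)
        params["frm"] = offset  # assumed name of the offset parameter
        yield params


# Parameters for fetching 150 records in two requests.
pages = list(page_params(150, no_aggs="true"))
```

Each dictionary can be passed as the query of a separate GET request, and the hits from every page concatenated on the client side.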

README missing documentation on how to load data

The first instructions in the setup section of the README are:

This repository assumes that you have an instance of elasticsearch running with complaint data set up and running.

I believe it would help get users off the ground if you could be more specific here, with links to any public data or even the expected schema of the data.
