cfpb / ccdb5-api

An API that provides an interface to search complaint data.

License: Creative Commons Zero v1.0 Universal

Shell 0.79% Python 99.21%

api django elasticsearch hacktoberfest

ccdb5-api's People

Contributors

amymok jeffreymfarley sephcoster adamzarger higs4281 yunghuffy schbetsy chosak dalavancloud gaybro8777 fagan2888 isabella232 bocarrasco47 verifiedfan lestachoez63

ccdb5-api's Issues

Total Record Count shows as 0

I see that there is an attempt to get total_record_count here but when I run it, it turns up zero:

  "_meta": {
    "total_record_count": 0,
    "last_updated": "2017-07-10T00:00:00.000Z",
    "license": "CC0"
  },

Pagination is not working

http://localhost:8000/data-research/consumer-complaints/search/api/v1/?product=Debt%20collection&size=10&sort=relevance_desc

produces the same set as

http://localhost:8000/data-research/consumer-complaints/search/api/v1/?product=Debt%20collection&size=10&sort=relevance_desc&frm=20

But with frm=20 that shouldn't be the case
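One way to confirm the report is to compare the complaint IDs returned by the two requests. The sketch below assumes the response is Elasticsearch-shaped (`hits.hits[]._id`), which the issue does not state; `build_url`, `hit_ids`, and `pages_overlap` are hypothetical helper names.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000/data-research/consumer-complaints/search/api/v1/"


def build_url(frm=None):
    """Build the search URL from the issue, optionally with an frm offset."""
    params = {"product": "Debt collection", "size": 10, "sort": "relevance_desc"}
    if frm is not None:
        params["frm"] = frm
    return BASE + "?" + urlencode(params)


def hit_ids(response):
    """Extract complaint IDs, assuming an Elasticsearch-style hits.hits list."""
    return [hit["_id"] for hit in response.get("hits", {}).get("hits", [])]


def pages_overlap(first, second):
    """True if the two parsed responses share at least one complaint ID."""
    return bool(set(hit_ids(first)) & set(hit_ids(second)))
```

With size=10, the pages at frm=0 and frm=20 should share no IDs, so `pages_overlap` returning True for the two parsed responses would reproduce the bug.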

Add ability to aggregate by company

It would be very useful to be able to aggregate complaints by company (a count of complaints per company), even if limited to the top 10 or 20 companies for a particular filter.
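Until such an aggregation exists, a client could approximate it over the hits it has already fetched. This is only a sketch: it assumes each hit carries a `_source.company` field (an assumption about the response shape), and it counts only the returned page, not the full result set. `top_companies` is a hypothetical name.

```python
from collections import Counter


def top_companies(response, n=10):
    """Count complaints per company among the hits in one parsed response."""
    counts = Counter(
        hit["_source"]["company"]
        for hit in response.get("hits", {}).get("hits", [])
        if hit.get("_source", {}).get("company")
    )
    # most_common returns (company, count) pairs, largest count first
    return counts.most_common(n)
```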

Get List of All Products and Companies

Hi,
Is there a way to get all products and companies using one of the GET endpoints? (I assume they would use different query parameters, if this is possible.)

Unexplainable/undocumented behavior of API

When you go to
https://cfpb.github.io/api/ccdb/api/index.html#/Complaints/get_
click "search consumer complaints", then "Try it out" and "Execute"

If you don't specify the format, the response will be an HTML page with JS.
If you specify the format as json or csv, the request never loads (it hangs).

I get the same result with Python's urllib library. The only workaround was to specify "?format=default", which is not documented.

My insurance card hasnt came in but i have the number...its saying not active..

Swagger documentation on `/?field` appears to be incorrect.

Per the documentation:

Search by particular field. If this parameter is not specified, default behavior
will return only complaints with narratives.

However, the first hit when you perform a search with no criteria is the complaint with ID 2662569, which does not have a narrative.

Suggest endpoint is not suggesting

It seems the current implementation is not behaving as expected.

Steps to reproduce:

/_suggest/?text=Mort&size=6
Returns

{
"sgg": [
  {
   "text": "Mort",
   "length": 4, 
   "options": [],
   "offset": 0
  }
 ],
}

I would expect "mortgage" as one example

State Aggregations seem to be cut off

When I search with state=WY, I get 166 matches, but aggregations.state does not list WY as one of the buckets.

Is this aggregation being limited? Notice sum_other_doc_count

  "aggregations": {
    ... snip ...
    "state": {
      "doc_count": 166148,
      "state": {
        "buckets": [
          {
            "doc_count": 23505,
            "key": "CA"
          },
          {
            "doc_count": 15105,
            "key": "FL"
          },
          {
            "doc_count": 14685,
            "key": "TX"
          },
          ... snip ...
          {
            "doc_count": 245,
            "key": "SD"
          },
          {
            "doc_count": 204,
            "key": "VT"
          },
          {
            "doc_count": 193,
            "key": "ND"
          }
        ],
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 597
      }
    },
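For context, an Elasticsearch terms aggregation returns only its top buckets, and `sum_other_doc_count` is the number of documents that fell into buckets that were not returned. A small check like the following (a sketch; `bucket_report` is a hypothetical helper) shows whether a key such as WY was dropped that way. It takes the inner object that holds `buckets`, i.e. `aggregations.state.state` in the response above.

```python
def bucket_report(terms_agg, key):
    """Summarize whether `key` appears in a terms aggregation's buckets."""
    buckets = terms_agg.get("buckets", [])
    return {
        "key_present": any(bucket["key"] == key for bucket in buckets),
        "buckets_returned": len(buckets),
        # documents belonging to buckets that were cut from the response
        "docs_outside_buckets": terms_agg.get("sum_other_doc_count", 0),
    }
```

If `key_present` is False while `docs_outside_buckets` is greater than zero, the key was most likely cut by the bucket limit rather than missing from the data.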

Sort by Date is causing a 400 error

When I try http://localhost:8000/data-research/consumer-complaints/search/api/v1/?field=all&size=25&sort=created_date_asc

I get:

{"error":"Elasticsearch error: search_phase_execution_exception"}

The same happens for sort=created_date_desc

Documentation wrongly defines max date as "<="

In API documentation for Trends, date_received_max is defined as "Return results with date <= date_received_max (i.e. 2017-03-04)"
https://cfpb.github.io/api/ccdb/api/index.html#/Trends/get_trends

However, I believe that this actually returns results with date < the entered date, not <=. Please double check, but a couple of searches seem to confirm this.

Installation fails with pip 10

Installation of this package with pip 10 fails due to the use of pip.req here in setup.py.

$ pip --version
pip 10.0.1 from ...
$ pip install git+https://github.com/cfpb/ccdb5-api.git@v1.0.5#egg=ccdb5-api
Collecting ccdb5-api from git+https://github.com/cfpb/ccdb5-api.git@v1.0.5#egg=ccdb5-api
  Cloning https://github.com/cfpb/ccdb5-api.git (to revision v1.0.5) to /private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/setup.py", line 32, in <module>
        install_requires = parse_requirements()
      File "/private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/setup.py", line 21, in parse_requirements
        requirements = pip.req.parse_requirements(
    AttributeError: 'module' object has no attribute 'req'
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/gg/wxm3673x2356ll65n0xbm_7r0000gn/T/pip-install-CacAhx/ccdb5-api/

See this related discussion about use of pip internals; an alternative approach would be to define requirements in the setup.py install_requires instead of in requirements.txt. See related setuptools documentation here.
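A third option, if requirements.txt should stay the single source, is to read the file directly instead of going through pip internals. This is a minimal sketch and not the project's actual fix; `parse_requirement_lines` and `read_requirements` are hypothetical names, and the parser deliberately ignores pip options (lines starting with `-`) and does not handle inline comments.

```python
def parse_requirement_lines(lines):
    """Return requirement specifiers, skipping blanks, comments, and pip options."""
    requirements = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def read_requirements(path="requirements.txt"):
    """Read a requirements file without importing anything from pip."""
    with open(path) as handle:
        return parse_requirement_lines(handle)
```

In setup.py this would be used as `install_requires=read_requirements()`.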

Highlighting is not working as expected

Highlight should occur in the field searched.

If user searches All, occurrences in any field should be highlighted.
If user searches Company, only company.
If user searches Narratives, only narratives.

Hits.total for the `/` endpoint does not parse as an integer

There seems to be a discrepancy between the Swagger documentation and the actual response object. The documentation indicates that Hits.total should parse as an integer.

In practice, for the / endpoint, I found that hits.total is an object that looks something like this:

{
            "value": 3198,
            "relation": "eq"
}

Example API call:

www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2021-09-15&date_received_min=2021-09-14
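The object form matches what Elasticsearch 7 and later return for `hits.total`, where `relation` is `eq` for an exact count and `gte` when `value` is only a lower bound. Clients that need to work with both shapes could normalize them as sketched below (`total_hits` is a hypothetical helper).

```python
def total_hits(hits):
    """Return (count, is_exact) for either the integer or the object form."""
    total = hits["total"]
    if isinstance(total, dict):
        # Object form: {"value": 3198, "relation": "eq"}
        return int(total["value"]), total.get("relation", "eq") == "eq"
    # Integer form, as described in the Swagger documentation
    return int(total), True
```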

Sanitize special characters

An API request with special characters (curly quotes “) times out:
https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all&search_term=%E2%80%9Cmortgage%20default%E2%80%9D~3&size=25&sort=created_date_desc

Whereas one with regular quotes is fine:
https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?date_received_max=2022-01-04&date_received_min=2019-01-04&field=all&search_term=%22mortgage%20default%22~3&size=25&sort=created_date_desc

The API should handle special characters in the request.
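Until the API handles this itself, a client-side workaround is to map curly quotes to straight ones before building the query string. This is only a sketch of that workaround, not the server-side fix being requested; `normalize_quotes` is a hypothetical name.

```python
from urllib.parse import quote

# Map typographic quotes to their ASCII equivalents
_QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def normalize_quotes(term):
    """Replace curly quotes in a search term with straight quotes."""
    return term.translate(_QUOTE_MAP)


def encode_search_term(term):
    """Normalize quotes, then percent-encode the term for use in a URL."""
    return quote(normalize_quotes(term), safe="")
```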

Trends API seems to redirect to CFPB website

The Trends API seems broken.

Using the following code: curl -X GET "https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/trends?lens=overview&trend_interval=yearly" -H "accept: application/json"

The URL redirects to the site https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/trends?dataLens=Overview&dataNormalization=None&dateInterval=Month&dateRange=3y&date_received_max=2020-07-15&date_received_min=2017-07-15&from=0&page=1&searchField=all&size=25&sort=created_date_desc&tab=Map

And if used in command line, curl returns the page HTML, rather than anything useful.

Other APIs don't seem to give the same problem.

Update django-flags version to match cfgov

https://github.com/cfpb/cfgov-refresh/blob/master/requirements/libraries.txt#L8

Undocumented server response on GET / search consumer complaints

Receiving undocumented response (TypeError: Failed to fetch)

api docs updates

There's a typo

https://cfpb.github.io/api/ccdb/api/index.html#/Complaints/get__complaintId_

it says find "comsumer" but should be "consumer"

Also I believe that the trends endpoint https://cfpb.github.io/api/ccdb/api/index.html#/Trends/get_trends
is missing the "focus" parameter description

Data Export Problems

1.) I still cannot use the raw size of the dataset. When I include the full size, it errors:

/data-research/consumer-complaints/search/api/v1/?date_received_min=2017-01-03&field=all&format=json&no_aggs=true&size=135811&sort=relevance_desc

2.) I get an error when I try to use CSV format

/data-research/consumer-complaints/search/api/v1/?date_received_min=2017-01-03&field=all&format=csv&no_aggs=true&size=10000&sort=relevance_desc

API limitations are not allowing for a meaningful usage

Query results are capped at a maximum of 100 records, and the from parameter doesn't work, which rules out most meaningful applications of the API.

Questions about `catch_es_error` decorator's status code and response body

When doing some testing against a cf.gov environment that could not talk to Elasticsearch, I saw the following error at /data-research/consumer-complaints/search/api/v1/?field=all&size=25&sort=created_date_desc:

Elasticsearch error: <urllib3.connection.HTTPConnection object at 0x7f5558fef650>: Failed to establish a new connection: [Errno 111] Connection refused

This was in a non-debug environment where I would not expect to see a raw exception leaking into a response like that. I thought I'd open an issue in case this is something that could be adjusted.

I'm also wondering if a 400 (see here) is appropriate in this scenario, since it seems more like a 500 situation to me.
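To make the suggestion concrete, here is a framework-free sketch of the behavior I have in mind: log the full exception server-side and return a generic message with a 5xx status. It is not the existing `catch_es_error` implementation; `SearchBackendError` is a hypothetical stand-in for the Elasticsearch client's exception type, and the view returns a plain `(status, body)` tuple instead of a Django response.

```python
import functools
import logging

logger = logging.getLogger(__name__)


class SearchBackendError(Exception):
    """Hypothetical stand-in for the search client's exception type."""


def hide_backend_errors(view):
    """Log backend failures and return a generic 500 instead of the raw error."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        try:
            return view(*args, **kwargs)
        except SearchBackendError:
            # Full details go to the server log only, never to the client
            logger.exception("Search backend request failed")
            return 500, {"error": "The search service is currently unavailable."}
    return wrapper
```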

Several warnings and errors when running tests in Python 3

When running tox, during the py36-dj111 testing step, the tests pass, but they result in several warnings and errors printed to the testing output.

Here's a sample run of the test suite:

complaint_search/tests/test_view_suggest_company.py:2: RemovedInDjango20Warning: Importing from django.core.urlresolvers is deprecated in favor of django.urls.
  from django.core.urlresolvers import reverse
Creating test database for alias 'default'...
ccdb5_api/urls.py:20: RemovedInDjango20Warning: Passing a 3-tuple to django.conf.urls.include() is deprecated. Pass a 2-tuple containing the list of patterns and app_name, and provide the namespace argument to include() instead.
  url(r'^admin/', include(admin.site.urls)),
ccdb5_api/urls.py:21: RemovedInDjango20Warning: Specifying a namespace in django.conf.urls.include() without providing an app_name is deprecated. Set the app_name attribute in the included module, or pass a 2-tuple containing the list of patterns and app_name instead.
  url(r'^', include('complaint_search.urls', namespace="complaint_search")),
System check identified no issues (0 silenced).
..............................................................tox/py36-dj111/lib/python3.6/site-packages/django/core/handlers/base.py:52: RemovedInDjango20Warning: Old-style middleware using settings.MIDDLEWARE_CLASSES is deprecated. Update your middleware and use settings.MIDDLEWARE instead.
  "instead.", RemovedInDjango20Warning
TransportError(N/A, 'Error')
....TransportError(N/A, 'Error')
....Out of memory
.TransportError(N/A, 'Error')
................................................TransportError(N/A, 'Error')
.......TransportError(N/A, 'Error')
....TransportError(N/A, 'Error')
....
----------------------------------------------------------------------
Ran 133 tests in 3.675s

  lint: commands succeeded
  py36-dj111: commands succeeded
  congratulations :)

Search within "company name" not "company response"

According to the design specs, we need to search within the company field.

Steps to reproduce:

Call the API with the query string ?field=company&search_term=bank
Response is {"field":["\"company\" is not a valid choice."]}

API documentation is inaccurate on date_received_max parameter

It says that

date_received_max
string
(query)
Return results with date < date_received_max (i.e. 2017-03-04)

But in fact it queries with date <= date_received_max, for instance:

?date_received_min=2012-01-01&date_received_max=2012-01-01&format=default&size=50

Returns 14 records with date_received = 2012-01-01

I hope that this behavior won't change and the documentation will be fixed accordingly
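The check I ran can be expressed as a small helper over the returned dates. It assumes each date is an ISO 8601 string that starts with YYYY-MM-DD, which is an assumption about the response; `records_on_max_date` is a hypothetical name.

```python
def records_on_max_date(dates_received, date_received_max):
    """Count returned records whose date part equals date_received_max."""
    return sum(1 for value in dates_received if value[:10] == date_received_max)
```

A count above zero means the bound behaves as inclusive for that query; a count of zero is inconclusive, since there may simply be no complaints on that day.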

Question about size parameter

The following GET call returns a response with 100 entries as opposed to the requested 150. It seems that any size parameter above 100 is capped at the first 100 entries.

I am not sure whether this is a feature or a bug, but I believe that any meaningful analysis would require more entries.

https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/?size=150&no_aggs=true

R chunk:

url <- 'https://www.consumerfinance.gov/data-research/consumer-complaints/search/api/v1/'
query_params <- list(size = 150, no_aggs = 'true')

response <- httr::GET(url = url, query = query_params)

content <- jsonlite::fromJSON(httr::content(response, "text", encoding = 'UTF-8'))

My goal is to create an R API Client package for the complaint database and this issue limits most of its functionality.

Full API client package implementation can be found at https://github.com/joseandresmontes/cfpbR
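If the 100-entry cap is intentional, a client would have to page through results in chunks. The Python sketch below shows that loop under the assumption that an offset parameter (written `frm` in another issue here) works as expected, which is not confirmed; `fetch_page` is a hypothetical callable that performs one request and returns a list of hits.

```python
def fetch_all(fetch_page, wanted, page_size=100):
    """Collect up to `wanted` hits by requesting pages of at most `page_size`."""
    results = []
    offset = 0
    while len(results) < wanted:
        size = min(page_size, wanted - len(results))
        batch = fetch_page(frm=offset, size=size)
        if not batch:
            break  # no more results available
        results.extend(batch)
        offset += len(batch)
    return results
```

Requesting 150 entries would then take two calls, one for 100 hits at offset 0 and one for 50 hits at offset 100.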

README missing documentation on how to load data

The first instructions in the setup section of the README are:

This repository assumes that you have an instance of elasticsearch running with complaint data set up and running.

I believe it would help get users off the ground if you could be more specific here, with links to any public data or even the expected schema of the data.