insights-host-inventory's Introduction

Host Based Inventory

You've arrived at the repo for the backend of the Host Based Inventory (HBI). If you're looking for API, integration, or user documentation for HBI, please see the Inventory section in our Platform Docs site.

Getting Started

Snappy

This project uses Snappy compression to enhance its Kafka usage. The python-snappy package requires the core Snappy library to be installed on the machine, so start off by running the following:

sudo dnf install snappy

pg_config

Local development also requires pg_config, which is installed with the PostgreSQL development library. To install it, use the command appropriate for your system:

Fedora/CentOS

sudo dnf install libpq-devel

Debian/Ubuntu

sudo apt-get install libpq-dev

macOS (using Homebrew)

brew install postgresql

Install dependencies

This project uses pipenv to manage the development and deployment environments. To set the project up for development, we recommend using pyenv to install and manage the appropriate Python (currently 3.8.x), pip, and pipenv versions. Once you have pipenv, run the following:

pipenv install --dev

Afterwards you can activate the virtual environment by running:

pipenv shell

A docker-compose file, dev.yml, is included; it starts a PostgreSQL database that is useful for development.

docker-compose -f dev.yml up

Initialize the database

Run the following command to execute the database migration scripts, which maintain the database tables.

The database migration scripts determine the database location, username, password, and database name from the INVENTORY_DB_HOST, INVENTORY_DB_USER, INVENTORY_DB_PASS, and INVENTORY_DB_NAME environment variables.

make upgrade_db

By default, the database container uses a bit of local storage so that data you enter is persisted across multiple starts of the container. If you want to destroy that data, run the following:

docker-compose down

Running the Tests

You can run the tests using pytest:

pytest --cov=.

Or you can run the tests individually:

./test_api.py
pytest test_db_model.py
./test_unit.py
pytest test_json_validators.py

Depending on the environment, it might be necessary to set the DB related environment variables (INVENTORY_DB_NAME, INVENTORY_DB_HOST, etc).

Sonar Integration

This project uses SonarQube to perform static code analysis, monitor test coverage, and find potential issues in the Host Inventory codebase. The analysis is run automatically for each PR by the "host-inventory pr security scan" Jenkins job, and the results are uploaded to Red Hat's SonarQube server, under the console.redhat.com:insights-host-inventory project.

Contributing

This repository uses pre-commit to check and enforce code style. It uses Black to reformat the Python code and Flake8 to check it afterwards. Other formats and text files are linted as well.

Install pre-commit hooks to your local repository by running:

pre-commit install

After that, all the files you commit will be linted. If the checks don't succeed, the commit will be rejected; however, if the issue was automatically correctable, the altered files from the linting will be ready for you to commit again.

If you're inside the Red Hat network, please also make sure you have rh-pre-commit installed; instructions on installation can be found here. Then, verify the installation by following the Testing the Installation section. If you follow the instructions for Quickstart Install, and then re-enable running hooks in the repo's .pre-commit-config.yaml (instructions in the Manual Install section), both hooks should run upon making a commit.

Please make sure all checks pass before submitting a pull request. Thanks!

Running the server locally

Prometheus was designed to run in a multi-threaded environment, whereas gunicorn uses a multi-process architecture. As a result, some work is needed to make Prometheus integrate with gunicorn.

A temporary directory for Prometheus needs to be created before the server starts. The prometheus_multiproc_dir environment variable needs to point to this directory, and the contents of this directory need to be removed between runs.
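
In other words, something along these lines has to happen around the server process. This is only a minimal sketch of the steps described above; for local development, run_gunicorn.py handles this automatically.

import os
import shutil
import subprocess
import tempfile

prometheus_dir = tempfile.mkdtemp(prefix="prometheus_multiproc_")
os.environ["prometheus_multiproc_dir"] = prometheus_dir

try:
    # start the server; any gunicorn invocation works the same way
    subprocess.run(["gunicorn", "-c", "gunicorn.conf.py", "run"], check=True)
finally:
    # the directory contents must not survive between runs
    shutil.rmtree(prometheus_dir)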

If running the server in a cluster, you can use this command:

gunicorn -c gunicorn.conf.py run

When running the server locally for development, the Prometheus configuration is done automatically. You can run the server locally using this command:

python3 run_gunicorn.py

Running all services locally

Honcho provides a command to run MQ and web services at once:

honcho start

Config env vars

 prometheus_multiproc_dir=/path/to/prometheus_dir
 APP_NAME="inventory"
 PATH_PREFIX="/r/insights/platform"
 INVENTORY_DB_USER="insights"
 INVENTORY_DB_PASS="insights"
 INVENTORY_DB_HOST="localhost"
 INVENTORY_DB_NAME="insights"
 INVENTORY_DB_POOL_TIMEOUT="5"
 INVENTORY_DB_POOL_SIZE="5"
 INVENTORY_LOGGING_CONFIG_FILE=logconf.ini
 INVENTORY_DB_SSL_MODE=""
 INVENTORY_DB_SSL_CERT=""
 UNLEASH_TOKEN='*:*.dbffffc83b1f92eeaf133a7eb878d4c58231acc159b5e1478ce53cfc'
 UNLEASH_CACHE_DIR=./.unleash
 UNLEASH_URL="http://localhost:4242/api"

To force an SSL connection to the database, set INVENTORY_DB_SSL_MODE to "verify-full" and set INVENTORY_DB_SSL_CERT to the path of the certificate you'd like to use.

Identity

It is necessary to provide a valid Identity both when testing the API and when producing messages via Kafka. For Kafka messages, the Identity must be set in the platform_metadata.b64_identity field of the message. When testing the API, it must be provided in the x-rh-identity authentication header on each call to the service. For testing purposes, this required identity header can be set to the following:

x-rh-identity: eyJpZGVudGl0eSI6eyJvcmdfaWQiOiJ0ZXN0IiwidHlwZSI6IlVzZXIiLCJhdXRoX3R5cGUiOiJiYXNpYy1hdXRoIiwidXNlciI6eyJ1c2VybmFtZSI6InR1c2VyQHJlZGhhdC5jb20iLCJlbWFpbCI6InR1c2VyQHJlZGhhdC5jb20iLCJmaXJzdF9uYW1lIjoidGVzdCIsImxhc3RfbmFtZSI6InVzZXIiLCJpc19hY3RpdmUiOnRydWUsImlzX29yZ19hZG1pbiI6ZmFsc2UsImlzX2ludGVybmFsIjp0cnVlLCJsb2NhbGUiOiJlbl9VUyJ9fX0=

This is the Base64 encoding of the following JSON document:

{"identity":{"org_id":"test","type":"User","auth_type":"basic-auth","user":{"username":"[email protected]","email":"[email protected]","first_name":"test","last_name":"user","is_active":true,"is_org_admin":false,"is_internal":true,"locale":"en_US"}}}

The above header has the "User" identity type, but it's possible to use a "System" type header as well.

x-rh-identity: eyJpZGVudGl0eSI6eyJvcmdfaWQiOiAidGVzdCIsICJhdXRoX3R5cGUiOiAiY2VydC1hdXRoIiwgInN5c3RlbSI6IHsiY2VydF90eXBlIjogInN5c3RlbSIsICJjbiI6ICJwbHhpMTN5MS05OXV0LTNyZGYtYmMxMC04NG9wZjkwNGxmYWQifSwidHlwZSI6ICJTeXN0ZW0ifX0=

This is the Base64 encoding of the following JSON document:

{"identity":{"org_id": "test", "auth_type": "cert-auth", "system": {"cert_type": "system", "cn": "plxi13y1-99ut-3rdf-bc10-84opf904lfad"},"type": "System"}}

If you want to encode other JSON documents, you can use the following command:

echo -n '{"identity": {"org_id": "0000001", "type": "System"}}' | base64 -w0
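
The same encoding can also be done in Python, for example when driving the API from a script. This is a minimal sketch; the URL below is an assumption, so adjust it to your local prefix and port.

import base64
import json

import requests

identity = {"identity": {"org_id": "0000001", "type": "System"}}
encoded = base64.b64encode(json.dumps(identity).encode("utf-8")).decode("utf-8")

response = requests.get(
    "http://localhost:8080/api/inventory/v1/hosts",  # assumed local URL
    headers={"x-rh-identity": encoded},
)
print(response.status_code)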

Identity Enforcement

The provided Identity limits access to specific hosts. For API requests, the user can only access hosts that have the same Org ID as the provided Identity. For host updates via Kafka messages, a host can only be updated if both the Org ID matches and the Host.system_profile.owner_id matches the provided identity.system.cn value.
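
As an illustration only (simplified pseudo-logic, not the actual HBI code), the Kafka update rule amounts to a check like this:

def can_update_via_kafka(host, identity):
    ident = identity["identity"]
    # both the org and the reporting system's owner must match
    return (
        host["org_id"] == ident["org_id"]
        and host.get("system_profile", {}).get("owner_id") == ident.get("system", {}).get("cn")
    )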

Using the legacy api

Some apps still need to use the legacy API path, which by default is /r/insights/platform/inventory/v1/. In case legacy apps require this prefix to be changed, it can be modified using this environment variable:

 export INVENTORY_LEGACY_API_URL="/r/insights/platform/inventory/api/v1"

Payload Tracker Integration

The inventory service has been integrated with the Payload Tracker service. The payload tracker integration can be configured using the following environment variables:

KAFKA_BOOTSTRAP_SERVERS=localhost:29092
PAYLOAD_TRACKER_KAFKA_TOPIC=platform.payload-status
PAYLOAD_TRACKER_SERVICE_NAME=inventory
PAYLOAD_TRACKER_ENABLED=true

The payload tracker can be disabled by setting the PAYLOAD_TRACKER_ENABLED environment variable to false. It is also disabled for add/delete operations that do not include a request_id. When the payload tracker is disabled, a null-object implementation (NullPayloadTracker) of the PayloadTracker interface is used; it implements the interface, but its methods are no-ops.
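
Conceptually, the null-object pattern looks like this. This is only a minimal sketch; the method names are assumptions, not the project's actual interface.

class NullPayloadTracker:
    """Stands in for the real tracker when tracking is disabled; every method is a no-op."""

    def payload_received(self, status_message=None):
        pass

    def payload_success(self, status_message=None):
        pass

    def payload_error(self, status_message=None):
        pass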

The PayloadTracker purposefully swallows any exceptions it generates and logs them instead; a failure or exception within the PayloadTracker should not cause a request to fail.

The payload status reporting is a bit different because each payload can contain multiple hosts. For example, the add_host operation only logs an error for the payload if the entire payload fails (a catastrophic failure during processing, such as the database being down). One or more hosts can still fail within the add_host method; these are logged as a "processing_error". If a host is successfully added or updated, it is logged as a "processing_success". Having one or more hosts logged as "processing_error" will not cause the payload to be flagged as "error" overall.

The payload tracker status logging for the delete operation is similar. The overall status of the payload is only logged as an "error" if the entire delete operation fails (a 404 due to the hosts not existing, the database being down, etc.).

Generating a database migration script

Run this command to generate a new revision in migrations/versions:

make migrate_db message="Description of revision"

Building a docker container image

A Dockerfile is provided for building local Docker containers. The container image built this way is only intended for development purposes (e.g. orchestrating the container using docker-compose) and must not be used for production deployment.

Note that some of the packages require a subscription. Make sure the host building the image is attached to a valid subscription providing RHEL.

docker build . -f dev.dockerfile -t inventory:dev

By default, the container runs the database migrations and then starts the inventory-mq service.

Metrics

The application provides some management information about itself. These endpoints are exposed at the root path / and are therefore accessible only from inside the cluster.

  • /health responds with 200 to any GET requests; point your liveness or readiness probe here.
  • /metrics offers metrics and monitoring intended to be pulled by Prometheus.
  • /version responds with a JSON document that contains the build version info (the value of the OPENSHIFT_BUILD_COMMIT environment variable).

Cron jobs such as the reaper and sp-validator push their metrics to a Prometheus Pushgateway instance running at the address given by PROMETHEUS_PUSHGATEWAY, which defaults to localhost:9091.
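
For example, a cron job can push a metric like this. This is a minimal sketch using the prometheus_client library; the metric name and job label are made up.

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
duration = Gauge("hbi_cron_duration_seconds", "Duration of the cron job", registry=registry)
duration.set(42.0)

# PROMETHEUS_PUSHGATEWAY defaults to localhost:9091
push_to_gateway("localhost:9091", job="hbi-reaper", registry=registry)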

Release process

This section describes the process of getting a code change from a pull request all the way to production.

1. Pull request

It all starts with a pull request. When a new pull request is opened, some jobs are run automatically. These jobs are defined in app-interface here.

Should any of these fail, this is indicated directly on the pull request.

When all of these checks pass and a reviewer approves the changes, the pull request can be merged by someone from the @RedHatInsights/host-based-inventory-committers team.

2. Latest image and smoke tests

When a pull request is merged to master, a new container image is built and tagged as insights-inventory:latest. This image is then automatically deployed to the Stage environment.

3. QE testing in the Stage environment

Once the image lands in the Stage environment, the QE testing can begin. People in @platform-inventory-qe run the full IQE test suite against Stage, and then report the results in the #platform-inventory-standup channel.

4. Promoting the image to the production environment

In order to promote a new image to the production environment, it is necessary to update the deploy-clowder.yml file. The ref parameter on the prod-host-inventory-prod namespace needs to be updated to the SHA of the validated image.

Once the change has been made, submit a merge request to app-interface. For the CI pipeline to run tests on your fork, you'll need to add @devtools-bot as a Maintainer. See this guide on how to do that.

After the MR has been opened, somebody from AppSRE/insights-host-inventory will review and approve it by adding a /lgtm comment. Afterwards, the MR will be merged automatically and the changes will be deployed to the production environment. The engineer who approved the MR is then responsible for monitoring the rollout of the new image.

Once that happens, contact @platform-inventory-qe and request the image to be re-tested in the production environment. The new image will also be tested automatically when the Full Prod Check pipeline is run (twice daily).

Monitoring of deployment

It is essential to monitor the health of the service during and after the production deployment. A non-exhaustive list of things to watch includes:

Deployment rollback

Should unexpected problems occur during the deployment, it is possible to do a rollback. This is done by updating the ref parameter in deploy-clowder.yml to point to the previous commit SHA, or by reverting the MR that triggered the production deployment.

Updating the System Profile

In order to add or update a field on the System Profile, first follow the instructions in the inventory-schemas repo. After an inventory-schemas PR has been accepted and merged, HBI must be updated to keep its own schema in sync. To do this, simply run this command:

make update-schema

This will pull the latest version of the System Profile schema from inventory-schemas and update files as necessary. Open a PR with these changes, and it will be reviewed and merged as per the standard process.

Deploying Host Inventory and Xjoin-search to Kubernetes Namespaces

A list of high level steps is provided here

Debugging Local Code with Services Deployed to Kubernetes Namespaces

This section relies on a successful deployment of host-inventory and xjoin-search following the instructions in the previous section. Making local code work with the services running in Kubernetes requires some actions, provided here.

Running ad hoc jobs using a different image

There may be a job (ClowdJobInvocation) that requires a special image, different from the one used by the parent application (i.e. host-inventory). Out of the box, Clowder does not allow this. Running a Special Job describes how to accomplish it.


insights-host-inventory's Issues

Get count of hosts in query

For Advisor's 'total systems' statistics, we only need to get a count of the systems with an Advisor ID, rather than getting the full transfer of every host's data and then counting them in Advisor. If there was a way to have some queries only return the count of hosts, rather than the list of hosts, that would save a lot of data transfer and time.

Ultimately fix code formatting

The code is a formatting mess. Virtually every Python file is formatted completely randomly, only remotely adhering to PEP 8.

@jhjaggars suggested using Black, which is a reasonable choice. It'd be nice to use a linter to block pull requests from being merged, but as a first step, we can at least:

  • Reformat existing code.
  • Don’t push unformatted code any more.
  • Don’t approve unformatted pull requests.

The first bullet point is the only one doable immediately to close this issue. I am aware that this would break all pull requests. Let’s take advantage of the fact that there are not many of them pending at the moment.

Mention database migration in README

Before starting to use the application, it is necessary to run the database migrations. Otherwise there will be no hosts table and requests will fail.

Mention this in the README and provide a guide on how to initialize the database. This includes:

  • Create the database. This is probably done automatically when using Docker Compose.
  • Run the migrations with python manage.py upgrade. Don’t forget to set the INVENTORY_DB_NAME and other environment variables.
  • Mention that if a different database is used for testing, the migration must be run for it too.

Replace Host.to_json() methods with different exporter objects

Think about replacing the to_json() methods (there are 2 now) with an object that exports the data in the correct format.

Thinking out loud here... Maybe this doesn't really need to be an object. Maybe a simple exporter function that gets passed around. Maybe it's a class with the __call__() method on it.

The Host db object shouldn't be responsible for knowing all the different export formats.
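
A sketch of the idea (hypothetical names; not existing code):

class EgressMessageExporter:
    """Knows how to serialize a host for outgoing Kafka messages."""

    def __call__(self, host):
        return {"id": str(host.id), "display_name": host.display_name}


class ApiResponseExporter:
    """Knows how to serialize a host for API responses."""

    def __call__(self, host):
        return {"id": str(host.id), "display_name": host.display_name, "facts": host.facts}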

Prefix private functions in API module

PEP 8 encourages prefixing a function name with an underscore if it’s intended for internal use. In the api.host module, the only public functions are those that represent API operations. All other functions are used only internally by this module and thus should be prefixed. This improves readability.

single_leading_underscore: weak "internal use" indicator. E.g. from M import * does not import objects whose names start with an underscore.

It may even be worth exposing the API operation functions in the __all__ variable. Although they are not manually imported anywhere, they form this module's interface and it's what Connexion uses. This, again, is recommended by PEP 8:

To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute. Setting __all__ to an empty list indicates that the module has no public API.

Even with __all__ set appropriately, internal interfaces (packages, modules, classes, functions, attributes or other names) should still be prefixed with a single leading underscore.

Wrong format of `x-rh-identity` in auth/__init__.py

The identity header should have an identity key, with the user information nested under it.

Example:

  • Current
    {account_number: '1111', org_id: '11111c1'} in BASE64: eyJhY2NvdW50X251bWJlciI6IjExMTEiLCJvcmdfaWQiOiIxMTExMWMxIn0=
  • Expected
    {identity: {account_number: '1111', org_id: '11111c1'}} in BASE64 eyJpZGVudGl0eSI6eyJhY2NvdW50X251bWJlciI6IjExMTEiLCJvcmdfaWQiOiIxMTExMWMxIn19

Host 'id' field - UUID or Integer?

In the model definition of the id field:
id = db.Column(db.Integer, primary_key=True)
In the migration definition of the id field:
Column("id", UUID(), primary_key=True)
Is it an integer or a UUID?

TypeError when terminating local Gunicorn

When a local Gunicorn server run by the run_gunicorn.sh script is terminated by a KeyboardInterrupt, another exception occurs:

Traceback (most recent call last):
  File "/…/insights-host-inventory/run_gunicorn.py", line 38, in <module>
    run_server()
  File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/…/insights-host-inventory/run_gunicorn.py", line 24, in prometheus_temp_dir
    putenv(PROMETHEUS_ENV_VAR, orig)
TypeError: expected str, bytes or os.PathLike object, not NoneType

This is caused by restoring a backed-up PROMETHEUS_ENV_VAR value. It doesn't work if the value was not set in the first place, as it's not possible to store None in an environment variable.

Namespace KeyError creates an internal server error

When posting to the /hosts endpoint, if the body's facts dictionary does not contain the key namespace, an internal server error is raised.

Logs:

File "/insights-host-inventory/app/auth/init.py", line 46, in _wrapper
return view_func(*args, **kwargs)
File "/insights-host-inventory/api/host.py", line 35, in addHost
input_host = Host.from_json(host)
File "/insights-host-inventory/app/models.py", line 93, in from_json
convert_json_facts_to_dict(d.get("facts", [])),
File "/insights-host-inventory/app/models.py", line 41, in convert_json_facts_to_dict
if fact["namespace"] in fact_dict:
KeyError: 'namespace'

Potential Fix:

File: app/models.py

def convert_json_facts_to_dict(fact_list):
    fact_dict = {}
    for fact in fact_list:
        if fact.get("namespace", False):
            if fact["namespace"] in fact_dict:
                fact_dict[fact["namespace"]].update(fact["facts"])
            else:
                fact_dict[fact["namespace"]] = fact["facts"]
        else:
            # a missing namespace should surface as a 400 Bad Request instead of a KeyError
            raise ValueError("fact is missing the 'namespace' key")
    return fact_dict

Host lists should be paginated correctly

The current (attempted) pagination structure returned by getHostList and getHostById is not defined in the Swagger interface description. It is also not yet implemented.

The usual pagination data in a paginated view includes:

  • The current page length
  • The current page number OR the offset from the first item
  • The total number of items in the requested set (where possible)
  • The values as a list
  • Links to the 'next' and 'previous' pages, if available.

It should also be possible to request a particular page of items or the offset from the first item, and the number of items per page.

Use of dash in properties conflicts with Javascript and Python usage

After @jhjaggars's work breaking the canonical_facts JSON into separate JSON properties, e.g. insights-id and fqdn, some of these properties contain dashes. As such they then cannot be imported into Javascript and used as object properties, because object property names cannot contain dashes. For the same reason, they are not compatible with Django-rest-framework serializers as serializer fields are implemented as class methods which have the same restriction on using dashes.

I would recommend these be changed to underscores throughout.

This would also make them more consistent with properties like display_name, created_on and modified_on.

Inconsistent facts constraints

The API specification imposes inconsistent constraints on facts in different operations.

I think this can be resolved by referencing ($ref) the Facts definition from the FactSet.facts field.

Steps to reproduce:

  1. Create a host with a non-string fact.
    POST http://localhost:8080/r/insights/platform/inventory/api/v1/hosts
    
    {
      "account": "abc123",
      "display_name": "A host",
      "insights_id": "3f01b55457674041b75e41829bcee1da",
      "facts": [
        {
          "namespace": "complicated",
          "facts": {
            "complex": ["this", "is", "an", "array"]
          }
        }
      ]
    }
  2. See that the host is successfully created.
    Response code: 200 (OK)
    
  3. Replace (or merge) the host’s facts with the same set.
    PUT http://localhost:8080/r/insights/platform/inventory/api/v1/hosts/d24f3e10-58ed-4f11-8ed7-2142cc849d84/facts/complicated
    
    {
      "complex": ["this", "is", "an", "array"]
    }
  4. See that the request fails as a Bad Request.
    {
      "detail": "['this', 'is', 'an', 'array'] is not of type 'string'",
      "status": 400,
      "title": "Bad Request",
      "type": "about:blank"
    }
    Response code: 400 (BAD REQUEST)
    

Make the tests structured

All tests are currently stuffed in two files: test_api.py and test_unit.py. As new tests and cases are added with every pull request, it’d be nice to have them more structured.

I suggest putting them in a /tests folder and then possibly into api and unit subfolders. Then I’d put every test case in a single file, or at least keep only the most closely related together.

Any further improvements like splitting into finer categories or utilizing Pytest etc. can be done subsequently as new issues.

Sorting parameters

It looks like we are missing sorting parameters for the inventory table. The required sort fields are name || fqdn || id and last sync. We currently sort only within the page the user is on, which is not good.

Running the tests destroys the current database

The test script test_api.py seems to use the current database name and credentials to set up the test environment. During test tear-down, the database is destroyed in lines 40-42:

            # drop all tables
            db.session.remove()
            db.drop_all()

This causes the deletion of all data in the current database.
I would recommend the tests construct a temporary database and initialise it. Then tear-down of that database will not affect the main database.

Querying hosts with large page number

When trying to fetch a page that is out of the page range (e.g. there are 10 pages in the database, but the user requests the 15th page), we should return the last page instead of an error.

Unify the query functions

Many of the query functions called by get_host_list are essentially one-liners, or will become one-liners if #145 gets merged. At the same time, the else branch doesn't have its own function. It would be nice to unify this a bit; it doesn't really matter which way.

Do not build URIs from strings

In various places in the code, URIs are built and manipulated as bare strings. This applies mainly to:

In the configuration, environment variables are loaded and combined to configure the application itself. In the tests, URIs are built to make requests to the test client.

In simple cases, string manipulation may be sufficient. However, since different parts of the URL (hostname, path, query) have different encoding and serialization rules, it can easily lead to bugs with specific values. Further issues may also occur, like when comparing URI queries in #187.

It’d be good to leverage urllib.parse as it’s already done by some tests.

Start using Prometheus for metrics

Use the official Prometheus client to collect metrics from the app. This consists of several steps:

  • Allow a composition of WSGI applications (#59, #60)
  • Install the client and use it. (#63, 64)
  • Create a custom metric to ensure everything works.
  • Configure this in OpenShift.

There are already pull requests waiting for a review that make the app use Prometheus with its default settings.

Create a health check endpoint

The application needs a health check endpoint simply returning 200 to any GET request.

The health check request must not require authentication. Thus it’s necessary to allow unauthenticated operations (#50) in the first place. Everything up to the real-world usage is reviewable and mergeable even now though.

  • Create a health check endpoint (#39)
  • Document the health check endpoint (#49)
  • Enable a health probe on the deployed instance

Pull requests:

  • #39 ready for a review
  • #49 waiting for #39 to get reviewed and merged

HostWrapper constructor is not stateless

Never use a mutable object as a function argument default value! The HostWrapper constructor does this and it results in unexpected behavior. See:

>>> type(HostWrapper().insights_id)
<class 'NoneType'>
>>> HostWrapper().insights_id = "337a287f-e2d9-4692-856d-19efc548a04d"
>>> HostWrapper().insights_id
'337a287f-e2d9-4692-856d-19efc548a04d'

Why does this happen? Because the default constructor argument (data) is always the same object. This is what happens:

  1. The first instance receives a default argument – a dictionary.
  2. This dictionary is stored in an instance variable (__data).
  3. The "insights_id" value is stored as a new key into the __data dictionary.
  4. The second instance receives the same dictionary as a default value. This dictionary already has the "insights_id" key from the assignment to the first instance.

If you create many hosts without passing the data argument to the constructor, all the instances will share the same __data attribute.
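
The usual fix is to default to None and create a fresh dictionary per instance. This is only a sketch; the real constructor may take more arguments.

class HostWrapper:
    def __init__(self, data=None):
        # a new dict is created on every call, so instances never share state
        self.__data = data if data is not None else {}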

Finish and clean-up requiring identity header by API specs

The current PR that automatically makes the requests authenticated by reading the OpenAPI specification is only in its smallest working state. It requires some polishing for it to have its fixed place in the repository:

  • Remove the black magic that patches existing methods in an imported module. This is simple code, but an obvious anti-pattern. Use a custom resolver instead.
  • Support parameter overloading: a Path Item object can override the parameters defined for the whole path. Examine the described and the real behavior.
  • Support custom resolvers. The Connexion app can be given a custom resolver class that translates the operation to a view function. That is the one to be decorated.

Fix minor issues in the OpenAPI specification file

  • collectionFormat: csv is not necessary, it’s the default value.
  • Every request can result in a 400 Bad Request response if it doesn’t match the specification. No need to explicitly state it for operations that don’t programmatically return it.
  • getHostById actually doesn't return 404 Not Found if a host is not found, but 200 with an empty list.
  • mergeFacts and replaceFacts don’t return 404 Not Found if an invalid host or namespace is given, but
  • Unify string quoting, at least for the HTTP response codes.
  • Remove unnecessary string quoting, especially in the paths
  • Remove commented out operations.
  • DRY the hostId parameter by extracting it to the parameters specification.
  • Unify the parameter names to use only snake case or camel case.
  • Use plural names for array parameters like hostId.
  • Missing reference from a FactSet to Facts.
  • Use "type": ["string", "null"] for nullable types instead of non-standard x-nullable: true.
  • Use hyphens in the UUID examples. First, check that the API accepts this format.
  • Disable additional object fields by adding additionalProperties: False to make the schema more robust.
  • Extract pagination to a standalone object and reference it from HostQueryOutput.

Emptiness check on fact merge is unnecessary

Checking whether the fact dictionary is empty on fact merge is not necessary. There is nothing wrong with a no-op merge. The check with its 400 response can be removed.

If there is some reason to keep this check that I'm not aware of, it can be moved to the OpenAPI specification. That would mean specifying a schema for a non-empty fact set.

In both cases, the result would be simpler, more coherent code.

Get all hosts with a particular service identifier

For Advisor, it is useful if we only get hosts which have an Insights ID. These are the only hosts that will have Advisor reports filed for them. Advisor does not want to display hosts that are only identified to other services.

No filtering by account when finding by hostname or ID

When hosts are filtered by the hostname_or_id parameter, they are not filtered by account number. The condition simply isn’t there. An account_number argument is passed, but it’s not used.

Steps to reproduce:

  1. Create a host with one account number. (Here: abc123)
    POST http://localhost:8080/r/insights/platform/inventory/api/v1/hosts
    Content-Type: application/json
    x-rh-identity: eyJpZGVudGl0eSI6eyJhY2NvdW50X251bWJlciI6ImFiYzEyMyIsIm9yZ19pZCI6ImJjZDIzNCJ9fQ==
    
    [
      {
        "account": "abc123",
        "fqdn": "some.thing.cz"
      }
    ]
  2. Query for this host using a different account number. (Here: 0000001)
    GET http://localhost:8080/r/insights/platform/inventory/api/v1/hosts?hostname_or_id=some.thing.cz
    x-rh-identity: eyJpZGVudGl0eSI6eyJhY2NvdW50X251bWJlciI6IjAwMDAwMDEifX0=
    
  3. See that the host is retrieved.
    {
      "count": 1,
       "page": 1,
      "per_page": 50,
      "results": [
        {
          "account": "abc123",
          "bios_uuid": null,
          "created": "2019-03-18T12:47:28.151448Z",
          "display_name": "some.thing.cz",
          "external_id": null,
          "facts": [],
          "fqdn": "some.thing.cz",
          "id": "e6e717fb-eba2-47c7-ac00-a37532e33ce2",
          "insights_id": null,
          "ip_addresses": null,
          "mac_addresses": null,
          "rhel_machine_id": null,
          "satellite_id": null,
          "subscription_manager_id": null,
          "updated": "2019-03-18T12:47:28.151468Z"
        }
      ],
      "total": 1
    }

Remove the auth bypass

If the debug mode (FLASK_DEBUG) is on and a NOAUTH environment variable is set, the x-rh-identity header is not required. Although it looks nice, it doesn't work as intended: virtually all the API calls crash if the identity data is not available. I would simply remove this.

Extract the marker pagination

The marker pagination code (from #143) used to download is a great candidate for extraction. This is a very common pattern that is probably going to be used not only in the future here, but also possibly in other repositories.

My idea is that this will be some kind of iterator. It would be constructed from a base query and a list of the marker columns and would then yield single records.
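
One possible shape of such an iterator, sketched here with a single marker column and a SQLAlchemy-style query assumed:

def iterate_by_marker(base_query, marker_column, batch_size=100):
    """Yield single records, fetching them batch by batch, keyed on the marker column."""
    marker = None
    while True:
        query = base_query.order_by(marker_column)
        if marker is not None:
            query = query.filter(marker_column > marker)
        batch = query.limit(batch_size).all()
        if not batch:
            return
        yield from batch
        marker = getattr(batch[-1], marker_column.key)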

Failing metrics test

The test_api.HealthTestCase.test_metrics is failing. The problem is that this endpoint uses the Prometheus library and requires the prometheus_multiproc_dir environment variable to be set.

  • Skip the test if the environment variable is not set.
  • Create a test launcher that uses a temporary directory for this. Reuse the mechanism from run_gunicorn.py.

Report Internal Server Errors in a standard way

When an Internal Server Error occurs, the response should still be a valid error JSON. Currently, the response is an HTML document:

<html>
<head>
    <title>Internal Server Error</title>
</head>
<body>
<h1><p>Internal Server Error</p></h1>

</body>
</html>

The error should be in the same format that Connexion emits on failure. Content-Type is application/problem+json and it looks like this:

{
  "detail": "You don't have the permission to access the requested resource. It is either read-protected or not readable by the server.",
  "status": 403,
  "title": "Forbidden",
  "type": "about:blank"
}

I expect that this can be fixed by setting up a Flask error handler. It might even be possible to use Connexion's error handler instead of just mimicking its format.
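
A sketch of such an error handler, assuming a plain Flask application object (the real wiring through Connexion may differ):

from flask import Flask, jsonify

app = Flask(__name__)  # stands in for the existing application object


@app.errorhandler(500)
def internal_server_error(_error):
    response = jsonify(
        {
            "detail": "An internal server error occurred.",
            "status": 500,
            "title": "Internal Server Error",
            "type": "about:blank",
        }
    )
    response.status_code = 500
    response.content_type = "application/problem+json"
    return response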

Validation errors when running tests

Running test_api.py raises a lot of ValidationErrors. They are apparently caused by #160 introducing Marshmallow to do validations. It’s not possible to add some hosts now.

Sample error:

Input validation error while adding host: {'account': '000501', 'display_name': '', 'ip_addresses': ['10.10.0.1'], 'facts': [{'namespace': 'ns1', 'facts': {'key1': 'value1'}}]}
Traceback (most recent call last):
  File "/Users/stomsa/Vývoj/Insights/Repozitáře/insights-host-inventory/api/host.py", line 30, in add_host_list
    (host, status_code) = _add_host(host)
  File "/Users/stomsa/Vývoj/Insights/Repozitáře/insights-host-inventory/api/host.py", line 67, in _add_host
    validated_input_host_dict = HostSchema(strict=True).load(host)
  File "/Users/stomsa/.virtualenvs/insights-host-inventory-_ShLCp88/lib/python3.6/site-packages/marshmallow/schema.py", line 588, in load
    result, errors = self._do_load(data, many, partial=partial, postprocess=True)
  File "/Users/stomsa/.virtualenvs/insights-host-inventory-_ShLCp88/lib/python3.6/site-packages/marshmallow/schema.py", line 711, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'display_name': ['Length must be between 1 and 200.']}

Creating a host via the `addHost` op does not set the id

Using the following command to create a new host:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/problem+json' -d '{"account": "1234567", "display_name": "system1.example.com", "canonical_facts": {"insights_uuid": "00112233-4455-6677-8899-AABBCCDDEE01"}}' 'http://127.0.0.1:5000/api/hosts'

generates the following error:

sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, 1234567, system1.example.com, 2018-11-01 00:01:53.569721, 2018-11-01 00:01:53.569729, {}, [], {"insights_uuid": "00112233-4455-6677-8899-AABBCCDDEE01"}).
 [SQL: 'INSERT INTO hosts (account, display_name, created_on, modified_on, facts, tags, canonical_facts) VALUES (%(account)s, %(display_name)s, %(created_on)s, %(modified_on)s, %(facts)s, %(tags)s, %(canonical_facts)s) RETURNING hosts.id'] [parameters: {'account': '1234567', 'display_name': 'system1.example.com', 'created_on': datetime.datetime(2018, 11, 1, 0, 1, 53, 569721), 'modified_on': datetime.datetime(2018, 11, 1, 0, 1, 53, 569729), 'facts': '{}', 'tags': '[]', 'canonical_facts': '{"insights_uuid": "00112233-4455-6677-8899-AABBCCDDEE01"}'}] (Background on this error at: http://sqlalche.me/e/gkpj)

I can't see anywhere in the code where a new ID is generated for a new host.
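
One possible fix, sketched here with assumed names (this is not the repository's actual model code), is to give the primary key column a UUID default so the application generates the id:

import uuid

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Host(Base):
    __tablename__ = "hosts"

    # uuid.uuid4 is called for every new row, so the INSERT no longer sends a NULL id
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)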

Allow unauthenticated routes

There are at least two reasons why we need to allow unauthenticated routes:

  • A health check endpoint for the health probe (#46)
  • The Swagger UI Console

To be compliant with the OpenAPI specification file and to avoid ugly hacks for the Swagger UI, authenticated operations must be explicitly marked. All others are considered not to require the identity header.

Steps:

  • Support unauthenticated operations (#50)
  • Allow a creation of additional operations in tests (#45)
  • Test unauthenticated operations (#52)
  • Mark the unauthenticated operations automatically by the OpenAPI specification (No pull request yet)

Pull requests:

  • #50 ready for a review, without functional tests
  • #45 ready for a review, will allow having functional tests for #50
  • #52 waiting for #50 and #45 to get reviewed and merged

Remove extra `api/` in base path

Based on the REST API Guidelines, IPP APIs should listen on a path formatted like /$PATH_PREFIX/$APP/v$VERSION/$RESOURCE/$ID

Which, for example, would give me a path like /r/insights/platform/inventory/v1/hosts rather than the current path which is /r/insights/platform/inventory/api/v1/hosts
