
openownership / lib-cove-bods

Check that your data complies with the Beneficial Ownership Data Standard (BODS) by installing our data review library and using it to analyse files from your command line interface.

Home Page: https://datareview.openownership.org/

License: Other

Python 99.38% Shell 0.62%
beneficial-ownership beneficial-ownership-data

lib-cove-bods's Introduction

Lib Cove BODS

Command line

Installation

Installation from this git repo:

git clone https://github.com/openownership/lib-cove-bods.git
cd lib-cove-bods
python3 -m venv .ve
source .ve/bin/activate
pip install -e .

Running the command line tool

Call libcovebods.

libcovebods -h

Running tests

python -m pytest

Code linting

Make sure dev dependencies are installed in your virtual environment:

pip install -e .[dev]

Then run:

isort libcovebods/ tests/ setup.py
black libcovebods/ tests/ setup.py
flake8 libcovebods/ tests/ setup.py

Updating schema files in data

This library contains the actual data files for different versions of the schema, in the libcovebods/data directory.

To update them, you need:

To update a file:

First, go to your checkout of the data standard repository and make sure you have checked out the correct tag or branch. For example, to update the libcovebods/data/schema-0-2-0.json file, check out 0.2.0.

Run the compile tool, telling it where the codelists directory is and pipe the output to the file for the version you have checked out:

compiletojsonschema -c openownership-data-standard/schema/codelists/ openownership-data-standard/schema/bods-package.json > openownership-lib-cove-bods/libcovebods/data/schema-0-2-0.json  

Due to openownership/data-standard#375, you may have to do some editing by hand when using early versions of the schema (before 0.3). Open the files in libcovebods/data. At the top level there is a oneOf with three statement types: person, entity and ownershipOrControl. Each of these statement types has an enum for the statementType field. This enum should have only one option: the value for that type of statement (i.e. the person statement should only have the personStatement value). The compile tool may have added extra options; if so, remove them by hand.

lib-cove-bods's People

Contributors

bjwebb, blueskies00, kd-ods, odscjames, rhiaro


lib-cove-bods's Issues

New stat/report: bodsVersion

A breakdown of Statement.publicationDetails.bodsVersion. Would give reports like:

Publication details

  • 236 statements published as BODS v1.0
  • 16 statements published as BODS v0.3
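A minimal sketch of this breakdown, assuming each statement is a dict in the shape of the example JSON further down this page (with bodsVersion inside publicationDetails). The function names are hypothetical and not part of libcovebods; statements without a bodsVersion are counted under None rather than dropped.

```python
from collections import Counter


def count_bods_versions(statements):
    """Count statements by publicationDetails.bodsVersion (None when it is absent)."""
    counts = Counter()
    for statement in statements:
        details = statement.get("publicationDetails") or {}
        counts[details.get("bodsVersion")] += 1
    return counts


def format_bods_version_report(counts):
    """Render the counts in the style of the example report, most common first."""
    lines = ["Publication details"]
    for version, count in counts.most_common():
        label = f"BODS v{version}" if version else "no bodsVersion"
        lines.append(f"  • {count} statements published as {label}")
    return "\n".join(lines)
```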

Additional check: hasPublicListing must exist and be true in certain conditions

Check: If either of the PublicListing.companyFilingsURLs or PublicListing.securitiesListings arrays is non-empty then hasPublicListing should be true.

Error message: "hasPublicListing does not exist or is false. Information has been provided under companyFilingsURLs or securitiesListings so hasPublicListing must be true."
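One way this check could be expressed is sketched below. It operates on a single PublicListing object and leaves out how that object is located inside a statement, because the path may differ between schema versions. The function name is hypothetical. Any non-empty companyFilingsURLs or securitiesListings array triggers the requirement.

```python
PUBLIC_LISTING_ERROR = (
    "hasPublicListing does not exist or is false. Information has been provided "
    "under companyFilingsURLs or securitiesListings so hasPublicListing must be true."
)


def check_has_public_listing(public_listing):
    """Return the error message for one PublicListing object, or None if it passes."""
    # Any non-empty array counts as "information has been provided"
    has_details = bool(public_listing.get("companyFilingsURLs")) or bool(
        public_listing.get("securitiesListings")
    )
    if has_details and public_listing.get("hasPublicListing") is not True:
        return PUBLIC_LISTING_ERROR
    return None
```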

New stat/report: pepStatusDetails.missingInfoReason

Number of Person Statements with a pepStatusDetails.missingInfoReason value which also have a hasPepStatus value.

Would give reports like:

Person Statements which have a reason for missing PEP information but which do declare a PEP status: 2
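A sketch of the count, assuming the pre-0.3 person statement shape: a boolean hasPepStatus and a pepStatusDetails array whose entries may carry missingInfoReason. The function name is hypothetical. Here "declares a PEP status" is read as hasPepStatus being present as a boolean (true or false).

```python
def count_pep_reason_with_status(statements):
    """Count person statements that give a missingInfoReason and also declare hasPepStatus."""
    count = 0
    for statement in statements:
        if statement.get("statementType") != "personStatement":
            continue
        has_reason = any(
            detail.get("missingInfoReason")
            for detail in statement.get("pepStatusDetails", [])
        )
        # A declared status is either true or false; absence means undeclared
        if has_reason and isinstance(statement.get("hasPepStatus"), bool):
            count += 1
    return count
```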

Problem validating BODS JSON without a 'statementDate'

statementDate is not a required field. When I run this example file (4a-simple-pep-declaration.json) through CoVE, I get the following error:

Traceback (most recent call last):
  File "/home/kadie/code/lib-cove-bods/.ve/bin/libcovebods", line 11, in <module>
    load_entry_point('libcovebods', 'console_scripts', 'libcovebods')()
  File "/home/kadie/code/lib-cove-bods/libcovebods/cli/__main__.py", line 19, in main
    file_type='json'
  File "/home/kadie/code/lib-cove-bods/libcovebods/api.py", line 38, in bods_json_output
    common_checks_bods(context, output_dir, json_data, schema_bods, lib_cove_bods_config=lib_cove_bods_config)
  File "/home/kadie/code/lib-cove-bods/libcovebods/common_checks.py", line 18, in common_checks_bods
    'statistics': get_statistics(json_data),
  File "/home/kadie/code/lib-cove-bods/libcovebods/lib/common_checks.py", line 79, in get_statistics
    year = int(statement['statementDate'].split('-')[0])
KeyError: 'statementDate'
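The traceback shows get_statistics indexing statement['statementDate'] directly, which fails because the field is optional. The sketch below shows one tolerant approach as a standalone function. It is not the library's actual fix, and the function name is hypothetical. Statements with a missing or malformed date are skipped instead of crashing the statistics step.

```python
from collections import Counter


def count_statements_by_year(statements):
    """Count statements per statementDate year, skipping those without a usable date."""
    years = Counter()
    for statement in statements:
        date = statement.get("statementDate")
        if not isinstance(date, str):
            continue  # statementDate is optional, so its absence is not an error here
        try:
            years[int(date.split("-")[0])] += 1
        except ValueError:
            continue  # malformed date: skip it rather than raise
    return years
```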

CoVE validation check: Check that dates of death of natural persons are sane

I'm moving this over from the (internal ODSC) issue. There was no further specification in the issue.

I think we could apply the following sanity checks:

  • date of death is not before the date of birth
  • date of death is less than 120 years after the date of birth
  • date of death is not in the future
  • date of death is not prior to some date in the distant past (1800?)
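A sketch of the four checks listed above. It assumes birthDate and deathDate are ISO 8601 strings that may be partial (YYYY, YYYY-MM or YYYY-MM-DD). To avoid false positives on partial dates, it compares years only. For that reason the age check flags only year differences strictly greater than 120. The function name, the today parameter and the defaults are hypothetical (1800 is taken from the issue).

```python
from datetime import date


def check_death_date(birth_date, death_date, today=None, earliest_year=1800, max_age_years=120):
    """Return a list of problems found with a person's deathDate (year precision only)."""
    problems = []
    if not death_date:
        return problems
    today = today or date.today()
    death_year = int(death_date[:4])
    if death_year > today.year:
        problems.append("date of death is in the future")
    if death_year < earliest_year:
        problems.append(f"date of death is before {earliest_year}")
    if birth_date:
        birth_year = int(birth_date[:4])
        if death_year < birth_year:
            problems.append("date of death is before date of birth")
        elif death_year - birth_year > max_age_years:
            # Strictly greater than, so partial dates never produce a false positive
            problems.append(f"date of death is more than {max_age_years} years after date of birth")
    return problems
```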

CoVE requirements: Jurisdiction codes

jurisdiction/code should be an ISO 3166-1 alpha-2 (two-letter) country code or an ISO 3166-2 subdivision code.

country/code should be an ISO 3166-1 alpha-2 (two-letter) country code.

We don't use an enum in the schema, so perhaps a warning is appropriate.
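Because there is no enum, a warning could start with a format check. The sketch below only checks the shape of a code: two capital letters for a country, optionally followed by a hyphen and one to three alphanumerics for a subdivision. It does not confirm that ISO has actually assigned the code, which would require the real code lists. The function names are hypothetical.

```python
import re

# Format-only patterns: they do not confirm a code is actually assigned by ISO
COUNTRY_CODE = re.compile(r"[A-Z]{2}")                    # ISO 3166-1 alpha-2, e.g. GB
SUBDIVISION_CODE = re.compile(r"[A-Z]{2}-[A-Z0-9]{1,3}")  # ISO 3166-2, e.g. GB-SCT


def jurisdiction_code_warning(code):
    """Return a warning if jurisdiction/code is not shaped like a country or subdivision code."""
    if isinstance(code, str) and (COUNTRY_CODE.fullmatch(code) or SUBDIVISION_CODE.fullmatch(code)):
        return None
    return f"jurisdiction/code '{code}' does not look like an ISO 3166-1 alpha-2 or ISO 3166-2 code"


def country_code_warning(code):
    """Return a warning if country/code is not shaped like a two-letter country code."""
    if isinstance(code, str) and COUNTRY_CODE.fullmatch(code):
        return None
    return f"country/code '{code}' does not look like an ISO 3166-1 alpha-2 code"
```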

Tighten up reporting of entity identifiers

If I submit data on 3 entities which have the following identifiers:

  {
    "id": "07444723",
    "scheme": "XX-XXX"
  }

  {
    "id": "Helloimanid",
    "schemeName": "Kadie's register"
  }

  {
    "id": "07444723",
    "scheme": "GB-COH"
  }

The following is reported: [screenshot: entityIDs]

@kd-ods has suggested that this would be more useful if:

i) The text read:

  • % of registered or legal entities with an identifier (an ID plus scheme or schemeName):
  • % of registered or legal entities with a 'scheme' value from Org-ID.guide:

ii) The statistical calculations were altered accordingly. So the data above would give:

  • % of registered or legal entities with an identifier (an ID plus scheme or schemeName): 100%
  • % of registered or legal entities with a 'scheme' value from Org-ID.guide: 33%
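The proposed calculation could be sketched as follows. Entities are entityStatement objects whose entityType is registeredEntity or legalEntity. An entity counts towards the first figure if any identifier has an id plus a scheme or schemeName, and towards the second if any scheme appears in a known-scheme set. In practice that set would come from org-id.guide; here it is passed in by the caller, and the function name is hypothetical.

```python
ENTITY_TYPES = {"registeredEntity", "legalEntity"}


def entity_identifier_stats(statements, known_schemes):
    """Return (% with id plus scheme/schemeName, % with a known scheme) for registered/legal entities."""
    entities = [
        s for s in statements
        if s.get("statementType") == "entityStatement" and s.get("entityType") in ENTITY_TYPES
    ]
    if not entities:
        return 0.0, 0.0
    with_identifier = with_known_scheme = 0
    for entity in entities:
        identifiers = entity.get("identifiers", [])
        if any(i.get("id") and (i.get("scheme") or i.get("schemeName")) for i in identifiers):
            with_identifier += 1
        if any(i.get("scheme") in known_schemes for i in identifiers):
            with_known_scheme += 1
    total = len(entities)
    return 100 * with_identifier / total, 100 * with_known_scheme / total
```

With the three identifiers above and a known-scheme set containing only GB-COH, this gives 100% and 33% (rounded), matching the expected figures.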

Validation error vs Additional checks

We need to be clear about the difference between the two, be able to state which of the two applies when requesting a new test, and document the difference in the interface so that people can tell them apart when looking at their results.

Count of previous statements that are referenced from replacesStatement and NOT in the dataset

There may be the following scenario:

Statement A [ MISSING ]
Statement B Replaces Statement A
Statement C Replaces Statement A, B

@odscjames raised the question of whether that data set is valid - should Statement C list A as well as B?

The docs are not clear on what the expected use of replacesStatement is in this case. @kd-ods suggested that we should require that Statement C lists both A and B, which would support the streaming API use case.

In relation to this particular statistical count, @kd-ods thinks the count should be 1. Naming this count 'Replaced statements missing from the dataset' would make that clearer.
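A sketch of the 'Replaced statements missing from the dataset' count, under the interpretation above: each distinct statement ID that is referenced as replaced but absent from the dataset is counted once. The sketch assumes the array property is named replacesStatements (as far as we know, the plural form used in the BODS schema; the issue text says replacesStatement). The function name is hypothetical.

```python
def count_replaced_statements_missing(statements):
    """Count distinct replaced statement IDs that do not appear in the dataset."""
    present = {s.get("statementID") for s in statements}
    missing = set()
    for statement in statements:
        for replaced_id in statement.get("replacesStatements", []):
            if replaced_id not in present:
                missing.add(replaced_id)  # a set, so A counts once however often it is referenced
    return len(missing)
```

For the scenario above (A missing, B replaces A, C replaces A and B), this returns 1.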

Ingest BODS 0.4 data

  • Upgrade requirements (match versions in data-standard, solve any issues)
  • Build schema package file
  • data: add schema-0-4-0.json
  • Update DataReader (count_statement_types etc.)
  • Update SchemaBODS (default version, types methods)
  • Update JSONSchemaValidator (switch between Draft 4 and 2020-12 based on schema version)

Unhelpful validation error message

(I hope this is the right repo for this issue.) Try running this JSON through the data review tool:

[ { "annotations": [ { "motivation": "correcting" } ], "addresses": [ { "country": "EE", "type": "residence" } ], "birthDate": "1985-02", "identifiers": [ { "id": "280871-*****", "schemeName": "GTR-PASSPORT" } ], "isComponent": false, "names": [ { "familyName": "Nonymou", "fullName": "A Nonymous", "givenName": "A", "type": "individual" } ], "nationalities": [ { "code": "GB" } ], "personType": "knownPerson", "publicationDetails": { "bodsVersion": "0.2", "license": "http://creativecommons.org/publicdomain/zero/1.0/", "publicationDate": "2020-11-10", "publisher": { "name": "Test Register", "url": "http://www.test.com" } }, "source": { "assertedBy": [ { "name": "Me" } ], "type": [ "selfDeclaration", "officialRegister" ] }, "statementDate": "2020-10-30", "statementID": "d5764567474-d2a5-4e89-be41-614crgdtgsdgsgsfg", "statementType": "personStatement" } ]

It throws the error: '{'motivation': 'correcting'} is not valid under any of the given schemas'. This does not help with troubleshooting. A better message would be:

'{'motivation': 'correcting'} is not valid according to the BODS schema. See https://github.com/openownership/data-standard/tree/master/schema'

Or if we could link to the relevant place in the BODS schema that would be even better. (In this case:

https://github.com/openownership/data-standard/blob/master/schema/components.json#L394, OR

https://github.com/openownership/data-standard/blob/7693b43c7d506106196bd83c7cd440e57236341d/schema/components.json#L329)

Issues running with libcove 0.19.0

I've run into an issue today where my fresh installation of lib-cove-bods 0.8.0 cannot run on the CLI because of a missing config attribute on the SchemaBODS object.

It seems config was recently introduced in libcove 0.19.0, which gets installed because lib-cove-bods specifies libcove >= 0.18.0 in setup.py.

As a temporary workaround, I can specify libcove==0.18.0 in my requirements.txt to make sure I get a compatible version, but it seems we either need to fix the incompatibility or pin the version more strictly. I suspect the latter won't affect anyone: you're only likely to have libcove because of libcovebods, so perhaps that's a good idea anyway?

Establish whether v0.3 property renaming requires code updates

In BODS 0.3 the following properties have been renamed:

  • interestLevel renamed to directOrIndirect
  • incorporatedInJurisdiction renamed to jurisdiction

We need to ascertain whether any additional checks relate to these properties and update the code if necessary.

New validation check: isComponent part 2

If an entity, person or ooc statement has isComponent value ‘true’ check that the primary ooc Statement appears after the component statement in the BODS package.
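A sketch of this ordering check, assuming the statements are a list in package order. For each ownership-or-control statement, it looks up every ID in componentStatementIDs and reports isComponent statements that appear later in the package than the primary referencing them. Components missing from the package are skipped here because they are a separate check. The function name is hypothetical.

```python
def component_order_errors(statements):
    """Return statementIDs of isComponent statements that appear after a primary OOC statement listing them."""
    position = {s.get("statementID"): index for index, s in enumerate(statements)}
    errors = []
    for index, statement in enumerate(statements):
        if statement.get("statementType") != "ownershipOrControlStatement":
            continue
        for component_id in statement.get("componentStatementIDs", []):
            component_index = position.get(component_id)
            if component_index is None:
                continue  # component not in package: reported by a different check
            if statements[component_index].get("isComponent") is True and component_index > index:
                errors.append(component_id)
    return errors
```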

Custom validation tests: update specification

Existing custom tests (aka 'additional checks') are:

  • entity_identifier_scheme_not_known
  • inconsistent_schema_version_used
  • wrong_address_type_used
  • person_birth_year_too_early
  • person_birth_year_too_late
  • ownership_or_control_statement_has_is_compontent_and_component_statement_ids
  • statement_is_beneficialOwnershipOrControl_but_no_person_specified
  • alternative_address_with_no_other_address_types
  • statement_is_component_but_is_after_use_in_component_statement_id
  • entity_statement_out_of_order
  • person_statement_out_of_order
  • entity_statement_not_used_in_ownership_or_control_statement
  • statement_is_component_but_not_used_in_component_statement_ids
  • person_statement_not_used_in_ownership_or_control_statement
  • entity_statement_missing
  • person_statement_missing
  • component_statement_id_not_in_package
  • duplicate_statement_id

We need to set up some way of specifying (for now and into the future) which custom tests are needed for each version of the data standard. We can probably build on this sheet which was created to support the building and testing of the in-schema validation checks for 0.4.

Edit: 17-06-2024

The list above needs to be supplemented with the following, I think:

  • CheckHasPublicListing
  • CheckEntityTypeAndEntitySubtypeAlign
  • CheckEntitySecurityListingsMICSCodes

(See the classes defined at the end of this script.)

New stat/report: hasPepStatus

Person Statements with a hasPepStatus value (true or false).

Would give reports like:

Person Statements where PEP status is declared: 235 of 241
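A sketch of this statistic, assuming the pre-0.3 person statement shape where hasPepStatus is a boolean. A status counts as declared if the value is present and is either true or false. The function name is hypothetical.

```python
def pep_status_declared_report(statements):
    """Report how many person statements declare hasPepStatus (true or false)."""
    persons = [s for s in statements if s.get("statementType") == "personStatement"]
    declared = sum(1 for s in persons if isinstance(s.get("hasPepStatus"), bool))
    return f"Person Statements where PEP status is declared: {declared} of {len(persons)}"
```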

Test: check that personal data fields are not populated for anonymousPerson

There should be a warning if:

PersonStatement.personType is 'anonymousPerson'

AND

  • PersonStatement.names contains any name object with a non-empty string for *name; or
  • PersonStatement.identifiers contains any identifier object with a non-empty string for id; or
  • PersonStatement.birthDate contains a non-empty string

(I think those are the key properties that we would want to flag.)
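A sketch of this warning. It returns the names of the populated fields for an anonymousPerson statement, or an empty list. "*name" is read here as any name-object property whose key ends in "name" (case-insensitive), for example fullName, givenName or familyName. A value counts as populated only if it is a non-blank string. The function name is hypothetical.

```python
def anonymous_person_populated_fields(statement):
    """List personal-data fields that are populated on an anonymousPerson statement."""
    if statement.get("personType") != "anonymousPerson":
        return []

    def filled(value):
        return isinstance(value, str) and value.strip() != ""

    populated = []
    if any(
        key.lower().endswith("name") and filled(value)
        for name in statement.get("names", [])
        for key, value in name.items()
    ):
        populated.append("names")
    if any(filled(identifier.get("id")) for identifier in statement.get("identifiers", [])):
        populated.append("identifiers")
    if filled(statement.get("birthDate")):
        populated.append("birthDate")
    return populated
```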

ownership-or-control-statement checks

In an ownership-or-control-statement, the following references should be checked:

  • subject/describedByEntityStatement should always refer to a statement with statementType == entityStatement.
  • interestedParty/describedByEntityStatement should always refer to a statement with statementType == entityStatement.
  • interestedParty/describedByPersonStatement should always refer to a statement with statementType == personStatement.
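A sketch of these three reference checks. It builds a map from statementID to statementType and reports each reference that points at a statement of the wrong type. References to statements that are absent from the package are skipped; the entity_statement_missing and person_statement_missing checks listed on this page appear to cover that case. The function name and the shape of the returned tuples are hypothetical.

```python
EXPECTED_REFERENCE_TYPES = {
    ("subject", "describedByEntityStatement"): "entityStatement",
    ("interestedParty", "describedByEntityStatement"): "entityStatement",
    ("interestedParty", "describedByPersonStatement"): "personStatement",
}


def ooc_reference_errors(statements):
    """Return (oocID, path, referencedID, actualType) for each wrongly typed reference."""
    types = {s.get("statementID"): s.get("statementType") for s in statements}
    errors = []
    for statement in statements:
        if statement.get("statementType") != "ownershipOrControlStatement":
            continue
        for (parent, field), expected in EXPECTED_REFERENCE_TYPES.items():
            referenced_id = (statement.get(parent) or {}).get(field)
            if referenced_id is None:
                continue
            actual = types.get(referenced_id)
            if actual is not None and actual != expected:  # missing targets: other checks
                errors.append((statement.get("statementID"), f"{parent}/{field}", referenced_id, actual))
    return errors
```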

New validation check: isComponent part 1

If an entity, person or ooc statement has isComponent value ‘true’ check that there is a primary ooc Statement which references it (by statementID) from the primary’s componentStatementIDs property.
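A sketch of this check. It collects every ID listed in componentStatementIDs by primary ownership-or-control statements (those whose isComponent is not true). It then reports isComponent statements that no primary references. The function name is hypothetical.

```python
def components_without_primary(statements):
    """Return statementIDs of isComponent statements not referenced by any primary OOC statement."""
    referenced = set()
    for statement in statements:
        if (statement.get("statementType") == "ownershipOrControlStatement"
                and statement.get("isComponent") is not True):
            referenced.update(statement.get("componentStatementIDs", []))
    return [
        statement.get("statementID")
        for statement in statements
        if statement.get("isComponent") is True and statement.get("statementID") not in referenced
    ]
```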

Warning: when no beneficial owners are declared within the data

I'm not sure we have the concept of a warning?? If not, then let's add this as an additional check.

We need to throw a warning when no interest objects in the data have beneficialOwnershipOrControl set to true. A message like "No individuals are disclosed as beneficial owners. beneficialOwnershipOrControl must be set to true within an Interest object to indicate that the interested party is a beneficial owner."
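A sketch of this dataset-level warning. It scans every Interest object in ownership-or-control statements and returns the proposed message only if none has beneficialOwnershipOrControl set to true. The function name is hypothetical.

```python
NO_BENEFICIAL_OWNER_MESSAGE = (
    "No individuals are disclosed as beneficial owners. beneficialOwnershipOrControl must be "
    "set to true within an Interest object to indicate that the interested party is a beneficial owner."
)


def no_beneficial_owner_warning(statements):
    """Return the warning message if no Interest declares beneficialOwnershipOrControl: true."""
    for statement in statements:
        if statement.get("statementType") != "ownershipOrControlStatement":
            continue
        for interest in statement.get("interests", []):
            if interest.get("beneficialOwnershipOrControl") is True:
                return None
    return NO_BENEFICIAL_OWNER_MESSAGE
```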

data quality for address

Moving @kd-ods spec to GH.

Under an 'Address quality' subheading:

  • Report the number of address objects in the dataset
  • Report the percentage of addresses that have a non-empty postcode field
  • Report the percentage of addresses that have a non-empty country code field
  • Report the percentage of addresses where a non-empty string in the postcode field is replicated in the Address field
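A sketch of these four figures. It assumes pre-0.4 Address objects with string properties address, postCode and country, as in the example JSON on this page. The exact property names, and using all addresses as the denominator for every percentage, are assumptions. The function name and result keys are hypothetical.

```python
def address_quality(statements):
    """Address-quality figures over all address objects found on statements."""
    addresses = [a for s in statements for a in s.get("addresses", [])]
    total = len(addresses)

    def percent(n):
        return round(100 * n / total, 1) if total else 0.0

    def text(value):
        return value.strip() if isinstance(value, str) else ""

    with_postcode = [a for a in addresses if text(a.get("postCode"))]
    with_country = [a for a in addresses if text(a.get("country"))]
    # Postcode "replicated" means the postcode string also appears inside the address string
    postcode_repeated = [
        a for a in with_postcode if text(a.get("postCode")) in text(a.get("address"))
    ]
    return {
        "addresses": total,
        "% with postCode": percent(len(with_postcode)),
        "% with country": percent(len(with_country)),
        "% with postCode repeated in address": percent(len(postcode_repeated)),
    }
```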

Additional check: componentStatementIDs

If a statement has isComponent declared as FALSE, but is included in the componentStatementIDs field in an OOC statement, the review tool does not flag any issues. (The tool does flag up when the situation is reversed)

Example JSON used:
isComponent FALSE.txt
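A sketch of the missing direction of the check. It reports IDs listed in an ownership-or-control statement's componentStatementIDs whose target statement does not have isComponent set to true. Treating an absent isComponent as false is an assumption. The function name is hypothetical.

```python
def non_components_listed_as_components(statements):
    """Return IDs in componentStatementIDs whose target statement is not marked isComponent: true."""
    by_id = {s.get("statementID"): s for s in statements}
    flagged = []
    for statement in statements:
        if statement.get("statementType") != "ownershipOrControlStatement":
            continue
        for component_id in statement.get("componentStatementIDs", []):
            target = by_id.get(component_id)
            # Absent isComponent is treated as false (assumption)
            if target is not None and target.get("isComponent") is not True:
                flagged.append(component_id)
    return flagged
```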

Statistic: breakdown of use of interest.directOrIndirect

Under the OOC Statements section > Total Interest Statements, add:

  • ... with directOrIndirect 'direct': XX%
  • ... with directOrIndirect 'indirect': XX%
  • ... with directOrIndirect 'unknown': XX%

(Also, while we're there, can we retitle 'Total Interest Statements' to 'Total Interest objects'.)
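A sketch of the breakdown. It gathers every Interest object from ownership-or-control statements and reports the share with each directOrIndirect value. Interests with no directOrIndirect still count in the denominator, so the three percentages may sum to less than 100. The function name is hypothetical.

```python
from collections import Counter


def direct_or_indirect_breakdown(statements):
    """Percentage of Interest objects with each directOrIndirect value."""
    interests = [
        interest
        for s in statements if s.get("statementType") == "ownershipOrControlStatement"
        for interest in s.get("interests", [])
    ]
    total = len(interests)
    counts = Counter(interest.get("directOrIndirect") for interest in interests)
    return {
        value: round(100 * counts[value] / total, 1) if total else 0.0
        for value in ("direct", "indirect", "unknown")
    }
```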

Additional check: entitySubtype aligns with entityType

Check: Where entitySubtype.generalCategory has a non-empty value, the first part of its value (up to the dash) is the same as the value of entityType. (See the codelists.)

Error message: "The specified entitySubtype is not valid for the specified entityType."
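A sketch of the prefix comparison described above. It takes the entityType value and the generalCategory string directly, leaving aside where they sit in a statement. The function name is hypothetical. The values in the usage example are illustrative and have not been checked against the codelists.

```python
SUBTYPE_ERROR = "The specified entitySubtype is not valid for the specified entityType."


def check_entity_subtype(entity_type, general_category):
    """Return SUBTYPE_ERROR if generalCategory's prefix (up to the first dash) differs from entityType."""
    if not general_category:
        return None  # the check only applies when generalCategory is non-empty
    prefix = general_category.split("-", 1)[0]
    return None if prefix == entity_type else SUBTYPE_ERROR
```

For example, check_entity_subtype("stateBody", "stateBody-governmentDepartment") passes, while the same category paired with "registeredEntity" returns the error.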

Additional check: market identifier codes (MICs)

See https://standard.openownership.org/en/master/schema/guidance/identifiers.html#market-identifier-codes-mics

The checks would be:

If one of the properties marketIdentifierCode (MIC) or operatingMarketIdentifierCode (operating MIC) has a non-empty string value, then:

  • the other property should also have a non-empty string value. (Error message: "You have supplied a value for [marketIdentifierCode/operatingMarketIdentifierCode] so a value for [operatingMarketIdentifierCode/marketIdentifierCode] should also be provided")
  • marketIdentifierCode and operatingMarketIdentifierCode values should be a valid MIC - operating MIC pair. (Error message: "Values for marketIdentifierCode and operatingMarketIdentifierCode do not accurately identify an operating market or segment, according to ISO standard 10383")
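A sketch of both MIC checks for one securities listing object. The pair check needs ISO 10383 data mapping each MIC to its operating MIC. Here that data is a caller-supplied dict. The single XLON entry is an illustrative stand-in (an operating MIC maps to itself), not the real list. The function name and table name are hypothetical.

```python
EXAMPLE_MIC_TABLE = {"XLON": "XLON"}  # illustrative stand-in for ISO 10383 data

INVALID_PAIR = (
    "Values for marketIdentifierCode and operatingMarketIdentifierCode do not accurately "
    "identify an operating market or segment, according to ISO standard 10383"
)


def mic_errors(listing, mic_table):
    """Return error messages for the MIC / operating MIC pair on one listing object."""
    mic = (listing.get("marketIdentifierCode") or "").strip()
    operating = (listing.get("operatingMarketIdentifierCode") or "").strip()
    if not mic and not operating:
        return []  # neither supplied: nothing to check
    if not operating:
        return ["You have supplied a value for marketIdentifierCode so a value for "
                "operatingMarketIdentifierCode should also be provided"]
    if not mic:
        return ["You have supplied a value for operatingMarketIdentifierCode so a value for "
                "marketIdentifierCode should also be provided"]
    if mic_table.get(mic) != operating:
        return [INVALID_PAIR]
    return []
```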

Rework PEP property stats for 0.3 data

In the Person Statement, hasPepStatus and pepDetails have been wrapped in a PoliticalExposure object and renamed status and details.

If there are existing lib-cove-bods tests relating to hasPepStatus and pepDetails, they may need to be updated.

CoVE validation check: Check that dates of death of natural persons are sane

Check that dates of death of natural persons are sane. As a start I'll suggest:

  • no one has a date of death in the future
  • no one has a date of death more than 120 years after their date of birth

which are largely in keeping with the checks that we have implemented for date of birth.

@odscjames we added a similar set of checks for date of birth. It looks as though we made changes to the code here for the general case and then added configuration to the cove-bods repo. Is it ok for me to raise the whole of the issue here, or should I split it into the two issues?
