openownership / data-standard

The Beneficial Ownership Data Standard (BODS) is an open standard providing a specification for modelling and publishing information on the beneficial ownership and control of corporate vehicles

Home Page: http://standard.openownership.org

License: Other

Python 86.69% Jupyter Notebook 13.31%

beneficial-ownership beneficial-ownership-data open-standards

data-standard's Issues

Replacing entity.createdDate and entity.endDate with foundingDate and dissolutionDate

Based on a review of Popolo Project schemas, which re-use schema.org, this looks like a better re-use of concepts.

Create a Jupyter notebook that gives pass/fail validation for BODS data

We need a way to validate BODS data. Initially, this should be a Jupyter notebook. Over time, this will need to be a cmdline tool and ultimately a web app.

It doesn't need to run entirely automatically at first. It's ok if there are steps which have to be done manually. It just needs to go through the process.

The initial users for this are @ScatteredInk and @kd-ods

Create a new instance of sphinx-base to give @kd-ods somewhere to work

Same repo as current docs
Create new docs on a new branch, with example of first couple of headings of spec
@timgdavies to skeleton docs from there
Option to remove or keep current docs

Modelling issue: provenance and verification

An issue related to the draft conceptual model

We have looked at the PROV provenance ontology but fear this may be too complex for our use-cases.

We are looking for good precedents for simple provenance information and will need to develop good guidance on writing and reading provenance chains.

As part of this, we are also exploring issues of verification. What sort of statements should the schema allow about the way in which other information has been verified or certified?

Real-world example data

We'd like to test the standard by building example BODS data using a variety of organisational structures, ownership chains and sources of data. A start has been made in the exampledata branch, but if you have any particularly good examples or edge cases, please add a link to a source here and I will attempt to model in BODS.

Modelling issues: Artificial legal entities

An issue related to the draft conceptual model

Our draft conceptual model separates out entity statements and person statements.

Entities can be the subject of ownership or control by an interested party (either another entity or a person), but persons cannot be the subject of ownership or control. In other words, persons can only appear at the end of a beneficial ownership chain - not in the middle.

However, there are cases where:

A group of people jointly constitute the interested party in an entity, distinct from cases where a group of people hold independent interests in the same entity;
A person is acting as the nominee for another person, and so, in effect, exists in the middle of a beneficial ownership chain;

In these cases, our approach would be to model an artificial legal entity using an entity statement. Entities could have a range of types, including:

Incorporated body
Unincorporated association
Controlled person

Worked examples and issues

In the jointly constituted interested party case, the data might look something like the below, with four beneficial ownership statements, and the link between the people and the ultimately controlled entity (ES1) accessible through the intermediate artificial legal entity ES2.

Beneficial ownership statements

Statement ID	Interested Party	Entity	Interest type	Ownership percentage
BOS1	ES2	ES1	Shares	100
BOS2	PS1	ES2	Member
BOS3	PS2	ES2	Member
BOS4	PS3	ES2	Member

Entity statements

Statement ID	Type	Name	Company number
ES1	Incorporated body	AnyCorp Ltd	12345678
ES2	Unincorporated association	ABC Association

Person statements

Statement ID	Name	Date of Birth
PS1	Mrs A	April 1970
PS2	Mr B	Jan 1982
PS3	Ms C	10th April 1970
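
As a rough illustration of how a consumer might walk this chain, the sketch below transcribes the beneficial ownership statements from the tables above into plain dictionaries and collects the people who sit behind ES1. The key names (`interested_party`, `entity`, `interest_type`) and the `persons_behind` helper are assumptions made for this example, not part of the draft schema.

```python
# Statements transcribed from the tables above (fields trimmed for brevity).
# The dictionary keys are illustrative, not draft schema field names.
ownership_statements = [
    {"id": "BOS1", "interested_party": "ES2", "entity": "ES1", "interest_type": "Shares"},
    {"id": "BOS2", "interested_party": "PS1", "entity": "ES2", "interest_type": "Member"},
    {"id": "BOS3", "interested_party": "PS2", "entity": "ES2", "interest_type": "Member"},
    {"id": "BOS4", "interested_party": "PS3", "entity": "ES2", "interest_type": "Member"},
]
person_ids = {"PS1", "PS2", "PS3"}


def persons_behind(entity_id, statements, persons, _seen=None):
    """Return the set of person statement IDs reachable from entity_id."""
    seen = _seen if _seen is not None else set()
    if entity_id in seen:  # guard against circular ownership structures
        return set()
    seen.add(entity_id)
    found = set()
    for statement in statements:
        if statement["entity"] != entity_id:
            continue
        party = statement["interested_party"]
        if party in persons:
            found.add(party)  # a person ends the chain
        else:
            # an entity (possibly an artificial one) sits in the middle
            found |= persons_behind(party, statements, persons, seen)
    return found


result = persons_behind("ES1", ownership_statements, person_ids)
```

Here the three members are only reachable through the artificial entity ES2, which is the point of the modelling approach described above.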

In the nominee case, where, for example, ‘Mr Laws’ is acting for ‘Ms Owner’, this could be modelled through an intermediate artificial legal entity (ES2) which has two beneficial ownership statements: one linking the entity to the person who controls it, and the other linking to the ‘role holder’ (e.g. the person who is the nominee).

Beneficial ownership statements

Statement ID	Interested Party	Entity	Interest type	Ownership percentage
BOS1	ES2	ES1	Shares	100
BOS2	PS1	ES2	Has nominee
BOS3	PS2	ES2	Role holder

Entity statements

Statement ID	Type	Name	Company number
ES1	Incorporated body	AnyCorp Ltd	12345678
ES2	Legal Nominee

Person statements

Statement ID	Name	Date of Birth
PS1	Ms Owner
PS2	Mr Laws

Use case research - narrative cases

Design choices in the development of the Beneficial Ownership Data Standard must be based around identified user needs. We will be mapping user needs through a three stage process:

A draft report with eight use case narratives is ready for first review.

The use case narratives considered are:

Data supply
- Public Registers of Beneficial Ownership
- Self-submitted data
- Third-party submitted data
- Self-published data
Data use
- Procurement & onboarding screening and audit
- General investigations
- Data-led investigations
- Data validation

Feedback is invited either directly on the document, or in the thread here.

Reviewing qualification statements

In our original scoping (#7) we introduced the idea of a "qualification statement" described as follows:

A beneficial ownership statement may also link to a qualification statement that expresses some additional information about the relationship described, such as qualifying that control rights can only be exercised under certain conditions.

In the draft model each beneficial ownership statement can have an array of qualification statements, described as follows:

A qualification statement can be used to record any additional information about this Beneficial Ownership Statement, including information about any reasons non-disclosure of information.

We currently allow these statements to have:

A type - from a codelist of [ ‘non-disclosure’, ‘redaction’, ‘restrictions-on-control’ ]

and

A description - into which free text, or semi-structured text from source systems, can be placed to describe the nature of the qualification.

As I work with this I'm left with a number of questions:

(1) Terminology: is qualification the right term?

(2) Modelling: should reasons for non-publication, and restrictions on the nature of ownership, be represented together in this same object?

(3) Cardinality: If we split out the concepts of non-publication from the concept of a qualification on an interest, do we need an array, or a separate object at all? Would qualification just become a property of interest, and non-disclosure a set of fields directly attached to the beneficial ownership statement?

Feedback welcome.

Handling changes in statements

How do we handle changes in statements? In particular, how do users know that the underlying facts a statement refers to have changed, especially if statements are immutable?

For example, let's say there's a 'statement' that says that A controls B via share ownership. Let's say that there is a subsequent statement that says A controls B via the right to appoint the majority of directors. Should the user understand this to mean:
i) The new statement replaces the old one: A now controls B only via the right to appoint the majority of directors
ii) The new statement is in addition to the old one: A now controls B via both share ownership and the right to appoint the majority of directors

As a slight variation, imagine the scenario where A ceases to control B via share ownership. Should this be modelled as a second statement (with start and end dates), and how do the users know that this is the same relationship that had just a start date?

Field request: trading name and alternative names

At the moment entity statements only have a name field.

This is defined as "The declared name of this entity."

There is user demand for other names, such as AKAs and Trading Names of a company.

There may also be a desire to identify when a company name is the legal name, branch name or something else.

Correction and redaction of data

An issue related to the draft conceptual model

Data publishers may wish to update previously published data, or in some cases, request removal/redaction of previously published data.

The use of unique and persistent statement identifiers can facilitate this, but a documented set of approaches for this may be required.

Comments on appropriate approaches are welcome.

Schema effectively forces publishers to use UUIDs

Our documentation on identifiers says:

Publishers MUST generate globally unique and persistent identifiers for each statement.

These SHOULD start with a uuid to avoid any clash between identifiers from different publishers, and MAY be suffixed with additional characters to distinguish versions of a statement as required by local implementations.

In many implementation scenarios, it will be appropriate to simply generate a distinct uuid for each statement.

The schema says:

"ID": { "title": "ID", "description": "A persistent globally unique identifier for this statement.", "type": "string", "minLength": 32 }

The documentation reads as if UUIDs are optional but desirable, whereas the schema appears to enforce the use of UUIDs, or of IDs of at least equivalent length. This either forces publishers down the UUID route or pushes them to create longer IDs.

We could remove the minLength requirement and make the documentation clearer.
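
To make the effect of the constraint concrete, here is a small sketch using only Python's standard `uuid` module. It mimics the `minLength` rule with a plain length check rather than running a JSON Schema validator, and `local_id` is a made-up example of a registry's own identifier.

```python
import uuid

MIN_LENGTH = 32  # the "minLength" value from the schema excerpt above


def meets_min_length(statement_id: str) -> bool:
    """Mimic the schema's minLength rule with a plain length check."""
    return len(statement_id) >= MIN_LENGTH


generated = uuid.uuid4()
hyphenated = str(generated)   # canonical form with hyphens: 36 characters
compact = generated.hex       # hex digits only: 32 characters
local_id = "PSC-2017-000123"  # hypothetical identifier from a publisher's own system

checks = {
    "hyphenated": meets_min_length(hyphenated),
    "compact": meets_min_length(compact),
    "local_id": meets_min_length(local_id),
}
```

Both UUID forms pass, while the shorter local identifier is rejected even though nothing in the documentation's wording forbids it.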

Apply Matt's theming work to the BODS docs

Building on #69 , apply the work Matt did for us to the BODS docs

Extension: entity.registrationStatus and details

We have the need to represent Organization registration status, and the current codelist of available statuses is (status description in Ukrainian):

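cancelled: скасовано (cancelled)
registered: зареєстровано (registered)
beingTerminated: в стані припинення (in the process of termination)
terminated: припинено (terminated)
banckruptcyFiled: порушено справу про банкрутство (bankruptcy proceedings opened)
banckruptcyReorganization: порушено справу про банкрутство (санація) (bankruptcy proceedings opened, with financial rehabilitation)
invalidRegistraton: зареєстровано, свідоцтво про державну реєстрацію недійсне (registered, but the state registration certificate is invalid)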

Is there a dedicated field for Organization registration status in BODS already? Is a codelist like the one above being discussed, and if so, where? Are English-language codes valid for Organization State, or would you have different codes in specific states? (We attempted to make each code match its description as closely as possible.)

EDIT: Per discussion below, the proposed extension properties are:

entity.registrationStatus for the Organization registration status as free-text codelist, to be developed for/by every registry individually (one of such codelists is outlined above) and
entity.registrationStatusDetails for free-text verbose human-readable description of the Organization registration status (examples of such descriptions are outlined above)

Strengthening schema validation of statement identifiers

An issue related to the draft conceptual model

Each statement published by an organisation MUST have a unique statement identifier.

These identifiers should be persistent.

There is an important question of whether these identifiers should be globally unique or only need to be unique to the given publisher.

Considerations:

Globally unique identifiers aid integration of data from multiple sources.
Publishers may have existing identifiers from their internal systems that they want to use in published data. Any method for generating globally unique identifiers must also ensure statement identifiers are persistent.
Globally unique identifiers are trickier to produce via some data publication methods (e.g. spreadsheet data)
URIs are a possible candidate for globally unique identifiers, but:
- URIs are not very intuitive for users as identifiers when accessed in spreadsheets, databases etc, and do not work well as components of other URIs;
- Many publishers will find it difficult to provide dereferenceable URIs;
- Some publishers may not maintain persistent URIs, as web property locations are often affected by technical changes;

Questions:

What statement identifier requirements should the standard set out? Should it require a particular format of identifier?

How would consuming applications handle locally unique identifiers (i.e. the possibility that two different publishers have the same identifier for different statements)?
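
One possible approach to the considerations above, offered only as a sketch, is to derive a name-based UUID from a publisher-specific namespace plus the publisher's existing local identifier. The example uses `uuid.uuid5` from the Python standard library; the publisher domains and the `statement_id` helper are hypothetical, and nothing here is a decision of the standard.

```python
import uuid


def publisher_namespace(publisher_domain: str) -> uuid.UUID:
    """Derive a stable namespace UUID from a domain the publisher controls."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, publisher_domain)


def statement_id(publisher_domain: str, local_id: str) -> str:
    """Combine the publisher namespace with a local ID into one identifier."""
    return str(uuid.uuid5(publisher_namespace(publisher_domain), local_id))


# Two hypothetical publishers that both use the local identifier "12345".
id_a = statement_id("register.example.org", "12345")
id_a_again = statement_id("register.example.org", "12345")
id_b = statement_id("registry.example.net", "12345")
```

Because the derivation is deterministic, the identifier stays the same for as long as the domain string and the local identifier do, and the same local identifier from two publishers no longer collides. It does not need to be dereferenceable, which sidesteps the URI concerns listed above.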

Refactor schema into multiple sub-schemas for validation

We have recognised that for validation of a fully flat model, we need to:

Detect the statement type;
Validate with a statement specific schema;

This suggests we may want to restructure into sub-schemas for each statement type, and then an overall schema that pulls these together.
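
The two steps above can be sketched as a simple dispatch. In this minimal example each "sub-schema" is reduced to a set of required field names, which is an assumption made to keep the sketch dependency-free; a real implementation would presumably hold a full JSON Schema per statement type. The type names are taken from the draft serialisation examples.

```python
# Stand-ins for per-type sub-schemas: just the required field names.
SUB_SCHEMAS = {
    "personStatement": {"id", "type", "name"},
    "entityStatement": {"id", "type", "name"},
    "beneficialOwnershipStatement": {"id", "type", "subject", "interestedParty"},
}


def validate_statement(statement):
    """Return a list of error strings; an empty list means the statement passed."""
    # Step 1: detect the statement type.
    statement_type = statement.get("type")
    required = SUB_SCHEMAS.get(statement_type)
    if required is None:
        return [f"unknown statement type: {statement_type!r}"]
    # Step 2: validate against the schema specific to that type.
    missing = sorted(required - set(statement))
    return [f"missing field: {name}" for name in missing]


ok = validate_statement({"id": "PS1", "type": "personStatement", "name": "Ed Example"})
incomplete = validate_statement({"id": "BOS1", "type": "beneficialOwnershipStatement"})
unknown = validate_statement({"id": "X1", "type": "somethingElse"})
```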

Create a mechanism to create translatable, themable diagrams

Minimum requirement is guidance for @kd-ods to start writing docs (eg recommended SVG tool or the informed belief that we can use Mermaid (or similar)) even if we need to do more work to make it pretty
Doing the work of prettification is out of scope, as long as it's <10days work to do such prettification

Redundancy and check fields

An issue related to the draft conceptual model

In some cases, redundancy of data can be useful to check accuracy. For example, when cross-referencing a personStatement by ‘id’, there could also be a requirement to include the ‘name’ value of the target personStatement. This can help detect incorrect identifiers.

Although counts can be calculated from the number of entries in an array (for example), it can be useful in some cases to have explicit counts to check no data has gone missing.

For example, the UK PSC API responses include ‘ceased_count’ and ‘active_count’ values. Our proposed entityStatement object could include fields to specify the number of known beneficial ownership statements that are active or superseded, allowing consuming systems to check if they are missing important data.
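
A minimal sketch of both kinds of check follows. The reference shape (`id` plus a redundant `name`), the `entity` key and the idea of passing in a declared count are all assumptions for illustration; no such fields are defined in the draft yet.

```python
def check_reference(reference, statements_by_id):
    """Compare a cross-reference's redundant name with the target statement."""
    target = statements_by_id.get(reference["id"])
    if target is None:
        return "target statement not found"
    if target["name"] != reference["name"]:
        return "name does not match target"
    return "ok"


def check_count(declared_count, ownership_statements, entity_id):
    """Compare a declared count with the statements actually received."""
    received = sum(1 for s in ownership_statements if s["entity"] == entity_id)
    return received == declared_count


people = {
    "PS1": {"id": "PS1", "name": "Mrs A"},
    "PS2": {"id": "PS2", "name": "Mr B"},
}
good_ref = {"id": "PS1", "name": "Mrs A"}
wrong_ref = {"id": "PS2", "name": "Mrs A"}  # identifier entered incorrectly

received_statements = [{"id": "BOS1", "entity": "ES1"}]
```

With this data, `check_reference(wrong_ref, people)` flags the mismatch, and `check_count(2, received_statements, "ES1")` is false, signalling that a statement may be missing.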

Views are welcome on approaches to redundancy and check fields, and where they could or would be useful.

Rename statement.date to statement.statementDate

This should clarify that the 'date' field should represent the date on which the statement is asserted, not the date at which the information in the statement is true.

Modelling issue: package format

An issue related to the draft conceptual model

We will need to decide upon a packaging format for bulk releases of data.

This will involve identifying the meta-data that should be provided, and how statements should be packaged together.

We will look at the Data Package Specification as a reference point for this.

We note that GLEIF has a ‘Delta’ field to indicate the date since which the data in a bulk file has changed.

Ordering of statements and validation

Validation of BODS data requires checking:

(a) That each statement is structurally valid;
(b) That all the statements referenced by a beneficialOwnershipStatement exist;

This is made a lot easier if we impose an ordering constraint on data that all the constituent statements of a beneficial ownership statement must appear before the beneficial ownership statement itself.

JSON arrays are ordered, so if as per #57 we use [ ] as our top-level, or JSON lines (which are also sequentially ordered), then parsers would be able to enforce this constraint.
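
Under that ordering constraint, check (b) can be done in a single pass without holding the whole file in memory beyond a set of seen IDs. The sketch below assumes JSON lines input and borrows field names such as `identifiedByStatementID` from the draft serialisation examples; the `check_ordering` helper is hypothetical.

```python
import json


def check_ordering(lines):
    """Return (line_number, missing_id) pairs for references not yet seen."""
    seen = set()
    errors = []
    for line_number, line in enumerate(lines, start=1):
        statement = json.loads(line)
        if statement["type"] == "beneficialOwnershipStatement":
            for role in ("subject", "interestedParty"):
                for reference in statement.get(role, {}).values():
                    ref_id = reference.get("identifiedByStatementID")
                    if ref_id not in seen:
                        errors.append((line_number, ref_id))
        seen.add(statement["id"])
    return errors


person = '{"id": "p1", "type": "personStatement", "name": "Ed Example"}'
entity = '{"id": "e1", "type": "entityStatement", "name": "E.G Corporation"}'
ownership = (
    '{"id": "b1", "type": "beneficialOwnershipStatement", '
    '"subject": {"entity": {"identifiedByStatementID": "e1"}}, '
    '"interestedParty": {"person": {"identifiedByStatementID": "p1"}}}'
)

in_order = check_ordering([person, entity, ownership])
out_of_order = check_ordering([person, ownership, entity])
```

Without the constraint, a parser would need two passes (or to buffer every unresolved reference until the end of the file) to tell a late statement from a missing one.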

Prepare clear folder structure of test data: valid and invalid

Decide on folder structure
Create test data

Proposed changes to interestType values relating to trusts and other legal arrangements

The OECD Common Reporting Standard for Automatic Exchange of Tax information contains the following enumeration of controlling person types (for the controlling person of an account).

<xsd:simpleType name="CrsCtrlgPersonType_EnumType">
		<xsd:annotation>
			<xsd:documentation xml:lang="en">Controlling Person Type</xsd:documentation>
		</xsd:annotation>
		<xsd:restriction base="xsd:string">
			<xsd:enumeration value="CRS801">
				<xsd:annotation>
					<xsd:documentation>CP of legal person - ownership</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS802">
				<xsd:annotation>
					<xsd:documentation>CP of legal person - other means</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS803">
				<xsd:annotation>
					<xsd:documentation>CP of legal person - senior managing official</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS804">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - trust - settlor</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS805">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - trust - trustee</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS806">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - trust - protector</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS807">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - trust - beneficiary</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS808">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - trust - other</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS809">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - other - settlor-equivalent</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS810">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - other - trustee-equivalent</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS811">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - other - protector-equivalent</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS812">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - other - beneficiary-equivalent</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
			<xsd:enumeration value="CRS813">
				<xsd:annotation>
					<xsd:documentation>CP of legal arrangement - other - other-equivalent</xsd:documentation>
				</xsd:annotation>
			</xsd:enumeration>
		</xsd:restriction>
	</xsd:simpleType>

Explore ownership share: intervals and exact values

One value which I understand will vary across jurisdictions is the value of ownership share.

In some countries you can declare an interval, such as "above XX%", while in others it will be an exact value, e.g. 29%.

I may have missed this from the report but did not see it discussed.

One living example where we have interval disclosures is the EU Transparency register where companies must disclose their lobbying spending:
https://lobbyfacts.eu/reports/lobby-costs/all

This gives some interesting results.
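
As a sketch of how the interval-versus-exact distinction raised above might be held in one structure, the example below uses a small dataclass that can carry either an exact value or a range with an optionally exclusive lower bound. The class and its field names are hypothetical and are not taken from the draft schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Share:
    """Ownership share in percent, given as an exact value or as a range."""
    exact: Optional[float] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    exclusive_minimum: bool = False  # True models "above XX%"

    def could_include(self, value: float) -> bool:
        """Is `value` consistent with what was disclosed?"""
        if self.exact is not None:
            return value == self.exact
        if self.minimum is not None:
            if value < self.minimum:
                return False
            if self.exclusive_minimum and value == self.minimum:
                return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


exact_share = Share(exact=29)
above_25 = Share(minimum=25, exclusive_minimum=True)
band = Share(minimum=25, maximum=50, exclusive_minimum=True)
```

A consumer comparing data across jurisdictions could then ask whether two disclosures are compatible rather than whether they are equal.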

Handling protected information

The UK Persons of Significant Control register has two categories of 'protection' (see also):

(1) Excluding Usual Residential Address (URA) from the details shared with groups like Credit Reference Agencies.

(2) Excluding all details of the PSC from the register (called 'Super Secure')

Case (1) doesn't raise many issues, as it does not change the public record (which does not contain residential addresses in the first place, just a correspondence address).

The standard may, however, need to handle case (2) where the register would show that there is a 'Super Secure' PSC, but not their details.

The standard should include some category to indicate where information has been redacted.

Make our own sphinx theme with bootstrap 4 integration

As a base for this work, we should make a generic bs4 theme for Sphinx (in the OpenDataServices org)

@kindly will be able to advise on specifics!

Making sure that this work is reusable is part of the scope here

Guidance and schema for representing transliterated and original content

There is a use-case for BODS to represent source material in its original form and a transliterated version in another alphabet.

Check data model against UK Standard Selection Questionnaire

UK government agencies are now being guided (required?) to use a new Standard Selection Questionnaire which contains a section to collect BO data from anyone tendering for, or awarded, public contracts.

In particular:

§ 1.1(n) asks for 'Details of Persons of Significant Control (PSC), where appropriate:'

Name;
Date of birth;
Nationality;
Country, state or part of the UK where the PSC usually lives;
Service address;
The date he or she became a PSC in relation to the company (for existing companies the 6 April 2016 should be used);
Which conditions for being a PSC are met;
- Over 25% up to (and including) 50%,
- More than 50% and less than 75%,
- 75% or more.

(Please enter N/A if not applicable)

§ 1.1(o) asks for 'Details of immediate parent company:'

Full name of the immediate parent company
Registered office address (if applicable)
Registration number (if applicable)
Head office DUNS number (if applicable)
Head office VAT number (if applicable)

(Please enter N/A if not applicable)

§ 1.1(p) asks for 'Details of ultimate parent company:'

Full name of the ultimate parent company
Registered office address (if applicable)
Registration number (if applicable)
Head office DUNS number (if applicable)
Head office VAT number (if applicable)

(Please enter N/A if not applicable)

The fact that these are in a single cell of a Word table, and carry the instruction 'Enter N/A if not applicable', suggests that the SSQ has not given implementers any guidance on capturing this information as structured data. So:

(1) There is an opportunity for simple template versions of the BODS here

(2) We should think about how the statement-centric model of BODS could be easily communicated to someone planning to implement data collection as required by the SSQ, and what they could be reasonably expected to capture.

User request for data entry and collection: Flat file templates

For government/registration agencies collecting data in countries with limited ICT capacity, limited information management systems or limited data collection practices for company registration, I think it would be useful to have easily accessible data entry methods available. One option could be to provide flat file templates which could be used for manual spreadsheet-based data entry without requiring any web-based platforms.

To provide some additional context on the current data collection and dissemination capacity gap among EITI countries which are all requested to implement BO data collection over the coming years: Only 12 out of 46 EITI countries are publishing currently collected EITI data in machine readable format.

Assuming that far from all countries will be able to implement local webform based applications or adopt global submission applications, it would be important to have spreadsheet templates which can capture the data.

This might be outside the scope of the schema itself but I think it would be useful to prioritize for the near term future for example as part of pilot country projects.

Resolve questions over implementation of multiple serialisations of BODS data

In our updated modelling, we anticipate up to three kinds of BODS serialisation.

(1) Array of statements

E.g.

[
  {
    "id": "e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8",
    "type": "personStatement",
    "name":"Ed Example"
  },
  {
    "id": "d9cf4a36-fe23-4842-9dbe-5766f8744a05",
    "type": "entityStatement",
    "name":"E.G Corporation"
  },
  {
    "id": "e1fe7861-0494-406d-82c3-670652a9544c",
    "type": "beneficialOwnershipStatement",
    "subject": {
      "entity": {
        "identifiedByStatementID": "d9cf4a36-fe23-4842-9dbe-5766f8744a05"
      }
    },
    "interestedParty":{
        "person":{
            "identifiedByStatementID":"e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8"
        }
    }
  }
]

(2) Statement JSON lines

I.e. each statement on a single line.

{"id": "e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8","type": "personStatement","name":"Ed Example"}
{"id": "d9cf4a36-fe23-4842-9dbe-5766f8744a05","type": "entityStatement", "name":"E.G Corporation"}
{"id": "e1fe7861-0494-406d-82c3-670652a9544c","type": "beneficialOwnershipStatement","subject": {"entity": {"identifiedByStatementID": "d9cf4a36-fe23-4842-9dbe-5766f8744a05"}},"interestedParty":{"person":{"identifiedByStatementID":"e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8"}}}

(3) Nested API responses

Provided as an array of beneficial ownership statements, where the person and entity statements are nested. The below is only a draft, as we may want to consider API standard approaches for this.

This would be recommended only for interactive API responses.

[
  {
    "id": "e1fe7861-0494-406d-82c3-670652a9544c",
    "type": "beneficialOwnershipStatement",
    "subject": {
      "entity": {
        "id": "d9cf4a36-fe23-4842-9dbe-5766f8744a05",
        "type": "entityStatement",
        "name": "E.G Corporation"
      }
    },
    "interestedParty": {
      "person": {
        "id": "e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8",
        "type": "personStatement",
        "name": "Ed Example"
      }
    }
  }
]
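
To show how the three serialisations relate, the sketch below starts from the array form (1), writes it out as JSON lines (2), and builds the nested form (3) by replacing each `identifiedByStatementID` reference with the statement it points to. The helper names are assumptions, and the nested shape simply follows the draft example above rather than any agreed API convention.

```python
import json

flat = [
    {"id": "e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8", "type": "personStatement",
     "name": "Ed Example"},
    {"id": "d9cf4a36-fe23-4842-9dbe-5766f8744a05", "type": "entityStatement",
     "name": "E.G Corporation"},
    {"id": "e1fe7861-0494-406d-82c3-670652a9544c",
     "type": "beneficialOwnershipStatement",
     "subject": {"entity": {
         "identifiedByStatementID": "d9cf4a36-fe23-4842-9dbe-5766f8744a05"}},
     "interestedParty": {"person": {
         "identifiedByStatementID": "e9eeaaa3-d4d0-4036-8f47-731c5ddbf6c8"}}},
]


def to_json_lines(statements):
    """Serialisation (2): one compact JSON document per line."""
    return "\n".join(json.dumps(s) for s in statements)


def to_nested(statements):
    """Serialisation (3): embed referenced statements in each ownership statement."""
    by_id = {s["id"]: s for s in statements}
    nested = []
    for s in statements:
        if s["type"] != "beneficialOwnershipStatement":
            continue
        out = {"id": s["id"], "type": s["type"]}
        for role in ("subject", "interestedParty"):
            out[role] = {
                kind: by_id[ref["identifiedByStatementID"]]
                for kind, ref in s[role].items()
            }
        nested.append(out)
    return nested


json_lines = to_json_lines(flat)
nested = to_nested(flat)
```

Forms (1) and (2) carry identical information, so conversion between them is lossless; form (3) duplicates person and entity statements wherever they are referenced more than once.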

Handling personal identification numbers, passport IDs

Looking over the Identifier section especially when applicable to person statements, I wonder if the issuer institution is not something that can be optionally included (when available) along with the identifier. The ID itself is of little use without this information. For example even the passport IDs are not unique across the world and without the information about issuer, one cannot even assume the country of residence or nationality (a person can have dual citizenship) ...

In a broader context, regarding the format of these passport IDs, maybe we can draw some inspiration from the way ICAO handles the machine-readable passport ID standard:

http://www.icao.int/publications/pages/publication.aspx?docnum=9303
more specifically this document in that list
http://www.icao.int/publications/Documents/9303_p4_cons_en.pdf

I realize this may be a bit too much for the purpose of this standard... I'm just sharing what I have found about this topic. Maybe it will be useful at some point later on; it's interesting to see how these unique numbers are constructed (page 35 in the above-mentioned PDF).

Thanks

Explore how BODS relate to 2nd level LEI data?

LEI is rolling out an extension or expansion of the standard which will enable companies to add information on one level up and one level down.

It would be interesting to explore with the community whether a codelist or a crosswalk would add value.

Below from the LEI website:

"In May 2017, the process of enhancing the LEI data pool, by including ‘Level 2’ data to answer the question of ‘who owns whom’, began. This data allows the identification of the direct and ultimate parents of a legal entity and, vice versa, in order that the entities owned by individual companies can be researched.

The collection and validation of Level 2 data by the LEI issuing organizations for LEIs that existed prior to May 2017 takes place with the annual renewal of the LEI. Renewal means that the reference data connected to an LEI is re-validated annually by the managing LEI issuer against a third party source. It is expected that Level 2 data for the complete LEI population will be available in the course of the first half of 2018, i.e. towards the end of the one-year renewal cycle after the date when collection of Level 2 data started."

https://www.gleif.org/en/lei-data/access-and-use-lei-data/level-2-data-who-owns-whom

Types of beneficial ownership relation

An issue related to the draft conceptual model

We will need a codelist to represent the different kinds of beneficial ownership and control relationships between an interested party, and the entity it controls.

This may need to be a hierarchical codelist, in which we have a number of top-level concepts, and then a range of sub-types. In the examples above, we have identified:

Right to profits
- Share ownership
Right to direct decisions
- Role-based
- Contractual
Role holder - with no implication of ownership or control (such as when a lawyer holds the role of being the nominee, but can only exercise that on behalf of another party)

We will need to find, or develop, a more detailed codelist here.

Local implementations of the standard may need to map between this interoperable codelist and their own local legal requirements and systems.
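
A hierarchical codelist of this kind, together with a local mapping, might be sketched as below. The code strings are invented slugs for the concepts listed above, and the local code in `LOCAL_TO_INTEROPERABLE` is entirely hypothetical.

```python
# Top-level concepts mapped to their sub-types (slugs invented for this sketch).
CODELIST = {
    "right-to-profits": ["share-ownership"],
    "right-to-direct-decisions": ["role-based", "contractual"],
    "role-holder": [],
}

# Hypothetical mapping from one jurisdiction's local code to the shared codelist.
LOCAL_TO_INTEROPERABLE = {
    "shares-over-25-percent": "share-ownership",
}


def top_level_concept(code):
    """Return the top-level concept that a code belongs to."""
    if code in CODELIST:
        return code
    for parent, children in CODELIST.items():
        if code in children:
            return parent
    raise KeyError(code)


def classify_local(local_code):
    """Map a local code to its (sub-type, top-level concept) pair."""
    interoperable = LOCAL_TO_INTEROPERABLE[local_code]
    return interoperable, top_level_concept(interoperable)


classified = classify_local("shares-over-25-percent")
```

Keeping the hierarchy explicit lets a consumer that only understands the top-level concepts still make use of data published with more detailed local sub-types.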

Apply theme from Outlandish's recent work to new docs site

install bootstrap 4 theme
Set up outline navigation
@kindly to work with Matt from Outlandish and @kd-ods to get additional info required to translate his work into CSS. This needs to be on Weds or Fri next week due to working days.

'Done' is when the basics are done but the edges aren't finished, and we know how to do everything else.

Loose validation

The jurisdiction and nationalities fields both specify a maxLength property:

"nationalities": {
          "title": "Nationality",
          "description": "An array of ISO 2-Digit country codes",
          "type": "array",
          "items": {
            "type": "string",
            "maxLength": 2,
            "minLength": 2
          }
        }

"jurisdiction": {
          "title": "Jurisdiction",
          "description": "The jurisdiction in which this entity is registered, expressed using an ISO ISO_3166-2 2-Digit country code, or ISO_3166-2 sub-division code, where the sub-division in question (e.g. a sub-national state or region) has relevant jurisdiction over the registration of operation of this entity.",
          "type": "string",
          "minLength": 2,
          "maxLength": 10
        }

The source material for these fields does not always require information in ISO 3166-2 form, and so these restrictions can be seen as a barrier to publishing. The other side of this argument is that, if we allow jurisdiction information to include "British", "England", "uk" or "Unknown" when the value should be "GB", then the data is less useful, and it is harder to push the policy argument for open ownership information further.
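
The gap between the length constraints and a real code check can be seen in a short sketch. The pattern below reflects the general shape of ISO 3166 codes (two letters, optionally followed by a hyphen and up to three alphanumeric characters for a sub-division); `KNOWN_COUNTRY_CODES` is a deliberately tiny illustrative subset, not a complete list, and both function names are hypothetical.

```python
import re

KNOWN_COUNTRY_CODES = {"GB", "UA", "CA", "US"}  # illustrative subset only

CODE_PATTERN = re.compile(r"[A-Z]{2}(-[A-Z0-9]{1,3})?")


def passes_length_rule(value: str) -> bool:
    """What the current jurisdiction schema enforces: 2 to 10 characters."""
    return 2 <= len(value) <= 10


def check_jurisdiction(value: str) -> str:
    """A stricter check: code shape first, then a lookup of the country part."""
    if not CODE_PATTERN.fullmatch(value):
        return "not an ISO-style code"
    if value[:2] not in KNOWN_COUNTRY_CODES:
        return "unknown country code"
    return "ok"


samples = ["GB", "CA-BC", "uk", "England", "Unknown"]
length_results = {s: passes_length_rule(s) for s in samples}
strict_results = {s: check_jurisdiction(s) for s in samples}
```

Every sample passes the length rule, including "England" and "Unknown", so the current constraints neither guarantee usable codes nor, arguably, lower the barrier for publishers in a meaningful way.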

Modelling trusts

The beta version of the standard has a simple model of trusts, reflecting the lack of publicly-available data on ownership and control.

Trusts are covered under the entity type arrangement. An arrangement is described in the documentation as an artificial entity "associating one or more natural or legal persons together in an ownership or control relationship, but without implying that the parties to this arrangement have any other form of collective legal identity".

With the proposed changes to the EU's Fourth Anti-Money Laundering Directive, beneficial ownership information on trusts is likely to be stored in central registers and this information, in turn, may be made available to the public. There is therefore a strong argument that BODS will need to model trusts in a more sophisticated way in order to fully meet its selling point of describing who owns and controls wealth.

There are two parts to this:

Modelling beneficiaries, controllers and other natural persons involved in trusts. See #40
Modelling trusts as a legal structure, as opposed to a black-box arrangement, if scoping research shows that this is the appropriate path to take.

Draft: Conceptual framework, core components and data serialisation

This document sets out a proposed conceptual framework for the Beneficial Ownership Data Standard and identifies core components that the standard must represent, along with the relationships between these components. It presents an initial approach to data serialisation and exchange.

This draws upon research into existing standards, the development of standard use-cases and requirements, and examination of existing beneficial ownership and control data.

The diagram below summarises the statement-centric conceptual model we are exploring. This is detailed in the document with worked examples.

The document is open for comments until 6th February 2017.

Modelling issues arising from the paper are also raised in separate issues, tagged 'conceptual model'

Modelling Issue: Address standards

Addresses ... difficult for sure.

There are well established postal standards in North America at least from Canada Post and US Postal service. (e.g. https://www.canadapost.ca/tools/pg/manual/PGaddress-e.asp)

As well as implementation examples: Here is a Province of British Columbia example
https://github.com/bcgov/api-specs/blob/master/geocoder/singleLineAddressFormat.md

Also .. https://schema.org/PostalAddress might be useful

I raise this as addresses are notoriously digitally unfriendly, and it would be nice if we didn't introduce further entropy into this particular data world.

Develop guidance page on names

See https://www.w3.org/International/questions/qa-personal-names and open-contracting/standard#637 (comment)

Introducing additional features for resilience to bad publishing, or deliberate obfuscation

From e-mail feedback:

The concept looks very appropriate, but [the current approach may] assume that “everything is going to be done as anticipated”. That’s true for the ethical side, but not so for those looking to short, omit and/or obscure. You’ll have “cheating” going on that you’ll want the constructs and facilities to trap. Not sure seeing that aspect being addressed here.

The current statement-centric model tries to address one side of cheating, by atomising things out into statements which could be published as a ledger (possibly with options for cryptographic signing). But there are likely quite a lot of adversarial strategies that actors could use to provide bad data, or to cause problems for existing data. What do we need to consider? How could these be mitigated?

Improving support for imprecise and relative dates (e.g. 'Sometime in March or April 2015' or 'before Jan 1989')

An issue related to the draft conceptual model

A number of the dates that may be provided in beneficial ownership data will be imprecise. For example, providing only year, or year and month, rather than full year, month and day.

The RFC 3339 date-time format commonly used on the internet requires fully qualified YYYY-MM-DDTHH:MM:SS strings, and so is not suitable for cases when the day, or specific times, are unknown.

ISO 8601, from which RFC 3339 is derived, permits a much wider range of strings, including YYYY-MM when the day is not known. However, most schema languages do not provide a type for this. For example, xsd:date requires use of day as well as month.

Options

We could require data publishers to add ‘01’ (1st day of the month) to any incomplete dates.
We could allow incomplete dates, using a custom regex to validate data.
We could use a composite object to capture dates: with fields for ‘day’, ‘month’, and ‘year’.

For each option we will need to consider the business logic that users may need to apply to incoming data, and we may need to provide guidance on approaches to dealing with incomplete dates.
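As a rough illustration of the second and third options, the sketch below validates incomplete dates with a custom regex and returns the parts as a composite value. The function name and the choice to accept only YYYY, YYYY-MM and YYYY-MM-DD are assumptions, not part of the draft.

```python
import re
from datetime import date

# Accepts YYYY, YYYY-MM or YYYY-MM-DD (a subset of ISO 8601).
PARTIAL_DATE = re.compile(
    r"(\d{4})(?:-(0[1-9]|1[0-2])(?:-(0[1-9]|[12]\d|3[01]))?)?"
)


def parse_partial_date(value):
    """Return (year, month, day), with None for unknown parts.

    Returns None if the string is not an accepted date.
    """
    match = PARTIAL_DATE.fullmatch(value)
    if match is None:
        return None
    year, month, day = (int(p) if p else None for p in match.groups())
    if day is not None:
        try:
            # Rejects impossible calendar dates such as 2015-02-30.
            date(year, month, day)
        except ValueError:
            return None
    return (year, month, day)
```

With this shape, a consumer can tell "March 2015" (day unknown) apart from "1 March 2015", which the first option of padding with '01' would not allow.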

Splitting person, entity and null statements

We currently refer to multiple types of object in a single field, e.g. interestedParty can refer to an entityStatement, a personStatement or a nullStatement. This causes validation issues and makes analysis and database imports difficult. In the next BODS release, we should move to a pattern of having these objects in separate fields and validating that only one is present.

See OpenDataServices/flatten-tool#182 for discussion.
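A minimal sketch of the "separate fields, only one present" check might look like the following. The field names inside interestedParty are hypothetical, since the issue does not yet say what the separate fields would be called.

```python
# Hypothetical field names: the issue does not fix what the separate
# fields would be called.
REFERENCE_FIELDS = ("entityStatement", "personStatement", "nullStatement")


def reference_errors(interested_party):
    """Return a list of error messages; an empty list means valid."""
    present = [
        field for field in REFERENCE_FIELDS
        if interested_party.get(field) is not None
    ]
    if len(present) == 1:
        return []
    if not present:
        return [
            "interestedParty must contain one of: "
            + ", ".join(REFERENCE_FIELDS)
        ]
    return [
        "interestedParty must contain only one of: " + ", ".join(present)
    ]
```

Because each field then holds a single type of object, a database import can map each one to its own column without inspecting the value.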

Consider JSON Lines rather than JSON

Dealing with lots of data in single JSON objects can bring a big overhead.

We should consider using JSON Lines, which allows easier stream processing and the use of command-line tools (e.g. simply running grep over files) to work with large collections of data.
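As a sketch of what stream processing could look like, the generator below reads one statement per line, so only one statement is parsed at a time. The statementID and statementType keys in the sample are illustrative assumptions.

```python
import io
import json


def iter_statements(lines):
    """Yield one parsed statement per non-empty line of a JSON Lines stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            # Report which line failed so a bad record is easy to find.
            raise ValueError(f"line {line_number}: {error.msg}") from error


# In-memory sample standing in for a large file opened with open().
sample = io.StringIO(
    '{"statementID": "s1", "statementType": "entityStatement"}\n'
    '{"statementID": "s2", "statementType": "personStatement"}\n'
)
person_ids = [
    statement["statementID"]
    for statement in iter_statements(sample)
    if statement["statementType"] == "personStatement"
]
```

The same function accepts any iterable of lines, including an open file object.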

Modelling issue: codelist languages

An issue related to the draft conceptual model

The schema will include a number of codelists.

Codelist codes can either be:

Based on English language. For example, [‘shares’,’contract’] as valid codelist values, with a codelist which provides further labels and descriptions.
Or
Alphanumeric, with a lookup list. For example [1,2] as valid codelist entries, and a codelist that provides further labels and descriptions.

English-language codelist entries can make data more intuitive to some (English-speaking) readers. However, codes without an embedded language or semantics in the code name are more internationally applicable.

A decision on this trade-off will be needed.
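To show how the alphanumeric option might be consumed, here is a small sketch of a lookup list with labels per language. The codes, labels and translations are made up for illustration and do not come from any BODS codelist.

```python
# Illustrative codelist: codes carry no language, labels are looked up.
INTEREST_TYPE_LABELS = {
    "1": {"en": "Shares", "es": "Acciones"},
    "2": {"en": "Contract", "es": "Contrato"},
}


def label(code, language="en"):
    """Return the label for a code, falling back to English.

    Raises KeyError if the code is not in the codelist.
    """
    labels = INTEREST_TYPE_LABELS[code]
    return labels.get(language, labels["en"])
```

With English-language codes the data itself would read 'shares'; with this approach the data reads '1' and every reader, English-speaking or not, needs the lookup step.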

Investigate need for documentation linkage

In some cases the primary source of ownership information will be paper documents and scanned copies.

For provenance reasons we may need a good way of representing links back to these documents.

Precedents, components and learning from existing standards

To inform the development of the Beneficial Ownership Data Standard, the OpenOwnership project commissioned Jack Lord of Kraken Research to conduct a rapid review of existing standards that might provide a direct basis, or relevant learning, for standard development.

A list of 32 different standards explored can be found here and the draft report for working group feedback is here.

Feedback is invited on:

Additional standards that should be considered as providing precedents or useful inputs for a beneficial ownership data standard;
The overall findings of the paper.

Read and review the paper here.

Field representation: Address information

We will need to collect address information in a number of places.

There are two broad options here:

(1) Require structured addresses
(2) Allow semi-structured addresses, and require users to parse them for matching

Structured Addresses

This would represent addresses using a collection of fields. For PostalAddress Schema.org uses:

postOfficeBoxNumber - The post office box number for PO box addresses.
streetAddress - The street address. For example, 1600 Amphitheatre Pkwy.
addressLocality - The locality. For example, Mountain View.
addressRegion - The region. For example, CA.
postalCode - The postal code. For example, 94043.
addressCountry - The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code.

The challenge here is that source systems may:

(a) have a different set of source fields;
(b) only have unstructured address information; and/or
(c) assign address portions to different fields anyway

For example, three source systems could have the address of Chrinon Ltd as:

(1) Address: [ Aston House, Cornwall Avenue, London, N3 1LF ]

(2) Line 1: [ Aston House ]; Line 2: [ Cornwall Ave ]; Town: [ London ]; County: [ ]; Zipcode: [ N3 1LF ];

(3) Street Address: [ Aston House, Cornwall Ave ]; Town: [ ]; County: [ London ]; Postal Code: [ n3 1lf ]; Country: [ England ]

Any system trying to match these will need to do quite a lot of processing to normalise them.

Semi-structured addresses

A semi-structured address approach would include three fields:

Address - The address, with each line of the address separated by a line-break or comma. This can also include the postal code.
Postcode - The postal code on its own.
Country - The ISO country code for the country.

Users would then need to use a library such as libpostal to parse out structured address data when full addresses are needed.
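As a sketch of how a publisher might map source fields into the three-field model, the function below joins whatever address lines exist and keeps the postal code and country separate. The output key names and the upper-casing of the postal code (which suits UK postcodes) are assumptions, and mapping "England" to an ISO code is left to the publisher.

```python
def to_semi_structured(lines, postcode=None, country=None):
    """Build a three-field address from whatever source lines exist.

    Empty or missing lines are skipped; the remaining lines are joined
    with commas in their original order.
    """
    parts = [part.strip() for part in lines if part and part.strip()]
    return {
        "address": ", ".join(parts),
        # Assumption: upper-casing is safe for the postal codes in use.
        "postCode": postcode.strip().upper() if postcode else None,
        "country": country,
    }


# Source system (3) from the example above, with the country already
# mapped to an ISO code by the publisher.
chrinon = to_semi_structured(
    ["Aston House, Cornwall Ave", "", "London"],
    postcode="n3 1lf",
    country="GB",
)
```

Here the empty Town field is skipped, so the three source systems end up with much closer address strings, although "Avenue" versus "Ave" still needs a parser or matcher to reconcile.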

Discussion

My initial sense is that, as users are going to need to do a lot of work to match addresses anyway, we should:

Provide implementation guidance for data-input systems to capture structured addresses;
Use the semi-structured address model in the schema itself - having just three fields there for address;

Extended properties and conformance

We need a statement of policy concerning how additional properties to the standard should be handled.

For example, a system may have an internal identifier for statements that it wants to include in output for debugging purposes, or a publisher may want to include organisation status (#28) in their data.

Some of the options:

Permit any additional properties;
Allow additional properties, but prefixed by x_;
Require all additional properties to be supported by a declared 'extension' to the standard;

(Raised from a question from @myroslav)
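For the x_ prefix option, a conformance check could be sketched as below. The set of known property names is a placeholder for whatever the schema defines, and the function name is hypothetical.

```python
# Placeholder for the property names defined by the schema.
KNOWN_PROPERTIES = {"id", "name", "jurisdiction"}


def unexpected_properties(obj, known=KNOWN_PROPERTIES, prefix="x_"):
    """Return, in sorted order, properties that are neither in the
    standard nor marked as extensions with the prefix."""
    return sorted(
        key for key in obj
        if key not in known and not key.startswith(prefix)
    )
```

Under the "permit any additional properties" option this check would simply not be run; under the declared-extension option, the known set would be extended with the properties each declared extension adds.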

Investigate ISO_10962 for voting rights information

https://en.wikipedia.org/wiki/ISO_10962

Refactor reference statements

In this schema patch to support example generation, @ScatteredInk splits out the different kinds of StatementReference objects in use in the standard, to support validation that the right kind of statement is being used in the right place.

This looks like a sensible step for better validation of data, and the [use of 'allOf' JSON schema properties](https://spacetelescope.github.io/understanding-json-schema/reference/combining.html#allof) to provide inheritance from a basic StatementReference object is great.

However, the tooling currently used to generate documentation is not allOf aware, so I'm not planning to make this change in the main schema for the initial release.
