co-cddo / open-standards
Collaboration space for discussing and exploring technical and data standards
I'm working (ever so painfully) to set up an auth chain using Azure Active Directory as the SSO service, and believe me when I tell you that it's a huge hassle.
Do we have any established patterns around SSO? Over here in UKTI Microsoft stuff appears to be the norm, and we've already got a few products using Azure AD, so that's why I have to integrate it, but I think we'd do well not to make a habit of this.
Question from @philandstuff: “do you know of any appropriate standards for use-cases around bulk download & streaming? We're kind of imagining how you might have "git clone" and "git pull" for a register: copy a register, then at a later time, download everything that has since changed.”
Any named individual who writes a government document, standard or code should provide an ORCID iD, which should be published alongside their byline, and in associated metadata.
ORCID iDs are unique identifiers. The use of an ORCID disambiguates two authors with the same (or similar) names; and identifies the work of one person under a variety of names (for example because of differing use of initials, misspellings, name changes, or differing transliterations).
Individuals register and own their ORCID record; it goes with them when they change jobs, or write for other publishers. The record can include details of education, employment, funding and works authored, each of which can be made public or kept private.
Publishers, employers, funders and other bodies can incorporate ORCID into their back-end systems. APIs are available publicly and to paid members of ORCID or a local consortium. Some mandate that authors or people receiving funding must provide an ORCID iD. Others make it optional.
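For illustration, here's a minimal sketch of fetching a public ORCID record over the public API (assuming the current v3.0 endpoint; no membership credentials are needed for public data):

```python
import requests

# Minimal sketch: fetch a public ORCID record as JSON.
# 0000-0002-1825-0097 is the example iD that ORCID itself publishes.
resp = requests.get(
    "https://pub.orcid.org/v3.0/0000-0002-1825-0097/record",
    headers={"Accept": "application/json"},
    timeout=10,
)
resp.raise_for_status()
record = resp.json()
print(record["orcid-identifier"]["path"])  # the bare iD
```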
ORCID is a non-profit organisation, and public ORCID data is available under an open licence.
As an organisation, the government should encourage, and where appropriate mandate, the use of ORCID iDs, and should include fields for them in relevant forms and databases.
For more info, see http://orcid.org/ or https://en.wikipedia.org/wiki/ORCID
[My ORCID is in my Github profile.]
Would a repository for type definitions in the same spirit as schema.org be a worthy ambition?
Would you make use of it?
Would you contribute to it?
What style of presentation and what notations should it have?
Who would oversee it, and what oversight model would work best: OSS-style governance, delegating sub-schemas to departments, etc.?
Who would pay?
Some services being developed are using SAML. How open is SAML and what is our position on it?
Persistent identifiers for public government documents
Data services and content lead at the British Library.
Manage the DataCite UK service, which makes Digital Object Identifiers (DOIs) available to UK organisations.
Assigning resolvable, globally unique and persistent identifiers, such as Digital Object Identifiers (DOIs) to public government documents (reports, data, other papers) allows them to be cited in a stable and trusted way, particularly within an academic context. This supports trust in the research itself.
Standard web addresses (URLs) used in academic citations are prone to link rot (see: https://doi.org/10.1371/journal.pone.0115253), which reduces the ability to verify claims in the research. This also applies to government documents cited with URLs in the literature, and so undermines the use of government documents in academia.
More widely, all users find it hard to keep track of the location of any given document as it moves around the government's web estate over the long-term.
Resolvable, persistent identifiers like DOIs applied to government documents will ensure stable links. Researchers will be confident that citations to government reports and data found in the literature will work, and content they reference will be available for the long-term. DOIs are well-recognised within the research community, as they have been used to cite online material for more than 15 years.
The additional layers of governance that come with use of identifiers such as DOIs ensure that all kinds of user will be able to find and access government documents via the same URL no matter where on the government's web estate the item is hosted over time.
Globally unique persistent identifiers will also enable government to see how each document is used. The use of each report can be more easily distinguished from its versions. Identifiers such as DOIs allow the tracking of usage metrics such as Altmetrics (http://altmetrics.org/manifesto/) and with services such as DataCite's Event Data (https://eventdata.datacite.org/).
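As a sketch of how this works in practice, DOI resolvers support content negotiation, so the same identifier that gives a landing page in a browser can return citation metadata to software:

```python
import requests

# Sketch: ask the DOI resolver for citation metadata (CSL JSON)
# instead of the human-readable landing page.
resp = requests.get(
    "https://doi.org/10.1371/journal.pone.0115253",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=10,
)
resp.raise_for_status()
meta = resp.json()
print(meta["title"], meta["DOI"])
```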
Users need to be provided a persistent, trustworthy and globally unique way of referring to public government documents.
Government needs to be able to track the academic use of those documents.
Publishing animal accidents data as open data!
Animal accidents involving livestock have been reducing, but there have been some interesting ideas from local councillors to reduce them further, such as stopping the free roaming of commoners' livestock, which would have serious implications for tourism. I think that opening up the animal accident data and publishing reusable information could lead both to innovative ideas for reducing animal accidents involving vehicles and to improved community engagement with the verderers and NFPA. Apologies, I haven't followed the standard issue template, but hopefully the link below will provide a bit of context!
https://callumrtanner.com/2017/01/16/open-up-the-data-of-animal-accidents-in-the-new-forest/
Authentication and authorisation are complex to understand and get right. GDS should be looking to create a framework on top of OAuth 2.0 and OpenID Connect from which other departments can benefit.
An IT Software Consultant currently working in an agency of the UK government.
The agency has a number of individual applications. Each application has its own implementation of authentication and authorisation, and its own user database. Some applications only allow access to internal users of the organisation, while others allow both internal and external users. Generally the approach is inconsistent. The inconsistency means a larger attack surface. It also means that developers who don't have experience with the complexities often bake their own implementation of OAuth 2.0 authentication into their application. Use of a framework such as the open source IdentityServer4, or a GDS equivalent, would bring conformity.
The agency would benefit through reduced development time for new applications, better security, potentially a single database of users, and services that are easier to grow in the future. Developers working for the agency or government organisation wouldn't need to understand the nuances of application security, and wouldn't make simple mistakes with huge implications; implementation would be quicker and more secure. End users of agency applications would have better protection of their information through a consistent approach.
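To make the "don't bake your own" point concrete, here is a hedged sketch (not an established GDS pattern) of verifying an OAuth 2.0 / OpenID Connect JWT access token against the issuer's published keys using the PyJWT library; the issuer, JWKS URL and audience values are placeholders:

```python
import jwt  # PyJWT
from jwt import PyJWKClient

# Placeholder values - not a real identity provider.
ISSUER = "https://idp.example.gov.uk"
JWKS_URL = ISSUER + "/.well-known/jwks.json"

def validate_token(token: str) -> dict:
    """Return the token's claims; raises a jwt exception on any failure."""
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],  # pin the algorithm; never accept "none"
        audience="my-service",
        issuer=ISSUER,
    )
```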
See User Need
Choosing technologies to adopt for authentication is fine, but the following need to be addressed:
I will stop blurting out now...
We're aware that the Challenge Owners' Guide page contains a graphic which doesn't meet our accessibility requirements.
This will need to be updated before go-live.
Recently the topic of spend data has come up, particularly how everyone seems to publish their data in slightly different structures, sometimes the same organisation using a different structure each month.
I was aware of the Local Government Association's schema ( https://github.com/esd-org-uk/schemas/blob/master/Spend/Spend.json ) and an HMRC schema was mentioned. I haven't found the HMRC schema yet, although I am presuming it is core-department specific, so if anyone has any pointers, please share them.
There's likely to be a problem persuading people to use a specific schema, but we should at least have a schema to suggest, and to that end I'm hoping to gather opinions on the best approach for this. Should we be asking people like https://www.spendnetwork.com/ for guidance on what they would expect? If so, could @torgo arrange this, as I believe they are at the ODI?
This might be of interest to @davidread as he's had some experience with the https://openspending.org/ codebase.
Discussion on versioning of APIs: best practices, current approaches, relevant standards...
The Standards Hub currently recommends Version 1.0 of the Open Contracting Data Standard.
Version 1.1 has recently been released
What would be the process to consider updating the recommendation to suggest use of version 1.1?
Disaster response has always been a challenge during and after major disasters, due to the impact of the disaster itself, the number of organizations and individuals participating in the response [1], and the lack of rapid social networking to support immediate community response. A disaster, regardless of etiology, exceeds the ability of the local community to cope with the event and requires specialized resources from outside the impacted area [2-4]. In a large-scale destructive event, one of the greatest challenges for public health workers and rescue teams is to have stable and accessible emergency communication systems [5,6]. However, little research currently exists regarding the use of communication platforms and internet social networks for emergency response.
Emergency response during disasters is often complicated because communication becomes unavailable. The Chi-Chi earthquake in Taiwan and Hurricane Katrina in the US have proven that current telephone, radio and television-based emergency response systems are not capable of meeting all of the community-wide information sharing and communication needs of residents and responders during major disasters [7,8]. After 9/11, Preece and Shneiderman et al. proposed the concept of community response grids [9], which would allow authorities, residents, and responders to share information, communicate and coordinate activities via the internet and mobile communication devices in response to a major disaster. Information technologies have the potential to provide higher-capacity and more effective communication mechanisms that can reach citizens and government officials simultaneously.
Summary
Briefly describe yourself.
A short summary of the user need and expected benefits of this challenge. This summary will be used to help people to spot which challenges are of interest to them.
The user need that this challenge seeks to address, including a description of the types of users it involves.
In the case of the typhoon disaster in Taiwan, internet social networking and mobile technology were found to be helpful for community residents, professional emergency rescuers, and government agencies in gathering and disseminating real-time information regarding volunteer recruitment and relief supplies allocation. We noted that if internet tools are to be integrated into the development of emergency response systems, their accessibility, accuracy, validity, feasibility, privacy and scalability should be carefully considered, especially when applying them in resource-poor settings.
The functional needs that the proposal must address.
I have some questions around the recommendations specified at:
https://www.gov.uk/government/publications/open-standards-for-government/exchange-of-location-point
The document says that:
What does this actually mean in practice in terms of specifying conformance criteria and designing data formats? For example, within the scope of ETRS89, should a data file include points in an ETRS89 CRS and then, optionally, also include the points in other CRSs? Or is there a choice?
The section on Functional Needs in the guidance doesn't really elaborate. In fact it makes a case for using WGS84.
As a concrete example, the newly published Brownfield Land Register standard says that local authorities should use ETRS89, but the standard allows points to be specified in other CRSs.
I was just at a workshop on this standard where there was some debate about the utility of ETRS89. E.g. local authority systems may not store this natively, and consumers of the open data are perhaps more likely to want WGS84.
There also seems to be some inconsistency in section 5, which says:
"applications that consume data sets containing points must promote and prefer WGS 84".
Promoting and preferring WGS 84 seems at odds with requiring the use of ETRS89?
I understand that ETRS89 is the standard CRS used in the EU, so can see why it has been referenced. But I think it might be useful to clarify some of the intended outcomes here.
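For reference, converting between the two is straightforward with standard tooling; here is a sketch using pyproj (the EPSG codes are the ones discussed above, and for points in the UK the two systems currently differ by well under a metre, which is part of why they are often treated as interchangeable in practice):

```python
from pyproj import Transformer

# Sketch: transform a point from ETRS89 (EPSG:4258) to WGS 84 (EPSG:4326).
transformer = Transformer.from_crs("EPSG:4258", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(-1.8904, 52.4862)  # roughly Birmingham
print(lon, lat)
```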
Raised by @benlaurie on Twitter
I'd love to read all about it but all your docs are in Word format.
https://twitter.com/BenLaurie/status/870867721952645121
The .docx files in question are from https://www.riscs.org.uk/
RISCS is funded by NCSC, but is run by UCL.
An open question to our community - should we encourage our partners to use open standards? Should we require it? How would we enforce this?
Category: Data
David Read, tech arch at MoJ's Analytical Platform. Background with GDS on: data.gov.uk, Better Use of Data team.
Tabular data (e.g. CSV) is the most common data format but it is loosely defined and users would benefit from standardizing on the details.
This challenge is not about Excel workbooks or similar. It is about data that is primarily consumed by machines/software, rather than humans.
This challenge is not about metadata (e.g. schema / column types, licence) or validation. That's covered in challenge #40, and the options there, including CSV on the Web and Tabular Data Package, are both about putting metadata in a separate file, so it is a separate conversation.
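To illustrate the kind of "details" in scope, here is a sketch of writing a CSV with every dialect choice made explicit (the particular choices shown - UTF-8, comma delimiter, CRLF line endings, minimal quoting - are illustrative, not a proposal):

```python
import csv
import io

rows = [["date", "amount"], ["2016-07-07", "1250.00"]]

buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", lineterminator="\r\n",
                    quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
payload = buf.getvalue().encode("utf-8")  # encoding is part of the dialect too
```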
Off the top of my head:
We want to encourage government users and citizens to use government data more, for greater understanding and decision-making. There are plenty of barriers to this, including skills, tools, access, licensing etc, but one small but significant one is the proliferation of differing CSV conventions. These often require users to do extra work:
The functional needs that the proposal must address.
A short title which describes the challenge. Avoid acronyms or jargon.
Terence Eden and Lawrence Greenwood are the challenge owners. The challenge was originally posted by Chris Little.
Humans are adept at accommodating and understanding a variety of time and date notations, such as 7th July 2016, 7/7/16, July 7 2016, 2016-07-07 and 10:12am, 12 past 10 in the morning, 09:12GMT, 09:12UTC, 10:12BST, 12:12EEST.
There is a well-established, global, international notation for dates and times: ISO 8601. This standard, and subsets of it, are widely used in the ICT domain, and have the advantage of being automatically 'sortable' on most computer systems. By contrast, dates written like 7 Jul, 7 Aug and 7 Sept usually sort into the alphabetic order Aug, Jul, Sept rather than the expected temporal order Jul, Aug, Sept.
By adopting ISO 8601 notation for dates and times in online documents, spreadsheets, databases and for filenames and online references such as URLs, greater interoperability will be achieved at less cost and with less confusion.
2016-07-07T09:23:00Z
Inconsistent recording of dates and times causes confusion, especially in the international exchange of computer documents, where cultural practices differ. For example, confusion between the UK and USA readings of dates such as 9/11/2001 will be reduced.
Users will be able to order documents of interest into strict date-time order more easily, and there will be greater interoperability when transferring date-time information between disparate computer systems. There will be better validation of date-time information.
Sorting algorithms can be simplified. Listings can be more readily understood. The international exchange of information will be improved.
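To illustrate the sortability point: ISO 8601 timestamps with the same precision and timezone designator sort lexicographically into temporal order, so a plain string sort is enough.

```python
# A plain string sort of ISO 8601 timestamps is also a chronological sort.
stamps = ["2016-09-07T10:00:00Z", "2016-07-07T09:23:00Z", "2016-08-07T00:00:00Z"]
assert sorted(stamps) == [
    "2016-07-07T09:23:00Z",
    "2016-08-07T00:00:00Z",
    "2016-09-07T10:00:00Z",
]
```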
This challenge was presented to the Open Standards Board by A. Seles from The National Archives
Government information is produced on many different platforms and can include things like records, emails and data. Users of government information, both citizens and government officials, need to be able to understand it and use it, independent of any platform. Furthermore, users should be able to examine and query information without having access to its full contents. To accomplish this, government systems need to create a standardised set of information (i.e. metadata) about the resources they manage.
Users in this context include citizens, civil society and government officials.
Furthermore, having standardised metadata will also allow public officials to meet legislative requirements, as information can be easily retrieved to answer access requests under Freedom of Information or to transfer records to The National Archives, as per the Public Records Act.
The solution to the challenge should be able to meet the following functional needs:
West Midlands Fire Service
We're a small in-house team within West Midlands Fire Service that has over 12 years' experience in developing dynamically-rendered input forms (i.e. a technique that separates the definition of what a form should collect from the code that controls how the user interface is delivered).
This separation of concerns has worked really well for us over the years and continues to bring unexpected benefits. We've now accumulated a library of around 100 form definitions which cover all aspects of Fire Service activity.
As we begin work on our next-generation platform we'd like to align to a standard that helps widen the uptake of this approach.
This is a challenge to produce an open standard to express user-facing input forms.
The standard should cover several facets:
Facet | Description |
---|---|
Layout | The order and configuration of UI widgets. |
Binding | How UI widgets relate to an underlying data model. |
Appearance | Prompts, labels, grid arrangement, iconography, styling-overrides etc. |
Structure | Arranging UI widgets into sections, groups etc. |
Enumeration | For populating drop-down lists and similar. |
Context | The specification should be expressive enough to indicate how a form should behave/appear in different contexts (in-the-field, in-the-office etc.) |
Validation | Min/max ranges, required/optional attributes, [regular] expressions, function-binding etc. |
Dynamic content | Use of REST APIs to get content/values and perform server-side validation, "typeahead" support, UUID generation etc. |
Behaviour | Conditional appearance of items/groups/sections, enumerated values, read-only states etc. |
Nesting | Allow for repeating groups of UI widgets (with min/max allowed number of entries and similar). |
Advanced | Internationalisation, scripting support, tours, offline-fallback configuration... |
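To make the facets above concrete, here is a purely hypothetical sketch of a declarative form definition, shown as a Python dict; none of the field names come from an existing standard.

```python
# Hypothetical form definition touching several of the facets above.
form_definition = {
    "id": "incident-report",
    "sections": [  # Structure
        {
            "title": "Incident details",
            "widgets": [
                {
                    "type": "text",                       # Layout
                    "bind": "incident.location",          # Binding
                    "label": "Where did it happen?",      # Appearance
                    "required": True,                     # Validation
                    "visibleWhen": "incident.type == 'fire'",  # Behaviour
                },
                {
                    "type": "dropdown",                   # Enumeration
                    "bind": "incident.type",
                    "options": {"source": "/api/incident-types"},  # Dynamic content
                },
            ],
        },
    ],
}
```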
Along with our prototype DSL here are a couple of established reference-points to start:
Note!
As this is a more general challenge, it doesn't relate to one particular group of users.
That said, what is important from a user perspective is that the specification be expressive enough for dynamic form renderers (whether running inside web pages or mobile apps) to deliver rich and efficient user experiences.
Assuming this standard could be used throughout government to define user-facing forms...
Beneficiary | Benefits of a successful challenge |
---|---|
Users | All forms would be consistent (at least within a single platform/product which delivers a set of form definitions), meaning one interface to master and fewer systems to learn. Users can expect a greatly improved data-collection experience, as any effort to improve a generic form renderer would then directly benefit all forms (and by extension, all users). If a form is being delivered from a generic platform, then it is safe to assume authentication is happening across all forms, reducing system-switching burdens. |
Operational | Assuming a tool ecosystem built around this standard: organisations will be much better positioned to set and refine their own data collection requirements, making for a more responsive and agile government. Support for deep validation, tours, keyboard accelerators and other generic functionality will help drive up general data quality and efficiency. If form content is delivered via a generic platform, then authentication and authorisation management would be centralised. Government interoperability and transparency will be greatly improved. |
Social | This specification (combined with other challenges such as data models and workflow) begins to pave the way to a much more collaborative and open approach to assembling software, akin to the benefits attributed to low-code platforms. |
Environmental | When combined with quality tools, the ease of replacing paper forms with electronic equivalents may drive down paper consumption. |
When quantifying the impact of introducing our initial dynamic-form platform, we internally estimated that it was saving 15,000 person-hours per year (as compared with the overheads incurred with traditional "discrete system" approaches and training). If such estimates still hold true today, then the cumulative impact of supporting a switch to dynamic form rendering across government would be significant.
We consider the proposal should be:
Need | Description |
---|---|
Agnostic | The specification should be independent of any technology or vendor. |
Lean | Not full of cruft: use intelligent defaults, prefer JSON over XML, etc. |
Intuitive | Needs to be logical: easily read and understood. |
Extensible | Over our 10 years, we've accumulated a palette of some 30 different UI components (covering the obvious text-boxes through to gazetteer-selectors and maps). However, the ability to express unforeseen specialist widgets would be required. |
Toolable | To deliver wider benefit, the specification will need to play nicely with IDEs, WYSIWYG editors and similar. |
Support inference | Meta information, such as prompts, descriptions, data types etc. can be inferred from an associated data model definition (soon to be the focus of another challenge!) As such, the specification should define explicit/predictable behaviour when inferring values. |
Bring in the data from:
The challenge was originally posted on standards.data.gov.uk by Shan Rahulan
There has been a proliferation in the number of usernames and passwords required by government users to access government systems. With the advent of cloud and the need to build digital services, there is an opportunity to set some standards for authentication which will, over time, reduce this issue.
As a government user, I want to have one set of credentials to access all the services I need to do my job, instead of having lots of usernames and passwords to remember
Government organisations
Government end users
Just picking up on the questions posted by myself and @mattlewis_dvla in #tech-standards on Slack.
To kick things off I'm reposting @mattlewis_dvla's helpful points from Slack:
Very keen to see where this goes, as we are beginning our API journey at DfE too and would like to get things right!
Hi @alphagov/tech-standards - I am testing out the ability to notify a team on GitHub and using that as an opportunity to remind you to register (if you haven't done so already) for the workshop we're holding in the afternoon of the 18th of February - see https://ti.to/torgo/ukgov-standards-camp for registration info.
Forked from #9:
Documentation – We are favouring Swagger 2.0. There is a split between designing first or using annotations in the code. What are others doing?
From @rossjones:
Are you aware that the Swagger 2 spec is now forming the basis for the OpenAPI Initiative? The specification is currently available at https://github.com/OAI/OpenAPI-Specification/blob/master/versions/2.0.md; it might cover more than a few of these items.
The adopted standard for representing a Point Location is to use ETRS89 ( https://www.gov.uk/government/publications/open-standards-for-government/exchange-of-location-point )
However, a number of councils have come back to ask ‘which flavour of ETRS89 to use’, as GIS systems which support an export to ETRS89 typically offer many options.
There are at least 25 different versions of ETRS 1989
• ETRS 1989 DKTM1 to M5;
• Lots which relate to different countries (there are five different ETRS 1989’s for Poland alone!);
• ETRS 1989 UWPP 1992;
• ETRS 1989 UWPP 2000 PAS 5 to PAS 8
Advice from Ordnance Survey is to use EPSG::4258 (or http://www.opengis.net/def/crs/EPSG/0/4258), because that is what is required by the European Commission.
Can the HMG guidance be reviewed and improved with this in mind?
I'm Ben Henley, a tech writer for GaaP
OpenAPI/Swagger is a standard way to describe APIs. It provides a machine-readable description of an API which can be generated by the developers who work on the API. Swagger can be used to generate parts of the documentation, and to create tools like interactive API explorers. There are many tools available which understand Swagger. This makes it easier to maintain accurate documentation and update it quickly. At least two GDS projects already have Swagger descriptions of their APIs.
Developers who use government APIs need accurate documentation. If all projects that produce APIs were required to maintain a Swagger description, it would make it easier to introduce common documentation tools and make documentation more accurate.
Developers would benefit from more accurate API documentation and improved ways to learn about the APIs, like interactive tools. API documentation would be standardised and consistent between projects, reducing the time developers spend finding the information they need. This will increase trust in documentation, reduce time spent on support, and increase the pace of integration. Tech writers will need to spend less time maintaining documentation. A Swagger description may also be required for other API-related tools, like monitoring services, management tools and API gateways.
Teams must be able to produce and maintain Swagger descriptions of their APIs.
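For readers unfamiliar with the format, here is a minimal Swagger 2.0 description, written as a Python dict for brevity; in practice it would live in a swagger.json or swagger.yaml file, and the path and names here are invented for illustration.

```python
# Minimal, illustrative Swagger 2.0 document.
swagger_spec = {
    "swagger": "2.0",
    "info": {"title": "Example API", "version": "1.0.0"},
    "paths": {
        "/things/{id}": {
            "get": {
                "summary": "Fetch one thing by its identifier",
                "parameters": [
                    {"name": "id", "in": "path",
                     "required": True, "type": "string"},
                ],
                "responses": {"200": {"description": "The requested thing"}},
            },
        },
    },
}
```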
I would appreciate any input on this. If a stakeholder requests XML as a supported format for an API, is that reasonable? How many APIs are being developed that include XML? Or is it considered legacy, such that clients should simply update to be able to process JSON?
@philandstuff asks: on a practical level, why might one prefer RFC 3339 or ISO 8601 as a datetime standard? I'm considering adopting RFC 3339 for #registers because it's freely available and simpler.
OAuth 2.0 https://tools.ietf.org/html/rfc6750 is being used across different projects. What guidance do we have about its use?
The Open Standards Board has taken up a question of data standards for job posting. It's been proposed to use the Schema.org schema developed for this purpose. To quote the issue:
It has been adopted as a voluntary standard in the US to aid the building of a 'Veterans Job Bank', and has seen some adoption by vacancy publishers and aggregators.
Without getting in the way of the open standards process, I'm just wondering what additional implementer experience there is on using this data schema standard, or indeed other similar schema standards.
Unicode 11 will be released on 2018-06-11 - http://unicode.org/versions/Unicode11.0.0/
This issue is for board members, and other interested parties, to leave comments on whether the Open Standards Team should update the Cross platform character encoding profile to support this newer revision of the standard.
As per the terms of reference, the board operates by consensus. If the majority of the board agrees with this change, or there are no significant objections from the community, the standard will be updated.
The current standard is for Unicode 6.2 and UTF-8.
Unicode 6.2 was released in 2012. There have been several important changes since then which will be particularly useful for Government.
There is no perceived negative impact to updating this standard. Most software will automatically update to support the newer version of Unicode.
Older software will still be able to read documents which use more recent versions of the standard, but newer characters may render as � (the Unicode Replacement Character).
It is the Open Standards Team's recommendation that this update be adopted.
Status of API to manage Ltd. companies
I'm an activist and open source developer building e-learning to teach underprivileged people JavaScript, with the goal of connecting them to remote JavaScript gigs to kickstart their open source and self-employment careers.
I would like to help people into self-employment, and the UK Ltd. seems to be a perfect vehicle, because it's easy to use and affordable. If there were an API to open, manage and close a Ltd. company, it would be a lot easier to build "business software" that directly connects to and automates bureaucracy, which is the major hurdle for underprivileged people who simply do not have the money, the time, and often the background to cope with the traditional process of managing a Ltd. company.
The fear of trying self-employment, and especially of the bureaucracy that comes with it, is one of the major show-stoppers for many people who might otherwise try their hand as small entrepreneurs. Having an API, and all the necessary support around it, would allow open source solutions to build tools that help a variety of users actually try their luck.
Many refugees who have come to Europe in recent months, and in general unemployed people or people whose backgrounds make them a hard fit for traditional employment, could go into self-employment if there were an open source ecosystem of users and contributors to help pave the way.
It would mitigate all kinds of rational and irrational fears connected with trying one's hand as an entrepreneur.
The major cost currently is filing annual accounts and hiring accountants, especially in the beginning, when income is lacking or self-employment is a little side project. Every pound or euro matters, and an accountant accounts for more than 80% of all the costs that come with opening and managing the bureaucracy connected to a Ltd. company.
A nice, easy-to-use API to open, manage and close a Ltd. company, including documentation on how to use the API. HTTP would be great, but some kind of subscription mechanism like webhooks or WebSockets for real-time updates wouldn't be bad either.
The LGA is leading work to produce a schema for CSV open data on election candidates and results. The main discussion is in the Knowledge Hub thread Election results schema second round consultation Aug-Oct 2016
The consultation overview document (PDF) is http://e-sd.org/dmKwu
The latest version of the schema description is http://e-sd.org/vgTJ3
Comments can go in the above forum or by email to [email protected]. Alternatively, I'll pass on anything posted here.
As a proposer, I want to see which standards have been proposed which have not been adopted. This will help me improve the quality of my submissions.
I am a software engineer with over 25 years' experience.
PGP keys are a recognised open, federated, non-centralised standard for security on the internet. The 'web of trust' ideology relies on people signing each other's keys in order to increase the trust that public keys are associated with the individuals who claim to own them. Key-signing parties are held to facilitate this. However, since the Post Office provides an identity check system for passports and other official documents, it could likewise provide a key-signing service.
There are many needs for authenticated documents (legal contracts and so on), and for secure document exchange. When purchasing my last property there was all sorts of pointless and insecure messing around with a so-called secure document service between myself and the mortgage provider. PGP provides an open, effective and non-centralised way of solving this problem.
People who have PGP keys can have them signed by a trusted public service.
A person should be able to go into the Post Office with a passport, driving licence, or whatever is required for the existing identity check service, and have the Post Office sign their PGP key with a trusted government key. The fee would be the same as for the existing identity check service.
Originally Submitted by pwalsh on Mon, 13/03/2017 on standards.data.gov.uk
Much data published by governments is in common tabular data formats: CSV, Excel, and ODS. This is true for the UK government and governments around the world. To provide assurances around reusability of tabular data, consumers (users) need information on the "primitive types" for each column of data (example: is it a number? is it a date?). This also allows for quality checks to ensure consistency and integrity of the data.
Publishing Table Schema with tabular data sources provides this information. Table Schema has previously been used in work by Open Knowledge International (OKI) with the Cabinet Office to check the validity of 25,000 fiscal data files against publication guidelines. Table Schema is also used widely by other organisations working with public data, such as the Open Data Institute (ODI).
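For readers unfamiliar with the format, a minimal Table Schema for a two-column CSV looks like this (the field names are illustrative):

```python
# Minimal Table Schema describing a two-column CSV.
table_schema = {
    "fields": [
        {"name": "date", "type": "date", "constraints": {"required": True}},
        {"name": "amount", "type": "number"},
    ],
    "primaryKey": "date",
}
```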
I've written several user stories below. Each user story applies equally to a range of users. The user personas are as follows:
User stories
As a user, I want all public data published by government to conform to a known schema, so I can use this information to validate the data.
As a user, I want public data published by government to have a schema, so I can read the schema and understand at a glance the type of information in the data, and the possibilities for reuse.
The functional needs that the proposal must address.
Report a Food Problem Open Standard
Adam Locker, Data Architect at the Food Standards Agency.
The FSA have a service where the public can report food problems, e.g. "I found a gnome in my soup", that sort of thing, providing details of the business, the issue and some limited details about themselves. The FSA then uses this information to work out the appropriate local authority to investigate any problems, and currently hands these off to the LA by email. We're currently in the process of rebuilding this service and we'd like to improve this by using a suitable open standard if possible.
FSA would benefit from more automated transfer of data to local authorities, preferably with the ability to receive or request updates. An open standard would also help LAs share food problem data better between them.
Reusing an open standard is always preferable to creating a new one. Also, we could create a standard, but it isn't really one without wide adoption. Why reinvent the wheel? Can you find us a wheel that works?
Not too restrictive on the fields we can pass to LAs.
There is a common need to stream multiple JSON documents in such a way that you process each document one at a time rather than loading the whole stream into memory. JSON itself is unsuitable for this type of processing.
There is a useful Wikipedia page on JSON Streaming.
There are a number of competing independently-reinvented "standards" in this space:
The Wikipedia page has a summary of existing use of each (of which I'm most familiar with logstash and jq). I don't know if anyone uses RFC 7464, even though it has theoretical advantages: namely, that the RS byte cannot appear anywhere in a JSON byte stream.
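As a sketch of why this matters, newline-delimited JSON (the logstash/jq convention) lets a consumer handle one document at a time without holding the whole stream in memory (the filename below is illustrative):

```python
import json

def read_json_lines(path):
    # Yield one parsed document per non-empty line.
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

for doc in read_json_lines("events.jsonl"):
    ...  # process each document as it arrives
```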
Home Office Not Adopting ODF Internally
The Home Office is not adopting ODF internally, despite published commitments to do so. GDS's open standards process has stopped chasing departments to ensure it happens.
The Home Office, like many other central government departments, published plans to phase in adoption of the mandatory ODF open standard.
Here are the plans: https://www.gov.uk/government/publications/home-office-open-document-format-adoption-plan/home-office-open-document-format-adoption-plan
Adoption of ODF was not just about giving citizens using government digital services a choice over document formats and software (i.e. not imposing and favouring Microsoft) but also about dismantling the internal lock-in. That's why the blogs and published plans were forced to include internal adoption.
The Home Office has made zero attempts to adopt ODF.
There is no senior understanding, nor prioritisation, of this published commitment.
There is no longer any push from GDS / Cabinet Office to ensure departments stick to their plans to phase in adoption of ODF internally.
What are the results of the discovery project initiated here:
Bring in the text from standards.data.gov.uk so that we can have all documentation in one place.
Just a quick update on the adoption of ODF (and other open formats) for publishing on GOV.UK. Attached is a CSV showing how many open-format and closed-format documents have been uploaded over the last two years.
Important notes:
This shows the trends in publishing over the last two years. Each data point is how many attachments were published that month.
This graph shows the top 50 Departments by number of attachments. There is a long-tail which is not included in this image, but which is included in the data. (Click the image for a larger version.)
It is encouraging to see open formats gain in popularity - although there is still some way to go. We are making changes to the publishing process to make it clear that open formats must be published.
User feedback has generally been good - although some users with specific software needs still struggle with ODS.
Here is a CSV of the data:
Open Vs Closed format attachments by organisation and month.csv.zip
When dealing with a range of currently internal JSON interchange formats (@ONSdigital), we're pondering how, or if, to namespace and identify them as we create new data formats. Our current internal examples are survey schemas (how do you define a survey at a structural level, enabling a representation of that survey, be it a web experience, a voice IVR or a mobile app) and responses from the collection of data.
We're considering Java style (uk.gov.ons.<SYSTEM_NAME>.<FILE_TYPE>) or some sort of mime-type style that would allow us to reuse the identifier in APIs to set Accept headers and response headers.
In trying to find prior art, we haven't found much, so we'd be interested in any examples of adding such metadata to JSON files (specifically as some of this data will exist on large filesystems just as much as being served via APIs).
For further context see this documentation PR here: ONSdigital/ons-schema-definitions#1
And a comment on that PR: https://github.com/ONSdigital/edc-documentation/pull/1/files#r55227838
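As a purely hypothetical sketch of the mime-type style (every name below is invented for illustration), the identifier can travel inside the document itself, which helps for files at rest, and double as the media type in Accept and Content-Type headers when served over an API:

```python
import json

MEDIA_TYPE = "application/vnd.ons.survey.v1+json"  # hypothetical media type

# Embed the identifier in the document for files sitting on a filesystem.
doc = {"$type": MEDIA_TYPE, "survey_id": "023", "questions": []}
with open("survey-023.json", "w", encoding="utf-8") as f:
    json.dump(doc, f)
```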
Are there any standards around bulk download and streaming of datasets? I'm thinking about `git clone` and `git pull` style operations for a dataset, so I can download a dataset and then, at a later time, fetch everything that's changed since.
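Purely as a hypothetical sketch (neither the URL shape nor the parameter name comes from any registers specification), the "git pull" half could be a request for all entries appended after the last entry number the client has seen:

```python
import requests

def fetch_changes(base_url: str, since_entry: int) -> list:
    # Ask the server for every entry appended after `since_entry`.
    resp = requests.get(f"{base_url}/entries",
                        params={"since": since_entry}, timeout=10)
    resp.raise_for_status()
    return resp.json()
```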
Schema for data on election candidates and results
Local Government Association - Tim Adams <Tim.Adams.local.gov.uk>
A schema defining the structure of data describing an election, the candidates and the results. Initially the structure should define a CSV spreadsheet format so that data can be easily published and consumed by non-programmers as well as data experts. The data format should be suitable for first-past-the-post elections in which one or more candidates can be elected for an area.
The schema is needed particularly for local government elections for which candidates are declared and results published by a few hundred local authorities in different formats.
Although there is no statutory requirement to do so, local authorities generally publish local and national election results on their web sites once those results have been provided to them by the relevant returning officer. There is no guidance or common practice to publish such data in any particular style, format or web location other than the statutory requirement placed on the returning officer to give public notice of the name(s) of the elected candidate(s) (and the fact that they were duly elected), the total number of votes given to each candidate in a contested election and details of the rejected ballot papers as shown in the statement of rejected ballot papers.
Whilst this approach allows scrutiny and review by individuals who discover the locally published web pages, the work to locate such information automatically on a larger scale, and then to collate data from every local authority to create a national overview, is difficult, labour-intensive, time-consuming and often error-prone. Substantial savings and ease of data discovery and reuse are possible if electoral administration departments can be encouraged to publish their data in a simple, consistent form which can be read by humans and machines. In May 2016, the Government published its revised National Action Plan for Open Government 2016-2018, and Commitment No. 7 proposes a move towards consistent publishing of elections data to facilitate improved citizen engagement, take-up and innovative re-use by analysts and app developers.
A schema that defines in human and machine readable format the structure of data describing an election including:
It is necessary to be able to validate that elections data published:
The CSV schema needs to be documented in a way that unambiguously allows data experts to extract spreadsheet data into a structured database format.
The LGA has consulted stakeholders and developed a draft standard according to the iStandUK process for standards development. These are the reference documents:
Copied from #9:
Naming Standards – adopting camelCase rather than snake_case or hyphens. It would be useful to have a common messaging model so that we call the same elements the same things.
Refs:
Also I notice that Registers are using hyphens. @psd was there some specific thinking behind that?
The IANA definition for TSV is great but somewhat limited. In particular it says:
Note that fields that contain tabs are not allowable in this encoding.
One of the benefits of TSV is that it is sufficiently simple that it can be processed by naive command-line utilities: splitting a TSV line on a tab character is more robust than splitting a CSV line on a comma, because CSV has all sorts of quoting rules.
However, we may sometimes need to represent data that actually contains tabs, newlines, and other such things. Is there a good way of doing this?
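One common approach, used for example by PostgreSQL's COPY text format, is to backslash-escape the problem characters inside fields, so lines can still be split on raw tab characters before unescaping each field; a sketch:

```python
def escape_field(value: str) -> str:
    # Escape backslash first, then tab and newline.
    return (value.replace("\\", "\\\\")
                 .replace("\t", "\\t")
                 .replace("\n", "\\n"))

def unescape_field(value: str) -> str:
    # Scan left to right so escaped backslashes are handled correctly.
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            out.append({"t": "\t", "n": "\n", "\\": "\\"}.get(value[i + 1],
                                                              value[i + 1]))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)

row = "\t".join(escape_field(f) for f in ["a\tb", "line1\nline2"])
assert [unescape_field(f) for f in row.split("\t")] == ["a\tb", "line1\nline2"]
```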