michalporeba / odis

Search in decentralised systems: search federation, result moderation, aggregation and feedback, with hypermedia in a ReSTful API to round it all off.

License: MIT License

Python 30.02% Makefile 0.28% HTML 7.85% Jupyter Notebook 6.27% CSS 55.58%

federated search discoverability mesh-networks data data-discovery information-discovery

odis's Introduction

Open Distributed Information Sharing System (ODIS)

A federated search service with result aggregation, moderation and a feedback mechanism to potentially update the source data.

If you are interested in topics similar to those in this project, I'd be happy to chat. If you think my approach makes no sense, I'd love to hear from you: create an issue and let me know where I went wrong.

Now let me tell you more about the ODIS idea…

The Problem

A much fuller description of the problem that motivated the project is available in the Search for a Better Search paper, but in short...

We have increasing amounts of data all around us, and discoverability is a challenge. This is especially true in systems where fine-grained access controls are necessary across organisational boundaries.

Public crawling and indexing are not possible. Centralisation of data in data warehouses and lakes is currently the go-to solution, but that means data governance has to be centralised too. While this works in some organisations, it poses significant challenges in heterogeneous systems and for searches across multiple organisations.

The increasing amount of data gives rise to a growing number of possible search results.

Why?

To improve data discoverability in systems where centralisation of data is not an option
To improve the search results based on the relative location of the searcher, and the information in the network topology
To find a mechanism to send feedback and suggest data improvements, from the results, back to the data sources
To get ready for a more distributed data landscape

What?

So far there is:

Sample Data Service - to provide test and demo data for the project using a range of formats.
Data Exchange Network Node - The server that can act as a node in the network doing all the federation of requests.

Later there will be:

A 'Standard' that allows for easy searches of data across a distributed network of services, in a range of interactions.
A Demo (or two)

How?

The search query will be federated (distributed) in a mesh search network
The results will be filtered by the source systems
The results will be moderated by the network
The results will be aggregated before presenting them to the user
The feedback will be pushed back through the network to the source systems to suggest updates
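The five steps above could be sketched roughly as follows. This is a minimal, in-memory simulation under several assumptions, not the ODIS design: the `Node` and `Result` classes, the access check used as source-side filtering, the `blocked_sources` moderation rule and the 1 / (1 + hops) weighting (one way to use the searcher's relative location in the topology) are all hypothetical. A real node would forward the query to its peers over HTTP rather than walk the whole mesh itself.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Result:
    title: str
    source: str  # id of the node holding the data
    score: float  # relevance as reported by the source
    path: list[str] = field(default_factory=list)  # node ids from searcher to source


@dataclass
class Node:
    node_id: str
    records: dict[str, str]  # hypothetical local data: title -> text
    peers: list[Node] = field(default_factory=list)
    allowed: set[str] = field(default_factory=set)  # searchers this source may answer
    feedback: list[str] = field(default_factory=list)

    def local_search(self, query: str, searcher: str) -> list[Result]:
        # Step 2: the source filters results itself (here, a simple access check).
        if searcher not in self.allowed:
            return []
        return [
            Result(title, self.node_id, 1.0)
            for title, text in self.records.items()
            if query.lower() in text.lower()
        ]


def federate(origin: Node, query: str, searcher: str, max_hops: int = 3) -> list[Result]:
    """Step 1: breadth-first fan-out over the mesh, asking each node once."""
    results: list[Result] = []
    queue = deque([(origin, [origin.node_id])])
    seen = {origin.node_id}
    while queue:
        node, path = queue.popleft()
        for r in node.local_search(query, searcher):
            results.append(Result(r.title, r.source, r.score, path))
        if len(path) - 1 < max_hops:  # hops travelled so far
            for peer in node.peers:
                if peer.node_id not in seen:
                    seen.add(peer.node_id)
                    queue.append((peer, path + [peer.node_id]))
    return results


def moderate(results: list[Result], blocked_sources: set[str]) -> list[Result]:
    """Step 3: the network drops results it does not trust (hypothetical rule)."""
    return [r for r in results if r.source not in blocked_sources]


def aggregate(results: list[Result]) -> list[Result]:
    """Step 4: de-duplicate by title and rank results from closer nodes higher."""
    best: dict[str, tuple[float, Result]] = {}
    for r in results:
        weighted = r.score / len(r.path)  # len(path) == 1 + hops
        if r.title not in best or weighted > best[r.title][0]:
            best[r.title] = (weighted, r)
    return [r for _, r in sorted(best.values(), key=lambda p: p[0], reverse=True)]


def send_feedback(result: Result, nodes: dict[str, Node], message: str) -> list[str]:
    """Step 5: push feedback to the source; returns the route it travels."""
    nodes[result.source].feedback.append(message)
    return result.path


# Three hypothetical nodes in a line: a <-> b <-> c
a = Node("a", {"HR record": "Jane Doe, 1 Old Street"}, allowed={"jane"})
b = Node("b", {"Tax record": "Jane Doe, 1 Old Street"}, allowed={"jane"})
c = Node("c", {"Pension record": "Jane Doe"}, allowed={"jane"})
a.peers, b.peers, c.peers = [b], [a, c], [b]

hits = aggregate(moderate(federate(a, "jane doe", "jane"), blocked_sources={"c"}))
print([(h.title, h.path) for h in hits])
# [('HR record', ['a']), ('Tax record', ['a', 'b'])]
```

Recording the path on each result is what makes the last step possible: feedback can retrace the route the query took, so every intermediate node gets a chance to log or moderate it.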

Possible Use Cases

Find Me Button - Search for and update your personal information held by any department across the Civil Service.
Product Search - Building the 'bigger picture' of a product through single search without the data warehouse.
Acronym Buster - An oversimplified example of why topology as context can help improve search results.
No Need for Catalogs - Not really a concrete use case, but an idea of what could be possible with a federated search approach in terms of data management and governance.

While I hope the use cases are useful, another way to look at what the search (or data exchange) network can offer is to look at possible interactions within the system.

But distributed systems can do more than just facilitate search. The CRM example illustrates how a hypermedia approach to API design can help in building extensible distributed systems.

Architecture

At the moment the project is in the conceptual design stage. Have a look at the solution's architecture as it develops.

Technology

API: OpenSearch, OpenAPI, REST, HAL, HATEOAS, JSON-LD
Security: OAuth 2.0, OpenID Connect (OIDC), JSON Web Tokens (JWT)
Back-end: Python, Django
Front-end: JavaScript, Design System, React

odis's People

Forkers

robinbetts patryserwelch8

odis's Issues

Federated Search - Is there a protocol I can use?

Question

Is there a RESTful protocol for federated search I could use?
If not, is there a search protocol I could base the API on?

Background

Federated search has been studied for the last two decades. Many solutions have been developed, so perhaps there is no need to design something from scratch. However, the HATEOAS element is important, so it is likely I will have to extend an existing standard.

Open standards are important for this project, which should follow the UK Civil Service guidance on Open Standards.

Standards to consider

OpenSearch - originally developed by Amazon, it appears to be the de facto standard. It is used by Microsoft in Windows and SharePoint search. However, the standard is based on RSS (XML) and hasn't been updated since 2005.
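To make the comparison concrete, here is a sketch of the client side of OpenSearch: filling the URL template that a node advertises in its description document (the `template` attribute of its `<Url>` element). `{searchTerms}`, `{startPage}` and `{count}` are parameter names from the OpenSearch specification, and a trailing `?` marks an optional parameter; the node URL and the `expand_template` helper are hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import quote

# {name} is a required parameter, {name?} an optional one.
_PARAM = re.compile(r"\{([^{}?]+)(\?)?\}")


def expand_template(template: str, values: dict[str, object]) -> str:
    """Fill an OpenSearch URL template with URL-encoded values."""

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters the client has no value for are left empty
        raise KeyError(f"missing required parameter: {name}")

    return _PARAM.sub(substitute, template)


# Hypothetical node; a real template would be read from its description document.
TEMPLATE = "https://node.example.org/search?q={searchTerms}&page={startPage?}&count={count?}"
print(expand_template(TEMPLATE, {"searchTerms": "data discovery", "count": 20}))
# https://node.example.org/search?q=data%20discovery&page=&count=20
```

The template mechanism itself is format-neutral, which is one reason it may be worth reusing even if the XML result format is not.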

Search/Retrieve via URL (SRU) - another XML-based search standard, promoted by the Library of Congress. It uses the Contextual Query Language (CQL). The latest version was published at the beginning of 2013. It was created to replace Z39.50.
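For comparison, an SRU request is a plain URL carrying a CQL query. The sketch below assumes a server that accepts the SRU 1.2 request parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`); the endpoint and the `sru_search_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, start_record: int = 1,
                   maximum_records: int = 10) -> str:
    """Build an SRU searchRetrieve request URL (SRU 1.2 parameter names)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,  # CQL: index, relation, term, joined by boolean operators
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; two CQL clauses combined with "and".
url = sru_search_url("https://sru.example.org/catalogue",
                     'dc.title = "federated search" and dc.date > 2010')
print(url)
```

CQL gives SRU a richer query language than OpenSearch's free-text `{searchTerms}`, at the cost of clients having to know the index names a server supports.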

Schema.org vocabulary can be used with many different standards.

Existing projects

Open Federated Search

References

Admin UX - How to configure the network?

Question

What user experience should be available for network administrators to add new nodes and, perhaps, new data sources?

Background

Is the network configuration (the other nodes to which a given node has access for federating queries) an infrastructure, data or metadata problem?

Should it be possible to manage network nodes using a standard UI, or would it be better to make it a deployment-type activity, which is better suited to automation in deployment pipelines?
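To illustrate the deployment-type option: a node's peers could live in a configuration file under version control, checked by a pipeline step before deployment. The JSON layout, the `Peer` class, the `load_peers` helper and the validation rules (unique ids, no self-reference, HTTPS only) are all assumptions made for this sketch.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    node_id: str
    url: str


def load_peers(config_text: str, own_id: str) -> list[Peer]:
    """Parse a node's peer list and fail fast, so a pipeline can reject bad changes."""
    config = json.loads(config_text)
    peers = [Peer(p["id"], p["url"]) for p in config["peers"]]
    ids = [p.node_id for p in peers]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate peer ids")
    if own_id in ids:
        raise ValueError("a node cannot list itself as a peer")
    for p in peers:
        if not p.url.startswith("https://"):
            raise ValueError(f"peer {p.node_id} must use HTTPS")
    return peers


# Hypothetical configuration for node-a, reviewed like any other code change.
CONFIG = """
{
  "peers": [
    {"id": "node-b", "url": "https://b.example.org/odis"},
    {"id": "node-c", "url": "https://c.example.org/odis"}
  ]
}
"""

for peer in load_peers(CONFIG, own_id="node-a"):
    print(peer.node_id, peer.url)
```

With this approach, adding a node becomes a reviewed pull request rather than a UI action; a UI could still be layered on top by having it generate the same file.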

OpenAPI - Is it applicable?

Question

How is OpenAPI applicable to ODIS idea?
How does OpenAPI fit with hypermedia API design?

Background

OpenAPI is very common in API designs and implementations. It is recommended by the Government Digital Service (GDS).
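One possible meeting point between the two questions is the OpenAPI 3 `links` object, which declares how a value in one response can feed the parameters of another operation. The sketch below describes a hypothetical search API as a Python dictionary; the paths, operation ids and the `$response.body#/results/0/id` runtime expression are illustrative assumptions, not an ODIS specification.

```python
import json

# A minimal OpenAPI 3.0 description of a hypothetical federated search API.
spec = {
    "openapi": "3.0.3",
    "info": {"title": "ODIS search (hypothetical)", "version": "0.1.0"},
    "paths": {
        "/search": {
            "get": {
                "operationId": "search",
                "parameters": [
                    {"name": "q", "in": "query", "required": True,
                     "schema": {"type": "string"}}
                ],
                "responses": {
                    "200": {
                        "description": "A page of federated results",
                        "links": {
                            "GetResult": {
                                "operationId": "getResult",
                                # Runtime expression: the id of the first result in the body.
                                "parameters": {"resultId": "$response.body#/results/0/id"},
                            }
                        },
                    }
                },
            }
        },
        "/results/{resultId}": {
            "get": {
                "operationId": "getResult",
                "parameters": [
                    {"name": "resultId", "in": "path", "required": True,
                     "schema": {"type": "string"}}
                ],
                "responses": {"200": {"description": "A single result"}},
            }
        },
    },
}

print(json.dumps(spec, indent=2))
```

The difference matters for this project: OpenAPI links are fixed at design time in the description, whereas HATEOAS puts the links into each response at runtime. The two can coexist, with OpenAPI documenting the shape of the API and hypermedia driving the interaction.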

HATEOAS - what are the options for implementation?

Questions

Which HATEOAS standard should I choose?
Is JSON the only way?

Background

Hypermedia as the Engine of Application State (HATEOAS) is the architectural style I want to use to allow data in search to flow both ways. I also want to use it to create better software and a better developer experience, to help with adoption.

"If we can create software that people can learn as if they are solving puzzles, they will learn it faster, they will enjoy working with it, they will become more productive. I think that sounds pretty good." Dylan Beattie in his talk Life, Liberty and the Pursuit of APIness: The Secret to Happy Code

In fact, Dylan Beattie has given several talks about developer (user) experience, APIs and hypermedia. The Rest of ReST inspired a lot of my thinking on the subject.

"REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. [...] Many of the constraints are directly opposed to short-term efficiency." Roy Thomas Fielding - Architectural Styles and the Design of Network-based Software Architectures.

In his 2000 dissertation, Roy Fielding defined six constraints of a RESTful system and included HATEOAS in the "Uniform Interface" constraint. I think it is time to get closer to the original idea of REST.

Shortlist for review

HATEOAS is just an architectural style. There are multiple standards to consider.

JSON-LD (Specification) - read only - works with schema.org
HYDRA - JSON-LD + interactions
JSON:API
Collection+JSON
SIREN
HAL
HTML is a hypermedia format too
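As an example of one of the shortlisted formats, this sketch wraps a page of search results in HAL, which reserves the `_links` and `_embedded` properties and uses the `application/hal+json` media type. The `/search` and `/results/...` URLs, the `hal_search_page` helper and the `feedback` link relation are hypothetical; the `feedback` link hints at how a client could discover where to send suggested updates without hard-coding URLs.

```python
import json
from urllib.parse import urlencode


def hal_search_page(query, results, page, page_size, total):
    """Wrap one page of search results in a HAL document."""

    def search_href(p):
        return "/search?" + urlencode({"q": query, "page": p})

    links = {"self": {"href": search_href(page)}}
    if page > 1:
        links["prev"] = {"href": search_href(page - 1)}
    if page * page_size < total:
        links["next"] = {"href": search_href(page + 1)}

    return {
        "_links": links,
        "total": total,
        "_embedded": {
            "results": [
                {
                    "_links": {
                        "self": {"href": f"/results/{r['id']}"},
                        # Hypothetical relation telling the client where feedback goes.
                        "feedback": {"href": f"/results/{r['id']}/feedback"},
                    },
                    "title": r["title"],
                }
                for r in results
            ]
        },
    }


doc = hal_search_page("acronym", [{"id": "42", "title": "Acronym Buster"}],
                      page=1, page_size=1, total=3)
print(json.dumps(doc, indent=2))
```

A client only needs to know the relation names (`next`, `feedback`); the server stays free to change its URL structure.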

It appears that 'modern' HATEOAS is almost exclusively associated with JSON, even though there is no requirement for it, and early examples used XML a lot. This might be a problem, as OpenSearch, the standard for federated search, is XML-based.
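One way to live with the mismatch would be for a node to accept OpenSearch results in Atom (XML) and re-publish them as JSON. The namespaces below are the real Atom and OpenSearch 1.1 ones, and `totalResults` is an OpenSearch response element; the sample feed, the `atom_to_hal` helper and the HAL-style output shape are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}


def atom_to_hal(xml_text):
    """Convert an OpenSearch Atom response into a HAL-style dictionary."""
    feed = ET.fromstring(xml_text)
    results = []
    for entry in feed.findall("atom:entry", NS):
        link = entry.find("atom:link", NS)
        results.append({
            "_links": {"self": {"href": link.get("href")}} if link is not None else {},
            "title": entry.findtext("atom:title", default="", namespaces=NS),
        })
    total = feed.findtext("opensearch:totalResults", default="0", namespaces=NS)
    return {"total": int(total), "_embedded": {"results": results}}


# Hypothetical response from a peer node.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Results</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <entry><title>ODIS</title><link href="https://example.org/odis"/></entry>
  <entry><title>Open Distributed Information Sharing</title></entry>
</feed>"""

print(atom_to_hal(FEED))
```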

Looking at Google Trends, the three most popular are JSON:API, HYDRA and HAL.

References

Richardson Maturity Model (for APIs) by Martin Fowler
Paper on Semantic ReSTful API comparison from International Conference On Web Engineering
Hypermedia message examples on github
ripozo - a Python library to help create multi-format hypermedia APIs

GraphQL - is it an option?

Question

Should I use GraphQL for the Search API?

Background

The Central Digital and Data Office recently published guidance on using GraphQL for APIs in the Civil Service.

It recommends using GraphQL when it is important to

minimise bandwidth use
use multiple data sources in a single query
have applications with different data needs (mobile, website, etc)
explore what data is needed (in discovery phase)
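To illustrate the first two points: a GraphQL client names exactly the fields it needs, and a single query can span several data sources behind one endpoint. The sketch builds (but does not send) a request in the common GraphQL-over-HTTP style, a JSON POST body with `query` and `variables`; the endpoint, the schema (`person`, `taxRecord`, `pension`) and the `graphql_request` helper are hypothetical.

```python
import json
import urllib.request

# Hypothetical schema: person, tax and pension data resolved from different sources.
QUERY = """
query PersonOverview($id: ID!) {
  person(id: $id) {
    name
    taxRecord { address }
    pension { scheme }
  }
}
"""


def graphql_request(endpoint, query, variables):
    """Build a GraphQL POST request with a JSON body."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = graphql_request("https://api.example.org/graphql", QUERY, {"id": "123"})
# urllib.request.urlopen(req) would send it; not done here as the endpoint is made up.
```

The same flexibility is what raises the security concern below: the server has to decide, field by field, what each caller may see.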

However, it raises security concerns: GraphQL is about making as much data as possible available.

References

https://stackoverflow.com/questions/46061755/hateoas-vs-graphql-decision-criteria-set-for-microservices
