michalporeba / odis

Search in decentralised systems. Search federation, result moderation, aggregation and feedback, with hypermedia in a ReSTful API to round it all off.

License: MIT License

Python 30.02% Makefile 0.28% HTML 7.85% Jupyter Notebook 6.27% CSS 55.58%
federated search discoverability mesh-networks data data-discovery information-discovery

odis's Introduction

Open Distributed Information Sharing System (ODIS)

A federated search service with result aggregation, moderation and a feedback mechanism that can suggest updates to the source data.

If you are interested in topics similar to those explored in the project, I'd be happy to chat. If you think my approach makes no sense, I'd love to hear from you - create an issue and let me know where I went wrong.

Now let me tell you more about the ODIS idea…

 

The Problem

A much fuller description of the problem that motivated the project is available in the Search for a Better Search paper, but in short...

We have increasing amounts of data all around us. Discoverability is a challenge. It is especially true in systems where fine-grained access controls are necessary across organisational boundaries.

Public crawling and indexing are not possible. Centralisation of data in data warehouses and lakes is currently the go-to solution. But that means the data governance has to be centralised too. While it works in some organisations, it poses significant challenges in heterogeneous systems and in searches across multiple organisations.

The increasing amount of data gives rise to a growing number of possible search results.

Why?

  • To improve data discoverability in systems where centralisation of data is not an option
  • To improve the search results based on the relative location of the searcher, and the information in the network topology
  • To find a mechanism to send feedback and suggest data improvements, from the results, back to the data sources
  • To get ready for a more distributed data landscape

What?

So far there is:

Later there will be:

  • A 'Standard' that allows for easy searches in data across a distributed network of services in a range of interactions.
  • A Demo (or two)

How?

  • The search query will be federated (distributed) in a mesh search network
  • The results will be filtered by the source systems
  • The results will be moderated by the network
  • The results will be aggregated before presenting them to the user
  • The feedback will be pushed back through the network to the source systems to suggest updates
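
The first four steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the ODIS design: the `Node`, `Result`, `moderate` and `aggregate` names are hypothetical, matching is a plain substring test, and the feedback step is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Result:
    source: str    # name of the node that holds the record
    title: str
    hops: int = 0  # number of network hops between source and searcher


@dataclass
class Node:
    name: str
    records: list[str] = field(default_factory=list)
    peers: list["Node"] = field(default_factory=list)

    def moderate(self, result: Result) -> bool:
        # Placeholder policy (assumption): every relayed result is accepted.
        return True

    def search(self, query: str, seen: Optional[set[str]] = None) -> list[Result]:
        seen = set() if seen is None else seen
        if self.name in seen:  # the mesh may contain cycles
            return []
        seen.add(self.name)

        # The source system filters its own data.
        results = [Result(self.name, record) for record in self.records
                   if query.lower() in record.lower()]

        # The query is federated to peers; relayed results are moderated.
        for peer in self.peers:
            for result in peer.search(query, seen):
                result.hops += 1
                if self.moderate(result):
                    results.append(result)
        return results


def aggregate(results: list[Result]) -> list[Result]:
    # Collapse duplicate titles, keeping the copy closest to the searcher.
    closest: dict[str, Result] = {}
    for result in results:
        if result.title not in closest or result.hops < closest[result.title].hops:
            closest[result.title] = result
    return sorted(closest.values(), key=lambda r: (r.hops, r.title))
```

For example, with three nodes `a`, `b` and `c` where `a` federates to `b` and `b` federates to `c`, calling `aggregate(a.search("open"))` returns one entry per distinct title, ordered by how far each result travelled.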

 

Possible Use Cases

  • Find Me Button - Search for and update your personal information held by any department across the Civil Service.
  • Product Search - Building the 'bigger picture' of a product through single search without the data warehouse.
  • Acronym Buster - An oversimplified example of why topology as context can help improve search results.
  • No Need for Catalogs - Not really a concrete use case, but an idea of what could be possible with a federated search approach in terms of data management and governance.
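
To make the Acronym Buster idea more concrete, here is a minimal sketch of ranking by network proximity. Everything in it is an assumption made for illustration: the topology is a plain adjacency dictionary, each definition is a `(node, acronym, meaning)` tuple, and "closeness" is simply the hop count found by a breadth-first search.

```python
from collections import deque


def hop_distances(topology: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search returning the hop count from start to each reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in topology.get(node, []):
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def rank_by_proximity(definitions, topology, searcher):
    """Order definitions so those held by nodes nearest the searcher come first."""
    distances = hop_distances(topology, searcher)
    reachable = [d for d in definitions if d[0] in distances]
    return sorted(reachable, key=lambda d: distances[d[0]])


# Hypothetical network: three departments connected through a hub.
topology = {
    "finance": ["hub"],
    "it": ["hub"],
    "hr": ["hub"],
    "hub": ["finance", "it", "hr"],
}
definitions = [
    ("it", "PO", "Product Owner"),
    ("finance", "PO", "Purchase Order"),
]

from_finance = rank_by_proximity(definitions, topology, "finance")
from_it = rank_by_proximity(definitions, topology, "it")
```

The same query for "PO" is ordered differently depending on where the searcher sits: `from_finance` starts with the Purchase Order definition, while `from_it` starts with Product Owner.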

While I hope the use cases are of use, another way to look at what the search (or data exchange) network can offer is to look at possible interactions within the system.

But distributed systems can do more than just facilitate a search. The CRM example illustrates how a hypermedia approach to API design can help in building extensible distributed systems.

 

Architecture

At the moment the project is in a conceptual design stage. Have a look at the solution's architecture as it is developed.

 

Technology

odis's People

Contributors

dependabot[bot], michalporeba, robinbetts

odis's Issues

Federated Search - Is there a protocol I can use?

Question

  • Is there a RESTful protocol for Federated Search I could use?
  • If not, is there a search protocol I could base the API on?

Background

Federated search has been studied for the last two decades. Many solutions have been developed, and perhaps there is no need to design something from scratch. However, the HATEOAS element is important, so it is likely I will have to extend an existing standard.

Open standards are important for this project because it aims to follow the UK Civil Service guidance on Open Standards.

Standards to consider

OpenSearch - originally developed by Amazon, appears to be the de facto standard. It is used by Microsoft in Windows and SharePoint search. But the standard is based on RSS (XML) and hasn't been updated since 2005.

Search/Retrieve via URL (SRU) is another XML-based search standard, promoted by the Library of Congress. It uses Contextual Query Language (CQL). The latest version was published at the beginning of 2013. It was created to replace Z39.50.
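
As a rough illustration of what an SRU request looks like, the sketch below builds a `searchRetrieve` URL carrying a CQL query. The parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) follow SRU 1.2; the base URL and the `build_sru_url` helper are hypothetical and not part of ODIS.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str,
                  max_records: int = 10, start_record: int = 1) -> str:
    """Build an SRU 1.2 searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,            # CQL, e.g. dc.title = "federated search"
        "startRecord": start_record,   # 1-based position of the first record
        "maximumRecords": max_records,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real SRU server would answer with an XML document.
url = build_sru_url("https://sru.example.org/search",
                    'dc.title = "federated search"')
```

The response format is the part that would need extending for ODIS, since plain SRU has no notion of hypermedia controls on individual results.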

Schema.org vocabulary can be used with many different standards.

Existing projects

Open Federated Search

Admin UX - How to configure the network?

Question

What user experience should be available for network administrators to add new nodes, and perhaps new data sources?

Background

Is the network configuration (the set of other nodes to which a given node can federate its queries) an infrastructure, data or metadata problem?

Should it be possible to manage network nodes using a standard UI, or would it be better to make it a deployment-type activity, which is better suited for automation in deployment pipelines?

HATEOAS - what are the options for implementation?

Questions

  • Which HATEOAS standard or implementation should I choose?
  • Is JSON the only way?

Background

Hypermedia as the Engine of Application State (HATEOAS) is the style of architecture I want to use so that the data in a search can flow both ways. I also want to use it to create better software and a better developer experience, which should help with adoption.

"If we can create software that people can learn as if they are solving puzzles, they will learn it faster, they will enjoy working with it, they will become more productive. I think that sounds pretty good." Dylan Beattie in his talk Life, Liberty and the Pursuit of APIness: The Secret to Happy Code

In fact, Dylan Beattie has a few talks that cover developer (user) experience, APIs and hypermedia. The Rest of ReST inspired a lot of my thinking on the subject.

"REST is software design on the scale of decades: every detail is intended to promote software longevity and independent evolution. [...] Many of the constraints are directly opposed to short-term efficiency." Roy Thomas Fielding - Architectural Styles and the Design of Network-based Software Architectures.

In his dissertation in 2000, Roy Fielding defined six constraints of a RESTful system and included HATEOAS in the "Uniform Interface" constraint. I think it is time to get closer to the original idea of REST.

Shortlist for review

HATEOAS is just an architectural style. There are multiple standards to consider.

It appears that 'modern' HATEOAS is almost exclusively associated with JSON, even though there is no requirement for it and early examples used XML a lot. It might be a problem as OpenSearch - the standard for federated search - is XML based.

Looking at Google Trends, the three most popular are JSON:API, Hydra and HAL.
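
To give a feel for what a hypermedia search response could look like, here is a minimal sketch using HAL conventions (`_links`, `_embedded`, link objects with `href`). The URL layout, the `hal_search_response` helper and the `feedback` link relation are all assumptions made for illustration; nothing here is a decided ODIS format.

```python
import json
from urllib.parse import quote


def hal_search_response(query: str, items: list[dict]) -> dict:
    """Wrap search results in a HAL-style document with per-result links."""
    return {
        "_links": {
            "self": {"href": f"/search?q={quote(query)}"},
        },
        "query": query,
        "total": len(items),
        "_embedded": {
            "results": [
                {
                    "_links": {
                        "self": {"href": f"/results/{item['id']}"},
                        # Hypothetical relation: where a client could send
                        # suggested corrections for this result.
                        "feedback": {"href": f"/results/{item['id']}/feedback"},
                    },
                    "title": item["title"],
                }
                for item in items
            ]
        },
    }


response = hal_search_response("open data", [{"id": "42", "title": "Open Data"}])
print(json.dumps(response, indent=2))
```

The point of the `feedback` link is that a client does not need to know in advance how to suggest an update: it follows the link that comes with each result, which is how the data could flow both ways.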

GraphQL - is it an option?

Question

Should I use GraphQL for the Search API?

Background

The Central Digital and Data Office recently published guidance on using GraphQL for APIs in the Civil Service.

It recommends using GraphQL when it is important to:

  • minimise bandwidth use
  • use multiple data sources in a single query
  • have applications with different data needs (mobile, website, etc)
  • explore what data is needed (in discovery phase)

However, it raises security concerns. GraphQL is about making as much data as possible available.
