bluebrain / nexus-forge

Building and Using Knowledge Graphs made easy

Home Page: https://nexus-forge.readthedocs.io

License: GNU Lesser General Public License v3.0

Python 98.56% Jupyter Notebook 0.31% Gherkin 1.13%

knowledge-graph data-management data-science shacl rdf knowledge-engineering json-ld knowledgegraph

nexus-forge's Introduction

Blue Brain Nexus Forge

Blue Brain Nexus Forge is a domain-agnostic, generic and extensible Python framework enabling non-expert users to create and manage knowledge graphs by making it easy to:

Discover and reuse available knowledge resources such as ontologies and schemas to shape, constrain, link and add semantics to datasets.
Build knowledge graphs from datasets generated from heterogeneous sources and formats, by defining, executing and sharing data mappers that transform data from a source format to a target one conformant to schemas and ontologies.
Interface with various stores offering knowledge graph storage, management and scaling capabilities, for example Nexus Core store or in-memory store.
Validate and register data and metadata.
Search and download data and metadata from a knowledge graph.

Getting Started

The examples directory contains many Jupyter notebooks to get started with Blue Brain Nexus Forge user features and usage scenarios.

You can run the Getting Started notebooks on Binder.

For local execution, make sure that the jupyter notebook|lab is launched in the same virtual environment where Blue Brain Nexus Forge is installed. Alternatively, set up a specialized kernel.

In both cases, please start with the notebook named 00 - Initialization. It contains instructions for configuring the Forge with:

an example in-memory store and an example schema language,
Blue Brain Nexus as store and W3C SHACL as schema language.

Afterwards, it is recommended to run the notebooks in numerical order (01, 02, ...).

Installation

It is recommended to use a virtual environment such as venv or conda environment.

Stable version

pip install nexusforge

Upgrade to the latest version

pip install --upgrade nexusforge

Development version

pip install git+https://github.com/BlueBrain/nexus-forge

Funding and Acknowledgements

The development of this software was supported by funding to the Blue Brain Project, a research center of the École polytechnique fédérale de Lausanne (EPFL), from the Swiss government's ETH Board of the Swiss Federal Institutes of Technology, and from the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 785907 (Human Brain Project SGA2).

nexus-forge's Issues

How does Nexus Forge handle linked resources on creation/update?

I have a question regarding the representation of linked resources in JSON-LD on creation/update.

Example:

The type Person can link to an Organization via the affiliation property.

org_mapping = DictionaryMapping("""
    type: Organization
    name: x['name']
    email: x['email']
""")

test_org = { 'name': 'my firm', 'email': '[email protected]'}

org_mapped = forge.map(test_org, org_mapping)
forge.register(org_mapped)

_register_one
True

person_mapping = DictionaryMapping("""
    type: Person
    familyName: x['fname']
    givenName: x['gname']
    identifier: x['orcid']
    name: x['fname']
""")

test_person = { 'fname': 'Duck', 'gname': 'Donald', 'orcid': 'https://orcid.org/0000-0001-2345-6789'}

person_mapped = forge.map(test_person, person_mapping, na='nan')
person_mapped.affiliation = org_mapped

forge.register(person_mapped)

_register_one
True

And then in Nexus I see the following data:

{
  "@context": {...},
  "@type": "Person",
  "affiliation": {
    "@id": "https://myserver/v1/resources/org/proj/_/c6e07950-1cbc-4022-bd40-d2d1771e0e0b",
    "@type": "Organization",
    "email": "[email protected]",
    "name": "my firm"
  },
  "familyName": "Duck",
  "givenName": "Donald",
  "identifier": "https://orcid.org/0000-0001-2345-6789",
  "name": "Duck"
}

and

{
  "@context": {...},
  "@type": "Organization",
  "email": "[email protected]",
  "name": "my firm"
}

In Nexus, there are now two instances: a Person linking to an Organization. But the Organization is also embedded inside the JSON-LD representing the Person.

Is Organization stored redundantly in Nexus? What happens if the Organization resource is mutated (consistency of the data embedded in Person)?

Does the compacted JSON-LD correspond to the original payload sent to the Nexus Delta API (I noted that the @id is missing: I assume this is because the @id was created by the system and is thus not part of the payload)?
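
If only a link is wanted rather than an embedded copy, one option is to assign a stub holding just the identifier and type of the registered Organization. The sketch below is an illustration on plain dictionaries, not Forge's own behaviour; as_reference is a hypothetical helper name.

```python
def as_reference(resource: dict) -> dict:
    """Return a stub that only links to the resource (hypothetical helper)."""
    return {key: resource[key] for key in ("@id", "@type") if key in resource}


# Payloads modelled on the example above, reduced to a few properties.
organization = {
    "@id": "https://myserver/v1/resources/org/proj/_/c6e07950-1cbc-4022-bd40-d2d1771e0e0b",
    "@type": "Organization",
    "name": "my firm",
}
person = {"@type": "Person", "familyName": "Duck", "givenName": "Donald"}

# Link to the Organization without copying its properties into the Person.
person["affiliation"] = as_reference(organization)
```

With this shape, a later change to the Organization's name cannot leave a stale copy inside the Person payload.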

Fix typo in service.py file: self.resolve_context

This line

nexus-forge/kgforge/specializations/stores/nexus/service.py

Line 189 in ada9092

context_resolver=self.resolve_contextn, na=nan)

contains a typo: self.resolve_contextn instead of self.resolve_context

'PathsWrapper' object has no attribute '_path' issue

When forge.template("Dataset") outputs:

type: { id: "" }
then using:

resources = forge.search(p.type == "Dataset", limit=5)

triggers a:

'PathsWrapper' object has no attribute '_path' issue

Nexus store specialization: forge.attach() method fails for files larger than 2 GB

The dataset was 13.6 GB.

When asked about this, the backend team mentioned it might be a Forge issue.

Incorrect URL encoding

When fetching a resource, Nexus Forge splits HTTP URLs into parts prior to the URL encoding.
In the current implementation, a fragment following a # is not taken into account for the encoding.
As a consequence, a resource with the following @id can't be retrieved: http://purl.obolibrary.org/obo/NCBITaxon#_taxonomic_rank
The following error is thrown:
<action> retrieve <error> RetrievalError: 404 Client Error: Not Found for url: https://.../_/http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon

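A minimal sketch of encoding the identifier as a whole, so that the fragment survives, using only the standard library. The function name encode_resource_id is an assumption for illustration and is not part of Forge.

```python
from urllib.parse import quote


def encode_resource_id(resource_id: str) -> str:
    """Percent-encode a full identifier, including any '#fragment' part."""
    # safe="" makes quote() encode ':' and '/' as well; '#' is never in the
    # always-safe set, so it becomes %23 instead of being dropped.
    return quote(resource_id, safe="")


encoded = encode_resource_id("http://purl.obolibrary.org/obo/NCBITaxon#_taxonomic_rank")
print(encoded)
```

The encoded value ends with NCBITaxon%23_taxonomic_rank, whereas the failing URL above stops at NCBITaxon.
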
Retrieving resources when accessing Resource.<attribute>

Following a discussion on Slack with @pafonta and @MFSY :

Currently, with some of the linked resources, the only way to navigate to a wanted resource (due to how they are registered on Nexus) is the following:

"""
a: instance of kgforge.core.Resource
path: instance of list
forge: instance of kgforge.core.KnowledgeGraphForge 
"""

r = a
path = ['b', 'c', 'd', 'e', 'f']
for x in path:
    rid = r.get(x).id
    r = forge.retrieve(rid)

I wanted to open up a discussion about being able to traverse the path in these cases without the user having to call forge.retrieve at each step, i.e. the user would be able to do:

"""
a: instance of kgforge.core.Resource
"""

f = a.b.c.d.e.f

Whether or not it is feasible, risky, useful, or if it should or should not be implemented, is up for discussion.
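
To make the discussion concrete, here is a rough sketch of how such a traversal could look, assuming a wrapper class and a toy in-memory store. LazyResource and the store dictionary are hypothetical; in practice the retrieve callable would be something like forge.retrieve, and resources would not be plain dictionaries.

```python
from typing import Callable


class LazyResource:
    """Hypothetical wrapper that retrieves linked resources on attribute access."""

    def __init__(self, data: dict, retrieve: Callable[[str], dict]) -> None:
        self._data = data
        self._retrieve = retrieve

    def __getattr__(self, name: str):
        # Only reached for names that are not regular attributes,
        # i.e. the properties of the wrapped resource.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(value, dict) and "id" in value:
            # The property is a link: fetch the full resource behind it.
            return LazyResource(self._retrieve(value["id"]), self._retrieve)
        return value


# Toy in-memory "store" standing in for the real retrieval call.
store = {
    "b1": {"id": "b1", "c": {"id": "c1"}},
    "c1": {"id": "c1", "label": "leaf"},
}
a = LazyResource({"id": "a1", "b": {"id": "b1"}}, store.__getitem__)
label = a.b.c.label
```

One risk this makes visible: every dot on a linked property triggers a retrieval, so a long chain hides several round trips to the store behind ordinary-looking attribute access.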

Resolving terms with nexus-resolver implementation: Properties missing on returned resources

With the current implementation of the OntologyResolver for nexus-store (https://github.com/BlueBrain/nexus-forge/blob/master/kgforge/specializations/resolvers/ontology_resolver.py), the properties isDefinedBy and subClassOf are only returned when the properties notation and prefLabel are present on the Class.

However, prefLabel and notation are optional. Even when absent, isDefinedBy and subClassOf should be returned if contained in the term-to-resource mapping (https://github.com/BlueBrain/nexus-forge/blob/master/examples/configurations/nexus-resolver/term-to-resource-mapping.hjson).

Implement the option to target a specific ontology when resolving terms

With the current implementation of the OntologyResolver for nexus-store (https://github.com/BlueBrain/nexus-forge/blob/master/kgforge/specializations/resolvers/ontology_resolver.py), one can resolve terms from a given store (i.e. by specifying the bucket).

Since multiple ontologies may be contained within a single bucket, it would be great to be able to target a specific ontology (e.g. by its identifier or label) within a given bucket. This could be implemented by leveraging the isDefinedBy property present on the classes of the ontology.

AttributeError: 'list' object has no attribute 'items' Error when using an array context

When running:

resources = forge.search(p.type.id == "Person", limit=5)

I got the following error:

rewrite_sparql
AttributeError: 'list' object has no attribute 'items'

Looks like the forge.store.rewrite_sparql() function is expecting the forge._store.model_context.document to be a dict and not an array.

In my case, forge._store.model_context.document output is:

{'@context': [
   {
     '@vocab': 'https://neuroshapes.org/',
      'Person': {'@id': 'schema:Person'},
     'AcquisitionAnnotation': {'@id': 'nsg:AcquisitionAnnotation'},
     'Activity': {'@id': 'prov:Activity'},
     'AffineLinearTransform': {'@id': 'nsg:AffineLinearTransform'},
     'Agent': {'@id': 'prov:Agent'},
     'Analysis': {'@id': 'nsg:Analysis'},
     'AnalysisConfiguration': {'@id': 'nsg:AnalysisConfiguration',
      '@type': '@id'},
     'AnalysisReport': {'@id': 'nsg:AnalysisReport', '@type': '@id'},
     'AnalysisResult': {'@id': 'nsg:AnalysisResult'},
     'AnnotatedSlice': {'@id': 'nsg:AnnotatedSlice'},
     'Annotation': {'@id': 'nsg:Annotation'},
     'AnnotationBody': {'@id': 'nsg:AnnotationBody'},
     'ApicalAnnotation': {'@id': 'nsg:ApicalAnnotation'}
    },
   'https://bluebrain.github.io/nexus/contexts/resource.json'
]
}
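
A possible way to tolerate both shapes is to normalise the @context value before iterating over its terms. The sketch below is an assumption about what a fix could look like, not the actual rewrite_sparql code; merge_context_terms is a hypothetical name and remote contexts are deliberately not resolved.

```python
def merge_context_terms(document: dict) -> dict:
    """Collect the term definitions of a JSON-LD context given as dict or list."""
    context = document["@context"]
    entries = context if isinstance(context, list) else [context]
    merged = {}
    for entry in entries:
        if isinstance(entry, dict):
            merged.update(entry)
        # String entries are IRIs of remote contexts; resolving them is
        # out of scope for this sketch, so they are skipped.
    return merged


document = {
    "@context": [
        {"@vocab": "https://neuroshapes.org/", "Person": {"@id": "schema:Person"}},
        "https://bluebrain.github.io/nexus/contexts/resource.json",
    ]
}
terms = merge_context_terms(document)
for term, definition in terms.items():  # .items() is now safe to call
    print(term, definition)
```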

pip install nexusforge not working

$ pip install nexusforge

ERROR: Could not find a version that satisfies the requirement nexusforge (from versions: none)
ERROR: No matching distribution found for nexusforge

How should nexusforge be installed in a conda environment (python 3.6)?

Error When Using forge.search / forge.sparql

I encountered a problem when using forge.search / forge.sparql. The request is rejected and the following errors are returned:
HTTPError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql and QueryingError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql.

The URL seems valid: it consists of the server URL, the projects, and contains the endpoint config from the yml file (url encoded). From the admin interface, I am able to run the queries without any problem and they are sent to the same endpoint.

Example: forge.sparql("SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 20")

---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/opt/tljh/user/lib/python3.7/site-packages/kgforge/specializations/stores/bluebrain_nexus.py in _sparql(self, query, limit, offset)
    403                 self.service.sparql_endpoint, data=query, headers=self.service.headers_sparql)
--> 404             response.raise_for_status()
    405         except Exception as e:

/opt/tljh/user/lib/python3.7/site-packages/requests/models.py in raise_for_status(self)
    939         if http_error_msg:
--> 940             raise HTTPError(http_error_msg, response=self)
    941 

HTTPError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql

During handling of the above exception, another exception occurred:

QueryingError                             Traceback (most recent call last)
<ipython-input-38-25a4c77e3117> in <module>
----> 1 forge.sparql("SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 20")

/opt/tljh/user/lib/python3.7/site-packages/kgforge/core/commons/execution.py in wrapper(*args, **kwargs)
     62 
     63         try:
---> 64             return fun(*args, **kwargs)
     65         except Exception as e:
     66             stack = traceback.extract_stack()

/opt/tljh/user/lib/python3.7/site-packages/kgforge/core/forge.py in sparql(self, query, debug, limit, offset)
    336     def sparql(self, query: str, debug: bool = False, limit: int = 100,
    337                offset: Optional[int] = None) -> List[Resource]:
--> 338         return self._store.sparql(query, debug, limit, offset)
    339 
    340     @catch

/opt/tljh/user/lib/python3.7/site-packages/kgforge/core/archetypes/store.py in sparql(self, query, debug, limit, offset)
    254             print(*["Submitted query:", *qr.splitlines()], sep="\n   ")
    255             print()
--> 256         return self._sparql(qr, limit, offset)
    257 
    258     def _sparql(self, query: str, limit: int, offset: int) -> List[Resource]:

/opt/tljh/user/lib/python3.7/site-packages/kgforge/specializations/stores/bluebrain_nexus.py in _sparql(self, query, limit, offset)
    404             response.raise_for_status()
    405         except Exception as e:
--> 406             raise QueryingError(e)
    407         else:
    408             data = response.json()

QueryingError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql

Version used: nexusforge 0.6.2

I'd be grateful for any hint. Thanks a lot!

forge.sparql(): Allow `LIMIT` on the query itself

When using forge.sparql(), one cannot provide the LIMIT in the query string itself without setting the limit argument on the method to None.

While this throws a HTTPError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql and QueryingError: 400 Client Error: Bad Request for url: https://***defaultSparqlIndex/sparql error:

forge.sparql("SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 20")

the following work:

forge.sparql("SELECT ?s ?p ?o WHERE {?s ?p ?o}", limit=20)
forge.sparql("SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 20", limit=None)

A user should be allowed to provide the LIMIT in the query string itself without having to set the limit argument to None.

The limit default value is set here:

nexus-forge/kgforge/core/forge.py

Line 347 in a65d4e3

def sparql(self, query: str, debug: bool = False, limit: int = 100,

The query string is being built here:

nexus-forge/kgforge/specializations/stores/bluebrain_nexus.py

Line 453 in 9287baa

query = f"{query} {s_limit} {s_offset}"
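
One way the requested behaviour could be sketched is to append the default LIMIT only when the query does not already carry one. The helper name with_limit and the regular expression are assumptions; the check is naive and would also match a LIMIT inside a subquery or a string literal.

```python
import re
from typing import Optional

# Naive detection of an existing LIMIT clause (case-insensitive).
_LIMIT = re.compile(r"\bLIMIT\s+\d+", re.IGNORECASE)


def with_limit(query: str, limit: Optional[int] = 100) -> str:
    """Append 'LIMIT n' unless limit is None or the query already has one."""
    if limit is None or _LIMIT.search(query):
        return query
    return f"{query} LIMIT {limit}"


explicit = with_limit("SELECT ?s ?p ?o WHERE {?s ?p ?o} LIMIT 20")
defaulted = with_limit("SELECT ?s ?p ?o WHERE {?s ?p ?o}", limit=20)
```

Here explicit is returned unchanged, so the query is no longer sent with two LIMIT clauses, which is the presumed cause of the 400 response.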

File retrieve support

As a user, I want to be able to retrieve nxv:File resources directly by resource_id, e.g. to be able to tag them.

Currently, trying to retrieve a nxv:File resource by resource_id gives the following stack trace:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-92-1d8dd79def0f> in <module>
----> 1 forge.retrieve("https://bbp.epfl.ch/neurosciencegraph/data/d2f04f98-155a-4a4f-80ee-48eb66b87939")

~/miniconda3/envs/kgforge/lib/python3.7/site-packages/kgforge/core/commons/execution.py in wrapper(*args, **kwargs)
     62 
     63         try:
---> 64             return fun(*args, **kwargs)
     65         except Exception as e:
     66             stack = traceback.extract_stack()

~/miniconda3/envs/kgforge/lib/python3.7/site-packages/kgforge/core/forge.py in retrieve(self, id, version, cross_bucket)
    307     def retrieve(self, id: str, version: Optional[Union[int, str]] = None,
    308                  cross_bucket: bool = False) -> Resource:
--> 309         return self._store.retrieve(id, version, cross_bucket)
    310 
    311     @catch

~/miniconda3/envs/kgforge/lib/python3.7/site-packages/kgforge/specializations/stores/bluebrain_nexus.py in retrieve(self, id, version, cross_bucket)
    204         else:
    205             data = response.json()
--> 206             resource = self.service.to_resource(data)
    207             resource._synchronized = True
    208             self.service.sync_metadata(resource, data)

~/miniconda3/envs/kgforge/lib/python3.7/site-packages/kgforge/specializations/stores/nexus/service.py in to_resource(self, payload)
    257     def to_resource(self, payload: Dict) -> Resource:
    258         data_context = deepcopy(payload["@context"])
--> 259         data_context.remove(NEXUS_CONTEXT)
    260         data_context = data_context[0] if len(data_context) == 1 else data_context
    261         metadata = dict()

AttributeError: 'str' object has no attribute 'remove'

Reason is line 259 in https://github.com/BlueBrain/nexus-forge/blob/master/kgforge/specializations/stores/nexus/service.py: data_context.remove(NEXUS_CONTEXT)

The variable data_context is expected to be a list, while nxv:File resources contain only the https://bluebrain.github.io/nexus/contexts/resource.json string as data_context and not a list.
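
A defensive variant of that step could wrap a string context in a list before filtering. This is only a sketch of the idea under the assumption that a missing data context may be represented as None; data_context_of is a hypothetical name, and NEXUS_CONTEXT is set to the URL quoted above.

```python
from copy import deepcopy

NEXUS_CONTEXT = "https://bluebrain.github.io/nexus/contexts/resource.json"


def data_context_of(payload: dict):
    """Return the payload's context without the Nexus metadata context."""
    context = deepcopy(payload["@context"])
    # nxv:File payloads carry a single string instead of a list.
    entries = context if isinstance(context, list) else [context]
    entries = [entry for entry in entries if entry != NEXUS_CONTEXT]
    if not entries:
        return None  # assumption: no data context left
    return entries[0] if len(entries) == 1 else entries


file_context = data_context_of({"@context": NEXUS_CONTEXT})
resource_context = data_context_of(
    {"@context": [{"@vocab": "https://neuroshapes.org/"}, NEXUS_CONTEXT]}
)
```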

Problem with Dependency pyparsing

Hi there

I had the following problem when instantiating KnowledgeGraphForge:

File "/shacl_env/lib/python3.9/site-packages/rdflib/plugins/sparql/__init__.py", line 33, in <module>
    from . import parser
File "/shacl_env/lib/python3.9/site-packages/rdflib/plugins/sparql/parser.py", line 184, in <module>
    Param('prefix', PN_PREFIX)) + Suppress(':').leaveWhitespace()
File "/shacl_env/lib/python3.9/site-packages/rdflib/plugins/sparql/parserutils.py", line 114, in __init__
    self.name = name
AttributeError: can't set attribute

I could fix it by installing a specific version of pyparsing: pip3 install pyparsing==2.4.7 (instead of "pyparsing 3.0.3").

Support reshaping a resource using a SHACL shape

forge.from_jsonld(): Handling Types of Literal Values in JSON-LD

Hi there

I have a problem regarding types of literal values in JSON-LD.

Example:

data:

grant = {
  "@context": {
    "@vocab": "http://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@type": "Grant",
  "name": "My grant",
  "description": "My first granted grant",
  "startDate": {
      "@type": "xsd:date",
      "@value": "2021-09-23Z"
    }
}

schema file with shape

{
  "@context": [
    "https://incf.github.io/neuroshapes/contexts/schema.json",
    {
      "this": "https://schemaorgshapes.org/dash/grant/shapes/"
    }
  ],
  "@type": "nxv:Schema",
  "@id": "https://schemaorgshapes.org/dash/grant",
  "imports": [],
  "shapes": [
    {
      "@id": "this:GrantShape",
      "@type": "sh:NodeShape",
      "label": "A grant",
      "targetClass": "schema:Grant",
      "property": [
        {
          "path": "schema:name",
          "name": "Name",
          "description": "The Grant name.",
          "datatype": "xsd:string",
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "schema:description",
          "name": "Description",
          "description": "The Grant description",
          "datatype": "xsd:string",
          "minCount": 1,
          "maxCount": 1
        },
        {
          "path": "schema:startDate",
          "name": "Start Date",
          "description": "The Grant's start date",
          "minCount": 1,
          "maxCount": 1,
          "datatype": "xsd:date"
        }
      ]
    }
  ]
}

When doing the following, I get a validation error:

res = forge.from_jsonld(grant)

forge.validate(res)

Constraint Violation in DatatypeConstraintComponent (http://www.w3.org/ns/shacl#DatatypeConstraintComponent):
Severity: sh:Violation
Source Shape: [ sh:datatype xsd:date ; sh:description Literal("The Grant's start date") ; sh:maxCount Literal("1", datatype=xsd:integer) ; sh:minCount Literal("1", datatype=xsd:integer) ; sh:name Literal("Start Date") ; sh:path schema:startDate ]
Focus Node: [ http://schema.org/description Literal("My first granted grant") ; http://schema.org/name Literal("My grant") ; http://schema.org/startDate [ http://schema.org/value Literal("2021-09-23Z") ; rdf:type xsd:date ] ; rdf:type http://schema.org/Grant ]
Value Node: [ http://schema.org/value Literal("2021-09-23Z") ; rdf:type xsd:date ]
Result Path: schema:startDate

To me, the encoding of the xsd:date seems correct in JSON-LD, see also here.

However, the message for the value node looks a bit strange:

Value Node: [ http://schema.org/value Literal("2021-09-23Z") ; rdf:type xsd:date ]

Could it be that the @value is mistaken for the schema.org property http://schema.org/value?

I'd be grateful for any help. Thanks a lot.

File update support

As a user, I'd like to be able to create a new revision of an existing File.

Make necessary changes to the Forge API
Implement https://bluebrainnexus.io/docs/delta/api/files-api.html

Install required version of Pyshacl when installing Nexus Forge

The 'Shape' object has no attribute 'traverse' error described in #36 is due to the installed version of Pyshacl. On 2020-07-10, a new version of Pyshacl was released on PyPI, and Nexus Forge does not specify a particular Pyshacl version for installation. As a result, installing Nexus Forge now pulls in the latest release, Pyshacl 0.12.0, which causes the error. Running the same notebook as in #36 with Pyshacl 0.11.6.post1 works fine.

Implement loading forge configuration from URL

As a user, I want to be able to load a Nexus Forge configuration from a URL, including required mappings.

RuntimeError on nest_asyncio.apply() (There is no current event loop in thread 'Thread-3' )

Hi,

I have encountered an issue when initializing KnowledgeGraphForge on my local Flask server using

forge = KnowledgeGraphForge(forge_config_path, token=token)

This line produces the following traceback:

Traceback (most recent call last):
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 2464, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 2450, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 1867, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/oshurko/Workspace/BlueBrainEmbedder/app.py", line 66, in fit_model
    forge_config_path, token=token)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/core/forge.py", line 171, in __init__
    self._model: Model = model(**model_config)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/specializations/models/rdf_model.py", line 69, in __init__
    super().__init__(source, **source_config)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/core/archetypes/model.py", line 47, in __init__
    self.service: Any = self._initialize_service(self.source, **source_config)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/core/archetypes/model.py", line 184, in _initialize_service
    return self._service_from_store(store, context_config, **source_config)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/specializations/models/rdf_model.py", line 144, in _service_from_store
    default_store: Store = store(endpoint, bucket, token)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/specializations/stores/bluebrain_nexus.py", line 84, in __init__
    model_context)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/core/archetypes/store.py", line 58, in __init__
    self.service: Any = self._initialize_service(self.endpoint, self.bucket, self.token)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/specializations/stores/bluebrain_nexus.py", line 433, in _initialize_service
    return Service(endpoint, self.organisation, self.project, token, self.model_context, 200)
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/kgforge/specializations/stores/nexus/service.py", line 93, in __init__
    nest_asyncio.apply()
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/site-packages/nest_asyncio.py", line 10, in apply
    loop = loop or asyncio.get_event_loop()
  File "/Users/oshurko/opt/anaconda3/envs/bbembed/lib/python3.7/asyncio/events.py", line 644, in get_event_loop
    % threading.current_thread().name)
RuntimeError: There is no current event loop in thread 'Thread-3'.

In case it may give some insight on what is going on: in the source code the problematic line (nest_asyncio.apply()) is commented as '# This async to work on jupyter notebooks'
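
A possible workaround, not verified against Forge, is to make sure the worker thread has an event loop before the Forge is initialized, since asyncio only sets one up automatically in the main thread. The helper ensure_event_loop is a hypothetical name; the sketch uses only the standard library and a plain thread in place of the Flask request thread.

```python
import asyncio
import threading


def ensure_event_loop() -> asyncio.AbstractEventLoop:
    """Return the thread's event loop, creating and registering one if needed."""
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        # Non-main threads have no loop until one is set explicitly.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop


results = []


def worker() -> None:
    loop = ensure_event_loop()
    # In the Flask view, KnowledgeGraphForge(...) would be created here.
    results.append(loop)
    loop.close()


thread = threading.Thread(target=worker)
thread.start()
thread.join()
```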

Fix forge.download() when distribution contains an object with a `url` property

The distribution shape (https://github.com/INCF/neuroshapes/blob/master/shapes/neurosciencegraph/commons/distribution/schema.json) allows one to provide a distribution object with a contentUrl or url property.

We have some resources with an array of distribution objects, some containing the contentUrl and others the url property. Currently, when performing forge.download(resources, "distribution.contentUrl", "./") on those resources, the following error is shown:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in collect_values(data, follow, exception)
     87         r = Reshaper("")
---> 88         reshaped = dispatch(data, r._reshape_many, r._reshape_one, [follow], False)
     89         jsoned = as_json(reshaped, False, False, None, None, None)

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/commons/execution.py in dispatch(data, fun_many, fun_one, *args)
     94     elif isinstance(data, Resource):
---> 95         return fun_one(data, *args)
     96     else:

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in _reshape_one(self, resource, keep, versioned)
     44     def _reshape_one(self, resource: Resource, keep: List[str], versioned: bool) -> Resource:
---> 45         return self._reshape(resource, keep, versioned)
     46 

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in _reshape(self, resource, keep, versioned)
     55             if isinstance(value, List):
---> 56                 new_value = self._reshape_many(value, leafs, versioned)
     57             elif isinstance(value, Resource):

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in _reshape_many(self, resources, keep, versioned)
     41         # Could be optimized in the future.
---> 42         return [self._reshape(x, keep, versioned) for x in resources]
     43 

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in <listcomp>(.0)
     41         # Could be optimized in the future.
---> 42         return [self._reshape(x, keep, versioned) for x in resources]
     43 

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in _reshape(self, resource, keep, versioned)
     53             leafs = [x[1] for x in levels if len(x) > 1 and x[0] == root]
---> 54             value = getattr(resource, root)
     55             if isinstance(value, List):

AttributeError: 'Resource' object has no attribute 'contentUrl'

During handling of the above exception, another exception occurred:

DownloadingError                          Traceback (most recent call last)
<ipython-input-32-556ae8d452fb> in <module>
----> 1 forge.download(r, "distribution.contentUrl", "./")

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/commons/execution.py in wrapper(*args, **kwargs)
     62 
     63         try:
---> 64             return fun(*args, **kwargs)
     65         except Exception as e:
     66             stack = traceback.extract_stack()

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/forge.py in download(self, data, follow, path, overwrite)
    336                  overwrite: bool = False) -> None:
    337         # path: DirPath.
--> 338         self._store.download(data, follow, path, overwrite)
    339 
    340     # Storing User Interface.

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/archetypes/store.py in download(self, data, follow, path, overwrite)
    138                  overwrite: bool) -> None:
    139         # path: DirPath.
--> 140         urls = collect_values(data, follow, DownloadingError)
    141         count = len(urls)
    142         if count == 0:

~/miniconda3/envs/topo/lib/python3.8/site-packages/kgforge/core/reshaping.py in collect_values(data, follow, exception)
     91         return list(_collect(prepared))
     92     except AttributeError:
---> 93         raise exception("path to follow is incorrect")

DownloadingError: path to follow is incorrect

This happens because the _reshape() method (https://github.com/BlueBrain/nexus-forge/blob/40a2227b71640e2e9f8198699bbe211495ffb644/kgforge/core/reshaping.py) expects every element in an array to have the same properties.
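
The desired behaviour can be sketched on plain dictionaries: take contentUrl when present, fall back to url otherwise, and skip entries with neither. The function collect_download_urls is a hypothetical illustration and the example.org URLs are placeholders; the real fix would have to live in the reshaping code.

```python
def collect_download_urls(resource: dict) -> list:
    """Gather one URL per distribution, preferring contentUrl over url."""
    distributions = resource.get("distribution", [])
    if isinstance(distributions, dict):
        # A single distribution object is treated like a one-element array.
        distributions = [distributions]
    urls = []
    for distribution in distributions:
        url = distribution.get("contentUrl") or distribution.get("url")
        if url:
            urls.append(url)
    return urls


resource = {
    "distribution": [
        {"contentUrl": "https://example.org/files/a"},  # placeholder URL
        {"url": "https://example.org/external/b"},      # placeholder URL
        {"name": "no link here"},
    ]
}
urls = collect_download_urls(resource)
```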

Handling '.' in class and property names

Some ontologies use '.' in the property and class names. Example here: http://hl7.org/fhir/patient-example.ttl.html

If we use a tool such as TopBraidComposer to generate the SHACL shapes, the shapes also have '.' in the names.

I think this causes problems with some of the forge commands because in forge the '.' seems to mean that we're trying to access a nested property; it does not appear to be expected as part of a name.

Is there a way to handle '.' in shape and property names?

Support forge.sources() for a RDFModel

Resolving a resource using the agent's email address when available

It could be convenient to have the possibility of targeting agents using their email address as it is more specific than name/familyname (which could be similar between agents).

Support for configurable SPARQL endpoint in the BlueBrain Nexus Delta Store when using forge.search

When using BlueBrain Nexus Delta as a store, the default SPARQL view is used without a possibility for the user to change and use another SPARQL endpoint.

It would be good to allow users to configure whichever SPARQL view they want to use as an endpoint for SPARQL queries.

'Person' template does not have a 'type' property

Using the search method to retrieve the list of 'Person' resources from bbp/agents, I encounter an issue about the PathsWrapper object having no attribute 'type':

> p = forge_prod.paths("Person")
> resources = forge_prod.search(p.type == "Person")
'PathsWrapper' object has no attribute 'type'

In fact, it comes from the 'Person' template json not including the type property:

> forge_prod.template("Person", output="json")
{
    "id": "",
    "additionalName": "",
    "affiliation": {
        "id": "",
        "type": "Organization",
        "address": "",
        "email": "",
        "identifier": "",
        "name": "",
        "parentOrganization": {
            "id": "",
            "type": "OrganizationShape"
        }
    },
    "email": "",
    "familyName": "",
    "givenName": "",
    "identifier": {
        "id": "",
        "propertyID": "",
        "value": {
            "id": ""
        }
    }
}

The Person schema is:

> schema_id = forge_prod._model.schema_id('Person')
> print(schema_id)
https://neuroshapes.org/dash/person

Queries do not run the same in Forge and Studio

Hi,

Here's the query in Studio, the self is required for technical reasons:

  PREFIX vocab: <https://sandbox.bluebrainnexus.io/v1/vocabs/>
  PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
  SELECT DISTINCT ?title ?self
  WHERE {
  ?id nxv:self ?self ;
      nxv:deprecated false ;
      vocab:title ?title ;
      ^vocab:movieId / vocab:tag "thought-provoking" .
  }

If I run this piece of code through the Forge, I get the following error:

TypeError: __init__() got multiple values for argument 'self'

I can fix it by removing either the ?self in the SELECT clause or the assertion nxv:self ?self ;. Then the query runs fine.
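The error message is the one Python raises when a keyword argument named `self` is passed to a method. A minimal sketch, assuming (this is not confirmed by the report) that query bindings are turned into objects through keyword arguments; `Record` and `build` are hypothetical stand-ins, not Forge classes:

```python
class Record:
    """Toy object built from query bindings (not Forge's Resource)."""

    def __init__(self, **properties):
        self.__dict__.update(properties)


def build(bindings: dict) -> Record:
    # Rename the colliding key so it no longer clashes with the method's own 'self'.
    safe = {("_self" if key == "self" else key): value for key, value in bindings.items()}
    return Record(**safe)


row = {"title": "A movie title", "self": "https://example.org/resources/1"}

try:
    Record(**row)  # the 'self' binding collides with the implicit first argument
    error_message = ""
except TypeError as error:
    error_message = str(error)

record = build(row)  # works once the key is renamed
```

Under that assumption, using a different variable name in the query (for example ?resource_self instead of ?self) would avoid the collision as well.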

Retrieve the storage path of a given file in a dataset distribution

Forge elastic does not properly handle the results

Currently, forge.elastic returns raw Elasticsearch results, e.g.

{
    _id: <index_id>,
    _score: 0.00080160325,
    _source: <indexed_resource>,
   ....
}

We want to hide by default the ES result fields (_id, _index, _score) as we did for forge.sparql. Scores and other ES result fields can be obtained through result._store_metadata.
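A minimal sketch of the requested behaviour, using plain dictionaries rather than Forge objects. The helper name `split_hit` and the sample values are illustrative assumptions; only the `_id`, `_index`, `_score` and `_source` field names come from Elasticsearch.

```python
def split_hit(hit: dict) -> tuple:
    """Separate the indexed resource from the Elasticsearch bookkeeping fields."""
    # Copy '_source' so the caller's raw hit is left untouched.
    resource = dict(hit.get("_source", {}))
    # Everything else (_id, _index, _score, ...) is kept aside as metadata.
    metadata = {key: value for key, value in hit.items() if key != "_source"}
    return resource, metadata


raw_hit = {
    "_id": "index-id-1",
    "_index": "example-index",
    "_score": 0.00080160325,
    "_source": {"@id": "https://example.org/r/1", "name": "example"},
}
resource, metadata = split_hit(raw_hit)
```

The metadata dictionary is what could then be attached to the result as its store metadata, so that scores stay reachable without cluttering the resource itself.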

Receiving RecursionError

Hi,

After upgrading to v0.5.2 from v0.3.3, when I run the following command:

forge.template("patient", output="json", only_required=True)

I receive the following error:

<action> _compile
<error> RecursionError: maximum recursion depth exceeded while calling a Python object

It was successful in v0.3.3. Let me know if you need more details.

Just a note for tracking that I originally mentioned this problem in #132 although that issue was about another problem I was having.

Create a forge session using a configuration URL

Implement option to deprecate related files when deprecating dataset resource

As a user, I want to be able to deprecate a dataset while specifying whether the related files in the distribution object should also be deprecated or not.

Nexus Delta will support synchronous indexing from 1.6 onwards

Please have a look at: https://bluebrainnexus.io/snapshot/docs/delta/api/resources-api.html#indexing

My guess is that users would like to specify that parameter when registering a resource with Forge, e.g. @DrTaDa

Add support for validation against a schema

The current implementation of .validate() expects a type attribute in the validated resource object. It would be useful to be able to validate against a schema by passing the schema as parameter in .validate()

Error when bulk downloading from a list of resources with length 1

An error occurs when trying to download from a list of resources whose length is 1.

The following code:

dirpath = "./downloaded/"
forge.download(data, "distribution.contentUrl", dirpath, overwrite=True)

Outputs:

<action> download
<error> AttributeError: 'list' object has no attribute '_store_metadata'

When checking len(data), we get 1. On the other hand, the following code:

forge.download(data[0], "distribution.contentUrl", dirpath, overwrite=True)

works fine (so, single download works without an issue)
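The report does not say where the failure originates. The sketch below only illustrates the expected behaviour: dispatching on the type of the argument rather than on its length, so that a one-element list is still handled as a list. `download_all`, `FakeResource` and `fake_download` are hypothetical names.

```python
from typing import Any, Callable, List, Union


def download_all(data: Union[Any, List[Any]], download_one: Callable[[Any], str]) -> List[str]:
    # Dispatch on the type of the argument, never on its length:
    # a one-element list must still be iterated, not treated as a resource.
    items = data if isinstance(data, list) else [data]
    return [download_one(item) for item in items]


class FakeResource:
    """Toy resource carrying store metadata (not Forge's Resource)."""

    def __init__(self, name: str):
        self._store_metadata = {"name": name}


def fake_download(resource: FakeResource) -> str:
    # Stands in for the real download; it only reads the metadata.
    return resource._store_metadata["name"]


single = download_all(FakeResource("a"), fake_download)
one_element_list = download_all([FakeResource("a")], fake_download)
```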

Forge elastic does not handle user-provided resource limit

A user-provided limit is not taken into account, e.g.

forge.elastic(query, limit=100)

doesn't apply the limit to the results.
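In the Elasticsearch search API the number of returned hits is controlled by the `size` field of the request body, so one way to honour the limit is to write it into the query before sending it. This is a sketch with a hypothetical helper `with_limit`, not Forge's implementation:

```python
import json
from typing import Optional, Union


def with_limit(query: Union[str, dict], limit: Optional[int]) -> dict:
    """Return the query body with 'size' set to the user-provided limit."""
    # Accept either a JSON string or an already parsed body; copy to avoid side effects.
    body = json.loads(query) if isinstance(query, str) else dict(query)
    if limit is not None:
        # An explicit limit overrides any 'size' already present in the query.
        body["size"] = limit
    return body


limited = with_limit('{"query": {"match_all": {}}}', 100)
unchanged = with_limit({"size": 10}, None)
```

Letting the explicit argument win over a `size` embedded in the query is a design choice; the opposite precedence would be equally defensible.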

add docstrings?

There are too few docstrings; adding more would help a lot to use this code efficiently, imho.

Investigate forge.from_graph yielding a list of resources instead of a Resource

Make Nexus demo examples run on the sandbox

Implement instantiation of a Dataset from a Resource

Currently, there is no implementation to instantiate a kgforge.specializations.resources.datasets.Dataset type from an existing kgforge.core.resource.Resource type.

Having this capability would be useful to e.g. retrieve a Resource from the store as Dataset allowing a user to benefit from methods available on the Dataset specialisation.

Forge does not initialize on OpenShift pods with no access to internet (external network)

Initializing forge inside an OpenShift pod with no access to the external network fails with the following error:

  File "/usr/local/lib/python3.7/site-packages/kgforge/specializations/stores/nexus/service.py", line 167, in resolve_context
    raise ValueError(f"{iri} is not resolvable")
ValueError:  https://bluebrainnexus.io/contexts/metadata.json is not resolvable

Sometimes this error appears after a very long delay, when I think forge tries to connect to some external URL and waits for a response.

Here are some details on init and config:

In my app, I run forge = KnowledgeGraphForge(config, token=TOKEN), where config points to the file with the following content:

Model:
  name: RdfModel
  origin: store
  source: BlueBrainNexus
  context:
    iri: "https://bbp.neuroshapes.org"
    bucket: "neurosciencegraph/datamodels"

Store:
  name: BlueBrainNexus
  endpoint: https://bbp.epfl.ch/nexus/v1
  bucket: <MY_BUCKET>
  searchendpoints:
    sparql:
      endpoint: "https://bluebrain.github.io/nexus/vocabulary/defaultSparqlIndex"
  vocabulary:
    iri: "https://bluebrainnexus.io/contexts/metadata.json"
    namespace: "https://bluebrain.github.io/nexus/vocabulary/"
    deprecated_property: "https://bluebrain.github.io/nexus/vocabulary/deprecated"
    project_property: "https://bluebrain.github.io/nexus/vocabulary/project"
  max_connection: 50
  versioned_id_template: "{x.id}?rev={x._store_metadata._rev}"
  file_resource_mapping: ./file-to-resource

Resolvers:
  ontology:
    - resolver: OntologyResolver
      origin: store
      source: BlueBrainNexus
      targets:
        - identifier: terms
          bucket: neurosciencegraph/datamodels
      result_resource_mapping: https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/configurations/nexus-resolver/term-to-resource-mapping.hjson
  agent:
    - resolver: AgentResolver
      origin: store
      source: BlueBrainNexus
      targets:
        - identifier: agents
          bucket: bbp/agents
      result_resource_mapping: https://raw.githubusercontent.com/BlueBrain/nexus-forge/master/examples/configurations/nexus-resolver/agent-to-resource-mapping.hjson

Formatters:
  identifier: https://bbp.epfl.ch/neurosciencegraph/data/{}/{}

Support ns free filtering with store_metadata values

BlueBrainNexus store adds _createdAt, _updatedAt, _rev. It would be good to support filtering with those when calling forge.search() without specifying BlueBrainNexus store default namespace.
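A sketch of what namespace-free filtering could look like: store metadata keys are expanded to full IRIs before the query is built. The mapping rule (drop the leading underscore and prepend the namespace) and the helper `expand_filter_key` are assumptions made for illustration.

```python
# Default namespace of the store vocabulary (assumed to be the configured one).
NEXUS_NAMESPACE = "https://bluebrain.github.io/nexus/vocabulary/"
STORE_METADATA_KEYS = {"_createdAt", "_updatedAt", "_rev"}


def expand_filter_key(key: str, namespace: str = NEXUS_NAMESPACE) -> str:
    """Expand a store metadata key to a full IRI; leave other keys untouched."""
    if key in STORE_METADATA_KEYS:
        # Assumption: '_rev' corresponds to '<namespace>rev', and so on.
        return namespace + key[1:]
    return key


rev_iri = expand_filter_key("_rev")
name_key = expand_filter_key("name")
```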

Add option to set the value for asyncio max_connections in the configuration

Currently, batch download for the nexus store is implemented through the _download_many method (https://github.com/BlueBrain/nexus-forge/blob/73f4becb842963adebf14981a0aae82e4d591f86/kgforge/specializations/stores/bluebrain_nexus.py) with max_connections set to 200 (https://github.com/BlueBrain/nexus-forge/blob/73f4becb842963adebf14981a0aae82e4d591f86/kgforge/specializations/stores/bluebrain_nexus.py)
With the current Delta deployment, this default value should be set to 50 instead of 200.
Additionally, the value for max_connections should be configurable through the config.
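A minimal sketch of a configurable connection cap, using an `asyncio.Semaphore` to bound the number of concurrent jobs. The config key name `max_connection`, the helper `run_bounded` and the fake download are assumptions; this is not the code of `_download_many`.

```python
import asyncio

DEFAULT_MAX_CONNECTION = 50  # proposed default instead of 200


async def run_bounded(jobs, store_config: dict):
    """Run coroutine factories with at most 'max_connection' running at once."""
    limit = store_config.get("max_connection", DEFAULT_MAX_CONNECTION)
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}

    async def guarded(job):
        async with semaphore:
            # Track concurrency so the cap can be observed.
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
            try:
                return await job()
            finally:
                state["active"] -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, state["peak"]


async def fake_download():
    # Stands in for one HTTP download.
    await asyncio.sleep(0.01)
    return "ok"


results, peak = asyncio.run(run_bounded([fake_download] * 10, {"max_connection": 3}))
```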

Add Github actions support to replace Travis

Currently there is no support for Github Actions and the testing and deployment is done using Travis. We should replace Travis with Github Actions for both testing and deployment stages.

Non-specific error message on registering resources with expired token

When the token expires and we are unauthorized, the error message is not specific enough, i.e. we get:

JSONDecodeError: Expecting value: line 1 column 1
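A JSONDecodeError of this kind typically appears when an empty or non-JSON body is decoded. The sketch below checks the HTTP status before decoding so that the message can mention the token. `FakeResponse` and `parse_response` are hypothetical, and the assumption that the store answers 401 or 403 for an expired token is not taken from the report.

```python
import json


class FakeResponse:
    """Toy HTTP response with the two attributes used below."""

    def __init__(self, status_code: int, text: str):
        self.status_code = status_code
        self.text = text


def parse_response(response) -> dict:
    # Check authorization problems before attempting to decode the body.
    if response.status_code in (401, 403):
        raise PermissionError(
            f"Request rejected with HTTP {response.status_code}: "
            "the token may be missing or expired."
        )
    if not response.text.strip():
        raise ValueError(f"Empty response body (HTTP {response.status_code}).")
    return json.loads(response.text)


payload = parse_response(FakeResponse(200, '{"a": 1}'))

try:
    parse_response(FakeResponse(401, ""))
    message = ""
except PermissionError as error:
    message = str(error)
```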

Fix forge.update() for nested properties

The current behaviour of forge.update() is as follows:

Example resource with a nested property:

resource = forge.from_json({
    "outer": {
        "inner": "test"
    }
})
forge.register(resource)

Updating the inner property throws an UpdatingError saying that the resource should not be synchronized:

resource.outer.inner = "updated"
forge.update(resource)

While the following works fine:

resource.outer = resource.outer
forge.update(resource)
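One plausible explanation, which the report does not confirm, is that the synchronization flag is only reset when an attribute is set directly on the top-level resource. The toy class below (not Forge's Resource) shows one way a nested assignment could propagate the change to its ancestors:

```python
class Node:
    """Toy object that records whether it changed since the last sync."""

    def __init__(self, properties: dict, parent=None):
        # Bypass the custom __setattr__ while building the object.
        object.__setattr__(self, "_parent", parent)
        object.__setattr__(self, "_synchronized", True)
        for key, value in properties.items():
            if isinstance(value, dict):
                value = Node(value, parent=self)
            object.__setattr__(self, key, value)

    def __setattr__(self, key, value):
        object.__setattr__(self, key, value)
        # Mark this node and every ancestor as modified.
        node = self
        while node is not None:
            object.__setattr__(node, "_synchronized", False)
            node = node._parent


resource = Node({"outer": {"inner": "test"}})
synchronized_before = resource._synchronized
resource.outer.inner = "updated"
synchronized_after = resource._synchronized
```

With this propagation, the top-level object knows it is out of sync after `resource.outer.inner = "updated"`, without needing the `resource.outer = resource.outer` workaround.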

Fix module 'asyncio' has no attribute 'create_task' when running batch requests

asyncio.create_task(...) support was introduced in Python 3.7.

So calling it in Python 3.6 will trigger:

AttributeError: module 'asyncio' has no attribute 'create_task'

For Python 3.6, let's use:

loop = asyncio.get_event_loop()
loop.create_task(...)
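A small compatibility helper along these lines could cover both versions. The names `create_task_compat`, `fetch` and `run` are illustrative only:

```python
import asyncio


def create_task_compat(coro):
    """Schedule a coroutine on Python 3.6 as well as 3.7+."""
    if hasattr(asyncio, "create_task"):
        # Python 3.7+: must be called while an event loop is running.
        return asyncio.create_task(coro)
    # Python 3.6 fallback.
    return asyncio.get_event_loop().create_task(coro)


async def fetch(i: int) -> int:
    # Stands in for one batch request.
    await asyncio.sleep(0)
    return i * 2


async def main():
    tasks = [create_task_compat(fetch(i)) for i in range(3)]
    return await asyncio.gather(*tasks)


def run(coro):
    # asyncio.run() is also 3.7+, so drive the loop explicitly.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


results = run(main())
```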

forge.freeze fails with "has no attribute '_rev'" error when called on a Dataset with generation and contribution

freeze(self, data: Union[Resource, List[Resource]]) -> None adds a revision value to all referenced resources. To do that, it requires a non-null _store_metadata field on every referenced resource. Otherwise, the following error is thrown:

<action> _freeze_one
<succeeded> False
<error> AttributeError: 'NoneType' object has no attribute '_rev'

Dataset.add_generation(self, **kwargs) -> None and Dataset.add_contribution(self, agent_id: str, **kwargs) -> None do not require or keep a _store_metadata field.
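One way to avoid the AttributeError is to guard against missing store metadata when building the versioned identifier. In the sketch below, `versioned_id` is a hypothetical helper, the `?rev=` format is assumed, and leaving the identifier unversioned when no revision is known is a design assumption, not Forge's behaviour.

```python
from types import SimpleNamespace


def versioned_id(resource) -> str:
    """Append the revision to the identifier when store metadata is available."""
    metadata = getattr(resource, "_store_metadata", None)
    if metadata is None:
        # No revision known: leave the identifier unversioned instead of failing.
        return resource.id
    return f"{resource.id}?rev={metadata._rev}"


stored = SimpleNamespace(id="https://example.org/a", _store_metadata=SimpleNamespace(_rev=3))
unstored = SimpleNamespace(id="https://example.org/b", _store_metadata=None)

frozen_stored = versioned_id(stored)
frozen_unstored = versioned_id(unstored)
```

An alternative would be to raise an explicit error naming the referenced resource that lacks a revision, which is stricter but more informative than the current message.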

Add support for returning resolvers as an array

The current implementation of forge.resolvers() only prints the available resolvers. It would be more useful code-wise to also be able to return the available resolvers as an array.

Add validation support for multiple types

The current implementation of .validate() expects a type attribute in the validated resource. It would be useful to be able to validate against multiple types.