
strawberryfield's People

Contributors

alliomeria, diegopino, favenzio, giancarlobi, ksuquix, marlo-longley, patdunlavey, pcambra

strawberryfield's Issues

Keep track of the User generating a Metadata change inside strawberryfield

USE CASE

Right now we make a good first attempt at keeping track of the provenance (not sure if that is the right word) of data inside a strawberryfield by adding this:

"as:generator": {
        "type": "Update",
        "actor": {
            "url": "http:\/\/localhost:8001\/form\/descriptive-metadata",
            "name": "descriptive_metadata",
            "type": "Service"
        },
        "endTime": "2019-04-15T15:30:33-04:00",
        "summary": "Generator",
        "@context": "https:\/\/www.w3.org\/ns\/activitystreams"
    },

But we depend on Drupal's node data (uid) to keep track of who (human? machine?) actually performed the update. We need to allow a new structure that fits https://www.w3.org/TR/activitystreams-vocabulary/#actor-types to keep track of the user, and bring significant data into that structure so the info becomes a bit more independent of the current Drupal instance (e.g. don't use uid:1, which makes little sense outside of this D8 context).
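One possible shape for such a structure, as a minimal sketch: it keeps the existing Service actor and adds the person through the ActivityStreams `attributedTo` property. The function name, its parameters and the choice of `attributedTo` are assumptions made for illustration, not something the module does today.

```python
from datetime import datetime, timezone


def build_generator(form_url, form_name, person_name, person_url,
                    activity_type="Update"):
    """Build an as:generator value that names both the service and the person.

    Hypothetical helper: the "attributedTo" block is an assumed extension.
    """
    return {
        "type": activity_type,
        # The webform (a machine) that produced the change, as in the example above.
        "actor": {"url": form_url, "name": form_name, "type": "Service"},
        # Assumption: the human behind the change, described without a Drupal uid.
        "attributedTo": {"type": "Person", "name": person_name, "url": person_url},
        "endTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "summary": "Generator",
        "@context": "https://www.w3.org/ns/activitystreams",
    }
```

A name plus a resolvable URL travels better between instances than a numeric uid, but which fields are "good enough" is exactly the open question below.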

QUESTION

What data about a user, API or machine interaction is good enough to make the JSON self-sustaining?

Tagging @kllhwang here since she came up with this very, very needed use case. Thanks!

Super new idea: the file voucher

What is this?

Late night idea I came up with! So, imagine you have a 1TB file you need to attach to an ADO. Imagine that 1TB file is a WARC, or a huge movie, or, who knows, all your childhood love letters as 600dpi TIFFs and PDFs, zipped. Who cares. Well, the owner cares. OK, the fact is those are large files. You for sure don't want to upload that via the UI. But you can use the multipart upload system of S3, drag and drop it directly into min.io or even FTP it. Let's say you uploaded it. Now how in the world will you attach it to your to-be-born ADO when filling in a webform? Drupal only allows you to upload files. And selecting existing files from Drupal (when you have a million...), e.g. via an autocomplete, is not such a good idea (but could be done). But there is this idea...

The File Voucher makes its debut!

The idea is: when you drop such a large file into S3, S3 tells Drupal, hey friend, there is a new file, and Drupal generates a one-time voucher for you. The voucher is of course digital, a tiny unique hash or number. We can set up different folders per user or role in S3 and, depending on where things get dropped, Drupal gets notified via a webhook and in turn notifies the user (via a Drupal message and an alert/view in their user profile). Now, when creating the new ADO, the user, instead of uploading a file or pointing to a hard-to-remember URI (maybe the user does not even know the URI, right?), just adds that voucher to a special field (they can select from their available ones or paste one someone sent them). And all the rest, attaching, classifying, JSON-ifying, etc., happens magically. The voucher expires and the file becomes a fully Drupal-driven entity.

I know I make this sound incredible when it's really simple stuff. But guess what: nobody has this! And it's just super useful. Admins can upload large files at night and then pass vouchers to metadata people. Vouchers can be resolved before being used, so the user knows what the thing is.
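The voucher life cycle (issue, resolve without consuming, redeem once) can be sketched in a few lines. This is only an in-memory illustration: the class name, its methods and the `s3://` URI are hypothetical, and a real version would persist vouchers in Drupal and be triggered by the S3 webhook.

```python
import secrets


class VoucherRegistry:
    """Hypothetical in-memory registry of one-time file vouchers."""

    def __init__(self):
        self._pending = {}

    def issue(self, file_uri, owner):
        """Create a short unique voucher code for a newly dropped file."""
        code = secrets.token_urlsafe(8)
        self._pending[code] = {"file_uri": file_uri, "owner": owner}
        return code

    def resolve(self, code):
        """Peek at what a voucher points to without consuming it."""
        entry = self._pending.get(code)
        return dict(entry) if entry else None

    def redeem(self, code):
        """Consume the voucher and return the file URI.

        Raises KeyError if the code is unknown or was already used.
        """
        return self._pending.pop(code)["file_uri"]
```

Keeping `resolve` separate from `redeem` is what lets a metadata person check what a voucher points to before spending it.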

Saturday and late. But i had this idea a few days ago and needed to bring it somewhere. Hope we get this done, sounds like a good use case!

@giancarlobi @marlo-longley maybe this makes sense to you

New JSON Key providers needed

What is needed?

As we move forward and our SBF JSON becomes richer, we should start thinking about and coding new types of JSON Key Provider Plugin implementations. This is related to #33 but goes beyond it.

The ones I want and need:

  1. An Entity Reference Property. This is extremely useful to index in Solr other nodes and entities referenced from inside the JSON, allowing Drupal to see them natively. Especially since our JSON graphs are directed, meaning that many times Parents will point to Children, but also for Collection Membership, where we want Collection Descriptions to lead, on search, to children.

The implementation is quite simple:

ContentEntityBase provides already a method:

\Drupal\Core\Entity\ContentEntityBase::referencedEntities

Which goes field by field checking for EntityReference Properties

 /**
   * {@inheritdoc}
   */
  public function referencedEntities() {
    $referenced_entities = [];

    // Gather a list of referenced entities.
    foreach ($this->getFields() as $field_items) {
      foreach ($field_items as $field_item) {
        // Loop over all properties of a field item.
        foreach ($field_item->getProperties(TRUE) as $property) {
          if ($property instanceof EntityReference && $entity = $property->getValue()) {
            $referenced_entities[] = $entity;
          }
        }
      }
    }

    return $referenced_entities;
  }

So what we need is a JSON key provider that exposes a set (one or more) of JSON property values (node IDs, for example) as \EntityReference class properties.

We have at least two ways of providing the JSON keys as arguments:

First, automatically, by using the new "ap:entitymapping": [] key we preprocess (or should, because the webform maintainer dismissed my pull request for that... gosh).

Or by simply allowing people to type the keys (hopefully, in this case, a full JSON Path?) that contain entity references. Examples are ismemberof, scene, etc.
With that, Solr will allow us to co-index values from those referenced entities, like their labels, etc.
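The second option (typed keys) boils down to reading a few configured keys and keeping whatever looks like an entity ID. A rough sketch of that step follows; the function name is hypothetical, and it only handles top-level keys holding integers or numeric strings, not full JSON Paths or UUIDs.

```python
def collect_entity_ids(sbf_json, reference_keys):
    """Return node IDs found under the given top-level keys, in order.

    Hypothetical helper: accepts a single value or a list per key and
    silently skips anything that is not an integer-like ID.
    """
    ids = []
    for key in reference_keys:
        value = sbf_json.get(key)
        if value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, bool):
                continue  # bool is a subclass of int in Python; not an ID
            if isinstance(item, int):
                ids.append(item)
            elif isinstance(item, str) and item.isdigit():
                ids.append(int(item))
    return ids
```

For example, `collect_entity_ids({"ismemberof": [12, "15"], "scene": 7}, ["ismemberof", "scene"])` yields the IDs that a provider would then hand over as entity references.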

  2. Make use of our SBF Vocabulary generator to automatically expose all keys we are SURE will exist across the repository. Basically, if you go to http://localhost:8001/admin/structure/taxonomy/manage/strawberryfield_voc_id/overview
    all those vocabularies were generated from indexed content. So we should allow people to use them directly too, instead of choosing and adding keys manually.

Here is how I envision that:

The only keys that should be exposed are the leaves of a branch.

So if we have :

  • as:document.*
    • checksum
    • crypHashFunc
    • dr:fid
    • dr:for
    • dr:uuid

What we want to expose is as:document.*.checksum for example, which is really just the value of what is inside .checksum in that hierarchy. That seems also straightforward to do, logic would be
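As an illustration only, here is one way a path like as:document.*.checksum could be resolved, where `*` matches every key of an object or every item of a list. The function name is hypothetical, and the sketch assumes key names do not themselves contain dots.

```python
def values_at(data, path):
    """Collect all values reachable through a dotted path with '*' wildcards."""
    current = [data]
    for segment in path.split("."):
        next_level = []
        for node in current:
            if segment == "*":
                # Wildcard: descend into every child of an object or list.
                if isinstance(node, dict):
                    next_level.extend(node.values())
                elif isinstance(node, list):
                    next_level.extend(node)
            elif isinstance(node, dict) and segment in node:
                next_level.append(node[segment])
        current = next_level
    return current
```

Called with `"as:document.*.checksum"`, it returns the checksum of every file entry under as:document, which is the value a leaf-based provider would expose.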

3.- I want an aggregator KeyName provider, one that takes a few different keys from all over the JSON and unites them in a single property. The UI for that could be a little more cumbersome and, thinking out loud, it could even work on properties we are already exposing via the other KeyName Providers. Or do you think we should keep this one at the same level? Same level means fewer dependencies, which is good. Running after the others means a different level, which means the keys can be selected instead of typed by the user.
The need for this is: get all referenced external URLs around the JSON and put them inside a single Solr field named URIS.

Logic here is simple

This plugin takes a bunch of keys, accumulates the values from all of them and then exposes all under a single, different Key name.
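A small sketch of that accumulate-and-expose logic, under the assumption that source keys are top-level and that only http(s) strings count as URLs. Both function names are hypothetical.

```python
def collect_strings(value):
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, list):
        for item in value:
            yield from collect_strings(item)
    elif isinstance(value, dict):
        for item in value.values():
            yield from collect_strings(item)


def aggregate_keys(sbf_json, source_keys, only_urls=True):
    """Gather values from several keys into one de-duplicated list."""
    seen = []
    for key in source_keys:
        if key not in sbf_json:
            continue
        for text in collect_strings(sbf_json[key]):
            if only_urls and not text.startswith(("http://", "https://")):
                continue
            if text not in seen:
                seen.append(text)
    return seen
```

The returned list is what would be exposed under the single, different key name (URIS in the example above).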

Questions:

Do we want to name, prefix, fields coming from a given KeyName provider differently so people can deduce who exposed them?
Ideas?

@giancarlobi @marlo-longley

Verbosity is good. Silence is also nice: Make SBF generated messages an option

What? Do we say too much? What? WE?

Sometimes, especially me. But this is not about me speaking, it's about SBF and the event generators being quite verbose on all operations that are happening.

We want to have a permission that enables/disables that, so users who do not need to know anything about file persistence, etc., are not overwhelmed by our excessive (but often needed) verbosity.

This requires
1.- A new permission, so we are going to suggest that any extenders/implementers of custom Event Subscribers (if any) use this

if ($this->account->hasPermission('display strawberry messages'))

before deciding to make the user happy with any new messages.

All EventSubscribers we implement will have that check

Add missing schema

What is the issue?

Missing schema for strawberry_textarea widget.

This was the only missing schema I found directly related to the strawberryfield repo but we can leave open this issue until all the schema errors are resolved.

Automatic File ordering on Ingest/Update

What is what we want?

Related to esmero/webform_strawberryfield#12 (but filed here since I want this to be part of src/StrawberryfieldFilePersisterService.php)

One of the most common use cases is people uploading images that already have some type of naming convention related to a given order/sequence.

This is a quick first step that requires more code to be fully compliant with esmero/webform_strawberryfield#12 (comment). Basically, this won't respect a manual reorder, since it will always apply and re-apply the order based on the given file names. Still, it solves the most common missing need: give images a default order based on the uploaded file name.
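For reference, ordering by file name in the "natural" sense (page2 before page10) is commonly done by splitting names into text and number chunks. The sketch below shows that technique; the function names and the 1-based sequence are assumptions, not the service's actual code.

```python
import re


def natural_key(filename):
    """Split a name into text and integer chunks so numbers compare numerically."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


def assign_sequence(filenames):
    """Map each file name to a 1-based position in natural order."""
    ordered = sorted(filenames, key=natural_key)
    return {name: position for position, name in enumerate(ordered, start=1)}
```

Because the order is recomputed from names alone, running it again on an edited object would overwrite any manual reorder, which is the limitation described above.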

Some ideas on how to move forward:

1.- Only do this if there is no previous order. This adds the following problem: what if someone adds a new file during an edit and there is already an order? Do we simply add the new image to the end? What if the existing order was not given manually but was already automatic? How do we tell the difference?

2.- Have a key that allows overrides. Add an extra key that defines what order was applied, like 'sequence_type': 'natural' or 'manual'?
If manual is present, don't touch. The question still persists of what to do when adding a new file. Simply assign the highest order +1 and then let the user decide manually what to do? Could be a way.

3.- There are surely other ordering issues I have not considered.

To be honest, I don't totally like the idea of another key inside the as:image etc. structures. So maybe we can also do the following: in case of manual order, why don't we add a fully new structure which acts as a ToC for that list of files? That would allow any arbitrary order, but would still always allow a naturally ordered sequence like the one we generate automatically.

@giancarlobi @mitchellkeaney @marlo-longley I would love your opinion. Hopefully one of you can, in the next few days, think about use cases, edge cases and UIs to deal with this. Thanks

This is also related to esmero/webform_strawberryfield#12

New feature for 1.0.0 : Metadata patching via VBO

VBO being Views Batch Operations, this feature is quite simple. This module needs to provide a few action plugins, starting with a base one which allows strings to be replaced by other strings in the JSON.

As simple as that, with one exception: the end result needs to be valid JSON. And I would love to start using more powerful options, since we are JSON fans.

JSON Patching and also JSON Diffs. Why? Because the order of things in JSON cannot be ensured when dealing with properties, but also because a JSON Patch allows greater complexity. My main concern is the interface. I would love to have something similar to a Webform (if not the webform itself) to apply a change to a certain field and through that build the JSON Patch.
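One way to guarantee a valid result is to never touch the raw JSON string: decode it, replace inside the string values only, and re-encode. A minimal sketch of that approach follows; the function names are hypothetical and keys are deliberately left untouched.

```python
import json


def replace_in_values(node, search, replacement):
    """Return a copy of a decoded JSON value with strings replaced in values."""
    if isinstance(node, str):
        return node.replace(search, replacement)
    if isinstance(node, list):
        return [replace_in_values(item, search, replacement) for item in node]
    if isinstance(node, dict):
        return {key: replace_in_values(value, search, replacement)
                for key, value in node.items()}
    return node  # numbers, booleans and null are left as they are


def patch_json_string(raw_json, search, replacement):
    """Decode, replace and re-encode, so the output is always valid JSON.

    json.loads raises ValueError if the input itself is not valid JSON.
    """
    data = json.loads(raw_json)
    return json.dumps(replace_in_values(data, search, replacement))
```

Replacing on the decoded structure also means a search string containing quotes or braces cannot break the document's syntax.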

Also we have help for that. https://github.com/swaggest/json-diff

ACLs and ADO to ADO (Nodes bearing Strawberryfield) Grants inheritance

Good Morning

If you ever had time to read through my Roadmap feature list, you will probably have noticed that I have planned ACL (access control list) integration for our Digital Objects in Archipelago. ACL is fundamental for our IR needs too. And it goes way beyond just nodes, to files as well (yeah, let's not speak about media here, since we use file entities).

What Drupal provides with its users and per-role permissions is closer to something named RBAC (role-based access control), but we need more: something fine-grained, with a simpler UI/UX to apply rules, and also inheritance (plus the ability to control how inheritance works and when it applies) that does not imply a batch operation to copy permissions from parent to child as, e.g., Islandora 7 does. That is expensive and not in our spirit.

I read this good post from @rosiel today during my coffee, and I feel it speaks a little bit about how complex a permission system and its UI/UX can be in a hierarchical structure/Drupal, and how users' expectations differ. I wish I had that type of feedback here! So I will humbly borrow that post to get started.

What is my actual proposal?

Thanks to our own 'ahead' planning we already have some things in place to help with this. We lack a UI of course, because everyone hates to write Forms, but we will get there. I will enumerate my ideas:

  • Every Archipelago Digital Object (means any Node bearing a Strawberryfield) will/can have an ACL.

    • OPTION 1: Attached/logically to the Same SBF but hidden from users. Filtering out keys from our main SBF is easy.
      • PROS: easier to manage for us, but requires field/wide hiding. Also permissions for the field under the Drupal scheme will apply to ACL itself. Not a big issue since we can swap users on important access operations just to read/access the ACL. Also File deposit is easier, means
      • CONS: mixes metadata / and functional access/metadata. Not sure if i like that conceptually. Also, what if some funky use case implies 2+ Strawberryfields? We tell people they don't need it, but we can't tell them what to do. So, in case the ACL is INSIDE a SBF, it would only apply to that SBF. Not sure i like that.
    • OPTION 2: Either a dynamic/virtual field, a subtype of the SBF field type, or both (a virtual field of that type) that is attached to every Strawberry-bearing node (Archipelago Digital Object) and that contains and exposes the ACL. This has more potential. We already do this for the virtual dropbox files field. This would also allow us to code direct access to each internal ACI (access control item, i.e. each permission/control inside an ACL) as field properties.
      • PROS: at this level it affects the whole ADO. I like that, and the ACL/ACI can target specific SBF field instances if there are many, and even deltas.
      • CONS: Can make things a tiny bit slower? Also requires more coding. We need widgets and formatters and permissions, and probably a subclass for our plugin-driven SBF field subtype.
  • In any case, the body, the ACLs are written and defined in JSON. Means the same logic we use for loading/reading/formatting/parsing and making accessible (and yes, we could even use them in Solr as we do with metadata now)

  • We will use something named Entity Access Grants (actually Node Access Grants). The cool thing is that they are way more powerful than hookable permissions or fixed/in-code permissions. The latter only trigger on/per route (meaning you won't get the effect you want on, e.g., a View, only on the canonical URL of a Node, for context in our case at /do/uuid). We use relationships and autocompletes in many places (like collection membership or any other node-to-node relationship you can add/edit via some of our webform elements), and we need those access and visibility decisions to apply there too. So this is how grants will work:

    • GRANTS apply to CRUD. The result is basically a YES/NO to CRUD operations. On each grant invocation/ we will evaluate two options:
      • Does the ADO contain its own ACL? If yes, parse and evaluate the conditions. Only evaluate for the current operation or for every operation? I need to think about that.
      • Does the ADO contain an ADO to ADO (NODE to NODE) relationship in its SBF JSON, and is that relationship (global config option!), let's say memberOf (see raw metadata here), in our list of 'ACL inheritance' enforced properties? If yes, read the inheritance chain, node by node going up, until you find a root or no more inheritance. Cache this whole thing. Why cache? Since the ACL is not specific to the current Node, pretty sure the ACL evaluation can be reused for any other memberOf in the same situation. The cache gets cleared on any chained Node change (because we can tag the cache using the traversed node keys). What if the Node is a member of many? We can do some config there too: define whether inheritance will use the first relationship, the last one (JSON, so we can use the order!) or all. Of course all is more expensive.
    • Mix/Overlay all accumulated Evaluations for the current Operation and current Node (this could even be done in the previous step to cache the chained ones that are inherited, anyway..). Respond/ YES/NO to the operation. Done.
  • Finally, make an UI/UX to build an ACL.

    • I also want to add Archipelago-specific permissions to the chain that are not part of the Node Grant System evaluation, but will be used to define read access to particular metadata/JSON elements. This means we can hide (e.g. embargo) JSON keys from certain users if needed.
  • I also want some global configs here. Like which properties (ismemberof, partOf, relatedTo) are considered for ACL inheritance. This also allows us to have ACL-only properties, like 'inheritsACL'. Not bad? But I also don't know if I would set that as the default; it feels like reusing semantic ones could make more sense.
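To make the inheritance walk concrete, here is a small sketch of it over plain dictionaries. Everything in it is an assumption made for illustration: the node layout, the ACL shape (role mapped to allowed operations), the "nearest ACL that mentions the role wins" rule, and the choice to follow all parents.

```python
def inherited_acls(node_id, nodes, inheritance_keys=("ismemberof",), cache=None):
    """Return the ACLs that apply to a node, its own first, then its ancestors'.

    `cache` is a plain dict standing in for a tagged cache of chain results.
    """
    cache = {} if cache is None else cache

    def walk(nid, visiting):
        if nid in cache:
            return cache[nid]
        if nid in visiting or nid not in nodes:
            return []  # cycle guard, or a reference to a missing node
        node = nodes[nid]
        acls = [node["acl"]] if "acl" in node else []
        for key in inheritance_keys:
            parents = node.get(key, [])
            if not isinstance(parents, list):
                parents = [parents]
            for parent in parents:
                acls.extend(walk(parent, visiting | {nid}))
        cache[nid] = acls
        return acls

    return walk(node_id, frozenset())


def is_allowed(node_id, nodes, role, operation):
    """Answer YES/NO for one operation using the nearest ACL naming the role."""
    for acl in inherited_acls(node_id, nodes):
        if role in acl:
            return operation in acl[role]
    return False
```

With a collection node carrying `{"editor": ["view", "update"]}` and a child that only declares `ismemberof`, the child inherits both operations, while a child with its own ACL overrides the collection's.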

Caching here is quite important, since a deep nested evaluation will be expensive, and some taxonomy based systems (which are a differently labeled RBAC implementations) already make accessing a Node super slow and also, extremely deployment specific.

Also related to this post (crickets, I know, but I got some shy private messages... gosh) https://groups.google.com/forum/#!topic/archipelago-commons/MQHUxU3_9wA

Who can be interested?

@giancarlobi ping! @rosiel 😊 hopefully my mention here does not add stress to your work. My intent is the opposite: I feel there could be some ideas here you could find useful for your community project work, but also I highly respect and appreciate your comments and feedback. If you feel this is not your thing and you don't want any mentions, I can edit the post and remove those. In any case, thanks!

Notes and comments

Side note: This work goes into Beta3. I already started with the Node access grants and a quite simplistic demo ACL (for now fixed during testing) until we agree where the ACLs will be saved and what cool operations we want to allow.

Additional resources: This is a good example of how S3 uses ACLs. Since we do S3 everywhere and Min.io uses them too, we can also reuse learning curves. Please look at the JSON examples

  • https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
    You will also see that the 'resource' key (which means what is affected by an ACL) is a property path... who said property path?? Moreover, who said JSON PATH?
    "Resource":["arn:aws:s3:::examplebucket/*"],
    Means we can even do that: use the property path for a given entity:node:uuid:descriptive_metadata:someproperty, as we use in Drupal 8/9 typed data, and limit access to tiny little pieces in a single string. Not sure I will be able to code this for Beta3, but I can try.

Make Pre Save Event Subscribers respect priorities

What is new here?

Just a silly error. I was using self:: to reference the $priority given to each Subscriber class derived from /src/EventSubscriber/StrawberryfieldEventPresaveSubscriber.php. self:: on static properties does not late bind, which means I was always using the original priority defined in the abstract class instead of the one in the derived class. This did not really affect anything here, but once I started getting pickier about the order and deriving in other modules, I found this problem. Sad!

Solution: use static::

Make deepness of your Soul (kidding, of JSON) encoding/decoding a setting

See https://github.com/esmero/strawberryfield/search?q=json_encode+OR+json_decode&type=

We do a lot of JSON decoding and encoding because, well, Strawberryfield is JSON. So, what happens if you depend on this, you decide a depth setting of 10 is correct (for memory/performance and because at my age you feel you are smart) and then you code a thing that IMPORTS multi-hierarchical EAD V3 into strawberryfield? You break the logic. Yes! And you cry.

So. There are a few ways of going about this:

  • Don't be too smart and let the default (512 in PHP for JSON) be the number
  • Be Smart and allow System Admins to Raise the Number we set (and we can tell them what happened when it happens so they know) to 512 or even higher, maybe they are running a Genomics Cluster and want this. (have seen it, been there)

Remember kids that JSON field can hold the shy number of 2 Gbytes of RAW JSON.

This requires:
A) An Advanced settings Form
B) An Alert that should happen either on Pre Save or on a Webform
C) a Failsafe, means a lot of other initialized arguments

In the meantime, for this edge use case I'm adding an extra 40 levels of depth, which makes it 50, until we get this other larger work done (larger in the sense of more than a few minutes).
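The alert in B) needs a way to tell the admin how deep the offending JSON actually is. Below is a sketch of a depth check with a configurable limit. The function names are hypothetical, and the way depth is counted here (one level per nested object or array) may not match PHP's own counting exactly.

```python
import json


def json_depth(value):
    """Count nesting levels: scalars are 0, each object or array adds 1."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    return 0


def decode_with_limit(raw_json, max_depth=50):
    """Decode JSON and fail with an explanatory message if it nests too deep."""
    data = json.loads(raw_json)
    depth = json_depth(data)
    if depth > max_depth:
        raise ValueError(
            f"JSON nesting depth {depth} exceeds the configured limit {max_depth}"
        )
    return data
```

Reporting both numbers in the error gives a system admin what they need to decide whether to raise the setting.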

Add an Automatic Node Title setter Subscriber based on SBF metadata

What is needed?

We want to allow users to completely hide the Node Title widget/field during ingest/edit and use a value coming from the SBF as the source for the required node->title.
To accomplish this, and following our own way of coding, we want to create an Event Pre Save Subscriber (extending one of our base classes) that, in the absence of such a Node property, sets its value from the SBF metadata. Drupal, strangely enough, allows a Title field to be hidden in a Form Mode, but cannot handle a Node CRUD operation if the value is not there (white screen with a Constraint error).

This subscriber is quite basic. To be stronger, it will require additional logic coming from the webform_strawberryfield module so that, when the title is hidden but set (e.g. on edit), the value is unset and this Subscriber can override it. But that is another ISSUE.

Side note: Drupal's Content Entity Labels/Titles are limited to 128 characters, not well suited for metadata, so we will apply a truncating function. Still, the longer and richer SBF title will be available and can be used for Solr indexing and display if wanted.
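A truncating function of that kind could look like the sketch below. The function name, the trailing "..." marker and the preference for cutting at a word boundary are all assumptions; the 128 default simply mirrors the limit quoted above.

```python
def truncate_title(title, max_length=128, ellipsis="..."):
    """Shorten a title to max_length characters, cutting at a space if possible."""
    title = " ".join(title.split())  # collapse runs of whitespace and newlines
    if len(title) <= max_length:
        return title
    cut = title[: max_length - len(ellipsis)]
    if " " in cut:
        # Drop the last, possibly partial, word.
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + ellipsis
```

The full, untruncated value stays in the JSON, so nothing is lost for indexing or display.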

@giancarlobi @mitchellkeaney @marlo-longley

Make breadcrumbs semantic aware, flexible and fast

What is that i want?

Everybody knows and loves breadcrumbs. But hey, in the semantic, linked data world they are like looking at the whole reality through a keyhole. The simple use case everyone knows is Collection/member. But that is not all. E.g., an ADO can be part of two collections, or can be connected to other objects via isrelatedto, partof, sameas, etc. All those things are relevant when thinking about a breadcrumb that is useful. People should be able to check which ones they want in the breadcrumb.

So what do I want?

Breadcrumbs that read from a list of JSON keys (you define which keys; we could even tag our JMESPATHs that are referencing NODES as such), then traverse fast (I'm quite good at traversing graphs) and accumulate not a single hierarchy but a list of parents. And so on. The direction of the relationship is important. We can do this via direct entityQueries (slow but consistent) or we can use Solr to drive this, which is fast but requires setting up each predicate that people want as a field, so more config.

The UI is tricky. How do we draw this tree (it's not a cyclic graph) in a way that still makes sense to people? Maybe show only one path by default and expand to more (if there are more), via JS, on hover over any node in this tree? Maybe color coded or prefixed with a Unicode character? Having this would be lovely.
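The "list of parents" traversal can be illustrated with an in-memory graph. The sketch below returns every root-to-node path instead of a single hierarchy. The node layout, key names and function name are hypothetical, and a real version would load parents via entity queries or Solr.

```python
def breadcrumb_paths(node_id, nodes, relation_keys=("ismemberof", "ispartof")):
    """Return all paths from a root down to node_id, following parent keys."""

    def walk(nid, trail):
        if nid in trail:
            return [list(reversed(trail))]  # cycle guard: stop here
        trail = trail + [nid]
        node = nodes.get(nid, {})
        parents = []
        for key in relation_keys:
            value = node.get(key, [])
            parents.extend(value if isinstance(value, list) else [value])
        if not parents:
            return [list(reversed(trail))]  # reached a root
        paths = []
        for parent in parents:
            paths.extend(walk(parent, trail))
        return paths

    return walk(node_id, [])
```

An object that belongs to two collections yields two breadcrumb paths, and the UI then decides which one to show by default.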

JSON:API Drush command wrapper

What is this?

Just a way of helping people and myself to ingest/patch NODES faster via the JSONAPI.
I will add a drush command that allows an arbitrary path/wildcard/filename to be passed as attachment, a JSON file to be passed as payload, and credentials. The drush command will run all the needed JSONAPI calls, double encode, fetch responses and so on, so that ingesting new objects via the API is simpler and less convoluted.

Fix Solr Filters double quotes / Drupal Search API has not the same fancy needs we repo people have

What is this and why are you touching Drupal Solr Search API, Diego? It's already perfect!

Well yes, it is a good piece of complex software. But it is not perfect. Hold my beer here:

The story is this: when building/perfecting your Solr Search driven Views you may come to the realization that you want to use "wildcards" for a filter.

Here is one example:

  • I want to exclude on a certain field. Means only return Nodes where the field is not present. It's not about the value, it's about whether it's there or not. Solr does not keep a 1:1 field count for every Document.

  • Under that scenario let's say we want this:
    fq[0]=-mimetype:[* TO *] or the newer fq[0]=-mimetype:* if single value/Solr7/8

So what does Drupal View Conditional Filters do?
fq[0]=-mimetype:["*" TO "*"] or the newer fq[0]=-mimetype:"*" if single value/Solr7/8

Look closer (while holding my beer): do you see the double quotes? Yes. It double quotes everything. But in Solr * is a totally different thing than "*", and so our Repo/Existing use case is gone.

[image: debug output]

So.. how do we go on fixing this?

  • [x] Make a patch for the Drupal Search API Solr module and discuss/wait/iterate. We will, even if it takes forever, but also
  • Make a query alter in SBF with a toggle (add a config option in our Solr Config Form). The alter is quite simple: if the fq[] contains a single *, or a [something TO *], or a string value followed by a *, and after removing the * we do not need to escape anything (meaning the user was smart enough to escape), then we assume the * really means * (as it is supposed to; who searches for * as a string anyway? Who keeps a repo of stars and asterisks that need to be string matched...!!!) OR
  • Super fancy: I code a new Filter Plugin for this use case that allows all the Solr-specific filters to work (there are many more than just *)
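The query alter from the second option can be approximated with two regular expressions, shown here as a sketch on plain strings. The function name is hypothetical, and "nothing needs escaping" is simplified to "the prefix contains only letters, digits and underscores".

```python
import re

# A quoted lone star: "*"
_QUOTED_STAR = re.compile(r'"\*"')
# A quoted simple prefix followed by a star: "straw*"
_QUOTED_PREFIX = re.compile(r'"([A-Za-z0-9_]+)\*"')


def unquote_wildcards(fq):
    """Remove the quotes Search API puts around Solr wildcards in a filter query."""
    fq = _QUOTED_STAR.sub("*", fq)
    return _QUOTED_PREFIX.sub(r"\1*", fq)
```

Applied to `-mimetype:["*" TO "*"]` this gives back `-mimetype:[* TO *]`, while a quoted phrase containing spaces is left alone because it would need escaping.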

@giancarlobi @alliomeria, or any other user lurking who has an opinion?
Also, great to have repositories with lots of data already running (even if only a few can be shared), because otherwise this might never have been noticed!

Provide a Vocabulary autocomplete widget

What is needed?

We will need this to allow other parts of Archipelago to read JSONPaths provided by our Vocabulary builder as input for QA/Find and Replace and for exposing properties via JSON Key Name Providers.

Notes

It should hopefully also allow restricting the selection to only "leaf" elements of the whole vocabulary hierarchy if needed.

Make Title setter Event Subscriber more robust

Related to #55

What is needed?

Seems like the title setter has some edge conditions. When the element in the webform that sets the title is hidden but the main one is not, we end up with our generic default title, because on a new element we force the setting of a new title. Bad.

The fix is simple, just add some checks. I'm also moving from $entity->getTitle() to $entity->label(), just in case. Will make the branch and test around a little bit.

Expose individual Field properties from JSON to Drupal Views natively

@Favenzio @bryjbrown I started serious research on this last week. Since the Solr proof of concept of this is working and the new JSON flattener options are quite appealing (and can also be tested using this GIST), the next natural step, and the one you are expecting, is to mix traditional Views exposed fields and Strawberryfield internal properties coming from JSON.

So, here is the way (TUT), and it is as Drupal as it gets. So no fear!

https://www.lullabot.com/articles/building-views-query-plugins-for-drupal-8-part-2

It is very simple code, but it requires some testing and debugging and also some SQL magic, which basically means I need you/me to explore this with real data.

The ideas are

I will probably take over this after DLF2018 but if you are up to experimenting a bit, please feel free to share your ideas and thoughts.

This is totally not urgent since Solr Views implementation is working fine and in 70% of the cases it will be faster, but this approach has the benefit of the hierarchies. Means we could simply add an argument to the field that is in the shape of a property path like [@graph.*.name] and then join with others inside the same JSON strawberryfield value.

Smart logic JSON Key providers to index cool stuff into Solr

What is that i'm proposing?

After my incursion this week into deploying a test IR with a lot of data, files, different media and, of course, IR needs (went well, so nice, learned a lot), I decided it's time to bring some extra logic into our JSON Key providers.
FYI: if you don't know what a JSON Key provider is, that is OK. It's a plugin system I wrote that allows us to dynamically expose internal data, keys and values from our SBF JSON to Drupal in a Drupal-native way. This allows Drupal to index into Solr, or expose to any other code like Tokens, all our deep, complex, evolving and changing JSON richness. And we have a few cool strategies, from simply "take this JSON KEY and make the value visible under this property" to querying the JSON using JMESPATH and joining many values from different places. OK, enough background (also ping to @aliomeria here, new on the block, time to subscribe to this repo).

Things i want

1.- Parser/logic processor. Basically one that allows data to be extracted via logic and returned as an arbitrary key. Why?
Let's say I have LoD People in my metadata. A lot of them. Some have different roles: some are students, others are Faculty, others are from a different place/institution. I want to have different facets so people can search/filter by Professors, or by students only. With an extra processor (Twig template again, but stricter and shorter; I can even limit the size of the template) I can make some decisions, and even do things like "Oh no, no student mentioned in the works, let's add an extra value to the facet that says 'No student was involved nor harmed'". Data that was never there, we just expose it to discovery. The archipelago dream come true. This code is actually simple.

2.- A chameleon processor. One that allows me to take a REAL Drupal field class (let's say it's the GEO one) and programmatically shove in data from our JSON and, wait for it, also programmatically shove the "complex data" type into the code. This allows us to make Drupal think we have data coming from one of those fields and makes community contributed code work with our chameleons. This is actually simpler than you think, since instead of making a JSON Key processor, I can create a Copyfield processor at the entity level. An issue I sometimes see in Drupal 8/9 is that most of the code people write is totally unaware of computed fields. I had to fix a few quite popular modules because everything is made only for the most common use case. Bad, bad coding.

See also #6 for my 3.: Entity casting/reference Fields. We use open semantics here, and we want every memberof, ispartof, etc., if it has either an ID or a UUID, to be cast as a Drupal entity. That way we can create deeper hierarchies and index the full paths into Solr.

@giancarlobi hope you around and all is well. Any ideas on this?

Entity UUID Converter for Routes that start with /do/

What?

All our ADOs are using yoursite.edu/do/uuid as canonical path to access instead of the non sense /node/1 thing. Right now that is being done via Aliases. We need and we want UUIDs , right now it seems easier to have path alias programmed to do that. We do it and it works. But! Path alias does not expand, existing aliased paths for subpaths.

So we need a final solution (never final i know) that deals with any Content Entity under the /do idea, because things like /do/uuid/metadata/iiifmanifest do not resolve automatically. Since our metadata exposed endpoints are dynamic (good!) we want to access them also via the uuid path.

I opened this originally inside here esmero/format_strawberryfield#25
but now i'm clear the right way of doing this is to create a Resolver Class/event subscriber that, based on certain conditions, upcasts UUIDs and converts them into loaded Entities. Something in the tone of https://git.drupalcode.org/project/jsonapi/blob/8.x-2.x/src/ParamConverter/EntityUuidConverter.php but simpler, just for us.

Working on that. Will take me more days but on the right track.
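A rough sketch of the first half of that resolver, in plain PHP with no Drupal classes: split a /do/{uuid}/... path into the UUID and the remaining subpath, so the UUID can later be upcast to a loaded entity. The function name sbf_split_do_path is hypothetical and only for illustration; the real thing would live in a ParamConverter or event subscriber.

```php
<?php
/**
 * Splits a path such as /do/{uuid}/metadata/iiifmanifest into its UUID and
 * trailing subpath. Hypothetical helper, not part of strawberryfield.
 *
 * @return array|null ['uuid' => string, 'subpath' => string], or NULL when
 *   the path does not start with /do/{uuid}.
 */
function sbf_split_do_path(string $path): ?array {
    $uuid = '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}';
    if (!preg_match('#^/do/(' . $uuid . ')(/.*)?$#i', $path, $matches)) {
        return NULL;
    }
    return [
        'uuid' => strtolower($matches[1]),
        // Empty string when the request targets the canonical /do/{uuid} path.
        'subpath' => $matches[2] ?? '',
    ];
}

$parts = sbf_split_do_path('/do/ac7fc929-a45e-4abc-9ff4-a6c35ec16c2f/metadata/iiifmanifest');
```

With the UUID isolated, the entity lookup and the dynamic metadata endpoint can be resolved separately, which is what path aliases alone cannot do for subpaths.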

500 Error When Saving Strawberry Field Settings

When going through the process of creating a content type with a strawberry field you get an error when saving the field settings "The website encountered an unexpected error. Please try again later." The Apache error log shows the following:

AH01071: Got error 'PHP message: Uncaught PHP Exception Symfony\Component\DependencyInjection\Exception\ServiceNotFoundException: "You have requested a non-existent service "serializer"." at /var/www/drupalvm/drupal/web/core/lib/Drupal/Component/DependencyInjection/Container.php line 151\n', referer: http://drupalvm.test/admin/structure/types/manage/digital_object/fields/node.digital_object.field_strawberry_field_test

Facepalm (the emoji). Strawberry Flavor Datasource is not pushing every Document it has to Solr

What? Well i made a mistake... look, enjoy my silliness

This method here is in charge of getting one SBFlavor Datasource ID, loading it and returning it so the Solr Index Tracker can push it.

e.g given this ID:
"strawberryfield_flavor_datasource/2006:1:en:ac7fc929-a45e-4abc-9ff4-a6c35ec16c2f:ocr"
The first part is the datasource, strawberryfield_flavor_datasource. The second part, after the '/', is the info:
- 2006: the originating NODE ID
- 1: the sequence. Can be a Page Number, order Number, etc. Logic defines how that is interpreted by a consumer.
- en: the language
- ac7fc929-a45e-4abc-9ff4-a6c35ec16c2f: the uuid of the originating File entity
- ocr: the processor label that created this index entry
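To make the ID layout concrete, here is a small sketch that splits such an ID into its named parts. The function name and the returned array keys are assumptions for illustration; only the datasource/nid:sequence:language:uuid:processor layout comes from the description above.

```php
<?php
/**
 * Parses a Strawberry Flavor datasource item ID into its components.
 * Hypothetical helper; returns NULL if the ID does not have the expected
 * "datasource/nid:sequence:langcode:file_uuid:processor" shape.
 */
function sbf_parse_flavor_id(string $combined_id): ?array {
    $halves = explode('/', $combined_id, 2);
    if (count($halves) !== 2) {
        return NULL;
    }
    $parts = explode(':', $halves[1]);
    if (count($parts) !== 5) {
        return NULL;
    }
    [$nid, $sequence, $langcode, $file_uuid, $processor] = $parts;
    return [
        'datasource' => $halves[0],
        'nid' => (int) $nid,
        'sequence' => (int) $sequence,
        'langcode' => $langcode,
        'file_uuid' => $file_uuid,
        'processor' => $processor,
    ];
}

$info = sbf_parse_flavor_id(
    'strawberryfield_flavor_datasource/2006:1:en:ac7fc929-a45e-4abc-9ff4-a6c35ec16c2f:ocr'
);
```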

So what did I do wrong? Other than coding late at night and testing with a single Sequence and a single Processor, most of it is correct, but the failure is:

For a given set of pushed Strawberry Flavor Datasource Items I want to index

  • I get all the different Node ids (good)
  • I iterate over all Node ids to see if they exist (good, its a loadMultiple so ones that are not there simply get excluded) (still good)
  • For each Node I found i generate a single Solr document (what?) (not good) 🤦 (also why is my facepalm blonde...mmm.. I do not like this)

What I should have done (if not always falling asleep)

  • I get all the different Node ids (good)
  • I iterate over all Node ids to see if they exist (good, its a loadMultiple so ones that are not there simply get excluded) (good as I already explained)
  • For each Node, go back to my original structure and push everything that belongs to that node into the Tracker. Can be many per Node.
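The corrected steps can be sketched in plain PHP like this. It assumes the item IDs are the part after the '/' (nid:sequence:langcode:uuid:processor) and that the list of existing node ids already came back from something like loadMultiple; the function name is made up for the example and this is not the actual datasource code.

```php
<?php
/**
 * Returns every requested item ID whose originating node still exists.
 * Many items (pages, processors) can share one node, so all are kept.
 * Hypothetical helper for illustration only.
 *
 * @param string[] $item_ids e.g. "2006:1:en:{uuid}:ocr"
 * @param int[] $existing_nids Node ids that were successfully loaded.
 */
function sbf_items_for_existing_nodes(array $item_ids, array $existing_nids): array {
    // Group the original items by their node id (first segment).
    $by_node = [];
    foreach ($item_ids as $item_id) {
        $nid = (int) explode(':', $item_id, 2)[0];
        $by_node[$nid][] = $item_id;
    }
    // For each existing node, push everything that belongs to it.
    $result = [];
    foreach ($existing_nids as $nid) {
        foreach ($by_node[$nid] ?? [] as $item_id) {
            $result[] = $item_id;
        }
    }
    return $result;
}

$items = sbf_items_for_existing_nodes(
    ['2006:1:en:aaa:ocr', '2006:2:en:aaa:ocr', '2007:1:en:bbb:ocr'],
    [2006]
);
```

Node 2007 is gone, so its item is dropped, while both sequences of node 2006 survive instead of just one.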

Fix is simple but I would love to see also more work on this custom Datasource and make it more failsafe. We may need to push into the Solr index the Processor Plugin ID, the Processor Label and also a differentiated body (means HOCR will push the miniOCR into the special OCR field). To make sure all matches, that field should be named the same as the Processor plugin id, in this case:

* @StrawberryRunnersPostProcessor(
 *    id = "ocr",

or the Binary, or the Warc one, etc.

Fix coming because all my goodness, I can not be so silly and code like this!
@giancarlobi @alliomeria please forgive me!

Make file structure creator and persister more robust, smarter and faster

Not robust enough?

Actually it's quite robust, but while upgrading an old archipelago (0.9 ALPHA) to Beta2 i found some things i did not like (and i used to like, or felt i could push into the future, but the future is here!) and that i feel could be better:

Right now the as: structure generator is being triggered by the webform handler. That kinda defeats the purpose of all the other Event Driven Subscribers we have:

So what do we need?

A new pre-save subscriber that does exactly, or almost exactly, the same what the webform handler does:

We already generate this structure in the JSON

"ap:entitymapping": {
        "entity:file": [
            "images",
            "documents",
            "audios",
            "videos",
            "models"
        ]
    }

Means we know which keys will contain file ids.
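As a sketch of how a pre-save subscriber could use that mapping, the snippet below reads ap:entitymapping from the decoded JSON and collects the file ids found under the listed keys. It assumes each of those keys holds a list of numeric file ids; the function name is hypothetical and the real subscriber would of course do more than collect ids.

```php
<?php
/**
 * Collects unique file entity ids from the keys declared under
 * "ap:entitymapping" => "entity:file". Hypothetical helper.
 *
 * @param array $json Decoded strawberryfield JSON.
 * @return int[] Unique file ids, in order of first appearance.
 */
function sbf_collect_file_ids(array $json): array {
    $keys = $json['ap:entitymapping']['entity:file'] ?? [];
    $ids = [];
    foreach ($keys as $key) {
        // A declared key may be missing or hold a single value.
        foreach ((array) ($json[$key] ?? []) as $id) {
            if (is_numeric($id)) {
                $ids[(int) $id] = (int) $id;
            }
        }
    }
    return array_values($ids);
}

$file_ids = sbf_collect_file_ids([
    'images' => [5, 7],
    'documents' => [7, 12],
    'ap:entitymapping' => [
        'entity:file' => ['images', 'documents', 'audios'],
    ],
]);
```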

Pseudo logic i imagine is:

Make as:filetype structure generation better and file persister way more failsafe

Some things i don't like:
The file persister service https://github.com/esmero/strawberryfield/blob/8.x-1.0-beta2/src/StrawberryfieldFilePersisterService.php#L155 generates a new destination URI, normalized by us, for files that have no as: structure yet. This has some benefits but can also be weird.

  • PROS: good to normalize, allows us to get also a new URL for old objects if we simply remove all existing as:image etc from the JSON. Also good and fast for new Files and Objects.
  • CONS: we are not checking whether the file we want to give a new URL is already being used, and referenced by, e.g, another ADO. So we need to actually check the usage. This should really only run for temporary files or files that are only in use by a single ADO. For other use cases we should build a batch processor that updates all nodes referencing a given file. In those cases the only thing we need to update is the url json key of each file inside an as:something structure.
  • CONS: on actual file persisting, we are limiting the file moving to only temporary files. That defeats our own logic. On one side we rename/give the file a consistent new name, on the other side, if the file is already in use by the same or another ADO, we simply skip it. That could be cool if we were also giving the as:structure url key back the real path, but we keep ours. MEANS: we end with a wrong url in the JSON, that does not match where the file actually is saved. Bad. IF we remove the check for temporary then the file would actually be moved to the new URL (expensive but consistent), but then we would still have to deal with the fact that it could be in use by another ADO, and in that case moving would be a bad idea.

Make faster code

All this has a lot of iterations, etc. It's not a terrible thing, i mean 2000 files? foreach 2000 times. You won't have 1000 users ingesting at the same time, so performance is still OK. But i would like to save some CPU cycles and group expensive operations together. I also would like to make some of the entity queries happen on sets smaller than that. All that optimization would/needs to happen in the file persister service https://github.com/esmero/strawberryfield/blob/8.x-1.0-beta2/src/StrawberryfieldFilePersisterService.php

That is where we fetch all files, load them, classify them and route them into their as:structures, we read from what is already there to avoid md5 on files, etc.
Also, we could move logic that is for too many objects into a last pass on entity save (could be a batch process) and the threshold could be variable based on file size.

This code is complex (i have to admit i felt some WOW + pity for myself while reading it ☺️) and tests will need to be thorough. Maybe time to test a 5000 files ADO and see how this performs. NOTICE: not saying 5000 files are a good idea.

@giancarlobi @marlo-longley pinging you since this is going to be good. Will require a lot of saving, messing up and writing some unit test too.

Enhance ADO Deposit. Add a JSON:API representation and also Revisions

Related to #37

Currently we deposit only one version of the Object. And we deposit RAW JSON and a full entity serialization.

  • Let's deposit every revision (and add a setting where people can decide if they want to delete these files when removing ADOs or not). It's just a matter of using \Drupal\strawberryfield\EventSubscriber\StrawberryfieldEventInsertSubscriberDepositDO in a class that extends \Drupal\strawberryfield\EventSubscriber\StrawberryfieldEventNewrevisionSubscriber and adding the Revision id to the file name. As simple as that.

  • But most important, use the JSON:API extras approach to call the JSON:API serializer directly. See
    https://git.drupalcode.org/project/jsonapi_extras/blob/8.x-3.x/src/EntityToJsonApi.php
    Basically it's just a wrapper, but a handy one that returns a RAW string. We can use that module (and make SBF depend on JSON:API and this one too, because, well, we like it and see that D9/10 will go that route fully also)

This is a super easy task, in case someone wants it.

Create config for the Solr field used in Type/View Mode mapping

What is needed?

Right now we can configure ADO by Type to map onto View Modes in Drupal.
See: /admin/config/archipelago/viewmode_mapping in an Archipelago instance.

The Solr field for Type is currently fixed/hardcoded.
See here.

Tasks:

  1. Make a config form that allows user to set which Solr Server, Index, and then Field will be used in the ADO Type/View Mode mapping.
  2. And then change the ViewModeMappingSettingsForm to draw from this config, rather than hardcode.

Notes

I need to think about naming for this. It's kind of clunky. I started using "Type to Solr Field" for the name of the form which probably isn't clear.

Error When Creating New Strawberry Field

I installed Strawberry Field Module in a brand new clean Drupal installation. It enabled without a problem but, when I tried to create a content type with a Strawberry field, an error was thrown:

"There was a problem creating field test_strawberry_field: Exception thrown while performing a schema update. SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'JSON NOT NULL, PRIMARY KEY (entity_id, deleted, delta, langcode), INDE' at line 8: CREATE TABLE {node__field_test_strawberry_field} ( bundle VARCHAR(128) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance', deleted TINYINT NOT NULL DEFAULT 0 COMMENT 'A boolean indicating whether this data item has been deleted', entity_id INT unsigned NOT NULL COMMENT 'The entity id this data is attached to', revision_id INT unsigned NOT NULL COMMENT 'The entity revision id this data is attached to', langcode VARCHAR(32) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL DEFAULT '' COMMENT 'The language code for this data item.', delta INT unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields', field_test_strawberry_field_value JSON NOT NULL, PRIMARY KEY (entity_id, deleted, delta, langcode), INDEX bundle (bundle), INDEX revision_id (revision_id) ) ENGINE = InnoDB DEFAULT CHARACTER SET utf8mb4 COMMENT 'Data storage for node field field_test_strawberry_field.'; Array ( ) "

React to our SBF Entity Delete Event

What?

We dispatch all types of cool events so other modules or this one can react, modify and do things based on what happened, but we have been waiting for stability and me coming back from vacations (and the wait is over!) to actually start removing Drupal's tracking of referenced files (referenced inside a Strawberry field JSON). For those who don't know what that is: we track each file's usage inside a JSON via Drupal 8's file tracking capabilities. That way the same file can be used in many places (try the "clone" option for any digital object and you will see!) but also nobody can delete them as long as they are in use (safe, safe). Now we want to remove tracking when a Digital Object gets purged. Once a file is not tracked anymore, the next cron run will get rid of it.

How?

Simple as always. We extend one of our abstract classes that knows how to react to that delete event, we pass the deleted entity in the event, we check if there are files tracked for it, we remove them, then we do additional cleanup in case someone else is falsely tracking it too (Drupal's file managing is quite basic and prone to failure so we have to be double strict and double safe) and if there are false positives, we also remove those.

Pull coming.

show node Edit tab if user has access to any Form Mode

What is wrong?

Right now, if a user role doesn't have permissions for the default form mode, but does have access to another form mode, the Edit tab is not displayed when viewing a node.

For example, in a recent project, a "Contributor" role has access to the "Contributor" form mode, but not default. They can only see the "Delete" tab, but not "Edit" above a node.

How to fix

These tabs above the node are called local tasks in Drupal. See here:
https://www.drupal.org/docs/8/api/menu-api/providing-module-defined-local-tasks

Will create the following YML file: strawberryfield.links.task.yml and add code targeting nodes.
Another option is to use strawberryfield_menu_local_tasks_alter.

Question: should we exclude /allow exclusion of JSON elements in the automatic vocabulary generator?

Automatic Vocabulary generation is (in my opinion) the coolest (++factor) feature we have and is becoming almost 2 years old already. But, as cool as it is, we have not given it too much re-use across the stack.

Today while playing with EAD V3 import (XML to JSON) via that new Widget i wrote, i found myself producing this vocabulary:

[image: screenshot of the generated vocabulary]

Which, ok. Makes sense, but in strictness is not "our vocabulary" but a particular one of a particular ingest, and we could have quite a lot of different schemas. This also applies to EXIF.

So question is: do we add a form/setting so certain KEYS become excluded from the vocabulary and (hear me out here) also from the JSON KEY flattener? The one that would use too much memory to be useful if this goes too deep? I could exclude all the flv: prefixed vocabs, since EXIF tags are not THAT useful really in a vocab.

I know @giancarlobi understands how this works, wonder if @alliomeria knows this/has seen this vocab builder in the Archipelagos that are accessible to her user and has an opinion?

Ideas? Opinion? Questions?

ACL and Access: inheritance between arbitrary relayed ADOs/Nodes bearing a SBF

Good Morning

If you ever had time to read through my Roadmap feature list you will probably have noticed that i have planned ACL (access control list) integration on our Digital Objects in Archipelago. ACL is fundamental for our IR needs too. And it goes way over just nodes, but also files (yeah, let's not speak about media here, since we use file entities).

What Drupal provides with its users and permissions per roles is closer to something named RBAC (Role based access control), but we need more: fine grained and UI/UX-simple ways to apply rules, but also inheritance (and to be able to control how inheritance works and when it applies) that does not imply a batch operation to copy permissions from parent to child as e.g Islandora 7 does. That is expensive and not in our spirit.

I read this good post from @rosiel today during my coffee and i feel it speaks a little bit about how complex a permission system and its UI/UX in a hierarchical structure/Drupal can be, and how expectations from users differ. I wish i had that type of feedback here! So i will humbly borrow that post to get started.

What is my actual proposal.

Thanks to our own 'ahead' planning we have some things in place already to help with this. We lack UI of course because everyone hates to write Forms, but we will get there. I will enumerate my idea:

  • Every Archipelago Digital Object (means any Node bearing a Strawberryfield) will/can have an ACL.

    • OPTION 1: Attached/logically to the Same SBF but hidden from users. Filtering out keys from our main SBF is easy.
      • PROS: easier to manage for us, but requires field-wide hiding. Also permissions for the field under the Drupal scheme will apply to the ACL itself. Not a big issue since we can swap users on important access operations just to read/access the ACL. Also File deposit is easier, means
      • CONS: mixes metadata / and functional access/metadata. Not sure if i like that conceptually. Also, what if some funky use case implies 2+ Strawberryfields? We tell people they don't need it, but we can't tell them what to do. So, in case the ACL is INSIDE a SBF, it would only apply to that SBF. Not sure i like that.
    • OPTION 2: Either a dynamic/virtual field, a sub type of SBF field type or both (a virtual field of that type) that is attached to every Strawberry bearing Node (Archipelago Digital Object) and that contains and exposes the ACL. This has more potential. We already do this for the virtual dropbox files field. This would also allow us to code direct access to internal ACIs (access control item: each permission/control inside an ACL) as field properties.
      • PROS: at this level affects the whole ADO. I like that, and the ACL/ACI can target specific SBF field instances if there are many, and even deltas
      • CONS: Can make things a tiny little slower? Also requires more coding. We need widgets and formatters and permissions and probably a subclass for our Plugin driven SBF field sub type.
  • In any case, the body, the ACLs, are written and defined in JSON. Means we can reuse the same logic we use for loading/reading/formatting/parsing and making accessible (and yes, we could even use them in Solr as we do with metadata now)

  • We will use something named Entity Access Grants (actually Node Access Grants). Cool thing is that they are way more powerful than hookable permissions or fixed/in code permissions. The latter only trigger on/per Route (means you won't get the effect you want on, e.g a View, only on the canonical url of a Node, for context in our case at /do/uuid). We use relationships and autocompletes in many places (like collection membership or any other node to node relationship you can add/edit via some of our webform elements) and we need those decisions, of access and visibility, to apply there too. So this is how grants will work:

    • GRANTS apply to CRUD. The result is basically a YES/NO to CRUD operations. On each grant invocation/ we will evaluate two options:
      • Does the ADO contain its own ACL? Yes, parse and evaluate the conditions. Only evaluate for the current operation or for every operation? I need to think about that.
      • Does the ADO contain an ADO to ADO (NODE to NODE) relationship in its SBF JSON, and is that relationship (global config option!), let's say memberOf (see raw metadata here), in our list of 'ACL inheritance' enforced properties? If yes, read the inheritance chain, node by node upwards, until you find a root or no more inheritance. Cache this whole thing. Why cache? Since the ACL is not specific to the current Node, pretty sure the ACL evaluation can be reused for any other member in the same situation. Cache gets cleared on any Chained Node change (because we can tag the cache using the traversed node keys). What if the Node is member of many? We can do some config there too. Define if inheritance will use the first relationship, the last one (JSON, so we can use the order!) or all. Of course all is more expensive.
    • Mix/Overlay all accumulated Evaluations for the current Operation and current Node (this could even be done in the previous step to cache the chained ones that are inherited, anyway..). Respond/ YES/NO to the operation. Done.
  • Finally, make an UI/UX to build an ACL.

    • I want to add also specific to Archipelago permissions in the chain, that are not! part of the Node Grant System evaluation, but will be used to define /read access to particular metadata/JSON elements. Means we can hide (e.g embargo) JSON keys and hide them from certain users if needed.
  • I also want some global configs here. Like which properties (ismemberof, partOf, relatedTo) are considered ACL inheritance. This also allows us to have specific to ACL only properties. Like 'inheritsACL'. Not bad? But i also don't know if i would set that as default, feels like reusing Semantic ones could make more sense.
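To picture the inheritance walk, here is a very reduced sketch in plain PHP. Everything about the data shape is an assumption made for the example: nodes are an in-memory array keyed by uuid, each may carry an 'acl' entry and a list of parents under the configured relationship property, and only the "first relationship" strategy is shown. The loop stops at a root, at an unknown node, or when it detects a cycle.

```php
<?php
/**
 * Walks up a node to node relationship and collects the ACLs found on the
 * way, nearest node first. Hypothetical sketch; not the Beta3 code.
 *
 * @param string $uuid Starting node uuid.
 * @param array $nodes Map of uuid => ['acl' => array, $property => string[]].
 * @param string $property Relationship key treated as ACL inheritance.
 * @return array Map of uuid => ACL for every node in the chain that has one.
 */
function sbf_acl_chain(string $uuid, array $nodes, string $property = 'ismemberof'): array {
    $chain = [];
    $visited = [];
    while (isset($nodes[$uuid]) && !isset($visited[$uuid])) {
        $visited[$uuid] = TRUE;
        if (!empty($nodes[$uuid]['acl'])) {
            $chain[$uuid] = $nodes[$uuid]['acl'];
        }
        $parents = (array) ($nodes[$uuid][$property] ?? []);
        if (!$parents) {
            // Reached a root: no more inheritance.
            break;
        }
        // "First relationship" strategy: JSON order decides.
        $uuid = (string) reset($parents);
    }
    return $chain;
}

$nodes = [
    'a' => ['ismemberof' => ['b'], 'acl' => ['read' => ['public']]],
    'b' => ['ismemberof' => ['c']],
    'c' => ['acl' => ['read' => ['staff']]],
];
$chain = sbf_acl_chain('a', $nodes);
```

The keys of the returned chain are exactly the traversed nodes that contributed an ACL, which is also the kind of list one could use to tag a cache entry.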

Caching here is quite important, since a deep nested evaluation will be expensive, and some taxonomy based systems (which are differently labeled RBAC implementations) already make accessing a Node super slow and also extremely deployment specific.

Also related to this post (crickets i know, but i got some shy private messages... gosh) https://groups.google.com/forum/#!topic/archipelago-commons/MQHUxU3_9wA

Who can be interested?

@giancarlobi ping! @rosiel 😊 hopefully my mention here does not add stress to your work. My intent is the opposite, i feel there could be some ideas here you could find useful for your community project work but also because i highly respect and appreciate your comments and feedback. If you feel this is not your thing and you don't want any mentions i can edit the post and remove those. In any case thanks!

Notes and comments

Side note: This work goes into Beta3. I already started with the Node access grants and a quite simplistic demo ACL (for now fixed during testing) until we agree where the ACLs will be saved and what cool operations we want to allow.

Additional resources: This is a good example of how S3 uses ACLs. Since we do S3 everywhere and Min.io uses them too, we can also reuse learning curves. Please look at the JSON examples

  • https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html
    You will also see that the 'resource' key (which means what is affected by an ACL) is a property path... who said property path?? more over, who said JSON PATH?
    "Resource":["arn:aws:s3:::examplebucket/*"],
    Means we can even do that. Use the property path for a given entity:node:uuid:descriptive_metadata:someproperty as we use in Drupal 8/9 typed data and limit access to tiny little pieces in a single string. Not sure i will be able to code this for Beta3, but i can try.

Drush command needs to escape JSON properly for CURL

What the issue says. JSON (this is metadata, diego boy) needs to be properly escaped before curl can use it as --data. It was working like 98% of the time except for that edge case... and here we are. Delaying release on a Monday at 22:42. Get a life!
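A minimal sketch of the idea, not the actual Drush command: encode the metadata once with json_encode and let escapeshellarg quote every argument before the command string is built. The function name and the example URL are placeholders made up for illustration.

```php
<?php
/**
 * Builds a curl command line with the JSON body safely shell-quoted.
 * Hypothetical helper; the real command may pass more headers/options.
 */
function sbf_curl_command(string $url, array $metadata): string {
    // Throw instead of silently sending "false" when encoding fails.
    $json = json_encode($metadata, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    return sprintf(
        'curl -X POST -H %s --data %s %s',
        escapeshellarg('Content-Type: application/json'),
        escapeshellarg($json),
        escapeshellarg($url)
    );
}

// The apostrophe is the kind of edge case that breaks naive quoting.
$command = sbf_curl_command(
    'http://localhost:8001/example-endpoint',
    ['label' => "Diego's metadata", 'type' => 'Photograph']
);
```

Quoting with escapeshellarg instead of hand-written quotes is what covers the remaining 2% of cases: quotes, dollar signs and backticks inside metadata values.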

@giancarlobi this is the last piece of this and i will assume beta3 is done. All works, ingest of archipelago-recyclables via DRUSH works and i have a TON of configs to share for deployment. On it.

Create a Pass through Serializer/Normalizer

Use Case

When outputting via a REST view the raw JSON content of a strawberry field, or when using a Metadata Display based on Twig for the same type of field, Drupal double encodes and normalizes the content. This is OK if the expected output is the whole, double encoded value (for sharing?) but limits us in building new and exciting apps, like OAI-PMH, or even a simple IIIF manifest using views.

Problem

Drupal serializers are only used to simple text values or typed data item lists; strawberry field, even though it can expose that type of data through its multiple properties, also has a single ->value element containing a value that is already in JSON format. To be true to the D8 serialization workflow, we need to add a new one that allows a passthrough and some mangling for our already in JSON value. We also need a way of totally passing through serialization if we want to allow field formatters to do that for us, which gives us huge flexibility, like building full nested responses in any shape we want instead of depending on D8's perception of what data should look like.

Solution

1.- Handle strawberry fields normalizing as a new service attached to our class.
See https://www.drupal.org/docs/8/api/serialization-api/changing-the-way-serializer-handles-entities
2.- Allow a JSON passthrough serializer/normalizer and probably a new views display plugin extending RestExport, able to deliver our rendered (via formatter) field without any interference. The idea here is to allow our field formatters to decide on the desired format and the exposed HTTP Content-Type header.

By doing so we can truly extend D8's data exposing capabilities without coding.

XML to JSON Serialization

Use Case

Import existing XML metadata (EAD, MODS, etc) into a native JSON format for strawberryfield. This can be handy when dealing with external sources of migrations where we want to maintain existing data/schemas but cast into a more general JSON format to allow our webform system (https://github.com/esmero/webform_strawberryfield) to handle further editing/creation.

Problem

Given a simple XML like

<?xml version='1.0' standalone='yes'?>
<archdesc localtype="inventory" level="subgrp">
<did>
<head>Overview of the Records</head>
<repository label="Repository:">
<corpname>
<part>Minnesota Historical Society</part>
</corpname>
</repository>
<origination label="Creator:">
<corpname>
<part>Minnesota. Game and Fish Department</part>
</corpname>
</origination>
<unittitle label="Title:">Game laws violation records,</unittitle>
<unitdate label="Dates:">1908-1928</unitdate>
<abstract label="Abstract:">Records of prosecutions for and seizures of property resulting from violation of the state's hunting and fishing laws.</abstract>
<physdesc label="Quantity:">2.25 cu. ft. (7 v. and 1 folder in 3 boxes)</physdesc>
<physloc label="Location:">See Detailed Description section for box location</physloc>
</did>
</archdesc>

A PHP snippet of code like

$xml = simplexml_load_string($ead);
$json = json_encode($xml);
$array = json_decode($json,TRUE);

Would easily deal with XML to JSON and, if needed, to Array casting.

But:

For XML elements with @attributes and text values, the JSON serializer will discard the attributes totally, ending in an array like

[unittitle] => Game laws violation records,
[unitdate] => 1908-1928

Solution

Deal with JSON serialization in the same way JSON-LD does, using the @value key for the actual text value and a custom @attribute key, or even a @type key with a mapping @context that helps bring non semantic elements, coming from an XML schema, into a local context.

This implies:
1.- Build a decorator class for the JSON Serialization
2.- Subclass Simple XML Element Class
3.- Build a Composer aware PHP Library we can include in Strawberryfield

Potential Code and Discussion

This is a great way of dealing with XML and integrating our own code. This would allow us to also accommodate files already processed by other systems (migrate) or even be fed by external APIs and then cast via Twig to visualizations, index in our Solr, etc.

/**
 * Class JsonLDSimpleXMLElementDecorator
 *
 * Implement JsonSerializable for SimpleXMLElement as a Decorator with JSON-LD syntax
 */
class JsonLDSimpleXMLElementDecorator implements JsonSerializable
{
    const DEF_DEPTH = 512;
 
    // '@value' (not '@text') is the key useValue() sets and jsonSerialize() reads.
    private $options = ['@attributes' => TRUE, '@value' => TRUE, 'depth' => self::DEF_DEPTH];
 
    /**
     * @var SimpleXMLElement
     */
    private $subject;
 
    public function __construct(SimpleXMLElement $element, $useAttributes = TRUE, $useValue = TRUE, $depth = self::DEF_DEPTH) {
 
        $this->subject = $element;
 
        if (!is_null($useAttributes)) {
            $this->useAttributes($useAttributes);
        }
        if (!is_null($useValue)) {
            $this->useValue($useValue);
        }
        if (!is_null($depth)) {
            $this->setDepth($depth);
        }
    }
 
    public function useAttributes($bool) {
        $this->options['@attributes'] = (bool)$bool;
    }
 
    public function useValue($bool) {
        $this->options['@value'] = (bool)$bool;
    }
 
    public function setDepth($depth) {
        $this->options['depth'] = (int)max(0, $depth);
    }

    /**
     * Specify data which should be serialized to JSON
     *
     * @return mixed data which can be serialized by json_encode.
     */
    #[\ReturnTypeWillChange]
    public function jsonSerialize() {
        $subject = $this->subject;
 
        $array = array();
 
        // json encode attributes if any.
        if ($this->options['@attributes']) {
            if ($attributes = $subject->attributes()) {
                $array['@attributes'] = array_map('strval', iterator_to_array($attributes));
            }
        }
 
        // traverse into children if applicable
        $children      = $subject;
        $this->options = (array)$this->options;
        $depth         = $this->options['depth'] - 1;
        if ($depth <= 0) {
            $children = [];
        }
 
        // json encode child elements if any. group on duplicate names as an array.
        foreach ($children as $name => $element) {
            /* @var SimpleXMLElement $element */
            $decorator          = new self($element);
            $decorator->options = ['depth' => $depth] + $this->options;
 
            if (isset($array[$name])) {
                if (!is_array($array[$name])) {
                    $array[$name] = [$array[$name]];
                }
                $array[$name][] = $decorator;
            } else {
                $array[$name] = $decorator;
            }
        }
 
        // json encode non-whitespace element simplexml text values.
        $text = trim($subject);
        if (strlen($text)) {
            if ($array) {
                $this->options['@value'] && $array['@value'] = $text;
            } else {
                $array = $text;
            }
        }
 
        // return empty elements as NULL (self-closing or empty tags)
        if (!$array) {
            $array = NULL;
        }
 
        return $array;
    }
}

Use would be

$xml = new SimpleXMLElement($ead);
$xml = new JsonLDSimpleXMLElementDecorator($xml, TRUE, TRUE, 3);
echo json_encode($xml, JSON_PRETTY_PRINT), "\n";

This code is adapted (a few single line changes really) from https://hakre.wordpress.com/2013/07/10/simplexml-and-json-encode-in-php-part-iii-and-end/ and it's pretty cool!

Webform integration

This will require that form elements allow/read/write the @attribute element, which can be generalized by the use of the custom JSON properties each Webform element can/could have.

Create Strawberry Field Documentation

At our NYC summit we came up with a set of requirements for Strawberry Field:

  • Ability to Add a Field of type "Strawberry Field" to Content Type.
  • Make JSON elements available to other Drupal modules.
  • Allow Admin users to specify for each instance of a Strawberry Field (field settings) alternate representations of JSON data.
  • Make alternate representations of JSON (JSON-LD) available to other modules.
  • Ability for Admin users to choose how to edit field:
    • Form user submitted from
    • "omni" form: All values hydrated
    • Raw JSON (indented/color coded would be nice)

At some point we will need to formalize these requirements and create documentation for module usage.

Expand Basic Metadata extracted for Files, in specific PDF and Clean EXIF History

What, more file mangling? yes!

This is a continuation of #86 and #87 which was merged.

The need

Better understanding of what is inside a PDF

Right now we are just getting general PDFinfo (single first page), which means in our metadata we only keep the number of Pages (good) and, if at all, a single page Dimension. Not cool for Rare books or complex displays in general, and too simplistic to be honest when dealing with a IIIF Manifest generation we want to work on Mirador and the Book reader, since our implementation (also simplistic) of https://github.com/mozilla/pdf.js is a bit slow on super large PDFs.
👀 @tomadams re: your email today

Solution. Simple. Get more Metadata. How?

Run PDF Info twice:
1.- get the pages as we do now
2.- then use the -f and -l arguments to get all the dimensions for all pages. Store that into an array and add to the JSON. 1000 pages, 1000 entries? May need to think about that but seems feasible, but could also go directly into SOLR same way we expect Text extraction, HOCR and entity extraction would happen per page (one Solr doc per page).
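A sketch of step 2, parsing only: it assumes pdfinfo, when called with -f and -l, prints one line per page in the form "Page N size: W x H pts", and the sample output below is illustrative text typed by hand, not captured from a real run. The function name is hypothetical.

```php
<?php
/**
 * Extracts per-page dimensions (in points) from pdfinfo text output.
 * Hypothetical helper; assumes "Page N size: W x H pts" lines.
 *
 * @return array Map of page number => ['width' => float, 'height' => float].
 */
function sbf_parse_pdfinfo_page_sizes(string $output): array {
    $sizes = [];
    $pattern = '/^Page\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts/m';
    if (preg_match_all($pattern, $output, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            $sizes[(int) $match[1]] = [
                'width' => (float) $match[2],
                'height' => (float) $match[3],
            ];
        }
    }
    return $sizes;
}

// Illustrative sample of what "pdfinfo -f 1 -l 2 file.pdf" might print.
$sample = "Pages:          2\n"
    . "Page    1 size: 612 x 792 pts (letter)\n"
    . "Page    1 rot:  0\n"
    . "Page    2 size: 595.28 x 841.89 pts (A4)\n"
    . "Page    2 rot:  0\n";
$sizes = sbf_parse_pdfinfo_page_sizes($sample);
```

The resulting array is keyed by page number, so it can go into the JSON as-is or be split into one Solr document per page.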

Use that data in the manifest and also rewrite our manifests. The one we have in play.archipelago.nyc passes the IIIF V2 tests correctly; we need the same for IIIF V3.

Clean EXIF

Ok, still confused about this. @alliomeria may know better. Will put two Examples here: first one clean EXIF

https://play.archipelago.nyc/do/f4a4c6ee-4ce9-4b4c-8704-e8057bad0a7d

{
 "flv:exif": {
                "ISO": 100,
                "Flash": "No Flash",
                "Model": "RICOH THETA S",
                "Aperture": 2,
                "FileSize": "2.8 MB",
                "MIMEType": "image\/jpeg",
                "ImageSize": "5376x2688",
                "Sharpness": "Normal",
                "ColorSpace": "sRGB",
                "ImageWidth": 5376,
                "XMPToolkit": "RICOH THETA for iOS 2.14.0",
                "FocalLength": "1.3 mm",
                "ImageHeight": 2688,
                "GPSVersionID": "2.3.0.0",
                "MeteringMode": "Multi-segment",
                "ShutterSpeed": "1\/6400",
                "WhiteBalance": "Auto",
                "ProjectionType": "equirectangular",
                "GPSImgDirection": 270,
                "PoseRollDegrees": 0,
                "DateTimeOriginal": "2020:07:02 17:25:15",
                "PosePitchDegrees": 0,
                "UsePanoramaViewer": true,
                "GPSImgDirectionRef": "True North",
                "PoseHeadingDegrees": 0,
                "FullPanoWidthPixels": 5376,
                "CroppedAreaTopPixels": 0,
                "ExposureCompensation": 0,
                "FullPanoHeightPixels": 2688,
                "CroppedAreaLeftPixels": 0,
                "CroppedAreaImageWidthPixels": 5376,
                "CroppedAreaImageHeightPixels": 2688
            }
}

Unclean (see the duplication caused by the change history in this second example, a PDF)
http://ec2-184-73-148-144.compute-1.amazonaws.com/do/018744ea-1d99-4d71-bd93-6cd402a82d74

{
 "flv:exif": {
                "Title": "Basic RGB",
                "Format": "application\/pdf",
                "NPages": 1,
                "FileSize": "1934 kB",
                "FontFace": [
                    "Regular",
                    "Regular"
                ],
                "FontName": [
                    "BebasNeue-Regular",
                    "MyriadPro-Regular"
                ],
                "FontType": [
                    "Open Type",
                    "Open Type"
                ],
                "MIMEType": "application\/pdf",
                "Producer": "Adobe PDF library 10.01",
                "PageCount": 1,
                "CreateDate": "2019:11:01 17:16:50-04:00",
                "DocumentID": "xmp.did:e88490b4-4350-2243-9e6a-e0e8a9092ec9",
                "FontFamily": [
                    "Bebas Neue",
                    "Myriad Pro"
                ],
                "InstanceID": "uuid:fac62424-48d5-4b85-84fd-49beb49d517c",
                "Linearized": "No",
                "ModifyDate": "2019:11:01 21:58:38-04:00",
                "PDFVersion": 1.5,
                "PlateNames": [
                    "Cyan",
                    "Magenta",
                    "Yellow",
                    "Black"
                ],
                "XMPToolkit": "Adobe XMP Core 5.6-c145 79.163499, 2018\/08\/13-16:40:22        ",
                "CreatorTool": "Adobe Illustrator CC 23.1 (Windows)",
                "FontVersion": [
                    "Version 2.000;PS 002.000;hotconv 1.0.88;makeotf.lib2.5.64775",
                    "Version 2.106;PS 2.000;hotconv 1.0.70;makeotf.lib2.5.58329"
                ],
                "HistoryWhen": [
                    "2019:10:30 12:21:32-04:00",
                    "2019:11:01 17:16:51-04:00"
                ],
                "FontFileName": [
                    13407,
                    "MyriadPro-Regular.otf"
                ],
                "MaxPageSizeH": 28,
                "MaxPageSizeW": 42,
                "MetadataDate": "2019:11:01 21:58:38-04:00",
                "FontComposite": [
                    false,
                    false
                ],
                "HistoryAction": [
                    "saved",
                    "saved"
                ],
                "CreatorVersion": 23,
                "HistoryChanged": [
                    "\/",
                    "\/"
                ],
                "RenditionClass": "proof:pdf",
                "StartupProfile": "Basic RGB",
                "ThumbnailWidth": 256,
                "MaxPageSizeUnit": "Inches",
                "SwatchGroupName": [
                    "Default Swatch Group",
                    "Cold",
                    "Grays"
                ],
                "SwatchGroupType": [
                    0,
                    0,
                    0
                ],
                "ThumbnailFormat": "JPEG",
                "ThumbnailHeight": 172,
                "ContainerVersion": 11,
                "ManifestLinkForm": [
                    "EmbedByReference",
                    "EmbedByReference"
                ],
                "HistoryInstanceID": [
                    "xmp.iid:615189d1-95dc-e64c-b838-2a31d901c875",
                    "xmp.iid:e88490b4-4350-2243-9e6a-e0e8a9092ec9"
                ],
                "SwatchColorantRed": [
                    255,
                    0,
                    255,
                    255,
                    0,
                    0,
                    0,
                    255,
                    192,
                    236,
                    240,
                    246,
                    250,
                    251,
                    216,
                    139,
                    57,
                    0,
                    0,
                    34,
                    0,
                    41,
                    0,
                    46,
                    27,
                    102,
                    146,
                    157,
                    211,
                    236,
                    198,
                    152,
                    115,
                    83,
                    197,
                    165,
                    139,
                    117,
                    96,
                    66,
                    101,
                    130,
                    185,
                    0,
                    26,
                    51,
                    77,
                    102,
                    128,
                    152,
                    178,
                    203,
                    229,
                    241
                ],
                "EmbeddedImageWidth": 381,
                "OriginalDocumentID": "uuid:9E3E5C9A8C81DB118734DB58FDDE4BA7",
                "SwatchColorantBlue": [
                    255,
                    0,
                    0,
                    0,
                    0,
                    255,
                    255,
                    255,
                    45,
                    36,
                    36,
                    30,
                    59,
                    33,
                    33,
                    63,
                    74,
                    69,
                    55,
                    115,
                    156,
                    225,
                    187,
                    145,
                    100,
                    144,
                    142,
                    93,
                    90,
                    121,
                    152,
                    117,
                    87,
                    65,
                    109,
                    82,
                    57,
                    36,
                    19,
                    11,
                    207,
                    196,
                    200,
                    0,
                    26,
                    51,
                    77,
                    102,
                    128,
                    152,
                    178,
                    203,
                    229,
                    241
                ],
                "SwatchColorantMode": [
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB",
                    "RGB"
                ],
                "SwatchColorantType": [
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS",
                    "PROCESS"
                ],
                "EmbeddedImageFilter": "FlateDecode",
                "EmbeddedImageHeight": 602,
                "HasVisibleOverprint": false,
                "IngredientsFilePath": [
                    "C:\\Users\\clee4\\Downloads\\mora_2018_figures\\Photos to Share\\CAD Images\\20191101-DSC00067.jpg",
                    "C:\\Users\\clee4\\Downloads\\mora_2018_figures\\Photos to Share\\Poster and Paper Figures\\coral-reef-drawing-10.png"
                ],
                "SwatchColorantGreen": [
                    255,
                    0,
                    0,
                    255,
                    255,
                    255,
                    0,
                    0,
                    39,
                    28,
                    90,
                    146,
                    175,
                    237,
                    223,
                    197,
                    180,
                    145,
                    104,
                    180,
                    168,
                    170,
                    113,
                    49,
                    20,
                    45,
                    39,
                    0,
                    20,
                    30,
                    177,
                    133,
                    99,
                    71,
                    155,
                    124,
                    98,
                    76,
                    56,
                    33,
                    199,
                    138,
                    154,
                    0,
                    26,
                    51,
                    77,
                    102,
                    128,
                    152,
                    178,
                    203,
                    229,
                    241
                ],
                "HistorySoftwareAgent": [
                    "Adobe Illustrator CC 22.1 (Windows)",
                    "Adobe Illustrator CC 23.1 (Windows)"
                ],
                "DerivedFromDocumentID": "xmp.did:f6f8d79a-3268-9b47-b8ca-5ae8fc53d04a",
                "DerivedFromInstanceID": "xmp.iid:f6f8d79a-3268-9b47-b8ca-5ae8fc53d04a",
                "IngredientsDocumentID": [
                    "xmp.did:24918461-5358-463f-8f02-8a25bbc0f753",
                    "adobe:docid:photoshop:80092d49-6ba7-f649-b041-a6d0af913b7f"
                ],
                "IngredientsInstanceID": [
                    "xmp.iid:24918461-5358-463f-8f02-8a25bbc0f753",
                    "xmp.iid:c964d0b5-30e9-854d-9422-d12453198b63"
                ],
                "HasVisibleTransparency": true,
                "EmbeddedImageColorSpace": [
                    "DeviceRGB",
                    "Indexed",
                    "DeviceRGB",
                    1,
                    "DeviceRGB"
                ],
                "SwatchColorantSwatchName": [
                    "White",
                    "Black",
                    "RGB Red",
                    "RGB Yellow",
                    "RGB Green",
                    "RGB Cyan",
                    "RGB Blue",
                    "RGB Magenta",
                    "R=193 G=39 B=45",
                    "R=237 G=28 B=36",
                    "R=241 G=90 B=36",
                    "R=247 G=147 B=30",
                    "R=251 G=176 B=59",
                    "R=252 G=238 B=33",
                    "R=217 G=224 B=33",
                    "R=140 G=198 B=63",
                    "R=57 G=181 B=74",
                    "R=0 G=146 B=69",
                    "R=0 G=104 B=55",
                    "R=34 G=181 B=115",
                    "R=0 G=169 B=157",
                    "R=41 G=171 B=226",
                    "R=0 G=113 B=188",
                    "R=46 G=49 B=146",
                    "R=27 G=20 B=100",
                    "R=102 G=45 B=145",
                    "R=147 G=39 B=143",
                    "R=158 G=0 B=93",
                    "R=212 G=20 B=90",
                    "R=237 G=30 B=121",
                    "R=199 G=178 B=153",
                    "R=153 G=134 B=117",
                    "R=115 G=99 B=87",
                    "R=83 G=71 B=65",
                    "R=198 G=156 B=109",
                    "R=166 G=124 B=82",
                    "R=140 G=98 B=57",
                    "R=117 G=76 B=36",
                    "R=96 G=56 B=19",
                    "R=66 G=33 B=11",
                    "C=56 M=0 Y=20 K=0",
                    "C=51 M=43 Y=0 K=0",
                    "C=26 M=41 Y=0 K=0",
                    "R=0 G=0 B=0",
                    "R=26 G=26 B=26",
                    "R=51 G=51 B=51",
                    "R=77 G=77 B=77",
                    "R=102 G=102 B=102",
                    "R=128 G=128 B=128",
                    "R=153 G=153 B=153",
                    "R=179 G=179 B=179",
                    "R=204 G=204 B=204",
                    "R=230 G=230 B=230",
                    "R=242 G=242 B=242"
                ],
                "DerivedFromRenditionClass": "proof:pdf",
                "ManifestReferenceFilePath": [
                    "C:\\Users\\clee4\\Downloads\\mora_2018_figures\\Photos to Share\\CAD Images\\20191101-DSC00067.jpg",
                    "C:\\Users\\clee4\\Downloads\\mora_2018_figures\\Photos to Share\\Poster and Paper Figures\\coral-reef-drawing-10.png"
                ],
                "ManifestReferenceDocumentID": [
                    "xmp.did:24918461-5358-463f-8f02-8a25bbc0f753",
                    "adobe:docid:photoshop:80092d49-6ba7-f649-b041-a6d0af913b7f"
                ],
                "ManifestReferenceInstanceID": [
                    "xmp.iid:24918461-5358-463f-8f02-8a25bbc0f753",
                    "xmp.iid:c964d0b5-30e9-854d-9422-d12453198b63"
                ],
                "DerivedFromOriginalDocumentID": "uuid:9E3E5C9A8C81DB118734DB58FDDE4BA7"
            }
}

The question is: do we de-dup, or do we simply strip a list of offenders from the EXIF? I mean, I love the idea of indexing the color swatches in Solr, but it's a lot, maybe really too much?
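Both options can be sketched in a few lines. This is Python for illustration only, and the prefix deny-list is a made-up starting point, not a decision. Note that collapsing repeated list values loses the positional alignment between parallel arrays (e.g. FontFace vs FontName), so it may not be what we want.

```python
# Hypothetical deny-list: key prefixes considered too noisy to keep.
NOISY_PREFIXES = ("Swatch", "History", "Ingredients", "ManifestReference")


def strip_noisy_exif(exif, prefixes=NOISY_PREFIXES):
    """Return a copy of the EXIF dict without keys starting with a noisy prefix."""
    return {key: value for key, value in exif.items() if not key.startswith(prefixes)}


def dedup_lists(exif):
    """Collapse list values whose entries are all identical into a single value."""
    cleaned = {}
    for key, value in exif.items():
        if isinstance(value, list) and value and all(v == value[0] for v in value):
            cleaned[key] = value[0]
        else:
            cleaned[key] = value
    return cleaned
```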

Improve File Destination Naming and Logic to handle Self Deposit Use cases

What is this, my friends? Files? Storage? Persisting? TECHMD? Sounds like "preservation"? Scary. No, it's not. But it may intersect, of course.

Something we discussed today with @giancarlobi (it affects the very specific Self Deposit use case, but can be extrapolated to more generic needs): we need a better way of making sure that Archipelago/Strawberryfield always has access to files in the place it wants/needs them, for IIIF, security, access and order (not global order, no worries, not that type of order here). We do a pretty good job, but there are always edge cases, and a year or more ago we were too flexible and had files moving around and being renamed all over the place.

For those who do not know how our file persisting strategy works (the same since the start of the project, just getting smarter every day!), there are a few Event Subscribers/data describing logics that happen in a certain order (@alliomeria, this is for you also, so we can make a tiny .MD file in the docs explaining this):

  1. The user uploads a new file via a webform element, or a Drush/batch ingest attaches a file (via JSONAPI).
  2. If the webform is involved we act quickly and directly call (before the node even exists) our file classifier, which will:
    2.1 Add/complement an as:somefiletype JSON structure in the main ADO SBF JSON with info about the file: checksums, size, Drupal fids, uuid, etc. This is a heavy function, part of the StrawberryfieldFilePersisterService. It does a lot, and I tried to optimize its logic, but we may do more in the future to handle too many files/too big files (FYI: the solution is simple, add to a queue and process later).
    2.2 The most important (yes) info added here is the desired future storage location of the file.
  3. The user finishes the form, saves and confirms the ADO creation, and finally all the NODE events fire.
  4. On presave, StrawberryfieldEventPresaveSubscriberAsFileStructureGenerator runs and checks whether 2.1 was already processed. This is needed since the user could have triggered an ingest via drush/JSONAPI/Webhooks etc. If all is well (this is a less expensive check) we continue.
  5. On presave (next), StrawberryfieldEventPresaveSubscriberFilePersister runs, checking all TEMPORARY files described in as:somefiletype and actually copying them to the right "desired" location.
  6. On save, StrawberryfieldEventInsertFileUsageUpdater runs, marking the file as "being" used by a Strawberry driven Node (different Event).

NOTE: Interesting to know (also for @alliomeria): any time we remove, directly from the raw JSON, a full as:somefiletype structure (or a sub element of one), we force Archipelago to do all of this again, and we can regenerate the technical metadata. We have used this when updating EXIF binaries or even when something went wrong (while testing; this stuff is safe, no worries). It works well, and I will eventually add a BIG red button that does that if you do not like JSON editing.

Many other events trigger other things. But the key to understand this is:

1.- Archipelago (the wise) always acted on "temporary" files here. Temporary means $file->isTemporary() returns true, which means Drupal would eventually get rid of them if we do not act. They are tracked, they have a Drupal ID and UUID, but they are not meant to survive a Cron run. Assigning a clean name and desired destination works based on that, and copying them to that place also expects the file to be temporary. So why?

The logic was to not over process (file operations are expensive), but also to not step on the toes of other modules and "ways" by moving files that some other entity/node/ADO may already be referencing. Archipelago allows files to be reused many times. All was fine (almost) until we added Self Deposits!!

In Self Deposit situations we may allow Anonymous users to upload a file and metadata (works great by the way). In those cases the file, when the submission ends, gets taken and made permanent by the webform module, and usage is added to a webform submission. That is immediately a kill switch (👎) for all our logic, and we leave the file in peace. Well, nice of us, but not good for IIIF or our need to keep the house clean (not my own for sure..).

What to do? Is the logic not made for this case? It was just too respectful, and sometimes you (or your code) need to step up and demand what is right: a place where we want a file to be.

The fix was not complex (already did it, now testing) but involves:

  • Allowing a "force == TRUE": get me a new desired destination even if the file is permanent, and if it is already permanent, make it temporary for a few ms so the rest of the logic can act. This comes with a "check": the file is permanent, but its current location is not to be respected because it's in a different storage than the global setup for Archipelago, OR it's not "used" by our system but by another one that really would not mind having the file moved (trust me, many do not mind because it's just a location update in the DB for the file entry). Guess what! It works.
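The decision described in that bullet, sketched in Python for illustration only (the real logic lives in the PHP file persister service). FileInfo, scheme_of and should_relocate are hypothetical names, and "storage" is simplified here to the URI scheme.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    uri: str            # e.g. "s3://some/path/file.tif" (made-up example)
    is_temporary: bool  # what $file->isTemporary() would report
    used_by_sbf: bool   # usage already recorded for a Strawberryfield driven node


def scheme_of(uri):
    """Return the storage scheme part of a URI such as "s3://...". """
    return uri.split("://", 1)[0]


def should_relocate(file, global_scheme, force=False):
    """Decide whether a file may be given a new desired destination."""
    if file.is_temporary:
        return True   # the original behavior: temporary files are always fair game
    if not force:
        return False  # permanent files are left in peace unless forced
    wrong_storage = scheme_of(file.uri) != global_scheme
    # Forced: move if it sits in another storage, or if our system does not use it yet.
    return wrong_storage or not file.used_by_sbf
```

In the real service a permanent file that passes this check would be flagged temporary for a moment so the existing persister logic can pick it up unchanged.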

Ok. I think the explanation is actually longer than the CODE, but it is needed. Will make a pull later tonight.
What is next? (another pull)

  • @giancarlobi also suggested/needs: unmanaged files (Fedora 3 files). That will bring a new mapping in "ap:entitymapping", a new Webform upload and a new type of as:somefiletype sub structure that allows us to "mention/reference" these files without ever taking control of them, while still having enough data to act on/do things with them.

  • @alliomeria 👀 Deposit directly to backend storage/Dropbox-like, which is really #76 (secretly hidden because I got crickets with the idea that time, and it is still a great one!). Imagine you have a 2TB VIDEO of your wedding (a memory worth preserving?); there are better ways of uploading files than via the web browser. I promise it's real. You upload your file via one of those (Multipart S3, FTP, who knows) together with a manifest that comes with the Big Binary. Or a ZIP file with a manifest (frictionless data package). The manifest contains a few cool keys (including a secret TOKEN!). The code comes with a webform element (paste your token, we connect everything) and an "ap:entitymapping" subset. Also a manifest/token generator form in your User account, so you can do the work AUTOMATICALLY (one time use) and just attach. I guess we can even add that as an API later, e.g. for just the TOKEN. Hope someone else is reading this too, since it's a lot! And it sounds so great.

Theme hints for ADOS (do/) and JSON key types

What is needed?

The default Drupal 8 theme hinting for nodes is kinda silly. It uses page-- or page--node plus some number, like page--node--12.html.twig. We really think it's cool that we are actually using NODES as ADOs and that we only need one Content Type (Digital Object), but we also want to allow site builders and theming folks to target pages via templates grouped by JSON key type (book, etc.) and also specifically for an ADO (which bears an SBF).

Solution: find a simple way of adding a theme hint for SBF bearing Content Types (easy, the code is there) but also for specific JSON types. That way we can give people close to design and aesthetics the chance to shine. Beta3 task.

Make JSON-LD fetching for Strawberry Key Name provider plugin safer

The problem?

See esmero/archipelago-deployment#17

Sometimes Google or any other WebSem JSON-LD provider can forget to pay the bill, and services external to us can go down. We currently download schema.org, or basically any JSON-LD context used to initially feed our SBF properties, directly from the web. In a perfect scenario that is OK. In reality, as demonstrated today, Dec 3rd 2019, via a "quota exceeded" message on Schema.org, that is not so true.

How to approach this?

We currently cache downloaded Contexts, but what if there is nothing to download? One good option, when the cache is not yet present, is to allow a file fallback (yes, a file!): first check locally for a JSON-LD file that matches/was provided/generated by us. If there is none, try the remote option. And when the remote option happens, we still download to a file and keep it local. Good, good.

Issue?

Well, you might need to refresh your remote schema, and since the file is already there we will keep using it. The simple solution is documentation: teach people how to remove the file. A better/other solution would be to expose that in the Plugin Config (which requires some serious recoding because of how Config Entity saving happens on the Plugin form, and not in each implementation). Another is auto expiring based on last modification, say every 3 months or so. Still, in 3 months we could hit a blackout again... we can discuss this further.
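One way to combine the file fallback with auto expiry, sketched in Python for illustration (the plugin itself is PHP). load_context, fetch_remote and the 90 day default are assumptions. The point is that an expired local file is only replaced when the remote answers, so a blackout at expiry time falls back to the stale copy instead of failing.

```python
import json
import os
import time
import urllib.request

MAX_AGE_SECONDS = 90 * 24 * 3600  # roughly three months; an assumed default


def fetch_remote(url):
    """Download and decode a JSON-LD context from the web."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def load_context(url, local_path, fetch=fetch_remote, max_age=MAX_AGE_SECONDS, now=None):
    """Prefer a fresh local file; refresh from remote when missing or expired."""
    now = time.time() if now is None else now
    has_local = os.path.exists(local_path)
    fresh = has_local and (now - os.path.getmtime(local_path)) < max_age
    if not fresh:
        try:
            context = fetch(url)
            with open(local_path, "w", encoding="utf-8") as handle:
                json.dump(context, handle)
            return context
        except (OSError, ValueError):
            if not has_local:
                raise  # nothing local and nothing remote: we cannot help
            # Remote is down or returned garbage: keep using the stale local copy.
    with open(local_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Removing the local file by hand (the "documentation" solution) still works with this approach, since a missing file simply triggers a new download.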

Extract basic File Metadata on File Persister Service and push as flavor

What is this?

Basically a quick hack (don't do this at home, kids) for File Description extraction when dealing with ADOs/Archipelago Digital Objects with small binaries/not many binaries that don't have the luxury (yet) of having strawberry_runners do all its cool ReactPHP async background processing for them.

This adds a WIP method to the filePersisterService that will process files (moving them to tmp), run exiftool and (UK) PRONOM extraction (FIDO), and push all the insane amount of data back into the JSON.

What is also needed?

  • A check that says: hey! You are adding like 3 files? OK, I can process those in sync/realtime. But hey, those are 2000 pages of poetry and thoughts? No way I'm doing that; install/set up strawberry_runners for that. I will only do it for the first 3(?) that are less than 100 MB or so.

  • Move checksumming, which runs in the main method, into this ::getBaseFileMetadata() (oh.. you have not seen the pull yet.. well, linked down!)

  • Add a Config Form for FIDO and EXIFTOOL. In the future I also want to use https://github.com/richardlehane/siegfried
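The check from the first bullet could look roughly like this. Python is used for illustration only; the limits of 3 files and 100 MB are the tentative numbers from above, and split_sync_and_queued is a hypothetical name.

```python
MAX_SYNC_FILES = 3                 # tentative limit from the discussion above
MAX_SYNC_BYTES = 100 * 1024 * 1024  # "less than 100 MB or so"


def split_sync_and_queued(files, max_files=MAX_SYNC_FILES, max_bytes=MAX_SYNC_BYTES):
    """Split (name, size_in_bytes) pairs into files processed now and files left for later."""
    sync, queued = [], []
    for name, size in files:
        if len(sync) < max_files and size < max_bytes:
            sync.append(name)
        else:
            queued.append(name)  # too many or too big: leave it to strawberry_runners
    return sync, queued
```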

Enabled EAD3

Use case:

Use Finding Aids (EAD) in an Archipelago environment with Strawberryfield support. Focus on EAD3
https://github.com/saa-ead-roundtable/ead3-toolkit

Options

  • Attach an existing XML file to a Strawberry field the same way we are doing right now with Media.
    Requires: finish file content indexing/exposing for referenced Drupal files from a Strawberryfield.
  • Generate an EAD-only JSON representation, which could be as simple as XML to JSON encoding via PHP, and fill a strawberryfield with that data. Allow Objects to be referenced by/to that EAD. We might need a new rdf @type (Finding Aid?)
    Requires: a predicate/json key that binds Nodes/Objects to/from that EAD-only Node
    Requires: a UI/UX that allows people to bind new/existing objects. Can be a Webform, as we do with Collection Membership right now
    Requires: a JSON to EAD XML Twig template.
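To get a feeling for how simple "XML to JSON encoding" can be, here is a naive sketch in Python (the issue suggests doing it via PHP; this is only an illustration). The "@attributes" and "#text" key names are assumptions, and the approach drops mixed content (text that follows a child element), which real finding aids use a lot, so it is a starting point at best.

```python
import json
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix like "{http://...}ead" if present."""
    return tag.split("}", 1)[-1]


def element_to_dict(element):
    """Recursively convert an element into nested dicts and lists."""
    node = {}
    if element.attrib:
        node["@attributes"] = dict(element.attrib)
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    for child in element:
        # Children are always stored as lists so repeated tags need no special case.
        node.setdefault(local_name(child.tag), []).append(element_to_dict(child))
    return node


def xml_to_json(xml_string):
    """Encode an XML document as a JSON string keyed by the root element name."""
    root = ET.fromstring(xml_string)
    return json.dumps({local_name(root.tag): element_to_dict(root)})
```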

To be discussed

Finding aids in EAD are ancient technology, but there is no semantically aware replacement right now. That means they work, and archivists and archival systems rely heavily on them. Finding a solution could help bind archival needs with repository/access needs in a single, simple to use solution. Should we also focus on/research binding directly to/from ArchivesSpace?

BUG: json flattener is nesting too deep

Problem

Strawberryfield flattens the whole JSON structure on common keys to be able to expose individual, deeply nested values under a common key. This is used for Drupal's interaction with Solr, amongst many other use cases. It works as expected in most cases but, weirdly enough (my bad), it over-nests items if a single key contains multiple values in a non-associative array.

The issue happens exactly here: https://github.com/esmero/strawberryfield/blob/master/src/Tools/StrawberryfieldJsonHelper.php#L136

function arrayToFlatCommonkeys(array $array, &$flat = array(), $jsonld = TRUE)

It is a recursive function that accumulates the values it finds as it traverses the array hierarchy.

Solution

Well, I'm really bad at managing recursion in my brain (I noticed it a bit late, like 41 years too late), so I have to debug this a little deeper. The gist is basically making sure that the extra nesting does not happen under that circumstance. Since I cannot know up front whether the array I get has more levels, and I also don't want to look down the tree in a recursive function (but then, by being recursive, I should not even care.. gosh, recursion?), I need to assign values by either merging them with existing ones or simply assigning them if the "key" I'm accumulating was found for the first time.
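The "merge with existing or assign on first sight" idea, sketched in Python for illustration only. This is not the actual arrayToFlatCommonkeys() code and it ignores the $jsonld flag; it only shows how items of a non-associative list can inherit the key of their parent so no extra nesting level appears.

```python
def _collect(key, value, flat):
    """Accumulate scalar values under their key name, whatever the depth."""
    if isinstance(value, dict):
        for child_key, child_value in value.items():
            _collect(child_key, child_value, flat)
    elif isinstance(value, list):
        for item in value:
            _collect(key, item, flat)  # list items inherit the key: no extra nesting
    else:
        # First time: start a new list. Otherwise: merge with what is already there.
        flat.setdefault(key, []).append(value)


def flatten_common_keys(data):
    """Flatten a nested JSON-like dict into {key: [all values found for that key]}."""
    flat = {}
    for key, value in data.items():
        _collect(key, value, flat)
    return flat
```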

@giancarlobi this is what you just saw in your Solr. Give me like 30-60 minutes and a few cups of coffee. I know i have this done correctly many times in the past, i just need to debug, debug and build some test cases.

Review our SBF Flavor Search API Source class

To be honest, I'm not sure if this needs to be fixed or just reviewed, or whether other modules are simply not good. But while testing (with success) PDF to text extraction by creating an S3 aware PDF to Text plugin, I found that Search API was calling some methods that are not present in our special source, because it's not a content entity, just a data type. I also noticed that we provide no Display Mode, which means that even if we store HOCR or any other flavor, we cannot show it.

@giancarlobi you worked on this so maybe you have more ideas/questions. Does any of this make sense? I will share my findings later today.

Save Image proportions or exif, pronom and PDF info in the as:image JSON structure

What is needed?

In the 3.0-Draft version of IIIF manifests, 'width' is (finally) not a requirement. That is the whole reason for having, e.g., an info.json: the client can take the sizes/proportions from there. See https://iiif.io/api/presentation/3.0/#width

But because of CORS, some tiny bugs or implementation details we cannot manage right now, client viewers without that info tend to fail badly. In the past (our past is short) we used to hit the IIIF server's info.json to extract that and pass it directly to any client that needs it, like in the thumbnail processor.

Still, it would be nice to have, inside our File Persister service, a simple call to exif/pronom to fetch that data upfront. I'm not necessarily thinking about the whole exif data (I got like 148 fields the other day with a simple image), just the basics. There are other use cases where having that directly in the JSON can be useful.

So what is needed?

Not much (I mean, it's never easy peasy). We really need to decide whether this will be part of the strawberry_runners module, or whether we can simply deal with these tiny/quick processing needs directly inside this module with some settings. Runners could then expand on or reuse those settings. This would require me to move the base plugin logic I have staged in runners to SBF (not a big deal, really) and implement 3x fixed plugins, one for each binary that is modal, executed against uploaded files. Those would run while persisting/updating an image on storage, same as we do right now in the persister service. It is probably better to move this into its own service that runs after persisting, since the persister is already quite heavy on logic. Or not: maybe better as an event subscriber triggered inside the persister, since I would love to process this while the files are still local. We can always fall back to downloading to temp, processing, etc., but think about the larger files! See https://github.com/esmero/strawberryfield/blob/8.x-1.0-beta2/src/StrawberryfieldFilePersisterService.php
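To make the "process while the file is still local" idea concrete, here is a rough sketch in Python (not the module's PHP). It swaps the exif/pronom binaries for a plain read of the PNG header, only so the example stays self-contained. The names `png_dimensions` and `on_file_persisted`, and the `width`/`height` keys added to the entry, are assumptions for illustration, not the actual as:image structure.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(head: bytes) -> Tuple[int, int]:
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(head) < 24 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # IHDR stores width and height as two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", head[16:24])
    return width, height


def on_file_persisted(entry: dict, local_path: str) -> dict:
    """Hypothetical subscriber: add the basics to one file entry.

    Runs while the file is still on local disk, so only 24 bytes are read
    and nothing has to be downloaded back from remote storage.
    """
    with open(local_path, "rb") as handle:
        width, height = png_dimensions(handle.read(24))
    enriched = dict(entry)  # leave the caller's structure untouched
    enriched["width"] = width
    enriched["height"] = height
    return enriched
```

A real implementation would call the actual binaries per file type; the point here is only the ordering: extract the basics at persist time, then store them next to the file entry in the JSON.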

Make Title setter Event Subscriber notifications be less verbose

What?

When we set the title/label of a node via the event subscriber, we notify the user every time something happens, even when the title being set is exactly the same as the one that was there before.

We don't like that, and it's confusing.

I looked at the code a few times, wondering whether we should avoid assigning a title entirely if the current one is already in place and equal, but then decided to leave that check out. Since we have to calculate the title anyway, checking/validating before setting costs an extra CPU cycle (or cycles). I decided to check whether there was a change only after the fact, not before. Pretty sure I will change my mind in the future and come up with even more optimal code. For now this is OK.
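The resulting behavior can be sketched like this, in Python rather than the subscriber's PHP. `set_title`, the dict-based node, and the `notify` callback are hypothetical stand-ins, not the module's API.

```python
from typing import Callable


def set_title(node: dict, computed_title: str, notify: Callable[[str], None]) -> bool:
    """Always assign the computed title; notify only if it really changed."""
    previous = node.get("title")
    node["title"] = computed_title  # no check before setting
    changed = previous != computed_title  # the check happens after the fact
    if changed:
        notify(f"Title set to: {computed_title}")
    return changed


messages = []
node = {"title": "Old title"}
first = set_title(node, "New title", messages.append)   # title changed
second = set_title(node, "New title", messages.append)  # same title again
```

The second call still assigns the title but stays silent, which is the less verbose behavior described above.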
