
Comments (13)

DiegoPino avatar DiegoPino commented on August 27, 2024

Ping @alliomeria, since this is also in your realm; as already documented (ISSUE), we need documentation for the CSV export.

from ami.

DiegoPino avatar DiegoPino commented on August 27, 2024

Hi Pat. Thanks for sharing your workflow. Can you please also share privately the CSV used for the update? Did you select "Create AMI Set", or did you manually download the CSV and add it to a custom AMI Set you created afterwards? The issue you might be seeing is that you are providing duplicated info (twice) for the same file: the mapping for the Drupal File ID in the "Upload" JSON key plus the source path for the file. If you do both, AMI will connect the s3:// path as a new File and also keep the connection to the one previously added via the File entity ID. Said differently, you might not need the s3:// path at all (removing the mapping will work, and we might explore/check whether "Create AMI Set" during export is doing that already or not), OR you can remove all the values under the mapped file column instead.
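For illustration, a minimal update row that can trigger the duplication might look like this (the column names, UUID, and bucket path are made up for the example; `images` stands for whatever column is mapped as a file source in the AMI set):

```csv
node_uuid,label,images
a1b2c3d4-0000-0000-0000-000000000000,"My Object",s3://my-bucket/page_001.jpg
```

Because the ADO already carries the File entity ID from the original ingest in its stored JSON, passing the s3:// path again in a still-mapped `images` column makes AMI attach a second File for the same source. Either emptying the `images` values or unmapping that column in the AMI set avoids the duplicate.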

I know it's a lot to ask, but could you also test "just exporting" metadata and updating via that? If I coded this correctly, existing files will be preserved. Please test with a few objects (maybe 2?) whose revisions you would not mind reverting if something goes wrong.

On my side... I could add some logic that checks whether the provided paths are already used by existing Files and avoids the duplication. I am not sure how easy that is, since the actual "connection" of files happens not in AMI but in a later step, in the event we trigger when saving...


patdunlavey avatar patdunlavey commented on August 27, 2024

@DiegoPino I wonder if it makes sense, for update AMI sets that have globalmapping_settings->files, to delete the existing file usages for each file field prior to processing the $filenames here?
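A rough sketch of what I mean, assuming Drupal's file.usage service and that $existing_fids holds the File entity IDs currently attached to the field being updated (variable names and the usage type/id arguments here are hypothetical, not AMI's actual code):

```php
// Hypothetical sketch only: drop this module's usage records for the
// files already attached to the ADO, before AMI processes $filenames.
// The File entities themselves are NOT deleted.
$file_usage = \Drupal::service('file.usage');
foreach (\Drupal\file\Entity\File::loadMultiple($existing_fids) as $file) {
  // FileUsageInterface::delete($file, $module, $type, $id) removes
  // the usage entry registered by the given module for this entity.
  $file_usage->delete($file, 'ami', 'node', $node->id());
}
```

The point would be that re-supplying the same file paths in an update CSV then re-registers a single usage instead of stacking a duplicate one.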


DiegoPino avatar DiegoPino commented on August 27, 2024

Eh.. not sure about that. There is a rationale for being atomic and not deleting existing File entities (I guess that is what you mean by file usages): other ADOs may be using the same entity, or tasks may already be enqueued in the post-processor (e.g. HOCR) for the same File entity. Update AMI sets should really respect whatever you pass, so if you pass a File entity ID (one entry/usage) and you also pass a mapping to a file, then both should be respected. What I could do is remove the mapping from the automatically created AMI set. Is that what you mean? Maybe I'm not understanding the workflow. What if you share the workflow (or expected workflow) with us? That way I might be able to add a checkbox or an option that allows that workflow to be respected.


patdunlavey avatar patdunlavey commented on August 27, 2024

Reading your initial response more carefully...

We encountered this problem with exported data that we re-imported -- not using the VBO export operation, but a view that dumps out CSV mirroring the CSV we used for the original import. However, what I am documenting here does not use exported data: it is simply the original CSV used for import, after AMI has processed it to add the node_uuid column.

In no case are we using file IDs in the import CSV; we always use the file paths.

I may be using incorrect terminology regarding file usages. I wasn't suggesting removing managed File entities, only file usages. Are those entities as well?


patdunlavey avatar patdunlavey commented on August 27, 2024

...We're not using automatically created AMI sets from the AMI export. These are manually created AMI sets.


DiegoPino avatar DiegoPino commented on August 27, 2024

Ok, that explains the behavior. I have to say it's expected behavior, because of this:

foreach ($original_file_mappings as $filekey) {
  if (!in_array($filekey, $processed_metadata['ap:entitymapping']['entity:file'])) {
    $processed_metadata[$filekey] = $original_value[$filekey] ?? [];
    $processed_metadata['ap:entitymapping']['entity:file'][] = $filekey;
  }
}
foreach ($original_node_mappings as $nodekey) {
  if (!in_array($nodekey, $processed_metadata['ap:entitymapping']['entity:node'])) {
    $processed_metadata[$nodekey] = $original_value[$nodekey] ?? [];
    $processed_metadata['ap:entitymapping']['entity:node'][] = $nodekey;
  }
}
// Really not needed?
$processed_metadata['ap:entitymapping']['entity:node'] = array_unique($processed_metadata['ap:entitymapping']['entity:node']);
$processed_metadata['ap:entitymapping']['entity:file'] = array_unique($processed_metadata['ap:entitymapping']['entity:file']);
// Copy directly all as:mediatype into the child, the File Persistent Event will clean this up if redundant.
foreach (StrawberryfieldJsonHelper::AS_FILE_TYPE as $as_file_type) {
  if (isset($original_value[$as_file_type])) {
    $processed_metadata[$as_file_type] = $original_value[$as_file_type];
  }
}
$this->patchJson($original_value, $processed_metadata);
$itemfield->setMainValueFromArray($processed_metadata);
break;

To avoid losing expensive-to-process and SUPER important data, and also to allow "descriptive metadata"-only CSVs to be used to update ADOs (a use case that @dmer shared), on an update we always copy the existing ap:entitymapping entries. Basically, you do not want to pass the file paths (which for AMI are not entities, just places from which URLs are fetched to create new entities) into an update, except if you actually intend to add a new file.

So, if you are not using the CSV export with automatic AMI Set creation, the only option is to add extra logic and CHECK whether existing Files already use the SAME URLs being passed, and in that case avoid creating new File entities.

The issue with that is that the File entities are connected to the URLs well before the actual update happens, here:

// Now do heavy file lifting
foreach($file_columns as $file_column) {

Which would imply that, only for UPDATES:

  1. For each URL passed (and only if the URL is local and/or S3), check whether the keys defined in $entity_mapping_structure['entity:file'] actually hold File entities; load each one, get its URL, and compare. If the URL is already present, DO NOT process it.
That would make updates way slower... So, just out of curiosity: now that you know that AMI preserves existing files on updates, do you still need to pass the s3:// values again? Would it not be safer for your users to just fill in descriptive metadata? Not imposing, just asking whether that is a workable workaround.

If it is not a workable workaround, I can add that extra check.


DiegoPino avatar DiegoPino commented on August 27, 2024

Basically here:

$file = $this->AmiUtilityService->file_get($filename,

Before I call file_get, I need to check whether another File entity with the same URI is already in use in this ADO.
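Something along these lines; this is only a sketch of the proposed check, assuming $entity_mapping_structure has the shape shown earlier, $processed_metadata holds the ADO's current JSON, and $filename is the incoming URI (e.g. an s3:// path). None of this is the actual implementation:

```php
// Hypothetical pre-check: skip file_get() when a File entity with the
// same URI is already attached to this ADO.
$already_attached = FALSE;
foreach ($entity_mapping_structure['entity:file'] as $filekey) {
  // Each mapped file key may hold one or more File entity IDs.
  foreach ((array) ($processed_metadata[$filekey] ?? []) as $fid) {
    $existing = \Drupal\file\Entity\File::load($fid);
    if ($existing && $existing->getFileUri() === $filename) {
      $already_attached = TRUE;
      break 2;
    }
  }
}
if (!$already_attached) {
  // Only now fall through to the existing file_get() call.
}
```

The extra entity loads per URL are what would make updates slower, as noted above.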


DiegoPino avatar DiegoPino commented on August 27, 2024

@patdunlavey @alliomeria let me know if the solution I propose makes sense to you and if you want me to code it. Thx


patdunlavey avatar patdunlavey commented on August 27, 2024

Thanks so much for digging into this @DiegoPino. I'll discuss with Megan tomorrow to better understand her use-case.

You may recall this feature request where I observed that, for most SBF fields, an AMI update ingest is destructive when a field is not provided in the CSV and mapped in the Twig template. So it's necessary to provide complete data for every field in an update ingest. What I have learned today in this issue is that for file fields the behavior is, practically speaking, the opposite. For good reason, I understand! It's even more confusing/mysterious at first glance because file usage data is saved in the SBF JSON but is not mapped in the Twig template. The solution may simply be recalibrating our expectations of how AMI updates work, and omitting the file field mapping except where we want to add files to an object. However, I suspect that Megan will tell us that they actually do need to be able to add and remove files via AMI updates. If that proves true, we'll have a lot more to talk about!
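Under that recalibrated expectation, an update CSV would carry only descriptive columns and no file columns at all; the existing files survive because AMI copies the stored ap:entitymapping entries forward. A made-up example (column names and values are purely illustrative):

```csv
node_uuid,label,description
a1b2c3d4-0000-0000-0000-000000000000,"Corrected Title","Revised abstract text."
```

Files would then only appear in a CSV when we deliberately want to add one to an object.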


patdunlavey avatar patdunlavey commented on August 27, 2024

I want to note (and correct me if I'm wrong, @DiegoPino) that the changes you proposed only address the file usage duplication problem described in this issue. They do not deal with a use case that requires replacing or removing existing file usages via AMI.

I'll report back here after discussing with Megan.


DiegoPino avatar DiegoPino commented on August 27, 2024

@patdunlavey did you have any updates from Megan on this?


patdunlavey avatar patdunlavey commented on August 27, 2024

Sorry for not updating you! They are simply using a strategy of being careful to ingest a file field only once (and never changing a file via AMI ingest). Their round-trip export/import process excludes the file fields, except in cases where the re-import provides file paths for the first time.
So while this is still an issue, it's not a priority for them.

