
Comments (12)

RichardChappell commented on August 24, 2024

I have finally managed to import some UK data. For the benefit of future readers, here is what I did.

The task was to import four SNOMED CT files:

  1. SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z
  2. SnomedCT_UKEditionRF2_BETA_20200204T000002Z
  3. SnomedCT_UKClinicalRF2_BETA_20200204T000002Z
  4. SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z

Steps taken.

  1. Start with blank repository
  2. Imported SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z
  3. Imported SnomedCT_UKEditionRF2_BETA_20200204T000002Z
  4. Failed to import SnomedCT_UKClinicalRF2_BETA_20200204T000002Z
  5. Failed to import SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z
  6. Removed Refset data from SnomedCT_UKClinicalRF2_BETA_20200204T000002Z
  7. Imported SnomedCT_UKClinicalRF2_BETA_20200204T000002Z_REMOVED_REFSETS
  8. Removed Refset data from SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z
  9. Imported SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z_REMOVED_REFSETS
  10. Imported SnomedCT_UKClinicalRF2_BETA_20200204T000002Z
  11. Still failed to import SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z
  12. Removed missing description ids from SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z
  13. Imported SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z_REMOVED_DESC_IDS

Result

• SnomedCT_InternationalRF2_PRODUCTION_20180731T120000Z - imported first go
• SnomedCT_UKEditionRF2_BETA_20200204T000002Z - imported first go
• SnomedCT_UKClinicalRF2_BETA_20200204T000002Z - imported after importing terminology from files 3 & 4
• SnomedCT_UKClinicalRefsetsRF2_BETA_20200204T000002Z - imported after importing terminology from files 3 & 4, and removing missing description ids

At steps 4 and 5 there were ‘missing component’ errors in the log file. I had to eliminate these errors to successfully import the data. Steps 6/7 and 8/9 were necessary to get missing concepts in before re-importing the originals (steps 10 and 11).

At step 11 there were ‘missing description’ errors in the log file. I manually removed the offending rows from the original data and used the cleaned archive to successfully import the UKClinicalRefsets data.
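For anyone repeating steps 6 to 13, the row removal can be scripted. The following is only a minimal sketch, not the program used above. It assumes the refset files are standard tab-delimited RF2 files with a header row containing a `referencedComponentId` column, and that you have already collected the missing ids from the log. The function name `drop_members` is hypothetical.

```python
def drop_members(rf2_text, missing_ids):
    """Return RF2 refset text without rows that reference a missing component.

    Assumes a tab-delimited file whose header names a referencedComponentId
    column (the usual RF2 refset layout). For language refsets that column
    holds description ids, so the same filter covers missing descriptions.
    """
    rows = rf2_text.split("\n")
    header = rows[0].rstrip("\r").split("\t")
    col = header.index("referencedComponentId")
    kept = [rows[0]]  # always keep the header row
    for row in rows[1:]:
        fields = row.rstrip("\r").split("\t")
        if len(fields) > col and fields[col] in missing_ids:
            continue  # drop members pointing at a component that does not exist
        kept.append(row)
    return "\n".join(kept)
```

Read each refset file from the extracted archive, pass its text through `drop_members`, write it back, and re-zip before importing.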

from snow-owl.

cmark commented on August 24, 2024

Hi @RichardChappell

Thank you for reporting the issue.
To find the cause of failures (such as "500 Something went wrong" HTTP responses, or internal errors during import), please find the appropriate log*.log file under your Snow Owl installation directory (usually under /var/log/snowowl) and open it in a text editor, or use grep to find any lines containing ERROR or Exception.
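If grep is not available (for example on Windows), the same search can be sketched in a few lines of Python. The directory and the `log*.log` pattern come from the paragraph above; the function name `find_error_lines` is hypothetical.

```python
from pathlib import Path


def find_error_lines(log_dir, pattern="log*.log"):
    """Yield (file name, line number, text) for lines mentioning ERROR or Exception."""
    for path in sorted(Path(log_dir).glob(pattern)):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if "ERROR" in line or "Exception" in line:
                    yield path.name, number, line.rstrip("\n")


# Example use (adjust the directory to your installation):
# for name, number, text in find_error_lines("/var/log/snowowl"):
#     print(f"{name}:{number}: {text}")
```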

Could you please share the error with us so we can fix the issue in Snow Owl for the next release?

Apart from the failures, the import steps you have taken look OK to me.
Snow Owl should be able to import either the single distribution file or the separate distributions (INT then UK-CL).

Regards,
Mark


RichardChappell commented on August 24, 2024

Here are the log files

SnowOwl - log_2020-01-16.log
SnowOwl - log_2020-01-17.log

I started the import of UK-CL at 16Jan2020 09:33 and noticed the failure at 16Jan2020 12:08
I decided to import INT at 16Jan2020 14:04 and noticed the failure on 17Jan2020 06:58


RichardChappell commented on August 24, 2024

The single distribution method did not work for me:

curl -X POST "http://localhost:8000/snowowl/snomed-ct/v3/imports/6f700174-0fc3-4cb2-b9bd-0a8e6722707c/archive" -H "accept: */*" -H "authorization: Basic c25vd293bDpzbm93b3ds" -H "Content-Type: multipart/form-data" -F "file=@uk_sct2cl_28.0.0_20191001000001.zip;type=application/x-zip-compressed"
{"status":500,"code":0,"message":"Something went wrong during the processing of your request.","errorCode":0,"statusCode":500}


cmark commented on August 24, 2024

Hi @RichardChappell,

According to the first log file, the current multipart configuration for uploading files is not sufficient for the combined UK CL release. We will consider increasing that value for the next release. Until then, you can change it manually in the configuration/jetty-http.xml file.

Importing the individual files looks about right, except that it fails due to components that are missing at certain effective times, see here:

[2020-01-16T11:48:02.318] ERROR eventbus-101    import    Component refers to a non-existing concept with id '410399009' in effective time '2011-04-01' 
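To gather every id reported this way, the log can be parsed with a regular expression. This is a sketch that assumes all such messages follow exactly the wording of the line above. Other messages (for example the ones about missing descriptions) may be worded differently and are not matched. The names `MISSING` and `missing_concepts` are hypothetical.

```python
import re
from collections import defaultdict

# Pattern copied from the wording of the log line quoted above.
MISSING = re.compile(
    r"refers to a non-existing concept with id '(\d+)' "
    r"in effective time '(\d{4}-\d{2}-\d{2})'"
)


def missing_concepts(log_lines):
    """Group missing concept ids by the effective time in which they were reported."""
    found = defaultdict(set)
    for line in log_lines:
        match = MISSING.search(line)
        if match:
            concept_id, effective_time = match.groups()
            found[effective_time].add(concept_id)
    return dict(found)
```

Passing an open log file to `missing_concepts` returns a dictionary keyed by effective time, which shows at a glance which release dates are affected.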

I suggest importing the INT Snapshot from the official release and then the extracted UK CL extension Snapshot version from this release archive.

Regards,
Mark


RichardChappell commented on August 24, 2024

Hi Mark,

I have successfully imported the full INT portion of the official release and have managed to access various components (concepts, descriptions etc). The import took about 19 hours, so I have held back on importing the UK extension. My question now is: had the multipart import worked, would a branch under MAIN have been automatically created for the UK part?

Additionally, I have looked at the configuration/jetty-http.xml file but I don't know what to change. It doesn't appear that I have the appropriate configuration section.


cmark commented on August 24, 2024

Hi Richard,

I'm glad to hear the import worked for you.
The UK extension branch should be created manually before you start the import on it; there is no automated creation of child branches during RF2 import. The branch you select in the RF2 import configuration will be the target of the imported content.
To import the UK extension to its own dedicated branch, you need to create a branch via POST /snomed-ct/v3/branches, then create a CodeSystem with POST /admin/codesystems to mark the content there with, for example, the SNOMEDCT-UK extension name.
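As a rough illustration of those two calls, the sketch below only builds the requests and does not send them. The two endpoint paths come from the text above and the host and port from the curl command earlier in the thread. Everything else is an assumption: the JSON field names (`parent`, `name`, `shortName`, `branchPath`) are guesses, the code system body is very likely incomplete, and the exact base path of `/admin/codesystems` should be checked against the API documentation of your Snow Owl version.

```python
import json
from urllib.request import Request

# Host and port taken from the curl call earlier in the thread.
BASE = "http://localhost:8000/snowowl"


def json_post(url, payload, auth_header):
    """Build (but do not send) a JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


def build_requests(auth_header, branch_name="SNOMEDCT-UK", parent="MAIN"):
    # ASSUMPTION: field names below are illustrative, not confirmed by the thread.
    branch = json_post(
        f"{BASE}/snomed-ct/v3/branches",
        {"parent": parent, "name": branch_name},
        auth_header,
    )
    code_system = json_post(
        f"{BASE}/admin/codesystems",
        {"shortName": branch_name, "branchPath": f"{parent}/{branch_name}"},
        auth_header,
    )
    return branch, code_system


# To actually send a request: urllib.request.urlopen(branch)
```

The branch request has to succeed before the code system is created, since the code system points at the branch path.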

Let me know if you have more questions.

Cheers,
Mark


RichardChappell commented on August 24, 2024

Hi Mark,

Thanks for your reply.

I feel like I am going down a bit of a rabbit hole. All I am trying to do is import a SNOMED archive using your API. Because this didn't work I figured I could separate the archive into its constituent zip files and import them individually. I don't really know if I need a separate branch for the UK part - I am just guessing as to what happens if everything worked ok.

I did a FULL import of INT because you identified errors like 'Component refers to a non-existing concept XXX' in the expectation that I should not get such errors when importing a UK snapshot. However, it appears that when the process comes to loading certain refsets it still logs these errors. Why should this be so when I am importing an official UK SNOMED release?

You pointed out that there is some configuration setting that might help import the combined release. I have looked at the config files but I don't know what to change. Please would you show me an example?


cmark commented on August 24, 2024

Hi Richard,

Could you please try to import the UK extension without those invalid members?

Reference sets (especially the UK ones) often reference components that do not exist at all in the official releases. To fix them, you need to remove those lines and try to import it again.
It is up to you how you configure your terminology server, if the only release you are going to have at the end is the UK Edition, then feel free to import everything onto the MAIN branch.
Also, the config file changes would only allow you to import the archive in a single go, but since you have already split it into two archives, feel free to continue the import with them.

Let me know if you have any further questions.

Cheers,
Mark


RichardChappell commented on August 24, 2024

Hi Mark,

I tried your suggestion and it feels like a losing battle. When I perform an import it fails after finding 100 non-existent concept ids. I then remove those concept ids from the data files using a custom program, re-zip the data files and re-import. Again I get another batch of 100 non-existent concept ids, and so it goes on with no end in sight. At the moment I am doing this as a proof of concept exercise, but my company will very soon want a reliable solution that does not resort to hacking the data.
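One way to avoid the batch-of-100 cycle might be to check the refset files against the component files before importing, so that all unresolved references are found in one pass. The sketch below assumes standard tab-delimited RF2 files (ids in the first column of component files, a `referencedComponentId` column in refset files). It ignores effective times, whereas the importer reports missing components per effective time, so it may find fewer problems than the import does. Both function names are hypothetical.

```python
def component_ids(rf2_text):
    """Collect the id column (first column) of an RF2 concept or description file."""
    rows = rf2_text.split("\n")[1:]  # skip the header row
    return {row.split("\t", 1)[0] for row in rows if row.strip()}


def unresolved_references(refset_text, known_ids):
    """Return referencedComponentId values that are absent from known_ids."""
    rows = refset_text.split("\n")
    col = rows[0].rstrip("\r").split("\t").index("referencedComponentId")
    missing = set()
    for row in rows[1:]:
        fields = row.rstrip("\r").split("\t")
        if len(fields) > col and fields[col] not in known_ids:
            missing.add(fields[col])
    return missing
```

`known_ids` should be the union of `component_ids` over every concept and description file that will be present on the target branch, including the international ones.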

A contact at NHS Digital informed us that from April 2020 the UK release will not have dependencies like the ones in the past and so should be more reliable. I have downloaded new data files and it looks like previously referenced but non-existent concept ids do not appear in this set of data, which looks promising.

I would now like to import them as a single archive. What are the configuration changes I need to make to do this?

Thanks
Richard


cmark commented on August 24, 2024

Hi @RichardChappell,

Sorry to hear about your struggle with the UK import. We've never tried to import the UK content as a single archive into Snow Owl, because we always prefer to have a dedicated branch for the international content and for each extension.

To import the archive in a single go, it is necessary to increase the allowed maximum file size limit, which in the latest 7.3.0 version cannot be set via a configuration file. It is currently hardcoded to ~734Mb.
My suggestion would be to import the international RF2 part first (or, if you have already imported it, use that dataset) and then import the UK CL content.

In the meantime, I'll change Snow Owl to let users configure the max upload limit from the snowowl.yml configuration file, so after the next 7.4.0 release you will be able to import the UK Edition without splitting the archive into two parts.
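Until the limit is configurable, a quick size check can tell you whether an archive has to be split before upload. This is a trivial sketch. It assumes that "~734Mb" means roughly 734 × 1024 × 1024 bytes, which the thread does not confirm. The names `ASSUMED_LIMIT_BYTES` and `needs_splitting` are hypothetical.

```python
import os

# ASSUMPTION: "~734Mb" is read as 734 mebibytes; the exact value is not stated.
ASSUMED_LIMIT_BYTES = 734 * 1024 * 1024


def needs_splitting(archive_path, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return True when the archive is larger than the assumed upload limit."""
    return os.path.getsize(archive_path) > limit_bytes
```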

Let me know if you have any further questions.

Cheers,
Mark


cmark commented on August 24, 2024

Hi @RichardChappell,

Glad to hear you have managed to get the UK CL data imported into your Snow Owl instance.
Feel free to open another issue if you run into any trouble or have a question.

Cheers,
Mark

