
curation-dashboard's People

Contributors

coy123, davoros, dependabot[bot], dietervu, snyk-bot, twagoo, vronk, wowasa

curation-dashboard's Issues

Link checker: detect federated login

Links that ultimately redirect to the CLARIN discovery service can be considered to have restricted access, even though the response consists of a series of redirects ending with a 200 at discovery.clarin.eu.
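A minimal sketch of how such a check might look, assuming the checker already records the redirect chain as a list of URLs. The class name, the method names and the category strings are hypothetical; only the host discovery.clarin.eu comes from the description above.

```java
import java.net.URI;
import java.util.List;

final class FederatedLoginDetector {
    // Host named in the issue; treating it as the only marker is an assumption.
    private static final String DISCOVERY_HOST = "discovery.clarin.eu";

    /** Returns true if the last URL of a redirect chain points at the discovery service. */
    static boolean endsAtDiscoveryService(List<String> redirectChain) {
        if (redirectChain.isEmpty()) {
            return false;
        }
        String last = redirectChain.get(redirectChain.size() - 1);
        String host = URI.create(last).getHost();
        return host != null && host.equalsIgnoreCase(DISCOVERY_HOST);
    }

    /** Hypothetical classification: a 200 at the discovery service counts as restricted. */
    static String classify(int finalStatus, List<String> redirectChain) {
        if (finalStatus == 200 && endsAtDiscoveryService(redirectChain)) {
            return "RESTRICTED_ACCESS";
        }
        return finalStatus == 200 ? "OK" : "OTHER";
    }
}
```

Other identity providers or login pages would need their own markers; this only covers the case described here.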

Correct and uniform usage of ratios/percentages

In the collection report we find:
 

modality | 0.5663
Ratio valid Records: 0.9880
Average rate of populated elements: 0.9835%

I would suggest changing all of these to proper percentages with reasonable precision, for example in these cases 56.6%, 98.8% and 98.4%
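As an illustration only, a ratio could be turned into a percentage with one decimal place like this. The helper name is made up, and Locale.ROOT is an assumption to keep the decimal separator independent of the server locale.

```java
import java.util.Locale;

final class PercentFormat {
    /** Formats a ratio in [0, 1] as a percentage with one decimal place. */
    static String toPercent(double ratio) {
        // Locale.ROOT gives a point as decimal separator regardless of the JVM default.
        return String.format(Locale.ROOT, "%.1f%%", ratio * 100.0);
    }
}
```

Calling PercentFormat.toPercent(0.5663) yields "56.6%".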

announcement of curation 3.0

We started the development of curation 3.0 which will have the following features:

  1. Replacement of vaadin
    Apart from the isolated cases of instance and profile analysis, the curation web app uses the vaadin framework to display static content (reports generated twice a week as XML files) in a dynamic way: most of the views are built by the framework at the moment the user accesses a page, although the displayed content is static. This approach wastes resources and time.
    Hence we want the core module to generate not only the reports in XML format but also the HTML views (static pages for static content!). The two cases where pages must be created dynamically will be covered by a servlet that transforms XML (the report) to HTML; user interaction such as sorting and filtering is done with jquery, layout with CSS.

  2. Optimization of memory usage
    Currently the curation-core module needs between 2 and 4 GB of heap space while generating the collection reports, since it accumulates the information of every single CMDI instance in memory and generates the collection report in a final iteration. With some redesign we can pass the required information of each instance directly to the collection report, which would reduce memory usage dramatically.

  3. Establishment of multi-threading on the Java-level
    In collection mode the current version of curation-core takes the path to a single collection directory as input parameter and generates a single collection report by analyzing all the files in that directory. This means the program has to be called once per collection, which in our case is done by a shell script. Multi-threading is established by the shell script, which runs a configurable number of processes, and, for large collections (>10000 files), by stream parallelization in Java.
    In curation 3.0 multi-threading will be established in a configurable way on the Java level.
    This implies that curation-core no longer processes one single collection but all collections descending from a given root.
    This approach also has the advantage that curation-core can generate an overview of all collection results, as needed on the collections view, without having to re-read the collection reports from the file system.
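Points 2 and 3 could be sketched roughly as follows. This is not the planned implementation: the class and method names are invented, every direct subdirectory of the root is assumed to be one collection, and the "analysis" is a placeholder that only counts files and bytes. What it illustrates is a configurable thread pool on the Java level and per-collection running totals instead of per-instance objects kept in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class CollectionRunner {
    /** Running totals for one collection; no per-instance data is retained. */
    static final class CollectionStats {
        final String name;
        final LongAdder files = new LongAdder();
        final LongAdder bytes = new LongAdder();
        CollectionStats(String name) { this.name = name; }
    }

    /** Processes every direct subdirectory of root as one collection (an assumption). */
    static List<CollectionStats> run(Path root, int threads) throws IOException, InterruptedException {
        List<Path> collections;
        try (Stream<Path> dirs = Files.list(root)) {
            collections = dirs.filter(Files::isDirectory).sorted().collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<CollectionStats> overview = new ArrayList<>();
        for (Path dir : collections) {
            CollectionStats stats = new CollectionStats(dir.getFileName().toString());
            overview.add(stats);
            try (Stream<Path> files = Files.walk(dir)) {
                // One task per file; results flow straight into the collection totals.
                files.filter(Files::isRegularFile).forEach(f -> pool.execute(() -> analyze(f, stats)));
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return overview; // doubles as the overview of all collections
    }

    /** Placeholder for the real per-instance analysis. */
    private static void analyze(Path file, CollectionStats stats) {
        stats.files.increment();
        try {
            stats.bytes.add(Files.size(file));
        } catch (IOException e) {
            // Unreadable file: still counted, size left out.
        }
    }
}
```

The returned list is available in memory once all tasks finish, so the overview does not require reading the reports back from disk.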

Link Checker: Status 307 with dubious info

Consider: https://curate.acdh.oeaw.ac.at/statistics/IDS_Repository/307; the first link is:
http://hdl.handle.net/10932/00-03FE-203B-D2BD-4801-9, which currently gives the following information:

Message: HTTP entity too large to be buffered in memory
Expected Content Type:
Content Type: text/html;charset=utf-8
Byte Size: 0
Request Duration(ms): 36
Method: N/A
Timestamp: 2020-03-04 17:20:59.0

This information is dubious:

  • The message about download size looks like an error encountered by the repository, but it seems to have occurred inside the checker; it is possible to download the file.
  • (Byte size is evidently wrong, probably due to the download problem.)
  • What information is supposed to go in Expected Content Type? If nothing is expected, state it; otherwise, it looks like an error.
  • The Content Type is the one provided by the first URL; at the end of the redirect chain, application/zip is provided.
  • Method is not helpful.

Maybe it would be possible to modify the checker to report more conspicuously as follows:

Message: (No diagnosis performed on content. HTTP entity too large to be buffered in memory)
Expected Content Type: (no expectation)
Content Type: application/zip
Byte Size: N/A, due to download size
Request Duration(ms): 36
Method: <SOMETHING_USEFUL>
Timestamp: 2020-03-04 17:20:59.0

For Content Type, one might also consider the chain of redirects:

Content Type:  ["text/html;charset=utf-8"; NONE; "application/zip"]

float value delimiter

Float values in generated reports use a comma as decimal separator. However, a point is the default in most programming languages. For compatibility with other programming languages and plotting packages, e.g. when analysing the quality of collections, I advise using a point as decimal separator.
Since float formatting in Java is bound to the locale, it should be changed by specifying another locale (e.g. 'US').
See a stackoverflow post on that issue:
http://stackoverflow.com/questions/4553633/java-float-formatting-depends-on-locale
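For illustration, passing the locale explicitly to String.format is one way to do this; the wrapper class below is just a stand-in for wherever the reports format their numbers.

```java
import java.util.Locale;

final class LocaleDemo {
    /** Formats a value with four decimals using the given locale's decimal separator. */
    static String format(double value, Locale locale) {
        return String.format(locale, "%.4f", value);
    }
}
```

With Locale.GERMANY the value 0.5663 comes out as "0,5663", with Locale.US as "0.5663". Relying on the JVM default locale gives whichever of the two the server happens to be configured for.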

validation is not namespace aware

The current validation is not namespace aware, which allows invalid CMD records to pass through unnoticed. Take for example:

https://vlo.clarin.eu/data/clarin/results/cmdi/Huygens_Metadata_Repository/oai_oaipmh_huygens_knaw_nl_womenwriters_ffffdb1c_e0b2_4ceb_8720_e13b2498b5bb.xml

The cmdi:Components child elements are all members of the default namespace with URI http://www.openarchives.org/OAI/2.0/. This is not possible in a CMDI record, so this record should be considered invalid.
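A small sketch of a namespace-aware check for exactly this case, using the standard JAXP DOM parser. It is not a replacement for schema validation; it only flags children of a Components element that sit in the OAI-PMH namespace. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceCheck {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Returns true if any child element of a Components element is in the OAI-PMH namespace. */
    static boolean hasOaiNamespacedComponents(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is off by default; without it getNamespaceURI() returns null.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        NodeList components = doc.getElementsByTagNameNS("*", "Components");
        for (int i = 0; i < components.getLength(); i++) {
            NodeList children = components.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child instanceof Element && OAI_NS.equals(child.getNamespaceURI())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

If schema validation is done with JAXP as well, the same setNamespaceAware(true) call on the parser factory is needed there, since the default is false.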

Collection Header section score and profile report score don't match

It is first mentioned here: #53

2.) The Header Section contains a score per ID. Sometimes it matches the score in CMD Profile Report, sometimes it seems to be a difference of 1 to this score?
e.g. https://curate.acdh.oeaw.ac.at/collection/IDS_Repository.html
clarin_eu_cr1_p_1455633534543
Score: 1.65
https://curate.acdh.oeaw.ac.at/profile/clarin_eu_cr1_p_1455633534543.html
Total: 2.65 Max: 3.00
Could this be an error or does the score in the Collection report refer to a different score? If so what score?

This seems to be a bug. In the collection header section the score is consistently one less than the profile report score, so somewhere an extra point is being added.

Page layout issues

See the following screenshot of https://curate.acdh.oeaw.ac.at/profile/table rendered in Firefox:

Screenshot 2019-07-18 at 15 44 23

Some global page layout issues:

  • header very crammed
  • table div not fully 'justified', should fill all horizontal space; now it looks strangely truncated
  • footer vertically truncated

Reducing the window size amplifies these issues:
Screenshot 2019-07-18 at 15 48 04

Using column based layout of Bootstrap or some other tried and tested layout framework would be the best way to tackle this.

URL-Checking in a separate application that runs 24 hours

The separate application saves the results in a database, and the curation-module fetches the URLs and the results from this database. The separate application should throttle the requests to the different servers so that no single server receives too many requests.
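One possible way to throttle per server, sketched with invented names: keep the earliest allowed request time per host and let callers reserve the next slot. The minimum delay is a free parameter, not something decided in this issue.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class HostThrottle {
    private final long minDelayMillis;
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostThrottle(long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
    }

    /**
     * Reserves the next request slot for the URL's host and returns how many
     * milliseconds the caller should wait before sending the request.
     */
    synchronized long reserve(String url, long nowMillis) {
        String host = URI.create(url).getHost();
        long earliest = Math.max(nowMillis, nextAllowed.getOrDefault(host, 0L));
        nextAllowed.put(host, earliest + minDelayMillis);
        return earliest - nowMillis;
    }
}
```

A worker would call reserve(url, System.currentTimeMillis()), sleep for the returned duration and then issue the request; requests to different hosts do not delay each other.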

Java Null pointer exception when assessing profile

Hi all,
when running an assessment of the profile clarin.eu:cr1:p_1447674760337 I receive a
java.lang.NullPointerException
The schema (CMDI 1.1) is
https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1447674760337/xsd

here the complete error message:

at eu.clarin.cmdi.curation.report.CMDProfileReport.getName(CMDProfileReport.java:77)
at eu.clarin.web.views.ResultView.curate(ResultView.java:101)
at eu.clarin.web.views.ResultView.enter(ResultView.java:64)
at com.vaadin.navigator.Navigator.navigateTo(Navigator.java:616)
at com.vaadin.navigator.Navigator.navigateTo(Navigator.java:573)
at eu.clarin.web.views.CurationForm.curate(CurationForm.java:55)
at eu.clarin.web.views.CurationForm.lambda$new$61446b05$2(CurationForm.java:30)
at sun.reflect.GeneratedMethodAccessor113.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:508)
at com.vaadin.event.EventRouter.fireEvent(EventRouter.java:198)
at com.vaadin.event.EventRouter.fireEvent(EventRouter.java:161)
at com.vaadin.server.AbstractClientConnector.fireEvent(AbstractClientConnector.java:1008)
at com.vaadin.ui.Button.fireClick(Button.java:377)
at com.vaadin.ui.Button$1.click(Button.java:54)
at sun.reflect.GeneratedMethodAccessor112.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.vaadin.server.ServerRpcManager.applyInvocation(ServerRpcManager.java:158)
at com.vaadin.server.ServerRpcManager.applyInvocation(ServerRpcManager.java:118)
at com.vaadin.server.communication.ServerRpcHandler.handleInvocations(ServerRpcHandler.java:408)
at com.vaadin.server.communication.ServerRpcHandler.handleRpc(ServerRpcHandler.java:273)
at com.vaadin.server.communication.UidlRequestHandler.synchronizedHandleRequest(UidlRequestHandler.java:79)
at com.vaadin.server.SynchronizedRequestHandler.handleRequest(SynchronizedRequestHandler.java:41)
at com.vaadin.server.VaadinService.handleRequest(VaadinService.java:1409)
at com.vaadin.server.VaadinServlet.service(VaadinServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:349)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1110)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:785)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1425)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
see XML

Make link checking reports downloadable as data (XML/JSON)

The HTML rendering of, e.g., the link checking is great. I would find it even easier if the data were available as JSON or XML as this would facilitate analysis in case of large numbers by, e.g. grouping by message rather than code only. Given that the HTML seems to be incrementally loaded, just downloading that is also not an option.

(Maybe I am missing something but I do not find an address to ask a question to, so this seems to be the preferred mode of communication?)

implementation of value mapping

The latest version of the VLO has implemented the concept of value mappings, which extends the concept of uniform maps and might replace it in the long run. This new concept has to be reflected in the curation-module.

announcement of curation 3.0.1

I'm going to deploy curation 3.0.1 this Friday (July 12th, 12 p.m.). This will include the following features:

  • correction of the filter bug for profiles view
  • limiting non-valid files list in collection view to 100 files (full list is in the xml report), show/hide validation errors, make each file downloadable from clarin.eu and individually analyzable

Add option to download link checking detail list as XML, JSON and TSV

Some lists have thousands of links in the link checking statistics view. Therefore I propose adding a button to download a CSV report of the full list. This way, users can get the full list and run automatic tools on it if they wish.

There needs to be a solution for huge data. Some pages have millions of links.

Formats:

  • CSV or TSV
  • XML
  • JSON
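For the huge-data case, a streaming writer avoids building the whole list in memory. Below is a sketch for TSV with a made-up row type; the real link checker stores more fields, and the column names are assumptions.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

final class TsvExport {
    /** Hypothetical row type; the real link checker stores more fields. */
    static final class LinkResult {
        final String url;
        final int status;
        final String message;
        LinkResult(String url, int status, String message) {
            this.url = url;
            this.status = status;
            this.message = message;
        }
    }

    /** Writes one row at a time, so the iterator can be backed by a database cursor. */
    static void write(Iterator<LinkResult> rows, Writer out) throws IOException {
        out.write("url\tstatus\tmessage\n");
        while (rows.hasNext()) {
            LinkResult r = rows.next();
            out.write(clean(r.url) + "\t" + r.status + "\t" + clean(r.message) + "\n");
        }
    }

    /** Replaces tabs and line breaks, which would otherwise break the row layout. */
    private static String clean(String s) {
        return s == null ? "" : s.replaceAll("[\\t\\r\\n]", " ");
    }
}
```

The same loop could write to a servlet response writer, so that a download with millions of rows starts immediately instead of after the full list has been assembled.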

Validation Issue: correct XML file is marked as wrong

The curation module says for the XML validation section:
Invalid Records: https_talar_sfb833_uni_tuebingen_de_8443_erdora_rest_SFB833_A02_Gedichte_20Emily_20Dickinson_Gedichtkorpus_20Emily_20Dickinson

This refers to the metadata available at https://talar.sfb833.uni-tuebingen.de:8443/erdora/displaycmdi?path=%2FSFB833%2FA02%2FGedichte+Emily+Dickinson%2FGedichtkorpus+Emily+Dickinson%2FFID_129.cmdi.xml

However, this file is valid - at least according to two XML parsers.
It would be helpful to provide the complete validation error - if any - to verify that there is a problem.

Making scores and reports more useful

The Curation Module provides a lot of useful information. The results could benefit from a link to documentation. Here are a number of examples:

1.) The Collection Report gives an “Average Score: xx out of 15”
How is the number 15 motivated? Where can one find what exactly needs to be improved?

2.) The Header Section contains a score per ID. Sometimes it matches the score in CMD Profile Report, sometimes it seems to be a difference of 1 to this score?
e.g. https://curate.acdh.oeaw.ac.at/collection/IDS_Repository.html
clarin_eu_cr1_p_1455633534543
Score: 1.65
https://curate.acdh.oeaw.ac.at/profile/clarin_eu_cr1_p_1455633534543.html
Total: 2.65 Max: 3.00
Could this be an error or does the score in the Collection report refer to a different score? If so what score?

3.) To improve usefulness, more information on which entries or concepts are linked to facets and how/why would be very helpful.

e.g. How is the relation between the facets found in the Curation Module and the facets in this tool:
https://cmdi.clarin.eu/mapping/index.html#mapping

4.) A partial documentation of Cmd Component Section and Cmd Concept Section can be found in the specification: https://office.clarin.eu/v/CE-2016-0742-CLARINPLUS-D2_1.pdf but it would be much more comfortable to get the information on one page describing all features of the report.

Score record title 'friendliness'?

Many records have a (mapped) title that is not very descriptive or even friendly to the human reader. For example, the otherwise excellent records from the Leipzig Corpora Collection have names like ukr_newscrawl_2011_1M and LCC data provider "www.elkhabar.com" in resource with identifier 11022/0000-0000-7F4F-B. These values come from a 'name' field; there is no additional title. Adding one would be the solution, but this issue is about identifying such cases.

I don't have an exact solution but it would be nice if the name/title, as shown in the VLO, could be scored according to some heuristic analysis that could involve the length of the title, number of characters, relative number of spaces, relative number of non-alphabetical characters. Perhaps such a measure already exists. I would not make this a big part of the overall score but it would be nice as a soft warning as part of the metadata quality report.
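To make the idea concrete, here is a toy heuristic along those lines. The three components (share of letters, length, number of spaces) follow the list above, but the thresholds and the equal weighting are pure guesses and would need tuning against real VLO titles.

```java
final class TitleScore {
    /** Returns a score in [0, 1]; higher means more readable. Thresholds are guesses. */
    static double score(String title) {
        if (title == null || title.isBlank()) {
            return 0.0;
        }
        String t = title.trim();
        int letters = 0;
        int spaces = 0;
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (Character.isLetter(c)) {
                letters++;
            } else if (Character.isWhitespace(c)) {
                spaces++;
            }
        }
        double letterRatio = (double) letters / t.length(); // penalizes digits, underscores, punctuation
        double lengthScore = Math.min(1.0, t.length() / 20.0); // very short titles score lower
        double wordScore = Math.min(1.0, spaces / 2.0); // at least three words for full marks
        return (letterRatio + lengthScore + wordScore) / 3.0;
    }
}
```

With these settings a name like ukr_newscrawl_2011_1M scores noticeably lower than a spelled-out title of similar length, mainly because it contains no spaces.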

If I come across existing methods for scoring title quality for human readability, I will share it in a comment to this issue.

upgrade to vaadin 10

Currently the subproject curation-module-web uses version 7 of vaadin. Since this version is only supported until the beginning of 2019 (see https://vaadin.com/faq »Are Vaadin 7 and 8 still relevant?«) we should think about an upgrade of vaadin to the current version 10 over the course of the current year.

A Project to be used by Stormychecker and Curation Module for shared code

Both projects share some functionality:

  • Category definitions
  • Category to Exception Mapping
  • Sending requests and receiving results
  • ...

Therefore a new project to be used as a dependency is necessary, similar to RASA (RASA itself can't be used for this because its purpose is database access, not HTTP).

Analysis failed with HTTP Status 500

I just tried

https://clarin.oeaw.ac.at/curate/rest/instance/?url=http://www.deutschestextarchiv.de/api/cmdi/reich_mammon_1910

and got

HTTP Status 500 - java.lang.ClassCastException: eu.clarin.cmdi.curation.report.ErrorReport cannot be cast to eu.clarin.cmdi.curation.report.CMDInstanceReport

type Exception report
message java.lang.ClassCastException: eu.clarin.cmdi.curation.report.ErrorReport cannot be cast to eu.clarin.cmdi.curation.report.CMDInstanceReport

description The server encountered an internal error that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.ClassCastException: eu.clarin.cmdi.curation.report.ErrorReport cannot be cast to eu.clarin.cmdi.curation.report.CMDInstanceReport
org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:434)
org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

root cause

java.lang.ClassCastException: eu.clarin.cmdi.curation.report.ErrorReport cannot be cast to eu.clarin.cmdi.curation.report.CMDInstanceReport
eu.clarin.rest.CurationRestService.assessInstance(CurationRestService.java:26)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:74)
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:247)
org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:388)
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:346)
org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:337)
org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
org.glassfish.jersey.internal.Errors.process(Errors.java:315)
org.glassfish.jersey.internal.Errors.process(Errors.java:297)
org.glassfish.jersey.internal.Errors.process(Errors.java:267)
org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:280)
org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:316)
org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1084)
org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:418)
org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:372)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

note The full stack trace of the root cause is available in the Apache Tomcat/8.5.3 logs.
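The trace suggests that the REST method casts the result to CMDInstanceReport without checking its type. A defensive variant might look like the sketch below. The classes here are stand-ins that only borrow their names from the stack trace; the actual class hierarchy and the REST method body are not shown in this issue.

```java
final class InstanceEndpointSketch {
    /** Stand-ins for the project's report classes; only the names come from the stack trace. */
    interface Report {}

    static final class CMDInstanceReport implements Report {
        final String name;
        CMDInstanceReport(String name) { this.name = name; }
    }

    static final class ErrorReport implements Report {
        final String error;
        ErrorReport(String error) { this.error = error; }
    }

    /** Checks the runtime type before casting instead of assuming success. */
    static String describe(Report report) {
        if (report instanceof CMDInstanceReport) {
            return "instance report for " + ((CMDInstanceReport) report).name;
        }
        if (report instanceof ErrorReport) {
            return "curation failed: " + ((ErrorReport) report).error;
        }
        return "unexpected report type: " + report.getClass().getName();
    }
}
```

In the REST service the error branch would presumably map to a meaningful error response rather than a generic 500.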

HTML-version via the REST-API

Expose the XSL-stylesheets
=> make a printable simple html page (via content negotiation or xslt processing instruction)

Link to download XML report

Would it be possible to have (and somewhere in the application provide) a URL that leads to the XML report for a collection, profile or instance? This would be helpful for sharing purposes, e.g. in the context of the centre assessment.

sharing code/libraries with VLO project

Some classes/code are copied either from or to the vlo project. To synchronize the evolution of both projects and to ensure that import validation and curation validation are handled in the same way, the curation module should use libraries from the vlo project (we may first have to repackage some classes of the vlo project into separate libraries).

all report types only written to profile subdirectory

When choosing to generate local report xml files via setting SAVE_REPORT=true in config.properties, the report sub directories "collection", "instances" and "profiles" are created. However, all report types are written to "profiles" while the other directories stay empty.
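The expected behaviour could be expressed as a mapping from report type to subdirectory, roughly as follows. The enum and the .xml suffix are assumptions for illustration; only the three directory names are taken from the description above.

```java
import java.nio.file.Path;

enum ReportType {
    COLLECTION("collection"),
    INSTANCE("instances"),
    PROFILE("profiles");

    private final String directory;

    ReportType(String directory) {
        this.directory = directory;
    }

    /** Resolves the output file for a report of this type below the report root. */
    Path target(Path reportRoot, String reportName) {
        return reportRoot.resolve(directory).resolve(reportName + ".xml");
    }
}
```

The current behaviour looks as if the equivalent of PROFILE were used for every report, whatever its type.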

Best regards,
Florian

Java Null pointer exception when assessing profile

I tried to assess a profile and received a java null pointer exception
I was assessing https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1447674760337/xsd

and received the following error message.

Error while curating profile from https://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1447674760337/xsd!
java.lang.NullPointerException
at eu.clarin.cmdi.curation.report.CMDProfileReport.calculateScore(CMDProfileReport.java:90)
at eu.clarin.cmdi.curation.processor.AbstractProcessor.process(AbstractProcessor.java:17)
at eu.clarin.cmdi.curation.entities.CurationEntity.generateReport(CurationEntity.java:32)
at eu.clarin.cmdi.curation.main.CurationModule.processCMDProfile(CurationModule.java:31)
at eu.clarin.web.views.ResultView.curate(ResultView.java:83)
at eu.clarin.web.views.ResultView.enter(ResultView.java:64)
at com.vaadin.navigator.Navigator.navigateTo(Navigator.java:616)
at com.vaadin.navigator.Navigator.navigateTo(Navigator.java:573)
at eu.clarin.web.views.CurationForm.curate(CurationForm.java:55)
at eu.clarin.web.views.CurationForm.lambda$new$61446b05$2(CurationForm.java:30)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:508)
at com.vaadin.event.EventRouter.fireEvent(EventRouter.java:198)
at com.vaadin.event.EventRouter.fireEvent(EventRouter.java:161)
at com.vaadin.server.AbstractClientConnector.fireEvent(AbstractClientConnector.java:1008)
at com.vaadin.ui.Button.fireClick(Button.java:377)
at com.vaadin.ui.Button$1.click(Button.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.vaadin.server.ServerRpcManager.applyInvocation(ServerRpcManager.java:158)
at com.vaadin.server.ServerRpcManager.applyInvocation(ServerRpcManager.java:118)
at com.vaadin.server.communication.ServerRpcHandler.handleInvocations(ServerRpcHandler.java:408)
at com.vaadin.server.communication.ServerRpcHandler.handleRpc(ServerRpcHandler.java:273)
at com.vaadin.server.communication.UidlRequestHandler.synchronizedHandleRequest(UidlRequestHandler.java:79)
at com.vaadin.server.SynchronizedRequestHandler.handleRequest(SynchronizedRequestHandler.java:41)
at com.vaadin.server.VaadinService.handleRequest(VaadinService.java:1409)
at com.vaadin.server.VaadinServlet.service(VaadinServlet.java:364)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:230)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:192)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:165)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:108)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:522)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:620)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:1096)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:760)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1480)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
see XML

changing the color scheme

Blue font color on a black background is hardly visible, especially during presentations. Therefore the color scheme should be changed for better contrast.

Validation of resource availability and type

Much value would be added to the quality assessment if linked resources could be checked for a number of properties:

  • matching actual content type at retrieval, if mime type specified in the resource proxy (or elsewhere in the metadata in a reliable way)
  • retrievability in accordance with availability level, e.g. if 'PUB' all linked resources should be retrievable over HTTP with a 200 response; optionally detect login pages and/or shibboleth redirects
  • general online availability: none of the resources or landing/search pages should lead to a 404, 500 or other error code (after optional redirects)

Normalization

implement normalization step in the core component

Total number of resource proxy links is not calculated anywhere

The field exists in the code, but after being initialized to 0 it is never updated, so it is always 0. Is it possible to calculate this by counting the number of links that have a mime type?

So I assume resource proxy links always have an expected mime type associated and this is extracted during report generation. And no other links have an expected mime type. If this assumption is true, I can calculate this by counting the number of links that have a mime type. Is this a good solution or is there another way to determine the number of resource proxy links (and average number of resource proxy links)?

In collection reports, there is a whole resource proxy section with the following statistics: total number of resource proxies and total number of resource proxies with MIME. Therefore the number in the URL section is redundant. I will delete it from there. It was never calculated anyway, so it won't be missed.

Wolfgang mentioned that the resource proxy section actually belongs to the URL report section, so we can talk about incorporating it there. But for now I'm deleting it from the URL report.

Validation of hierarchical metadata

Email from Florian Schiel:

But when testing CMDI instances nested in a hierarchy I encountered the following (conceptional?) problem:
Each CMDI instance is tested by the module in isolation. Why is this a problem?

Consider for example a 2-level hierarchy of metadata: on the first level (corpus level) the metadata of a complete collection of resources is stored as in [1]; on the second level (linked from the first level as resources of type 'Metadata') the metadata of a single resource is stored as in [2]. To avoid massive replication, MD that concerns all members of the collection, for example availability, is stored only on the first level.
When analysing a single CMD instance of the second level, we can't find this information in the CMDI. But what we do find is a pointer to the upper level, namely the IsPartOf entry in the CMDI header.

So, I guess my questions are:

  1. Since we encourage users to build redundancy-free hierarchical MD structures in the CMD framework, would it be possible for the curation module to follow hierarchies (if they are there) all the way to the top and add the encountered MD to the MD content of the CMDI?
    ...
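What question 1 asks for could be sketched like this, with records reduced to a parent pointer and a flat field map. Both the Record type and the in-memory store are hypothetical simplifications; real CMDI records would have to be fetched and parsed via their IsPartOf links.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class HierarchyMerge {
    /** Simplified record: parent id taken from IsPartOf (or null) plus flat metadata fields. */
    static final class Record {
        final String isPartOf;
        final Map<String, String> fields;
        Record(String isPartOf, Map<String, String> fields) {
            this.isPartOf = isPartOf;
            this.fields = fields;
        }
    }

    /** Walks up the hierarchy and collects fields; values closer to the instance win. */
    static Map<String, String> effectiveFields(String id, Map<String, Record> store) {
        Map<String, String> merged = new LinkedHashMap<>();
        Set<String> seen = new HashSet<>(); // guards against cyclic IsPartOf chains
        String current = id;
        while (current != null && seen.add(current)) {
            Record r = store.get(current);
            if (r == null) {
                break; // parent not available: stop with what has been collected
            }
            r.fields.forEach(merged::putIfAbsent);
            current = r.isPartOf;
        }
        return merged;
    }
}
```

With a corpus-level record carrying availability and a session-level record pointing to it, the merged view of the session then contains the availability value as well.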

Integrate detailed mapping description from the facet mapping tool

The output of the facet mapping tool (written by @menzowindhouwer) is still adding value to the curation module for the detailed mapping description it can provide for a given profile. However it is unclear for how long this tool will be maintained or running in its current location. Moreover it is inconvenient for users to have to use two tools with overlapping functionality.

Sample of useful output that (as far as I know) cannot be obtained from the Curation Module:


<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <title>VLO mapping for profile talkbank-session (clarin.eu:cr1:p_1393514855466)</title>
   </head>
   <body>
      <h1><a href="index.html">VLO mapping</a> for profile talkbank-session (<a href="http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1393514855466/xml">clarin.eu:cr1:p_1393514855466</a>)
      </h1>
      <dl>
         <dt>Facet: id</dt>
         <dd>
            <dl></dl>
         </dd>
         <dt>Facet: _selfLink</dt>
         <dd>
            <dl></dl>
         </dd>
         <dt>Facet: collection</dt>
         <dd>
            <dl></dl>
         </dd>
         <dt>Facet: projectName</dt>
         <dd>
            <dl>
               <dt>Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2536_13fc5f10-c14a-1f64-a669-32736f6d3ef5">http://hdl.handle.net/11459/CCR_C-2536_13fc5f10-c14a-1f64-a669-32736f6d3ef5</a></dt>
               <dd>
                  <dl>
                     <dt>/c:CMD/c:Components/c:talkbank-session/c:Session/c:MDGroup/c:Project/c:Name/text()</dt>
                     <dd>xpath accepted</dd>
                  </dl>
               </dd>
               <dt>Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2537_fa206273-223a-f4fa-dde3-ba59b965701f">http://hdl.handle.net/11459/CCR_C-2537_fa206273-223a-f4fa-dde3-ba59b965701f</a></dt>
               <dd>
                  <dl>
                     <dt>/c:CMD/c:Components/c:talkbank-session/c:Session/c:MDGroup/c:Project/c:Title/text()</dt>
                     <dd>xpath accepted</dd>
                  </dl>
               </dd>
            </dl>
         </dd>
         <dt>Facet: name</dt>
         <dd>
            <dl>
               <dt>Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2544_3626545e-a21d-058c-ebfd-241c0464e7e5">http://hdl.handle.net/11459/CCR_C-2544_3626545e-a21d-058c-ebfd-241c0464e7e5</a></dt>
               <dd>
                  <dl>
                     <dt>/c:CMD/c:Components/c:talkbank-session/c:Session/c:Name/text()</dt>
                     <dd>xpath accepted</dd>
                  </dl>
               </dd>
               <dt>Matched CMD Element ConceptLink: <a href="http://hdl.handle.net/11459/CCR_C-2545_d873f2ab-2a2f-29d6-a9ab-260cde57f227">http://hdl.handle.net/11459/CCR_C-2545_d873f2ab-2a2f-29d6-a9ab-260cde57f227</a></dt>
               <dd>
                  <dl>
                     <dt>/c:CMD/c:Components/c:talkbank-session/c:Session/c:Title/text()</dt>
                     <dd>xpath accepted</dd>
                  </dl>
               </dd>
            </dl>
         </dd>
...

Full output example (PDF)
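
As a rough illustration of how this output could be consumed by another tool, the sketch below pulls the facet-to-XPath pairs out of HTML shaped like the sample above. It relies only on the patterns visible in the sample (`<dt>` entries starting with `Facet: ` and `<dt>` entries holding an XPath that starts with `/`); the class name `FacetMappingParser` is hypothetical, and the real output may contain cases this does not cover.

```python
from html.parser import HTMLParser


class FacetMappingParser(HTMLParser):
    """Collect {facet name: [xpath, ...]} from the mapping tool's HTML."""

    def __init__(self):
        super().__init__()
        self.mapping = {}
        self._in_dt = False
        self._text = []
        self._facet = None

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self._in_dt = True
            self._text = []

    def handle_data(self, data):
        if self._in_dt:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "dt" or not self._in_dt:
            return
        self._in_dt = False
        text = "".join(self._text).strip()
        if text.startswith("Facet: "):
            # A new facet starts; it may end up with no matched XPaths.
            self._facet = text[len("Facet: "):]
            self.mapping[self._facet] = []
        elif text.startswith("/") and self._facet is not None:
            # An XPath entry belonging to the most recent facet.
            self.mapping[self._facet].append(text)


sample = """
<dl>
  <dt>Facet: id</dt>
  <dd><dl></dl></dd>
  <dt>Facet: projectName</dt>
  <dd><dl>
    <dt>Matched CMD Element ConceptLink: <a href="#">concept</a></dt>
    <dd><dl>
      <dt>/c:CMD/c:Components/c:talkbank-session/c:Session/c:MDGroup/c:Project/c:Name/text()</dt>
      <dd>xpath accepted</dd>
    </dl></dd>
  </dl></dd>
</dl>
"""
parser = FacetMappingParser()
parser.feed(sample)
facet_mapping = parser.mapping
```

Facets without any matched element, such as `id` in the sample, end up with an empty list.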

Exception in tooltips

The instance assessment form as well as the profiles and collections tables show a stack trace (StringIndexOutOfBoundsException) in the tooltip when hovering over any of the cells.

jquery and jquery datatables loaded twice, also unminified

I noticed that in the web pages of Curation 3.0, the jQuery and DataTables JS files are each loaded twice, once in minified and once in non-minified form.

from the page source:

...
   <!-- #wrapper-footer-full --> 
  </div> 
  <script type="text/javascript" src="/view/fundament/vendor/jquery/jquery.min.js"></script> 
  <script type="text/javascript" src="/view/fundament/js/fundament.min.js"></script> 
  <script type="text/javascript" src="/view/js/dropzone.min.js"></script> 
  <script type="text/javascript" src="/view/js/curate.js"></script> 
  <script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.js"></script> 
  <script type="text/javascript" src="https://code.jquery.com/jquery-3.3.1.js"></script> 
  <script type="text/javascript" src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js"></script> 
  <script type="text/javascript" src="https://cdn.datatables.net/fixedheader/3.1.5/js/dataTables.fixedHeader.min.js"></script> 
...

Note that jQuery is loaded from both src="https://code.jquery.com/jquery-3.3.1.js" and src="/view/fundament/vendor/jquery/jquery.min.js",
and DataTables from both src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.js" and src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js"
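
A check like this could be automated. The sketch below is only a heuristic and not part of the project: it collects the `src` of every `<script>` tag and groups them by a normalised file name, dropping the `.js` and `.min` suffixes and a trailing version such as `-3.3.1`. The names `ScriptCollector`, `library_key` and `find_duplicate_scripts` are made up for this example, and URLs with query strings are not handled.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def library_key(src: str) -> str:
    """Reduce a script URL to a comparable library name (heuristic)."""
    name = src.rsplit("/", 1)[-1].lower()
    name = re.sub(r"\.js$", "", name)          # jquery.min.js -> jquery.min
    name = re.sub(r"\.min$", "", name)         # jquery.min    -> jquery
    name = re.sub(r"-\d+(\.\d+)*$", "", name)  # jquery-3.3.1  -> jquery
    return name


def find_duplicate_scripts(html: str) -> dict:
    """Return {library name: [src, ...]} for libraries included more than once."""
    collector = ScriptCollector()
    collector.feed(html)
    groups = defaultdict(list)
    for src in collector.sources:
        groups[library_key(src)].append(src)
    return {name: srcs for name, srcs in groups.items() if len(srcs) > 1}


page = """
<script type="text/javascript" src="/view/fundament/vendor/jquery/jquery.min.js"></script>
<script type="text/javascript" src="/view/fundament/js/fundament.min.js"></script>
<script type="text/javascript" src="/view/js/dropzone.min.js"></script>
<script type="text/javascript" src="/view/js/curate.js"></script>
<script type="text/javascript" charset="utf8" src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.js"></script>
<script type="text/javascript" src="https://code.jquery.com/jquery-3.3.1.js"></script>
<script type="text/javascript" src="https://cdn.datatables.net/1.10.19/js/jquery.dataTables.min.js"></script>
<script type="text/javascript" src="https://cdn.datatables.net/fixedheader/3.1.5/js/dataTables.fixedHeader.min.js"></script>
"""
duplicates = find_duplicate_scripts(page)
```

Run against the script tags quoted above, this groups the two jQuery includes together and the two DataTables includes together, while leaving the other scripts out of the result.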
