See the error logs per workflow stage in: <a href="https://git

Take action on error logs to improve data quality in KG (rcgraph, open issue)

ceteri commented on June 11, 2024

Take action on error logs to improve data quality in KG

from rcgraph.

Comments (5)

lobodemonte commented on June 11, 2024

After looking at the failures for Step 2, the sources of failure seem to be:

Title differs on a special character (quotes, hyphens, etc.)
Title with an encoding issue or a cut-off special character
API source not used (e.g. RePEc, SSRN)
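
For the first two failure sources, one possible mitigation is to compare titles on a normalized key rather than on the raw string. The sketch below is only an illustration, not code from rcgraph: the function name `normalize_title` and the punctuation mapping are assumptions, and the mapping covers only a few common quote and dash variants.

```python
import re
import unicodedata

# Hypothetical mapping of typographic punctuation to ASCII equivalents.
# This is an assumed, incomplete list; extend it based on the error logs.
_PUNCT_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",                  # curly single quotes
    "\u201c": '"', "\u201d": '"',                  # curly double quotes
    "\u2010": "-", "\u2011": "-", "\u2012": "-",   # hyphen variants
    "\u2013": "-", "\u2014": "-", "\u2212": "-",   # dashes and minus sign
})


def normalize_title(title: str) -> str:
    """Fold a title into a key suitable for equality comparison."""
    # NFKC folds compatibility characters (ligatures, full-width forms).
    text = unicodedata.normalize("NFKC", title)
    # Replace typographic quotes and dashes with plain ASCII.
    text = text.translate(_PUNCT_MAP)
    # Collapse runs of whitespace and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    # casefold() is a more aggressive lower() for caseless matching.
    return text.casefold()


ours = "Data\u2013Driven  \u201cPolicy\u201d"
theirs = 'Data-Driven "Policy"'
print(normalize_title(ours) == normalize_title(theirs))
```

This does not repair titles that were truncated or mis-decoded upstream; those would still need a less exact comparison or a fix at the source.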

lobodemonte commented on June 11, 2024

I reran the 763 failed titles from step_2 against other APIs. Adding 3 more APIs could cut the number of failed lookups by about 25%, but that still leaves a large chunk of titles with possible typing- or encoding-related errors.

Results (found/total):
CrossRef: 33/763
CrossRef + PubMed: 38/763
CrossRef + DataCite: 109/763
Core: 95/763
CrossRef + DataCite + Core: 195/763
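
As a quick check on the arithmetic, the counts above can be turned into recovery rates. This is just a worked calculation over the numbers reported in this comment; the names `found` and `recovery_rates` are made up for the example.

```python
TOTAL_FAILED = 763  # failed step_2 titles that were rerun

# Titles found per API combination, as reported above.
found = {
    "CrossRef": 33,
    "CrossRef + PubMed": 38,
    "CrossRef + DataCite": 109,
    "Core": 95,
    "CrossRef + DataCite + Core": 195,
}


def recovery_rates(counts: dict, total: int) -> dict:
    """Return the percentage of failed titles recovered per combination."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


rates = recovery_rates(found, TOTAL_FAILED)
for name, pct in rates.items():
    print(f"{name}: {pct}%")
```

The best combination recovers 195 of 763 titles, roughly a quarter, which matches the 25% figure. The combined count is slightly below 109 + 95, which would be consistent with a small overlap between the sources, though the comment does not say so explicitly.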

ceteri commented on June 11, 2024

Thank you @lobodemonte.
It sounds like the improved title matching (accepting less exact matches) may help here.
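
To make "less exact matches" concrete, here is a minimal sketch using `difflib.SequenceMatcher` from the Python standard library. It is not the matching logic in rcgraph; the helper names and the 0.9 threshold are assumptions chosen for illustration, and the threshold would need tuning against the error logs.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two titles, ignoring case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def best_match(query: str, candidates: list, threshold: float = 0.9):
    """Return (candidate, score) for the closest title, or (None, score)
    if no candidate reaches the (assumed) threshold."""
    best, best_score = None, 0.0
    for cand in candidates:
        score = title_similarity(query, cand)
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Example titles are invented for the demonstration.
api_results = [
    "Income Inequality and Health - A Review",
    "A Survey of Graph Databases",
]
match, score = best_match("Income inequality and health: a review", api_results)
print(match, round(score, 3))
```

A lower threshold recovers more titles but raises the risk of linking the wrong publication, so any accepted fuzzy match is probably worth logging for review.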

ceteri commented on June 11, 2024

Nice work! That's super-helpful to guide how we leverage the APIs in the workflow.

For the typing/encoding-related errors, is it mostly the case that the titles we have are incorrect?

lobodemonte commented on June 11, 2024

Hi @ceteri,

Yes, the rest mostly seem to be minor differences between the actual title and what we have in our data partitions. There's also a small portion of titles that won't have a DOI (e.g. RePEc publications).
