
roco-dataset's People

Contributors

razorx89, saviola777

roco-dataset's Issues

Windows compatibility issue

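On Windows, multiprocessing starts worker processes with the spawn method, which launches a fresh interpreter and re-imports the main module. Global variables that the parent sets at runtime are therefore not visible in the workers. On Unix-like systems the default start method has traditionally been fork, which copies the parent's memory, so the script only works there by accident (macOS switched its default to spawn in Python 3.8).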
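A minimal sketch of one way around this, assuming the workers only need read-only configuration: pass the values through the pool's `initializer` instead of relying on inherited globals. The names `init_worker`, `process_item`, `run_pool` and the `prefix` key are hypothetical and are not taken from fetch.py.

```python
import multiprocessing

# Set separately inside every worker by init_worker(); never inherited.
_config = None


def init_worker(config):
    """Runs once per worker process, under both fork and spawn."""
    global _config
    _config = config


def process_item(item):
    """Example task that reads the per-worker configuration."""
    return f"{_config['prefix']}/{item}"


def run_pool(items, config, processes=2):
    """Map process_item over items with explicitly initialised workers."""
    with multiprocessing.Pool(
        processes=processes,
        initializer=init_worker,
        initargs=(config,),
    ) as pool:
        return sorted(pool.imap_unordered(process_item, items))
```

Under spawn, `run_pool(...)` has to be called from inside an `if __name__ == "__main__":` guard, because every worker re-imports the main module.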

Questions about the RoCo version released on Kaggle

After multiple failed download attempts, I found a ROCO dataset on Kaggle. Is there any difference between that dataset and the one I was trying to download, or is it just another copy uploaded to Kaggle by someone else? I'm very confused...

zlib error

Hi,

I keep getting "zlib.error: Error -3 while decompressing data: invalid code lengths set" or other zlib errors during the download. Is there a fix for the script, or is this a data problem?

Cheers
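Errors like this usually mean the gzip stream is truncated or corrupted, for example after an interrupted download. A minimal sketch, assuming the archive has already been saved to disk, that reads the whole stream before extraction so that a broken archive can be deleted and fetched again. `archive_is_intact` is a hypothetical helper, not part of fetch.py.

```python
import gzip
import zlib


def archive_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip stream at path decompresses to the end."""
    try:
        with gzip.open(path, "rb") as stream:
            # Read and discard everything; corruption surfaces as an exception.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # gzip.BadGzipFile (Python 3.8+) is a subclass of OSError.
        return False
    return True
```

If this returns False for a .tar.gz file, the download itself is the likely culprit rather than the extraction code.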

Windows Error when running fetch.py

Hello, I wanted to know if someone can help me with the following issue when running the script fetch.py on Windows 10 with Python 3.8.11.

I get the following output and error message:

Configuration:
Subdirectory: images
Extraction directory: C:\Users\franc\AppData\Local\Temp\roco-dataset
Keep archives: False
Delete contents of extraction directory: True
Number of processes: 4
Number of download retries: 10
Fetching ROCO dataset images...
multiprocessing.pool.RemoteTraceback:

Traceback (most recent call last):
  File "C:\Users\franc\anaconda3\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\franc\roco-dataset\scripts\fetch.py", line 131, in process_group
    result = download_archive(extraction_dir_name, archive_url,
  File "C:\Users\franc\roco-dataset\scripts\fetch.py", line 209, in download_archive
    return subprocess.call(['wget', '-nc', '-nd', '-c', '-q', '-P',
  File "C:\Users\franc\anaconda3\lib\subprocess.py", line 340, in call
    with Popen(*popenargs, **kwargs) as p:
  File "C:\Users\franc\anaconda3\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\franc\anaconda3\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts\fetch.py", line 330, in <module>
    for i, pmc_id in enumerate(pool.imap_unordered(process_group,
  File "C:\Users\franc\anaconda3\lib\multiprocessing\pool.py", line 868, in next
    raise value
FileNotFoundError: [WinError 2] The system cannot find the file specified

Not sure what I'm doing wrong. Any ideas on how to solve the issue?

Thanks in advance
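The traceback ends inside `subprocess.call(['wget', ...])`, and on Windows `[WinError 2]` from `CreateProcess` typically means the executable itself (here `wget`) was not found on PATH. A minimal sketch, assuming a pure-Python fallback is acceptable: check for the tool with `shutil.which` and otherwise download with `urllib`. The helper names are hypothetical and the wget flags differ from the ones fetch.py uses.

```python
import shutil
import subprocess
import urllib.request


def tool_available(name):
    """Return True if an executable called name can be found on PATH."""
    return shutil.which(name) is not None


def download(url, target_path):
    """Download url to target_path, preferring wget when it is installed."""
    if tool_available("wget"):
        # -q: quiet output, -O: write to exactly this file.
        return subprocess.call(["wget", "-q", "-O", target_path, url]) == 0
    try:
        with urllib.request.urlopen(url) as response, open(target_path, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except OSError:
        # urllib.error.URLError is a subclass of OSError.
        return False
```

Installing wget for Windows and adding it to PATH should have the same effect without touching the script.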

Need your help

Could you please let me know how you downloaded the data from PubMed, and why this URL is not working?
https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&email=[email protected]&id=
Thank you.
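The URL as quoted ends in `id=` with no PMC ID, so the service has no article to look up. A minimal sketch of building a complete query and reading the archive link from the reply, assuming the service returns XML in which each record carries `<link format="tgz" href="...">` elements. The helper names, the placeholder email, and the assumed response shape are illustrative and should be checked against a real response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OA_ENDPOINT = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def build_oa_url(pmc_id, tool, email):
    """Build a query URL for a single PMC ID such as 'PMC4608653'."""
    if not pmc_id:
        raise ValueError("a PMC ID such as 'PMC4608653' is required")
    query = urlencode({"tool": tool, "email": email, "id": pmc_id})
    return f"{OA_ENDPOINT}?{query}"


def extract_tgz_link(xml_text):
    """Return the href of the first tgz link in the response, or None."""
    root = ET.fromstring(xml_text)
    for link in root.iter("link"):
        if link.get("format") == "tgz":
            return link.get("href")
    return None
```

For example, `build_oa_url("PMC4608653", "roco-fetch", "user@example.org")` produces a URL whose `id` parameter is filled in; the response body can then be passed to `extract_tgz_link`.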

Error: download failed, retrying

My environment is behind a proxy.
I cannot download any images of the dataset, but I can download the dataset's other text files.

python scripts/fetch.py -n 1
Configuration:
Subdirectory: images
Extraction directory: /tmp/roco-dataset
Keep archives: False
Delete contents of extraction directory: False
Number of processes: 1
Number of download retries: 10
Fetching ROCO dataset images...
Error: download failed, retrying
Trying to get new archive URL: https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&[email protected]&id=PMC4608653
Error: download failed, retrying
Error: download failed, retrying
Error: download failed, retrying
Error: download failed, retrying
Error: download failed, retrying

"""
Traceback (most recent call last):
  File "/home/work/lisa/miniconda3/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/work/lisa/distributed/visual-med-alpaca/roco-dataset/scripts/fetch.py", line 151, in process_group
    result = download_archive(extraction_dir_name, archive_url,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/work/lisa/distributed/visual-med-alpaca/roco-dataset/scripts/fetch.py", line 212, in download_archive
    raise Exception("Giving up download of archive {0} after {1} tries"
Exception: Giving up download of archive ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8d/34/PMC4608653.tar.gz after 11 tries
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/work/lisa/distributed/visual-med-alpaca/roco-dataset/scripts/fetch.py", line 338, in <module>
    pool = multiprocessing.Pool(processes=args.num_processes,
  File "/home/work/lisa/miniconda3/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
Exception: Giving up download of archive ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/8d/34/PMC4608653.tar.gz after 11 tries
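The failing archive URL uses the `ftp://` scheme, which many HTTP proxies do not forward, and that would also explain why the text files fetched over HTTPS still download. A minimal sketch of one possible workaround, under the assumption that the NCBI host serves the same path over HTTPS (worth confirming in a browser first). `ftp_to_https` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def ftp_to_https(url):
    """Rewrite an ftp:// URL to https:// on the same host and path."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        return url  # leave http(s) and other URLs untouched
    return urlunsplit(("https", parts.netloc, parts.path, parts.query, parts.fragment))
```

The rewritten URL can then be downloaded with a client that honours the `https_proxy` environment variable.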

Question regarding ROCOv2

I had some questions regarding ROCOv2.

  1. What is the difference between train_concepts_manual.csv and train_concept.csv?
  2. ROCO had keywords but ROCOv2 doesn't. What is the best way to generate keywords for ROCOv2?
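On the second question, one simple baseline is to derive keywords from each caption by lowercasing, tokenising, and dropping stopwords. This is only a sketch and not necessarily how the original ROCO keywords were produced; the stopword list and the `caption_keywords` helper are hypothetical.

```python
import re

# Deliberately tiny, illustrative stopword list; extend it for real use.
STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "is",
    "of", "on", "or", "the", "to", "with", "showing", "shows",
})


def caption_keywords(caption, min_length=2):
    """Return unique non-stopword tokens of a caption, in order of appearance."""
    seen = set()
    keywords = []
    for token in re.findall(r"[a-z]+", caption.lower()):
        if len(token) < min_length or token in STOPWORDS or token in seen:
            continue
        seen.add(token)
        keywords.append(token)
    return keywords
```

A domain stopword list or a medical concept extractor would likely give cleaner results than this frequency-agnostic filter.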

FileNotFoundError

Hi there, I ran fetch.py under Ubuntu on WSL and ran into some problems. Here is the traceback:

Configuration:
Subdirectory: images
Extraction directory: /tmp/roco-dataset
Keep archives: False
Delete contents of extraction directory: False
Number of processes: 12
Number of download retries: 10
Fetching ROCO dataset images...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "fetch.py", line 199, in process_group
    shutil.copy(image_filename, target_filename)
  File "/usr/lib/python3.8/shutil.py", line 415, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.8/shutil.py", line 261, in copyfile
    with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: 'data/test/radiology/images/ROCO_00176.jpg'

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "fetch.py", line 340, in <module>
    for i, pmc_id in enumerate(pool.imap_unordered(process_group,
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 865, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: 'data/test/radiology/images/ROCO_00176.jpg'

I'm not sure how that is happening. I also tried running fetch.py on Windows. At the beginning it works for a little while, and some pictures are indeed downloaded into images, but then it throws errors like:
zlib.error: Error -3 while decompressing data: invalid stored block lengths
module gzip has no attribute BadGzip etc.

Could you suggest a fix? I would appreciate it.
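The path in the FileNotFoundError is relative, and the traceback shows `fetch.py` being started from inside the scripts directory, so one possible cause is that `data/test/radiology/images` does not exist relative to the current working directory. A minimal sketch, under that assumption, of a copy step that reports a missing source clearly and creates the target directory first. `copy_image` is a hypothetical helper, not the script's own function.

```python
import os
import shutil


def copy_image(source_path, target_path):
    """Copy one image, creating the target directory when it is missing."""
    if not os.path.isfile(source_path):
        raise FileNotFoundError(f"extracted image is missing: {source_path}")
    target_dir = os.path.dirname(os.path.abspath(target_path))
    os.makedirs(target_dir, exist_ok=True)
    return shutil.copy(source_path, target_path)
```

Running the script from the repository root, as in `python scripts/fetch.py`, may avoid the problem without any code change.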

download failed

Hi, I tried the download script, but it always shows "retrying", even when I set -n 1.

(base) [yupei@login roco-dataset-master]$ python scripts/fetch.py
Configuration:
Subdirectory: images
Extraction directory: /tmp/roco-dataset
Keep archives: False
Delete contents of extraction directory: False
Number of processes: 64
Number of download retries: 10
Fetching ROCO dataset images...
Error: download failed, retrying
Trying to get new archive URL: https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&[email protected]&id=PMC3395713
Error: download failed, retrying
Trying to get new archive URL: https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?tool=roco-fetch&[email protected]&id=PMC3130474

Could you tell me why?
THX
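With 64 parallel processes the server may simply be refusing or throttling connections. A minimal sketch of retrying with exponential backoff between attempts, assuming the failures are transient; it will not help if the archive location itself has changed. `retry` and its parameters are hypothetical and do not mirror the script's own retry loop.

```python
import time


def retry(operation, tries=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(tries):
        try:
            return operation()
        except OSError as error:  # network errors are OSError subclasses
            last_error = error
            if attempt < tries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up after {tries} tries") from last_error
```

Combining this with a small process count keeps the request rate low while still recovering from occasional failures.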

"filename 'PMC4889020/nihms790182f4.jpg' not found"

Some pictures cannot be downloaded:
Error: failed to extract image PMC4889020/nihms790182f4.jpg from archive ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/da/a9/PMC4889020.tar.gz: "filename 'PMC4889020/nihms790182f4.jpg' not found"
Image PMC4889020/nihms790182f4.jpg not found in archive ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_package/da/a9/PMC4889020.tar.gz, skipping
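This message means the expected member name is not present in the archive, which can happen when the article package was repackaged or the file was renamed after the dataset was built. A minimal sketch for inspecting such an archive: look up the exact name first, then fall back to a case-insensitive match on the base file name. The fallback rule and the helper names are assumptions made for illustration.

```python
import io
import tarfile


def find_member(archive, wanted_name):
    """Return the TarInfo for wanted_name, a same-basename match, or None."""
    try:
        return archive.getmember(wanted_name)
    except KeyError:
        pass  # exact name is absent; try a looser match below
    wanted_base = wanted_name.rsplit("/", 1)[-1].lower()
    for member in archive.getmembers():
        if member.isfile() and member.name.rsplit("/", 1)[-1].lower() == wanted_base:
            return member
    return None


def build_demo_archive():
    """Create a tiny in-memory .tar.gz so the lookup can be tried offline."""
    buffer = io.BytesIO()
    payload = b"fake image bytes"
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        info = tarfile.TarInfo("PMC0000000/figures/fig1.JPG")
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer
```

If `find_member` returns None for the real archive, the image is no longer shipped in that package and skipping it is the only option.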

slow speed

Hi, thank you so much for your work!

May I ask whether it is normal for the download to be very slow? It takes about 10 seconds to download one image. The command I used was: python fetch.py -n 1
I used a single process because many errors occur when the number of processes is 10.

Thanks

Download fails

Is the script still working, or has something changed?
I also tried with 1 process and it failed after 10 attempts.

dataset issues

Do the file names of the downloaded images match the image names referenced by the captions? If they do not, how can I fix this, or could you provide a direct link to the dataset? Thanks!
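One way to check the match locally is to compare the IDs in a caption file against the image files on disk. A minimal sketch, assuming a tab-separated caption file whose first column is an image ID such as ROCO_00176 and images stored as `<ID>.jpg`; the caption file layout is an assumption, and `missing_images` is a hypothetical helper.

```python
import os


def missing_images(caption_lines, image_dir, extension=".jpg"):
    """Return the caption IDs that have no matching image file in image_dir."""
    missing = []
    for line in caption_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id = line.split("\t", 1)[0].strip()
        if not os.path.isfile(os.path.join(image_dir, image_id + extension)):
            missing.append(image_id)
    return missing
```

An empty result means every caption has an image with the expected name.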
