genomiqueens / toulligqc

A post sequencing QC tool for Oxford Nanopore sequencers

License: Other

Python 11.54% Dockerfile 0.04% JavaScript 88.32% CSS 0.09%

toulligqc's Issues

Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed

Hi!

I managed to run the latest version of toulligQC (2.3) with default guppy basecalling:

toulligqc \
--report-name "$run_id" \
--telemetry-source ./fastq_hac_400/sequencing_telemetry.js \
--sequencing-summary-source ./fastq_hac_400/sequencing_summary.txt \
--html-report-path ./toolligqc/"$run_id"_tooligqc_"$run_id".html \
--data-report-path ./toolligqc/"$run_id"_tooligqc_"$run_id".data

However, when I ran it with the demultiplexing files, I got the following error:

toulligqc \
--force \
--report-name "$run_id" \
--barcoding \
--telemetry-source "$run_path"/fastq_hac_400/sequencing_telemetry.js \
--sequencing-summary-source "$run_path"/fastq_hac_400/sequencing_summary.txt \
--sequencing-summary-source "$run_path"/guppy_demultiplexed_pass/barcoding_summary_pass.txt \
--sequencing-summary-source "$run_path"/guppy_demultiplexed_fail/barcoding_summary_fail.txt \
--html-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.html \
--data-report-path "$run_path"/toolligqc/"$run_id"_tooligqc_"$run_id".2.data \
--barcodes BC01,BC02,BC03,BC04,BC05,BC06,BC07,BC08,BC09,BC10,BC11,BC12,BC13,BC14,BC15,BC16,BC17,BC18,BC19,BC20,BC21,BC22,BC23,BC24

* Start Basecaller sequencing summary extractor
Traceback (most recent call last):
  File "/home/vincent.hahaut/anaconda3/bin/toulligqc", line 33, in <module>
    sys.exit(load_entry_point('toulligqc==2.3', 'console_scripts', 'toulligqc')())
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/toulligqc.py", line 347, in main
    extractor.init()
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/sequencing_summary_extractor.py", line 106, in init
    self.dataframe_1d = self._load_sequencing_summary_data()
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/toulligqc-2.3-py3.8.egg/toulligqc/sequencing_summary_extractor.py", line 408, in _load_sequencing_summary_data
    dataframes_merged = pd.merge(
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 74, in merge
    op = _MergeOperation(
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 598, in __init__
    _left = _validate_operand(left)
  File "/home/vincent.hahaut/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 2148, in _validate_operand
    raise TypeError(
TypeError: Can only merge Series or DataFrame objects, a <class 'NoneType'> was passed

I did not manage to debug it myself. The paths to the barcoding summary files are correct:

head -n 2 "$run_path"/guppy_demultiplexed_pass/barcoding_summary_pass.txt

read_id	barcode_arrangement	barcode_full_arrangement	barcode_kit	barcode_variant	barcode_score	barcode_front_id	barcode_front_score	barcode_front_refseq	barcode_front_foundseq	barcode_front_foundseq_length	barcode_front_begin_index	barcode_rear_id	barcode_rear_score	barcode_rear_refseq	barcode_rear_foundseq	barcode_rear_foundseq_length	barcode_rear_end_index	barcode_front_total_trimmed	barcode_rear_total_trimmed	barcode_mid_front_id	barcode_mid_front_score	barcode_mid_front_end_index	barcode_mid_rear_id	barcode_mid_rear_score	barcode_mid_rear_end_index	adapter_front_id	adapter_front_score	adapter_front_foundseq_len	adapter_front_begin_index	adapter_rear_id	adapter_rear_score	adapter_rear_foundseq_len	adapter_rear_end_index	adapter_mid_id	adapter_mid_score	adapter_mid_end_index
fa6dee83-eb61-5169-bddf-8f6c19de6b53	barcode19	NB19_var1	NB	var1	46.3333	NB19_FWD	46.3333	AGGTTAAGTTCCTCGTGCAGTGTCAAGAGATCAGCACCT	AGGTAGTCTGTACATAATTCAGAGAGAGGACAT	33	37	NB19_REV	21.9167	GGTGCTGATCTCTTGACACTGCACGAGGAACTTAACCTTAGCAAT	TGCTGCCATTCGGCCAGTGAGTCTTCTCCCAAT	33	84	70	117	unclassified	0	-1	unclassified	0	-1	ADAPTER_LSK109_FWD	37.9286	20	73	ADAPTER_LSK110_REV	43.5556	13	114	unclassified	0	-1

head -n 2 "$run_path"/guppy_demultiplexed_fail/barcoding_summary_fail.txt

read_id	barcode_arrangement	barcode_full_arrangement	barcode_kit	barcode_variant	barcode_score	barcode_front_id	barcode_front_score	barcode_front_refseq	barcode_front_foundseq	barcode_front_foundseq_length	barcode_front_begin_index	barcode_rear_id	barcode_rear_score	barcode_rear_refseq	barcode_rear_foundseq	barcode_rear_foundseq_length	barcode_rear_end_index	barcode_front_total_trimmed	barcode_rear_total_trimmed	barcode_mid_front_id	barcode_mid_front_score	barcode_mid_front_end_index	barcode_mid_rear_id	barcode_mid_rear_score	barcode_mid_rear_end_index	adapter_front_id	adapter_front_score	adapter_front_foundseq_len	adapter_front_begin_index	adapter_rear_id	adapter_rear_score	adapter_rear_foundseq_len	adapter_rear_end_index	adapter_mid_id	adapter_mid_score	adapter_mid_end_index
82bb5bfb-5e34-565e-a6b4-6ad6947004b0	unclassified	NB12_var2	NB	var2	39.75	NB12_FWD	39.75	ATTGCTAAGGTTAATCCGATTCTGCTTCTTTCTACCTGCAGCACC	TTGCTACATAGACGGGTGTGCTCTTTTCACTGTTCAG	37	41	NB12_REV	16.25	AGGTGCTGCAGGTAGAAAGAAGCAGAATCGGATTAACCT	GGGCTAGGTTTAGCCCCATACTATGTTAGTTGATACC	37	39	0	0	unclassified	0	-1	unclassified	0	-1	ADAPTER_LSK109_FWD	37	22	126	ADAPTER_LSK109_REV	29.8261	12	5	unclassified	0	-1

Any idea why this is happening?

Thank you in advance

docker run cannot open files

Hi,

I am having a problem running toulligQC. It can't seem to find the files it needs even though I am using absolute paths and I allow read/write permissions for everything. I am using docker since I couldn't get the standard install to work. I am on Ubuntu 16.04.

I pulled the image as in the instructions, and I also tried building it with docker build. I get the same error either way. I tried with my own data and with the test data that is downloaded.

If I modify the config file and run the test data:

sudo docker run genomicpariscentre/toulligqc -c /tools/toulligQC/test_data/config.txt -n /tools/toulligQC/test_data/091817_test -b

Traceback (most recent call last):
  File "/usr/local/bin/toulligqc", line 11, in <module>
    load_entry_point('toulligqc===-0.1a1-', 'console_scripts', 'toulligqc')()
  File "/usr/local/lib/python3.5/dist-packages/toulligqc-_0.1a1_-py3.5.egg/toulligqc/toulligqc.py", line 228, in main
    parse_args(config_dictionary)
  File "/usr/local/lib/python3.5/dist-packages/toulligqc-_0.1a1_-py3.5.egg/toulligqc/toulligqc.py", line 87, in parse_args
    config_dictionary.load(conf_file)
  File "/usr/local/lib/python3.5/dist-packages/toulligqc-_0.1a1_-py3.5.egg/toulligqc/toulligqc_conf.py", line 78, in load
    with open(conf_path, 'r') as config_file:
FileNotFoundError: [Errno 2] No such file or directory: '/tools/toulligQC/test_data/config.txt'

Running without the config file but with the individual files and the built docker image:

sudo docker run 8b035f9a408a -n test_individual_files -f /tools/toulligQC/test_data/dnacpc14_20170328_FNFAF04250_MN17734_mux_scan_1D_validation_test1_45344_ch282_read40_strand.fast5 -a /tools/toulligQC/test_data/sequencing_summary/sequencing_summary.txt -q /tools/toulligQC/test_data/fastq/20170328_FAF04250/20170328_FAF04250_barcode01.fastq -o /tools/toulligQC/test_data -s /tools/toulligQC/test_data/samplesheet.csv -b

Traceback (most recent call last):
  File "/usr/local/bin/toulligqc", line 11, in <module>
    load_entry_point('toulligqc===-0.1a1-', 'console_scripts', 'toulligqc')()
  File "/usr/local/lib/python3.5/dist-packages/toulligqc-_0.1a1_-py3.5.egg/toulligqc/toulligqc.py", line 237, in main
    barcode_selection = get_barcode(sample_sheet_file)
  File "/usr/local/lib/python3.5/dist-packages/toulligqc-_0.1a1_-py3.5.egg/toulligqc/toulligqc.py", line 174, in get_barcode
    with open(barcode_file) as csvfile:
FileNotFoundError: [Errno 2] No such file or directory: '/tools/toulligQC/test_data/samplesheet.csv'

Any help would be great. Thanks!

Run toulligQC error

OK, I tried our barcoded nanopore data, and toulligqc generated reports. So what can I do for non-barcoded data, given that toulligqc requires a sample sheet file?

It gives KeyError: 'sample_sheet_file'
Thanks!

George

I am trying to run toulligQC to generate Albacore run reports. I tried different options and always got errors. It looks like toulligqc requires all the options even when the run does not have barcodes.
I ran the following command:
toulligqc -n test_run \
-f /storage/FILES/Oxford_Minion/Hudson_Sept8_FLO-MIN106_SQK-LSK108/Sept_R9.4_flowcells/testReport/0 \
-a /storage/FILES/Oxford_Minion/Hudson_Sept8_FLO-MIN106_SQK-LSK108/Sept_R9.4_flowcells/testReport/ \
-q /storage/FILES/Oxford_Minion/Hudson_Sept8_FLO-MIN106_SQK-LSK108/Sept_R9.4_flowcells/testReport/fastq \
-o /storage/FILES/Oxford_Minion/Hudson_Sept8_FLO-MIN106_SQK-LSK108/Sept_R9.4_flowcells/testReport/report
and got the following errors:
Traceback (most recent call last):
  File "/home/apps/toulligQC/toulligQC-20170912/bin/toulligqc", line 11, in <module>
    load_entry_point('toulligqc===-0.1a1-', 'console_scripts', 'toulligqc')()
  File "/home/apps/toulligQC/toulligQC-20170912/lib/python3.6/site-packages/toulligqc-0.1a1-py3.6.egg/toulligqc/toulligqc.py", line 229, in main
    check_conf(config_dictionary)
  File "/home/apps/toulligQC/toulligQC-20170912/lib/python3.6/site-packages/toulligqc-0.1a1-py3.6.egg/toulligqc/toulligqc.py", line 142, in check_conf
    if not config_dictionary['sample_sheet_file']:
  File "/home/apps/toulligQC/toulligQC-20170912/lib/python3.6/site-packages/toulligqc-0.1a1-py3.6.egg/toulligqc/toulligqc_conf.py", line 46, in __getitem__
    return self._config_dictionary[item]
KeyError: 'sample_sheet_file'

Could you show me how to run toulligqc on the following output from an Albacore run? The pass directory has subdirectories 0, 1 and 2 holding the fast5 files. The fastq files are also in the pass directory.
Thanks!

George

albacoreResults
----- configuration.cfg
----- pipeline.log
----- sequencing_summary.txt
----- workspace
------ fail
------ pass
--------- 0
--------- 1
--------- 2
--------- fastq_runid_31ee3c191ce764f50a56b1bbee67326bf4c6d40e_0.fastq
--------- fastq_runid_31ee3c191ce764f50a56b1bbee67326bf4c6d40e_1.fastq
--------- fastq_runid_77317b2780c17bc6d729717e91f58ff31823c8a8_2.fastq

toulligqc for pod5 format

Hi!
Thank you for providing this tool. I have new projects that are stored in pod5, and the tool does not seem to be compatible with this format.
Is there any workaround for this?
Thank you in advance.
Kind regards,
Oscar

unable to open fast5 files

I'm running toulligqc on very recent fast5 files and get the following error:

ToulligQC version 1.1
* Initialize extractors
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 00:00:00)
* Start Fast5 extractor
Traceback (most recent call last):
  File "/home/jroels/miniconda3/envs/toulligqc/bin/toulligqc", line 10, in <module>
    sys.exit(main())
  File "/home/jroels/miniconda3/envs/toulligqc/lib/python3.6/site-packages/toulligqc/toulligqc.py", line 354, in main
    extractor.extract(result_dict)
  File "/home/jroels/miniconda3/envs/toulligqc/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 115, in extract
    result_dict['sequencing.telemetry.extractor.flowcell.id'] = self._get_fast5_items(h5py_file, 'flow_cell_id')
  File "/home/jroels/miniconda3/envs/toulligqc/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 230, in _get_fast5_items
    tracking_id_items = list(h5py_file["/UniqueGlobalKey/tracking_id"].attrs.items())
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/jroels/miniconda3/envs/toulligqc/lib/python3.6/site-packages/h5py/_hl/group.py", line 167, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'

I can't find the cause of the problem. Can anyone help?

Numpy 1.24 compatibility

In case anyone tries to run this with NumPy 1.24: you just need to change the NumPy boolean type on line 355 of sequencing_summary_extractor.py

from

    sequencing_summary_datatypes = {
        'channel': np.int16,
        'start_time': np.float64,
        'passes_filtering': np.bool,  # <-- removed in NumPy 1.24
        'sequence_length_template': np.uint32,
        'mean_qscore_template': np.float32,
        'duration': np.float32}

to

    sequencing_summary_datatypes = {
        'channel': np.int16,
        'start_time': np.float64,
        'passes_filtering': np.bool_,  # <-- use np.bool_ instead
        'sequence_length_template': np.uint32,
        'mean_qscore_template': np.float32,
        'duration': np.float32}

Then it seems to work as expected. Cheers.
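
For context, `np.bool` was a deprecated alias of the builtin `bool` (deprecated in NumPy 1.20 and removed in 1.24), which is why the original line fails on 1.24. A minimal sketch of the corrected mapping follows; the dictionary is copied from the snippet above, and the check at the end is only an illustration, not part of toulligqc.

```python
import numpy as np

# Column dtypes as in the snippet above, with np.bool replaced by np.bool_.
# The builtin `bool` would work equally well here.
sequencing_summary_datatypes = {
    'channel': np.int16,
    'start_time': np.float64,
    'passes_filtering': np.bool_,
    'sequence_length_template': np.uint32,
    'mean_qscore_template': np.float32,
    'duration': np.float32,
}

# Both spellings resolve to the same NumPy dtype.
same_dtype = np.dtype(sequencing_summary_datatypes['passes_filtering']) == np.dtype(bool)
print(same_dtype)
```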

toulligqc for duplex

hi
I am trying to create a report for a duplex analysis (guppy_basecaller_duplex) and I got this error message.

toulligqc --report-name QC_duplex \
         --barcoding \
         --telemetry-source duplex/sequencing_telemetry.js \
         --sequencing-summary-source duplex/sequencing_summary.txt \
         --html-report-path duplex/QC_duplex.html \
         --barcodes barcode01
duplex/QC_duplex.html
ToulligQC version 2.2.1
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
  - Load sequencing summary file (0.04 MB used) in 0m0.06s
Traceback (most recent call last):
  File "/home/minion/miniconda3/envs/nano/bin/toulligqc", line 10, in <module>
    sys.exit(main())
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/toulligqc.py", line 343, in main
    extractor.extract(result_dict)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_extractor.py", line 234, in extract
    extract_barcode_info(self, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 144, in extract_barcode_info
    dataframe_dict["read.fail.barcoded"] = _barcode_frequency(extractor, barcode_selection, result_dict,
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_common.py", line 273, in _barcode_frequency
    set_result_value(extractor, result_dict, entry + '.count', sum(count_sorted.drop("unclassified")))
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/series.py", line 4771, in drop
    return super().drop(
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4267, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/generic.py", line 4311, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/minion/miniconda3/envs/nano/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6644, in drop
    raise KeyError(f"{list(labels[mask])} not found in axis")
KeyError: "['unclassified'] not found in axis"

I can send you the two guppy basecaller output files if necessary.

P.S. I also have a request: would it be possible to specify a barcode range for the --barcodes argument? I use a lot of barcodes and the command quickly becomes very long, e.g. --barcodes barcode01,barcode02, ... barcode48
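
One way to avoid typing the list by hand is to generate it. A minimal sketch, assuming the barcode names consist of a fixed prefix followed by a zero-padded number; `expand_barcode_range` is a hypothetical helper, not part of toulligqc.

```python
def expand_barcode_range(prefix: str, first: int, last: int, width: int = 2) -> str:
    """Build a comma-separated barcode list such as 'barcode01,barcode02,...'.

    Hypothetical helper for illustration; toulligqc does not provide it.
    """
    if first > last:
        raise ValueError("first must not exceed last")
    return ",".join(f"{prefix}{n:0{width}d}" for n in range(first, last + 1))


barcodes = expand_barcode_range("barcode", 1, 48)
print(barcodes)
```

The printed string can then be pasted after `--barcodes`.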

Make available via bioconda?

Hi there,
Thanks for a very interesting software package. If you're still actively developing it, would you consider making it available via bioconda?

Thanks,
Miika

ToulligQC 2.2 not including graphs in html output file

Hi,

After installing the latest version of ToulligQC (from PyPI), images are not included in the HTML report.

The command:

$ toulligqc -a guppy_out_5.0.11/sequencing_summary.txt -o test.html 
test.html
ToulligQC version 2.2
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
  - Load sequencing summary file (54.05 MB used) in 0m3.21s
  - Extract info from sequencing summary file in 0m10.38s
  - Creation of image "Read count histogram" in 0m0.28s
  - Creation of image "Distribution of read lengths" in 0m4.99s
  - Creation of image "Yield plot through time" in 0m2.62s
  - Creation of image "PHRED score distribution" in 0m4.65s
  - Creation of image "PHRED score density distribution" in 0m1.03s
  - Creation of image "Channel occupancy of the flowcell" in 0m1.11s
  - Creation of image "Correlation between read length and PHRED score" in 0m1.24s
  - Creation of image "Read length over time" in 0m3.15s
  - Creation of image "PHRED score over time" in 0m3.60s
  - Creation of image "Translocation speed" in 0m3.69s
* End of Basecaller sequencing summary extractor (done in 0m39.96s)
* Write HTML report
* Write statistics files
* End of the QC extractor (done in 0m39.98s)

produces an HTML file with only Run statistics and Device and software parts.

When run with the --images-directory option, HTML files with graphs are correctly produced in the specified directory but are not shown in the HTML report, so graph generation itself is not the problem.

When running the previous version (2.1.1) on the same input everything is correct.

Do you have any solution?

System: Ubuntu 20.04, kernel 5.4.0-47
Python: 3.8.5
plotly: 5.5.0
matplotlib: 3.4.3
numpy: 1.19.2

toulligqc not processing directory of fast5 files...

I'm trying to run toulligqc to generate statistics for a run, but I can't persuade it to read a directory of fast5 files, let alone the standard sub-directory hierarchy produced by albacore. When I specify a directory containing all the fast5 files from a run, toulligqc fails with:

ToulligQC version 0.5
* Initialize extractors
fast5_directory
* Start FAST5 extractor
Traceback (most recent call last):
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/bin/toulligqc", line 11, in <module>
    sys.exit(main())
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/toulligqc.py", line 252, in main
    extractor.extract(result_dict)
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 87, in extract
    result_dict['flow_cell_id'] = self._get_fast5_items(h5py_file,'flow_cell_id')
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/toulligqc/fast5_extractor.py", line 192, in _get_fast5_items
    tracking_id_items = list(h5py_file["/UniqueGlobalKey/tracking_id"].attrs.items())
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/cluster/gjb_lab/nschurch/cluster_installs/miniconda2/envs/toulligQC/lib/python3.6/site-packages/h5py/_hl/group.py", line 167, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: 'Unable to open object (component not found)'

Modifying the _read_fast5(self) method in the extractor Python script to see what is going on shows that the directory extension is being set correctly. Printing the glob lines below reveals:

elif self.fast5_file_extension == 'fast5_directory':
    if glob.glob(self.fast5_source+self.run_name+'/*.fast5'):
        self.fast5_file = self.fast5_source+self.run_name+'.fast5'

glob.glob: datadir/allfast5/*.fast5
self.fast5_file: datadir/allfast5.fast5

Where allfast5 is the run name, datadir is the input path specified with --fast5-source, and datadir/allfast5 contains all the *.fast5 files.

Should the extractor be looping over all the fast5 files in the glob?
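
The printed values suggest that `self.fast5_file` is built from the directory name rather than from a file matched by the glob. A minimal sketch of picking one real file from the directory, assuming a single fast5 file is enough to read the run metadata; `pick_fast5_file` is a hypothetical helper, not toulligqc code.

```python
import glob
import os
import tempfile


def pick_fast5_file(directory: str) -> str:
    """Return the first *.fast5 file (sorted by name) found in `directory`."""
    matches = sorted(glob.glob(os.path.join(directory, "*.fast5")))
    if not matches:
        raise FileNotFoundError(f"no .fast5 files found in {directory}")
    return matches[0]


# Demonstration on a temporary directory with two empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("read_b.fast5", "read_a.fast5"):
        open(os.path.join(tmp, name), "w").close()
    chosen = os.path.basename(pick_fast5_file(tmp))

print(chosen)
```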

Barcodes in dorado output

Hej,

Thanks for this really pretty tool! I encountered a small issue when using it with dorado's output:

a) When demultiplexing with dorado demux, toulligQC (v2.6) does not produce the barcode plots. This is because sequencing_summary_extractor.py recognizes the column "barcode_arrangement", but dorado names the column just "barcode".
I used dorado v0.5.1 and provided the barcodes via sample sheet and demultiplexing kit (dorado basecaller ${basecallmodel} pod5/ --sample-sheet ${sampleSheet} $demuxKit ). When I renamed the column in my sequencing_summary.txt to "barcode_arrangement", toulligQC produced the plots properly.

b) Is it possible to skip or modify the barcode-name check in toulligqc.py (ctrl+F: "Get barcode selection")? I have our lab's (integer) sample IDs as barcode names in the sequencing_summary and would have to add "BC" or "barcode" as a prefix so that I don't get the error "ERROR: No known barcode found in provided list of barcodes". In dorado, you can now specify your own custom barcode sets with different names as well. It would be great if toulligQC supported that!
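
The workaround described in (a) can be scripted. A minimal sketch using only the standard library, assuming the summary file is tab-separated with a header line; `rename_barcode_column` is a hypothetical helper, and the sample text is made up for the demonstration.

```python
import io


def rename_barcode_column(src, dst, old="barcode", new="barcode_arrangement"):
    """Copy a tab-separated summary from `src` to `dst`, renaming one header column."""
    header = src.readline().rstrip("\r\n").split("\t")
    if old not in header:
        raise KeyError(f"column {old!r} not found in header")
    dst.write("\t".join(new if name == old else name for name in header) + "\n")
    # Data rows are copied unchanged.
    for line in src:
        dst.write(line)


# Made-up two-line example standing in for sequencing_summary.txt.
source = io.StringIO("read_id\tbarcode\nr1\t17\n")
target = io.StringIO()
rename_barcode_column(source, target)
print(target.getvalue())
```

With real files, pass two handles opened with `open(...)` instead of the `StringIO` objects.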

Thanks a ton and have a great week!
Philipp

error on running toulligQC

When running toulligQC, I'm getting the following error:
ImportError: C extension: umpy.core.multiarray failed to import not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.
Running setup.py with --inplace --force does not solve it.

Questions about usage and output

Hi,
first of all: Thank you for this tool!

I have some questions about the output and usage of ToulligQC.
I'm running version 2.2.

At the end of the Readme.md you mention that the output should look like this:
RUN_ID
├── report.html
├── report.data
└── images
    └── plots.html
    └── plot.png

For me it looks like this after this program call:

toulligqc --report-name <ReportName> \
--telemetry-source /Path/To/sequencing_telemetry.js \
--sequencing-summary-source /Path/To/sequencing_summary.txt \
--output-directory /Path/To/OutputDirectory/

RUN_ID
├── report.html
├── report.data
└── images
    └── plots.html

Am I missing an option, or am I misunderstanding your output diagram?

My second question: is the kit "SQK-PCB109" not (yet) supported by ToulligQC? I tried it with the following command:

toulligqc --report-name <ReportName> \
--telemetry-source /Path/To/sequencing_telemetry.js \
--sequencing-summary-source /Path/To/sequencing_summary.txt \
--sequencing-summary-source /Path/To/barcoding_summary_pass.txt \
--sequencing-summary-source /Path/To/barcoding_summary_fail.txt \
--barcodes BP01,BP02,BP03,BP04,BP05,BP06,BP07,BP08,BP09,BP10,BP11,BP12 \
--output-directory /Path/To/OutputDirectory/

and got ERROR: No known barcode found in provided list of barcodes

Thank you in advance.

Pandas library and Errors

Good afternoon,
I'm opening this issue because I have two different runtime problems on two different computers.
The first one is:

Traceback (most recent call last):
  File "/usr/local/bin/toulligqc", line 11, in <module>
    load_entry_point('toulligqc==0.5', 'console_scripts', 'toulligqc')()
  File "/usr/local/lib/python3.5/dist-packages/toulligqc/toulligqc.py", line 212, in main
    check_conf(config_dictionary)
  File "/usr/local/lib/python3.5/dist-packages/toulligqc/toulligqc.py", line 137, in check_conf
    config_dictionary['result_directory'] = config_dictionary['result_directory'] + '/' + config_dictionary['run_name'] + '/'
TypeError: Can't convert 'NoneType' object to str implicitly

And the second is this one:

RuntimeError: module compiled against API version 0xb but this version of numpy is 0xa
Traceback (most recent call last):
  File "/home/bio/miniconda3/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/__init__.py", line 25, in <module>
    from pandas import hashtable, tslib, lib
ImportError: numpy.core.multiarray failed to import

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bio/miniconda3/bin/toulligqc", line 11, in <module>
    load_entry_point('toulligqc===-0.1a1-', 'console_scripts', 'toulligqc')()
  File "/home/bio/miniconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 565, in load_entry_point
  File "/home/bio/miniconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 2598, in load_entry_point
  File "/home/bio/miniconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 2258, in load
  File "/home/bio/miniconda3/lib/python3.6/site-packages/setuptools-27.2.0-py3.6.egg/pkg_resources/__init__.py", line 2264, in resolve
  File "/home/bio/miniconda3/lib/python3.6/site-packages/toulligqc-_0.1a1_-py3.6.egg/toulligqc/toulligqc.py", line 36, in <module>
    from toulligqc import fastq_extractor
  File "/home/bio/miniconda3/lib/python3.6/site-packages/toulligqc-_0.1a1_-py3.6.egg/toulligqc/fastq_extractor.py", line 28, in <module>
    import pandas as pd
  File "/home/bio/miniconda3/lib/python3.6/site-packages/pandas-0.19.2-py3.6-linux-x86_64.egg/pandas/__init__.py", line 31, in <module>
    "the C extensions first.".format(module))
ImportError: C extension: umpy.core.multiarray failed to import not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

To try to solve this, I updated all the libraries, updated Python, and removed and reinstalled the libraries, but the problems still occur.
Do you know what is happening? Is it something in the command line, the Python version, or the libraries?
Honestly, I don't know where the problem is.
Could you help me, please?

Thanks in advance.
Luis Alfonso

extractor.init error

I got this error with the latest version of toulligqc (2.5):

folder=ONV-15-2023_Detection
toulligqc --report-name $folder --barcoding \
         --telemetry-source $folder/fastq/sequencing_telemetry.js \
         --sequencing-summary-source $folder/fastq/sequencing_summary.txt \
         --html-report-path $folder/QC_report.html \
         --barcodes barcode01:barcode24
ONV-15-2023_Detection/QC_report.html
ToulligQC version 2.5
* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.00s)
* Start Basecaller sequencing summary extractor
Traceback (most recent call last):
  File "/home/joel/miniforge3/envs/nano/bin/toulligqc", line 10, in <module>
    sys.exit(main())
  File "/home/joel/miniforge3/envs/nano/lib/python3.10/site-packages/toulligqc/toulligqc.py", line 388, in main
    extractor.init()
  File "/home/joel/miniforge3/envs/nano/lib/python3.10/site-packages/toulligqc/sequencing_summary_extractor.py", line 117, in init
    self.dataframe_1d['barcode_arrangement'].cat.add_categories([0, 'other barcodes', 'passes_filtering'],
  File "/home/joel/miniforge3/envs/nano/lib/python3.10/site-packages/pandas/core/accessor.py", line 112, in f
    return self._delegate_method(name, *args, **kwargs)
  File "/home/joel/miniforge3/envs/nano/lib/python3.10/site-packages/pandas/core/arrays/categorical.py", line 2893, in _delegate_method
    res = method(*args, **kwargs)
TypeError: Categorical.add_categories() got an unexpected keyword argument 'inplace'

I will soon attach a link with the telemetry files (they are a bit big).
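
For context, the `inplace` argument of `Categorical.add_categories()` was deprecated in pandas 1.3 and removed in pandas 2.0, which matches the `TypeError` above. A minimal sketch of the non-inplace form, shown on a made-up Series rather than toulligqc's actual dataframe:

```python
import pandas as pd

# Made-up stand-in for the 'barcode_arrangement' column.
barcodes = pd.Series(["barcode01", "barcode02", "unclassified"], dtype="category")

# add_categories() returns a new Series; assign it back instead of using inplace=True.
barcodes = barcodes.cat.add_categories([0, "other barcodes", "passes_filtering"])

print(list(barcodes.cat.categories))
```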

Issue with toulligqc_demo_data

Hi GenomiqueENS/toulligQC team.

I tried to run the provided sample data with toulligQC but I get the following error:
❯ toulligqc \
--report-name toulligqc_demo_aata \
--barcoding \
--barcodes BC01,BC02,BC03,BC04,BC05,B0C7 \
--telemetry-source sequencing_telemetry.js \
--sequencing-summary-source sequencing_summary.txt \
--sequencing-summary-source barcoding_summary_pass.txt \
--sequencing-summary-source barcoding_summary_fail.txt \
--output-directory output
output/toulligqc_demo_aata/report.html
ToulligQC version 2.2.1

* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Sequencing telemetry extractor
* End of Sequencing telemetry extractor (done in 0m0.05s)
* Start Basecaller sequencing summary extractor
Traceback (most recent call last):
  File "/home/glustein/.local/bin/toulligqc", line 8, in <module>
    sys.exit(main())
  File "/home/glustein/.local/lib/python3.10/site-packages/toulligqc/toulligqc.py", line 342, in main
    extractor.init()
  File "/home/glustein/.local/lib/python3.10/site-packages/toulligqc/sequencing_summary_extractor.py", line 116, in init
    self.dataframe_1d['barcode_arrangement'].cat.add_categories([0, 'other barcodes', 'passes_filtering'],
  File "/home/glustein/.local/lib/python3.10/site-packages/pandas/core/generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
  File "/home/glustein/.local/lib/python3.10/site-packages/pandas/core/accessor.py", line 182, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/glustein/.local/lib/python3.10/site-packages/pandas/core/arrays/categorical.py", line 2717, in __init__
    self._validate(data)
  File "/home/glustein/.local/lib/python3.10/site-packages/pandas/core/arrays/categorical.py", line 2726, in _validate
    raise AttributeError("Can only use .cat accessor with a 'category' dtype")
AttributeError: Can only use .cat accessor with a 'category' dtype. Did you mean: 'at'?

I don't understand what is happening.
Could you help me solve this problem?

Thanks

How to extract a fast5 file from Nanopore run data to use as a substitute for the sequencing_telemetry file

Hi, a colleague has shown me reports generated with toulligQC that would be great for me. However, I am just a wet lab person! The output from the Nanopore Mk1C does not include a 'sequencing_telemetry' file. I see from the GitHub page that a fast5 file would be a fine substitute. Assuming this is true and the fastest way for me to get a report is to add a fast5 file, how do I isolate a fast5 file from the package of 4000 that is generated by the MinION? (I think the default size of fast5 folders is 4000 reads.)

Thanks,
Nick H.

issue for invalid file name

Hi,
I'm running toulligqc on my Ubuntu system. I installed it as a Python package with pip3 in a virtual environment.
When I run it, I get the following problem:
`/home/grid/programas/ToulligQC/bin/toulligqc --report-name test --fast5-source /data/test/20220404_1721_X5_FAS58594_1dd0a346/fast5_pass/ --sequencing-summary-source /data/test/20220404_1721_X5_FAS58594_1dd0a346/sequencing_summary_FAS58594_6132143a.txt --barcoding -l BC01,BC04,BC05,BC06,BC07,BC08,BC09,BC10,BC11,BC12 --html-report-path test.QC.html
test.QC.html
ToulligQC version 2.2.2

* Initialize extractors
* Start Toulligqc info extractor
* End of Toulligqc info extractor (done in 0m0.00s)
* Start Fast5 extractor
Traceback (most recent call last):
  File "/home/grid/programas/ToulligQC/bin/toulligqc", line 8, in <module>
    sys.exit(main())
  File "/home/grid/programas/ToulligQC/lib/python3.8/site-packages/toulligqc/toulligqc.py", line 348, in main
    extractor.extract(result_dict)
  File "/home/grid/programas/ToulligQC/lib/python3.8/site-packages/toulligqc/fast5_extractor.py", line 111, in extract
    h5py_file = self._read_fast5()
  File "/home/grid/programas/ToulligQC/lib/python3.8/site-packages/toulligqc/fast5_extractor.py", line 216, in _read_fast5
    h5py_file = h5py.File(self.fast5_file)
  File "/home/grid/programas/ToulligQC/lib/python3.8/site-packages/h5py/_hl/files.py", line 533, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
  File "/home/grid/programas/ToulligQC/lib/python3.8/site-packages/h5py/_hl/files.py", line 226, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 106, in h5py.h5f.open
ValueError: Invalid file name (invalid file name)
`

Could you please help me with this?
Thank you!

Multiple summary and telemetry files

It seems toulligQC will not work with multiple summary or telemetry files, like those produced when basecalling with guppy_supervisor. When I tried concatenating the files, the telemetry file produced the following error:

json.decoder.JSONDecodeError: Extra data: line 252161 column 2 (char 5928761)

Any help appreciated.
Mustafa
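
For context, `Extra data` is what Python's `json` module raises when a file holds more than one top-level JSON document, which is what plain concatenation produces. A minimal sketch of reading such a file document by document with `json.JSONDecoder.raw_decode`; the function name and sample text are hypothetical, and how the parsed documents should then be combined for toulligqc is not covered here.

```python
import json


def parse_concatenated_json(text: str) -> list:
    """Parse several JSON documents that were concatenated into one string."""
    decoder = json.JSONDecoder()
    documents = []
    position = 0
    while position < len(text):
        # raw_decode() does not skip leading whitespace, so do it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        document, position = decoder.raw_decode(text, position)
        documents.append(document)
    return documents


# Made-up example: two small documents joined the way plain concatenation would.
sample = '[{"run": 1}]\n[{"run": 2}]\n'
parsed = parse_concatenated_json(sample)
print(len(parsed))
```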