
illumina / interop


C++ Library to parse Illumina InterOp files

Home Page: http://illumina.github.io/interop/index.html

License: GNU General Public License v3.0

CMake 4.93% C++ 78.01% C 5.20% C# 4.43% Python 2.91% Shell 1.48% Batchfile 0.55% PowerShell 0.16% SWIG 2.33%
interop swig illumina-sequencing cpp csharp python python3 python27 python35 python36

interop's People

Contributors

ezralanglois, nudpa


interop's Issues

Function percent_phasing() in interop2csv

Hello,

I use the interop2csv script to obtain some metrics about a run, but I think there is a problem in the percent_phasing() function used by interop2csv. For read 3 (read 1 = first read, read 2 = index read, read 3 = second read) percent_phasing is always 0, and for read 1 it equals the percent_phasing value for read 3 shown in Illumina Sequencing Analysis Viewer. Could there be a problem in this function?

Romain

Refactor metric sets to support run_metrics

The metric set code requires a refactor to properly support the envisioned run_metrics. run_metrics will hold all the metric sets and XML information required for the eventual summary logic.

Calculate standard deviation

I need to calculate the standard deviation in many functions. How do I calculate the standard deviation for clusterDensity(), percent_aligned(), etc.?
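Once the per-tile values are extracted from a metric set, the standard deviation itself is ordinary statistics; a minimal sketch in plain Python (the density values below are made up for illustration, not taken from a real run):

```python
import statistics

# Hypothetical per-tile cluster density values (K/mm^2); real values
# would come from the tile metric set via the InterOp API.
densities = [1148.0, 1133.0, 1160.0, 1129.0, 1151.0]

mean = statistics.mean(densities)
# statistics.stdev uses the sample (n-1) formula; statistics.pstdev
# would use the population (n) formula instead.
sd = statistics.stdev(densities)

print(f"{mean:.1f} +/- {sd:.1f}")  # 1144.2 +/- 12.9
```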

Add summary logic

This logic will calculate all the metrics in the SAV summary tab.

Problems installing python module

I tried following the instructions for python installation:
On the https://github.com/Illumina/interop/releases/tag/v1.0.18 page the pip installation is described as follows:
pip install -f https://github.com/Illumina/interop/releases/latest interop
The output on my ubuntu 16.04 machine is:

Could not find a version that satisfies the requirement interop (from versions: linux_gcc46_release, mingw_win64_Release, msvc14_win64_Release, osx_clang_release)
No matching distribution found for interop

On the http://illumina.github.io/interop/python_binding.html page it says:
pip install -f https://ussd.artifactory.illumina.com/list/generic-bioinformatics/interop/pypi/ interop
but my proxy server seems to have problems with the SSL certificate

I proceeded to download the wheel package:
interop-1.0.18-cp27-cp27mu-linux_x86_64.whl
Trying to pip install it I get:
interop-1.0.18-cp27-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.

My platform:
Linux x86_64 x86_64 GNU/Linux

Would you have any suggestions on how to proceed?
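One way to diagnose a "not a supported wheel on this platform" error is to decode the compatibility tags baked into the wheel filename (PEP 427 defines the naming scheme); the cp27mu tag, for instance, requires a wide-unicode CPython 2.7 build, which a narrow-unicode or Python 3 interpreter will reject. A stdlib-only sketch:

```python
# Decode the compatibility tags from a wheel filename.
# Per PEP 427 the name is: {dist}-{version}-{python}-{abi}-{platform}.whl
wheel = "interop-1.0.18-cp27-cp27mu-linux_x86_64.whl"

dist, version, python_tag, abi_tag, platform_tag = wheel[:-len(".whl")].split("-")

print(f"needs interpreter: {python_tag}")    # cp27 -> CPython 2.7
print(f"needs ABI:         {abi_tag}")       # cp27mu -> wide-unicode build
print(f"needs platform:    {platform_tag}")  # linux_x86_64
```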

Add summary model

This model will hold all the metrics reported in the SAV summary tab.

summary - RunParameters.xml required for legacy run folders and is missing

Hey there
Thanks a bunch for adding summary!

At the moment I get this error on some recent HiSeq 2500 and HiSeq X run folders:

RunParameters.xml required for legacy run folders and is missing

Our run folders have a runParameters.xml; when I rename it to RunParameters.xml, summary runs fine. It seems all of our run folders use a lowercase r for this file.

Thanks again for the library!

clusterDensity value

Hi,

I am trying to interpret the clusterDensity value. SAV reports a value of 2355 (K/mm²), while the clusterDensity() function returns 2.35491 × 10^6.
The function definition says: "Density of clusters for each tile (in thousands per mm2)". If the return value really were in thousands per mm², the density would be 2354.91 × 10^3 K/mm², which is a big difference. Is the return value of this function really in thousands per mm²?

MiSeq QScores / NextSeq current Cycle

Thank you for putting together this software its great!

I am trying to build a watcher for runs in progress using your summary script, and I noticed a few things:

  1. MiSeq - the QScores are 100%, which obviously isn't reflected in the data.
  2. NextSeq & MiSeq - clusterPF gets written for both reads as soon as Read 1 data is available. Is this normal?

I use the Recipe/RunSate.xml on the MiSeq to extract the current step and cycle. I've been told by Illumina that this information is available in one of the InterOp files. It would be great if your summary script could access that information and display it.

Thank you!

NextSeq dumptext error

dumptext seems to work for the example MiSeq data and all the HiSeq data I have tried it against.

However, with NextSeq InterOp files I am experiencing problems. This is with the 1.0.15 release (thanks so much for that!)

The problem seems to happen at the start of the # Extraction section:

# Extraction,1
Header and channel names count mismatch
/home/travis/build/Illumina/interop/src/interop/model/metrics/extraction_metric.cpp::write_header (190)

I was able to replicate this error with a public dataset, "NextSeq 500 v2: Nextera Rapid Capture Exome (CEPH 9plex) - H3GJKBGX". You can find the InterOp files here: https://figshare.com/articles/NextSeq_500_InterOp_example_data/4564000

Discrepancy on error metric's standard deviation.

Hi,
It seems that when I compute the standard deviation with the (n-1) version of the formula and the last cycle removed, I get a different value for the error metric on all lanes and reads, even though the mean is the same. The standard deviation of every other metric matches what's printed in the summary tab when computed by hand (e.g. via Excel).

stdev.xlsx
If you look at this xlsx file, you'd see that the mean is 0.14 and the stdev is 0.404.

However, using the run folder https://basespace.illumina.com/run/14858869/HiSeq-2500-v2-TruSeq-Exome-9-replicates-of-NA12878 from BaseSpace, I get different standard deviations on all values when they are printed by the summary binary:
summary.txt

So my question is: does the API compute the standard deviation of error metrics differently from that of the other metrics? I can't find how they differ at a glance.

Thanks!
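One common source of this kind of mismatch is sample vs. population standard deviation: the two formulas diverge noticeably for small n, so it is worth checking which one each tool uses. An illustrative comparison (the error-rate values are invented, not from the run above):

```python
import statistics

# Hypothetical per-tile error rates; real values would come from the
# error metric set.
errors = [0.14, 0.10, 0.75, 0.05, 0.12, 0.08]

sample = statistics.stdev(errors)       # (n-1) denominator
population = statistics.pstdev(errors)  # n denominator

# For small n the two differ noticeably, so a mismatch between a
# hand calculation and a tool can come purely from the formula choice
# (or from differing underlying value sets, e.g. last cycle removed).
print(f"sample={sample:.3f} population={population:.3f}")
```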

read_index not defined error

Hi,

I followed all the steps for run metrics and was able to reproduce the yield pandas dataframe. However, when I tried to get read metrics using the command
summary.at(read_index).summary().yield_g(), it gave me a "read_index not defined" error; read_index was not defined earlier in the code. Should I follow the same steps as for run_metrics?

interop/io/format/metric_format.h::read_header (87)

I have my runs mounted via BaseMount and I'm in the "Files" directory of one of my runs. When I try to run summary or index-summary, I get this error:

index-summary .

Version: v1.0.8-10-g34a5cd6

Record size does not match layout size, record size: 206 != layout size: 6 for Q v6
interop/./interop/io/format/metric_format.h::read_header (87)

summary .

Version: v1.0.8-10-g34a5cd6

Record size does not match layout size, record size: 206 != layout size: 6 for Q v6
interop/./interop/io/format/metric_format.h::read_header (87)

Inconsistent values for Q30 between two different views.

Hi,
Sometimes the Q30 values printed from the collapsed metrics differ significantly from the summary statistics.
If you look at the Q30 value of:
https://basespace.illumina.com/run/14858869/HiSeq-2500-v2-TruSeq-Exome-9-replicates-of-NA12878

On lane 2, the collapsed q-metrics (.cpp, dumptext binary) report a percentage of 89.4% after division, while the summary statistics (summary binary) report 96.36%.

Currently, this discrepancy can also be seen on both the BaseSpace page and in Sequencing Analysis Viewer. In the SAV Analysis tab, read 2 for all (lane, surface, cycle) shows (Q>=30) = 89.4%; in the Summary tab, the Read 2 (I) level is reported as (%>=Q30) = 96.36%. Is there a reason these two values differ?

sav_q30_analysis
sav_q30_summary

Thanks!

Add run metrics

This class will encapsulate all the data parsed from the run folder.

dumpbin error - unhelpful error message

I am trying to run dumpbin to convert bin files to text, but it returns the following error:

src/apps/dumpbin  /path/to/runfolder
# Version: v1.1.2
terminate called after throwing an instance of 'std::length_error'
  what():  vector::_M_fill_insert
Aborted (core dumped)

I have just compiled interop from source, so it could be an installation error, but I have no way to know based on the error message.

interop2csv sometimes reports "nan" values for the Aligned column in the TileMetricsOut.bin section

Hey there,

I know that interop2csv is no longer supported, but I wanted to file an issue that I ran into.

The newest release, 1.0.16, sometimes outputs nan values for the Aligned column in the # TileMetricsOut.bin section. It looks like this only happens when the value was reported as 0 by the 1.0.6 release.

Here is the data used hiseq.zip

This is how to reproduce the issue.

Generate the output from the 1.0.6 version (the last version we used):

./1.0.6/bin/interop2csv hiseq > 1.0.6.csv

Generate the output from the most recent release 1.0.16 :

./1.0.16/bin/interop2csv hiseq > 1.0.16.csv

Then diff them: diff -y --suppress-common-lines 1.0.6.csv 1.0.16.csv

# Tile							      |	# TileMetricsOut.bin
# Tile							      |	# TileMetricsOut.bin
1,1106,2,0,0,0						      |	1,1106,2,nan,0,0
1,1101,2,0,0,0						      |	1,1101,2,nan,0,0
1,1102,2,0,0,0						      |	1,1102,2,nan,0,0
1,1105,2,0,0,0						      |	1,1105,2,nan,0,0
1,1108,2,0,0,0						      |	1,1108,2,nan,0,0
1,1103,2,0,0,0						      |	1,1103,2,nan,0,0
1,1107,2,0,0,0						      |	1,1107,2,nan,0,0
1,1104,2,0,0,0						      |	1,1104,2,nan,0,0
1,1114,2,0,0,0						      |	1,1114,2,nan,0,0
1,1110,2,0,0,0						      |	1,1110,2,nan,0,0
1,1109,2,0,0,0						      |	1,1109,2,nan,0,0
1,1111,2,0,0,0						      |	1,1111,2,nan,0,0
1,1113,2,0,0,0						      |	1,1113,2,nan,0,0
1,1112,2,0,0,0						      |	1,1112,2,nan,0,0
1,1116,2,0,0,0						      |	1,1116,2,nan,0,0
1,1206,2,0,0,0						      |	1,1206,2,nan,0,0
1,1115,2,0,0,0						      |	1,1115,2,nan,0,0
1,1202,2,0,0,0						      |	1,1202,2,nan,0,0
1,1201,2,0,0,0						      |	1,1201,2,nan,0,0
1,1203,2,0,0,0						      |	1,1203,2,nan,0,0
1,1208,2,0,0,0						      |	1,1208,2,nan,0,0
1,1205,2,0,0,0						      |	1,1205,2,nan,0,0
1,1204,2,0,0,0						      |	1,1204,2,nan,0,0
1,1207,2,0,0,0						      |	1,1207,2,nan,0,0
1,1214,2,0,0,0						      |	1,1214,2,nan,0,0
1,1210,2,0,0,0						      |	1,1210,2,nan,0,0
1,1209,2,0,0,0						      |	1,1209,2,nan,0,0
1,1211,2,0,0,0						      |	1,1211,2,nan,0,0
1,1216,2,0,0,0						      |	1,1216,2,nan,0,0
1,1213,2,0,0,0						      |	1,1213,2,nan,0,0
1,1212,2,0,0,0						      |	1,1212,2,nan,0,0
1,2106,2,0,0,0						      |	1,2106,2,nan,0,0
1,1215,2,0,0,0						      |	1,1215,2,nan,0,0
1,2102,2,0,0,0						      |	1,2102,2,nan,0,0
1,2101,2,0,0,0						      |	1,2101,2,nan,0,0
1,2108,2,0,0,0						      |	1,2108,2,nan,0,0
1,2103,2,0,0,0						      |	1,2103,2,nan,0,0
1,2104,2,0,0,0						      |	1,2104,2,nan,0,0
1,2105,2,0,0,0						      |	1,2105,2,nan,0,0
1,2109,2,0,0,0						      |	1,2109,2,nan,0,0
1,2110,2,0,0,0						      |	1,2110,2,nan,0,0
1,2107,2,0,0,0						      |	1,2107,2,nan,0,0
1,2114,2,0,0,0						      |	1,2114,2,nan,0,0
1,2116,2,0,0,0						      |	1,2116,2,nan,0,0
1,2111,2,0,0,0						      |	1,2111,2,nan,0,0
1,2112,2,0,0,0						      |	1,2112,2,nan,0,0
1,2113,2,0,0,0						      |	1,2113,2,nan,0,0
1,2201,2,0,0,0						      |	1,2201,2,nan,0,0
1,2206,2,0,0,0						      |	1,2206,2,nan,0,0
1,2202,2,0,0,0						      |	1,2202,2,nan,0,0
1,2203,2,0,0,0						      |	1,2203,2,nan,0,0
1,2115,2,0,0,0						      |	1,2115,2,nan,0,0
1,2208,2,0,0,0						      |	1,2208,2,nan,0,0
1,2205,2,0,0,0						      |	1,2205,2,nan,0,0
1,2204,2,0,0,0						      |	1,2204,2,nan,0,0
1,2209,2,0,0,0						      |	1,2209,2,nan,0,0
1,2210,2,0,0,0						      |	1,2210,2,nan,0,0
1,2214,2,0,0,0						      |	1,2214,2,nan,0,0
1,2211,2,0,0,0						      |	1,2211,2,nan,0,0
1,2207,2,0,0,0						      |	1,2207,2,nan,0,0
1,2216,2,0,0,0						      |	1,2216,2,nan,0,0
1,2213,2,0,0,0						      |	1,2213,2,nan,0,0
1,2212,2,0,0,0						      |	1,2212,2,nan,0,0
2,1101,2,0,0,0						      |	2,1101,2,nan,0,0
2,1102,2,0,0,0						      |	2,1102,2,nan,0,0
2,1103,2,0,0,0						      |	2,1103,2,nan,0,0
2,1106,2,0,0,0						      |	2,1106,2,nan,0,0
1,2215,2,0,0,0						      |	1,2215,2,nan,0,0
2,1108,2,0,0,0						      |	2,1108,2,nan,0,0
2,1105,2,0,0,0						      |	2,1105,2,nan,0,0
2,1104,2,0,0,0						      |	2,1104,2,nan,0,0
2,1109,2,0,0,0						      |	2,1109,2,nan,0,0
2,1110,2,0,0,0						      |	2,1110,2,nan,0,0
2,1111,2,0,0,0						      |	2,1111,2,nan,0,0
2,1114,2,0,0,0						      |	2,1114,2,nan,0,0
2,1116,2,0,0,0						      |	2,1116,2,nan,0,0
2,1107,2,0,0,0						      |	2,1107,2,nan,0,0
2,1113,2,0,0,0						      |	2,1113,2,nan,0,0
2,1201,2,0,0,0						      |	2,1201,2,nan,0,0
2,1202,2,0,0,0						      |	2,1202,2,nan,0,0
2,1112,2,0,0,0						      |	2,1112,2,nan,0,0
2,1203,2,0,0,0						      |	2,1203,2,nan,0,0
2,1208,2,0,0,0						      |	2,1208,2,nan,0,0
2,1205,2,0,0,0						      |	2,1205,2,nan,0,0
2,1115,2,0,0,0						      |	2,1115,2,nan,0,0
2,1206,2,0,0,0						      |	2,1206,2,nan,0,0
2,1209,2,0,0,0						      |	2,1209,2,nan,0,0
2,1210,2,0,0,0						      |	2,1210,2,nan,0,0
2,1204,2,0,0,0						      |	2,1204,2,nan,0,0
2,1211,2,0,0,0						      |	2,1211,2,nan,0,0
2,1216,2,0,0,0						      |	2,1216,2,nan,0,0
2,2102,2,0,0,0						      |	2,2102,2,nan,0,0
2,1207,2,0,0,0						      |	2,1207,2,nan,0,0
2,1213,2,0,0,0						      |	2,1213,2,nan,0,0
2,2101,2,0,0,0						      |	2,2101,2,nan,0,0
2,1214,2,0,0,0						      |	2,1214,2,nan,0,0
2,1212,2,0,0,0						      |	2,1212,2,nan,0,0
2,2103,2,0,0,0						      |	2,2103,2,nan,0,0
2,2108,2,0,0,0						      |	2,2108,2,nan,0,0
2,2110,2,0,0,0						      |	2,2110,2,nan,0,0
2,2105,2,0,0,0						      |	2,2105,2,nan,0,0
2,2106,2,0,0,0						      |	2,2106,2,nan,0,0
2,1215,2,0,0,0						      |	2,1215,2,nan,0,0
2,2109,2,0,0,0						      |	2,2109,2,nan,0,0
2,2104,2,0,0,0						      |	2,2104,2,nan,0,0
2,2111,2,0,0,0						      |	2,2111,2,nan,0,0
2,2116,2,0,0,0						      |	2,2116,2,nan,0,0
2,2201,2,0,0,0						      |	2,2201,2,nan,0,0
2,2107,2,0,0,0						      |	2,2107,2,nan,0,0
2,2113,2,0,0,0						      |	2,2113,2,nan,0,0
2,2202,2,0,0,0						      |	2,2202,2,nan,0,0
2,2114,2,0,0,0						      |	2,2114,2,nan,0,0
2,2203,2,0,0,0						      |	2,2203,2,nan,0,0
2,2112,2,0,0,0						      |	2,2112,2,nan,0,0
2,2208,2,0,0,0						      |	2,2208,2,nan,0,0
2,2209,2,0,0,0						      |	2,2209,2,nan,0,0
2,2205,2,0,0,0						      |	2,2205,2,nan,0,0
2,2115,2,0,0,0						      |	2,2115,2,nan,0,0
2,2206,2,0,0,0						      |	2,2206,2,nan,0,0
2,2204,2,0,0,0						      |	2,2204,2,nan,0,0
2,2211,2,0,0,0						      |	2,2211,2,nan,0,0
2,2216,2,0,0,0						      |	2,2216,2,nan,0,0
2,2210,2,0,0,0						      |	2,2210,2,nan,0,0
2,2213,2,0,0,0						      |	2,2213,2,nan,0,0
2,2207,2,0,0,0						      |	2,2207,2,nan,0,0
2,2214,2,0,0,0						      |	2,2214,2,nan,0,0
2,2212,2,0,0,0						      |	2,2212,2,nan,0,0
2,2215,2,0,0,0						      |	2,2215,2,nan,0,0
# Error							      |	# ErrorMetricsOut.bin
# CorrectedInt						      |	# CorrectedIntMetricsOut.bin
# Extraction						      |	# ExtractionMetricsOut.bin
# Image							      |	# ImageMetricsOut.bin
# Q							      |	# QMetricsOut.bin
# Index							      |	# IndexMetricsOut.bin
Lane,Tile,Read,Sequence,Sample,Project,Count		      |	Lane,Tile,Read,Sequence,Sample,Project,ClusterCount
# Version: v1.0.6					      |	# Version: v1.0.16

Thanks!

imaging_table cannot find RunParameters.xml

MiSeq V2 runs generate runParameters.xml, while MiSeq V3 runs give RunParameters.xml using MCS v2.5.0.5.

imaging_table throws an error with V2 runs:
RunParameters.xml required for legacy run folders with missing channel names /share/apps/interop-distros/interop/./interop/model/run_metrics.h::read_run_parameters (187)

Renaming the file solves the issue.

Recover cycles error rate information

I need to recover the cycles error rate information. Sometimes this metric reports 0-150 rather than 150, and I need to know when this happens so I can adapt my script. Can you help me?

InterOp Quality Metrics v6

Hi,

I am just wondering whether the InterOp format description is incorrect at one point.

Q-Metrics version 6 says:

byte 4 - binCount: array of low ends for each bin (uint8)

This is far too short. Shouldn't it be something like:

bytes 4 - (4 + binCount)

where binCount is byte 3?

best,
Sven
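Assuming the layout the reporter describes (byte 0 version, byte 1 record length, byte 2 binning flag, byte 3 bin count B, then B low-end bytes — a reading of this issue, not the official spec), a header parser would look like:

```python
import struct

def parse_qmetrics_header(buf: bytes):
    """Parse a hypothetical Q-metrics v6 header as described above:
    byte 0: version, byte 1: record length, byte 2: binning flag,
    byte 3: bin count B, bytes 4..4+B-1: low end of each bin."""
    version, record_len, binned, bin_count = struct.unpack_from("<4B", buf, 0)
    low_ends = list(struct.unpack_from(f"<{bin_count}B", buf, 4))
    return version, record_len, binned, low_ends

# A fabricated 7-bin header for illustration.
header = bytes([6, 34, 1, 7]) + bytes([2, 10, 20, 25, 30, 35, 40])
print(parse_qmetrics_header(header))  # (6, 34, 1, [2, 10, 20, 25, 30, 35, 40])
```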

Simulator for tile instead of cycle

When we create test runs for our pipeline, we usually remove all tiles but one to speed up processing.
For interop to work, we would need InterOp files with the same number of tiles as the test run.
Is it possible to create an equivalent of cyclesim that limits the tiles to a specified set?

Cheers
Tim

MacOSX *.so files refer to @rpath which does not exist

I installed the interop Python wheel file on my Mac using

pip install -f https://github.com/Illumina/interop/releases/latest interop

against a virtualenv created from a manually installed Python.
(This only works with Python 3.6 because there are no other wheels.)

When I use it I get a stack trace.

ImportError: dlopen(/Users/tcezard/python/python-analysis-driver/lib/python3.6/site-packages/interop/_py_interop_run_metrics.so, 2): Library not loaded: @rpath/libpython3.6m.dylib
  Referenced from: /Users/tcezard/python/python-analysis-driver/lib/python3.6/site-packages/interop/_py_interop_run_metrics.so
  Reason: image not found

The .so files refer to an @rpath which is not defined. From what I can read, this seems to be related to Anaconda, but I'm not using Anaconda to install Python or my venv.
I could fix this by manually editing the .so files as follows:

for i in *.so
do 
    install_name_tool -change "@rpath/libpython3.6m.dylib" /Library/Frameworks/Python.framework/Versions/3.6/lib/libpython3.6m.dylib $i
done

Is this something that can be fixed on your end so I don't have to tinker with the .so files?
Is there another way of installing that should be preferred?

File Format Description

Hi,

Kind of a "feature request": it would be nice to see the (current) InterOp file format description for each of the bin files here as well (without compiling the sources).

E.g.

    # Quality Metrics (QualityMetricsOut.bin)
    # Contains quality score distribution
    # Format, version 5 (NextSeq 500 and newer):
    #  byte 0: file version number (5)
    #  byte 1: length of each record
    #  byte 2: quality score binning (byte flag representing if binning was on)
    #  byte 3: number of quality score bins, B
    #  bytes 4 - (4+B-1): lower boundary of quality score bins
    #  bytes (4+B) - (4+2*B-1): upper boundary of quality score bins
    #  bytes (4+2*B) - (4+3*B-1): remapped scores of quality score bins
    #
    #  The remaining bytes are for the records, with each record in this format:
    #   2 bytes: lane number  (uint16)
    #   2 bytes: tile number  (uint16)
    #   2 bytes: cycle number (uint16)
    #   4 x 50 bytes: number of clusters assigned score (uint32) Q1 through Q50
    #   Where N is the record index

That makes it pretty complete, and I won't have to search almost-forgotten PDF files for the descriptions ;-)

best,
Sven
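Taking the quoted version-5 record layout at face value (lane, tile, cycle as uint16, then 50 uint32 cluster counts for Q1 through Q50 — the format as described above, not independently verified), a record can be decoded like this:

```python
import struct

# lane, tile, cycle (uint16) followed by 50 uint32 counts: 206 bytes total.
RECORD = struct.Struct("<3H50I")

def parse_q_record(buf: bytes, offset: int = 0):
    fields = RECORD.unpack_from(buf, offset)
    lane, tile, cycle = fields[:3]
    counts = list(fields[3:])  # clusters assigned score Q1..Q50
    return lane, tile, cycle, counts

# Fabricated record: lane 1, tile 1101, cycle 5, all counts zero except Q30.
counts = [0] * 50
counts[29] = 1234  # 1234 clusters at Q30
buf = RECORD.pack(1, 1101, 5, *counts)
lane, tile, cycle, parsed = parse_q_record(buf)
print(lane, tile, cycle, parsed[29])  # 1 1101 5 1234
```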

Unknown tile naming method - update your RunInfo.xml

I'm having trouble generating a few of the plots for data that was produced on our older HiSeq 2000/2500. Data off the 4k makes BasePercent and the other plots just fine, but I can tell the RunInfo.xml is formatted differently.
Here is the error I'm getting:
plot_by_cycle ./160420_7001452_0296_AC8GJ5ANXX/ --metric-name=BasePercent
Version: v1.0.12
Run Folder: 160420_7001452_0296_AC8GJ5ANXX
Unknown tile naming method - update your RunInfo.xml
/var/tmp/linux_gcc46_release/interop/src/interop/model/run_metrics.cpp::finalize_after_load (436)

and here is the RunInfo.xml, runParameters.xml, and InterOp files.
160420_7001452_0296_AC8GJ5ANXX.tar.gz

Am I just missing an additional step?
I've seen other examples of RunInfo.xml online that look just like mine, so I don't think they are bad, other than being an older version:
RunInfo xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Version="2"

Thanks

error while installing with pip

Hi ,

I'm getting the following error message while trying to install interop with pip:

pip install -f https://github.com/Illumina/interop/releases/latest interop
Could not find a version that satisfies the requirement interop (from versions: 1.0.25-Darwin-AppleClang-7.3.0.7030031py344, 1.0.25-Darwin-AppleClang-7.3.0.7030031py351, 1.0.25-Darwin-AppleClang-7.3.0.7030031py360, 1.0.25-Darwin-AppleClang-7.3.0.7030031py2711, 1.0.25-Linux-GNU-4.8.2, 1.0.25-Windows-MSVC-19.0.24215.1)
No matching distribution found for interop

I'm currently running Python 2.7.10. Do you have any idea what's going on?

Best,

Sander

How to get the mean q-score per read and lane?

Hi!

First of all, thank you for providing this library.

I'm trying to figure out how to get the mean q-score per read and lane using the Python API, and so far I haven't managed it.

For a lot of other metrics, e.g. the error rate, I'm able to do something like this to access the values:

run_metrics = py_interop_run_metrics.run_metrics()
run_metrics.run_info()

valid_to_load = py_interop_run.uchar_vector(py_interop_run.MetricCount, 0)
py_interop_run_metrics.list_summary_metrics_to_load(valid_to_load)
run_metrics.read(RUNFOLDER, valid_to_load)

summary = py_interop_summary.run_summary()
py_interop_summary.summarize_run_metrics(run_metrics, summary)

lanes = summary.lane_count()

for read in range(summary.size()):
    for lane in range(lanes):
        mean_error_rate = summary.at(read).at(lane).error_rate().mean()
        print("mean error rate at read: {} and lane: {} is: {}".format(read, lane, mean_error_rate))

However, I can't seem to figure out how to extract the mean q-score at the same level.

Is it possible to get it? And if so, could you give me some pointers on how to get to it?

Cheers,
Johan
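Whatever the exact API call, the mean Q-score is a weighted mean over the Q-score histogram, so it can always be recovered from the per-bin cluster counts. A sketch of that computation (the histogram values are invented for illustration, not real run data):

```python
# Mean Q = sum(q * clusters_at_q) / total clusters, computed from a
# Q-score histogram of the kind stored in QMetricsOut.bin.
histogram = {12: 1_000, 25: 4_000, 37: 45_000}  # q-score -> cluster count

total = sum(histogram.values())
mean_q = sum(q * n for q, n in histogram.items()) / total
pct_q30 = 100.0 * sum(n for q, n in histogram.items() if q >= 30) / total

print(f"mean Q = {mean_q:.2f}, %>=Q30 = {pct_q30:.2f}")
```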

version missing from output

Hi,

Just a minor bug - the version is missing from output:

$ /share/apps/interop-distros/interop-1.0.11/build/bin/usr/local/bin/summary /data/archive/miseq/160901_M02641_0134_000000000-ATBYE
# Version: Unknown
 Level           Yield           Projected Yield Aligned         Error Rate      Intensity C1    %>=Q30         
 Read 1          1.91            1.91            1.46            0.19            196             97.07          
 Read 2 (I)      0.18            0.18            0.00            0.00            512             95.15          
 Read 3          1.91            1.91            1.43            0.28            231             95.66          
 Non-indexed     3.83            3.83            1.45            0.24            214             96.36          
 Total           4.01            4.01            1.45            0.24            313             96.31          


Read 1
 Lane            Tiles           Density         Cluster PF      Phas/Prephas    Reads           Reads PF        %>=Q30          Yield           Cycles Error    Aligned         Error           Error (35)      Error (75)      Error (100)     Intensity C1   
 1               38              1148 +/- 38     92.02 +/- 1.51  0.151 / 0.037   28.11           25.87           97.07           1.91            74              1.46 +/- 0.07   0.19 +/- 0.02   0.11 +/- 0.01   0.00 +/- 0.00   0.00 +/- 0.00   196 +/- 21     
Read 2 (I)
 Lane            Tiles           Density         Cluster PF      Phas/Prephas    Reads           Reads PF        %>=Q30          Yield           Cycles Error    Aligned         Error           Error (35)      Error (75)      Error (100)     Intensity C1   
 1               38              1148 +/- 38     92.02 +/- 1.51  0.000 / 0.000   28.11           25.87           95.15           0.18            0               0.00 +/- 0.00   0.00 +/- 0.00   0.00 +/- 0.00   0.00 +/- 0.00   0.00 +/- 0.00   512 +/- 61     
Read 3
 Lane            Tiles           Density         Cluster PF      Phas/Prephas    Reads           Reads PF        %>=Q30          Yield           Cycles Error    Aligned         Error           Error (35)      Error (75)      Error (100)     Intensity C1   
 1               38              1148 +/- 38     92.02 +/- 1.51  0.131 / 0.052   28.11           25.87           95.66           1.91            74              1.43 +/- 0.06   0.28 +/- 0.04   0.19 +/- 0.03   0.00 +/- 0.00   0.00 +/- 0.00   231 +/- 26     
Extracted: 158
Called: 158
Scored: 158

Cheers
Matt

Why do summary tables have spaces between the labels and other values?

Parsing the summary table becomes a hurdle when the labels are separated by commas and padded with spaces: any off-the-shelf CSV parser keeps the spaces in the labels.

One example is pandas read_csv; to access the columns you need to strip or rename them after reading. Why do that with a CSV file? Is there an option to simply not add the spaces at generation time?

Thanks
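Until then, the padding can be handled at parse time: both the stdlib csv module and pandas read_csv accept a skipinitialspace flag that drops whitespace after each delimiter. A stdlib-only sketch (the column names and rows are invented for the example):

```python
import csv
import io

# Space-padded CSV of the kind described above (made-up rows).
text = "Lane, Tiles, Density\n1, 38, 1148\n2, 38, 1150\n"

# skipinitialspace=True drops the whitespace following each comma.
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
print(rows[0])  # ['Lane', 'Tiles', 'Density'] - no leading spaces
```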

RPM version

Hi

Is there an RPM version, or just the source for compiling?

Thanks a lot

No QCollapsed for HiSeq4000?

I have run "dumptext --metric=QCollapsed" on one of our flowcells (HiSeq 4000), and it reports "No InterOp files found".

The InterOp folder contains the following files:

ColorMatrixMetricsOut.bin
CorrectedIntMetricsOut.bin
EmpiricalPhasingMetricsOut.bin
ErrorMetricsOut.bin
EventMetricsOut.bin
ExtractionMetricsOut.bin
FWHMGridMetricsOut.bin
ImageMetricsOut.bin
IndexMetricsOut.bin
PFGridMetricsOut.bin
QMetricsOut.bin
RegistrationMetricsOut.bin
StaticRunMetricsOut.bin
TileMetricsOut.bin

Am I doing something wrong?

Errors in Interop Python tutorial [Interop v1.0.25, Python 3.6.3]

When accessing metrics from a run_summary object at a specific read and lane, I seem to always receive the same mean and median values. For example:

summary.at(0).at(0).cluster_count().median() results in 880134.0
summary.at(1).at(0).cluster_count().median() results in 880134.0

Shouldn't these commands give me read-specific values for reads 1 and 2? If not, what is the proper way to get read-specific metrics such as cluster_count and density?

percent_over_qscore_cumulative returns NaN

Hello,

When trying to use percent_over_qscore_cumulative() and total_over_qscore_cumulative(), I get NaN, whereas the local-histogram versions of these calls work fine.

is_cumulative_empty() returns false, so I know it's not calculating over an empty set.

Is there anything else that needs to be done for percent_over_qscore_cumulative() that isn't required for percent_over_qscore() ?

Thanks,
Andrew

#define INTEROP_VERSION "v1.0.19"

Example using percent_over_qscore and total_over_qscore:

HLFG2BGXY 30 569525.000000 95.297417
HLGC7BGXY 30 618893.000000 95.195122

Same runs, substituting percent_over_qscore_cumulative and total_over_qscore_cumulative:

HLFG2BGXY 30 0.000000 nan
HLGC7BGXY 30 0.000000 nan

Add percentage of used clusters by lane in summary output (NovaSeq)

Hi,
Is it possible to add the percentage of occupied clusters per lane to the summary output command (when an ExtendedTileMetricsOut.bin file exists in the InterOp directory)?
Our process is not developed in Java, so it is difficult for us to use the defined class.
Kind regards,
Aurélie.

Please include docs for dumptext

Hi folks,

It looks like the docs at http://illumina.github.io/interop/index.html are one release behind, at InterOp v1.0.13-2-gc534deb.

Specifically, the applications page does not yet have help for the new dumptext application. Could y'all please add it? :)

I would also like to register my interest in another release. Release v1.0.14 had some fatal bugs in dumptext (fixed in #97) that caused me a lot of confusion.

Thanks so much!

Cmake Compilation Failure on Linux

Hello,

I'm trying to compile the code from source using make, and I get the following error:

Building CXX object src/tests/interop/CMakeFiles/interop_gtests.dir/logic/image_table_logic_test.cpp.o

In file included from /illumina/scratch/iPEG/aaminfar/install/interop/interop-1.0.8/src/tests/interop/logic/image_table_logic_test.cpp:8:
/illumina/scratch/iPEG/aaminfar/install/interop/build/external/gtest/src/gtest/include/gtest/gtest.h: In function ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int, T2 = illumina::interop::model::table::column_type]’:
/illumina/scratch/iPEG/aaminfar/install/interop/build/external/gtest/src/gtest/include/gtest/gtest.h:1485:   instantiated from ‘static testing::AssertionResult testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*, const char*, const T1&, const T2&) [with T1 = size_t, T2 = illumina::interop::model::table::column_type, bool lhs_is_null_literal = false]’
/illumina/scratch/iPEG/aaminfar/install/interop/interop-1.0.8/src/tests/interop/logic/image_table_logic_test.cpp:132:   instantiated from here
/illumina/scratch/iPEG/aaminfar/install/interop/build/external/gtest/src/gtest/include/gtest/gtest.h:1448: error: comparison between signed and unsigned integer expressions
gmake[2]: *** [src/tests/interop/CMakeFiles/interop_gtests.dir/logic/image_table_logic_test.cpp.o] Error 1
gmake[1]: *** [src/tests/interop/CMakeFiles/interop_gtests.dir/all

My linux version: CentOS release 6.7 (Final)
GCC version : gcc version 5.2.0 (GCC)

IndexMetricsOut.bin source and regeneration

We receive data from a sequencing center that does not generate FASTQ data. Once we create a tag sheet here and run generateFASTQ with our local copy of MiSeq Reporter, the IndexMetricsOut.bin file is generated. If the file is incorrect, e.g. the i7 is in the wrong orientation, the SAV indexing metrics are wrong, and they do not correct themselves when the SampleSheet is fixed and generateFASTQ is rerun. How can this file be updated against a new SampleSheet?

We also regularly rerun our MiSeq runs to generate index files to check for tag errors, which 1) takes time to generate the files and 2) takes time to parse them, a lot of time in the case of HiSeq runs. Which files contain the data used to generate the IndexMetricsOut.bin file? I need to know the physical tag combinations, so I can't use the DemultiplexSummary files, since they don't contain the pairing information.

Indexing QC metrics (Python)

Hi Illumina,

May I know if there is a way to use the interop Python library to display the indexing QC "% aligned to index"? I noticed that the examples given in the documentation are all by lane or tile.

Thank you
Shimin Ang

Compiling an example

Hi,

I have downloaded the Linux precompiled library and have tried to compile an example code:
gcc -Iinterop/include/ test.cpp interop/lib64/libinterop_lib.a -o test

but I get this error message:
/tmp/ccr1iTvq.o:(.eh_frame+0x2eb): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status

I apologize in advance if my question is naive.

Thank you,

Olivier

Python binding for plots

Hi,

I think I've been able to successfully populate a bar plot object using something like:

py_interop_plot.plot_sample_qc(metrics, lane, bar_plot_data)

but I haven't been able to figure out how to actually get the data out of the object. Is there a way to interface with the gnuplot_writer.write_chart from Python?

Thanks

interop's index-summary has no support for the Illumina MiniSeq

For the Illumina MiSeq, interop has an index-summary function; however, it produces no result with a MiniSeq run:

# Version: v1.0.12
Lane    Total Reads    PF Reads    % Read Identified (PF)    CV      Min     Max
1       0              0           0.0000                    -nan    0.0000  0.0000
Index Number    Sample Id    Project    Index 1 (I7)    Index 2 (I5)    % Read Identified (PF)

interop does not install: Python 2.7.7, gcc 4.4.7 / 4.7.2

This occurs on CentOS 6.8. Any leads on how to add this to Python 2.7.x?

Pip install attempt:
pip install -f https://github.com/Illumina/interop/releases/latest interop

Result:
Downloading/unpacking interop
Could not find a version that satisfies the requirement interop (from versions: linux_gcc46_release, msvc14_win64_Release, osx_clang_release, osx_py36_clang_release)
Cleaning up...
No distributions matching the version for interop

Pip install attempt with wheel:
pip install --use-wheel --no-index interop-1.0.18-cp27-cp27mu-linux_x86_64.whl

Result:
Ignoring indexes: https://pypi.python.org/simple/
interop-1.0.18-cp27-cp27mu-linux_x86_64.whl is not a supported wheel on this platform.

Another post suggested building a .whl file with cmake; however, that did not work.

Cmake attempt:
cmake --build . --target package_wheel
Result:
gmake: *** No rule to make target `package_wheel'. Stop.

Interop on PyPI

Hi!

Firstly, thanks for providing a great library. I maintain an open source project called CheckQC, which depends on InterOp. One problem I've been having is that InterOp cannot be installed from PyPI. Most recently this affected getting our documentation up on ReadTheDocs, but we have also seen a user who could not install our software because they had trouble installing InterOp. Having the library installable via PyPI should help all users who use InterOp via the Python APIs.

I'm therefore wondering if it would be possible for you to upload InterOp to PyPI? If there is any support I can provide to make that happen, let me know.

Cheers,
Johan
