evanyeyeye / rainbow

Read chromatography and mass spectrometry binary files.

License: GNU General Public License v3.0

rainbow's Introduction

rainbow

rainbow provides programmatic access to the raw data encoded in chromatography and mass spectrometry binary files. This library supports the following vendors and detectors:

Agilent .D

.uv - UV spectrum (supports incomplete files)
.ch - UV, FID, CAD, and ELSD channels
.ms - MS (supports incomplete files)
MSProfile.bin - HRMS

Waters .raw

CHRO - CAD and ELSD, as well as miscellaneous analog data
FUNC - UV and MS

There is documentation for rainbow that also details the structure of each binary file format.

Installation

pip install rainbow-api

Usage

The easiest way to get started is to give rainbow a directory path. Assume that we have a directory mydata.D that contains a binary file DAD1A.uv with UV data.

import rainbow as rb
datadir = rb.read("mydata.D")
datafile = datadir.get_file("DAD1A.uv")

Here, the datadir DataDirectory object contains a DataFile object for DAD1A.uv. The raw UV data is contained in numpy arrays that are attributes of datafile. Users may find the following particularly useful:

datafile.xlabels - 1D numpy array with retention times
datafile.ylabels - 1D numpy array with wavelengths
datafile.data - 2D numpy array with absorbances
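
As a rough illustration of how these three arrays relate, the sketch below extracts the trace at the wavelength nearest a requested value. `trace_at_wavelength` is a hypothetical helper, not part of rainbow, and it assumes that rows of `data` follow `xlabels` and columns follow `ylabels`. Plain nested lists stand in for the numpy arrays so the snippet runs without a data file.

```python
def trace_at_wavelength(xlabels, ylabels, data, wavelength):
    """Return (times, absorbances) for the column nearest `wavelength`.

    Hypothetical helper. Assumes data[i][j] is the absorbance at
    retention time xlabels[i] and wavelength ylabels[j].
    """
    # Index of the wavelength closest to the requested one.
    col = min(range(len(ylabels)), key=lambda i: abs(ylabels[i] - wavelength))
    # One absorbance per retention time, taken from that column.
    return list(xlabels), [row[col] for row in data]


# Made-up stand-in values, not real instrument data.
xlabels = [0.0, 0.5, 1.0]            # retention times
ylabels = [210.0, 254.0, 280.0]      # wavelengths
data = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

times, trace = trace_at_wavelength(xlabels, ylabels, data, 250.0)
```

With a real file, the same call would presumably take `datafile.xlabels`, `datafile.ylabels`, and `datafile.data` as its first three arguments.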

There is a tutorial available. There are also example snippets for basic tasks. Or just check out the full API.

rainbow/ contains the code of the Python library.
docs/ contains code for generating documentation. To build documentation locally, you will need to install the sphinx and sphinx-rtd-theme packages. Then, move to the docs/ directory and run make html. The docpages will be generated under docs/_build.
tests/ contains unit tests for the library. These can be run with python -m unittest.

rainbow's Issues

questions

I would like to express my sincere gratitude for your generous sharing of the rainbow Python code, which has been tremendously helpful for my data processing. However, I have encountered some issues in extracting the data, so I am writing to kindly ask for your assistance.
The rainbow functions you developed work flawlessly for extracting data acquired with Agilent's ChemStation software, as we have extensively tested. However, Agilent now predominantly uses OpenLab CDS 2.x for data acquisition.
To extract UV data with rainbow, the data files first need to be exported as .D files from OpenLab CDS and then processed by rainbow. This differs from ChemStation, where the .D files were directly accessible on the acquisition computer. The .D files exported from OpenLab CDS may have slight differences in structure. When using rainbow on these files, the extracted data appears incorrect: the chromatograms for each wavelength are abnormal (the extracted data for the example is shown in "gehua.xlsx").
I suspect the differences between the .D file versions may be causing the wavelengths, retention times, and other parameters to be located incorrectly during parsing. Since working on the source code requires highly specialized background knowledge, I do not have the expertise to address this issue on my own. Could you examine whether rainbow does in fact have problems handling data from OpenLab CDS? If so, would it be possible to modify the code so that it extracts data from OpenLab CDS correctly? I've attached example .D files from OpenLab CDS 2.7 as a demonstration.
I sincerely appreciate you taking the time to address my questions. Your project has been invaluable in assisting me with data processing.
gehua.xlsx
example_data.zip

Waters 4-bytes format

I get a "The 4-bytes format is not supprted" error when trying to read my Waters FUNxxx.DAT files.

I have read your docs and code, and I would like to implement a 'parse_funcdat4' function.
Do you have any pointers?
Did you find documentation from the vendor, or is it pure reverse engineering?

Thanks in advance and congratulations on the good work.
The code is really nice.

test_green Unit Test Date Format Mismatch

Hello!

I pip installed the library (version 1.0.9 according to pip list), cloned the repository, and ran the unit tests to make sure everything was installed correctly. One of the unit tests failed, and it seems like it is due to a date format mismatch.

Here is the printout I got for the "test_green" unit test (I have omitted some of my local paths with the <...> notation):

======================================================================
FAIL: test_green (tests.test_agilent.TestAgilent.test_green)
Tests a directory containing:
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\<...>\rainbow\tests\test_agilent.py", line 45, in test_green
    self._DataTester__test_data_directory("green", "D")
  File "C:\<...>\rainbow\tests\datatester.py", line 68, in __test_data_directory
    self.assertDictEqual(datadir.metadata, json_data['metadata'])
AssertionError: {'vendor': 'Agilent', 'date': '03-Feb-22, 11:22:20'} != {'vendor': 'Agilent', 'date': '3 Feb 22  11:22 am -0500'}
- {'date': '03-Feb-22, 11:22:20', 'vendor': 'Agilent'}
?           - ^   ^  ^      ^^

+ {'date': '3 Feb 22  11:22 am -0500', 'vendor': 'Agilent'}
?            ^   ^  ^      ^^^^^^^^


----------------------------------------------------------------------
Ran 17 tests in 39.398s

If it is relevant, I am on a Windows 11 computer with Python 3.11.5.
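
The two date strings in the assertion above look like the same timestamp written in two layouts. A minimal sketch using only the standard library shows how both could be parsed and compared. The format strings are inferred from the two example values alone, `parse_agilent_date` is a hypothetical helper, and the parsing assumes English month and am/pm names.

```python
from datetime import datetime

# Formats inferred from the two strings in the failing assertion.
ACTUAL_FORMAT = "%d-%b-%y, %H:%M:%S"     # '03-Feb-22, 11:22:20'
EXPECTED_FORMAT = "%d %b %y %I:%M %p %z"  # '3 Feb 22  11:22 am -0500'


def parse_agilent_date(text):
    """Try each known layout in turn (hypothetical helper)."""
    for fmt in (ACTUAL_FORMAT, EXPECTED_FORMAT):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date layout: {text!r}")


actual = parse_agilent_date("03-Feb-22, 11:22:20")
expected = parse_agilent_date("3 Feb 22  11:22 am -0500")

# Ignore seconds and the UTC offset, which only one layout carries.
same_minute = actual.replace(second=0) == expected.replace(tzinfo=None)
```

Here `same_minute` is true, which is consistent with the mismatch being one of formatting rather than of the underlying timestamp.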

Based on my reading of the code, it looks like the metadata's date is being read from the .UV file instead of the .MS files, but the assert statement is expecting the date to match the .MS files' format. A simple fix to get the unit test to pass would be to adjust the date format in the metadata section of the unit test's "info.json" file, but if you actively want the metadata's date to be retrieved from the .MS files and not the .UV file, it may be worth filtering by file extension when setting it here:

rainbow/rainbow/agilent/chemstation.py, lines 853 to 854 in 21ce06b:

if 'date' not in metadata and 'date' in datafile.metadata:
    metadata['date'] = datafile.metadata['date']
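
The extension filter suggested above might look roughly like the following. This is a sketch and not the library's code: `pick_directory_date` is a hypothetical function, the `name` attribute on each data file is an assumption, and the file names are illustrative stand-ins built with `SimpleNamespace`.

```python
from types import SimpleNamespace


def pick_directory_date(datafiles, preferred_ext=".ms"):
    """Prefer the date from a file with `preferred_ext`.

    Falls back to the first date found in any other file.
    Assumes each item has `name` and `metadata` attributes.
    """
    fallback = None
    for datafile in datafiles:
        date = datafile.metadata.get("date")
        if date is None:
            continue
        if datafile.name.lower().endswith(preferred_ext):
            return date
        if fallback is None:
            fallback = date
    return fallback


# Stand-in objects with illustrative names; not real rainbow DataFile objects.
files = [
    SimpleNamespace(name="DAD1A.uv", metadata={"date": "03-Feb-22, 11:22:20"}),
    SimpleNamespace(name="MSD1.ms", metadata={"date": "3 Feb 22  11:22 am -0500"}),
]

chosen = pick_directory_date(files)
```

With both files present, `chosen` is the date from the .ms file; with only the .uv file, the function falls back to the .uv date.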

Rainbow failed in reading .D folder

I used rainbow to open a .D folder from Agilent GC-MS data. Please see the attached file for an example. The error shows "error: unpack requires a buffer of 4 bytes". Any idea what causes this issue? Thanks
Uploading L_plate1_run1_10302020.D.zip…

Finding Q3 information

Hello,
This is related to #16. I have been digging into it to some extent, and I think this information is encoded as shown here:

https://github.com/ProteoWizard/pwiz/blob/095e7a6f229da2349ba5fc545c7b63fe1ac120d8/pwiz/data/vendor_readers/Waters/ChromatogramList_Waters.cpp#L348

However, I am not sure exactly where it is stored. A little more expertise is needed.

Thanks

Can't use Rainbows to access Chemstation directory and HPLC datafiles.

Thank you for providing this Python code, which I am hoping to use for a chemistry project. Unfortunately, it is not working for me so far. When I run the following cell in a Jupyter notebook to read the ChemStation data in folder 55.D:

import rainbow as rb
datadir = rb.read('/Users/ (...) /VTNA-4/55.D')

datadir only contains the name '55.D' and the metadata {'vendor': 'Agilent', 'vialpos': '55'}. When I then try to read one of my .ch files (datadir.get_file('dad1C.ch')), I get an error message saying "Data file dad1C.ch not found in 55.D.". I have also tried calling rb.agilent.chemstation.parse_ch('dad1C.ch'), but this returns None, suggesting that the file cannot be read. Do you know why rainbow is not working in my case? I can access the data using a program called UniChrom, but this is less useful than using Python. A colleague of mine has used rainbow successfully in the way I have attempted, but he used data from a newer version of the ChemStation software. Maybe that makes the difference?

PS: the files in my directory 55.D are as follows:
['.ipynb_checkpoints',
'ACQRES.REG',
'dad1.uv',
'dad1A.ch',
'dad1B.ch',
'dad1C.ch',
'dad1D.ch',
'dad1E.ch',
'LCDIAG.REG',
'Report.TXT',
'Report00.CSV',
'REPORT01.CSV',
'REPORT02.CSV',
'REPORT03.CSV',
'REPORT04.CSV',
'REPORT05.CSV',
'RUN.LOG',
'SAMPLE.MAC',
'Untitled.ipynb']

New to MS Analysis

Hi, I'm the IT person for my lab and I have been tasked with obtaining the detected masses and their intensities from an Agilent .D folder.

Could you provide me with the steps on how I could do so with Rainbow?

Thank you so much! :)

Best regards,
Yew Mun

Issue opening GC data "struct.error: unpack requires a buffer of 4 bytes"

Hello, I am trying to use the package but am getting a "struct.error: unpack requires a buffer of 4 bytes" error in /rainbow/agilent/chemstation.py line 568. I expect something is going wrong in determining the right num_times earlier in the script but I am not sure.
Any help appreciated :)
An example .D file is here
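
For context, this error generally means that `struct.unpack` was handed fewer bytes than its format string needs, which can happen when a read runs past the end of a file. The sketch below reproduces the message and shows one defensive pattern. It is not rainbow's parser: `read_uint32_be` is a hypothetical helper, and the big-endian layout is chosen only for illustration.

```python
import io
import struct


def read_uint32_be(stream):
    """Read one big-endian unsigned 32-bit integer.

    Returns None instead of raising when fewer than 4 bytes remain.
    """
    chunk = stream.read(4)
    if len(chunk) < 4:
        return None
    return struct.unpack(">I", chunk)[0]


complete = io.BytesIO(b"\x00\x00\x01\x00")   # exactly 4 bytes
truncated = io.BytesIO(b"\x00\x01")          # only 2 bytes

value = read_uint32_be(complete)     # 256
short = read_uint32_be(truncated)    # None, no exception

# Unpacking the short buffer directly raises the error from the report.
try:
    struct.unpack(">I", b"\x00\x01")
except struct.error as exc:
    message = str(exc)
```

If a record count such as num_times is larger than the number of records actually present, an unguarded loop of 4-byte reads would fail in the same way.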

lzf has no attribute decompress

Issue

I am using a Mac with an M2 chip.

I am trying to use the function rainbow.agilent.masshunter.parse_msdata() but I encounter the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[110], line 1
----> 1 profile = rainbow.agilent.masshunter.parse_msdata("Data/Agilent/MS1/QC_neg_5.d/AcqData")

AttributeError: module 'lzf' has no attribute 'decompress'

I successfully installed the lzf library (I think), but when I try to call lzf.decompress(), the function does not show up.

Any recommendations for troubleshooting this issue?
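
One possible cause, offered as an assumption rather than a diagnosis, is that `import lzf` resolves to something other than the expected compression package, for instance a local file or folder named lzf. A small standard-library check can show which file a module was loaded from and whether it exposes a given attribute. `inspect_module` is a hypothetical helper, and `zlib` is used here only so the snippet runs anywhere.

```python
import importlib


def inspect_module(name, attribute):
    """Report where a module was loaded from and whether it has `attribute`.

    Returns None if the module cannot be imported at all.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        # Built-in modules may have no __file__, hence the default.
        "file": getattr(module, "__file__", None),
        "has_attribute": hasattr(module, attribute),
    }


# Demonstrated on a standard-library module.
report = inspect_module("zlib", "decompress")
```

Calling `inspect_module("lzf", "decompress")` in the affected notebook would show the path that Python actually imported, which can then be compared with the environment where the package was installed.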

Scaling data issue for Agilent 7890A GC FID

When trying to get a .csv file from the FID1A.ch file of an Agilent 7890A GC-FID, I get only 341 data points in the output.csv file. However, the CSV file exported with the Agilent software (FID data) has 21084 rows.
Software: GC ChemStation B.04.02 SP1.

This issue may be the same as in #13.

Two files with an example are here:
https://drive.google.com/drive/folders/1pMSbtQUYjCvHe4gsaDTn1Qcsu0bGqNS7?usp=sharing

parse compound names and MS2 from waters

Hello,

Thank you very much for this work! It's the cleanest way to parse Waters data. In a Waters .raw directory there is a file called _FUNC001.CMP that contains the compound names. I think it should be easy to parse this as an attribute and also save it as CSV while looping. There must also be a file in .raw that contains the MS2 reads. The reads themselves may not be important, but the mapping between parent and child masses is of interest.

Thanks again!

Extension for Asterix ChemStation

I tried to use your package to read in .ch files from Asterix ChemStation. However, there seems to be an issue with the scaling of the data in the data body. Is there a way to adjust rainbow to support data from Asterix ChemStation? How could one find the scaling factor needed to read the data body correctly?

Data File from MassHunter ICP-MS

Evan and Eugene,

I was wondering if you all had ever experimented with opening data from an Agilent 7900 or 8900 inductively coupled plasma mass spectrometer. I am trying to access the time-series data from both the pulse and analog detectors for modeling detector linearity.

Thanks for all of your hard work on rainbow-api. What a feat!

For the data files coming off our 8900, we do not have a file called 'MSMassCal.bin'

Instead we have:
MSScan_XSpecific.xsd - much like the MSScan.xsd this file likely contains a description of the complex types in the .bin file of the same name

MSScan_XSpecific.bin

MSTS_XSpecific.xml
This file contains integer masses, element names, and accumulation times for each isotope.

I think I can modify your code to open these files, but I was wondering if you wanted first crack at it. I am happy to provide data files.

Thanks for your work.
