
evanyeyeye / rainbow


Read chromatography and mass spectrometry binary files.

License: GNU General Public License v3.0

Python 88.45% MAXScript 10.70% Euphoria 0.59% OpenEdge ABL 0.15% Visual Basic 6.0 0.11%

rainbow's Introduction

rainbow


rainbow provides programmatic access to the raw data encoded in chromatography and mass spectrometry binary files. This library supports the following vendors and detectors:

Agilent .D

  • .uv - UV spectrum (supports incomplete files)
  • .ch - UV, FID, CAD, and ELSD channels
  • .ms - MS (supports incomplete files)
  • MSProfile.bin - HRMS

Waters .raw

  • CHRO - CAD and ELSD, as well as miscellaneous analog data
  • FUNC - UV and MS

There is documentation for rainbow that also details the structure of each binary file format.

Installation

pip install rainbow-api

Usage

The easiest way to get started is to give rainbow a directory path. Assume that we have a directory mydata.D that contains a binary file DAD1A.uv with UV data.

import rainbow as rb
datadir = rb.read("mydata.D")
datafile = datadir.get_file("DAD1A.uv")

Here, the datadir DataDirectory object contains a DataFile object for DAD1A.uv. The raw UV data is contained in numpy arrays that are attributes of datafile. Users may find the following particularly useful:

  • datafile.xlabels - 1D numpy array with retention times
  • datafile.ylabels - 1D numpy array with wavelengths
  • datafile.data - 2D numpy array with absorbances
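The three arrays fit together as shown in this minimal sketch. The hypothetical helper chromatogram_at pulls a single-wavelength chromatogram out of the absorbance matrix. It assumes that datafile.data has one row per retention time in xlabels and one column per wavelength in ylabels, which is how the attribute list above reads. It uses plain indexing, so it works on NumPy arrays and on nested lists.

```python
def chromatogram_at(xlabels, ylabels, data, wavelength):
    """Return (retention_times, absorbances) for the column nearest `wavelength`.

    Hypothetical helper: assumes data[i][j] is the absorbance at
    retention time xlabels[i] and wavelength ylabels[j].
    """
    # Pick the column whose wavelength is closest to the requested one.
    col = min(range(len(ylabels)), key=lambda j: abs(ylabels[j] - wavelength))
    # Take that column from every row (one row per retention time).
    absorbances = [row[col] for row in data]
    return list(xlabels), absorbances


if __name__ == "__main__":
    # Toy arrays standing in for datafile.xlabels / ylabels / data.
    times = [0.1, 0.2, 0.3]
    wavelengths = [210, 254, 280]
    absorb = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
    print(chromatogram_at(times, wavelengths, absorb, 250))
```

With a real file, the call would be chromatogram_at(datafile.xlabels, datafile.ylabels, datafile.data, 254). Choosing the nearest wavelength, rather than requiring an exact match, avoids failures when the requested value is not sampled exactly.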

There is a tutorial available. There are also example snippets for basic tasks. Or just check out the full API.

Contents

  • rainbow/ contains the code of the Python library.
  • docs/ contains code for generating documentation. To build documentation locally, you will need to install the sphinx and sphinx-rtd-theme packages. Then, move to the docs/ directory and run make html. The docpages will be generated under docs/_build.
  • tests/ contains unit tests for the library. These can be run with python -m unittest.

rainbow's People

Contributors

evanyeyeye, jmarvi3, ekwan, omarashkar

Stargazers

Younes Moussaif, Qun Li, Abolfazl Karimi, El loza, Delurion, Nicholas Hadler, Chua Cheow Huan, Soumyadeep Shome, Jason Wang, Max Häußler, Kobi Felton, Fabian L. Zott, Sondre Blegen, Erlend Olsen, jstathakis, Tim Maier, Ethan Bass, Corin Wagen

Watchers

Michael Dussere, Ethan Bass

rainbow's Issues

questions

I would like to express my sincere gratitude for sharing the rainbow Python code, which has been tremendously helpful for my data processing. However, I have encountered some issues extracting data, so I am writing to ask for your assistance.
The rainbow functions you developed work flawlessly for extracting data acquired with Agilent's ChemStation software, as we have extensively tested. However, Agilent now predominantly uses OpenLab CDS 2.x for data acquisition.
To extract UV data with rainbow, the data files must first be exported as .D files from OpenLab CDS and then processed by rainbow. This differs from ChemStation, where the .D files were directly accessible on the acquisition computer. The .D files exported from OpenLab CDS may differ slightly in structure. When I run rainbow on these files, the extracted data appears incorrect: the chromatograms for each wavelength are abnormal (the extracted data for the example is shown in "gehua.xlsx").
I suspect the differences between the .D file versions may cause the wavelengths, retention times, and other parameters to be located incorrectly during parsing. Since modifying the source code requires specialized background knowledge, I do not have the expertise to address this on my own. Could you check whether rainbow truly has problems handling data from OpenLab CDS? If so, would it be possible to modify the code so that it extracts OpenLab CDS data correctly? I've attached example .D files from OpenLab CDS 2.7 as a demonstration.
I sincerely appreciate you taking the time to address my questions. Your project has been invaluable for my data processing.
gehua.xlsx
example_data.zip

Waters 4-bytes format

I get a "The 4-bytes format is not supprted" error when trying to read my waters FUNxxx.DAT files.

I have read your documentation and code, and I would like to implement a 'parse_funcdat4' function.
Do you have any hints?
Did you find documentation from the manufacturer, or is it pure reverse engineering?

Thanks in advance, and congratulations on the good work.
The code is really nice.

test_green Unit Test Date Format Mismatch

Hello!

I pip installed the library (version 1.0.9 according to pip list), cloned the repository, and ran the unit tests to make sure everything was installed correctly. One of the unit tests failed, and it seems like it is due to a date format mismatch.

Here is the printout I got for the "test_green" unit test (I have omitted some of my local paths with the <...> notation):

======================================================================
FAIL: test_green (tests.test_agilent.TestAgilent.test_green)
Tests a directory containing:
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\<...>\rainbow\tests\test_agilent.py", line 45, in test_green
    self._DataTester__test_data_directory("green", "D")
  File "C:\<...>\rainbow\tests\datatester.py", line 68, in __test_data_directory
    self.assertDictEqual(datadir.metadata, json_data['metadata'])
AssertionError: {'vendor': 'Agilent', 'date': '03-Feb-22, 11:22:20'} != {'vendor': 'Agilent', 'date': '3 Feb 22  11:22 am -0500'}
- {'date': '03-Feb-22, 11:22:20', 'vendor': 'Agilent'}
?           - ^   ^  ^      ^^

+ {'date': '3 Feb 22  11:22 am -0500', 'vendor': 'Agilent'}
?            ^   ^  ^      ^^^^^^^^


----------------------------------------------------------------------
Ran 17 tests in 39.398s

If it is relevant, I am on a Windows 11 computer with Python 3.11.5.

Based on my reading of the code, the metadata's date is being read from the .UV file instead of the .MS files, but the assertion expects the date in the .MS files' format. A simple fix to make the unit test pass would be to adjust the date format in the metadata section of the unit test's "info.json" file. However, if you want the metadata's date to come from the .MS files rather than the .UV file, it may be worth filtering by file extension when setting it here:

if 'date' not in metadata and 'date' in datafile.metadata:
    metadata['date'] = datafile.metadata['date']
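The extension filter suggested above could look roughly like the following sketch. It is not rainbow's code. pick_date is a hypothetical function that takes (filename, metadata) pairs and prefers a date from a file with the given extension (.ms here). It falls back to the first file that has any date, so directories without MS data still get one.

```python
def pick_date(files, preferred_ext=".ms"):
    """Choose a directory-level date from per-file metadata.

    files: iterable of (filename, metadata_dict) pairs (hypothetical input shape).
    Returns the date from the first file ending in `preferred_ext`,
    otherwise the first date found, otherwise None.
    """
    fallback = None
    for name, meta in files:
        if "date" not in meta:
            continue
        # Case-insensitive match so both MSD1.MS and msd1.ms qualify.
        if name.lower().endswith(preferred_ext):
            return meta["date"]
        if fallback is None:
            fallback = meta["date"]
    return fallback


if __name__ == "__main__":
    # Toy file names; the dates are the two formats from the test output above.
    files = [
        ("DAD1.UV", {"date": "03-Feb-22, 11:22:20"}),
        ("MSD1.MS", {"date": "3 Feb 22  11:22 am -0500"}),
    ]
    print(pick_date(files))
```

Keeping a fallback rather than requiring an .MS file means UV-only directories would still report a date.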

Can't use rainbow to access ChemStation directory and HPLC data files

Thank you for providing this Python code, which I am hoping to use for a chemistry project. Unfortunately, it is not working for me so far. When I run the following cell in a Jupyter notebook to read the ChemStation data in folder 55.D:

import rainbow as rb
datadir = rb.read('/Users/ (...) /VTNA-4/55.D')

datadir only contains the name '55.D' and the metadata {'vendor': 'Agilent', 'vialpos': '55'}. When I then try to read one of my .ch files (datadir.get_file('dad1C.ch')), I get the error message "Data file dad1C.ch not found in 55.D." I have also tried calling rb.agilent.chemstation.parse_ch('dad1C.ch') directly, but it returns None, which suggests the file cannot be read. Do you know why rainbow is not working in my case? I can access the data using a program called UniChrom, but that is less convenient than using Python. A colleague of mine has used rainbow successfully in the same way, but with data from a newer version of the ChemStation software. Maybe that makes the difference?

PS: the files in my directory 55.D are as follows:
['.ipynb_checkpoints',
'ACQRES.REG',
'dad1.uv',
'dad1A.ch',
'dad1B.ch',
'dad1C.ch',
'dad1D.ch',
'dad1E.ch',
'LCDIAG.REG',
'Report.TXT',
'Report00.CSV',
'REPORT01.CSV',
'REPORT02.CSV',
'REPORT03.CSV',
'REPORT04.CSV',
'REPORT05.CSV',
'RUN.LOG',
'SAMPLE.MAC',
'Untitled.ipynb']
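One way to narrow down a report like this is to compare the directory listing with the files the reader actually returned. The sketch below is a hypothetical diagnostic, not part of rainbow. How to obtain the list of parsed file names from the DataDirectory object is left as an assumption (check the API documentation), so the function just takes plain lists. With the listing above and nothing parsed, it flags dad1.uv and the five dad1*.ch files.

```python
import os

# Extensions taken from the Agilent file list in the README (.uv, .ch, .ms).
AGILENT_EXTS = (".uv", ".ch", ".ms")


def unparsed_files(listing, parsed_names):
    """Return files with a supported extension that were not parsed.

    listing: file names present in the .D directory (e.g. os.listdir(path)).
    parsed_names: names of the files the reader actually returned.
    Comparison is case-insensitive, since names appear in both cases
    (dad1.uv here versus DAD1A.uv in the README).
    """
    parsed = {name.lower() for name in parsed_names}
    return sorted(
        f for f in listing
        if f.lower().endswith(AGILENT_EXTS) and f.lower() not in parsed
    )


if __name__ == "__main__":
    path = "55.D"  # hypothetical local path
    if os.path.isdir(path):
        print(unparsed_files(os.listdir(path), parsed_names=[]))
```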

New to MS Analysis

Hi, I'm the IT person for my lab, and I have been tasked with obtaining the detected masses and their intensities from an Agilent .D folder.

Could you tell me the steps to do this with rainbow?

Thank you so much! :)

Best regards,
Yew Mun
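For an MS file such as MSD1.MS, the same three attributes apply. The sketch below assumes that xlabels holds retention times, ylabels holds m/z values, and data[i][j] is the intensity at time xlabels[i] and m/z ylabels[j]. This follows the README's description of the UV case and is not stated here for MS. Under that assumption, the detected masses and intensities of one scan are a row of data. spectrum_at is a hypothetical helper.

```python
def spectrum_at(xlabels, ylabels, data, time):
    """Return [(mz, intensity), ...] for the scan closest to `time`.

    Assumes rows of `data` follow xlabels (retention times) and columns
    follow ylabels (m/z values); zero-intensity masses are skipped.
    """
    # Index of the scan whose retention time is nearest the requested one.
    row = min(range(len(xlabels)), key=lambda i: abs(xlabels[i] - time))
    return [
        (ylabels[j], data[row][j])
        for j in range(len(ylabels))
        if data[row][j] > 0
    ]


if __name__ == "__main__":
    # Toy arrays standing in for datafile.xlabels / ylabels / data.
    times = [1.0, 2.0]
    masses = [100, 150, 204]
    intensities = [[0, 30, 5], [12, 0, 7]]
    print(spectrum_at(times, masses, intensities, 1.9))
```

In practice, the arrays would come from rb.read("mydata.D").get_file("MSD1.MS"), with datafile.xlabels, datafile.ylabels and datafile.data passed in.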

Issue opening GC data "struct.error: unpack requires a buffer of 4 bytes"

Hello, I am trying to use the package but am getting a "struct.error: unpack requires a buffer of 4 bytes" error in /rainbow/agilent/chemstation.py, line 568. I suspect something goes wrong earlier in the script when num_times is determined, but I am not sure.
Any help appreciated :)
An example .D file is here
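Python's struct.unpack raises this error whenever the byte string it receives is shorter than the format requires; for example, struct.unpack(">I", b"\x00\x00") fails with "unpack requires a buffer of 4 bytes". That usually means a parser has read past the end of the available data, which is consistent with a wrong num_times. The sketch below is a hypothetical helper, not rainbow code. It shows a guarded read that reports how many bytes were actually available. The big-endian 32-bit layout is an assumption chosen for illustration.

```python
import io
import struct


def read_be_uint32s(stream, count):
    """Read `count` big-endian unsigned 32-bit integers from a binary stream.

    Raises EOFError with a descriptive message instead of letting
    struct.unpack fail with "unpack requires a buffer of N bytes".
    """
    needed = 4 * count
    buf = stream.read(needed)
    if len(buf) < needed:
        raise EOFError(
            f"expected {needed} bytes for {count} values, got {len(buf)}"
        )
    return list(struct.unpack(f">{count}I", buf))


if __name__ == "__main__":
    data = io.BytesIO(struct.pack(">3I", 7, 8, 9))
    print(read_be_uint32s(data, 2))  # first two values
    try:
        read_be_uint32s(data, 2)  # only one value (4 bytes) is left
    except EOFError as exc:
        print("short read:", exc)
```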

lzf has no attribute decompress

Issue

I am using a Mac with an M2 chip.

I am trying to use the function rainbow.agilent.masshunter.parse_msdata(), but I encounter the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[110], line 1
----> 1 profile = rainbow.agilent.masshunter.parse_msdata("Data/Agilent/MS1/QC_neg_5.d/AcqData")

AttributeError: module 'lzf' has no attribute 'decompress'

I successfully installed the lzf library (I think), but when I try to call lzf.decompress(), the function does not show up (image below).

Screenshot 2024-03-16 at 1:16:25 AM

Any recommendations for troubleshooting this issue?
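An AttributeError like this often means that the module Python actually imported under the name lzf is not the one that provides decompress. Possible causes include a different package installed under the same import name or a local lzf.py on the path. The sketch below is a small standard-library diagnostic. It does not fix anything, but it prints which file was imported so the cases can be told apart. check_module_attr is a hypothetical name.

```python
import importlib


def check_module_attr(module_name, attr):
    """Report where `module_name` is imported from and whether it has `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"cannot import {module_name}: {exc}"
    # Built-in modules have no __file__, hence the getattr default.
    location = getattr(module, "__file__", None)
    if hasattr(module, attr):
        return f"ok: {module_name}.{attr} found in {location}"
    return (
        f"{module_name} from {location} has no attribute {attr!r}; "
        "a different package or a local file with the same name may be shadowing it"
    )


if __name__ == "__main__":
    print(check_module_attr("lzf", "decompress"))
```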

More naive questions pertaining to MS results

Hi, sorry, but I have just a few more naive questions about the results.

I have tried to run this:

import rainbow as rb
datadir = rb.read("tests/inputs/green.D")
datafile = datadir.get_file("MSD1.MS")

For datafile.xlabels, there is an array of values from 0.04725 to 147.69308333333333.
For datafile.ylabels, there is 1 value in the array, which is 204.
And for datafile.data, there is an array of values from 146 to 70968.

If xlabels are m/z values, then a mass of 0.04725 doesn't correspond to any fragment.
How, then, should I understand the values of xlabels, ylabels and data?

Thank you!

Originally posted by @yipy0005 in #20 (comment)
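These numbers can be read under the same convention as the README's UV example, which is an assumption for MS files: xlabels are retention times, ylabels are m/z values, and data holds intensities with one row per time and one column per m/z. On that reading, the range 0.04725 to 147.69 would be a time axis rather than masses, and a single ylabels entry of 204 would mean only one m/z was recorded. The hypothetical helper below makes that reading concrete by building a total ion chromatogram. With one m/z column, it reduces to that ion's trace.

```python
def total_ion_chromatogram(xlabels, data):
    """Pair each retention time with the summed intensity of its scan.

    Assumes data has one row per entry of xlabels and one column per m/z.
    """
    return [(t, sum(row)) for t, row in zip(xlabels, data)]


if __name__ == "__main__":
    # With a single m/z column (as in the question, where ylabels == [204]),
    # the TIC is just that ion's trace. Values here are toy numbers.
    print(total_ion_chromatogram([0.05, 0.10], [[146], [70968]]))
```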

parse compound names and MS2 from waters

Hello,

Thank you very much for this work! It's the cleanest way to parse Waters data. In a Waters .raw directory, there is a file called _FUNC001.CMP that contains the compound names. I think it should be easy to parse this as an attribute and also save it as CSV while looping. There must also be a file in .raw that contains the MS2 reads. The reads themselves may not be important, but the mapping between parent and child masses is of interest.

Thanks again!
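The "save as CSV while looping" part can already be approximated for the arrays rainbow exposes. The sketch below is a hypothetical helper built on Python's csv module. It writes any xlabels/ylabels/data triple as a matrix with xlabels down the side and ylabels across the top. Parsing the compound names in _FUNC001.CMP is not attempted, since its layout is not described in this thread.

```python
import csv
import io


def write_matrix_csv(handle, xlabels, ylabels, data, corner="RT"):
    """Write a 2D data matrix as CSV: header row of ylabels, one row per xlabel.

    Hypothetical helper; `corner` is the label for the top-left cell.
    """
    writer = csv.writer(handle)
    writer.writerow([corner, *ylabels])
    for x, row in zip(xlabels, data):
        writer.writerow([x, *row])


if __name__ == "__main__":
    buf = io.StringIO()
    write_matrix_csv(buf, [0.5, 1.0], [210, 254], [[1, 2], [3, 4]])
    print(buf.getvalue())
```

When looping over files, each DataFile's arrays could be written to its own file, opened with open(path, "w", newline="") as the csv module recommends.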

Extension for Asterix ChemStation

I tried to use your package to read .ch files from Asterix ChemStation. However, there seems to be an issue with the scaling of the values in the data body. Is there a way to adjust rainbow to support data from Asterix ChemStation? How could one determine the scaling factor needed to read the data body correctly?

Data File from MassHunter ICP-MS

Evan and Eugene,

I was wondering if you all had ever experimented with opening data from an Agilent 7900 or 8900 inductively coupled plasma mass spectrometer. I am trying to access the time-series data from both the pulse and analog detectors for modeling detector linearity.

Thanks for all of your hard work on rainbow-api. What a feat!

For the data files coming off our 8900, we do not have a file called 'MSMassCal.bin'

Instead we have:
MSScan_XSpecific.xsd - much like MSScan.xsd, this file likely contains a description of the complex types in the .bin file of the same name.

MSScan_XSpecific.bin

MSTS_XSpecific.xml
This file contains integer masses, element names, and accumulation times for each isotope.

I think I can modify your code to open these files, but I was wondering if you wanted first crack at it. I am happy to provide data files.

Thanks for your work.
