fisher_py's Introduction

fisher_py

This Python module allows access to Thermo Orbitrap raw mass spectrometer files. Using this library makes it possible to automate the analysis of mass spectra without first exporting the data with another tool. The module is a wrapper built upon the RawFileReader project, a library developed for C#. Additional structures have been implemented to make processing data more convenient for Python users.

Installation

fisher_py can be installed from the Python Package Index (PyPI):

pip install fisher_py

System Requirements

fisher_py should work on any modern desktop operating system (Linux, Windows, Mac OS) with Python 3.6 (or higher) installed.

  • Windows: Tested on Windows 10 Pro x64
  • Linux: Tested on Ubuntu 20.04 LTS x64
  • Mac OS: Not tested

The module relies on the RawFileReader DLLs (dynamic-link libraries), which are loaded at runtime using pythonnet. Since Microsoft introduced .NET Standard, DLLs compiled against this framework can also be loaded on non-Windows systems (such as Mac OS and Linux). However, systems other than Windows may require additional setup steps for fisher_py to work. If you have trouble installing fisher_py, it is probably because pythonnet fails to compile. The usual way to resolve this is to install Mono (https://www.mono-project.com/). There are several guides online on how to do this, but one that was tested can be found here.
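
Before installing, it can help to see which of the two prerequisites named above are already present. The following is a minimal sketch, not part of fisher_py: the helper name check_runtime_prerequisites is made up for illustration, and it assumes that pythonnet exposes its usual top-level module clr and that Mono provides an executable called mono on the PATH.

```python
import importlib.util
import shutil


def check_runtime_prerequisites():
    """Report whether pythonnet and Mono can be located on this system."""
    return {
        # Assumption: pythonnet installs a top-level module named "clr".
        "pythonnet_importable": importlib.util.find_spec("clr") is not None,
        # Assumption: the Mono runtime provides an executable named "mono".
        "mono_on_path": shutil.which("mono") is not None,
    }


for name, found in check_runtime_prerequisites().items():
    print(f"{name}: {'found' if found else 'not found'}")
```

On Windows a missing mono executable is expected and is not necessarily a problem.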

Examples

The following example demonstrates how to extract and plot data from a raw-file:

import matplotlib.pyplot as plt
from fisher_py import RawFile
from fisher_py.data.business import TraceType
raw_file = RawFile('my_file.raw')

target_mass = 848.36862
mass_tolerance_ppm = 10
rt, i = raw_file.get_chromatogram(target_mass, mass_tolerance_ppm, TraceType.MassRange)
mz, i2, charges, real_rt = raw_file.get_scan_ms1(1)
print(real_rt)

plt.figure()
plt.plot(rt, i)

plt.figure()
plt.plot(mz, i2)

plt.show()

This example may be sufficient for some use cases, but the RawFile class provides only limited access to the module's functionality and mainly serves as an example of how to use the module within a project. For an example that uses more of the module's capabilities, have a look at raw_file_reader_example.py.
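
The mass_tolerance_ppm argument in the plotting example is a relative tolerance. As a worked illustration, the sketch below converts it into an absolute m/z window. It assumes the tolerance is applied symmetrically around the target mass, which is a common convention but is not confirmed here for fisher_py; ppm_window is a hypothetical helper, not a library function.

```python
def ppm_window(target_mass, tolerance_ppm):
    """Return (lower, upper) m/z bounds for a symmetric ppm tolerance."""
    # 1 ppm corresponds to one millionth of the target mass.
    delta = target_mass * tolerance_ppm * 1e-6
    return target_mass - delta, target_mass + delta


lower, upper = ppm_window(848.36862, 10)
print(f"{lower:.5f} to {upper:.5f}")
```

For the values used in the example, 10 ppm of 848.36862 is a half-width of roughly 0.0085 m/z.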

License and copyright

fisher_py (Copyright 2021 ethz-institute-of-microbiology) is licensed under the MIT license.

Third-party licenses and copyright

RawFileReader reading tool. Copyright © 2016 by Thermo Fisher Scientific, Inc. All rights reserved. See RawFileReaderLicense.md for licensing information. Note: anyone receiving RawFileReader as part of a larger software distribution (in the current context, as part of fisher_py) is considered an "end user" under section 3.3 of the RawFileReader License, and is not granted rights to redistribute RawFileReader.

fisher_py's People

Contributors

dowerner, mdussere, dpsmca, schatzsc

fisher_py's Issues

Cannot run multiprocessed on linux

When I run a script with multiprocessing on Linux, it crashes with the following native crash report:

  • Assertion at mono-threads.c:499, condition `result' not met
    =================================================================
    Native Crash Reporting
    =================================================================
    Got a SIGABRT while executing native code. This usually indicates
    a fatal error in the mono runtime or one of the native libraries
    used by your application.
    =================================================================
    =================================================================
    Native stacktrace:
    =================================================================
    0x7f3f1127665d - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f112769e9 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f11207bc2 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f11275ba2 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f11f7e3c0 - /lib/x86_64-linux-gnu/libpthread.so.0 :
    0x7f3f11c5703b - /lib/x86_64-linux-gnu/libc.so.6 : gsignal
    0x7f3f11c36859 - /lib/x86_64-linux-gnu/libc.so.6 : abort
    0x7f3f1116fc2f - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f1144b376 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f11465aaf - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f1146613d - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 : monoeg_assertion_message
    0x7f3f1146617b - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f1145ada9 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 : mono_thread_info_attach
    0x7f3f113ab1e3 - /home/tochr/miniconda3/lib/python3.8/site-packages/../../libmonosgen-2.0.so.1 :
    0x7f3f11f72609 - /lib/x86_64-linux-gnu/libpthread.so.0 :
    0x7f3f11d33163 - /lib/x86_64-linux-gnu/libc.so.6 : clone
    =================================================================
    Telemetry Dumper:
    =================================================================

This only happens on Linux, and it happens on multiple independent Linux systems.
I installed fisher-py with 'conda install -c conda-forge pythonnet' and 'pip install fisher-py'. To reproduce the error:

from multiprocessing import Pool
from time import sleep

from fisher_py import RawFile


def read_file(file):
    # Open the raw file, then keep the worker process alive for a while
    RawFile(file)
    sleep(10)


files = ['/path/to/file1.raw', '/path/to/file2.raw']
with Pool(2) as ps:
    list(ps.map(read_file, files))
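
One thing that may be worth trying, offered as an untested sketch and not as a confirmed fix: the reproduction imports fisher_py in the parent process and then uses the default start method, which on Linux is normally fork, so the workers inherit an already initialised runtime. The variant below requests the "spawn" start method and imports fisher_py inside the worker. The helper names make_spawn_context and read_all are hypothetical.

```python
import multiprocessing as mp


def read_file(path):
    # Imported inside the worker so the .NET runtime is loaded by the child
    # process itself instead of being inherited from the parent via fork().
    from fisher_py import RawFile

    RawFile(path)
    return path


def make_spawn_context():
    # "spawn" starts every worker as a fresh Python interpreter.
    return mp.get_context("spawn")


def read_all(paths, processes=2):
    # Map read_file over the paths using spawned worker processes.
    with make_spawn_context().Pool(processes) as pool:
        return pool.map(read_file, paths)
```

With the spawn start method, the call to read_all(['/path/to/file1.raw', '/path/to/file2.raw']) has to sit under an `if __name__ == "__main__":` guard, because each worker re-imports the main module.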

[Help wanted] get_retention_time_from_scan_number throws TypeError

I tried to get the retention time for a scan number. I called the get_retention_time_from_scan_number function as shown below, but it throws an exception: TypeError: unsupported operand type(s) for -: 'list' and 'int'. How can I fix this? Any help would be appreciated.

import numpy as np

ms2_scan_numbers = np.array(raw_file_source._ms2_filter_scan_numbers)
for scan_number in ms2_scan_numbers:
    ms2_scan_number = int(scan_number)
    ms2_rt = raw_file_source.get_retention_time_from_scan_number(ms2_scan_number)

Missing fields like "UserLabel" and "Sequence Row Level Name"

Hello,
I am using this nice Python API to access raw file metadata.
Although I can easily obtain sample_information.user_text,
I cannot extract "UserLabel" or "Sequence Row Level Name".
Do I need to add these .NET <=> Python wrappers myself?

Still a troublesome Linux issue: raw_file.instrument_methods_count

Hello,
Under CentOS 7 Linux, we are still being plagued by this error.
Can you please test version 18 on a Mac or Linux box?

`ERROR: test_get_instrument_methods_count_cross_platform (MprcExtractRaw2_per_spectrum_tests.TestMprcExtractRaw2)
This should be tested on Windows 10, 11, MacOS, and CentosOS 7,8

Traceback (most recent call last):
File "/odin/dev/scripts/deployments/MprcExtractRaw/MprcExtractRaw2/src/test/MprcExtractRaw2_per_spectrum_tests.py", line 233, in test_get_instrument_methods_count_cross_platform
instrument_methods_count = raw_file.instrument_methods_count
File "/apps/software/biotools/python/3.9.9/lib/python3.9/site-packages/fisher_py/raw_file_reader/raw_file_access.py", line 119, in instrument_methods_count
return self.get_wrapped_object().InstrumentMethodsCount
System.NullReferenceException: Object reference not set to an instance of an object
at ThermoFisher.CommonCore.RawFileReader.DeviceStorage.GetStorageNames (ThermoFisher.CommonCore.RawFileReader.IOleStorage storage, ThermoFisher.CommonCore.RawFileReader.StgType storageType) [0x00000] in <614d17bf8c0447909a4884c93b4318b5>:0
at ThermoFisher.CommonCore.RawFileReader.DeviceStorage.EnumSubStgsNoRecursion (System.Collections.Generic.List1[T] storageDescriptions) [0x00000] in <614d17bf8c0447909a4884c93b4318b5>:0
at ThermoFisher.CommonCore.RawFileReader.StructWrappers.Method.GetMethodData (ThermoFisher.CommonCore.RawFileReader.Facade.Interfaces.IMemMapReader viewer, System.Int64 startPos, System.Collections.Generic.List1[T] storageDesc) [0x00027] in <614d17bf8c0447909a4884c93b4318b5>:0
at ThermoFisher.CommonCore.RawFileReader.StructWrappers.Method+<>c__DisplayClass25_0.b__1 () [0x00000] in <614d17bf8c0447909a4884c93b4318b5>:0
at System.Lazy1[T].ViaFactory (System.Threading.LazyThreadSafetyMode mode) [0x00043] in <ed178240b0494cd89a00d276a4c37080>:0
at System.Lazy1[T].ExecutionAndPublication (System.LazyHelper executionAndPublication, System.Boolean useDefaultConstructor) [0x00022] in :0
at System.Lazy1[T].CreateValue () [0x00074] in <ed178240b0494cd89a00d276a4c37080>:0
at System.Lazy1[T].get_Value () [0x0000a] in :0
at ThermoFisher.CommonCore.RawFileReader.StructWrappers.Method.get_StorageDescriptions () [0x00008] in <614d17bf8c0447909a4884c93b4318b5>:0
at ThermoFisher.CommonCore.RawFileReader.RawFileAccessBase.get_InstrumentMethodsCount () [0x0003b] in <614d17bf8c0447909a4884c93b4318b5>:0
at (wrapper managed-to-native) System.Reflection.RuntimeMethodInfo.InternalInvoke(System.Reflection.RuntimeMethodInfo,object,object[],System.Exception&)
at System.Reflection.RuntimeMethodInfo.Invoke (System.Object obj, System.Reflection.BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x0006a] in :0`

Raw file multi-threaded access from Python?

Hello,
Given the C# multi-threading interfaces described below, is multi-threaded access available via fisher_py?

From "UsingRawFileReader.pdf":
Multi-threaded code
As noted in the section above, the IRawDataPlus interface is an instance object, and should be used by only one thread.
However, the raw file reader can generate any number of such objects, so that any number of threads can access the
same raw file in parallel.
For multi-threaded code, the pattern is:
A manager object is created (from a file name), which cannot itself read any raw data.
This manager is then used to create IRawDataPlus interfaces for each thread, as detailed below:
The C# raw file reader includes an implementation of IRawFileThreadManager
The factory method RawFileReaderFactory.CreateThreadManager can be used to open a raw file, for use
by multiple threads.
For example:
var myThreadManager = RawFileReaderFactory.CreateThreadManager (path);
This can be used for multi-thread access to the same raw file.
All business logic still accesses the information using the IRawDataPlus interface. Application teams do not have to write
any thread synchronization or locking code.
The usage pattern is as follows.
Open a file returning a thread data manager:
Try
{
myThreadManager = RawFileReaderFactory.CreateThreadManager (filename)
}
Catch exceptions
Exceptions may occur if the required raw file reader dll is not found, or a null string is sent for the raw file name.
The action of opening a file executes a small number of “one time” single threaded actions, such as:
• Opening the file on disk.
• Reading the file header (time stamps, operator name etc.)
• Reading sample information.
• Obtaining the list of detectors.
Important: This “thread manager” cannot itself read any raw data.
Assuming no exceptions, for each thread which needs access to data (including the current thread, if needed):
IRawDataPlus myThreadDataReader = myThreadManager.CreateThreadAccessor();
To test for errors, create a thread accessor.
This property:
/// <summary>
/// Gets the file error state.
/// </summary>
IFileError FileError { get; }
can then be used to check for any errors (such as an invalid file name).
Performance Note: Within the C# reader, there is no significant performance overhead in
“CreateThreadAccessor”. In evaluation of this, we have tested that it can be used within a “Parallel.For” pattern,
to make an accessor for each scan in a raw file.
Note: The method “CreateThreadAccessor();” is actually a member of the interface
“IRawFileThreadAccessor” which has other implementations noted later.
After all created threads have exited, call:
myThreadManager.Dispose();
Using this pattern all business logic for multi-threaded code is exactly the same as for single threaded code.
All interface members of IRawDataPlus can be used, with no concerns for locking or thread safety. Do not add any
additional locks in calling code.
Data for each thread is read in parallel (lockless), wherever possible. Some larger objects in the raw file use (thread safe)
lazy loading, such that, for example: The first thread which opens the MS data will incur a small overhead for opening
the MS data stream, then all other threads will share parallel access to that same stream.
Note: Locking may occur when a file is opened in real time mode (during data acquisition), as a real time data is
continually changing state. This locking is internal to the file reader, and calling code need not take any special action for
real time files.

Alternate approaches to threading
Even though the C# reader natively supports parallel access to data, not all file readers may support this, but “There’s an
interface for that”.
The interface “IRawFileThreadManager” is defined as follows:
public interface IRawFileThreadManager : IRawFileThreadAccessor, IDisposable
That is: When you open a raw file, and obtain this interface, as described above, you “dispose” to close the file.
The mechanism for allocating data to threads is described in “IRawFileThreadAccessor”, so if the application layer
code opens a file, it may pass “IRawFileThreadAccessor” to the next layer of code.
As above, that code can call “CreateThreadAccessor”:
MyMethod(IRawFileThreadAccessor myThreadDataReader)
IRawDataPlus myThreadDataReader = myThreadDataReader.CreateThreadAccessor();
An advantage of this scheme is that an application can be designed to use other readers which do not support the
thread manager interface, by using another implementation of “IRawFileThreadAccessor”.
CommonCore already includes an implementation of this:
public class ThreadSafeRawFileAccess : IRawCache, IRawFileThreadAccessor
This class permits multi thread support (by IRawFileThreadAccessor) to be generated from any instance of
IRawDataPlus
Using this public constructor:
public ThreadSafeRawFileAccess(IRawDataPlus file).
Via this pattern, library code need not be aware of how the thread management is done. If the library code
(business logic) receives “IRawFileThreadAccessor” it will be able to support multi-threaded access to any raw
file, from any file reader.
The difference is one of performance.
When using the C# raw file reader’s direct implementation (ThreadedFileFactory) the business logic will have
lockless parallel access to data. When the interface passed in is created by the class
“ThreadSafeRawFileAccess” then calls into the file are serialized via locks.
Note: In all cases
• The business logic need not add any additional locking.
• The business logic need not reference any specific file reading DLL.
• All required interfaces are contained in ThermoFisher.CommonCore.Data.dll
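
The lock-based variant described in the excerpt (calls into the file serialized via locks, as with ThreadSafeRawFileAccess) can be illustrated in plain Python. This is only a conceptual sketch: it does not use fisher_py or RawFileReader, and both FakeReader and LockedAccessor are hypothetical names invented for the illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeReader:
    """Hypothetical stand-in for a reader that is not safe to share."""

    def __init__(self):
        self.calls = 0

    def get_scan(self, scan_number):
        self.calls += 1
        return scan_number * 2


class LockedAccessor:
    """Serialise every method call on the wrapped reader through one lock."""

    def __init__(self, reader):
        self._reader = reader
        self._lock = threading.Lock()

    def __getattr__(self, name):
        # Only reached for names not defined on LockedAccessor itself.
        target = getattr(self._reader, name)
        if not callable(target):
            return target

        def locked_call(*args, **kwargs):
            with self._lock:
                return target(*args, **kwargs)

        return locked_call


reader = FakeReader()
accessor = LockedAccessor(reader)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(accessor.get_scan, range(100)))
print(results[:3], reader.calls)
```

As the excerpt notes, the trade-off of this approach is performance: calling code stays free of locking, but threads wait on each other, unlike the lockless access provided by the thread manager.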

Error accessing ScanEvent properties on macOS/Linux

Description

I encountered a couple of errors when accessing MS/MS RAW files from our proteomics lab using fisher-py 1.0.17 on our CentOS 7.5 cluster, and also on my MacBook. Both of the errors were part of scan_event.py.

  1. When the ms_order property was accessed, we got an error about a missing attribute. The relevant portion of the stack trace is:

      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/raw_file.py", line 71, in __init__
        scan_numbers, rt = self._get_ms_scan_numbers_and_retention_times_(MsOrderType.Ms)
      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/raw_file.py", line 126, in _get_ms_scan_numbers_and_retention_times_
        if scan_event.ms_order != ms_order:
      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/data/scan_event.py", line 296, in ms_order
        return MsOrderType(self._get_wrapped_object_().MsOrder)
    AttributeError: 'IScanEvent' object has no attribute 'MsOrder'
    
  2. When get_reaction() was called, we got a similar error about the Reactions attribute. Stack trace:

      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/raw_file.py", line 81, in __init__
        scan_numbers, filter_masses = self._get_ms2_scan_numbers_and_masses_()
      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/raw_file.py", line 109, in _get_ms2_scan_numbers_and_masses_
        precursor_mass = self._get_scan_filter_precursor_mass_(scan_number)
      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/raw_file.py", line 138, in _get_scan_filter_precursor_mass_
        return scan_event.get_reaction(0).precursor_mass
      File "/Users/dpsmca/MprcExtractRaw/venv/lib/python3.9/site-packages/fisher_py/data/scan_event.py", line 599, in get_reaction
        return Reaction(self._get_wrapped_object_().Reactions[index])
    AttributeError: 'IScanEvent' object has no attribute 'Reactions'
    

Issues

  • (1) is just a typo, I think -- looking at ThermoFisher.CommonCore.Data.xml, it appears that this should be MSOrder, with a capital S.

  • (2) seems to work if you replace the direct access to Reactions[index] with a call to GetReaction(index).

I've created a pull request for this: #13
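
For anyone who needs a local workaround in the meantime, the two observations above can be sketched as defensive lookups. This is an illustration only, not the code from the pull request: first_attr, get_reaction and FakeScanEvent are hypothetical names, and FakeScanEvent merely mimics the attribute names reported for the wrapped IScanEvent object.

```python
def first_attr(obj, *names):
    """Return the first attribute in names that exists on obj."""
    for name in names:
        if hasattr(obj, name):
            return getattr(obj, name)
    raise AttributeError(f"{type(obj).__name__} has none of: {', '.join(names)}")


def get_reaction(scan_event, index):
    """Prefer the GetReaction(index) method, fall back to Reactions[index]."""
    if hasattr(scan_event, "GetReaction"):
        return scan_event.GetReaction(index)
    return scan_event.Reactions[index]


class FakeScanEvent:
    """Hypothetical stand-in for the wrapped .NET IScanEvent object."""

    MSOrder = 2

    def GetReaction(self, index):
        return f"reaction-{index}"


event = FakeScanEvent()
print(first_attr(event, "MSOrder", "MsOrder"), get_reaction(event, 0))
```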

Thanks!

How to access full profile spectrum?

Sorry, more a question than an issue - I can access the mass spectra in two ways:

average_scan = raw_file.average_scans_in_scan_range(first_scan_number, last_scan_number, scan_filter, options)
spectrum = average_scan.segmented_scan
for i in range(len(spectrum.positions)):
    print(f'  {i} - {spectrum.positions[i]}, {spectrum.intensities[i]}')
for i in range(average_scan.centroid_scan.length):
    print(f'  {average_scan.centroid_scan.masses[i]} {average_scan.centroid_scan.intensities[i]}')

There is also a function documented as "Converts the segmented scan to centroid scan. Used to centroid profile data."

But how do I access the raw profile data?

The segmented_scan seems to set values below a certain threshold to zero, or rather does not show m/z ranges outside the identified peaks, but I want the full intensity vs. m/z data pairs for the whole scan range. Or is this more of a nomenclature problem, and the "segmented scan" is actually the "profile"?

Disable native thermo peak picking

Is there a way to disable the native Thermo peak picking using fisher-py? I know very little about how this peak picking is implemented natively in the raw files, but ThermoRawFileParser (https://github.com/compomics/ThermoRawFileParser) offers an option -p to disable it, which returns, on average, 10+ times more peaks per spectrum.

I would love a way to achieve this using fisher-py, but looking through the RawFile class and the documentation I can find no such option. Is this manageable, or not feasible at all with the current fisher-py?

Problem with select_instrument()

I have problems with the raw_file.select_instrument(Device.MS, 1) line using Python 3.10 on Windows 10.

The file loads properly, and properties such as raw_file.instrument_count and other sample-related functionality such as raw_file.sample_information.vial and others from https://github.com/ethz-institute-of-microbiology/fisher_py/blob/main/examples/raw_file_reader_example.py work well, but not those that directly access the instrument, like raw_file.get_instrument_data().name

Instead of the intended functionality, there is an AssertionError for line 893 in https://github.com/ethz-institute-of-microbiology/fisher_py/blob/main/fisher_py/raw_file_reader/raw_file_access.py

The file I try to open is attached blank219.raw

blank219.zip

I just cannot figure out if the problem is with my installation or there is something else wrong?!?!

Issue reading the file on Mac OS

Hello,
I have a problem reading a .raw file. Am I supposed to set a Device type somehow?
`NullReferenceException Traceback (most recent call last)
Cell In[10], line 3
1 from fisher_py import RawFile
2 from fisher_py.data.business import TraceType
----> 3 raw_file = RawFile(file_path)

File /opt/homebrew/lib/python3.11/site-packages/fisher_py/raw_file.py:61, in RawFile.__init__(self, path)
59 self._path = path
60 self._raw_file_access = RawFileReaderAdapter.file_factory(path)
---> 61 self._raw_file_access.select_instrument(Device.MS, 1)
63 # fetch retention times and scan numbers
64 scan_numbers, rt = self.get_scan_numbers_and_retention_times()

File /opt/homebrew/lib/python3.11/site-packages/fisher_py/raw_file_reader/raw_file_access.py:915, in RawFileAccess.select_instrument(self, instrument_type, instrument_index)
913 assert type(instrument_type) is Device
914 assert type(instrument_index) is int
--> 915 self.get_wrapped_object().SelectInstrument(instrument_type.value, instrument_index)

NullReferenceException: Object reference not set to an instance of an object.
at ThermoFisher.CommonCore.RawFileReader.RawFileAccessBase.SelectInstrument(Device instrumentType, Int32 instrumentIndex) in C:\Thermofisher\Projects\1\s\Libraries\RawFileReader\RawFileAccessBase.cs:line 2368
at InvokeStub_IRawData.SelectInstrument(Object, Object, IntPtr*)
at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)`
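
A NullReferenceException raised from SelectInstrument says little about the underlying cause. One cheap check that is independent of fisher_py is to confirm that the path points to an existing, non-empty file before passing it to RawFile. The sketch below is only that sanity check; it does not diagnose the error above, and check_raw_path is a hypothetical helper.

```python
from pathlib import Path


def check_raw_path(path):
    """Return a resolved Path, or raise if the file is missing or empty."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    if resolved.stat().st_size == 0:
        raise ValueError(f"File is empty: {resolved}")
    return resolved
```

The result can then be passed on as RawFile(str(check_raw_path(file_path))).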
