
mannlabs / directlfq

Fast and accurate label-free quantification for small and very large numbers of proteomes

Home Page: https://doi.org/10.1101/2023.02.17.528962

License: Apache License 2.0

Jupyter Notebook 96.36% Python 3.48% Shell 0.06% HTML 0.01% Inno Setup 0.02% R 0.02% CSS 0.04%
algorithms bioinformatics proteomics quantification alphapept-ecosystem python

directlfq's People

Contributors

ammarcsj, georgwa, sander-willems-bruker

directlfq's Issues

Execution time

What execution time should I expect for a dataset of, for example, 1333 proteins across 3 replicates?

Python version > 3.8

Hi, do you plan to support Python versions > 3.8 at some point?

Query/feature request: batch correction

I'm working with the CLI and am hoping to build in a batch-correction step for large datasets. My initial thought is to use deactivate_normalization TRUE on peptide intensities that have already been normalized and batch-corrected outside of directLFQ. Is there a better approach, and if not, is there a recommended batch-correction method?
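
To make the external pre-processing step in this question concrete, here is a minimal sketch of per-batch median centering on log2 intensities. It is not a directLFQ feature and not a recommendation from the maintainers; the function name `median_center_batches`, the sample names, and the sample-to-batch mapping are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd


def median_center_batches(log_df, batch_of_sample):
    """Shift each batch so that all batches share the global median log intensity.

    log_df: rows are ions, columns are samples, values are log2 intensities.
    batch_of_sample: dict mapping each sample column to a batch label (hypothetical).
    """
    # Collect the sample columns that belong to each batch.
    samples_by_batch = {}
    for sample, batch in batch_of_sample.items():
        samples_by_batch.setdefault(batch, []).append(sample)

    corrected = log_df.copy()
    global_median = np.nanmedian(log_df.to_numpy())
    for samples in samples_by_batch.values():
        batch_median = np.nanmedian(log_df[samples].to_numpy())
        # Remove the batch offset, then restore the overall intensity level.
        corrected[samples] = log_df[samples] - batch_median + global_median
    return corrected


log_df = pd.DataFrame(
    {"s1": [10.0, 12.0], "s2": [11.0, 13.0], "s3": [14.0, 16.0], "s4": [15.0, np.nan]},
    index=["ion_a", "ion_b"],
)
batch_of_sample = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
corrected = median_center_batches(log_df, batch_of_sample)
```

Median centering only removes a constant offset per batch. Whether that is sufficient depends on the dataset, and more elaborate methods may be needed for strong batch effects.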

Many thanks

Issue warning if quant_id is not unique key

Describe the bug
If we pass a dataframe with duplicates in the quant_id column to lfqnorm.NormalizationManagerSamplesOnSelectedProteins() it results in a rather strange numba error.

A more informative error message or a check on the column would be useful.
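
A check of the kind requested here could look like the following sketch. The helper `check_unique_quant_id` is hypothetical and not part of directLFQ; the column name `ion` is assumed as the quant_id column.

```python
import pandas as pd


def check_unique_quant_id(df, quant_col="ion"):
    """Raise a readable error if quant_col contains duplicate identifiers."""
    duplicated = df[quant_col][df[quant_col].duplicated(keep=False)]
    if not duplicated.empty:
        examples = sorted(duplicated.astype(str).unique())[:5]
        raise ValueError(
            f"Column '{quant_col}' must be unique, but {duplicated.nunique()} "
            f"identifier(s) occur more than once, e.g. {examples}"
        )


# Small example table in which the identifier "i1" appears twice.
df = pd.DataFrame({"protein": ["P1", "P1", "P2"], "ion": ["i1", "i1", "i2"]})
try:
    check_unique_quant_id(df)
    message = ""
except ValueError as err:
    message = str(err)
```

Running such a check before normalization would surface the problem as a plain ValueError instead of a Numba typing error.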

Logs

../alphadia/outputtransform.py:705: in build_lfq_tables
    lfq_df = qb.lfq(
../alphadia/outputtransform.py:284: in lfq
    protein_df, _ = lfqprot_estimation.estimate_protein_intensities(
/usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:44: in estimate_protein_intensities
    list_of_tuple_w_protein_profiles_and_shifted_peptides = get_list_of_tuple_w_protein_profiles_and_shifted_peptides(normed_df, num_samples_quadratic, min_nonan, num_cores)
/usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:60: in get_list_of_tuple_w_protein_profiles_and_shifted_peptides
    list_of_tuple_w_protein_profiles_and_shifted_peptides = get_list_with_multiprocessing(input_specification_tuplelist_idx__df__num_samples_quadratic__min_nonan, num_cores)
/usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:107: in get_list_with_multiprocessing
    list_of_tuple_w_protein_profiles_and_shifted_peptides = pool.starmap(calculate_peptide_and_protein_intensities, input_specification_tuplelist_idx__df__num_samples_quadratic__min_nonan)
/usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/multiprocess/pool.py:372: in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <multiprocess.pool.MapResult object at 0x7fe2f7857a90>, timeout = None

    def get(self, timeout=None):
        self.wait(timeout)
        if not self.ready():
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
E           No implementation of function Function(<built-in function iadd>) found for signature:
E            
E            >>> iadd(Literal[int](0), array(bool, 1d, A))
E            
E           There are 18 candidate implementations:
E             - Of which 16 did not match due to:
E             Overload of function 'iadd': File: <numerous>: Line N/A.
E               With argument(s): '(int64, array(bool, 1d, A))':
E              No match.
E             - Of which 2 did not match due to:
E             Operator Overload in function 'iadd': File: unknown: Line unknown.
E               With argument(s): '(int64, array(bool, 1d, A))':
E              No match for registered cases:
E               * (int64, int64) -> int64
E               * (int64, uint64) -> int64
E               * (uint64, int64) -> int64
E               * (uint64, uint64) -> uint64
E               * (float32, float32) -> float32
E               * (float64, float64) -> float64
E               * (complex64, complex64) -> complex64
E               * (complex128, complex128) -> complex128
E           
E           During: typing of intrinsic-call at /usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/directlfq/normalization.py (304)
E           
E           File "../../../../../../usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/directlfq/normalization.py", line 304:
E               def _get_num_nas_in_row(row):
E                   <source elided>
E                   for is_nan in isnans:
E                       sum+=is_nan
E                       ^

/usr/local/miniconda/envs/alphadia/lib/python3.9/site-packages/multiprocess/pool.py:771: TypingError
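
One plausible reading of this traceback (not a confirmed diagnosis) is that `_get_num_nas_in_row` received a two-dimensional input, so each `is_nan` in the loop was a boolean array rather than a single boolean. The sketch below uses plain pandas and NumPy, not Numba or directLFQ code, to show how a duplicated index label turns a row lookup into a 2-D result, and how a NaN count can be written so that it does not depend on the dimensionality. The table and the helper `count_nans` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"run_0": [1.0, np.nan, 3.0], "run_1": [np.nan, np.nan, 6.0]},
    index=["ion_a", "ion_a", "ion_b"],  # "ion_a" is duplicated on purpose
)

unique_row = df.loc["ion_b"].to_numpy()       # one matching row: 1-D array
duplicated_rows = df.loc["ion_a"].to_numpy()  # two matching rows: 2-D array


def count_nans(values):
    """Count NaNs regardless of whether values is 1-D or 2-D."""
    return int(np.isnan(values).sum())
```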

Missing row intensities and entries

Describe the bug
Some rows are 0 for all replicates in the CustomDf.aq_reformat.ion_intensities output even though the original CustomDf.aq_reformat input dataframe contains valid values. Furthermore, some rows are missing when comparing the two .tsv files, which I did not expect.

To Reproduce
see #16 for input and output files.

Expected behavior

  1. Intensities processed by directLFQ should not be 0 for all replicates, and
  2. "CustomDf.aq_reformat.tsv.ion_intensities.tsv" should have the same number of rows as the input file "CustomDf.aq_reformat.tsv".

Version (please complete the following information):

  • 0.2.11

directLFQ 0.2.16 fails with IndexError: list index out of range

Describe the bug
The most recent version of directLFQ fails with IndexError: list index out of range during the alphaDIA testcase.

To Reproduce
Steps to reproduce the behavior:

  1. run the test case test_output_transform() in alphadia/tests/unit_tests/test_outputtransform.py

Expected behavior
A clear and concise description of what you expected to happen.

Logs

================================================================================================= test session starts =================================================================================================
platform darwin -- Python 3.9.18, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/georgwallmann/Documents/git/alphadia
collected 59 items / 5 deselected / 54 selected                                                                                                                                                                       

tests/unit_tests/test_calibration.py ....                                                                                                                                                                       [  7%]
tests/unit_tests/test_data.py ..                                                                                                                                                                                [ 11%]
tests/unit_tests/test_fdr.py .....                                                                                                                                                                              [ 20%]
tests/unit_tests/test_fragcomp.py ...                                                                                                                                                                           [ 25%]
tests/unit_tests/test_grouping.py .........                                                                                                                                                                     [ 42%]
tests/unit_tests/test_libtransform.py .                                                                                                                                                                         [ 44%]
tests/unit_tests/test_numba.py ....                                                                                                                                                                             [ 51%]
tests/unit_tests/test_outputtransform.py F                                                                                                                                                                      [ 53%]
tests/unit_tests/test_plexscoring.py .                                                                                                                                                                          [ 55%]
tests/unit_tests/test_plotting.py ..                                                                                                                                                                            [ 59%]
tests/unit_tests/test_quadrupole.py ...                                                                                                                                                                         [ 64%]
tests/unit_tests/test_reporting.py ......                                                                                                                                                                       [ 75%]
tests/unit_tests/test_utils.py ....                                                                                                                                                                             [ 83%]
tests/unit_tests/test_workflow.py .........                                                                                                                                                                     [100%]

====================================================================================================== FAILURES =======================================================================================================
________________________________________________________________________________________________ test_output_transform ________________________________________________________________________________________________

    def test_output_transform():
        run_columns = ["run_0", "run_1", "run_2"]
    
        config = {
            "general": {
                "thread_count": 8,
            },
            "fdr": {
                "fdr": 0.01,
                "inference_strategy": "heuristic",
                "group_level": "proteins",
                "keep_decoys": False,
            },
            "search_output": {
                "min_k_fragments": 3,
                "min_correlation": 0.25,
                "num_samples_quadratic": 50,
                "min_nonnan": 1,
                "normalize_lfq": True,
                "peptide_level_lfq": False,
                "precursor_level_lfq": False,
            },
        }
    
        temp_folder = os.path.join(tempfile.gettempdir(), "alphadia")
        os.makedirs(temp_folder, exist_ok=True)
    
        progress_folder = os.path.join(temp_folder, "progress")
        os.makedirs(progress_folder, exist_ok=True)
    
        # setup raw folders
        raw_folders = [os.path.join(progress_folder, run) for run in run_columns]
    
        psm_base_df = _mock_precursor_df(n_precursor=100)
        fragment_base_df = _mock_fragment_df(n_precursor=200)
    
        for raw_folder in raw_folders:
            os.makedirs(raw_folder, exist_ok=True)
    
            psm_df = psm_base_df.sample(50)
            psm_df["run"] = os.path.basename(raw_folder)
            frag_df = fragment_base_df[
                fragment_base_df["precursor_idx"].isin(psm_df["precursor_idx"])
            ]
    
            frag_df.to_csv(os.path.join(raw_folder, "frag.tsv"), sep="\t", index=False)
            psm_df.to_csv(os.path.join(raw_folder, "psm.tsv"), sep="\t", index=False)
    
        output = outputtransform.SearchPlanOutput(config, temp_folder)
        _ = output.build_precursor_table(raw_folders, save=True)
        _ = output.build_stat_df(raw_folders, save=True)
>       _ = output.build_lfq_tables(raw_folders, save=True)

tests/unit_tests/test_outputtransform.py:169: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
alphadia/outputtransform.py:645: in build_lfq_tables
    lfq_df = qb.lfq(
alphadia/outputtransform.py:276: in lfq
    protein_df, _ = lfqprot_estimation.estimate_protein_intensities(
../../../miniconda3/envs/alpha/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:37: in estimate_protein_intensities
    ion_df = get_ion_intensity_dataframe_from_list_of_shifted_peptides(list_of_tuple_w_protein_profiles_and_shifted_peptides, allprots)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

list_of_tuple_w_protein_profiles_and_shifted_peptides = [(array([9.24812545, 9.24812545,        nan]),                               0         1   2
pg    ion                ...6417  1.6417  1.6417
      695990860382217  1.6417  1.6417  1.6417
      695995155349513  1.6417  1.6417  1.6417), ...]
allprots = ['EPROT', 'VPROT', 'ZPROT', 'LPROT', 'FPROT', 'SPROT', ...]

    def get_ion_intensity_dataframe_from_list_of_shifted_peptides(list_of_tuple_w_protein_profiles_and_shifted_peptides, allprots):
        ion_names = []
        ion_vals = []
        protein_names = []
        column_names = list_of_tuple_w_protein_profiles_and_shifted_peptides[0][1].columns.tolist()
        for idx in range(len(list_of_tuple_w_protein_profiles_and_shifted_peptides)):
>           protein_name = allprots[idx]
E           IndexError: list index out of range

../../../miniconda3/envs/alpha/lib/python3.9/site-packages/directlfq/protein_intensity_estimation.py:206: IndexError
------------------------------------------------------------------------------------------------ Captured stdout call -------------------------------------------------------------------------------------------------
2024-01-24 12:08:13> Performing protein grouping and FDR
2024-01-24 12:08:13> Building output for run_0
2024-01-24 12:08:13> Building output for run_1
2024-01-24 12:08:13> Building output for run_2
2024-01-24 12:08:13> Building combined output
2024-01-24 12:08:13> Performing protein inference
2024-01-24 12:08:13> Inference strategy: heuristic. Using maximum parsimony with grouping for protein inference
2024-01-24 12:08:13> Performing protein FDR
2024-01-24 12:08:13> Test AUC: 1.000
2024-01-24 12:08:13> Train AUC: 1.000
2024-01-24 12:08:13> AUC difference: 0.00%
2024-01-24 12:08:13> ================ Protein FDR =================
2024-01-24 12:08:13> Unique protein groups in output
2024-01-24 12:08:13>   1% protein FDR: 24
2024-01-24 12:08:13> 
2024-01-24 12:08:13> Unique precursor in output
2024-01-24 12:08:13>   1% protein FDR: 42
2024-01-24 12:08:13> ================================================
2024-01-24 12:08:13> Writing precursor output to disk
2024-01-24 12:08:13> Building search statistics
2024-01-24 12:08:13> Reading precursors.tsv file
2024-01-24 12:08:13> Writing stat output to disk
2024-01-24 12:08:13> Performing label free quantification
2024-01-24 12:08:13> Reading precursors.tsv file
2024-01-24 12:08:13> Accumulating fragment data
2024-01-24 12:08:13> reading frag file for run_0
2024-01-24 12:08:13> reading frag file for run_1
2024-01-24 12:08:13> reading frag file for run_2
2024-01-24 12:08:13> Performing label free quantification on the pg level
2024-01-24 12:08:13> Filtering fragments by quality
2024-01-24 12:08:13> Performing label-free quantification using directLFQ
2024-01-24 12:08:13> to few values for normalization without missing values. Including missing values
2024-01-24 12:08:13> 24 lfq-groups total
2024-01-24 12:08:13> using 8 processes
2024-01-24 12:08:13> lfq-object 0
-------------------------------------------------------------------------------------------------- Captured log call --------------------------------------------------------------------------------------------------
PROGRESS root:outputtransform.py:419 Performing protein grouping and FDR
INFO     root:outputtransform.py:427 Building output for run_0
INFO     root:outputtransform.py:427 Building output for run_1
INFO     root:outputtransform.py:427 Building output for run_2
INFO     root:outputtransform.py:446 Building combined output
INFO     root:outputtransform.py:456 Performing protein inference
INFO     root:outputtransform.py:488 Inference strategy: heuristic. Using maximum parsimony with grouping for protein inference
INFO     root:outputtransform.py:501 Performing protein FDR
INFO     root:fdr.py:355 Test AUC: 1.000
INFO     root:fdr.py:356 Train AUC: 1.000
INFO     root:fdr.py:359 AUC difference: 0.00%
PROGRESS root:outputtransform.py:508 ================ Protein FDR =================
PROGRESS root:outputtransform.py:511 Unique protein groups in output
PROGRESS root:outputtransform.py:512   1% protein FDR: 24
PROGRESS root:outputtransform.py:513 
PROGRESS root:outputtransform.py:514 Unique precursor in output
PROGRESS root:outputtransform.py:515   1% protein FDR: 42
PROGRESS root:outputtransform.py:516 ================================================
INFO     root:outputtransform.py:524 Writing precursor output to disk
PROGRESS root:outputtransform.py:560 Building search statistics
INFO     root:outputtransform.py:390 Reading precursors.tsv file
INFO     root:outputtransform.py:576 Writing stat output to disk
PROGRESS root:outputtransform.py:607 Performing label free quantification
INFO     root:outputtransform.py:390 Reading precursors.tsv file
INFO     root:outputtransform.py:123 Accumulating fragment data
INFO     root:outputtransform.py:58 reading frag file for run_0
INFO     root:outputtransform.py:58 reading frag file for run_1
INFO     root:outputtransform.py:58 reading frag file for run_2
PROGRESS root:outputtransform.py:633 Performing label free quantification on the pg level
INFO     root:outputtransform.py:208 Filtering fragments by quality
INFO     root:outputtransform.py:255 Performing label-free quantification using directLFQ
INFO     directlfq.normalization:normalization.py:239 to few values for normalization without missing values. Including missing values
INFO     directlfq.protein_intensity_estimation:protein_intensity_estimation.py:32 24 lfq-groups total
INFO     directlfq.protein_intensity_estimation:protein_intensity_estimation.py:107 using 8 processes
================================================================================================== warnings summary ===================================================================================================
tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:189: FutureWarning: The provided callable <built-in function min> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    index_df = frag_df.groupby("_candidate_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:189: FutureWarning: The provided callable <built-in function max> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    index_df = frag_df.groupby("_candidate_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:247: FutureWarning: The provided callable <built-in function min> is currently using SeriesGroupBy.min. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "min" instead.
    index_df = psm_df.groupby("window_idx", as_index=False).agg(

tests/unit_tests/test_fragcomp.py::test_fragment_competition
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fragcomp.py:247: FutureWarning: The provided callable <built-in function max> is currently using SeriesGroupBy.max. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "max" instead.
    index_df = psm_df.groupby("window_idx", as_index=False).agg(

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/outputtransform.py:458: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    psm_df["mods"].fillna("", inplace=True)

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/outputtransform.py:461: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
    psm_df["mod_sites"].fillna("", inplace=True)

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/miniconda3/envs/alpha/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.
    warnings.warn(

tests/unit_tests/test_outputtransform.py::test_output_transform
  /Users/georgwallmann/Documents/git/alphadia/alphadia/fdr.py:403: UserWarning: FigureCanvasAgg is non-interactive, and thus cannot be shown
    plt.show()

tests/unit_tests/test_plotting.py::test_plot_cycle
  /Users/georgwallmann/Documents/git/alphadia/alphadia/plotting/cycle.py:189: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
    cmap = cm.get_cmap(cmap_name)

tests/unit_tests/test_plotting.py::test_plot_cycle
  /Users/georgwallmann/Documents/git/alphadia/alphadia/plotting/cycle.py:46: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
    cmap = cm.get_cmap(cmap_name)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================================== short test summary info ===============================================================================================
FAILED tests/unit_tests/test_outputtransform.py::test_output_transform - IndexError: list index out of range
============================================================================== 1 failed, 53 passed, 5 deselected, 10 warnings in 34.87s ===============================================================================
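
The failing loop in the traceback indexes `allprots` by the position of each result, which raises a bare IndexError as soon as there are more results than protein names. The following is only a sketch of how such a mismatch could be reported more clearly; `pair_results_with_proteins` is a hypothetical helper, not the directLFQ implementation or its actual fix.

```python
def pair_results_with_proteins(results, protein_names):
    """Pair each per-protein result with its name, failing early on a length mismatch."""
    if len(results) != len(protein_names):
        raise ValueError(
            f"Got {len(results)} result(s) for {len(protein_names)} protein name(s); "
            "the two lists must have the same length."
        )
    return list(zip(protein_names, results))


pairs = pair_results_with_proteins(["profile_1", "profile_2"], ["EPROT", "VPROT"])

try:
    pair_results_with_proteins(["profile_1", "profile_2"], ["EPROT"])
    mismatch_message = ""
except ValueError as err:
    mismatch_message = str(err)
```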

directlfq gui windows won't start (version 0.2.13 and 0.2.14)

First of all, thanks a lot for this great tool; it is very useful.
Unfortunately, the latest versions, 0.2.13 and 0.2.14, won't start (Windows GUI). Version 0.2.11 seems to be the last working version, although it prints the warnings below, which do not seem to affect data processing.
Thanks again for your help.


WARNING:param.TextInput: Providing a width-responsive sizing_mode ('stretch_width') and a fixed width is not supported. Converting fixed width to min_width. If you intended the component to be fully width-responsive remove the heightsetting, otherwise change it to min_height. To error on the incorrect specification disable the config.layout_compatibility option.
WARNING:param.TextInput: Providing a width-responsive sizing_mode ('stretch_width') and a fixed width is not supported. Converting fixed width to min_width. If you intended the component to be fully width-responsive remove the heightsetting, otherwise change it to min_height. To error on the incorrect specification disable the config.layout_compatibility option.
WARNING:param.(optional) If you are using MaxQuant evidence.txt or peptides.txt files, you can add the link to the corresponding proteinGroups.txt file (will improve peptide-to-protein mapping): Setting non-parameter attribute default=None using a mechanism intended only for parameters
WARNING:param.: Setting non-parameter attribute default=None using a mechanism intended only for parameters
WARNING:param.TextInput: Providing a width-responsive sizing_mode ('stretch_width') and a fixed width is not supported. Converting fixed width to min_width. If you intended the component to be fully width-responsive remove the heightsetting, otherwise change it to min_height. To error on the incorrect specification disable the config.layout_compatibility option.
WARNING:param.: Setting non-parameter attribute default=None using a mechanism intended only for parameters
WARNING:param.TextInput: Providing a width-responsive sizing_mode ('stretch_width') and a fixed width is not supported. Converting fixed width to min_width. If you intended the component to be fully width-responsive remove the heightsetting, otherwise change it to min_height. To error on the incorrect specification disable the config.layout_compatibility option.
WARNING:param.: Setting non-parameter attribute default=None using a mechanism intended only for parameters
panel\util\warnings.py:26: PanelDeprecationWarning: "Row(..., background='#eaeaea')" is deprecated and will be removed in version 1.3, use "Row(..., styles={'background': '#eaeaea'})" instead.
warnings.warn(message, category, stacklevel=stacklevel)
Launching server at http://localhost:51170
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: Progress(id='p1152', ...)


DIA-NN proteotypic peptide filtering

Hello,

I have noticed that directLFQ does not filter out shared peptides from the DIA-NN output, so the quantification can be highly misleading for certain protein families. I can use the generic input format with the precursors table as a workaround, but it would be great to filter for proteotypic peptides only, either by default or as an optional parameter if possible.
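
Until such an option exists, the filtering could be done on the report before it is handed to directLFQ. The sketch below assumes that the DIA-NN report contains a `Proteotypic` column with the value 1 for proteotypic precursors; the column names in the toy table and the helper `keep_proteotypic` are assumptions for illustration and should be checked against the actual report.

```python
import pandas as pd


def keep_proteotypic(report, flag_col="Proteotypic"):
    """Return only the rows whose precursor is flagged as proteotypic (flag == 1)."""
    if flag_col not in report.columns:
        raise KeyError(f"Column '{flag_col}' not found in the report")
    return report[report[flag_col] == 1].copy()


# Toy report: the second precursor is shared between two protein groups.
report = pd.DataFrame(
    {
        "Protein.Group": ["P1", "P1;P2", "P2"],
        "Precursor.Id": ["AAK2", "CCR2", "DDK3"],
        "Proteotypic": [1, 0, 1],
    }
)
filtered = keep_proteotypic(report)
```

The filtered table can then be written back to a tab-separated file with `filtered.to_csv(path, sep="\t", index=False)` and used as input.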

Thanks!

AlphaPept input files

I ran into an error when processing an AlphaPept input file. According to intable_config.yaml, the results_peptides.csv file should contain the columns ['protein_group', 'decoy', 'ms1_int_sum', 'charge', 'shortname', 'sequence']. However, the column 'ms1_int_sum' is not present in the results_peptides.csv generated by AlphaPept. Instead, the AlphaPept-generated file contains the columns 'ms1_int_sum_apex', 'ms1_int_sum_area', and 'ms1_int_sum_apex_dn'. As a result, when we run directLFQ on the AlphaPept-generated results_peptides.csv file, the program terminates with this error:

TypeError: format not specified in intable_config.yaml!

It would be helpful if this could be clarified. Also, I didn't find any sample data file or example for AlphaPept that can be run with directlfq.

Lastly, one more point of confusion: the results_peptides.csv file is generated in the quantification step of AlphaPept. Can we instead take the required columns after the protein grouping step from results.hdf (dataset = 'protein_fdr')? The 'protein_fdr' table is saved in results.hdf after the protein grouping step.
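
As a possible stopgap for the missing column, one of the available intensity columns could be copied to the expected name before running directLFQ. This is a sketch only: which of the AlphaPept columns corresponds to the intended 'ms1_int_sum' is not stated in this issue, so the default `source_col` below is an arbitrary assumption, and `add_ms1_int_sum` is a hypothetical helper.

```python
import pandas as pd


def add_ms1_int_sum(peptides, source_col="ms1_int_sum_apex"):
    """Copy one of the AlphaPept intensity columns to the expected name 'ms1_int_sum'."""
    if "ms1_int_sum" in peptides.columns:
        return peptides
    if source_col not in peptides.columns:
        raise KeyError(f"Neither 'ms1_int_sum' nor '{source_col}' is present")
    return peptides.assign(ms1_int_sum=peptides[source_col])


# Toy table with the column names reported in this issue.
peptides = pd.DataFrame(
    {
        "protein_group": ["P1", "P2"],
        "sequence": ["AAK", "CCR"],
        "ms1_int_sum_apex": [1000.0, 2500.0],
        "ms1_int_sum_area": [1200.0, 2700.0],
    }
)
patched = add_ms1_int_sum(peptides)
```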

Default config.QUANT_ID and config.PROTEIN_ID not used on import

Describe the bug
Somehow the default values defined for config.QUANT_ID and config.PROTEIN_ID are not set when importing directlfq as a module.

The result is a pandas error that is rather confusing for users: None of [None, None] are in the columns.
It can be mitigated by calling lfqconfig.set_global_protein_and_ion_id(protein_id = 'protein', quant_id = 'ion') beforehand.
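
Independently of the workaround above, a small guard before the table is indexed would turn the confusing KeyError into an explicit message. The helper `require_id_columns` below is hypothetical and not part of directLFQ; it only illustrates the kind of check that could run before `set_index` is called.

```python
def require_id_columns(protein_id, quant_id, columns):
    """Fail with a readable message when the ID column names are unset or absent."""
    for name, value in (("protein_id", protein_id), ("quant_id", quant_id)):
        if value is None:
            raise ValueError(f"{name} is not set; configure it before indexing the table")
        if value not in columns:
            raise ValueError(f"{name}='{value}' is not a column of the input table")


columns = ["protein", "ion", "run_0", "run_1"]
require_id_columns("protein", "ion", columns)  # passes silently

try:
    require_id_columns(None, None, columns)
    unset_message = ""
except ValueError as err:
    unset_message = str(err)
```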

To Reproduce
Steps to reproduce the behavior:

  1. Import directLFQ.
  2. Call directLFQ on a dataframe with the default columns protein and ion.

Logs

Traceback (most recent call last):
  File "d:\alphadia\alphadia\planning.py", line 274, in run
    output.build(workflow_folder_list, base_spec_lib)
  File "d:\alphadia\alphadia\outputtransform.py", line 365, in build
    _ = self.build_protein_table(folder_list, psm_df=psm_df, save=True)
  File "d:\alphadia\alphadia\outputtransform.py", line 571, in build_protein_table
    protein_df = qb.lfq(
  File "d:\alphadia\alphadia\outputtransform.py", line 269, in lfq
    lfq_df = lfqutils.index_and_log_transform_input_df(intensity_df)
  File "D:\Maria\anaconda\envs\alpha\lib\site-packages\directlfq\utils.py", line 323, in index_and_log_transform_input_df
    data_df = data_df.set_index([config.PROTEIN_ID, config.QUANT_ID])
  File "D:\Maria\anaconda\envs\alpha\lib\site-packages\pandas\core\frame.py", line 5859, in set_index
    raise KeyError(f"None of {missing} are in the columns")
KeyError: 'None of [None, None] are in the columns'
0:00:57.756267 ERROR: Output failed with error 'None of [None, None] are in the columns''None of [None, None] are in the columns'

Version (please complete the following information):

  • Installation type: pip on Windows
  • directLFQ 0.2.13

Normalisation values are extremely high

Describe the bug

It's not really a bug, but I noticed that the values in the normalisation output are very high overall (around 1e15 to 1e16).
I am wondering why this is the case. Processing was done with the Python version on Windows.

To Reproduce

simply run lfq_manager.run_lfq(CustomDf.aq_reformat.txt)

Spectronaut Report Schema

Hi, great work and thanks for sharing!

While playing around, I could not access your preferred export schema for Spectronaut's report files.
It doesn't seem to exist (404), neither for precursor nor for fragment ion quantification.

Is there an update going on?

Best, Karl

Minimum number for peptides for LFQ

Hi @ammarcsj ,

I am back with an additional question :)
Is there a way to specify the minimum number of peptides a protein should have to get a LFQ value?
I don't see a filter for this in either the GUI or the code. Am I wrong?
It would also be nice to get in the output table which are the peptides that were used for the LFQ calculation.
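
In the absence of a built-in option, a pre-filter on the input table could approximate this. The sketch below assumes a table with one row per ion and the column names `protein` and `ion`; `filter_min_ions` is a hypothetical helper applied before directLFQ is run, not a directLFQ parameter, and it counts ions rather than peptides unless the `ion` column holds peptide-level identifiers.

```python
import pandas as pd


def filter_min_ions(df, min_ions=2, protein_col="protein", ion_col="ion"):
    """Keep only proteins that are supported by at least min_ions distinct ions."""
    ions_per_protein = df.groupby(protein_col)[ion_col].transform("nunique")
    return df[ions_per_protein >= min_ions].copy()


# Toy table: P1 has two ions, P2 has only one.
table = pd.DataFrame(
    {
        "protein": ["P1", "P1", "P2"],
        "ion": ["a", "b", "c"],
        "run_0": [10.0, 20.0, 30.0],
    }
)
kept = filter_min_ions(table, min_ions=2)
```

The `ion` column of the filtered table also lists exactly which ions remain available for the LFQ calculation of each protein.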

Feature request: parameter for output folder

It would be very useful to be able to specify where the output should go.

Folders with the original data might be read-only, subject to file-watchers, or generally required to remain 'clean' e.g. for uncomplicated PRIDE uploads.
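
Until such a parameter exists, one workaround is to copy the input file into a writable folder and run directLFQ on the copy. This sketch assumes that output files are written next to the input file, as this request implies; `stage_input` is a hypothetical helper, and the temporary folders stand in for real paths.

```python
import shutil
import tempfile
from pathlib import Path


def stage_input(input_file, output_dir):
    """Copy input_file into output_dir and return the path of the copy."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    staged = output_dir / Path(input_file).name
    shutil.copy2(input_file, staged)
    return staged


# Self-contained demonstration with temporary folders.
source_dir = Path(tempfile.mkdtemp())
work_dir = Path(tempfile.mkdtemp()) / "directlfq_out"
original = source_dir / "report.tsv"
original.write_text("protein\tion\trun_0\n")
staged = stage_input(original, work_dir)
```

The original folder stays untouched, at the cost of duplicating the input file.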

First QC results indicate reduced quantitative accuracy of directLFQ vs MaxLFQ

Thanks for this very interesting and easily accessible work!

Unfortunately, my first attempt to reprocess a mixed-proteome standard (Human/E. coli -> 1:1 vs 1:3), processed via DIA-NN with default options only, resulted in clearly reduced quantitative accuracy for directLFQ.

Is this actually to be expected?

Kind regards
Michael

directLFQ fails to apply grouping in v0.2.17

Describe the bug
It looks like directLFQ ignores the grouping variable in the most recent version. Instead of 10,330 protein groups, all 174,000 fragments are handled like individual proteins and no output is generated.

Broken output 0.2.17

2024-02-15 22:21:38> ================ Protein FDR =================
2024-02-15 22:21:38> Unique protein groups in output
2024-02-15 22:21:38>   1% protein FDR: 10,330
2024-02-15 22:21:38> 
2024-02-15 22:21:38> Unique precursor in output
2024-02-15 22:21:38>   1% protein FDR: 112,094
2024-02-15 22:21:38> ================================================
2024-02-15 22:21:38> Building search statistics
2024-02-15 22:21:40> Writing stat output to disk
2024-02-15 22:21:40> Performing label free quantification
2024-02-15 22:21:41> Accumulating fragment data
2024-02-15 22:21:41> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample3_01
...
2024-02-15 22:22:13> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample4_01
2024-02-15 22:22:16> Performing label free quantification on the pg level
2024-02-15 22:22:16> Filtering fragments by quality
2024-02-15 22:22:16> Performing label-free quantification using directLFQ
2024-02-15 22:22:18> 10330 lfq-groups total
2024-02-15 22:22:39> using 8 processes
2024-02-15 22:22:43> lfq-object 0
2024-02-15 22:22:43> lfq-object 100
2024-02-15 22:22:43> lfq-object 200
2024-02-15 22:22:43> lfq-object 300
2024-02-15 22:22:43> lfq-object 400
2024-02-15 22:22:43> lfq-object 500
...
2024-02-15 22:24:08> lfq-object 173800
2024-02-15 22:24:08> lfq-object 173900
2024-02-15 22:24:08> lfq-object 174000
2024-02-15 22:25:17> Writing pg output to disk
2024-02-15 22:25:19> Writing psm output to disk

Correct output 0.2.14

2024-02-15 22:33:11> ================ Protein FDR =================
2024-02-15 22:33:11> Unique protein groups in output
2024-02-15 22:33:11>   1% protein FDR: 10,330
2024-02-15 22:33:11> 
2024-02-15 22:33:11> Unique precursor in output
2024-02-15 22:33:11>   1% protein FDR: 112,094
2024-02-15 22:33:11> ================================================
2024-02-15 22:33:11> Building search statistics
2024-02-15 22:33:13> Writing stat output to disk
2024-02-15 22:33:13> Performing label free quantification
2024-02-15 22:33:13> Accumulating fragment data
2024-02-15 22:33:13> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample3_01
...
2024-02-15 22:33:46> reading frag file for 20231212_OA1_MCT_SA_M768_AD02_HYE_200ng_quadPolON_sample4_01
2024-02-15 22:33:48> Performing label free quantification on the pg level
2024-02-15 22:33:48> Filtering fragments by quality
2024-02-15 22:33:49> Performing label-free quantification using directLFQ
2024-02-15 22:33:51> 10330 prots total
2024-02-15 22:33:51> using 8 processes
2024-02-15 22:33:52> prot 0
2024-02-15 22:33:53> prot 1300
2024-02-15 22:33:53> prot 700
2024-02-15 22:33:53> prot 1000
2024-02-15 22:33:53> prot 1700
2024-02-15 22:33:53> prot 2300
2024-02-15 22:33:53> prot 400
...
2024-02-15 22:33:57> prot 9900
2024-02-15 22:33:57> prot 10200
2024-02-15 22:33:57> prot 10000
2024-02-15 22:33:57> prot 10300
2024-02-15 22:34:07> Writing pg output to disk
2024-02-15 22:34:08> Writing psm output to disk

DIA-NN and FDR filtering

First of all great work! :)

Then a quick question: does directlfq perform any sort of FDR filtering (as iq does) for DIA-NN data?
If not, I guess one could do this beforehand, e.g. by filtering the report.tsv, but it would be quite useful to have it done on the fly by directlfq.
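
The pre-filtering mentioned above could look roughly like this. It is a sketch with pandas, not a directlfq feature: filter_diann_report is a hypothetical helper, the 0.01 cutoff is only an example, and the q-value columns ("Q.Value", "PG.Q.Value") are assumed to be present in the report and should be checked against the actual file.

```python
import pandas as pd


def filter_diann_report(report, q_cutoff=0.01,
                        q_columns=("Q.Value", "PG.Q.Value")):
    """Keep rows whose q-values are at or below `q_cutoff` in all `q_columns`.

    Hypothetical helper; which columns to filter on is an assumption.
    """
    mask = pd.Series(True, index=report.index)
    for col in q_columns:
        mask &= report[col] <= q_cutoff
    return report[mask].reset_index(drop=True)


# Made-up rows standing in for a report that would normally be loaded with
# pd.read_csv("report.tsv", sep="\t")
report = pd.DataFrame({
    "Protein.Group": ["P1", "P1", "P2", "P3"],
    "Precursor.Id": ["AAA2", "BBB2", "CCC3", "DDD2"],
    "Q.Value": [0.001, 0.02, 0.005, 0.009],
    "PG.Q.Value": [0.002, 0.002, 0.05, 0.01],
})

filtered_report = filter_diann_report(report, q_cutoff=0.01)
kept_precursors = list(filtered_report["Precursor.Id"])
```

The filtered table can be written back to a tab-separated file with to_csv(..., sep="\t", index=False) and used as input in place of the original report.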

Maxquant_evidence Config TypeError

Describe the bug
Running on a MaxQuant evidence file from the CLI raises TypeError: format not specified in intable_config.yaml!

To Reproduce
Steps to reproduce the behavior:

directlfq lfq -i "yaddayadda\txt\evidence.txt" -it maxquant_evidence

Direct lfq Console

Starting directLFQ analysis
You provided a MaxQuant peptide or evidence file as input. To have the identical ProteinGroups as in the MaxQuant analysis, please provide the ProteinGroups.txt file as well.
Traceback (most recent call last):
  File "D:\pipenvs\directlfq\Scripts\directlfq-script.py", line 33, in <module>
    sys.exit(load_entry_point('directlfq==0.2.3', 'console_scripts', 'directlfq')())
  File "d:\pipenvs\directlfq\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "d:\pipenvs\directlfq\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "d:\pipenvs\directlfq\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "d:\pipenvs\directlfq\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "d:\pipenvs\directlfq\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "d:\pipenvs\directlfq\lib\site-packages\directlfq\cli.py", line 202, in run_directlfq
    directlfq.lfq_manager.run_lfq(**kwargs)
  File "d:\pipenvs\directlfq\lib\site-packages\directlfq\lfq_manager.py", line 36, in run_lfq
    input_df = lfqutils.import_data(input_file=input_file, input_type_to_use=input_type_to_use)
  File "d:\pipenvs\directlfq\lib\site-packages\directlfq\utils.py", line 783, in import_data
    file_to_read = reformat_and_save_input_file(input_file=input_file, input_type_to_use=input_type_to_use)
  File "d:\pipenvs\directlfq\lib\site-packages\directlfq\utils.py", line 792, in reformat_and_save_input_file
    input_type, config_dict_for_type, sep = get_input_type_and_config_dict(input_file, input_type_to_use)
  File "d:\pipenvs\directlfq\lib\site-packages\directlfq\utils.py", line 856, in get_input_type_and_config_dict
    raise TypeError("format not specified in intable_config.yaml!")
TypeError: format not specified in intable_config.yaml!

Version (please complete the following information):

  • Native python 3.8.10 venv + pip [stable, developer-stable]
  • Windows 10
  • directlfq 0.2.3

Additional context
I already checked on GitHub and in my local clone: maxquant_evidence is in the yaml file. I tried this on both DDA and DIA output, and both raise the same error.
Using the GUI from the same environment starts fine, but it autodetects the input as maxquant_evidence_leading_razor_protein.
