LonnekeScheffer commented on July 29, 2024

Hi Genokarma,

Thanks for reaching out! You mention that the process has been running for 5 hours. Depending on the size of the dataset and the specific methods and parameters used, some processes can be very computationally expensive and may indeed run for a long time. Without more information about the analysis you are running, I cannot determine the cause of this long running time. If you would like my input on that, you're welcome to share your YAML analysis specification with me.

As for debugging the problem: you could try running a small example to make sure everything works, for example using only a small number of repertoires or sequences. There is also an automated test instruction that checks whether the immuneML installation works at all: https://docs.immuneml.uio.no/latest/installation/install_with_package_manager.html#testing-immuneml
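One way to build such a small example is to subsample the metadata file so that only a few repertoires per class remain. The sketch below assumes an immuneML-style metadata CSV with one row per repertoire and a label column named `signal_disease` (the label used in the specification later in this thread), and it assumes the import only reads the repertoire files listed in the metadata file. The helper name `subsample_metadata` is hypothetical.

```python
import csv
import random
from collections import defaultdict


def subsample_metadata(metadata_path, output_path, label="signal_disease",
                       per_class=3, seed=0):
    """Write a smaller metadata CSV keeping at most `per_class` rows per label value.

    Hypothetical helper: assumes one row per repertoire file and a label column.
    Returns the number of rows written.
    """
    with open(metadata_path, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)

    # Group rows by label value so every class stays represented
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label]].append(row)

    # A seeded RNG makes the subsample reproducible between runs
    rng = random.Random(seed)
    kept = []
    for value in sorted(by_class):
        group = by_class[value]
        kept.extend(rng.sample(group, min(per_class, len(group))))

    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

For example, `subsample_metadata("/data/metadata2.csv", "/data/metadata_small.csv")` would write a six-repertoire metadata file, which could then be set as `metadata_file` in the import parameters for a quick trial run.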

Since this issue does not currently point to a concrete bug, I will close it for now. Feel free to reach out at [email protected] if you have more questions.


Genokarma commented on July 29, 2024

Hi LonnekeScheffer,
It has been more than 12 hours and it is still showing the same thing.
My sample details: 15 breast cancer study samples and 30 control samples.
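With 15 case and 30 control repertoires the classes are imbalanced, which is what `class_weight: balanced` in the training specification is meant to compensate for. Assuming immuneML passes this setting through to scikit-learn unchanged, the weights follow scikit-learn's documented formula `n_samples / (n_classes * count_of_class)`; the sketch below just evaluates that formula for these counts (the class names are placeholders).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weights using scikit-learn's 'balanced' formula:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# 15 breast cancer repertoires and 30 controls, as described above
weights = balanced_class_weights(["cancer"] * 15 + ["control"] * 30)
print(weights)  # {'cancer': 1.5, 'control': 0.75}
```

Each case repertoire therefore counts twice as much as each control during training, so the two classes contribute equally overall.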
Here I have attached my YAML specifications:
1. Converted the ImmunoSEQRearrangement data into immuneML format.
Script for conversion:
```yaml
definitions:
  datasets:
    dataset:
      format: ImmunoSEQSample
      params:
        is_repertoire: true
        metadata_file: /data/metadata2.csv
        path: /data/DataS2/
        region_type: IMGT_CDR3
        result_path: /data/
instructions:
  my_dataset_generation_instruction:
    datasets:
    - dataset
    export_formats:
    - ImmuneML
    type: DatasetExport
```

2. Used the following specification to train the model:
```yaml
definitions:
  datasets:
    dataset:
      format: ImmuneML
      params:
        path: /data/
        result_path: /data/results
  encodings:
    encoding_1:
      KmerFrequency:
        k: 3
        reads: all
        sequence_encoding: CONTINUOUS_KMER
  ml_methods:
    k_nearest_neighbors:
      KNN:
        n_neighbors:
        - 3
        - 5
        - 7
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    logistic_regression:
      LogisticRegression:
        C:
        - 0.01
        - 0.1
        - 1
        - 10
        - 100
        class_weight:
        - balanced
        penalty:
        - l1
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    random_forest:
      RandomForestClassifier:
        class_weight:
        - balanced
        n_estimators:
        - 10
        - 50
        - 100
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    support_vector_machine:
      SVC:
        C:
        - 0.01
        - 0.1
        - 1
        - 10
        - 100
        class_weight:
        - balanced
        dual: false
        penalty:
        - l1
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
  motifs: {}
  preprocessing_sequences: {}
  reports:
    benchmark:
      MLSettingsPerformance:
        name: benchmark
        single_axis_labels: false
        x_label_position: -0.12
        y_label_position: -0.08
    coefficients:
      Coefficients:
        coefs_to_plot:
        - N_LARGEST
        n_largest:
        - 25
        name: coefficients
  signals: {}
  simulations: {}
instructions:
  inst1:
    assessment:
      reports:
        models:
        - coefficients
      split_count: 5
      split_strategy: random
      training_percentage: 0.7
    dataset: dataset
    labels:
    - signal_disease
    metrics: []
    number_of_processes: 10
    optimization_metric: accuracy
    refit_optimal_model: true
    reports:
    - benchmark
    selection:
      split_count: 1
      split_strategy: random
      training_percentage: 0.7
    settings:
    - encoding: encoding_1
      ml_method: random_forest
      preprocessing: null
    - encoding: encoding_1
      ml_method: logistic_regression
      preprocessing: null
    - encoding: encoding_1
      ml_method: support_vector_machine
      preprocessing: null
    - encoding: encoding_1
      ml_method: k_nearest_neighbors
      preprocessing: null
    strategy: GridSearch
    type: TrainMLModel
output:
  format: HTML
```
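Part of the runtime of this specification comes simply from how many models it asks for. The sketch below counts the hyperparameter combinations in the `ml_methods` grids above and multiplies them by the cross-validation folds and the assessment and selection splits. It assumes that `model_selection_cv: true` runs a full 5-fold grid search over every combination within each selection split, and it ignores refits and the encoding step, so treat the result as a rough lower bound rather than immuneML's exact schedule.

```python
from math import prod

# Hyperparameter grids copied from the ml_methods section of the specification above
grids = {
    "k_nearest_neighbors": {"n_neighbors": [3, 5, 7]},
    "logistic_regression": {"C": [0.01, 0.1, 1, 10, 100],
                            "class_weight": ["balanced"], "penalty": ["l1"]},
    "random_forest": {"class_weight": ["balanced"], "n_estimators": [10, 50, 100]},
    "support_vector_machine": {"C": [0.01, 0.1, 1, 10, 100],
                               "class_weight": ["balanced"], "penalty": ["l1"]},
}


def grid_size(grid):
    """Number of hyperparameter combinations a grid search would try."""
    return prod(len(values) for values in grid.values())


def estimate_fits(grids, cv_folds=5, assessment_splits=5, selection_splits=1):
    """Rough lower bound on model fits (assumed schedule; refits are ignored)."""
    combos = sum(grid_size(g) for g in grids.values())
    return combos * cv_folds * assessment_splits * selection_splits


print(sum(grid_size(g) for g in grids.values()))  # 16
print(estimate_fits(grids))  # 400
```

Under these assumptions the training stage alone involves on the order of 400 model fits, although each fit on 45 repertoires should be cheap compared with the k-mer encoding.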


LonnekeScheffer commented on July 29, 2024

Hi Genokarma,

I don't think there is necessarily any reason why this should not work. The dataset does not seem extremely large. For debugging purposes, I recommend the following steps:

By following these steps, we can pinpoint where the issue might be (e.g., whether there is something wrong with the installation, the computer setup, or the dataset). I don't believe there is a bug in immuneML that is causing this, since everything runs as normal on our end, but if we do find any such indication, we will of course fix it as soon as possible.
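To put a number on the dataset size, a quick count of repertoire files and rows can help. This is a minimal sketch assuming the repertoire files use a `.tsv` extension and have a single header line each; `summarize_repertoires` is a hypothetical helper, not part of immuneML.

```python
from pathlib import Path


def summarize_repertoires(directory, pattern="*.tsv"):
    """Return (number_of_files, total_data_rows) for repertoire files in a directory.

    Assumes exactly one header line per file (hypothetical file layout).
    """
    files = sorted(Path(directory).glob(pattern))
    total_rows = 0
    for path in files:
        with path.open() as handle:
            # Subtract the header line; max() guards against empty files
            total_rows += max(sum(1 for _ in handle) - 1, 0)
    return len(files), total_rows
```

Calling `summarize_repertoires("/data/DataS2/")` would report how many repertoires and sequences the import has to process, which makes it easier to judge whether a long runtime is plausible.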

As a side note, it is not necessary to convert the dataset to immuneML format first (you can simply use the ImmunoSEQSample import in the same YAML as the training), although it should work this way as well. Also, you have set the number of processes to 10, which may be fine, but please make sure the system you are running this on has that many CPUs available (specifying too many processes can also slow down the run).
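Both suggestions can be applied to the specification treated as a plain Python dictionary (for example after loading it with a YAML parser). The helpers below are hypothetical: `cap_processes` limits every instruction's `number_of_processes` to the available CPUs, and `use_direct_import` swaps the `ImmuneML` dataset definition for an `ImmunoSEQSample` import with the parameters from the conversion specification. Note that `os.sched_getaffinity` respects CPU pinning on Linux but may still over-report when a container limits CPUs through a quota.

```python
import copy
import os


def available_cpus():
    """CPUs this process may run on (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def cap_processes(spec, available=None):
    """Return a copy of the spec with number_of_processes capped at `available`."""
    available = available or available_cpus()
    capped = copy.deepcopy(spec)
    for instruction in capped.get("instructions", {}).values():
        if "number_of_processes" in instruction:
            instruction["number_of_processes"] = min(
                instruction["number_of_processes"], available)
    return capped


def use_direct_import(spec, dataset_name, import_params):
    """Return a copy of the spec that imports `dataset_name` as ImmunoSEQSample."""
    updated = copy.deepcopy(spec)
    updated["definitions"]["datasets"][dataset_name] = {
        "format": "ImmunoSEQSample",
        "params": dict(import_params),
    }
    return updated
```

Working on deep copies keeps the original specification untouched, so the capped and converted versions can be compared before writing one back to YAML.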


Genokarma commented on July 29, 2024

Hi LonnekeScheffer,
I want to express my gratitude for your assistance; your time and efforts are highly appreciated. For your reference, I've attached my dataset files and YAML script. I am using a Docker container, and the command details are provided in the attached README file. The dataset and YAML file are available at: https://github.com/Genokarma/ImmuneMLTest

I've encountered an issue while running the process on two different systems. On my macOS system with 16 CPUs and 16 GB RAM, the process gets stuck at "parsing the specification". On the Linux system with 48 GB RAM, it gets stuck at the encoding step (encoding 1). It has been more than 24 hours without any progress.

I have tried to troubleshoot the problem on both systems without success. Could you please try to run the process yourself, or suggest how to address this issue? Your assistance in resolving this issue is invaluable.


LonnekeScheffer commented on July 29, 2024

Dear GenoKarma,

Thanks for sharing the test dataset and YAML. I'm currently very busy preparing for my PhD defence, and I will have more time available in the last week of January. In the meantime, it would be helpful to try running the test and Quickstart examples mentioned in my previous comments. These examples are small and known to finish quickly, so they can indicate whether immuneML is actually getting "stuck" on your system or simply takes a long time to run.
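To tell "stuck" apart from "slow" without watching a terminal, a run can be wrapped with a wall-clock timeout. This is a minimal sketch built on the standard library's `subprocess.run(..., timeout=...)`; the helper name `run_with_timeout` is hypothetical, and the quickstart command in the usage note is an assumption based on the immuneML documentation, so check it against your installed version.

```python
import subprocess
import time


def run_with_timeout(cmd, timeout_s):
    """Run a command and report whether it finished within timeout_s seconds.

    Returns (finished, returncode, elapsed_seconds); returncode is None on timeout.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
        return True, result.returncode, time.monotonic() - start
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising TimeoutExpired
        return False, None, time.monotonic() - start
```

For example, `run_with_timeout(["immune-ml-quickstart", "./quickstart_results/"], timeout_s=1800)` (assumed command and output directory) would return within 30 minutes either way. If the small quickstart finishes but the full analysis does not, the problem is more likely in the analysis itself than in the installation.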


Genokarma commented on July 29, 2024

Thank you for your prompt response and for sharing the information. I completely understand that you're currently occupied with your PhD defense preparations. Wishing you the best of luck with your PhD defense.

I had already used the demo data during the installation process. In the meantime, I took your advice and ran the Quickstart example again, as you suggested. I'm pleased to report that the quickstart/demo went smoothly and completed successfully. I have attached screenshots for your reference.

[Screenshot from 2024-01-18 18-40-30]

I look forward to connecting with you again in the last week of January.


LonnekeScheffer commented on July 29, 2024

Dear GenoKarma,

My apologies for the delay; it was a busy period. But I have good news: I finally managed to take a deeper look into this issue and implement a solution. I cloned your GitHub repository and tried to reproduce your immuneML run. I did indeed discover two issues, one bug and one performance issue, both of which were introduced during our recent large refactoring for the alpha version of immuneML 3.

Firstly, there was a bug in the KmerFrequencyEncoder due to some renamed variables. If you hit this bug, you would see the following error message:

--- Exception in _encode_examples : 'SequenceMetadata' object has no attribute 'count'

This bug has been fixed in the latest version of immuneML. I originally thought this bug might have been the culprit in your analysis. However, when I ran immuneML on your entire dataset, I did not even reach the error above, because immuneML was spending a long time on an earlier step of the encoding process. I was able to locate and fix that issue, so the KmerFrequencyEncoder should now be a lot faster. With the updated code, encoding your dataset with 4 parallel processes took 8 minutes on my computer.
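For readers unfamiliar with what this encoder computes, here is a minimal sketch of continuous k-mer frequency encoding as configured in the specification (`k: 3`, `sequence_encoding: CONTINUOUS_KMER`). It is an illustration, not immuneML's implementation: the function names are hypothetical, per-sequence read counts are passed explicitly (an assumed reading of `reads: all`), and the result is normalized to relative frequencies, whereas immuneML's encoder has its own normalization options.

```python
from collections import Counter


def continuous_kmers(sequence, k=3):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_frequencies(sequences, counts=None, k=3):
    """Relative k-mer frequencies for one repertoire.

    If counts is given, each sequence contributes once per read (assumed
    weighting); otherwise every sequence contributes once.
    """
    counts = counts or [1] * len(sequences)
    totals = Counter()
    for sequence, count in zip(sequences, counts):
        for kmer in continuous_kmers(sequence, k):
            totals[kmer] += count
    n = sum(totals.values())
    return {kmer: c / n for kmer, c in totals.items()} if n else {}


print(continuous_kmers("CASSL"))  # ['CAS', 'ASS', 'SSL']
print(kmer_frequencies(["CAS", "CAT"], counts=[3, 1]))  # {'CAS': 0.75, 'CAT': 0.25}
```

A single pass per repertoire like this scales linearly with the total sequence length, which is why encoding 45 repertoires should take minutes rather than days once no step in the pipeline is unexpectedly expensive.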

In conclusion, if you reinstall the latest version of immuneML (version 3.0.0a3), encoding should be a lot faster. Since immuneML 3 is still in alpha, it includes major refactorings and ongoing developments that have not yet been thoroughly tested. We therefore greatly appreciate user feedback, and I will do my best to resolve issues as soon as I can. However, if an issue is blocking your work, you can always downgrade to the latest stable immuneML release (v2.2.5).
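Before comparing results, it helps to confirm which release is actually installed in the environment (or container) doing the run, for example after `pip install immuneML==3.0.0a3` or `pip install immuneML==2.2.5`. The sketch below uses the standard library's `importlib.metadata`; it assumes the distribution is registered under the name `immuneML`, and the pre-release check is a deliberately crude pattern match rather than full PEP 440 parsing.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Matches PEP 440 alpha/beta/release-candidate markers such as "3.0.0a3" or "1.0rc1"
_PRERELEASE = re.compile(r"\d(a|b|rc)\d*")


def installed_version(distribution="immuneML"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


def is_prerelease(version_string):
    """Crude check for alpha/beta/rc versions (ignores .dev and other forms)."""
    return bool(_PRERELEASE.search(version_string))
```

Running `installed_version()` inside the Docker container and on the host separately can reveal whether the two environments are on different releases.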

All the best,
Lonneke


Genokarma commented on July 29, 2024

Hi Lonneke,

Hope your viva went well! Thank you for reaching out.

Yes, I have tried your suggestions with both the newer and the stable versions. However, I am not able to run the newer version, as it produces a different error. I have attached log.txt for your reference. Have you prepared a Docker image for the newer version of immuneML? If so, please share it with me.

[Attachment: log.txt]
Once again, thank you.
With regards.
