Comments (9)
Hi Genokarma,
Thanks for reaching out! You mentioned the process has been running for 5 hours, but depending on the size of the dataset and specific methods and parameters used, some processes may be very computationally expensive and can indeed run for a long time. Since I don't have more information on the analysis you are trying to run, I cannot help you determine what the cause of this long running time may be. If you would like my input on that, you're welcome to share the YAML analysis specification with me.
As for debugging the problem: You could try running a small example to ensure everything works, for example using only a small number of repertoires or sequences. There is also an automatic test instruction which can be run, to check if the immuneML installation works at all: https://docs.immuneml.uio.no/latest/installation/install_with_package_manager.html#testing-immuneml
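One way to build such a small example is to write a reduced metadata file that lists only a few repertoires per class, and point the import at it. The sketch below is not part of immuneML: the helper name `subset_metadata` is hypothetical, and it assumes the metadata is a CSV file with one row per repertoire and a column holding the class label.

```python
import csv
from collections import defaultdict


def subset_metadata(src_path, dst_path, label_column, per_class=3):
    """Copy at most `per_class` rows for each value of `label_column`."""
    kept = defaultdict(int)  # number of rows written so far, per label value
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            label = row[label_column]
            if kept[label] < per_class:
                writer.writerow(row)
                kept[label] += 1
    # Report how many rows were kept for each class.
    return dict(kept)
```

Calling it with the original metadata file, a new output path, and the name of the label column writes a file with at most three repertoires per class; the import's metadata file setting would then need to point at the new file.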
Since this issue as of now does not point towards a concrete bug, I will close it for now. Feel free to reach out on [email protected] if you have more questions.
from immuneml.
Hi LonnekeScheffer,
It's been more than 12 hours and it is still showing the same thing.
My sample details are: 15 breast cancer study samples and 30 control samples.
I have attached my YAML specifications below.
1. Converted the ImmunoSEQRearrangement data into immuneML format.
Script for conversion:
definitions:
  datasets:
    dataset:
      format: ImmunoSEQSample
      params:
        is_repertoire: true
        metadata_file: /data/metadata2.csv
        path: /data/DataS2/
        region_type: IMGT_CDR3
        result_path: /data/
instructions:
  my_dataset_generation_instruction:
    datasets:
    - dataset
    export_formats:
    - ImmuneML
    type: DatasetExport
2. Used the following script to train the model:
definitions:
  datasets:
    dataset:
      format: ImmuneML
      params:
        path: /data/
        result_path: /data/results
  encodings:
    encoding_1:
      KmerFrequency:
        k: 3
        reads: all
        sequence_encoding: CONTINUOUS_KMER
  ml_methods:
    k_nearest_neighbors:
      KNN:
        n_neighbors:
        - 3
        - 5
        - 7
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    logistic_regression:
      LogisticRegression:
        C:
        - 0.01
        - 0.1
        - 1
        - 10
        - 100
        class_weight:
        - balanced
        penalty:
        - l1
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    random_forest:
      RandomForestClassifier:
        class_weight:
        - balanced
        n_estimators:
        - 10
        - 50
        - 100
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
    support_vector_machine:
      SVC:
        C:
        - 0.01
        - 0.1
        - 1
        - 10
        - 100
        class_weight:
        - balanced
        dual: false
        penalty:
        - l1
        show_warnings: true
      model_selection_cv: true
      model_selection_n_folds: 5
  motifs: {}
  preprocessing_sequences: {}
  reports:
    benchmark:
      MLSettingsPerformance:
        name: benchmark
        single_axis_labels: false
        x_label_position: -0.12
        y_label_position: -0.08
    coefficients:
      Coefficients:
        coefs_to_plot:
        - N_LARGEST
        n_largest:
        - 25
        name: coefficients
  signals: {}
  simulations: {}
instructions:
  inst1:
    assessment:
      reports:
        models:
        - coefficients
      split_count: 5
      split_strategy: random
      training_percentage: 0.7
    dataset: dataset
    labels:
    - signal_disease
    metrics: []
    number_of_processes: 10
    optimization_metric: accuracy
    refit_optimal_model: true
    reports:
    - benchmark
    selection:
      split_count: 1
      split_strategy: random
      training_percentage: 0.7
    settings:
    - encoding: encoding_1
      ml_method: random_forest
      preprocessing: null
    - encoding: encoding_1
      ml_method: logistic_regression
      preprocessing: null
    - encoding: encoding_1
      ml_method: support_vector_machine
      preprocessing: null
    - encoding: encoding_1
      ml_method: k_nearest_neighbors
      preprocessing: null
    strategy: GridSearch
    type: TrainMLModel
output:
  format: HTML
from immuneml.
Hi Genokarma,
I don't think there is necessarily any reason why this should not work. The dataset does not seem extremely large. For debugging purposes, I recommend the following steps:
- kill the existing run
- make sure you have the latest version of immuneML installed
- test if the immuneML installation works correctly according to the documentation: https://docs.immuneml.uio.no/latest/installation/install_with_package_manager.html#testing-immuneml
- try running the quickstart example: https://docs.immuneml.uio.no/latest/quickstart/cli_yaml.html
- try to run the TrainMLModel instruction with a minimal example, for instance, only running logistic regression, or perhaps a smaller dataset as well.
By following these steps, we can pinpoint where the issue might be (e.g., if there is something wrong with the installation, the computer setup, or the dataset). I don't believe there is a bug in immuneML that is causing this, since everything runs like normal on our end, but if we do find such indication we will of course fix it as soon as possible.
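As an illustration of the last step, the specification can be cut down to a single ML method before running it. The sketch below is only an assumption of how one might do this: it expects the YAML to have already been loaded into a Python dict (for example with a third-party YAML parser, not shown), uses the key names from the specification posted above, and the helper name `keep_single_method` is hypothetical.

```python
import copy


def keep_single_method(spec, instruction, ml_method):
    """Return a copy of `spec` whose instruction only uses `ml_method`."""
    reduced = copy.deepcopy(spec)  # leave the caller's specification untouched
    inst = reduced["instructions"][instruction]
    # Keep only the encoding/ML-method combinations that use the chosen method.
    inst["settings"] = [s for s in inst["settings"] if s["ml_method"] == ml_method]
    # Drop the definitions of the ML methods that are no longer referenced.
    methods = reduced["definitions"]["ml_methods"]
    reduced["definitions"]["ml_methods"] = {ml_method: methods[ml_method]}
    return reduced
```

For the specification in this thread, `keep_single_method(spec, "inst1", "logistic_regression")` would leave one entry under `settings` and one under `ml_methods`; the result still has to be written back to a YAML file before it can be run.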
As a side note, it is not necessary to convert the dataset to immuneML format first (you can simply use the ImmunoSEQSample import in the same yaml as where the training happens), although it should work like this as well. Also, you have set the number of processes to 10, which may be alright, but please make sure the system you are running this on supports that number of CPUs (specifying too many processes can also slow down the runtime).
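A quick way to check how many CPUs are actually available is to ask Python from inside the environment where immuneML runs (for example inside the container). This is a rough sketch, not an immuneML feature: the helper names are hypothetical, and CPU quotas imposed on a container may not be reflected in these numbers.

```python
import os


def usable_cpu_count():
    """Best-effort count of CPUs this process may use."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects the CPU affinity mask of the current process.
        return len(os.sched_getaffinity(0))
    # Other platforms: logical CPU count; fall back to 1 if it is unknown.
    return os.cpu_count() or 1


def clamp_processes(requested, available=None):
    """Limit a requested process count to the CPUs that are available."""
    if available is None:
        available = usable_cpu_count()
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("usable CPUs:", usable_cpu_count())
    print("suggested number_of_processes:", clamp_processes(10))
```

If the suggested value is lower than the `number_of_processes` in the specification, lowering that setting is a reasonable first thing to try.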
from immuneml.
Hi LonnekeScheffer,
I want to express my gratitude for your assistance; your time and efforts are highly appreciated. For your reference, I've attached my dataset files and YAML script. I am using a Docker container, and the command details are provided in the attached README file. The dataset and YAML file are available at https://github.com/Genokarma/ImmuneMLTest
I've encountered an issue while running the process on two different systems. On my macOS system with 16 CPUs and 16 GB RAM, the process gets stuck at "parsing the specification." On the Linux system with 48 GB RAM, it gets stuck during encoding (encoding 1...). It has been more than 24 hours with no progress.
I have attempted to troubleshoot the problem on both systems without success. Could you please attempt to execute the process or provide any suggestions to address this issue? Your assistance in resolving this issue is invaluable.
from immuneml.
Dear GenoKarma,
Thanks for sharing the test dataset and YAML. I'm currently very busy (in preparation of my PhD defence), and I will have more time available in the last week of January. In the meantime, it would be helpful to try to run the test and Quickstart examples as mentioned in my previous comments. These examples are small and known to take only a short time to run, and can help us find an indication of whether immuneML is actually getting "stuck" on your system, or simply takes a long time to run.
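To tell "stuck" apart from "slow" on such a small example, it can help to run it under a time limit and record how long it takes. The following is a generic sketch using only the Python standard library; the helper name `run_with_time_limit` is hypothetical, and the command to pass in depends on how immuneML is started on your system.

```python
import subprocess
import time


def run_with_time_limit(command, limit_seconds):
    """Run `command`; return (finished, elapsed_seconds, return_code)."""
    start = time.monotonic()
    try:
        result = subprocess.run(command, timeout=limit_seconds)
        return True, time.monotonic() - start, result.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this error.
        return False, time.monotonic() - start, None
```

The command is given as a list, for instance the program name followed by the specification file and the output directory. If a small example that normally finishes quickly hits the limit, that points towards the run being stuck rather than merely slow.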
from immuneml.
Thank you for your prompt response and for sharing the information. I completely understand that you're currently occupied with your PhD defense preparations. Wishing you the best of luck with your PhD defense.
I used the demo data during the installation process. In the meantime, I took your advice and ran the Quickstart example again, as you suggested. I'm pleased to report that the Quickstart demo ran smoothly and completed successfully. I have attached a screenshot for your reference.
[Screenshot from 2024-01-18 18-40-30]
I look forward to connecting with you again in the last week of January.
from immuneml.
Dear GenoKarma,
My apologies for the delay, it was a busy period. But I have good news: I finally managed to take a deeper look into this issue and implement a solution. I cloned your GitHub repository and tried to reproduce your immuneML run. I indeed discovered two issues, one bug and one performance issue, both of which were introduced during our recent large refactoring for the alpha version of immuneML 3.
Firstly, there was a bug in KmerFrequencyEncoder due to some changed variable names. If you encountered this bug, you would run into the following error message:
--- Exception in _encode_examples : 'SequenceMetadata' object has no attribute 'count'
This bug has been fixed in the latest version of immuneML. I originally thought that this bug might have been the culprit in your analysis. But when I ran immuneML on your entire dataset, I did not even reach the error above, because immuneML was taking a long time at an earlier step in the encoding process. I was able to locate and fix that issue as well, so the KmerFrequencyEncoder should now be a lot faster. With the updated code, encoding your dataset with 4 parallel processes took 8 minutes on my computer.
So in conclusion, if you reinstall the latest version of immuneML (version 3.0.0a3), encoding will be a lot faster. Since immuneML 3 is still in its 'alpha' version, there have been major refactorings and ongoing developments which have not yet been thoroughly tested. We therefore highly appreciate the user feedback, and I will try my best to resolve issues as soon as I can. However, if some issue is halting your work, it is always possible to downgrade to the latest stable immuneML release (v2.2.5).
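To confirm which version ended up installed after reinstalling or downgrading, the installed package metadata can be queried from Python. This is a small sketch using the standard library; it assumes the package is installed under the distribution name `immuneML`, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="immuneML"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print("immuneML version:", found if found else "not installed")
```

Running this inside the same environment or container that executes the analysis shows whether the run is using the alpha release or the stable release.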
All the best,
Lonneke
from immuneml.
Hi Lonneke,
Hope your viva went well! Thank you for reaching out.
Yes, I have tried your suggestions with the newer as well as the stable version. However, I am not able to run the analysis, as the newer version produces a different error. I have attached log.txt for your reference. Have you prepared a Docker image for the newer version of immuneML? If so, please share it with me.
log.txt
Once again thank you.
with regards.
from immuneml.