Classification of protein localization sites in yeast using a Random Forest classifier
The dataset can be obtained from: https://archive.ics.uci.edu/ml/datasets/Yeast
Attribute Information:
- Sequence Name: Accession number for the SWISS-PROT database
- mcg: McGeoch's method for signal sequence recognition.
- gvh: von Heijne's method for signal sequence recognition.
- alm: Score of the ALOM membrane spanning region prediction program.
- mit: Score of discriminant analysis of the amino acid content of the N-terminal region (20 residues long) of mitochondrial and non-mitochondrial proteins.
- erl: Presence of "HDEL" substring (thought to act as a signal for retention in the endoplasmic reticulum lumen). Binary attribute.
- pox: Peroxisomal targeting signal in the C-terminus.
- vac: Score of discriminant analysis of the amino acid content of vacuolar and extracellular proteins.
- nuc: Score of discriminant analysis of nuclear localization signals of nuclear and non-nuclear proteins.
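
The attribute list above maps onto a simple record layout. The following is a minimal sketch of parsing such records, assuming each line of the data file is whitespace-separated and holds the sequence name, the eight scores in the order listed, and a trailing localization-site label (the label column is not described above and is an assumption here). The names `YeastRecord`, `parse_yeast_lines`, and `to_xy`, as well as the sample rows, are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Feature names in the order given in the attribute list.
FEATURE_NAMES = ["mcg", "gvh", "alm", "mit", "erl", "pox", "vac", "nuc"]


@dataclass
class YeastRecord:
    sequence_name: str     # SWISS-PROT accession number
    features: List[float]  # eight scores, ordered as in FEATURE_NAMES
    label: str             # localization-site class (assumed last column)


def parse_yeast_lines(lines: Iterable[str]) -> List[YeastRecord]:
    """Parse whitespace-separated rows: name, 8 scores, class label."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(FEATURE_NAMES) + 2:
            raise ValueError(f"unexpected number of fields: {line!r}")
        name, *values, label = fields
        records.append(YeastRecord(name, [float(v) for v in values], label))
    return records


def to_xy(records: List[YeastRecord]) -> Tuple[List[List[float]], List[str]]:
    """Split records into a feature matrix X and a label vector y."""
    X = [r.features for r in records]
    y = [r.label for r in records]
    return X, y


# Made-up rows in the assumed layout (not taken from the real dataset).
SAMPLE = """\
PROT_A  0.50  0.40  0.50  0.20  0.50  0.00  0.50  0.22  CLASS_1
PROT_B  0.60  0.70  0.45  0.30  1.00  0.00  0.55  0.30  CLASS_2
"""

records = parse_yeast_lines(SAMPLE.splitlines())
X, y = to_xy(records)
```

The sequence name is kept out of `X` because it identifies a protein rather than describing it. The resulting `X` and `y` could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`.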