Abdullah Agah Kanat
Oct, 2022
In this project, we apply several data pre-processing techniques in Python to the expression data of 181 genes, each represented by 500 attributes. The dataset was read with the Pandas library and converted into a list. The matplotlib library was used to generate the plots, the random library was used to make random selections from the data, and the NumPy library was used for its arange() and linspace() functions. These are the only libraries required to run the program. Python IDLE was used as the editor, and user-defined functions were avoided as much as possible while writing the code. The calculations are written out simply and, in places, in more than one way. Where necessary, the code is explained with comment lines.
First, three randomly selected genes were discretized using the equal width approach and the equal frequency approach, and the distribution of the values of each gene is shown in a plot. Next, the Euclidean distance was calculated for five randomly selected genes, once for every pair (2-combination) of the five. Cosine similarity and then correlation were calculated for the same pairs of genes. Finally, all the data used were normalized according to the negative and positive numbers in each gene. All of the previous operations were then re-applied to the normalized data set, and the results are given in this report, with analysis and tables where necessary. The code and the dataset are made available with the report.
In this section, operations will be performed on the data set with the methods specified for Attribute Discretization and Attribute Similarity.
In this section, equal width approach and equal frequency approach will be applied for three randomly selected genes.
import pandas as pd
import matplotlib.pyplot as plt
import numpy
import random
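#Reading the dataset; each row of the list holds the values of one gene
df = pd.read_csv('PA1data')
data_list = df.values.tolist()
#Selecting three random row indexes (randint includes both end points)
random1 = random.randint(0,179)
random2 = random.randint(0,179)
random3 = random.randint(0,179)
#Taking the values of the three selected genes
value1 = data_list[random1]
value2 = data_list[random2]
value3 = data_list[random3]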
Figure 1 - Data Selection
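Three independent calls to randint() can return the same index more than once, in which case two of the "three genes" would be the same gene. The following is a minimal sketch of one way to avoid that, assuming the selection is meant to be distinct and repeatable. The function name pick_distinct_rows and the seed parameter are hypothetical and do not appear in the original program.

```python
import random

def pick_distinct_rows(n_rows, k, seed=None):
    """Return k distinct row indexes in the range [0, n_rows).

    A seed makes the selection repeatable between runs.
    """
    rng = random.Random(seed)
    return rng.sample(range(n_rows), k)

# 180 stands in for len(data_list); the seed value is arbitrary.
indexes = pick_distinct_rows(180, 3, seed=42)
print("Selected gene indexes:", indexes)
```

With a fixed seed, the same three genes are selected on every run, which makes the reported outputs easier to reproduce.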
We will use 4 groups for this process. First we find the width by dividing the difference between the largest and the smallest value of the gene by the number of groups. Adding the width to the smallest value gives the first border; adding it a second time gives the second border, and a third time the third border. These three borders split the value range into 4 intervals, and every value is placed in the interval it falls into.
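The steps described above can also be sketched as a single reusable routine. This is only an illustration under stated assumptions: the name equal_width_bins and the small sample list are hypothetical, and, unlike the program in this report, the sketch does not round the width, so the borders are not shifted by rounding.

```python
def equal_width_bins(values, n_bins=4):
    """Split values into n_bins intervals of equal width.

    Returns the width, the inner borders and the list of groups.
    A value equal to a border is placed in the group above it.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # Inner borders: lo + width, lo + 2*width, ..., lo + (n_bins-1)*width
    borders = [lo + width * k for k in range(1, n_bins)]
    groups = [[] for _ in range(n_bins)]
    for v in values:
        # The group index is the number of borders the value has reached.
        index = sum(v >= b for b in borders)
        groups[index].append(v)
    return width, borders, groups

# Hypothetical sample data, not taken from the gene dataset.
sample = [0.0, 1.0, 2.5, 4.0, 7.9, 8.0]
width, borders, groups = equal_width_bins(sample)
print("Width:", width)
print("Borders:", borders)
for number, group in enumerate(groups, start=1):
    print("Group", number, ":", group)
```

Because the interval width depends only on the smallest and the largest value, a few very large values can leave most of the data in the first group.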
#Discretization
#Question1>a>I-----------------------------------------------------------------------------
#Calculating width for 4 different groups
#Calculating the width
width = (max(value1)-min(value1))/4
width = round(width,1)
print("Width for {}. gene is : {}".format(random1, width) )
#Calculating the borders
firstline = min(value1)+width
secondline = min(value1)+width+width
thirdline = min(value1)+width+width+width
firstline = round(firstline,1)
secondline = round(secondline,1)
thirdline = round(thirdline,1)
group1forvalue1 = []
group2forvalue1 = []
group3forvalue1 = []
group4forvalue1 = []
#Placing the values to the groups for value 1 list
for i in range(500):
    if (value1[i]< firstline):
        group1forvalue1.append(value1[i])
    elif (value1[i]<secondline):
        group2forvalue1.append(value1[i])
    elif (value1[i]<thirdline):
        group3forvalue1.append(value1[i])
    else:
        group4forvalue1.append(value1[i])
print("Equal width approach for {}. gene : ".format(random1))
print("\nThe attributes between {} and {} :".format(min(value1), firstline))
print(group1forvalue1)
print("\nThe attributes between {} and {} :".format(firstline, secondline))
print(group2forvalue1)
print("\nThe attributes between {} and {} :".format(secondline, thirdline))
print(group3forvalue1)
print("\nThe attributes between {} and {} :".format(thirdline, max(value1)))
print(group4forvalue1)
if(len(group1forvalue1)>0):
    x_axis1_1 = numpy.linspace(min(group1forvalue1), max(group1forvalue1),len(group1forvalue1))
    plt.scatter(x_axis1_1 , group1forvalue1, color='g')
if(len(group2forvalue1)>0):
    x_axis2_1 = numpy.linspace(min(group2forvalue1), max(group2forvalue1),len(group2forvalue1))
    plt.scatter(x_axis2_1 , group2forvalue1, color='r')
if(len(group3forvalue1)>0):
    x_axis3_1 = numpy.linspace(min(group3forvalue1), max(group3forvalue1),len(group3forvalue1))
    plt.scatter(x_axis3_1, group3forvalue1, color='b')
if(len(group4forvalue1)>0):
    x_axis4_1 = numpy.linspace(min(group4forvalue1), max(group4forvalue1),len(group4forvalue1))
    plt.scatter(x_axis4_1, group4forvalue1, color='hotpink')
plt.axvline(x = firstline, color = 'r')
plt.axvline(x = secondline, color = 'b')
plt.axvline(x = thirdline, color = 'hotpink')
plt.title("Equal width approach for {}. gene : ".format(random1))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 2 - First Random Gene's Equal Width Approach Process
Width for 72. gene is : 1590.0
Equal width approach for 72. gene :
The attributes between -509.8 and 1080.2 :
[369.7, 49.3, -26.8, -187.7, 18.7, 935.5, 11.8, 475.7, 347.4, 969.6, 164.0, 9.6, 124.5, 11.6, 6.1, 105.7, -102.1, 11.2, 25.6, 24.4, 106.8, 27.4, 17.2, 1.5, -116.4, -29.4, 30.0, -30.6, 8.6, 3.6, 12.3, 36.1, 675.4, 61.9, 111.4, 24.5, 313.5, 452.3, 25.7, -11.2, 281.4, 424.1, 17.6, 23.5, -6.9, 173.7, 0.8, 95.2, 33.9, 32.5, 38.8, 43.4, -24.2, 48.4, 6.1, 42.6, 864.8, 11.5, 17.9, 77.9, 20.2, 7.6, -4.8, 0.8, 7.4, 47.9, 61.0, 216.0, 15.6, 93.8, 33.6, -43.8, 12.1, -143.2, 27.9, 95.7, 26.7, -41.9, 83.5, 330.7, 136.4, 3.7, 28.0, 11.8, -3.8, 111.3, 6.9, -33.1, 656.0, 40.0, 281.4, -13.2, 186.8, 9.5, -9.8, -7.0, -50.2, -94.7, 8.5, 90.1, 19.2, 15.8, -37.2, -2.0, 131.3, 41.5, 14.4, -107.8, 113.0, 146.5, -32.8, 43.9, -11.9, 393.6, 504.3, 373.4, 556.5, -12.3, -5.4, 207.3, 71.5, 112.1, 98.7, 10.4, 9.1, 41.1, 76.9, 378.9, 68.1, 278.0, 86.8, 8.7, 3.0, 348.9, -165.8, 20.9, 64.1, 102.0, 128.4, 101.2, -35.1, 23.3, 360.5, 184.7, 42.8, 11.0, 199.7, 23.7, 93.6, 62.7, 382.6, 13.0, 8.4, 32.9, 37.7, 46.6, 22.6, 25.3, -2.9, 97.9, 12.3, 100.5, -9.8, -7.1, 104.2, 528.9, -2.0, 105.0, 99.7, 234.0, 18.8, 15.5, 368.0, -7.8, 81.3, 401.9, 127.6, 28.9, 76.3, 10.1, 731.7, 135.0, 14.1, -2.5, 24.2, 94.0, 30.8, -28.5, -17.2, 19.9, 75.7, 46.9, -1.5, 676.7, 42.9, 12.4, 8.9, 13.5, 923.5, 68.5, -6.4, 16.6, -11.3, 81.8, -27.2, -3.8, 338.8, 51.4, -51.7, -23.5, 50.5, 72.9, 110.5, 4.3, -30.0, -0.3, -7.7, 8.1, -53.7, -1.1, 21.4, 42.5, -87.3, 8.6, 162.8, -134.2, 31.9, 39.6, 23.2, -19.9, 24.4, 147.4, -34.0, 618.7, 89.9, 19.7, -2.3, 31.3, 125.8, 236.3, 5.8, -47.0, 229.1, 151.7, -53.4, 215.6, 30.2, 41.5, -96.2, -0.9, -5.3, 606.7, 56.7, 15.3, 25.4, 135.9, 425.9, 97.8, 131.5, 177.1, 7.5, -7.2, -46.0, 342.7, 5.4, 75.6, -54.5, 256.7, 86.6, 100.3, 43.8, 32.8, 83.0, 60.9, 57.8, -144.7, 53.8, -8.0, 118.9, 8.8, 15.1, -15.4, 14.7, 848.9, 24.7, -4.9, 82.6, 280.9, 68.0, 128.0, 87.2, 5.9, 153.5, 24.8, 56.0, 18.1, 132.0, 8.3, -5.0, -6.5, -37.6, -4.6, 7.1, 176.8, 95.0, 90.0, 60.0, 65.6, -27.3, -18.7, 253.1, 479.2, 139.3, -72.3, 1.7, 
-5.7, 50.5, 230.2, 28.8, 10.3, 126.6, 335.2, 23.2, 36.0, 366.7, -53.1, 705.9, 432.5, 426.7, 340.2, 329.6, 77.4, 84.6, 104.5, 52.6, 18.6, 10.8, 40.5, 73.1, 6.0, 72.8, 63.4, 56.0, 18.1, -146.1, -84.3, 32.6, 48.5, -46.9, -79.4, -52.5, 33.8, 19.1, 25.4, 157.4, 1052.4, -11.1, 44.1, 0.7, 25.4, -35.6, 8.1, 298.6, 188.6, 19.2, -2.7, -6.4, 21.6, 25.4, -71.1, 5.0, 50.1, 38.8, 250.1, 147.7, 109.1, 4.6, 7.1, 19.6, 89.5, 60.7, 2.0, -7.1, -12.3, 131.8, 18.1, -77.5, 276.5, 111.7, 60.4, 55.9, -509.8, 199.1, -31.1, 155.5, 170.4, -2.4, 11.7, 20.6, 11.1, 26.9, 232.4, 18.2, 164.1, -36.0, 61.6, 104.1, 85.4, 59.3, -82.3, 83.5, -2.1, 767.1, 539.0, 717.7, 54.9, 482.6, 348.5, -4.8, 3.0, 8.6, 243.3, 453.2, 30.9, 179.6, 38.3, 21.3, -14.5, 102.4, 235.5, 34.2, 46.5, 44.5, -16.1, -43.6, 236.6, -8.3, 3.6, -28.4, 82.9, -8.4, -2.5, 14.5, 18.1, 17.4, 245.4, 90.3, -15.7, -23.5, -3.7, 32.6, 118.0, 86.0, 30.9, -0.2, 0.0, -11.4, 84.1, 59.0, 15.5, 50.7, 6.8, 118.4, 136.8, -19.5, 86.6, 59.8, 257.6, 369.2, 520.2, -6.9, 923.0, 632.2, 112.1, 92.1, 41.6, 160.3, 143.2, 57.6, 32.8, -3.4, 63.2, 516.4, -180.4, 209.8]
The attributes between 1080.2 and 2670.2 :
[1106.5, 1598.1, 1443.0, 1325.0, 1814.9, 1916.6, 1106.7, 1399.9, 1361.0, 1232.2]
The attributes between 2670.2 and 4260.2 :
[2783.0, 3109.6, 2738.3, 3840.5]
The attributes between 4260.2 and 5850.3 :
[5850.3]
Figure 3 - Output of The First Random Gene's Equal Width Approach Process
Figure 4 - The Scatterplot of The First Random Gene's Equal Width Approach
#Calculating the width
width2 = (max(value2)-min(value2))/4
width2 =round(width2,1)
print("\n\nWidth for {}. gene is : {}".format(random2, width2) )
#Calculating the borders
firstline2 = min(value2)+width2
secondline2 = min(value2)+width2+width2
thirdline2 = min(value2)+width2+width2+width2
firstline2 = round(firstline2,1)
secondline2 = round(secondline2,1)
thirdline2 = round(thirdline2,1)
group1forvalue2 = []
group2forvalue2 = []
group3forvalue2 = []
group4forvalue2 = []
#Placing the values to the groups for value 2 list
for i in range(500):
    if (value2[i]< firstline2):
        group1forvalue2.append(value2[i])
    elif (value2[i]<secondline2):
        group2forvalue2.append(value2[i])
    elif (value2[i]<thirdline2):
        group3forvalue2.append(value2[i])
    else:
        group4forvalue2.append(value2[i])
print("Equal width approach for {}. gene : ".format(random2))
print("\nThe attributes between {} and {} :".format(min(value2), firstline2))
print(group1forvalue2)
print("\nThe attributes between {} and {} :".format(firstline2, secondline2))
print(group2forvalue2)
print("\nThe attributes between {} and {} :".format(secondline2, thirdline2))
print(group3forvalue2)
print("\nThe attributes between {} and {} :".format(thirdline2, max(value2)))
print(group4forvalue2)
if(len(group1forvalue2)>0):
    x_axis1_2 = numpy.linspace(min(group1forvalue2), max(group1forvalue2),len(group1forvalue2))
    plt.scatter(x_axis1_2, group1forvalue2, color='g')
if(len(group2forvalue2)>0):
    x_axis2_2 = numpy.linspace(min(group2forvalue2), max(group2forvalue2),len(group2forvalue2))
    plt.scatter(x_axis2_2, group2forvalue2, color='r')
if(len(group3forvalue2)>0):
    x_axis3_2 = numpy.linspace(min(group3forvalue2), max(group3forvalue2),len(group3forvalue2))
    plt.scatter(x_axis3_2, group3forvalue2, color='b')
if(len(group4forvalue2)>0):
    x_axis4_2 = numpy.linspace(min(group4forvalue2), max(group4forvalue2),len(group4forvalue2))
    plt.scatter(x_axis4_2, group4forvalue2, color='hotpink')
plt.axvline(x = firstline2, color = 'r')
plt.axvline(x = secondline2, color = 'b')
plt.axvline(x = thirdline2, color = 'hotpink')
plt.title("Equal width approach for {}. gene : ".format(random2))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 5 - Second Random Gene's Equal Width Approach Process
Width for 179. gene is : 4804.1
Equal width approach for 179. gene :
The attributes between -392.9 and 4411.2 :
[172.9, -5.4, -69.5, -80.6, -72.9, 222.3, 97.6, 292.9, 399.9, 301.0, 1.2, 179.7, 60.9, -54.2, 122.4, -71.4, -31.4, 97.3, 21.0, 192.8, 1.0, 196.2, 14.7, -33.2, -38.6, -25.1, 17.9, 60.5, 129.8, 271.4, -36.6, 157.1, 532.0, 56.0, 260.1, 51.5, 270.1, 594.3, 84.3, 191.6, 253.1, 517.5, -133.4, -11.8, 66.1, 31.3, -55.4, 110.9, 71.8, 68.0, 3.5, 1.3, -159.6, -52.8, -83.1, -3.0, 1512.4, -58.4, 12.7, -19.4, 16.9, -59.9, -42.4, 12.9, -20.8, 36.0, 64.6, 209.9, -11.7, -30.3, 87.0, 103.0, 93.8, -236.9, 105.2, 27.7, -4.4, -14.3, 83.1, 222.1, 44.2, 9.6, -19.2, 43.3, 44.7, 69.8, 155.4, -85.4, 713.5, 136.3, 97.6, -69.5, 196.5, 29.5, 85.5, -83.4, -17.6, -153.6, 24.9, 94.7, -103.1, 11.5, -57.7, -111.7, 212.0, 177.0, -61.3, -76.3, 62.8, 226.6, 100.0, 12.1, -31.9, 551.6, 691.5, 196.0, 818.5, -40.8, -87.6, 114.6, 115.1, 128.8, 85.2, -58.9, -4.3, -85.4, 112.1, 386.8, 154.9, 321.8, -74.9, 103.8, -35.0, 449.0, -121.7, -2.9, 120.0, 129.0, 408.3, 99.9, -43.4, 40.6, 224.4, 155.5, 61.8, 113.6, 292.2, 148.7, 10.5, 141.7, 1073.6, -33.1, -102.2, -36.2, 31.0, 115.5, -15.0, 122.4, -106.6, 78.6, 26.6, 124.1, -58.8, 63.0, 102.2, 516.4, -30.0, -392.9, 31.8, 463.6, 45.3, -5.8, 365.6, -9.8, 75.9, 78.8, 382.3, 66.1, -59.2, -29.8, 45.1, 393.7, 312.6, -12.9, 31.1, -21.8, 185.0, 65.7, -244.9, 3692.8, -14.3, -14.9, 76.4, 86.9, 26.0, 472.9, 62.5, 1230.1, 28.2, -29.8, 104.0, 761.1, 89.1, 71.0, 78.2, -99.4, 21.4, -109.6, -57.6, 14.6, -5.8, -147.1, 61.0, 2.3, 44.4, 71.0, 50.0, 926.7, -68.2, 37.8, -117.6, 122.0, -215.1, -14.0, 62.7, 76.6, -33.3, -84.7, 230.1, -80.7, 77.9, 29.9, -14.5, -50.7, 21.4, -30.0, -77.8, 534.5, 110.9, 142.9, 4.0, 20.1, 114.8, 227.4, 20.5, -128.1, 2.4, -35.6, -78.8, 226.8, 9.2, 27.5, -153.6, -30.7, 1044.9, -63.9, 1012.2, 95.4, -57.4, 25.2, 113.7, 627.7, 15.3, 332.7, 232.7, -33.2, -55.4, -0.4, 109.6, 26.0, 30.7, -61.4, 237.3, 60.5, 152.3, -117.1, -24.4, 185.2, -15.3, -94.7, -86.5, 95.4, 29.1, 49.0, 12.9, 65.1, -89.6, 39.2, 694.7, 122.0, -92.2, 135.1, 244.0, -18.1, 53.9, 123.1, 3.0, -30.7, 
204.1, 66.1, 55.3, 244.0, -131.3, 20.2, -38.1, -14.3, 50.4, 52.0, 1.7, -46.2, 112.9, 93.1, 6.8, -12.1, -91.4, 155.3, 757.2, 171.2, 34.0, 7.0, 27.5, 72.3, 151.2, 97.4, -88.7, 122.2, 145.8, 13.6, 143.4, 348.9, -9.9, 383.5, 419.1, 194.6, 154.2, 243.9, 313.0, -12.8, -16.7, 26.7, -0.3, -152.7, 34.7, 158.0, 1673.7, 68.0, 73.4, 82.5, 106.2, -64.3, -261.3, -139.4, 146.8, 94.5, -62.2, -64.9, -66.3, -55.4, -27.3, 14.3, 553.5, 341.4, -43.7, 37.2, 61.0, -59.1, -34.2, 68.8, -30.8, 123.4, 40.8, -116.6, -47.2, -45.9, -38.1, -63.2, -38.7, 56.5, 83.6, 150.7, 236.2, 1570.3, -14.0, -2.7, -6.2, 152.7, 83.5, -43.4, 11.1, -58.3, 286.9, 77.6, 41.3, 587.5, 30.3, 1.1, 15.2, -267.5, 516.4, 40.9, 136.8, 79.3, 200.4, -72.1, 29.2, -11.8, 36.3, 186.7, 110.2, 976.3, 337.4, -65.4, 128.9, 232.1, -28.6, 115.5, 36.0, 247.2, -36.4, 811.3, 728.7, 458.2, 89.6, 342.9, 205.8, -89.9, 74.2, 87.9, 227.4, 370.4, 46.5, 248.9, 195.2, -9.9, -23.4, 153.7, 205.8, -38.7, 330.8, 5.9, -14.6, 18.6, 699.4, 10.9, 52.7, -1.0, -100.7, 10.5, 692.4, 8.0, 59.1, 10.5, 1391.2, 13.0, 92.4, 105.9, -11.7, 67.6, -132.7, 112.5, -156.4, 22.7, 128.4, -84.0, -99.4, -60.2, 70.8, 21.8, -41.6, 57.8, 8.6, 191.1, 210.6, -59.8, -119.7, 114.9, 194.7, 598.7, 27.5, 143.5, -105.9, 499.0, 1028.3, 87.4, -20.4, 104.2, 307.7, 58.6, 41.5, 8.7, 75.1, 42.0, 1226.0, -140.7, 208.9]
The attributes between 4411.2 and 9215.3 :
[7990.2, 4446.6, 4720.1]
The attributes between 9215.3 and 14019.4 :
[9795.0]
The attributes between 14019.4 and 18823.6 :
[18823.6]
Figure 6 - Output of The Second Random Gene's Equal Width Approach Process
Figure 7 - The Scatterplot of The Second Random Gene's Equal Width Approach
#Calculating the width
width3 = (max(value3)-min(value3))/4
width3 = round(width3,1)
print("\n\nWidth for {}. gene is : {}".format(random3, width3) )
#Calculating the borders
firstline3 = min(value3)+width3
secondline3 = min(value3)+width3+width3
thirdline3 = min(value3)+width3+width3+width3
firstline3 = round(firstline3,1)
secondline3 = round(secondline3,1)
thirdline3 = round(thirdline3,1)
group1forvalue3 = []
group2forvalue3 = []
group3forvalue3 = []
group4forvalue3 = []
#Placing the values to the groups for value 3 list
for i in range(500):
    if (value3[i]< firstline3):
        group1forvalue3.append(value3[i])
    elif (value3[i]<secondline3):
        group2forvalue3.append(value3[i])
    elif (value3[i]<thirdline3):
        group3forvalue3.append(value3[i])
    else:
        group4forvalue3.append(value3[i])
print("Equal width approach for {}. gene : ".format(random3))
print("\nThe attributes between {} and {} :".format(min(value3), firstline3))
print(group1forvalue3)
print("\nThe attributes between {} and {} :".format(firstline3, secondline3))
print(group2forvalue3)
print("\nThe attributes between {} and {} :".format(secondline3, thirdline3))
print(group3forvalue3)
print("\nThe attributes between {} and {} :".format(thirdline3, max(value3)))
print(group4forvalue3)
if(len(group1forvalue3)>0):
    x_axis1_3 = numpy.linspace(min(group1forvalue3), max(group1forvalue3),len(group1forvalue3))
    plt.scatter(x_axis1_3, group1forvalue3, color='g')
if(len(group2forvalue3)>0):
    x_axis2_3 = numpy.linspace(min(group2forvalue3), max(group2forvalue3),len(group2forvalue3))
    plt.scatter(x_axis2_3, group2forvalue3, color='r')
if(len(group3forvalue3)>0):
    x_axis3_3 = numpy.linspace(min(group3forvalue3), max(group3forvalue3),len(group3forvalue3))
    plt.scatter(x_axis3_3, group3forvalue3, color='b')
if(len(group4forvalue3)>0):
    x_axis4_3 = numpy.linspace(min(group4forvalue3), max(group4forvalue3),len(group4forvalue3))
    plt.scatter(x_axis4_3, group4forvalue3, color='hotpink')
plt.axvline(x = firstline3, color = 'r')
plt.axvline(x = secondline3, color = 'b')
plt.axvline(x = thirdline3, color = 'hotpink')
plt.title("Equal width approach for {}. gene : ".format(random3))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 8 - Third Random Gene's Equal Width Approach Process
Width for 126. gene is : 3230.3
Equal width approach for 126. gene :
The attributes between -278.9 and 2951.4 :
[205.8, 86.9, 52.7, -133.2, 57.6, 2040.1, 510.4, 394.0, 874.2, 683.4, 127.6, 20.9, 258.9, 26.9, 6.3, 70.9, 6.1, 29.0, 33.7, -57.5, 75.4, 3.3, 430.2, 6.7, 12.5, -185.5, -40.1, 27.6, 47.4, 41.6, 59.6, 16.7, 31.9, 298.4, 43.2, 101.5, 30.3, 272.7, 283.2, 20.5, -26.7, 389.8, 400.9, 31.7, 41.7, 69.9, 120.0, 1.0, 55.1, 67.5, 26.1, -4.0, 25.3, -14.9, 89.7, -12.1, -22.0, 1675.0, -5.1, 36.5, 49.1, 13.3, 233.7, -37.3, 11.0, 13.3, 24.7, 64.8, 131.2, 73.1, 53.0, 17.0, -9.2, 141.2, 6.4, 138.6, 62.1, 58.9, 31.6, 117.2, 251.0, 186.9, 12.1, 15.8, -6.0, 1.9, 53.6, -28.9, -32.6, 364.8, 20.8, 943.8, -29.9, 187.0, -16.4, -0.7, -0.7, -40.7, -48.1, 39.3, 43.0, 1.0, 11.7, -26.5, -1.0, 185.2, 101.5, -14.6, -71.3, 73.0, 141.3, 10.4, 64.0, -35.0, 298.3, 294.0, 132.2, 620.9, 4.6, 0.6, 376.6, 82.6, 1.3, 142.6, 182.3, 66.6, 48.0, 105.8, 649.4, 104.5, 221.7, 73.9, 4.8, 11.4, 352.6, -39.9, 56.3, 229.4, 348.2, 241.2, 43.7, 0.6, -11.2, 121.1, 265.2, -14.8, 22.7, 267.9, 99.4, -1.6, 58.9, 450.0, 19.0, 14.9, -34.8, 73.4, 69.2, 45.2, 20.0, 35.2, 73.0, 43.0, 31.4, 7.3, -61.8, 119.5, 582.0, 15.3, 111.0, 100.1, 217.0, 26.4, -0.5, 457.4, 20.3, 89.3, 187.1, 1233.9, 119.3, 35.3, 5.9, 48.2, 398.5, 266.2, -2.5, 26.7, 28.5, 128.6, 55.1, -75.0, -23.1, 41.2, 41.5, 83.6, 3.7, 622.3, 146.8, 1094.4, 28.2, 4.0, 26.3, 400.7, 138.6, -1.4, 19.7, -13.6, 66.1, -6.0, 6.4, 163.8, 66.4, -41.2, -28.9, 4.3, 30.1, 17.5, 26.0, 709.6, -94.3, 20.2, -3.4, 22.8, -70.4, 27.2, 46.2, 70.0, -48.8, -11.2, 115.8, -89.7, 39.4, 15.9, 35.0, -19.2, 27.5, 106.1, -33.5, 460.5, 156.3, 46.7, 0.6, 76.7, 125.8, 111.6, -11.2, -52.3, 110.7, 35.2, -6.3, 8.2, 29.0, 17.8, -104.9, -23.4, 886.1, 10.1, 832.7, 19.8, 3.1, 8.5, 70.3, 525.0, 170.6, 130.7, 211.2, 17.2, 1.4, -28.9, 220.2, 7.5, 43.8, 51.4, 263.5, 54.2, 95.9, 34.3, 34.7, 283.9, 49.0, 122.3, -101.7, 3.7, 14.8, 105.6, 11.9, 21.0, 19.7, 79.2, 703.0, 15.9, -1.0, 257.3, 288.1, 12.8, 157.1, 130.6, 20.9, 198.2, 149.7, 81.8, -17.8, 261.3, -36.9, 22.6, 16.4, 17.9, 7.6, 6.0, 168.6, 77.7, 118.6, 72.6, 96.2, 
-56.0, -24.5, 188.5, 294.9, 168.2, 23.6, -0.3, -5.5, 129.2, 202.1, 36.7, 13.2, 71.5, 281.3, 44.2, 54.3, 278.7, 113.3, 454.2, 422.7, 374.7, 246.5, 154.7, 65.7, 34.8, 122.6, 56.5, 25.4, -28.5, 52.1, 75.5, 1823.9, 47.5, 106.3, 91.2, 71.6, 22.3, -92.7, -41.8, 41.8, 55.8, 6.6, 36.9, -35.6, 44.0, 56.1, 5.8, 146.3, 545.8, -24.7, 130.4, 8.3, 81.4, 17.2, -26.6, 563.1, 138.1, 17.5, -4.2, 9.9, -7.5, 68.8, -80.8, 58.3, 14.0, 26.9, 136.2, 127.5, 179.4, -16.2, -13.0, 23.5, 81.6, 106.7, 45.3, -19.9, 2.9, 330.3, 161.8, -127.8, 328.4, 447.4, 77.4, 52.3, -278.9, 314.5, 6.4, 126.5, 50.5, 61.4, -45.2, 37.5, 0.5, 23.6, 143.7, 15.3, 783.7, 166.4, -41.7, 78.8, 27.2, 120.5, 111.9, -91.9, 359.6, -6.2, 644.5, 485.6, 795.3, 75.0, 408.1, 320.0, -18.0, 36.9, -61.9, 181.0, 195.7, 8.9, 100.7, 15.2, 21.7, -1.9, 143.8, 200.4, 24.6, 284.5, -26.0, -10.3, -23.4, 267.1, 3.1, 5.9, -49.0, 87.8, -0.1, 1282.2, 1.4, 8.0, 26.6, 912.2, -17.3, 187.2, 95.7, -46.9, -3.2, -29.3, 57.1, 69.6, 86.8, 12.6, 3.5, 16.8, 18.3, 76.3, 178.8, 20.2, 51.4, 17.5, 187.2, 136.7, -13.1, 127.8, 81.5, 203.0, 764.1, 203.5, 169.2, 11.7, 322.0, 935.1, 91.8, 134.3, 105.8, 206.4, 147.3, -125.5, 108.2, 7.7, 104.6, 1294.1, -178.1, 215.1]
The attributes between 2951.4 and 6181.7 :
[3575.0, 3295.4, 3754.2, 5753.0]
The attributes between 6181.7 and 9412.0 :
[]
The attributes between 9412.0 and 12642.2 :
[12642.2]
Figure 9 - Output of The Third Random Gene's Equal Width Approach Process
Figure 10 - The Scatterplot of The Third Random Gene's Equal Width Approach
We will again use 4 groups for this process. First we find the frequency, that is, the number of values per group, by dividing the number of values in the gene by the number of groups. Counting from position zero, the first frequency value is the first limit; adding the same frequency value again gives the second and the third limits. The values are then placed in the 4 groups according to their position in the list, in their original order, and any remaining values go to the last group.
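The split described above works on list positions. Equal frequency discretization is more commonly defined on the sorted values, so that each group covers its own range of values. The following is a minimal sketch of that variant, given only for comparison. The name equal_frequency_bins and the sample list are hypothetical, and placing the remainder in the first groups is an assumption of this sketch, whereas the program in this report places it in the last group.

```python
def equal_frequency_bins(values, n_bins=4):
    """Split the sorted values into n_bins groups of (almost) equal size.

    When the number of values is not divisible by n_bins, the first
    groups receive one extra value each.
    """
    ordered = sorted(values)
    base, extra = divmod(len(ordered), n_bins)
    groups = []
    start = 0
    for k in range(n_bins):
        size = base + (1 if k < extra else 0)
        groups.append(ordered[start:start + size])
        start += size
    return groups

# Hypothetical sample data, not taken from the gene dataset.
sample = [5.0, -1.0, 9.0, 3.0, 7.0, 0.0, 2.0, 8.0, 4.0]
for number, group in enumerate(equal_frequency_bins(sample), start=1):
    print("Group", number, ":", group)
```

With sorted values, the largest values of a gene all fall in the last group instead of being spread over the groups by their position.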
#Question1>a>II-----------------------------------------------------------------------------
#Calculating the frequency for any object
print(len(value1))
frequency = int(len(value1)/4)
print("\n\n\n\nThe Frequency is : ", frequency)
group1_forvalue1_forfrequency = []
group2_forvalue1_forfrequency = []
group3_forvalue1_forfrequency = []
group4_forvalue1_forfrequency = []
for i in range(frequency):
    group1_forvalue1_forfrequency.append(value1[i])
for i in range(frequency,2*frequency):
    group2_forvalue1_forfrequency.append(value1[i])
for i in range(2*frequency,3*frequency):
    group3_forvalue1_forfrequency.append(value1[i])
for i in range(3*frequency,(int(len(value1)))):
    group4_forvalue1_forfrequency.append(value1[i])
print("\n\nEqual frequency approach for {}. gene : ".format(random1))
print("\nThe attributes between {} and {} :".format(0, frequency))
print(group1_forvalue1_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency, frequency*2))
print(group2_forvalue1_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*2, frequency*3))
print(group3_forvalue1_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*3, len(value1)))
print(group4_forvalue1_forfrequency)
x1_1 = numpy.arange(0,frequency)
plt.scatter(x1_1, group1_forvalue1_forfrequency, color='g')
x1_2 = numpy.arange(frequency,2*frequency)
plt.scatter(x1_2, group2_forvalue1_forfrequency, color='r')
x1_3 = numpy.arange(2*frequency,3*frequency)
plt.scatter(x1_3, group3_forvalue1_forfrequency, color='b')
x1_4 = numpy.arange(3*frequency,len(value1))
plt.scatter(x1_4, group4_forvalue1_forfrequency, color='hotpink')
plt.axvline(x = frequency, color = 'r')
plt.axvline(x = 2*frequency, color = 'b')
plt.axvline(x = 3*frequency, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene : ".format(random1))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 11 - First Random Gene's Equal Frequency Approach Process
The Frequency is : 125
Equal frequency approach for 72. gene :
The attributes between 0 and 125 :
[369.7, 49.3, -26.8, -187.7, 18.7, 935.5, 11.8, 475.7, 347.4, 969.6, 164.0, 9.6, 124.5, 11.6, 6.1, 105.7, -102.1, 11.2, 25.6, 24.4, 106.8, 27.4, 1106.5, 17.2, 1.5, -116.4, -29.4, 30.0, -30.6, 8.6, 3.6, 12.3, 36.1, 675.4, 61.9, 111.4, 24.5, 313.5, 452.3, 25.7, -11.2, 281.4, 424.1, 17.6, 23.5, -6.9, 173.7, 0.8, 95.2, 33.9, 32.5, 38.8, 43.4, -24.2, 48.4, 6.1, 42.6, 864.8, 11.5, 17.9, 77.9, 20.2, 7.6, -4.8, 0.8, 7.4, 47.9, 61.0, 216.0, 15.6, 93.8, 33.6, -43.8, 12.1, -143.2, 27.9, 95.7, 26.7, -41.9, 83.5, 330.7, 136.4, 3.7, 28.0, 11.8, -3.8, 111.3, 6.9, -33.1, 656.0, 40.0, 281.4, -13.2, 186.8, 9.5, -9.8, -7.0, -50.2, -94.7, 8.5, 90.1, 19.2, 15.8, -37.2, -2.0, 131.3, 41.5, 14.4, -107.8, 113.0, 146.5, -32.8, 43.9, -11.9, 393.6, 504.3, 373.4, 556.5, -12.3, -5.4, 207.3, 71.5, 112.1, 98.7, 10.4]
The attributes between 125 and 250 :
[9.1, 41.1, 76.9, 378.9, 68.1, 278.0, 86.8, 8.7, 3.0, 348.9, -165.8, 20.9, 64.1, 102.0, 128.4, 101.2, -35.1, 23.3, 360.5, 184.7, 42.8, 11.0, 199.7, 23.7, 93.6, 62.7, 382.6, 13.0, 8.4, 32.9, 37.7, 46.6, 22.6, 25.3, -2.9, 97.9, 12.3, 100.5, -9.8, -7.1, 104.2, 528.9, -2.0, 105.0, 99.7, 234.0, 18.8, 15.5, 368.0, -7.8, 81.3, 401.9, 1598.1, 127.6, 28.9, 76.3, 10.1, 731.7, 135.0, 14.1, -2.5, 24.2, 94.0, 30.8, -28.5, 2783.0, -17.2, 19.9, 75.7, 46.9, -1.5, 676.7, 42.9, 1443.0, 12.4, 8.9, 13.5, 923.5, 68.5, -6.4, 16.6, -11.3, 81.8, -27.2, -3.8, 338.8, 51.4, -51.7, -23.5, 50.5, 72.9, 110.5, 4.3, 1325.0, -30.0, -0.3, -7.7, 8.1, -53.7, -1.1, 21.4, 42.5, -87.3, 8.6, 162.8, -134.2, 31.9, 39.6, 23.2, -19.9, 24.4, 147.4, -34.0, 618.7, 89.9, 19.7, -2.3, 31.3, 125.8, 236.3, 5.8, -47.0, 229.1, 151.7, -53.4]
The attributes between 250 and 375 :
[215.6, 30.2, 41.5, -96.2, -0.9, 1814.9, -5.3, 606.7, 56.7, 15.3, 25.4, 135.9, 425.9, 97.8, 131.5, 177.1, 7.5, -7.2, -46.0, 342.7, 5.4, 75.6, -54.5, 256.7, 86.6, 100.3, 43.8, 32.8, 83.0, 60.9, 57.8, -144.7, 53.8, -8.0, 118.9, 8.8, 15.1, -15.4, 14.7, 848.9, 24.7, -4.9, 82.6, 280.9, 68.0, 128.0, 87.2, 5.9, 153.5, 24.8, 56.0, 18.1, 132.0, 8.3, -5.0, -6.5, -37.6, -4.6, 7.1, 176.8, 5850.3, 95.0, 90.0, 60.0, 65.6, -27.3, -18.7, 253.1, 479.2, 139.3, -72.3, 1.7, -5.7, 50.5, 230.2, 28.8, 10.3, 126.6, 335.2, 23.2, 36.0, 366.7, -53.1, 705.9, 432.5, 426.7, 340.2, 329.6, 3109.6, 77.4, 84.6, 104.5, 52.6, 18.6, 10.8, 40.5, 73.1, 1916.6, 6.0, 72.8, 63.4, 56.0, 18.1, -146.1, -84.3, 32.6, 48.5, -46.9, -79.4, -52.5, 33.8, 19.1, 25.4, 157.4, 1052.4, -11.1, 44.1, 0.7, 25.4, -35.6, 8.1, 298.6, 188.6, 19.2, -2.7]
The attributes between 375 and 501 :
[-6.4, 21.6, 25.4, -71.1, 5.0, 50.1, 38.8, 250.1, 147.7, 109.1, 4.6, 7.1, 19.6, 89.5, 60.7, 2.0, -7.1, -12.3, 2738.3, 3840.5, 131.8, 18.1, -77.5, 276.5, 111.7, 60.4, 55.9, -509.8, 199.1, -31.1, 155.5, 170.4, -2.4, 11.7, 20.6, 11.1, 26.9, 232.4, 18.2, 1106.7, 164.1, -36.0, 61.6, 104.1, 85.4, 59.3, -82.3, 83.5, -2.1, 767.1, 539.0, 717.7, 54.9, 482.6, 348.5, -4.8, 3.0, 8.6, 243.3, 453.2, 30.9, 179.6, 38.3, 21.3, -14.5, 102.4, 235.5, 34.2, 46.5, 44.5, -16.1, -43.6, 236.6, -8.3, 3.6, -28.4, 82.9, -8.4, 1399.9, -2.5, 14.5, 18.1, 1361.0, 17.4, 245.4, 90.3, -15.7, -23.5, -3.7, 32.6, 118.0, 86.0, 30.9, -0.2, 0.0, -11.4, 84.1, 59.0, 15.5, 50.7, 6.8, 118.4, 136.8, -19.5, 86.6, 59.8, 257.6, 1232.2, 369.2, 520.2, -6.9, 923.0, 632.2, 112.1, 92.1, 41.6, 160.3, 143.2, 57.6, 32.8, -3.4, 63.2, 516.4, -180.4, 209.8, 0.0]
Figure 12 - Output of The First Random Gene's Equal Frequency Approach Process
Figure 13 - The Scatterplot of The First Random Gene's Equal Frequency Approach
group1_forvalue2_forfrequency = []
group2_forvalue2_forfrequency = []
group3_forvalue2_forfrequency = []
group4_forvalue2_forfrequency = []
for i in range(frequency):
    group1_forvalue2_forfrequency.append(value2[i])
for i in range(frequency,2*frequency):
    group2_forvalue2_forfrequency.append(value2[i])
for i in range(2*frequency,3*frequency):
    group3_forvalue2_forfrequency.append(value2[i])
for i in range(3*frequency,(int(len(value2)))):
    group4_forvalue2_forfrequency.append(value2[i])
print("\n\nEqual frequency approach for {}. gene : ".format(random2))
print("\nThe attributes between {} and {} :".format(0, frequency))
print(group1_forvalue2_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency, frequency*2))
print(group2_forvalue2_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*2, frequency*3))
print(group3_forvalue2_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*3, len(value2)))
print(group4_forvalue2_forfrequency)
x2_1 = numpy.arange(0,frequency)
plt.scatter(x2_1, group1_forvalue2_forfrequency, color='g')
x2_2 = numpy.arange(frequency,2*frequency)
plt.scatter(x2_2, group2_forvalue2_forfrequency, color='r')
x2_3 = numpy.arange(2*frequency,3*frequency)
plt.scatter(x2_3, group3_forvalue2_forfrequency, color='b')
x2_4 = numpy.arange(3*frequency,len(value2))
plt.scatter(x2_4, group4_forvalue2_forfrequency, color='hotpink')
plt.axvline(x = frequency, color = 'r')
plt.axvline(x = 2*frequency, color = 'b')
plt.axvline(x = 3*frequency, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene : ".format(random2))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 14 - Second Random Gene's Equal Frequency Approach Process
Equal frequency approach for 179. gene :
The attributes between 0 and 125 :
[172.9, -5.4, -69.5, -80.6, -72.9, 222.3, 97.6, 292.9, 9795.0, 399.9, 301.0, 1.2, 179.7, 60.9, -54.2, 122.4, -71.4, -31.4, 97.3, 21.0, 192.8, 1.0, 196.2, 14.7, -33.2, -38.6, -25.1, 17.9, 60.5, 129.8, 271.4, -36.6, 157.1, 532.0, 56.0, 260.1, 51.5, 270.1, 594.3, 84.3, 191.6, 253.1, 517.5, -133.4, -11.8, 66.1, 31.3, -55.4, 110.9, 71.8, 68.0, 3.5, 1.3, -159.6, -52.8, -83.1, -3.0, 1512.4, -58.4, 12.7, -19.4, 16.9, -59.9, -42.4, 12.9, -20.8, 36.0, 64.6, 209.9, -11.7, -30.3, 87.0, 103.0, 93.8, -236.9, 105.2, 27.7, -4.4, -14.3, 83.1, 222.1, 44.2, 9.6, -19.2, 43.3, 44.7, 69.8, 155.4, -85.4, 713.5, 136.3, 97.6, -69.5, 196.5, 29.5, 85.5, -83.4, -17.6, -153.6, 24.9, 94.7, -103.1, 11.5, -57.7, -111.7, 212.0, 177.0, -61.3, -76.3, 62.8, 226.6, 100.0, 12.1, -31.9, 551.6, 691.5, 196.0, 818.5, -40.8, -87.6, 114.6, 115.1, 128.8, 85.2, -58.9]
The attributes between 125 and 250 :
[-4.3, -85.4, 112.1, 386.8, 154.9, 321.8, -74.9, 103.8, -35.0, 449.0, -121.7, -2.9, 120.0, 129.0, 408.3, 99.9, -43.4, 40.6, 224.4, 155.5, 61.8, 113.6, 292.2, 148.7, 10.5, 141.7, 1073.6, -33.1, -102.2, -36.2, 31.0, 115.5, -15.0, 122.4, -106.6, 78.6, 26.6, 124.1, -58.8, 63.0, 102.2, 516.4, -30.0, -392.9, 31.8, 463.6, 45.3, -5.8, 365.6, -9.8, 75.9, 78.8, 382.3, 66.1, -59.2, -29.8, 45.1, 393.7, 312.6, -12.9, 31.1, -21.8, 185.0, 65.7, -244.9, 3692.8, -14.3, -14.9, 76.4, 86.9, 26.0, 472.9, 62.5, 1230.1, 28.2, -29.8, 104.0, 761.1, 89.1, 71.0, 78.2, -99.4, 21.4, -109.6, -57.6, 14.6, -5.8, -147.1, 61.0, 2.3, 44.4, 71.0, 50.0, 926.7, -68.2, 37.8, -117.6, 122.0, -215.1, -14.0, 62.7, 76.6, -33.3, -84.7, 230.1, -80.7, 77.9, 29.9, -14.5, -50.7, 21.4, -30.0, -77.8, 534.5, 110.9, 142.9, 4.0, 20.1, 114.8, 227.4, 20.5, -128.1, 2.4, -35.6, -78.8]
The attributes between 250 and 375 :
[226.8, 9.2, 27.5, -153.6, -30.7, 1044.9, -63.9, 1012.2, 95.4, -57.4, 25.2, 113.7, 627.7, 15.3, 332.7, 232.7, -33.2, -55.4, -0.4, 109.6, 26.0, 30.7, -61.4, 237.3, 60.5, 152.3, -117.1, -24.4, 185.2, -15.3, -94.7, -86.5, 95.4, 29.1, 49.0, 12.9, 65.1, -89.6, 39.2, 694.7, 122.0, -92.2, 135.1, 244.0, -18.1, 53.9, 123.1, 3.0, -30.7, 204.1, 66.1, 55.3, 244.0, -131.3, 20.2, -38.1, -14.3, 50.4, 52.0, 1.7, 7990.2, -46.2, 112.9, 93.1, 6.8, -12.1, -91.4, 155.3, 757.2, 171.2, 34.0, 7.0, 27.5, 72.3, 151.2, 97.4, -88.7, 122.2, 145.8, 13.6, 143.4, 348.9, -9.9, 383.5, 419.1, 194.6, 154.2, 243.9, 4446.6, 313.0, -12.8, -16.7, 26.7, -0.3, -152.7, 34.7, 158.0, 1673.7, 68.0, 73.4, 82.5, 106.2, -64.3, -261.3, -139.4, 146.8, 94.5, -62.2, -64.9, -66.3, -55.4, -27.3, 14.3, 553.5, 341.4, -43.7, 37.2, 61.0, -59.1, -34.2, 68.8, -30.8, 123.4, 40.8, -116.6]
The attributes between 375 and 501 :
[-47.2, -45.9, -38.1, -63.2, -38.7, 56.5, 83.6, 150.7, 236.2, 1570.3, -14.0, -2.7, -6.2, 152.7, 83.5, -43.4, 11.1, -58.3, 18823.6, 4720.1, 286.9, 77.6, 41.3, 587.5, 30.3, 1.1, 15.2, -267.5, 516.4, 40.9, 136.8, 79.3, 200.4, -72.1, 29.2, -11.8, 36.3, 186.7, 110.2, 976.3, 337.4, -65.4, 128.9, 232.1, -28.6, 115.5, 36.0, 247.2, -36.4, 811.3, 728.7, 458.2, 89.6, 342.9, 205.8, -89.9, 74.2, 87.9, 227.4, 370.4, 46.5, 248.9, 195.2, -9.9, -23.4, 153.7, 205.8, -38.7, 330.8, 5.9, -14.6, 18.6, 699.4, 10.9, 52.7, -1.0, -100.7, 10.5, 692.4, 8.0, 59.1, 10.5, 1391.2, 13.0, 92.4, 105.9, -11.7, 67.6, -132.7, 112.5, -156.4, 22.7, 128.4, -84.0, -99.4, -60.2, 70.8, 21.8, -41.6, 57.8, 8.6, 191.1, 210.6, -59.8, -119.7, 114.9, 194.7, 598.7, 27.5, 143.5, -105.9, 499.0, 1028.3, 87.4, -20.4, 104.2, 307.7, 58.6, 41.5, 8.7, 75.1, 42.0, 1226.0, -140.7, 208.9, 0.0]
Figure 15 - Output of The Second Random Gene's Equal Frequency Approach Process
Figure 16 - The Scatterplot of The Second Random Gene's Equal Frequency Approach
group1_forvalue3_forfrequency = []
group2_forvalue3_forfrequency = []
group3_forvalue3_forfrequency = []
group4_forvalue3_forfrequency = []
for i in range(frequency):
    group1_forvalue3_forfrequency.append(value3[i])
for i in range(frequency,2*frequency):
    group2_forvalue3_forfrequency.append(value3[i])
for i in range(2*frequency,3*frequency):
    group3_forvalue3_forfrequency.append(value3[i])
#The last group takes every remaining attribute of value3
for i in range(3*frequency,len(value3)):
    group4_forvalue3_forfrequency.append(value3[i])
print("\n\nEqual frequency approach for {}. gene : ".format(random3))
print("\nThe attributes between {} and {} :".format(0, frequency))
print(group1_forvalue3_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency, frequency*2))
print(group2_forvalue3_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*2, frequency*3))
print(group3_forvalue3_forfrequency)
print("\nThe attributes between {} and {} :".format(frequency*3, len(value3)))
print(group4_forvalue3_forfrequency)
x3_1 = numpy.arange(0,frequency)
plt.scatter(x3_1, group1_forvalue3_forfrequency, color='g')
x3_2 = numpy.arange(frequency,2*frequency)
plt.scatter(x3_2, group2_forvalue3_forfrequency, color='r')
x3_3 = numpy.arange(2*frequency,3*frequency)
plt.scatter(x3_3, group3_forvalue3_forfrequency, color='b')
x3_4 = numpy.arange(3*frequency,len(value3))
plt.scatter(x3_4, group4_forvalue3_forfrequency, color='hotpink')
plt.axvline(x = frequency, color = 'r')
plt.axvline(x = 2*frequency, color = 'b')
plt.axvline(x = 3*frequency, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene : ".format(random3))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 17 - Third Random Gene's Equal Frequency Approach Process
Equal frequency approach for 126. gene :
The attributes between 0 and 125 :
[205.8, 86.9, 52.7, -133.2, 57.6, 2040.1, 510.4, 394.0, 874.2, 683.4, 127.6, 20.9, 258.9, 26.9, 6.3, 70.9, 6.1, 29.0, 33.7, -57.5, 75.4, 3.3, 430.2, 6.7, 12.5, -185.5, -40.1, 27.6, 47.4, 41.6, 59.6, 16.7, 31.9, 298.4, 43.2, 101.5, 30.3, 272.7, 283.2, 20.5, -26.7, 389.8, 400.9, 31.7, 41.7, 69.9, 120.0, 1.0, 55.1, 67.5, 26.1, -4.0, 25.3, -14.9, 89.7, -12.1, -22.0, 1675.0, -5.1, 36.5, 49.1, 13.3, 233.7, -37.3, 11.0, 13.3, 24.7, 64.8, 131.2, 73.1, 53.0, 17.0, -9.2, 141.2, 6.4, 138.6, 62.1, 58.9, 31.6, 117.2, 251.0, 186.9, 12.1, 15.8, -6.0, 1.9, 53.6, -28.9, -32.6, 364.8, 20.8, 943.8, -29.9, 187.0, -16.4, -0.7, -0.7, -40.7, -48.1, 39.3, 43.0, 1.0, 11.7, -26.5, -1.0, 185.2, 101.5, -14.6, -71.3, 73.0, 141.3, 10.4, 64.0, -35.0, 298.3, 294.0, 132.2, 620.9, 4.6, 0.6, 376.6, 82.6, 1.3, 142.6, 182.3]
The attributes between 125 and 250 :
[66.6, 48.0, 105.8, 649.4, 104.5, 221.7, 73.9, 4.8, 11.4, 352.6, -39.9, 56.3, 229.4, 348.2, 241.2, 43.7, 0.6, -11.2, 121.1, 265.2, -14.8, 22.7, 267.9, 99.4, -1.6, 58.9, 450.0, 19.0, 14.9, -34.8, 73.4, 69.2, 45.2, 20.0, 35.2, 73.0, 43.0, 31.4, 7.3, -61.8, 119.5, 582.0, 15.3, 111.0, 100.1, 217.0, 26.4, -0.5, 457.4, 20.3, 89.3, 187.1, 1233.9, 119.3, 35.3, 5.9, 48.2, 398.5, 266.2, -2.5, 26.7, 28.5, 128.6, 55.1, -75.0, 3575.0, -23.1, 41.2, 41.5, 83.6, 3.7, 622.3, 146.8, 1094.4, 28.2, 4.0, 26.3, 400.7, 138.6, -1.4, 19.7, -13.6, 66.1, -6.0, 6.4, 163.8, 66.4, -41.2, -28.9, 4.3, 30.1, 17.5, 26.0, 709.6, -94.3, 20.2, -3.4, 22.8, -70.4, 27.2, 46.2, 70.0, -48.8, -11.2, 115.8, -89.7, 39.4, 15.9, 35.0, -19.2, 27.5, 106.1, -33.5, 460.5, 156.3, 46.7, 0.6, 76.7, 125.8, 111.6, -11.2, -52.3, 110.7, 35.2, -6.3]
The attributes between 250 and 375 :
[8.2, 29.0, 17.8, -104.9, -23.4, 886.1, 10.1, 832.7, 19.8, 3.1, 8.5, 70.3, 525.0, 170.6, 130.7, 211.2, 17.2, 1.4, -28.9, 220.2, 7.5, 43.8, 51.4, 263.5, 54.2, 95.9, 34.3, 34.7, 283.9, 49.0, 122.3, -101.7, 3.7, 14.8, 105.6, 11.9, 21.0, 19.7, 79.2, 703.0, 15.9, -1.0, 257.3, 288.1, 12.8, 157.1, 130.6, 20.9, 198.2, 149.7, 81.8, -17.8, 261.3, -36.9, 22.6, 16.4, 17.9, 7.6, 6.0, 168.6, 12642.2, 77.7, 118.6, 72.6, 96.2, -56.0, -24.5, 188.5, 294.9, 168.2, 23.6, -0.3, -5.5, 129.2, 202.1, 36.7, 13.2, 71.5, 281.3, 44.2, 54.3, 278.7, 113.3, 454.2, 422.7, 374.7, 246.5, 154.7, 3295.4, 65.7, 34.8, 122.6, 56.5, 25.4, -28.5, 52.1, 75.5, 1823.9, 47.5, 106.3, 91.2, 71.6, 22.3, -92.7, -41.8, 41.8, 55.8, 6.6, 36.9, -35.6, 44.0, 56.1, 5.8, 146.3, 545.8, -24.7, 130.4, 8.3, 81.4, 17.2, -26.6, 563.1, 138.1, 17.5, -4.2]
The attributes between 375 and 501 :
[9.9, -7.5, 68.8, -80.8, 58.3, 14.0, 26.9, 136.2, 127.5, 179.4, -16.2, -13.0, 23.5, 81.6, 106.7, 45.3, -19.9, 2.9, 3754.2, 5753.0, 330.3, 161.8, -127.8, 328.4, 447.4, 77.4, 52.3, -278.9, 314.5, 6.4, 126.5, 50.5, 61.4, -45.2, 37.5, 0.5, 23.6, 143.7, 15.3, 783.7, 166.4, -41.7, 78.8, 27.2, 120.5, 111.9, -91.9, 359.6, -6.2, 644.5, 485.6, 795.3, 75.0, 408.1, 320.0, -18.0, 36.9, -61.9, 181.0, 195.7, 8.9, 100.7, 15.2, 21.7, -1.9, 143.8, 200.4, 24.6, 284.5, -26.0, -10.3, -23.4, 267.1, 3.1, 5.9, -49.0, 87.8, -0.1, 1282.2, 1.4, 8.0, 26.6, 912.2, -17.3, 187.2, 95.7, -46.9, -3.2, -29.3, 57.1, 69.6, 86.8, 12.6, 3.5, 16.8, 18.3, 76.3, 178.8, 20.2, 51.4, 17.5, 187.2, 136.7, -13.1, 127.8, 81.5, 203.0, 764.1, 203.5, 169.2, 11.7, 322.0, 935.1, 91.8, 134.3, 105.8, 206.4, 147.3, -125.5, 108.2, 7.7, 104.6, 1294.1, -178.1, 215.1, 0.0]
Figure 18 - Output of The Third Random Gene's Equal Frequency Approach Process
Figure 19 - The Scatterplot of The Third Random Gene's Equal Frequency Approach
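The listings above split each gene into four groups by position in the list. For comparison, equal frequency binning is often described as grouping the sorted values so that each bin holds roughly the same number of items. The following is only a minimal sketch of that variant; the function name equal_frequency_bins and the sample values are hypothetical and are not part of the project code.

```python
def equal_frequency_bins(values, n_bins=4):
    """Split values into n_bins groups holding (almost) the same number of items."""
    ordered = sorted(values)            # assumption: bin by value, so sort first
    size = len(ordered) // n_bins       # number of items per bin
    bins = []
    for b in range(n_bins):
        start = b * size
        # The last bin takes any leftover items, like the fourth group above.
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        bins.append(ordered[start:end])
    return bins


# Hypothetical toy values, not taken from the gene dataset.
sample = [5, 1, 9, 3, 7, 2, 8, 4, 6]
for group in equal_frequency_bins(sample):
    print(group)
```

With nine values and four bins, the first three bins hold two values each and the last bin holds the remaining three.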
Under this heading, we will look at the Euclidean distance, cosine similarity and correlation.
The Euclidean distance between two data lists is the square root of the sum of the squared differences between their corresponding values. The larger the Euclidean distance, the farther apart the two lists are; the smaller it is, the more similar they are.
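As an illustration of this definition, here is a minimal sketch on two small vectors. The function name euclidean_distance and the toy vectors are assumptions made for this example only; the project code below computes the same quantity with explicit loops.

```python
def euclidean_distance(a, b):
    """Square root of the summed squared differences of two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical toy vectors, not taken from the gene dataset.
p = [0.0, 3.0]
q = [4.0, 0.0]
print(euclidean_distance(p, q))  # 5.0, since sqrt(16 + 9) = 5
print(euclidean_distance(p, p))  # 0.0, a list compared with itself
```

A distance of 0.0 only appears when the two lists hold exactly the same values.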
#Question 2 - Attribute Similarity----------------------------------------------------------
#Part a>i The Euclidian Distance
random4 = random.randint(0,179)
random5 = random.randint(0,179)
value1 = data_list[random1]
value2 = data_list[random2]
value3 = data_list[random3]
value4 = data_list[random4]
value5 = data_list[random5]
#Euclidian Distance Between value1 and value2
answersquare_val1val2 = 0
for i in range(500):
    answersquare_val1val2 += (value1[i]-value2[i])**2
answer_val1val2 = answersquare_val1val2**0.5
answer_val1val2 = round(answer_val1val2,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random1, random2, answer_val1val2 ))
#Euclidian Distance Between value1 and value3
answersquare_val1val3 = 0
for i in range(500):
    answersquare_val1val3 += (value1[i]-value3[i])**2
answer_val1val3 = answersquare_val1val3**0.5
answer_val1val3 = round(answer_val1val3,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random1, random3, answer_val1val3 ))
#Euclidian Distance Between value1 and value4
answersquare_val1val4 = 0
for i in range(500):
    answersquare_val1val4 += (value1[i]-value4[i])**2
answer_val1val4 = answersquare_val1val4**0.5
answer_val1val4 = round(answer_val1val4,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random1, random4, answer_val1val4 ))
#Euclidian Distance Between value1 and value5
answersquare_val1val5 = 0
for i in range(500):
    answersquare_val1val5 += (value1[i]-value5[i])**2
answer_val1val5 = answersquare_val1val5**0.5
answer_val1val5 = round(answer_val1val5,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random1, random5, answer_val1val5 ))
#Euclidian Distance Between value2 and value3
answersquare_val2val3 = 0
for i in range(500):
    answersquare_val2val3 += (value2[i]-value3[i])**2
answer_val2val3 = answersquare_val2val3**0.5
answer_val2val3 = round(answer_val2val3,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random2, random3, answer_val2val3 ))
#Euclidian Distance Between value2 and value4
answersquare_val2val4 = 0
for i in range(500):
    answersquare_val2val4 += (value2[i]-value4[i])**2
answer_val2val4 = answersquare_val2val4**0.5
answer_val2val4 = round(answer_val2val4,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random2, random4, answer_val2val4 ))
#Euclidian Distance Between value2 and value5
answersquare_val2val5 = 0
for i in range(500):
    answersquare_val2val5 += (value2[i]-value5[i])**2
answer_val2val5 = answersquare_val2val5**0.5
answer_val2val5 = round(answer_val2val5,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random2, random5, answer_val2val5 ))
#Euclidian Distance Between value3 and value4
answersquare_val3val4 = 0
for i in range(500):
    answersquare_val3val4 += (value3[i]-value4[i])**2
answer_val3val4 = answersquare_val3val4**0.5
answer_val3val4 = round(answer_val3val4,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random3, random4, answer_val3val4 ))
#Euclidian Distance Between value3 and value5
answersquare_val3val5 = 0
for i in range(500):
    answersquare_val3val5 += (value3[i]-value5[i])**2
answer_val3val5 = answersquare_val3val5**0.5
answer_val3val5 = round(answer_val3val5,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random3, random5, answer_val3val5 ))
#Euclidian Distance Between value4 and value5
answersquare_val4val5 = 0
for i in range(500):
    answersquare_val4val5 += (value4[i]-value5[i])**2
answer_val4val5 = answersquare_val4val5**0.5
answer_val4val5 = round(answer_val4val5,1)
print("\nEuclidian Distance Between {}. gene and {}. gene is : {}".format(random4, random5, answer_val4val5 ))
Figure 20 - The Process of The Euclidean Distance for Five Random Genes
Euclidian Distance Between 72. gene and 179. gene is : 19255.0
Euclidian Distance Between 72. gene and 126. gene is : 7832.7
Euclidian Distance Between 72. gene and 161. gene is : 19255.0
Euclidian Distance Between 72. gene and 18. gene is : 7832.7
Euclidian Distance Between 179. gene and 126. gene is : 18567.5
Euclidian Distance Between 179. gene and 161. gene is : 0.0
Euclidian Distance Between 179. gene and 18. gene is : 18567.5
Euclidian Distance Between 126. gene and 161. gene is : 18567.5
Euclidian Distance Between 126. gene and 18. gene is : 0.0
Euclidian Distance Between 161. gene and 18. gene is : 18567.5
Figure 21 - The Output of The Five Random Genes' Euclidean Distance
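In the output, several pairs share the same Euclidean distance, and two pairs (genes 179 and 161, genes 126 and 18) have a distance of 0.0. These zeros do not show that the genes are identical. In the run that produced Figure 21, value4 and value5 were read from data_list[random2] and data_list[random3], so the lists labelled 161 and 18 were in fact the lists of genes 179 and 126. The output therefore contains only three distinct distances, all among genes 72, 179 and 126. Of these, genes 72 and 126 are the most similar (7832.7), genes 72 and 179 are the least similar (19255.0), and genes 179 and 126 are almost as far apart (18567.5).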
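One possible way to run a measure over every 2-combination of the selected genes, while keeping each gene index attached to its own list, is itertools.combinations from the standard library. The sketch below is only an illustration: the dictionary genes, its indices and its values are hypothetical stand-ins for data_list and the random indices, and the helper euclidean_distance is defined here for the example.

```python
from itertools import combinations


def euclidean_distance(a, b):
    """Square root of the summed squared differences of two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical stand-in for the selected genes: gene index -> attribute values.
genes = {
    10: [1.0, 2.0, 2.0],
    20: [1.0, 2.0, 5.0],
    30: [4.0, 6.0, 2.0],
}

# combinations(..., 2) yields each unordered pair exactly once.
distances = {}
for (id_a, vals_a), (id_b, vals_b) in combinations(genes.items(), 2):
    distances[(id_a, id_b)] = round(euclidean_distance(vals_a, vals_b), 1)

for (id_a, id_b), d in distances.items():
    print("Euclidean distance between gene {} and gene {} is : {}".format(id_a, id_b, d))
```

Because each index travels together with its list, a label cannot end up printed next to the values of another gene.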
The cosine similarity of two non-zero vectors is given by this formula:
Cos(d1,d2) = (d1 . d2) / (||d1|| * ||d2||)
For example, two proportional vectors have a cosine similarity of 1, two orthogonal vectors have a similarity of 0, and two opposite vectors have a similarity of -1.
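The three cases above can be checked with a short sketch. The function name cosine_similarity and the toy vectors are assumptions made for this example; the results are rounded because floating-point square roots can differ from the exact value in the last digits.

```python
def cosine_similarity(d1, d2):
    """Dot product of d1 and d2 divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(d1, d2))
    norm1 = sum(x * x for x in d1) ** 0.5
    norm2 = sum(y * y for y in d2) ** 0.5
    if norm1 == 0 or norm2 == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm1 * norm2)


# Hypothetical toy vectors, not taken from the gene dataset.
print(round(cosine_similarity([1.0, 2.0], [2.0, 4.0]), 6))    # proportional: 1.0
print(round(cosine_similarity([1.0, 0.0], [0.0, 1.0]), 6))    # orthogonal: 0.0
print(round(cosine_similarity([1.0, 2.0], [-1.0, -2.0]), 6))  # opposite: -1.0
```

Unlike the Euclidean distance, this measure ignores the magnitude of the vectors and depends only on their direction.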
#Part A>II Cosine Similarity-----------------------------------------------------------
#Cosine Similarity Between Value 1 and Value 2
val1_val2_d1d2 = 0
val1_val2_d1d1 = 0
val1_val2_d2d2 = 0
for i in range(500):
    val1_val2_d1d2 += value1[i]*value2[i]
    val1_val2_d1d1 += value1[i]**2
    val1_val2_d2d2 += value2[i]**2
CosSimilarity_val1_val2 = val1_val2_d1d2 / ((val1_val2_d1d1**0.5) * (val1_val2_d2d2**0.5) )
CosSimilarity_val1_val2 = round(CosSimilarity_val1_val2,1)
print("\nCosine Similarity between {} and {} is : {}".format(random1,random2,CosSimilarity_val1_val2))
#Cosine Similarity Between Value 1 and Value 3
val1_val3_d1d2 = 0
val1_val3_d1d1 = 0
val1_val3_d2d2 = 0
for i in range(500):
    val1_val3_d1d2 += value1[i]*value3[i]
    val1_val3_d1d1 += value1[i]**2
    val1_val3_d2d2 += value3[i]**2
CosSimilarity_val1_val3 = val1_val3_d1d2 / ((val1_val3_d1d1**0.5) * (val1_val3_d2d2**0.5) )
CosSimilarity_val1_val3 = round(CosSimilarity_val1_val3,1)
print("\nCosine Similarity between {} and {} is : {}".format(random1,random3,CosSimilarity_val1_val3))
#Cosine Similarity Between Value 1 and Value 4
val1_val4_d1d2 = 0
val1_val4_d1d1 = 0
val1_val4_d2d2 = 0
for i in range(500):
    val1_val4_d1d2 += value1[i]*value4[i]
    val1_val4_d1d1 += value1[i]**2
    val1_val4_d2d2 += value4[i]**2
CosSimilarity_val1_val4 = val1_val4_d1d2 / ((val1_val4_d1d1**0.5) * (val1_val4_d2d2**0.5) )
CosSimilarity_val1_val4 = round(CosSimilarity_val1_val4,1)
print("\nCosine Similarity between {} and {} is : {}".format(random1,random4,CosSimilarity_val1_val4))
#Cosine Similarity Between Value 1 and Value 5
val1_val5_d1d2 = 0
val1_val5_d1d1 = 0
val1_val5_d2d2 = 0
for i in range(500):
    val1_val5_d1d2 += value1[i]*value5[i]
    val1_val5_d1d1 += value1[i]**2
    val1_val5_d2d2 += value5[i]**2
CosSimilarity_val1_val5 = val1_val5_d1d2 / ((val1_val5_d1d1**0.5) * (val1_val5_d2d2**0.5) )
CosSimilarity_val1_val5 = round(CosSimilarity_val1_val5,1)
print("\nCosine Similarity between {} and {} is : {}".format(random1,random5,CosSimilarity_val1_val5))
#Cosine Similarity Between Value 2 and Value 3
val2_val3_d1d2 = 0
val2_val3_d1d1 = 0
val2_val3_d2d2 = 0
for i in range(500):
    val2_val3_d1d2 += value2[i]*value3[i]
    val2_val3_d1d1 += value2[i]**2
    val2_val3_d2d2 += value3[i]**2
CosSimilarity_val2_val3 = val2_val3_d1d2 / ((val2_val3_d1d1**0.5) * (val2_val3_d2d2**0.5) )
CosSimilarity_val2_val3 = round(CosSimilarity_val2_val3,1)
print("\nCosine Similarity between {} and {} is : {}".format(random2,random3,CosSimilarity_val2_val3))
#Cosine Similarity Between Value 2 and Value 4
val2_val4_d1d2 = 0
val2_val4_d1d1 = 0
val2_val4_d2d2 = 0
for i in range(500):
    val2_val4_d1d2 += value2[i]*value4[i]
    val2_val4_d1d1 += value2[i]**2
    val2_val4_d2d2 += value4[i]**2
CosSimilarity_val2_val4 = val2_val4_d1d2 / ((val2_val4_d1d1**0.5) * (val2_val4_d2d2**0.5) )
CosSimilarity_val2_val4 = round(CosSimilarity_val2_val4,1)
print("\nCosine Similarity between {} and {} is : {}".format(random2,random4,CosSimilarity_val2_val4))
#Cosine Similarity Between Value 2 and Value 5
val2_val5_d1d2 = 0
val2_val5_d1d1 = 0
val2_val5_d2d2 = 0
for i in range(500):
    val2_val5_d1d2 += value2[i]*value5[i]
    val2_val5_d1d1 += value2[i]**2
    val2_val5_d2d2 += value5[i]**2
CosSimilarity_val2_val5 = val2_val5_d1d2 / ((val2_val5_d1d1**0.5) * (val2_val5_d2d2**0.5) )
CosSimilarity_val2_val5 = round(CosSimilarity_val2_val5,1)
print("\nCosine Similarity between {} and {} is : {}".format(random2,random5,CosSimilarity_val2_val5))
#Cosine Similarity Between Value 3 and Value 4
val3_val4_d1d2 = 0
val3_val4_d1d1 = 0
val3_val4_d2d2 = 0
for i in range(500):
    val3_val4_d1d2 += value3[i]*value4[i]
    val3_val4_d1d1 += value3[i]**2
    val3_val4_d2d2 += value4[i]**2
CosSimilarity_val3_val4 = val3_val4_d1d2 / ((val3_val4_d1d1**0.5) * (val3_val4_d2d2**0.5) )
CosSimilarity_val3_val4 = round(CosSimilarity_val3_val4,1)
print("\nCosine Similarity between {} and {} is : {}".format(random3,random4,CosSimilarity_val3_val4))
#Cosine Similarity Between Value 3 and Value 5
val3_val5_d1d2 = 0
val3_val5_d1d1 = 0
val3_val5_d2d2 = 0
for i in range(500):
    val3_val5_d1d2 += value3[i]*value5[i]
    val3_val5_d1d1 += value3[i]**2
    val3_val5_d2d2 += value5[i]**2
CosSimilarity_val3_val5 = val3_val5_d1d2 / ((val3_val5_d1d1**0.5) * (val3_val5_d2d2**0.5) )
CosSimilarity_val3_val5 = round(CosSimilarity_val3_val5,1)
print("\nCosine Similarity between {} and {} is : {}".format(random3,random5,CosSimilarity_val3_val5))
#Cosine Similarity Between Value 4 and Value 5
val4_val5_d1d2 = 0
val4_val5_d1d1 = 0
val4_val5_d2d2 = 0
for i in range(500):
    val4_val5_d1d2 += value4[i]*value5[i]
    val4_val5_d1d1 += value4[i]**2
    val4_val5_d2d2 += value5[i]**2
CosSimilarity_val4_val5 = val4_val5_d1d2 / ((val4_val5_d1d1**0.5) * (val4_val5_d2d2**0.5) )
CosSimilarity_val4_val5 = round(CosSimilarity_val4_val5,1)
print("\nCosine Similarity between {} and {} is : {}".format(random4,random5,CosSimilarity_val4_val5))
Figure 22 - The Process of The Cosine Similarity for Five Random Genes
Cosine Similarity
Cosine Similarity between 72 and 179 is : 0.7
Cosine Similarity between 72 and 126 is : 0.9
Cosine Similarity between 72 and 161 is : 0.7
Cosine Similarity between 72 and 18 is : 0.9
Cosine Similarity between 179 and 126 is : 0.7
Cosine Similarity between 179 and 161 is : 1.0
Cosine Similarity between 179 and 18 is : 1.0
Cosine Similarity between 126 and 161 is : 0.7
Cosine Similarity between 126 and 18 is : 1.0
Cosine Similarity between 161 and 18 is : 0.7
Figure 23 - The Output of The Five Random Genes' Cosine Similarity
Because the same randomly selected genes are used in every step, the results of the different methods can be compared directly.
Based on Figure 21, genes 179 and 161 appeared identical to each other and very different from gene 72. The cosine similarity values largely agree with the Euclidean distances: the pairs with a distance of 0.0 have a similarity of 1.0. The one exception is the pair 179 and 18, which is reported as 1.0 although about 0.7 was expected, because genes 126 and 18 appear identical and the similarity between genes 179 and 126 is 0.7. This value comes from the code rather than from the data. In the run that produced Figure 23, the loop for value2 and value5 multiplied value2 by value4, and value4 held the same list as value2, so the result is the similarity of gene 179 with itself.
Correlation analysis is a statistical method used to measure the strength of the linear relationship between two variables and compute their association. Correlation analysis calculates the level of change in one variable due to the change in the other. A high correlation points to a strong relationship between the two variables, while a low correlation means that the variables are weakly related [5].
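As a hedged illustration of the calculation described here, the sketch below standardizes two lists with the sample standard deviation (the square root of the variance) and divides the summed products by n-1, which gives the Pearson correlation coefficient. The function name pearson_correlation and the toy lists are assumptions made for this example.

```python
def pearson_correlation(x, y):
    """Pearson correlation of two equal-length lists via standardized values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample standard deviation: square root of the variance.
    std_x = (sum((v - mean_x) ** 2 for v in x) / (n - 1)) ** 0.5
    std_y = (sum((v - mean_y) ** 2 for v in y) / (n - 1)) ** 0.5
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    x_std = [(v - mean_x) / std_x for v in x]
    y_std = [(v - mean_y) / std_y for v in y]
    # Sum of the products of the standardized values, divided by n-1.
    return sum(a * b for a, b in zip(x_std, y_std)) / (n - 1)


# Hypothetical toy lists, not taken from the gene dataset.
print(round(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
print(round(pearson_correlation([1, 2, 3, 4], [8, 6, 4, 2]), 6))  # -1.0
```

The coefficient always lies between -1 and 1, so a list compared with itself should give 1.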
#Part A>III Correlation-----------------------------------------------------------
totalForVal1 = 0
totalForVal2 = 0
totalForVal3 = 0
totalForVal4 = 0
totalForVal5 = 0
n = len(value1)
for i in range(500):
    totalForVal1+= value1[i]
    totalForVal2+= value2[i]
    totalForVal3+= value3[i]
    totalForVal4+= value4[i]
    totalForVal5+= value5[i]
meanForVal1 = totalForVal1 / n
meanForVal2 = totalForVal2 / n
meanForVal3 = totalForVal3 / n
meanForVal4 = totalForVal4 / n
meanForVal5 = totalForVal5 / n
#The sum of the squares of the differences
total_val1 = 0
total_val2 = 0
total_val3 = 0
total_val4 = 0
total_val5 = 0
for i in range(500):
    total_val1 += (value1[i]-meanForVal1)**2
    total_val2 += (value2[i]-meanForVal2)**2
    total_val3 += (value3[i]-meanForVal3)**2
    total_val4 += (value4[i]-meanForVal4)**2
    total_val5 += (value5[i]-meanForVal5)**2
#The standard deviation is the square root of the variance
std_val1 =( 1/(n-1)* total_val1 ) **0.5
std_val2 =( 1/(n-1)* total_val2 ) **0.5
std_val3 =( 1/(n-1)* total_val3 ) **0.5
std_val4 =( 1/(n-1)* total_val4 ) **0.5
std_val5 =( 1/(n-1)* total_val5 ) **0.5
#p' of p
value1__ = []
value2__ = []
value3__ = []
value4__ = []
value5__ = []
for i in range(500):
    value1__.append((value1[i]-meanForVal1 )/std_val1)
    value2__.append((value2[i]-meanForVal2 )/std_val2)
    value3__.append((value3[i]-meanForVal3 )/std_val3)
    value4__.append((value4[i]-meanForVal4 )/std_val4)
    value5__.append((value5[i]-meanForVal5 )/std_val5)
#Correlation Between Value x' and Value y'
corBet_Val1andVal2 = 0
corBet_Val1andVal3 = 0
corBet_Val1andVal4 = 0
corBet_Val1andVal5 = 0
corBet_Val2andVal3 = 0
corBet_Val2andVal4 = 0
corBet_Val2andVal5 = 0
corBet_Val3andVal4 = 0
corBet_Val3andVal5 = 0
corBet_Val4andVal5 = 0
#Each product is divided by n-1 so that the sum is the correlation coefficient
for i in range(500):
    corBet_Val1andVal2 += value1__[i]*value2__[i] / (n-1)
    corBet_Val1andVal3 += value1__[i]*value3__[i] / (n-1)
    corBet_Val1andVal4 += value1__[i]*value4__[i] / (n-1)
    corBet_Val1andVal5 += value1__[i]*value5__[i] / (n-1)
    corBet_Val2andVal3 += value2__[i]*value3__[i] / (n-1)
    corBet_Val2andVal4 += value2__[i]*value4__[i] / (n-1)
    corBet_Val2andVal5 += value2__[i]*value5__[i] / (n-1)
    corBet_Val3andVal4 += value3__[i]*value4__[i] / (n-1)
    corBet_Val3andVal5 += value3__[i]*value5__[i] / (n-1)
    corBet_Val4andVal5 += value4__[i]*value5__[i] / (n-1)
print("\n\n\nCorrelation Between {} and {} is : {}".format(random1,random2, corBet_Val1andVal2))
print("\nCorrelation Between {} and {} is : {}".format(random1,random3, corBet_Val1andVal3))
print("\nCorrelation Between {} and {} is : {}".format(random1,random4, corBet_Val1andVal4))
print("\nCorrelation Between {} and {} is : {}".format(random1,random5, corBet_Val1andVal5))
print("\nCorrelation Between {} and {} is : {}".format(random2,random3, corBet_Val2andVal3))
print("\nCorrelation Between {} and {} is : {}".format(random2,random4, corBet_Val2andVal4))
print("\nCorrelation Between {} and {} is : {}".format(random2,random5, corBet_Val2andVal5))
print("\nCorrelation Between {} and {} is : {}".format(random3,random4, corBet_Val3andVal4))
print("\nCorrelation Between {} and {} is : {}".format(random3,random5, corBet_Val3andVal5))
print("\nCorrelation Between {} and {} is : {}".format(random4,random5, corBet_Val4andVal5))
Figure 24 - The Process of The Correlation for Five Random Genes
Correlation Between 72 and 179 is : 2.734229784397608e-15
Correlation Between 72 and 126 is : 1.366364231780023e-14
Correlation Between 72 and 161 is : 2.734229784397608e-15
Correlation Between 72 and 18 is : 1.366364231780023e-14
Correlation Between 179 and 126 is : 7.126178743382514e-16
Correlation Between 179 and 161 is : 3.121216991025851e-16
Correlation Between 179 and 18 is : 7.126178743382514e-16
Correlation Between 126 and 161 is : 7.126178743382514e-16
Correlation Between 126 and 18 is : 3.953451270883555e-15
Correlation Between 161 and 18 is : 7.126178743382514e-16
Figure 25 - The Output of The Five Random Genes' Correlation
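The correlation values in Figure 25 are all on the order of 1e-14 to 1e-16, which is far too small for them to be read as correlation coefficients: a coefficient lies between -1 and 1, and a list compared with itself should give 1. The cause is in the calculation rather than in the data. In the run that produced Figure 25, the standard deviation was computed by squaring the variance instead of taking its square root, so each standardized value was divided by a number many orders of magnitude too large, and the summed products were not divided by n-1. For this reason the figures cannot be used to rank the genes by similarity. They do show one pattern: every pair involving gene 161 has the same value as the corresponding pair involving gene 179, and every pair involving gene 18 has the same value as the corresponding pair involving gene 126. This is because value4 and value5 were read from the same lists as value2 and value3 in that run.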
#Question 2 >Part B>Min - Max Normalization-----------------------------------------------------------
Normalized_data_list = [[0 for _ in range(500)] for _ in range(180)]
for i in range(180):
    num_of_positive=0
    num_of_negative=0
    for j in range(500):
        if (data_list[i][j] > 0):
            num_of_positive +=1
        if (data_list[i][j] < 0):
            num_of_negative +=1
    value = data_list[i]
    if (num_of_positive > 0 and num_of_negative>0):
        for k in range(500):
            z = (value[k]-min(value))/ (max(value)-min(value))*(1-(-1))+(-1)
            z= round(z,1)
            Normalized_data_list[i][k] = z
    elif (num_of_positive > 0 and num_of_negative ==0):
        for k in range(500):
            z = (value[k]-min(value))/ (max(value)-min(value))*(1-0)+(0)
            z= round(z,1)
            Normalized_data_list[i][k] = z
    else:
        for k in range(500):
            z = (value[k]-min(value))/ (max(value)-min(value))*(0-(-1))+(-1)
            z= round(z,1)
            Normalized_data_list[i][k] = z
Figure 26 - The Process of The Normalization of All Genes in Data
In this study, we normalized all the data and built a new data set, choosing the target range according to the signs of the values in each gene, as requested: [-1, 1] for genes with both positive and negative values, [0, 1] for genes with positive values and no negative values, and [-1, 0] otherwise. We will now apply all the operations from this heading to the new data set.
In effect, we apply the operations from 2.1.1 to the normalized version of the same data and compare the results.
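The rule used for the normalization can be summarized in a short sketch. It follows the same min-max formula and the same three sign cases as the listing in Figure 26, but it is only an illustration: the function name min_max_normalize and the toy lists are hypothetical, the rounding to one decimal is left out, and a guard for constant lists is added as an assumption.

```python
def min_max_normalize(values):
    """Min-max scale values into a range chosen from the signs present."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant list")
    has_positive = any(v > 0 for v in values)
    has_negative = any(v < 0 for v in values)
    if has_positive and has_negative:
        new_min, new_max = -1.0, 1.0   # mixed signs
    elif has_positive:
        new_min, new_max = 0.0, 1.0    # no negative values
    else:
        new_min, new_max = -1.0, 0.0   # no positive values
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Hypothetical toy lists, not taken from the gene dataset.
print(min_max_normalize([-2.0, 0.0, 2.0]))    # [-1.0, 0.0, 1.0]
print(min_max_normalize([0.0, 5.0, 10.0]))    # [0.0, 0.5, 1.0]
print(min_max_normalize([-10.0, -5.0, 0.0]))  # [-1.0, -0.5, 0.0]
```

In every case the smallest value of the list maps to the lower bound of the range and the largest value maps to the upper bound.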
#Discretization for Normalized Data
#Question2>B>-----------------------------------------------------------------------------
#Calculating width for 4 different groups
value1_N = Normalized_data_list[random1]
value2_N = Normalized_data_list[random2]
value3_N = Normalized_data_list[random3]
#Calculating the width for normalized data
width_N = (max(value1_N)-min(value1_N))/4
width_N = round(width_N,1)
print("\n\nCalculating the width for normalized data : ")
print("\nWidth :", width_N)
#Calculating the borders
firstline_N = min(value1_N)+width_N
secondline_N = min(value1_N)+width_N+width_N
thirdline_N = min(value1_N)+width_N+width_N+width_N
firstline_N = round(firstline_N,1)
secondline_N = round(secondline_N,1)
thirdline_N = round(thirdline_N,1)
group1forvalue1_N = []
group2forvalue1_N = []
group3forvalue1_N = []
group4forvalue1_N = []
#Placing the values to the groups for value 1 list
for i in range(500):
    if (value1_N[i]< firstline_N):
        group1forvalue1_N.append(value1_N[i])
    elif (value1_N[i]<secondline_N):
        group2forvalue1_N.append(value1_N[i])
    elif (value1_N[i]<thirdline_N):
        group3forvalue1_N.append(value1_N[i])
    else:
        group4forvalue1_N.append(value1_N[i])
print("Equal width approach for {}. gene for normalized data: ".format(random1))
print("\nThe attributes between {} and {} :".format(min(value1_N), firstline_N))
print(group1forvalue1_N)
print("\nThe attributes between {} and {} :".format(firstline_N, secondline_N))
print(group2forvalue1_N)
print("\nThe attributes between {} and {} :".format(secondline_N, thirdline_N))
print(group3forvalue1_N)
print("\nThe attributes between {} and {} :".format(thirdline_N, max(value1_N)))
print(group4forvalue1_N)
if(len(group1forvalue1_N)>0):
    x_axis1_1_N = numpy.linspace(min(group1forvalue1_N), max(group1forvalue1_N),len(group1forvalue1_N))
    plt.scatter(x_axis1_1_N, group1forvalue1_N, color='g')
if(len(group2forvalue1_N)>0):
    x_axis2_1_N = numpy.linspace(min(group2forvalue1_N), max(group2forvalue1_N),len(group2forvalue1_N))
    plt.scatter(x_axis2_1_N , group2forvalue1_N, color='r')
if(len(group3forvalue1_N)>0):
    x_axis3_1_N = numpy.linspace(min(group3forvalue1_N), max(group3forvalue1_N),len(group3forvalue1_N))
    plt.scatter(x_axis3_1_N, group3forvalue1_N, color='b')
if(len(group4forvalue1_N)>0):
    x_axis4_1_N = numpy.linspace(min(group4forvalue1_N), max(group4forvalue1_N),len(group4forvalue1_N))
    plt.scatter(x_axis4_1_N, group4forvalue1_N, color='hotpink')
plt.axvline(x = firstline_N, color = 'r')
plt.axvline(x = secondline_N, color = 'b')
plt.axvline(x = thirdline_N, color = 'hotpink')
plt.title("Equal width approach for {}. gene for normalized data: ".format(random1))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 27 - First Random Gene's Equal Width Approach Process for Normalized Data
Calculating the width for normalized data :
Width : 0.5
Equal width approach for 72. gene for normalized data:
The attributes between -1.0 and -0.5 :
[-0.7, -0.8, -0.8, -0.9, -0.8, -0.8, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.7, -0.7, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.9, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9, -0.8, -0.8, -0.7, -0.7, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.7, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.7, -0.9, -0.6, -0.7, -0.7, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, 
-0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.9, -0.8, -0.8, -0.9, -0.9, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -1.0, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.6, -0.7, -0.6, -0.8, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.7, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.9, -0.8]
The attributes between -0.5 and 0.0 :
[-0.5, -0.5, -0.5, -0.3, -0.4, -0.5, -0.4, -0.3, -0.2, -0.5, -0.5, -0.4, -0.4, -0.5, -0.5]
The attributes between 0.0 and 0.5 :
[0.0, 0.1, 0.0, 0.4]
The attributes between 0.5 and 1.0 :
[1.0]
Figure 28 - Output of The First Random Gene's Equal Width Approach Process for Normalized Data
Figure 29 - The Scatterplot of The First Random Gene's Equal Width Approach for Normalized Data
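The grouping shown in Figure 28 can also be expressed by computing a bin index for each value directly from the bin width. The sketch below is a minimal illustration of the equal width approach under that formulation; the function name equal_width_bins and the toy values are hypothetical, and the width is not rounded here.

```python
def equal_width_bins(values, n_bins=4):
    """Assign each value to one of n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in values:
        if width == 0:
            index = 0                     # all values are equal
        else:
            # The maximum value would land on index n_bins, so clamp it.
            index = min(int((v - lo) / width), n_bins - 1)
        bins[index].append(v)
    return bins


# Hypothetical toy values in the normalized range, not taken from the dataset.
sample = [-1.0, -0.8, -0.5, -0.2, 0.0, 0.4, 1.0]
for group in equal_width_bins(sample):
    print(group)
```

As in the listing above, a value that sits exactly on a border goes to the upper group, and the maximum value goes to the last group.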
#------------------------------------------------------------
#Calculating the width for normalized data
width2_N = (max(value2_N)-min(value2_N))/4
width2_N = round(width2_N,1)
print("\n\nCalculating the width for normalized data : ")
print("\nWidth :", width2_N)
print(value2_N)
#Calculating the borders
firstline2_N = min(value2_N)+width2_N
secondline2_N = min(value2_N)+width2_N+width2_N
thirdline2_N = min(value2_N)+width2_N+width2_N+width2_N
firstline2_N = round(firstline2_N,1)
secondline2_N = round(secondline2_N,1)
thirdline2_N = round(thirdline2_N,1)
group1forvalue2_N = []
group2forvalue2_N = []
group3forvalue2_N = []
group4forvalue2_N = []
#Placing the values to the groups for value 2 list
for i in range(500):
    if (value2_N[i]< firstline2_N):
        group1forvalue2_N.append(value2_N[i])
    elif (value2_N[i]<secondline2_N):
        group2forvalue2_N.append(value2_N[i])
    elif (value2_N[i]<thirdline2_N):
        group3forvalue2_N.append(value2_N[i])
    else:
        group4forvalue2_N.append(value2_N[i])
print("Equal width approach for {}. gene for normalized data: ".format(random2))
print("\nThe attributes between {} and {} :".format(min(value2_N), firstline2_N))
print(group1forvalue2_N)
print("\nThe attributes between {} and {} :".format(firstline2_N, secondline2_N))
print(group2forvalue2_N)
print("\nThe attributes between {} and {} :".format(secondline2_N, thirdline2_N))
print(group3forvalue2_N)
print("\nThe attributes between {} and {} :".format(thirdline2_N, max(value2_N)))
print(group4forvalue2_N)
if(len(group1forvalue2_N)>0):
    x_axis1_2_N = numpy.linspace(min(group1forvalue2_N), max(group1forvalue2_N),len(group1forvalue2_N))
    plt.scatter(x_axis1_2_N, group1forvalue2_N, color='g')
if(len(group2forvalue2_N)>0):
    x_axis2_2_N = numpy.linspace(min(group2forvalue2_N), max(group2forvalue2_N),len(group2forvalue2_N))
    plt.scatter(x_axis2_2_N, group2forvalue2_N, color='r')
if(len(group3forvalue2_N)>0):
    x_axis3_2_N = numpy.linspace(min(group3forvalue2_N), max(group3forvalue2_N),len(group3forvalue2_N))
    plt.scatter(x_axis3_2_N, group3forvalue2_N, color='b')
if(len(group4forvalue2_N)>0):
    x_axis4_2_N = numpy.linspace(min(group4forvalue2_N), max(group4forvalue2_N),len(group4forvalue2_N))
    plt.scatter(x_axis4_2_N, group4forvalue2_N, color='hotpink')
plt.axvline(x = firstline2_N, color = 'r')
plt.axvline(x = secondline2_N, color = 'b')
plt.axvline(x = thirdline2_N, color = 'hotpink')
plt.title("Equal width approach for {}. gene for normalized data: ".format(random2))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 30 - Second Random Gene's Equal Width Approach Process for Normalized Data
Calculating the width for normalized data :
Width : 0.5
[-0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, 0.1, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.6, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.1, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, 
-0.9, -0.9, -0.9, -0.9, -0.9, -0.5, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, 1.0, -0.5, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9]
Equal width approach for 179. gene for normalized data:
The attributes between -1.0 and -0.5 :
[-0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.6, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9]
The attributes between -0.5 and 0.0 :
[-0.1, -0.5, -0.5]
The attributes between 0.0 and 0.5 :
[0.1]
The attributes between 0.5 and 1.0 :
[1.0]
Figure 31 - Output of The Second Random Gene's Equal Width Approach Process for Normalized Data
Figure 32 - The Scatterplot of The Second Random Gene's Equal Width Approach for Normalized Data
#Calculating the width for normalized data
width3_N = (max(value3_N)-min(value3_N))/4
width3_N = round(width3_N,1)
print("\n\nCalculating the width for normalized data : ")
print("\nWidth :", width3_N)
#Calculating the borders
firstline3_N = min(value3_N)+width3_N
secondline3_N = min(value3_N)+width3_N+width3_N
thirdline3_N = min(value3_N)+width3_N+width3_N+width3_N
firstline3_N = round(firstline3_N,1)
secondline3_N = round(secondline3_N,1)
thirdline3_N = round(thirdline3_N,1)
group1forvalue3_N = []
group2forvalue3_N = []
group3forvalue3_N = []
group4forvalue3_N = []
#Placing the values to the groups for value 3 list
for i in range(500):
    if (value3_N[i]< firstline3_N):
        group1forvalue3_N.append(value3_N[i])
    elif (value3_N[i]<secondline3_N):
        group2forvalue3_N.append(value3_N[i])
    elif (value3_N[i]<thirdline3_N):
        group3forvalue3_N.append(value3_N[i])
    else:
        group4forvalue3_N.append(value3_N[i])
print("Equal width approach for {}. gene for normalized data: ".format(random3))
print("\nThe attributes between {} and {} :".format(min(value3_N), firstline3_N))
print(group1forvalue3_N)
print("\nThe attributes between {} and {} :".format(firstline3_N, secondline3_N))
print(group2forvalue3_N)
print("\nThe attributes between {} and {} :".format(secondline3_N, thirdline3_N))
print(group3forvalue3_N)
print("\nThe attributes between {} and {} :".format(thirdline3_N, max(value3_N)))
print(group4forvalue3_N)
if(len(group1forvalue3_N)>0):
    x_axis1_3_N = numpy.linspace(min(group1forvalue3_N), max(group1forvalue3_N),len(group1forvalue3_N))
    plt.scatter(x_axis1_3_N, group1forvalue3_N, color='g')
if(len(group2forvalue3_N)>0):
    x_axis2_3_N = numpy.linspace(min(group2forvalue3_N), max(group2forvalue3_N),len(group2forvalue3_N))
    plt.scatter(x_axis2_3_N, group2forvalue3_N, color='r')
if(len(group3forvalue3_N)>0):
    x_axis3_3_N = numpy.linspace(min(group3forvalue3_N), max(group3forvalue3_N),len(group3forvalue3_N))
    plt.scatter(x_axis3_3_N, group3forvalue3_N, color='b')
if(len(group4forvalue3_N)>0):
    x_axis4_3_N = numpy.linspace(min(group4forvalue3_N), max(group4forvalue3_N),len(group4forvalue3_N))
    plt.scatter(x_axis4_3_N, group4forvalue3_N, color='hotpink')
plt.axvline(x = firstline3_N, color = 'r')
plt.axvline(x = secondline3_N, color = 'b')
plt.axvline(x = thirdline3_N, color = 'hotpink')
plt.title("Equal width approach for {}. gene for normalized data: ".format(random3))
plt.xlabel("Width")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 33 - Third Random Gene's Equal Width Approach Process for Normalized Data
Width : 0.5
Equal width approach for 126. gene for normalized data:
The attributes between -1.0 and -0.5 :
[-0.9, -0.9, -0.9, -1.0, -0.9, -0.6, -0.9, -0.9, -0.8, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.7, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -0.8, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.8, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, 
-0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.7, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.8, -0.9, -0.9, -1.0, -0.9, -0.8, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.8, -1.0, -0.9]
The attributes between -0.5 and 0.0 :
[-0.4, -0.4, -0.4, -0.1]
The attributes between 0.0 and 0.5 :
[]
The attributes between 0.5 and 1.0 :
[1.0]
Figure 34 - Output of The Third Random Gene's Equal Width Approach Process for Normalized Data
Figure 35 - The Scatterplot of The Third Random Gene's Equal Width Approach for Normalized Data
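The equal width steps shown above (compute the width, derive three borders, place each value in one of four groups) can be summarized in a short sketch. This is only an illustration under assumptions: the function name `equal_width_groups` and the list `sample` are hypothetical and do not appear in the project code, and unlike the project code the width and borders are not rounded to one decimal.

```python
def equal_width_groups(values, n_groups=4):
    """Split values into n_groups bins of equal width over [min, max]."""
    low, high = min(values), max(values)
    width = (high - low) / n_groups
    # Borders between neighbouring groups: low + width, low + 2*width, ...
    borders = [low + width * k for k in range(1, n_groups)]
    groups = [[] for _ in range(n_groups)]
    for v in values:
        # The number of borders a value has reached is its group index,
        # which matches the "<" comparisons used in the if/elif chain.
        index = sum(v >= b for b in borders)
        groups[index].append(v)
    return borders, groups


# Hypothetical sample values in the normalized range [-1, 1]
sample = [-1.0, -0.9, -0.5, -0.1, 0.1, 1.0]
borders, groups = equal_width_groups(sample)
print("Borders:", borders)
for number, group in enumerate(groups, start=1):
    print("Group", number, ":", group)
```

A value that lies exactly on a border is placed in the upper group, as in the project code, where for example -0.5 appears in the group between -0.5 and 0.0.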
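We now apply the equal frequency approach to the normalized data. Because the approach, as implemented here, splits the 500 attributes of a gene into four groups of 125 by position, normalization rescales the values but does not change which attributes fall into each group. We still repeat the process as part of this study, including the steps whose outcome we already expect.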
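The expectation that the grouping does not change can be checked on a small example. The sketch below is a hedged illustration, not the project code: `positional_groups` and `rescale` are hypothetical helpers, the list `raw` is made-up data, and the linear rescaling to [-1, 1] is only an assumed stand-in for the normalization used in this report.

```python
def positional_groups(values, n_groups=4):
    """Split values into consecutive slices by position; the last slice takes any remainder."""
    size = len(values) // n_groups
    groups = [values[k * size:(k + 1) * size] for k in range(n_groups - 1)]
    groups.append(values[(n_groups - 1) * size:])
    return groups


def rescale(values):
    """Assumed linear rescaling to [-1, 1] (a stand-in for the report's normalization)."""
    low, high = min(values), max(values)
    return [2 * (v - low) / (high - low) - 1 for v in values]


# Made-up raw attribute values for one gene
raw = [12.0, 40.0, 8.0, 100.0, 54.0, 8.0, 31.0, 77.0]
normalized = rescale(raw)

raw_groups = positional_groups(raw)
norm_groups = positional_groups(normalized)

# The group sizes are identical, and each normalized group holds the
# rescaled values of exactly the same positions as the raw group.
print([len(g) for g in raw_groups])
print([len(g) for g in norm_groups])
print(norm_groups)
```

Since the split depends only on the position of each attribute, any normalization that keeps the order of the list produces the same four groups of positions.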
#Question1>a>II-----------------------------------------------------------------------------
#Calculating the frequency for any object
frequency_N = int(len(value1_N)/4)
print("\n\n\n\nThe Frequency is for normalized data: ", frequency_N)
group1_forvalue1_forfrequency_N = []
group2_forvalue1_forfrequency_N = []
group3_forvalue1_forfrequency_N = []
group4_forvalue1_forfrequency_N = []
for i in range(frequency_N):
    group1_forvalue1_forfrequency_N.append(value1_N[i])
for i in range(frequency_N,2*frequency_N):
    group2_forvalue1_forfrequency_N.append(value1_N[i])
for i in range(2*frequency_N,3*frequency_N):
    group3_forvalue1_forfrequency_N.append(value1_N[i])
for i in range(3*frequency_N,(int(len(value1_N)))):
    group4_forvalue1_forfrequency_N.append(value1_N[i])
print("\n\nEqual frequency approach for {}. gene for normalized data: ".format(random1))
print("\nThe attributes between {} and {} :".format(0, frequency_N))
print(group1_forvalue1_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N, frequency_N*2))
print(group2_forvalue1_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*2, frequency_N*3))
print(group3_forvalue1_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*3, len(value1_N)))
print(group4_forvalue1_forfrequency_N)
x1_1_N = numpy.arange(0,frequency_N)
plt.scatter(x1_1_N, group1_forvalue1_forfrequency_N, color='g')
x1_2_N = numpy.arange(frequency_N,2*frequency_N)
plt.scatter(x1_2_N, group2_forvalue1_forfrequency_N, color='r')
x1_3_N = numpy.arange(2*frequency_N,3*frequency_N)
plt.scatter(x1_3_N, group3_forvalue1_forfrequency_N, color='b')
x1_4_N = numpy.arange(3*frequency_N,4*frequency_N)
plt.scatter(x1_4_N, group4_forvalue1_forfrequency_N, color='hotpink')
plt.axvline(x = frequency_N, color = 'r')
plt.axvline(x = 2*frequency_N, color = 'b')
plt.axvline(x = 3*frequency_N, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene for normalized data: ".format(random1))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 36 - First Random Gene's Equal Frequency Approach Process for Normalized Data
The Frequency is for normalized data: 125
Equal frequency approach for 72. gene for normalized data:
The attributes between 0 and 125 :
[-0.7, -0.8, -0.8, -0.9, -0.8, -0.5, -0.8, -0.7, -0.7, -0.5, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.5, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.7, -0.7, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.9, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9, -0.8, -0.8, -0.7, -0.7, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8]
The attributes between 125 and 250 :
[-0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.7, -0.3, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, 0.0, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.4, -0.8, -0.8, -0.8, -0.5, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.4, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.9]
The attributes between 250 and 375 :
[-0.8, -0.8, -0.8, -0.9, -0.8, -0.3, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.7, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, 1.0, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.7, -0.9, -0.6, -0.7, -0.7, -0.7, -0.7, 0.1, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.2, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.9, -0.8, -0.8, -0.9, -0.9, -0.9, -0.8, -0.8, -0.8, -0.8, -0.5, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.7, -0.8, -0.8, -0.8]
The attributes between 375 and 500 :
[-0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, 0.0, 0.4, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -1.0, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.5, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.6, -0.7, -0.6, -0.8, -0.7, -0.7, -0.8, -0.8, -0.8, -0.8, -0.7, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.9, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.4, -0.8, -0.8, -0.8, -0.4, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.5, -0.7, -0.7, -0.8, -0.5, -0.6, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.8, -0.7, -0.9, -0.8]
Figure 37 - Output of the First Random Gene's Equal Frequency Approach Process for Normalized Data
Figure 38 - The Scatterplot of The First Random Gene's Equal Frequency Approach for Normalized Data
group1_forvalue2_forfrequency_N = []
group2_forvalue2_forfrequency_N = []
group3_forvalue2_forfrequency_N = []
group4_forvalue2_forfrequency_N = []
for i in range(frequency_N):
    group1_forvalue2_forfrequency_N.append(value2_N[i])
for i in range(frequency_N,2*frequency_N):
    group2_forvalue2_forfrequency_N.append(value2_N[i])
for i in range(2*frequency_N,3*frequency_N):
    group3_forvalue2_forfrequency_N.append(value2_N[i])
for i in range(3*frequency_N,(int(len(value2_N)))):
    group4_forvalue2_forfrequency_N.append(value2_N[i])
print("\n\nEqual frequency approach for {}. gene for normalized data: ".format(random2))
print("\nThe attributes between {} and {} :".format(0, frequency_N))
print(group1_forvalue2_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N, frequency_N*2))
print(group2_forvalue2_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*2, frequency_N*3))
print(group3_forvalue2_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*3, len(value2_N)))
print(group4_forvalue2_forfrequency_N)
x2_1_N = numpy.arange(0,frequency_N)
plt.scatter(x2_1_N, group1_forvalue2_forfrequency_N, color='g')
x2_2_N = numpy.arange(frequency_N,2*frequency_N)
plt.scatter(x2_2_N, group2_forvalue2_forfrequency_N, color='r')
x2_3_N = numpy.arange(2*frequency_N,3*frequency_N)
plt.scatter(x2_3_N, group3_forvalue2_forfrequency_N, color='b')
x2_4_N = numpy.arange(3*frequency_N,4*frequency_N)
plt.scatter(x2_4_N, group4_forvalue2_forfrequency_N, color='hotpink')
plt.axvline(x = frequency_N, color = 'r')
plt.axvline(x = 2*frequency_N, color = 'b')
plt.axvline(x = 3*frequency_N, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene for normalized data: ".format(random2))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 39 - Second Random Gene's Equal Frequency Approach Process for Normalized Data
Equal frequency approach for 179. gene for normalized data:
The attributes between 0 and 125 :
[-0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, 0.1, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0]
The attributes between 125 and 250 :
[-1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.6, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0]
The attributes between 250 and 375 :
[-0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.1, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.5, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0]
The attributes between 375 and 500 :
[-1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, 1.0, -0.5, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9]
Figure 40 - Output of The Second Random Gene's Equal Frequency Approach Process for Normalized Data
Figure 41 - The Scatterplot of The Second Random Gene's Equal Frequency Approach for Normalized Data
group1_forvalue3_forfrequency_N = []
group2_forvalue3_forfrequency_N = []
group3_forvalue3_forfrequency_N = []
group4_forvalue3_forfrequency_N = []
for i in range(frequency_N):
    group1_forvalue3_forfrequency_N.append(value3_N[i])
for i in range(frequency_N,2*frequency_N):
    group2_forvalue3_forfrequency_N.append(value3_N[i])
for i in range(2*frequency_N,3*frequency_N):
    group3_forvalue3_forfrequency_N.append(value3_N[i])
#The length of value3_N (not value1_N) gives the end of the last group
for i in range(3*frequency_N,(int(len(value3_N)))):
    group4_forvalue3_forfrequency_N.append(value3_N[i])
print("\n\nEqual frequency approach for {}. gene for normalized data: ".format(random3))
print("\nThe attributes between {} and {} :".format(0, frequency_N))
print(group1_forvalue3_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N, frequency_N*2))
print(group2_forvalue3_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*2, frequency_N*3))
print(group3_forvalue3_forfrequency_N)
print("\nThe attributes between {} and {} :".format(frequency_N*3, len(value3_N)))
print(group4_forvalue3_forfrequency_N)
x3_1_N = numpy.arange(0,frequency_N)
plt.scatter(x3_1_N, group1_forvalue3_forfrequency_N, color='g')
x3_2_N = numpy.arange(frequency_N,2*frequency_N)
plt.scatter(x3_2_N, group2_forvalue3_forfrequency_N, color='r')
x3_3_N = numpy.arange(2*frequency_N,3*frequency_N)
plt.scatter(x3_3_N, group3_forvalue3_forfrequency_N, color='b')
x3_4_N = numpy.arange(3*frequency_N,4*frequency_N)
plt.scatter(x3_4_N, group4_forvalue3_forfrequency_N, color='hotpink')
plt.axvline(x = frequency_N, color = 'r')
plt.axvline(x = 2*frequency_N, color = 'b')
plt.axvline(x = 3*frequency_N, color = 'hotpink')
plt.title("Equal frequency approach for {}. gene for normalized data: ".format(random3))
plt.xlabel("Frequency")
plt.ylabel("The attribute value of the gene")
plt.show()
Figure 42 - Third Random Gene's Equal Frequency Approach Process for Normalized Data
Equal frequency approach for 126. gene for normalized data:
The attributes between 0 and 125 :
[-0.9, -0.9, -0.9, -1.0, -0.9, -0.6, -0.9, -0.9, -0.8, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.7, -1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9]
The attributes between 125 and 250 :
[-0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.4, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -1.0, -1.0]
The attributes between 250 and 375 :
[-1.0, -1.0, -1.0, -1.0, -1.0, -0.8, -1.0, -0.8, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.8, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -0.9, 1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.9, -0.4, -0.9, -1.0, -0.9, -0.9, -1.0, -1.0, -0.9, -0.9, -0.7, -0.9, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -1.0, -1.0]
The attributes between 375 and 500 :
[-1.0, -1.0, -0.9, -1.0, -0.9, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -0.4, -0.1, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -0.8, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -1.0, -1.0, -0.9, -1.0, -0.8, -1.0, -1.0, -1.0, -0.8, -1.0, -0.9, -0.9, -1.0, -1.0, -1.0, -0.9, -0.9, -0.9, -1.0, -1.0, -1.0, -1.0, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.9, -1.0, -0.9, -0.9, -0.9, -0.8, -0.9, -0.9, -1.0, -0.9, -0.8, -0.9, -0.9, -0.9, -0.9, -0.9, -1.0, -0.9, -1.0, -0.9, -0.8, -1.0, -0.9]
Figure 43 - Output of The Third Random Gene's Equal Frequency Approach Process for Normalized Data
Figure 44 - The Scatterplot of The Third Random Gene's Equal Frequency Approach for Normalized Data
We now apply all the operations from Section 2.2.1 to the normalized data; the observations are evaluated in the conclusion section.
#Question 2 - Attribute Similarity for Normalized Data----------------------------------------------------------
#Part B> The Euclidian Distance for normalized data
value4_N = Normalized_data_list[random4]
value5_N = Normalized_data_list[random5]
#Euclidian Distance Between value1 and value2 for normalized data
answersquare_val1val2_N = 0
for i in range(500):
    answersquare_val1val2_N += (value1_N[i]-value2_N[i])**2
answer_val1val2_N = answersquare_val1val2_N**0.5
answer_val1val2_N = round(answer_val1val2_N,1)
print("\n\nEuclidian Distance for Normalized Data-----------------------------------------------------------------")
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random1, random2, answer_val1val2_N ))
#Euclidian Distance Between value1 and value3 for normalized data
answersquare_val1val3_N = 0
for i in range(500):
    answersquare_val1val3_N += (value1_N[i]-value3_N[i])**2
answer_val1val3_N = answersquare_val1val3_N**0.5
answer_val1val3_N = round(answer_val1val3_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random1, random3, answer_val1val3_N ))
#Euclidian Distance Between value1 and value4 for normalized data
answersquare_val1val4_N = 0
for i in range(500):
    answersquare_val1val4_N += (value1_N[i]-value4_N[i])**2
answer_val1val4_N = answersquare_val1val4_N**0.5
answer_val1val4_N = round(answer_val1val4_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random1, random4, answer_val1val4_N ))
#Euclidian Distance Between value1 and value5 for normalized data
answersquare_val1val5_N = 0
for i in range(500):
    answersquare_val1val5_N += (value1_N[i]-value5_N[i])**2
answer_val1val5_N = answersquare_val1val5_N**0.5
answer_val1val5_N = round(answer_val1val5_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random1, random5, answer_val1val5_N ))
#Euclidian Distance Between value2 and value3 for normalized data
answersquare_val2val3_N = 0
for i in range(500):
    answersquare_val2val3_N += (value2_N[i]-value3_N[i])**2
answer_val2val3_N = answersquare_val2val3_N**0.5
answer_val2val3_N = round(answer_val2val3_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random2, random3, answer_val2val3_N ))
#Euclidian Distance Between value2 and value4 for normalized data
answersquare_val2val4_N = 0
for i in range(500):
    answersquare_val2val4_N += (value2_N[i]-value4_N[i])**2
answer_val2val4_N = answersquare_val2val4_N**0.5
answer_val2val4_N = round(answer_val2val4_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random2, random4, answer_val2val4_N ))
#Euclidian Distance Between value2 and value5 for normalized data
answersquare_val2val5_N = 0
for i in range(500):
    answersquare_val2val5_N += (value2_N[i]-value5_N[i])**2
answer_val2val5_N = answersquare_val2val5_N**0.5
answer_val2val5_N = round(answer_val2val5_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random2, random5, answer_val2val5_N ))
#Euclidian Distance Between value3 and value4 for normalized data
answersquare_val3val4_N = 0
for i in range(500):
    answersquare_val3val4_N += (value3_N[i]-value4_N[i])**2
answer_val3val4_N = answersquare_val3val4_N**0.5
answer_val3val4_N = round(answer_val3val4_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random3, random4, answer_val3val4_N ))
#Euclidian Distance Between value3 and value5 for normalized data
answersquare_val3val5_N = 0
for i in range(500):
    answersquare_val3val5_N += (value3_N[i]-value5_N[i])**2
answer_val3val5_N = answersquare_val3val5_N**0.5
answer_val3val5_N = round(answer_val3val5_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random3, random5, answer_val3val5_N ))
#Euclidian Distance Between value4 and value5 for normalized data
answersquare_val4val5_N = 0
for i in range(500):
    answersquare_val4val5_N += (value4_N[i]-value5_N[i])**2
answer_val4val5_N = answersquare_val4val5_N**0.5
answer_val4val5_N = round(answer_val4val5_N,1)
print("\nEuclidian Distance Between {}. gene and {}. gene for normalized data is : {}".format(random4, random5, answer_val4val5_N ))
Figure 45 - The Process of the Euclidean Distance for Five Random Genes of Normalized Data
Euclidian Distance for Normalized Data-----------------------------------------------------------------
Euclidian Distance Between 72. gene and 179. gene for normalized data is : 4.7
Euclidian Distance Between 72. gene and 126. gene for normalized data is : 4.0
Euclidian Distance Between 72. gene and 161. gene for normalized data is : 3.4
Euclidian Distance Between 72. gene and 18. gene for normalized data is : 2.1
Euclidian Distance Between 179. gene and 126. gene for normalized data is : 2.4
Euclidian Distance Between 179. gene and 161. gene for normalized data is : 2.6
Euclidian Distance Between 179. gene and 18. gene for normalized data is : 3.4
Euclidian Distance Between 126. gene and 161. gene for normalized data is : 1.3
Euclidian Distance Between 126. gene and 18. gene for normalized data is : 2.5
Euclidian Distance Between 161. gene and 18. gene for normalized data is : 2.1
Figure 46 - The Output of the Five Random Genes' Euclidean Distance for Normalized Data
We observe interesting discrepancies between the results of the original data and the results of the normalized data.
Table 1 - Euclidean Distance Comparisons of Original and Normalized Data
Original Data's Euclidean Distances | Genes | Normalized Data's Euclidean Distances |
---|---|---|
19255.0 | 72 and 179 | 4.7 |
7832.7 | 72 and 126 | 4.0 |
19255.0 | 72 and 161 | 3.4 |
7832.7 | 72 and 18 | 2.1 |
18567.5 | 179 and 126 | 2.4 |
0.0 | 179 and 161 | 2.6 |
18567.5 | 179 and 18 | 3.4 |
18567.5 | 126 and 161 | 1.3 |
0.0 | 126 and 18 | 2.5 |
18567.5 | 161 and 18 | 2.1 |
According to the normalized data, the most similar genes are 126 and 161 (distance 1.3) and the most dissimilar are 72 and 179 (distance 4.7). The two studies do not reach the same conclusions, but I think the normalized data give more reliable and clearer results. One result is consistent across both studies: genes 72 and 179 are also the most distant pair in the original data (19255.0, tied with genes 72 and 161). In the unnormalized data, genes 126 and 161 were not similar according to Euclidean distance and only slightly similar according to cosine similarity, yet very similar according to the correlation results. We will keep both the consistent and the inconsistent results in mind. After normalization, the inconsistencies appear in the same genes as before, and the gene that causes the most inconsistency is gene 161.
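The ten hand-written distance blocks above all follow the same pattern, so the calculation can also be sketched as a single loop over every 2-combination of the selected genes. The sketch below is only an illustration under stated assumptions: the `genes` dictionary holds short made-up vectors instead of the real 500-attribute data, and the helper `euclidean_distance` is a hypothetical name that does not appear in the project code (which deliberately avoids functions).

```python
from itertools import combinations

# Hypothetical stand-in data: three short "genes" instead of
# five genes with 500 attributes each.
genes = {
    72: [0.0, 3.0, 4.0],
    126: [0.0, 0.0, 0.0],
    161: [1.0, 1.0, 1.0],
}

def euclidean_distance(a, b):
    """Square root of the summed squared differences of two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# One pass over every pair of genes replaces the repeated blocks.
distances = {}
for g1, g2 in combinations(genes, 2):
    distances[(g1, g2)] = round(euclidean_distance(genes[g1], genes[g2]), 1)

for (g1, g2), d in distances.items():
    print("Euclidean distance between gene {} and gene {}: {}".format(g1, g2, d))
```

With five genes, `combinations` yields the same ten pairs that are listed in Table 1, so a copy-and-paste slip in one block cannot affect a single pair.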
In our earlier study in Section 2.3.3, we obtained different results with the original data. We now apply a different method, cosine similarity, to the normalized data to see whether the same difference appears.
#Part B> Cosine Similarity for normalized data-----------------------------------------------------------
#Cosine Similarity Between Value 1 and Value 2 for normalized data
val1_val2_d1d2_N = 0
val1_val2_d1d1_N = 0
val1_val2_d2d2_N = 0
for i in range(500):
    val1_val2_d1d2_N += value1_N[i]*value2_N[i]
    val1_val2_d1d1_N += value1_N[i]**2
    val1_val2_d2d2_N += value2_N[i]**2
CosSimilarity_val1_val2_N = val1_val2_d1d2_N / ((val1_val2_d1d1_N**0.5) * (val1_val2_d2d2_N**0.5) )
CosSimilarity_val1_val2_N = round(CosSimilarity_val1_val2_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random1,random2,CosSimilarity_val1_val2_N))
#Cosine Similarity Between Value 1 and Value 3 for normalized data
val1_val3_d1d2_N = 0
val1_val3_d1d1_N = 0
val1_val3_d2d2_N = 0
for i in range(500):
    val1_val3_d1d2_N += value1_N[i]*value3_N[i]
    val1_val3_d1d1_N += value1_N[i]**2
    val1_val3_d2d2_N += value3_N[i]**2
CosSimilarity_val1_val3_N = val1_val3_d1d2_N / ((val1_val3_d1d1_N**0.5) * (val1_val3_d2d2_N**0.5) )
CosSimilarity_val1_val3_N = round(CosSimilarity_val1_val3_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random1,random3,CosSimilarity_val1_val3_N))
#Cosine Similarity Between Value 1 and Value 4 for normalized data
val1_val4_d1d2_N = 0
val1_val4_d1d1_N = 0
val1_val4_d2d2_N = 0
for i in range(500):
    val1_val4_d1d2_N += value1_N[i]*value4_N[i]
    val1_val4_d1d1_N += value1_N[i]**2
    val1_val4_d2d2_N += value4_N[i]**2
CosSimilarity_val1_val4_N = val1_val4_d1d2_N / ((val1_val4_d1d1_N**0.5) * (val1_val4_d2d2_N**0.5) )
CosSimilarity_val1_val4_N = round(CosSimilarity_val1_val4_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random1,random4,CosSimilarity_val1_val4_N))
#Cosine Similarity Between Value 1 and Value 5 for normalized data
val1_val5_d1d2_N = 0
val1_val5_d1d1_N = 0
val1_val5_d2d2_N = 0
for i in range(500):
    val1_val5_d1d2_N += value1_N[i]*value5_N[i]
    val1_val5_d1d1_N += value1_N[i]**2
    val1_val5_d2d2_N += value5_N[i]**2
CosSimilarity_val1_val5_N = val1_val5_d1d2_N / ((val1_val5_d1d1_N**0.5) * (val1_val5_d2d2_N**0.5) )
CosSimilarity_val1_val5_N = round(CosSimilarity_val1_val5_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random1,random5,CosSimilarity_val1_val5_N))
#Cosine Similarity Between Value 2 and Value 3 for normalized data
val2_val3_d1d2_N = 0
val2_val3_d1d1_N = 0
val2_val3_d2d2_N = 0
for i in range(500):
    val2_val3_d1d2_N += value2_N[i]*value3_N[i]
    val2_val3_d1d1_N += value2_N[i]**2
    val2_val3_d2d2_N += value3_N[i]**2
CosSimilarity_val2_val3_N = val2_val3_d1d2_N / ((val2_val3_d1d1_N**0.5) * (val2_val3_d2d2_N**0.5) )
CosSimilarity_val2_val3_N = round(CosSimilarity_val2_val3_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random2,random3,CosSimilarity_val2_val3_N))
#Cosine Similarity Between Value 2 and Value 4 for normalized data
val2_val4_d1d2_N = 0
val2_val4_d1d1_N = 0
val2_val4_d2d2_N = 0
for i in range(500):
    val2_val4_d1d2_N += value2_N[i]*value4_N[i]
    val2_val4_d1d1_N += value2_N[i]**2
    val2_val4_d2d2_N += value4_N[i]**2
CosSimilarity_val2_val4_N = val2_val4_d1d2_N / ((val2_val4_d1d1_N**0.5) * (val2_val4_d2d2_N**0.5) )
CosSimilarity_val2_val4_N = round(CosSimilarity_val2_val4_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random2,random4,CosSimilarity_val2_val4_N))
#Cosine Similarity Between Value 2 and Value 5 for normalized data
val2_val5_d1d2_N = 0
val2_val5_d1d1_N = 0
val2_val5_d2d2_N = 0
for i in range(500):
    val2_val5_d1d2_N += value2_N[i]*value5_N[i]
    val2_val5_d1d1_N += value2_N[i]**2
    val2_val5_d2d2_N += value5_N[i]**2
CosSimilarity_val2_val5_N = val2_val5_d1d2_N / ((val2_val5_d1d1_N**0.5) * (val2_val5_d2d2_N**0.5) )
CosSimilarity_val2_val5_N = round(CosSimilarity_val2_val5_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random2,random5,CosSimilarity_val2_val5_N))
#Cosine Similarity Between Value 3 and Value 4 for normalized data
val3_val4_d1d2_N = 0
val3_val4_d1d1_N = 0
val3_val4_d2d2_N = 0
for i in range(500):
    val3_val4_d1d2_N += value3_N[i]*value4_N[i]
    val3_val4_d1d1_N += value3_N[i]**2
    val3_val4_d2d2_N += value4_N[i]**2
CosSimilarity_val3_val4_N = val3_val4_d1d2_N / ((val3_val4_d1d1_N**0.5) * (val3_val4_d2d2_N**0.5) )
CosSimilarity_val3_val4_N = round(CosSimilarity_val3_val4_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random3,random4,CosSimilarity_val3_val4_N))
#Cosine Similarity Between Value 3 and Value 5 for normalized data
val3_val5_d1d2_N = 0
val3_val5_d1d1_N = 0
val3_val5_d2d2_N = 0
for i in range(500):
    val3_val5_d1d2_N += value3_N[i]*value5_N[i]
    val3_val5_d1d1_N += value3_N[i]**2
    val3_val5_d2d2_N += value5_N[i]**2
CosSimilarity_val3_val5_N = val3_val5_d1d2_N / ((val3_val5_d1d1_N**0.5) * (val3_val5_d2d2_N**0.5) )
CosSimilarity_val3_val5_N = round(CosSimilarity_val3_val5_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random3,random5,CosSimilarity_val3_val5_N))
#Cosine Similarity Between Value 4 and Value 5 for normalized data
val4_val5_d1d2_N = 0
val4_val5_d1d1_N = 0
val4_val5_d2d2_N = 0
for i in range(500):
    val4_val5_d1d2_N += value4_N[i]*value5_N[i]
    val4_val5_d1d1_N += value4_N[i]**2
    val4_val5_d2d2_N += value5_N[i]**2
CosSimilarity_val4_val5_N = val4_val5_d1d2_N / ((val4_val5_d1d1_N**0.5) * (val4_val5_d2d2_N**0.5) )
CosSimilarity_val4_val5_N = round(CosSimilarity_val4_val5_N,1)
print("\nCosine Similarity between {} and {} for normalized data is : {}".format(random4,random5,CosSimilarity_val4_val5_N))
Figure 47 - The Process of The Cosine Similarity for Five Random Genes of Normalized Data
Cosine Similarity for normalized data-----------------------------------------------------------------
Cosine Similarity between 72 and 179 for normalized data is : 1.0
Cosine Similarity between 72 and 126 for normalized data is : 1.0
Cosine Similarity between 72 and 161 for normalized data is : 1.0
Cosine Similarity between 72 and 18 for normalized data is : 1.0
Cosine Similarity between 179 and 126 for normalized data is : 1.0
Cosine Similarity between 179 and 161 for normalized data is : 1.0
Cosine Similarity between 179 and 18 for normalized data is : 1.0
Cosine Similarity between 126 and 161 for normalized data is : 1.0
Cosine Similarity between 126 and 18 for normalized data is : 1.0
Cosine Similarity between 161 and 18 for normalized data is : 1.0
Figure 48 - The Output of The Five Random Genes' Cosine Similarity for Normalized Data
Table 2 - Cosine Similarity Comparisons of Original and Normalized Data
Original Data’s Cosine Similarities | Genes | Normalized Data’s Cosine Similarities |
---|---|---|
0.7 | 72 and 179 | 1.0 |
0.9 | 72 and 126 | 1.0 |
0.7 | 72 and 161 | 1.0 |
0.9 | 72 and 18 | 1.0 |
0.7 | 179 and 126 | 1.0 |
1.0 | 179 and 161 | 1.0 |
1.0 | 179 and 18 | 1.0 |
0.7 | 126 and 161 | 1.0 |
1.0 | 126 and 18 | 1.0 |
0.7 | 161 and 18 | 1.0 |
With the normalized data, every result is 1.0. Because the code rounds each value to one decimal place, any remaining differences between gene pairs are hidden, so it is difficult to make an effective analysis from this study alone. The results are at least consistent with one another, and together with the results of our next study they will make our observations about the whole study more complete.
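One reason why every value appears as 1.0 may be the rounding step itself. The following sketch uses small made-up vectors (not the gene data) and a hypothetical helper named `cosine_similarity` to show that two clearly different similarities can both be printed as 1.0 when rounded to one decimal place.

```python
def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Hypothetical vectors chosen only for illustration.
u = [1.0, 2.0, 3.0]
v = [1.0, 2.0, 4.0]   # close to u, but not parallel
w = [2.0, 4.0, 6.0]   # exactly parallel to u

sim_uv = cosine_similarity(u, v)
sim_uw = cosine_similarity(u, w)

# Rounded to one decimal place, both pairs look identical.
print(round(sim_uv, 1), round(sim_uw, 1))
# Rounded to three decimal places, the difference becomes visible.
print(round(sim_uv, 3), round(sim_uw, 3))
```

Reporting the normalized cosine similarities with three or four decimal places would show whether the gene pairs really are indistinguishable or only appear so after rounding.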
We found the results of our correlation study with the original data insufficient. I think that the correlation studies we will do with normalized data will give more effective and clear results.
Figure 49 - The Process of The Correlation for Five Random Genes of Normalized Data
Correlation for Normalized Data--------------------------------------------------------------------------
Correlation Between 72 and 179 for normalized data is : 59215211.66775865
Correlation Between 72 and 126 for normalized data is : 88860955.85129057
Correlation Between 72 and 161 for normalized data is : 74328446.03061298
Correlation Between 72 and 18 for normalized data is : 54231383.99139777
Correlation Between 179 and 126 for normalized data is : 98264779.24568006
Correlation Between 179 and 161 for normalized data is : 81496122.02069733
Correlation Between 179 and 18 for normalized data is : 55672209.656829946
Correlation Between 126 and 161 for normalized data is : 125662191.8803252
Correlation Between 126 and 18 for normalized data is : 81931257.73940945
Correlation Between 161 and 18 for normalized data is : 66455952.76899566
Figure 50 - The Output of The Five Random Genes' Correlation for Normalized Data
Table 3 - Correlation Comparisons of Original and Normalized Data
Original Data’s Correlations | Genes | Normalized Data’s Correlations |
---|---|---|
2.74 | 72 and 179 | 59215211.67 |
1.37 | 72 and 126 | 88860955.85 |
2.74 | 72 and 161 | 74328446.03 |
1.37 | 72 and 18 | 54231383.99 |
7.13 | 179 and 126 | 98264779.25 |
3.12 | 179 and 161 | 81496122.02 |
7.13 | 179 and 18 | 55672209.66 |
7.13 | 126 and 161 | 125662191.88 |
3.95 | 126 and 18 | 81931257.74 |
7.13 | 161 and 18 | 66455952.77 |
According to the correlations of the normalized data, the most similar genes are 126 and 161, and the most dissimilar are 72 and 18. Genes 72 and 18 are also among the most dissimilar pairs in the non-normalized correlation results (1.37, tied with genes 72 and 126). The similarity of genes 126 and 161 also appeared in the normalized Euclidean distances, yet the unnormalized cosine similarity results placed this pair among the least similar (0.7). Gene 161 indeed makes the analysis very difficult.
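The values in Table 3 lie far outside the range from -1 to 1, so they may be easier to interpret next to a standard reference point. The sketch below computes the Pearson correlation coefficient, which is an assumption about the intended measure and is not taken from the project code. The helper `pearson_correlation` and the short vectors are hypothetical.

```python
def pearson_correlation(a, b):
    """Pearson coefficient: covariance divided by the product of the spreads."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / ((var_a * var_b) ** 0.5)

# Hypothetical vectors chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]  # x multiplied by 10
z = [4.0, 3.0, 2.0, 1.0]      # x in reverse order

r_xy = pearson_correlation(x, y)
r_xz = pearson_correlation(x, z)

print(round(r_xy, 2))  # perfectly positive relationship
print(round(r_xz, 2))  # perfectly inverse relationship
```

Because the coefficient divides by the spread of both vectors, multiplying a gene by a positive constant or shifting it by a constant does not change the result. If the normalization is such a rescaling, a Pearson coefficient would be expected to stay the same before and after it.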
The Euclidean distance, cosine similarity, and correlation are metrics commonly used to evaluate the similarity between data. The Euclidean distance expresses the distance between two data vectors, and lower values indicate that the data are closer together. For example, the Euclidean distance between genes 72 and 126 in the original data was calculated as 7832.7, which indicates that these two genes are relatively close to each other. The distance between genes 72 and 18 is likewise 7832.7, so these two genes are also close. Normalization rescales the data and helps make comparisons more consistent. In the normalized data, the Euclidean distances between genes are much smaller. For example, the distance between genes 72 and 179 is 4.7, far lower than the 19255.0 obtained from the original data.
Cosine similarity measures the angular similarity between two genes. In general its values range from -1 to 1, and from 0 to 1 when all values are non-negative. Values close to 1 indicate that the two genes are more similar. For example, the cosine similarity between genes 72 and 126 in the original data was calculated as 0.9, so these two genes are quite similar. Likewise, the cosine similarity between genes 72 and 18 is 0.9. In the normalized data, all cosine similarities round to 1.0, which suggests that all gene pairs are similar to each other. However, it is important to verify that these results are consistent with real-world data.
Correlation describes the relationship between two genes. Correlation coefficients range from -1 to 1. A positive value indicates a positive relationship between genes, while a negative value indicates an inverse relationship. For example, the correlation value between genes 72 and 179 in the original data was calculated as 2.74, indicating a positive association, and the value between genes 72 and 126 is 1.37. These values lie above 1, and the values for the normalized data are larger still (for example, 59215211.67 for genes 72 and 179), so the reported numbers cannot be read as coefficients on the -1 to 1 scale. Such large values complicate interpretation and can be misleading about real-world relationships, so the correlation results must be evaluated carefully.
In general, Euclidean distance, cosine similarity, and correlation analyses are useful for assessing similarities between genes. However, careful interpretation is required to align the nature of the data and the analysis results with the real-world context. Normalizing the data beforehand can help make comparisons more consistent, but it is important to compare with real-world data to confirm the reliability of the results.
[1] “Making a list of evenly spaced numbers in a certain range in python”, stackoverflow.com https://stackoverflow.com/questions/6683690/making-a-list-of-evenly-spaced-numbers-in-a-certain-range-in-python (accessed Nov. 20, 2022)
[2] “Python Figure Reference: layout.xaxis”, plotly.com https://plotly.com/python/reference/layout/xaxis/ (accessed Nov. 20, 2022)
[3] Mert Alabaş, “Matplotlib Kütüphanesi İle Scatter Plot”, https://medium.com/datarunner/matplotlib-k%C3%BCt%C3%BCphanesi-i%CC%87le-scatter-plot-9b8c181fc9ad (accessed Nov. 20, 2022)
[4] Mert Alabaş, “Python İle Veri Görselleştirme: Matplotlib Kütüphanesi-1”, https://medium.com/datarunner/matplotlibkutuphanesi-1-99087692102b (accessed Nov. 20, 2022)
[5] “Correlation Analysis in Data Mining”, javatpoint.com https://www.javatpoint.com/correlation-analysis-in-data-mining (accessed Nov. 20, 2022)