Hey Angel - Thanks for the great work on this package!
This issue relates to the ED50 calculation being off from where the curve intersects at y=50. Here, I use normalized data (constrained within 0-100) and attempt to extract an ED50 closer to the ground truth.
library(tidydrc)
#Make a data frame of dose-response results (response signifies pre-normalized %Cell Death values)
#As a ground truth, we have dose of 59.375 equals response of 50.000
mydr <- data.frame("dose"=c(1900,950,475,237.5,118.75,59.375,29.68,14.84,7.42,3.71),
"response"=c(98.38,96.267,93.300,91.1,53.56,50.000,44.2,24.1,15.5,9.01))
#Apply the LL.4 model to fit a log-logistic curve
mydr.LL4 <- tidydrc_model(mydr, dose, response, model=LL.4())
#Extract the EC50 value
mydr_EC50 <- as.data.frame(as.numeric(mydr.LL4[[4]][[1]][4,2]))
#Plot the curve
#ed50=TRUE plots a dashed black line according to the extrapolated EC50
#ground truth lines are in red. y-axis=50, and x-axis=59.375
tidydrc_plot(mydr.LL4,ed50=TRUE) +
scale_x_log10(labels=log10) +
xlab("Log10[concentration]")+
ylab("% Cell Death") +
geom_hline(aes(yintercept=50), linetype=1, color="red") +
geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3))))
![image](https://user-images.githubusercontent.com/80007059/113588788-334b5300-95f6-11eb-92b7-cbf9320c2df3.png)
We can see the dashed line is off, calling the ec50 as 67.29, when we know it is 59.375
Could this be due to extrapolating 50% of the curves maximal effect, and not considering the constriants of 0-100 imposed by the pre-normalized data?
#Look at the predicted values of the fit to see if we can get closer to the 'true' ec50 value
mydr_predvals <- mydr.LL4$pred[[1]]
plot(mydr_predvals$pred)
#From the predicted values, get the dose where the response is closest to 50
new_EC50 <- mydr_predvals[which.min(abs(50-mydr_predvals$pred)),1] #note which.min alone returns the *position*, so doing it like this returns the actual value
#Re-make the plot but add new_EC50 (green) and see if it gets any closer
tidydrc_plot(mydr.LL4,ed50=TRUE) +
scale_x_log10(labels=log10) +
xlab("Log10[concentration]")+
ylab("% Cell Death") +
geom_hline(aes(yintercept=50), linetype=1, color="red") +
geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
geom_vline(aes(xintercept=new_EC50), linetype=5, color="darkgreen") +
geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3)))) +
geom_text(aes(x=10.9,y=94, label=paste0("new_EC50: ", round(new_EC50,3))))
![image](https://user-images.githubusercontent.com/80007059/113588952-62fa5b00-95f6-11eb-90a4-59c77821d679.png)
The new EC50 gets much closer.
Thoeretically, some datasets may have a response point closer to 50 than exists in the predicted values
#Merge the original response with the predicted response and take whichever value of the combined results is closest to 50:
merged <- data.frame("dose"=abind(mydr_predvals$dose,mydr$dose), "response"=abind(mydr_predvals$pred, mydr$response))
new2_EC50 <- merged[which.min(abs(50-merged$response)),1]
#Re-make the plot and add the new2_EC50 value
tidydrc_plot(mydr.LL4,ed50=TRUE) +
scale_x_log10(labels=log10) +
xlab("Log10[concentration]")+
ylab("% Cell Death") +
geom_hline(aes(yintercept=50), linetype=1, color="red") +
geom_vline(aes(xintercept=59.375), linetype=1, color="red")+
geom_point(aes(x=dose, y=response), size=2.4, color="blue") +
geom_vline(aes(xintercept=new_EC50), linetype=5, color="darkgreen") +
geom_vline(aes(xintercept=new2_EC50), linetype=5, color="cyan", size=1.2)+
geom_text(aes(x=10,y=100,label=paste0("original_EC50: ",round(mydr_EC50,3)))) +
geom_text(aes(x=10.9,y=94, label=paste0("new_EC50: ", round(new_EC50,3)))) +
geom_text(aes(x=10.6,y=88, label=paste0("new2_EC50: ", round(new2_EC50,3))))
![image](https://user-images.githubusercontent.com/80007059/113589032-7d343900-95f6-11eb-9d65-1175dd01c1dc.png)
Now we've achieved the ground truth. I realize that the likelihood of a single point in the observed data being closer to 50 than the best predicted value is pretty low, but merging both datasets and looking for the closest dose where response=50 seems to fix this. Either way, the two values aren't super different.
Best,
Charlie