eimenhmdt / autoresearcher
⚡ Automating scientific workflows with AI ⚡
License: MIT License
For questions about long-established knowledge, i.e. topics that have been studied for decades, the AI still looks up only relatively recent papers (going back to 2000 at the earliest). This is likely because older papers are not widely available in a digital text format, only as scanned PDFs.
In cases where the topic is clearly long-studied, the AI should perhaps rank papers not only by their top search position, which usually reflects novelty, but also by citation count or another heuristic that identifies papers giving a good overview of the field, such as literature reviews. Otherwise the AI struggles to understand the topic well: it never reads the foundational papers and jumps straight to the cutting edge.
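One possible heuristic is sketched below. This is only an illustration, not the package's actual ranking: the field names (`citationCount`, `title`) follow the Semantic Scholar response format, and the weights and the title-keyword check are assumptions.

```python
def rank_for_overview(papers, citation_weight=0.7, review_bonus=0.3):
    """Rank papers so that highly cited (likely foundational or review)
    papers rise above merely novel ones.

    papers: list of dicts with at least "citationCount" and "title" keys,
    as returned by the Semantic Scholar API.
    """
    max_citations = max(p["citationCount"] for p in papers) or 1

    def score(paper):
        # Normalized citation count, boosted if the title suggests a survey.
        citation_score = paper["citationCount"] / max_citations
        is_review = any(
            w in paper["title"].lower() for w in ("review", "survey", "overview")
        )
        return citation_weight * citation_score + (review_bonus if is_review else 0.0)

    return sorted(papers, key=score, reverse=True)
```

A cutoff on this score could then decide when a topic is "long-studied enough" to prefer reviews over top-ranked recent hits.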
Research question: Omnichannel marketing
Auto Researcher initiated!
Generating keyword combinations...
Keyword combinations generated!
Fetching top 20 papers...
Exception Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_22004/801009983.py in
----> 1 researcher = literature_review(research_question)
~\anaconda3\envs\Python38\lib\site-packages\autoresearcher\workflows\literature_review\literature_review.py in literature_review(research_question, output_file)
73 search_query = research_question
74 print(colored("Fetching top 20 papers...", "yellow"))
---> 75 top_papers = SemanticScholar.fetch_and_sort_papers(search_query, keyword_combinations=keyword_combinations, year_range="2000-2023")
76 print(colored("Top 20 papers fetched!", "green"))
77
~\anaconda3\envs\Python38\lib\site-packages\autoresearcher\data_sources\web_apis\semantic_scholar_loader.py in fetch_and_sort_papers(self, search_query, limit, top_n, year_range, keyword_combinations, weight_similarity)
25
26 for combination in keyword_combinations:
---> 27 papers.extend(self.fetch_data(combination, limit, year_range))
28
29 max_citations = max(papers, key=lambda x: x['citationCount'])['citationCount']
~\anaconda3\envs\Python38\lib\site-packages\autoresearcher\data_sources\web_apis\semantic_scholar_loader.py in fetch_data(self, search_query, limit, year_range)
16 params["year"] = year_range
17
---> 18 data = self.make_request("", params=params)
19 return data.get('data', [])
20
~\anaconda3\envs\Python38\lib\site-packages\autoresearcher\data_sources\web_apis\base_web_api_data_loader.py in make_request(self, endpoint, params)
18 return data
19 else:
---> 20 raise Exception(f"Failed to fetch data from API: {response.status_code}")
21
Exception: Failed to fetch data from API: 400
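A 400 here means the Semantic Scholar API rejected the request, usually because of a malformed query or parameter. Including the response body in the raised exception would make such failures much easier to diagnose. A sketch of how `make_request` could do this (the `self.session` attribute is an assumption; the actual loader may call `requests.get` directly, and the rest of the flow mirrors the traceback above):

```python
def make_request(self, endpoint, params=None):
    # Same flow as base_web_api_data_loader.make_request, but the error
    # now includes the response body so a 400's cause is visible.
    response = self.session.get(self.base_url + endpoint, params=params)
    if 200 <= response.status_code < 300:
        return response.json()
    raise Exception(
        f"Failed to fetch data from API: {response.status_code} - {response.text}"
    )
```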
I kept getting a JSONDecodeError (json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)) from the function get_citation_by_doi().
For now I worked around it by commenting out the function, the extract-citation section, and the DOI-related logic under 'for papers in papers', so the citation is just the paper "url", but I wanted to flag it in case anyone else hits the same issue.
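Rather than commenting the call out, a defensive wrapper could fall back to the paper URL when the DOI lookup fails. This is only a sketch: the wrapper name is hypothetical, and it assumes the paper dict carries `url`/`title` fields as in the Semantic Scholar schema.

```python
import json

def safe_citation(paper, get_citation_by_doi):
    """Try the DOI-based citation lookup; fall back to the paper URL
    if the lookup fails (e.g. the API returns an empty or non-JSON body)."""
    try:
        return get_citation_by_doi(paper)
    except (json.JSONDecodeError, KeyError, ValueError):
        # Fall back to whatever identifier we already have.
        return paper.get("url", paper.get("title", "unknown"))
```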
I think it would be great to be able to configure the following in the literature_review() function:
year_range
top_n
papers
Hi, I am not a coder; a friend helped me set it up, but autoresearcher is amazing!
My issue is that autoresearcher missed the most obvious articles no matter what I asked.
I asked about drug interactions and the herb Echinacea. I tried various search strings, from complex to a very simple one: "Echinacea, drug interactions"
Each time, autoresearcher produced accurate keyword combinations to search for papers, such as: 1. Echinacea, medication, interactions; 2. Herbal remedies, drug interactions, Echinacea; 3. Echinacea, prescription drugs, interactions; 4. Echinacea, supplements, drug interactions; 5. Echinacea, adverse effects, drug interactions
However, each time autoresearcher failed to identify and use the two articles listed below. Only if I actually included the very title of the papers in the search string, would autoresearcher include them.
The most obvious articles were not found, and some articles with little relevance were included.
My questions are: why does this happen, and what can I do to improve the search queries?
thank you very much
These two articles were not included. The first one even has Echinacea and drug interactions in its title. The second has a detailed abstract and also mentions Echinacea and drug interactions.
A critical evaluation of drug interactions with Echinacea spp
Camille Freeman, Kevin Spelman
PMID: 18618481 DOI: 10.1002/mnfr.200700113
Review and Assessment of Medicinal Safety Data of Orally Used Echinacea Preparations
Karin Ardjomand-Woelkart, Rudolf Bauer
PMID: 26441065 DOI: 10.1055/s-0035-1558096
Keyword combinations used to search for papers: 1. Echinacea, medication, interactions, 2. Herbal remedies, drug interactions, Echinacea, 3. Echinacea, prescription drugs, interactions, 4. Echinacea, supplements, drug interactions, 5. Echinacea, adverse effects, drug interactions
Literature Review:
Echinacea is a commonly used herbal remedy for the prevention of common cold, but its efficacy remains inconclusive or contradictory (Izzo et al., 2016). Moreover, it may cause potentially serious adverse events, including herb-drug interactions (Izzo et al., 2016). A study by Qato et al. (2016) found that 15.1% of older adults were at risk for potential major drug-drug interactions, and most of these interactions involved medications and dietary supplements increasingly used in 2010-2011, including echinacea. Sparreboom et al. (2004) reported that echinacea has the potential to significantly modulate the activity of drug-metabolizing enzymes and/or the drug transporter P-glycoprotein, and participates in potential pharmacokinetic interactions with anticancer drugs. Sachar and Ma (2013) suggested that echinacea may cause herb-drug interactions through nuclear receptors (NRs) activation, resulting in NR-mediated HDIs. Parvez and Rishi (2019) warned that there exists a potential risk of herb-drug interactions leading to adverse side effects, including hepatotoxicity.
Despite the potential risks associated with echinacea use, it is still one of the most commonly used herbal remedies in the presurgical population (Tsen et al., 2000). However, the article by Chen et al. (2012) did not mention echinacea in their abstract, which focused on possible pharmacokinetic, pharmacodynamic, and herbal drug interactions occurring in the elderly.
Overall, the literature suggests that echinacea may have potential herb-drug interactions and adverse events, and caution should be exercised when using it in combination with other medications or supplements. Further research is needed to fully understand the mechanisms and clinical implications of echinacea-related herb-drug interactions.
References:
Chen, X.-W., B. Sneed, K., Pan, S.-Y., Cao, C., R. Kanwar, J., Chew, H., & Zhou, S.-F. (2012, May 1). Herb-Drug Interactions and Mechanistic and Clinical Considerations. Current Drug Metabolism. Bentham Science Publishers Ltd. http://doi.org/10.2174/1389200211209050640
Izzo, A. A., Hoon-Kim, S., Radhakrishnan, R., & Williamson, E. M. (2016, February 17). A Critical Approach to Evaluating Clinical Efficacy, Adverse Events and Drug Interactions of Herbal Remedies. Phytotherapy Research. Wiley. http://doi.org/10.1002/ptr.5591
Parvez, M. K., & Rishi, V. (2019, June 11). Herb-Drug Interactions and Hepatotoxicity. Current Drug Metabolism. Bentham Science Publishers Ltd. http://doi.org/10.2174/1389200220666190325141422
Qato, D. M., Wilder, J., Schumm, L. P., Gillet, V., & Alexander, G. C. (2016, April 1). Changes in Prescription and Over-the-Counter Medication and Dietary Supplement Use Among Older Adults in the United States, 2005 vs 2011. JAMA Internal Medicine. American Medical Association (AMA). http://doi.org/10.1001/jamainternmed.2015.8581
Sachar, M., & Ma, X. (2013, January 21). Nuclear receptors in herb–drug interactions. Drug Metabolism Reviews. Informa UK Limited. http://doi.org/10.3109/03602532.2012.753902
Sparreboom, A., Cox, M. C., Acharya, M. R., & Figg, W. D. (2004, June 15). Herbal Remedies in the United States: Potential Adverse Interactions With Anticancer Agents. Journal of Clinical Oncology. American Society of Clinical Oncology (ASCO). http://doi.org/10.1200/jco.2004.08.182
Tsen, L. C., Segal, S., Pothier, M., & Bader, A. M. (2000, July 1). Alternative Medicine Use in Presurgical Patients. Anesthesiology. Ovid Technologies (Wolters Kluwer Health). http://doi.org/10.1097/00000542-200007000-00025
Gujjarlamudi, H. (2016). Polytherapy and drug interactions in elderly. Journal of Mid-life Health. Medknow. http://doi.org/10.4103/0976-7800.191021
Using this research question: "How to optimize the demand response process using Surrogates trained by Active Learning"
I get the following error running the code in Colab:
InvalidRequestError Traceback (most recent call last)
in <cell line: 2>()
1 # Run the Literature Review
----> 2 researcher = literature_review(research_question, output_file=file)
      3 researcher()
/usr/local/lib/python3.9/dist-packages/openai/api_requestor.py in _interpret_response_line(self, rbody, rcode, rheaders, stream)
677 stream_error = stream and "error" in resp.data
678 if stream_error or not 200 <= rcode < 300:
--> 679 raise self.handle_error_response(
680 rbody, rcode, resp.data, rheaders, stream_error=stream_error
    681         )
InvalidRequestError: This model's maximum context length is 4097 tokens. However, you requested 4255 tokens (2455 in the messages, 1800 in the completion). Please reduce the length of the messages or completion.
I guess this can be fixed easily, either by bounding the number of tokens in the prompt built by joining the answers (line 48 in autoresearcher/workflows/literature_review/literature_review.py), or by dynamically adjusting the number of completion tokens requested from OpenAI based on how many tokens the prompt already uses.
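A sketch of the dynamic approach. The 4-characters-per-token estimate is a rough heuristic (the `tiktoken` library would give exact counts); the 4097 context limit and 1800-token completion match the error above.

```python
def fit_completion_tokens(prompt, context_limit=4097, desired_completion=1800,
                          chars_per_token=4, safety_margin=50):
    """Estimate the prompt's token count and shrink the requested
    completion so prompt + completion stays under the model's limit."""
    estimated_prompt_tokens = len(prompt) // chars_per_token + safety_margin
    available = context_limit - estimated_prompt_tokens
    return max(0, min(desired_completion, available))
```

The returned value would then be passed as `max_tokens` in the OpenAI call; a result of 0 signals that the prompt itself must be truncated first.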
The Discord link https://discord.gg/PnQDR5h9 in the readme does not work!
I like your project, as I am building a research GPT. The following features would add value:
Please see if you can implement them. We can also collaborate if you are interested; I am ready to code some of these features.
Cool package!
I am receiving this error:
Research question: What mutations in the N gene of sars cov 2 are involved in rapid antigen test failure?
Auto Researcher initiated!
Generating keyword combinations...
Keyword combinations generated!
Fetching top 20 papers...
Traceback (most recent call last):
File "", line 1, in
File "/home/amar/miniconda3/envs/write-the/lib/python3.9/site-packages/autoresearcher/workflows/literature_review/literature_review.py", line 85, in literature_review
top_papers = SemanticScholar.fetch_and_sort_papers(search_query, keyword_combinations=keyword_combinations, year_range="2000-2023")
File "/home/amar/miniconda3/envs/write-the/lib/python3.9/site-packages/autoresearcher/data_sources/web_apis/semantic_scholar_loader.py", line 27, in fetch_and_sort_papers
papers.extend(self.fetch_data(combination, limit, year_range))
File "/home/amar/miniconda3/envs/write-the/lib/python3.9/site-packages/autoresearcher/data_sources/web_apis/semantic_scholar_loader.py", line 18, in fetch_data
data = self.make_request("", params=params)
File "/home/amar/miniconda3/envs/write-the/lib/python3.9/site-packages/autoresearcher/data_sources/web_apis/base_web_api_data_loader.py", line 20, in make_request
raise Exception(f"Failed to fetch data from API: {response.status_code}")
Exception: Failed to fetch data from API: 429
My code:
from autoresearcher import literature_review
research_question = "What mutations in the N gene of sars cov 2 are involved in rapid antigen test failure?"
researcher = literature_review(research_question)
researcher = literature_review(research_question, output_file="my_literature_review.txt")
A 429 response means the API is rate-limiting you: too many requests in a short window.
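One common mitigation is retrying with exponential backoff. This is a generic sketch, not part of autoresearcher; the fetch function and its signature are placeholders for something like `fetch_data`.

```python
import time

def fetch_with_backoff(fetch, *args, max_retries=5, base_delay=1.0, **kwargs):
    """Call `fetch`, retrying with exponentially growing delays when it
    raises (e.g. on an HTTP 429 from the Semantic Scholar API)."""
    for attempt in range(max_retries):
        try:
            return fetch(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Wait 1s, 2s, 4s, 8s, ... between attempts.
            time.sleep(base_delay * (2 ** attempt))
```

Note also that the snippet above calls literature_review twice, which doubles the number of API requests and makes a 429 more likely.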
In some instances, autoresearcher seems to read from the same paper multiple times, sometimes five or more. As expected, this happens more often with very specific questions for which less literature is available in the given time window.
It may be useful to avoid repeating papers, even if that means not reaching the requested maximum: just report something like "no more papers found" and break.
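Since the same paper can come back from several keyword combinations, deduplicating on a stable identifier before ranking would avoid the repeats. A sketch (`paperId` is the Semantic Scholar field; the normalized-title fallback is an assumption):

```python
def deduplicate_papers(papers):
    """Keep the first occurrence of each paper, keyed by Semantic Scholar
    paperId (falling back to a normalized title)."""
    seen = set()
    unique = []
    for paper in papers:
        key = paper.get("paperId") or paper.get("title", "").strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique
```

This could run in `fetch_and_sort_papers` right after the keyword-combination loop extends the `papers` list.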
In the following places, I found that use_gpt4=True
can be passed to change the use of the model from 3.5 to 4:
autoresearcher/llms/openai.py
autoresearcher/workflows/literature_review/combine_answers.py
autoresearcher/workflows/literature_review/extract_answers_from_papers.py
However, how do I do the same when calling literature_review? For example:
researcher = literature_review(
research_question, output_file="answer.txt"
)
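The flag does not appear to be exposed at the workflow level. One way it could be threaded through is a keyword argument that literature_review forwards to the LLM helpers. The sketch below uses stubbed internals and is not the package's actual code; only the `use_gpt4` name comes from the files listed above.

```python
def literature_review(research_question, output_file=None, use_gpt4=False):
    """Hypothetical signature: forward use_gpt4 down to the LLM calls."""
    model = "gpt-4" if use_gpt4 else "gpt-3.5-turbo"
    # ... the real workflow would pass use_gpt4 through to combine_answers()
    # and extract_answers_from_papers() here ...
    return {"question": research_question, "model": model}
```

A call like `literature_review(research_question, output_file="answer.txt", use_gpt4=True)` would then select GPT-4 throughout.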