mentatpsi / OSGenome

An Open Source Web Application for Genetic Data (SNPs) using 23AndMe and Data Crawling Technologies
License: GNU General Public License v3.0
When running the first step, it complains about a missing file/directory:
# python3 SNPedia/DataCrawler.py -f /Users/me/Downloads/genome_my_stuff_here.txt
Traceback (most recent call last):
File "SNPedia/DataCrawler.py", line 139, in <module>
personal = PersonalData(args["filepath"])
File "/Users/me/code/OSGenome/SNPedia/GenomeImporter.py", line 10, in __init__
self.export()
File "/Users/me/code/OSGenome/SNPedia/GenomeImporter.py", line 27, in export
with open(filepath, "w") as jsonfile:
FileNotFoundError: [Errno 2] No such file or directory: './data/snpDict.json'
Everything works fine if I manually run mkdir data first.
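A minimal sketch of a fix, assuming export() writes to ./data/snpDict.json as the traceback shows: create the parent directory before opening the file, so a fresh checkout works without a manual mkdir.

```python
import json
import os

def export(data, filepath="./data/snpDict.json"):
    """Write the SNP dictionary, creating ./data if it is missing."""
    # Create the parent directory first so FileNotFoundError cannot occur
    # on a fresh checkout that has no data/ directory yet.
    os.makedirs(os.path.dirname(filepath), exist_ok=True)
    with open(filepath, "w") as jsonfile:
        json.dump(data, jsonfile)
```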
As the software currently stands, it does not resume the crawl from the last known position. A simple fix would be a settings file that can be edited programmatically, or a settings class with export functionality that stores the last known cmcontinue from the crawl; the crawler would then check that value and only begin adding iterations once it reaches the last known point. This would allow a gradual expansion of content with less interaction.
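The settings class described above could be sketched roughly as follows (the file path and field names are illustrative, not the project's actual API):

```python
import json
import os

class CrawlSettings:
    """Sketch: persist the last cmcontinue token between crawler runs."""

    def __init__(self, path="./data/settings.json"):
        self.path = path
        self.cmcontinue = None
        # Pick up where the previous run left off, if a settings file exists.
        if os.path.exists(path):
            with open(path) as f:
                self.cmcontinue = json.load(f).get("cmcontinue")

    def save(self, cmcontinue):
        """Record the latest continuation token after each crawl batch."""
        self.cmcontinue = cmcontinue
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        with open(self.path, "w") as f:
            json.dump({"cmcontinue": cmcontinue}, f)
```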
File "DataCrawler.py", line 49, in initcrawl
pp.pprint(self.rsidDict)
File "C:\Python34\lib\pprint.py", line 139, in pprint
self._format(object, self._stream, 0, 0, {}, 0)
File "C:\Python34\lib\pprint.py", line 193, in _format
allowance + 1, context, level)
File "C:\Python34\lib\pprint.py", line 187, in _format
allowance + 1, context, level)
File "C:\Python34\lib\pprint.py", line 268, in _format
write(rep)
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 20: character maps to <undefined>
The error doesn't impact overall script operation; it only prevents printing the JSON output, but otherwise everything is fine.
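One hedged workaround sketch: the crash comes from printing characters (here '\xae', the ® sign) that the Windows console's cp1252 encoding cannot represent. Replacing unencodable characters before writing avoids the crash; safe_pprint is an illustrative helper, not part of the project.

```python
import pprint
import sys

def safe_pprint(obj):
    """Pretty-print without crashing on characters the console
    encoding (e.g. cp1252 on Windows) cannot represent."""
    text = pprint.pformat(obj)
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    # Round-trip through the console encoding, substituting '?' for
    # anything unencodable instead of raising UnicodeEncodeError.
    sys.stdout.write(text.encode(encoding, errors="replace").decode(encoding))
    sys.stdout.write("\n")
```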
Please take a look at what happens when I try to open the GUI on macOS:
Please note that the error appears automatically, and there is no way to actually select a file or folder. After I press the OK button, the application prints a few rs numbers and quits.
$ python3 SNPedia/Datacrawler_GUI.py
2021-11-24 14:18:01.726 python3[45627:9115062] ApplePersistence=NO
2021-11-24 14:18:02.346 python3[45627:9115062] *** Assertion failure in -[NSOpenPanel beginServicePanel:asyncExHandler:], NSVBOpenAndSavePanels.m:1907
2021-11-24 14:18:02.351 python3[45627:9115062] -[NSSavePanel beginWithCompletionHandler:]_block_invoke caught non-fatal NSInternalInconsistencyException '<NSOpenPanel: 0x7f78ccfea2f0> is attempting to advance this Open/Save panel to run phase while another self.advanceToRunPhaseCompletionHandler is in waiting for a previous attempt. An Open/Save panel cannot start to advance more than once.' with user dictionary {
NSAssertFile = "NSVBOpenAndSavePanels.m";
NSAssertLine = 1907;
} and backtrace (
0 CoreFoundation 0x00007ff81b1f1e5b __exceptionPreprocess + 242
1 libobjc.A.dylib 0x00007ff81af52b9d objc_exception_throw + 48
2 Foundation 0x00007ff81c0a8653 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 267
3 AppKit 0x00007ff81e5ab269 -[NSSavePanel beginServicePanel:asyncExHandler:] + 475
4 AppKit 0x00007ff81e5ac51d -[NSSavePanel runModal] + 297
5 libtk8.6.dylib 0x0000000102343f53 showOpenSavePanel + 171
6 libtk8.6.dylib 0x0000000102343843 Tk_GetOpenFileObjCmd + 2001
7 libtcl8.6.dylib 0x0000000101fe49f4 TclNRRunCallbacks + 79
8 _tkinter.cpython-310-darwin.so 0x0000000101983a60 Tkapp_Call + 480
9 python3.10 0x00000001009939f2 cfunction_call + 130
10 python3.10 0x000000010094f26c _PyObject_Call + 140
11 python3.10 0x0000000100a38241 _PyEval_EvalFrameDefault + 28689
12 python3.10 0x0000000100a31119 _PyEval_Vector + 137
13 python3.10 0x0000000100a3b354 call_function + 420
14 python3.10 0x0000000100a37db1 _PyEval_EvalFrameDefault + 27521
15 python3.10 0x0000000100a31119 _PyEval_Vector + 137
16 python3.10 0x0000000100a3b354 call_function + 420
17 python3.10 0x0000000100a37f03 _PyEval_EvalFrameDefault + 27859
18 python3.10 0x0000000100a31119 _PyEval_Vector + 137
19 python3.10 0x0000000100a31071 PyEval_EvalCode + 129
20 python3.10 0x0000000100a869db _PyRun_SimpleFileObject + 875
21 python3.10 0x0000000100a8648e _PyRun_AnyFileObject + 126
22 python3.10 0x0000000100aa6beb Py_RunMain + 2075
23 python3.10 0x0000000100aa70f3 pymain_main + 403
24 python3.10 0x0000000100aa714b Py_BytesMain + 43
25 dyld 0x00000001088e64fe start + 462
26 ??? 0x0000000000000000 0x0 + 0
27 python3.10 0x0000000100908000 __dso_handle + 0
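The assertion message ("cannot start to advance more than once") suggests a second NSOpenPanel is requested while the first is still settling, a known rough edge of Tk file dialogs on macOS. An untested workaround sketch (pick_genome_file is an illustrative helper, not the project's API): use one explicit, hidden Tk root, let its event loop settle with update() before the panel opens, and open only one dialog at a time.

```python
import tkinter as tk
from tkinter import filedialog

def pick_genome_file():
    """Open a single native file dialog on an explicit Tk root."""
    root = tk.Tk()
    root.withdraw()   # hide the empty main window
    root.update()     # let the event loop settle before the panel opens
    path = filedialog.askopenfilename(title="Select your 23andMe raw data file")
    root.destroy()    # tear down the root so a later dialog starts fresh
    return path
```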
The export-to-Excel function only exports the content of the first page, so at most 250 entries. Could this be changed? Thank you!
Dennis
Once crawling of an individual's genome data has been completed, an ideal feature would bold or otherwise format the individual's specific variation to provide focus.
When I run the import
# python3 SNPedia/DataCrawler.py -f /opt/genome.txt
I get this
page|5253393939393433|13372
122
122
['rs7999075', 'rs878854938', 'rs587783831', 'rs796053466', 'rs121918680', 'rs34695944', 'rs202198133', 'rs193922432', 'rs4652795', 'rs10444502']
Traceback (most recent call last):
File "SNPedia/DataCrawler.py", line 165, in <module>
dfCrawl = SNPCrawl(rsids=rsid, filepath=filepath)
File "SNPedia/DataCrawler.py", line 22, in __init__
self.importDict(filepath)
File "SNPedia/DataCrawler.py", line 110, in importDict
self.rsidDict = json.load(jsonfile)
File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 670356 (char 670355)
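"Extra data" means the cached snpDict.json contains a second JSON document appended after the first one (e.g. from an interrupted or duplicated write). One recovery sketch, assuming the first document is the valid one: parse it with raw_decode, which also reports where it ended. load_first_json is an illustrative helper, not the project's API.

```python
import json

def load_first_json(filepath):
    """Load the first JSON document from a file, ignoring trailing data."""
    with open(filepath) as jsonfile:
        text = jsonfile.read()
    # raw_decode returns the first object and the index where it ended,
    # instead of raising JSONDecodeError("Extra data") like json.load.
    obj, end = json.JSONDecoder().raw_decode(text)
    if text[end:].strip():
        print("Ignoring %d trailing characters of extra data" % (len(text) - end))
    return obj
```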
It seems you must change the results-per-page setting in the bottom-left section to refresh all the contents. This appears to be an issue with the Kendo Grid itself.
Is there additional data that needs to be downloaded from somewhere else?
I am using a test data set from: https://my.pgp-hms.org/public_genetic_data?data_type=23andMe
and get this error:
FileNotFoundError: [Errno 2] No such file or directory: './data/snpDict.json'
when running this command:
python SNPedia/DataCrawler.py -f /Users/jjv5/Desktop/osgenome/genome_Daniel_Munro_Full_20141013190727.txt
Was testing with a SNPedia sample file and came across an error: raise TypeError(repr(o) + " is not JSON serializable"). This leaves the rendered rsidDict partially incomplete.
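A possible mitigation sketch for this serialization error: pass default=str to json.dump so any non-JSON value (a set, a BeautifulSoup tag, etc.) is coerced to its string form instead of aborting the dump and leaving the file half-written. export_rsid_dict is an illustrative helper, not the project's API.

```python
import json

def export_rsid_dict(rsid_dict, filepath):
    """Serialize rsidDict, coercing unsupported types to strings."""
    with open(filepath, "w") as jsonfile:
        # default=str is called only for values json cannot encode natively,
        # so ordinary dicts/lists/strings are written unchanged.
        json.dump(rsid_dict, jsonfile, default=str)
```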
I've created two different files, and at about the same size it starts crashing with the following:
USERCOMPUTER:OSGenome-master USER$ Python3 SNPedia/DataCrawler.py -f /USER/OSGenome-master/genome_AnoM1_v4_Full.txt
page|****************|*****
Traceback (most recent call last):
File "SNPedia/DataCrawler.py", line 148, in <module>
sp = GrabSNPs(crawllimit=60, snpsofinterest=snpsofinterest, target=100)
File "/USER/OSGenome-master/SNPedia/SNPGen.py", line 21, in __init__
self.crawl(snpsofinterest=snpsofinterest, cmcontinue=cmcontinue, target=target)
File "/USER/OSGenome-master/SNPedia/SNPGen.py", line 44, in crawl
cmcontinue = jd["query-continue"]["categorymembers"]["cmcontinue"]
KeyError: 'query-continue'
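The KeyError likely stems from the MediaWiki API's continuation format: modern MediaWiki returns continuation data under a top-level "continue" key, while the code expects the legacy "query-continue" block. A sketch that accepts both formats and returns None when the listing is exhausted (get_cmcontinue is an illustrative helper):

```python
def get_cmcontinue(jd):
    """Extract the cmcontinue token from a MediaWiki API response.

    Handles both the current ("continue") and legacy ("query-continue")
    continuation formats; returns None when there are no more pages,
    instead of raising KeyError.
    """
    if "continue" in jd:
        return jd["continue"].get("cmcontinue")
    if "query-continue" in jd:
        return jd["query-continue"].get("categorymembers", {}).get("cmcontinue")
    return None
```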
Work towards crawling all of the data contained in the raw data dump of 23AndMe.
When I try to run DataCrawler.py on my recently downloaded 23andme genome, I get this error.
Traceback (most recent call last):
File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 168, in <module>
dfCrawl = SNPCrawl(rsids=rsid)
File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 35, in __init__
self.initcrawl(rsids)
File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 43, in initcrawl
self.grabTable(rsid)
File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 63, in grabTable
bs = BeautifulSoup(html, "html.parser")
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 228, in __init__
self._feed()
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 289, in _feed
self.builder.feed(self.markup)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 167, in feed
parser.feed(markup)
File "/usr/lib/python3.10/html/parser.py", line 110, in feed
self.goahead(0)
File "/usr/lib/python3.10/html/parser.py", line 178, in goahead
k = self.parse_html_declaration(i)
File "/usr/lib/python3.10/html/parser.py", line 269, in parse_html_declaration
self.handle_decl(rawdata[i+2:gtpos])
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 112, in handle_decl
self.soup.endData(Doctype)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 365, in endData
self.object_was_parsed(o)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 370, in object_was_parsed
previous_element = most_recent_element or self._most_recent_element
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1040, in __getattr__
return self.find(tag)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1278, in find
l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1299, in find_all
return self._find_all(name, attrs, text, limit, generator, **kwargs)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 528, in _find_all
strainer = SoupStrainer(name, attrs, text, **kwargs)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1596, in __init__
self.text = self._normalize_search_value(text)
File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1601, in _normalize_search_value
if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
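This error occurs because Python 3.10 removed the deprecated collections.Callable alias, which older BeautifulSoup releases still reference. The clean fix is to upgrade bs4 (pip install --upgrade beautifulsoup4); as a stopgap, the alias can be restored before importing bs4:

```python
import collections
import collections.abc

# Python 3.10 removed collections.Callable; restore the alias so old
# bs4 releases that still reference it keep working. Upgrading bs4 is
# the preferred fix; this shim is only a temporary workaround.
if not hasattr(collections, "Callable"):
    collections.Callable = collections.abc.Callable
```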
I cheated and added options to display up to 100,000 variants to my own HTML page, as I wanted to be able to view them all at once. Very nice project.
pageSizes: [25, 50, 100, 250, 500, 1000, 5000, 10000, 50000, 100000],
Future feature: allow grouping based on categories ranging from Preventive to Cognition to Aging.