mentatpsi / osgenome

An Open Source Web Application for Genetic Data (SNPs) using 23AndMe and Data Crawling Technologies

License: GNU General Public License v3.0

Python 82.73% HTML 17.27%
23andme snps genome data-crawling python genetic-data kendo flask genetics snpedia

osgenome's People

Contributors

dependabot[bot], dgrahn, mentatpsi, sangaman

osgenome's Issues

`SNPedia/DataCrawler.py` fails to create `data` directory

When running the first step, it complains about a missing file/directory:

# python3 SNPedia/DataCrawler.py -f /Users/me/Downloads/genome_my_stuff_here.txt
Traceback (most recent call last):
  File "SNPedia/DataCrawler.py", line 139, in <module>
    personal = PersonalData(args["filepath"])
  File "/Users/me/code/OSGenome/SNPedia/GenomeImporter.py", line 10, in __init__
    self.export()
  File "/Users/me/code/OSGenome/SNPedia/GenomeImporter.py", line 27, in export
    with open(filepath, "w") as jsonfile:
FileNotFoundError: [Errno 2] No such file or directory: './data/snpDict.json'

Everything works fine if I manually run `mkdir data` first.
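
The traceback suggests the export step opens `./data/snpDict.json` for writing before the `data` directory exists. A minimal sketch of one possible fix follows, assuming the export writes a dictionary as JSON. The function name `export_snp_dict` is hypothetical and is not taken from the project's code.

```python
import json
import os


def export_snp_dict(snp_dict, filepath="./data/snpDict.json"):
    """Write snp_dict as JSON, creating the parent directory if needed."""
    parent = os.path.dirname(filepath)
    if parent:
        # exist_ok=True makes this a no-op when the directory already exists.
        os.makedirs(parent, exist_ok=True)
    with open(filepath, "w") as jsonfile:
        json.dump(snp_dict, jsonfile)
```

With a guard like this, the manual `mkdir data` step should no longer be needed on a fresh checkout.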

Continued Crawl

As the software currently stands, it does not continue a crawl from the last known position. A simple fix would be a settings file that can be edited programmatically, or a settings class with export functionality, that records the last known cmcontinue value from the crawl. The crawler would then check that value and begin adding entries only once it reaches the last known point. This would allow the content to expand gradually with less interaction.
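
One way the settings class described here could look is sketched below. The class name `CrawlState`, the file name `crawl_state.json`, and the JSON layout are all assumptions made for illustration; none of them exist in the project.

```python
import json
import os


class CrawlState:
    """Persist the last known cmcontinue token between crawler runs."""

    def __init__(self, path="crawl_state.json"):
        self.path = path
        self.cmcontinue = None
        self.load()

    def load(self):
        # A missing file simply means no crawl has been recorded yet.
        if os.path.isfile(self.path):
            with open(self.path) as handle:
                self.cmcontinue = json.load(handle).get("cmcontinue")

    def export(self, cmcontinue):
        # Remember the token in memory and write it to disk.
        self.cmcontinue = cmcontinue
        with open(self.path, "w") as handle:
            json.dump({"cmcontinue": cmcontinue}, handle)
```

A crawler could call `export` after each page of results and, on the next run, read `cmcontinue` from a fresh `CrawlState` to decide where to pick up.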

PPrint error

File "DataCrawler.py", line 49, in initcrawl
pp.pprint(self.rsidDict)
File "C:\Python34\lib\pprint.py", line 139, in pprint
self._format(object, self._stream, 0, 0, {}, 0)
File "C:\Python34\lib\pprint.py", line 193, in _format
allowance + 1, context, level)
File "C:\Python34\lib\pprint.py", line 187, in _format
allowance + 1, context, level)
File "C:\Python34\lib\pprint.py", line 268, in _format
write(rep)
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xae' in position 20: character maps to <undefined>

The error doesn't affect the rest of the script. It prevents the JSON output from being printed, but everything else works.
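
The `UnicodeEncodeError` points to the console's encoding being unable to represent `'\xae'`. A possible workaround, sketched under that assumption, is to escape unsupported characters before writing. The helper name `safe_pprint` is hypothetical.

```python
import pprint
import sys


def safe_pprint(obj, stream=None):
    """Pretty-print obj, escaping characters the stream's encoding lacks."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "utf-8"
    text = pprint.pformat(obj)
    # Round-trip through the target encoding so unsupported characters
    # become backslash escapes instead of raising UnicodeEncodeError.
    safe = text.encode(encoding, errors="backslashreplace").decode(encoding)
    stream.write(safe + "\n")
```

Calling `safe_pprint(self.rsidDict)` in place of `pp.pprint(self.rsidDict)` would be one way to try it.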

Open file broken (macos)

Please take a look at what happens when I try to open the GUI on macOS:

Note that the error appears automatically, and there is no way to actually select a file or folder.

After I press the OK button, the application prints a few rs numbers and quits.

$ python3 SNPedia/Datacrawler_GUI.py
2021-11-24 14:18:01.726 python3[45627:9115062] ApplePersistence=NO
2021-11-24 14:18:02.346 python3[45627:9115062] *** Assertion failure in -[NSOpenPanel beginServicePanel:asyncExHandler:], NSVBOpenAndSavePanels.m:1907
2021-11-24 14:18:02.351 python3[45627:9115062] -[NSSavePanel beginWithCompletionHandler:]_block_invoke caught non-fatal NSInternalInconsistencyException '<NSOpenPanel: 0x7f78ccfea2f0> is attempting to advance this Open/Save panel to run phase while another self.advanceToRunPhaseCompletionHandler is in waiting for a previous attempt. An Open/Save panel cannot start to advance more than once.' with user dictionary {
    NSAssertFile = "NSVBOpenAndSavePanels.m";
    NSAssertLine = 1907;
} and backtrace (
	0   CoreFoundation                      0x00007ff81b1f1e5b __exceptionPreprocess + 242
	1   libobjc.A.dylib                     0x00007ff81af52b9d objc_exception_throw + 48
	2   Foundation                          0x00007ff81c0a8653 -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 267
	3   AppKit                              0x00007ff81e5ab269 -[NSSavePanel beginServicePanel:asyncExHandler:] + 475
	4   AppKit                              0x00007ff81e5ac51d -[NSSavePanel runModal] + 297
	5   libtk8.6.dylib                      0x0000000102343f53 showOpenSavePanel + 171
	6   libtk8.6.dylib                      0x0000000102343843 Tk_GetOpenFileObjCmd + 2001
	7   libtcl8.6.dylib                     0x0000000101fe49f4 TclNRRunCallbacks + 79
	8   _tkinter.cpython-310-darwin.so      0x0000000101983a60 Tkapp_Call + 480
	9   python3.10                          0x00000001009939f2 cfunction_call + 130
	10  python3.10                          0x000000010094f26c _PyObject_Call + 140
	11  python3.10                          0x0000000100a38241 _PyEval_EvalFrameDefault + 28689
	12  python3.10                          0x0000000100a31119 _PyEval_Vector + 137
	13  python3.10                          0x0000000100a3b354 call_function + 420
	14  python3.10                          0x0000000100a37db1 _PyEval_EvalFrameDefault + 27521
	15  python3.10                          0x0000000100a31119 _PyEval_Vector + 137
	16  python3.10                          0x0000000100a3b354 call_function + 420
	17  python3.10                          0x0000000100a37f03 _PyEval_EvalFrameDefault + 27859
	18  python3.10                          0x0000000100a31119 _PyEval_Vector + 137
	19  python3.10                          0x0000000100a31071 PyEval_EvalCode + 129
	20  python3.10                          0x0000000100a869db _PyRun_SimpleFileObject + 875
	21  python3.10                          0x0000000100a8648e _PyRun_AnyFileObject + 126
	22  python3.10                          0x0000000100aa6beb Py_RunMain + 2075
	23  python3.10                          0x0000000100aa70f3 pymain_main + 403
	24  python3.10                          0x0000000100aa714b Py_BytesMain + 43
	25  dyld                                0x00000001088e64fe start + 462
	26  ???                                 0x0000000000000000 0x0 + 0
	27  python3.10                          0x0000000100908000 __dso_handle + 0

Kendo Grid - Variation Highlighting

Once crawling of an individual's genome data has completed, an ideal feature would bold or otherwise format the individual's specific variation to draw focus to it.
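
The idea could be prototyped on the server side before the data reaches the grid. The sketch below assumes variations arrive as strings such as `(A;G)` and the individual's genotype as a string such as `AG`. Both formats, and the function name `highlight_variation`, are assumptions. Strand orientation is not handled.

```python
def highlight_variation(variations, genotype):
    """Wrap the entry matching the individual's genotype in <b> tags."""

    def alleles(text):
        # Keep only letters so "(A;G)" and "AG" compare equal,
        # and sort them so allele order does not matter.
        return sorted(ch for ch in text.upper() if ch.isalpha())

    target = alleles(genotype)
    return [f"<b>{v}</b>" if alleles(v) == target else v for v in variations]
```

For example, `highlight_variation(["(A;A)", "(A;G)", "(G;G)"], "GA")` marks only the `(A;G)` entry.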

Extra data in json

When I run the import

# python3 SNPedia/DataCrawler.py -f /opt/genome.txt

I get this

page|5253393939393433|13372
122
122
['rs7999075', 'rs878854938', 'rs587783831', 'rs796053466', 'rs121918680', 'rs34695944', 'rs202198133', 'rs193922432', 'rs4652795', 'rs10444502']
Traceback (most recent call last):
  File "SNPedia/DataCrawler.py", line 165, in <module>
    dfCrawl = SNPCrawl(rsids=rsid, filepath=filepath)
  File "SNPedia/DataCrawler.py", line 22, in __init__
    self.importDict(filepath)
  File "SNPedia/DataCrawler.py", line 110, in importDict
    self.rsidDict = json.load(jsonfile)
  File "/usr/local/lib/python3.8/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.8/json/decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 670356 (char 670355)
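
`Extra data` means the decoder parsed one complete JSON value and then found more non-whitespace content after it. The report does not say why the file ended up that way. A small diagnostic sketch, assuming the leading value is the one worth keeping, uses `json.JSONDecoder.raw_decode` to split the file contents. The helper name `load_first_json` is hypothetical.

```python
import json


def load_first_json(text):
    """Return (first JSON value, leftover text) from a string."""
    decoder = json.JSONDecoder()
    # raw_decode does not skip leading whitespace, so strip it first.
    stripped = text.lstrip()
    value, end = decoder.raw_decode(stripped)
    return value, stripped[end:].strip()
```

Inspecting the leftover text may show whether a second document was appended to the file.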

Crashes after building ~20,000 snps

I've created two different files, and at about the same size each one starts crashing with the following:

USERCOMPUTER:OSGenome-master USER$ Python3 SNPedia/DataCrawler.py -f /USER/OSGenome-master/genome_AnoM1_v4_Full.txt
page|****************|*****
Traceback (most recent call last):
  File "SNPedia/DataCrawler.py", line 148, in <module>
    sp = GrabSNPs(crawllimit=60, snpsofinterest=snpsofinterest, target=100)
  File "/USER/OSGenome-master/SNPedia/SNPGen.py", line 21, in __init__
    self.crawl(snpsofinterest=snpsofinterest, cmcontinue=cmcontinue, target=target)
  File "/USER/OSGenome-master/SNPedia/SNPGen.py", line 44, in crawl
    cmcontinue = jd["query-continue"]["categorymembers"]["cmcontinue"]
KeyError: 'query-continue'
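
A `KeyError` on `'query-continue'` would be expected if the API response carries no continuation block, for instance on the last page of results. That cause is an assumption; the report does not confirm it. The sketch below reads the token defensively and also checks the newer MediaWiki `continue` format. The function name `next_cmcontinue` is hypothetical.

```python
def next_cmcontinue(response):
    """Return the next cmcontinue token, or None when there is none."""
    # Legacy continuation format, as indexed in the traceback above.
    legacy = response.get("query-continue", {}).get("categorymembers", {})
    if "cmcontinue" in legacy:
        return legacy["cmcontinue"]
    # Newer MediaWiki continuation format.
    return response.get("continue", {}).get("cmcontinue")
```

A crawl loop could stop cleanly when this returns `None` instead of raising.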

AttributeError: module 'collections' has no attribute 'Callable'

When I try to run DataCrawler.py on my recently downloaded 23andme genome, I get this error.

Traceback (most recent call last):
  File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 168, in <module>
    dfCrawl = SNPCrawl(rsids=rsid)
  File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 35, in __init__
    self.initcrawl(rsids)
  File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 43, in initcrawl
    self.grabTable(rsid)
  File "/home/danielmcnally/github/OSGenome/SNPedia/DataCrawler.py", line 63, in grabTable
    bs = BeautifulSoup(html, "html.parser")
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 228, in __init__
    self._feed()
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 289, in _feed
    self.builder.feed(self.markup)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 167, in feed
    parser.feed(markup)
  File "/usr/lib/python3.10/html/parser.py", line 110, in feed
    self.goahead(0)
  File "/usr/lib/python3.10/html/parser.py", line 178, in goahead
    k = self.parse_html_declaration(i)
  File "/usr/lib/python3.10/html/parser.py", line 269, in parse_html_declaration
    self.handle_decl(rawdata[i+2:gtpos])
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/builder/_htmlparser.py", line 112, in handle_decl
    self.soup.endData(Doctype)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 365, in endData
    self.object_was_parsed(o)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/__init__.py", line 370, in object_was_parsed
    previous_element = most_recent_element or self._most_recent_element
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1040, in __getattr__
    return self.find(tag)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1278, in find
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1299, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 528, in _find_all
    strainer = SoupStrainer(name, attrs, text, **kwargs)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1596, in __init__
    self.text = self._normalize_search_value(text)
  File "/home/danielmcnally/github/OSGenome/venv/lib/python3.10/site-packages/bs4/element.py", line 1601, in _normalize_search_value
    if (isinstance(value, str) or isinstance(value, collections.Callable) or hasattr(value, 'match')
AttributeError: module 'collections' has no attribute 'Callable'
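
The `collections.Callable` alias was removed in Python 3.10; the abstract base classes now live only in `collections.abc`. The failing line is inside the installed `bs4` package, so upgrading `beautifulsoup4` in the virtual environment is likely the practical fix. For reference, a version of the same check that works on current Python is sketched below; the function name `is_callable_value` is hypothetical.

```python
import collections.abc


def is_callable_value(value):
    """Python 3.10-safe equivalent of the failing isinstance check."""
    return isinstance(value, collections.abc.Callable)
```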

Suggestion: Add options to show all variants

I cheated and added options to display up to 100,000 variants to my own HTML page as I wanted to be able to view all at once. Very nice project.

pageSizes: [25, 50, 100, 250, 500, 1000, 5000, 10000, 50000, 100000],
