krmaxwell / maltrieve
A tool to retrieve malware directly from the source for security researchers.
License: GNU General Public License v3.0
I'm seeing exceptions for unknown url type errors for some requests. Also, once one of these exceptions is thrown, it appears that processing tries to continue in some cases, but maltrieve no longer seems to respond to Ctrl-C. I end up killing the process.
Here's an example from my latest run:
2013-03-07 17:48:04 140032845281024 Fetched URL http://213.229.106.32:8088/get/67ad970fbbc4f9b29bfeca40b0b4a54f.exe from queue
2013-03-07 17:48:05 140032845281024 urlopen() returned error [Errno 111] Connection refused
2013-03-07 17:48:05 140032845281024 Fetched URL from queue
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "./maltrieve.py", line 45, in get_malware
mal = get_URL(url)
File "/home/gnpendergast/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 393, in open
protocol = req.get_type()
File "/usr/lib/python2.7/urllib2.py", line 255, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type:
2013-03-07 17:48:21 140032925841152 urlopen() returned error [Errno 111] Connection refused
2013-03-07 17:48:21 140032925841152 Fetched URL from queue
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "./maltrieve.py", line 45, in get_malware
mal = get_URL(url)
File "/home/gnpendergast/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 393, in open
protocol = req.get_type()
File "/usr/lib/python2.7/urllib2.py", line 255, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type:
2013-03-07 17:48:22 140032934233856 urlopen() returned error [Errno 111] Connection refused
2013-03-07 17:48:22 140032934233856 Fetched URL from queue
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "./maltrieve.py", line 45, in get_malware
mal = get_URL(url)
File "/home/gnpendergast/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 393, in open
protocol = req.get_type()
File "/usr/lib/python2.7/urllib2.py", line 255, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type:
^C^C^C^C^C2013-03-07 17:50:27 140032836888320 Found file b6b36d3cdd4e616c74a6a1f55537a0db at URL http://91.121.28.146:8080/get/67ad970fbbc4f9b29bfeca40b0b4a54f.exe
2013-03-07 17:50:27 140032836888320 Going to put file in directory /tmp/malware
2013-03-07 17:50:27 140032836888320 Stored b6b36d3cdd4e616c74a6a1f55537a0db in /tmp/malware
2013-03-07 17:50:27 140032836888320 Fetched URL from queue
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "./maltrieve.py", line 45, in get_malware
mal = get_URL(url)
File "/home/gnpendergast/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 393, in open
protocol = req.get_type()
File "/usr/lib/python2.7/urllib2.py", line 255, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type:
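Each ValueError above follows a log line reading "Fetched URL  from queue" with nothing between the words, which suggests that an empty string reached urlopen(). A minimal Python 3 sketch of a guard that could run before the fetch is shown here; `is_fetchable` and `ALLOWED_SCHEMES` are hypothetical names and not part of maltrieve.

```python
from urllib.parse import urlsplit

# Assumption: only plain web URLs are worth fetching.
ALLOWED_SCHEMES = {"http", "https"}

def is_fetchable(url):
    """Return True only for non-empty http(s) URLs that name a host."""
    if not url or not url.strip():
        return False
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. broken IPv6 brackets.
        return False
    return parts.scheme in ALLOWED_SCHEMES and bool(parts.netloc)
```

A worker could skip (and log) any queue entry for which this returns False instead of passing it to the opener.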
Quit on a Ctrl-C
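One common way to keep a threaded downloader responsive to Ctrl-C is to run the workers as daemon threads and have the main thread wait with short join() timeouts, since in Python 2 a plain join() may not be interruptible. The Python 3 sketch below assumes that structure; `run_workers`, `worker`, and the `handler` callback are hypothetical names, not maltrieve's own.

```python
import queue
import threading

def worker(q, handler, results, stop):
    # Drain the queue until it is empty or a stop is requested.
    while not stop.is_set():
        try:
            url = q.get_nowait()
        except queue.Empty:
            return
        results.append(handler(url))

def run_workers(urls, handler, num_threads=4):
    q = queue.Queue()
    for url in urls:
        q.put(url)
    results = []
    stop = threading.Event()
    threads = [
        # daemon=True: these threads never keep the interpreter alive on exit.
        threading.Thread(target=worker, args=(q, handler, results, stop), daemon=True)
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    try:
        for t in threads:
            # Short timeouts give the main thread regular chances to see Ctrl-C.
            while t.is_alive():
                t.join(timeout=0.1)
    except KeyboardInterrupt:
        stop.set()  # ask workers to finish their current item and exit
    return results
```

For example, `run_workers(["a", "b"], str.upper)` processes both items and returns once every worker has drained the queue.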
It'd be handy if you could include a quick setup guide (esp for deps) in your README. On Ubuntu, used:
$ sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev python-dev python-lxml
$ sudo pip install beautifulsoup4
Sure would be nice to have a Docker container to make deployment easy.
Just added Maltrieve to my newly reinstalled Raspberry Pi system and I'm getting an error. It runs successfully and downloads files up to a point, but then quits and locks up my PuTTY SSH connection with the following (URLs modified so there is no live link here):
2013-06-25 23:30:58 -1242839952 Found file e03a7f89a6cbc45144aafac2779c7b6d at URL hxxp://educacionfinanciera.fovissste.gob.mx/elearning/materiales/install_flashplayer.exe
2013-06-25 23:30:58 -1242839952 Going to put file in directory /tmp/malware
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/e03a7f89a6cbc45144aafac2779c7b6d'
2013-06-25 23:31:51 -1251228560 urlopen() returned error [Errno 110] Connection timed out
2013-06-25 23:31:51 -1251228560 Fetched URL hxxp://down.signkey.co.kr/olive/signkey.exe from queue
2013-06-25 23:31:54 -1251228560 Found file 208220dbe46b1a97afab4c8e0dfbd6a6 at URL hxxp://down.signkey.co.kr/olive/signkey.exe
2013-06-25 23:31:54 -1251228560 Going to put file in directory /tmp/malware
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/208220dbe46b1a97afab4c8e0dfbd6a6'
I'm not sure what causes the problem, so thought I'd submit it here.
Thanks!
Ken
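An `[Errno 2] No such file or directory` raised by `open(..., 'wb')` usually means the parent directory is missing, so one possible explanation is that /tmp/malware did not exist when the worker tried to write. A small Python 3 sketch that creates the directory on demand follows; `save_sample` is a hypothetical helper, not maltrieve's function.

```python
import os

def save_sample(dumpdir, name, data):
    """Write data to dumpdir/name, creating dumpdir first if needed."""
    # exist_ok=True makes this safe when several threads race to create it.
    os.makedirs(dumpdir, exist_ok=True)
    path = os.path.join(dumpdir, name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```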
Did a fresh git checkout today:
sudo python maltrieve.py
Traceback (most recent call last):
File "maltrieve.py", line 23, in <module>
import feedparser
ImportError: No module named feedparser
Traceback (most recent call last):
File "maltrieve.py", line 333, in <module>
main()
File "maltrieve.py", line 308, in main
md5 = save_malware(each, cfg['dumpdir'], ignore_list)
File "maltrieve.py", line 104, in save_malware
mime_type = magic.from_buffer(data, mime=True)
AttributeError: 'module' object has no attribute 'from_buffer'
unsure whether I have the wrong module or some other problem
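One frequent cause of this AttributeError, to my knowledge, is that two different Python packages install a module named `magic`: the python-magic package from PyPI exposes `from_buffer()`, while the bindings shipped with the `file` utility expose `open()` instead. The Python 3 sketch below only inspects whichever module was imported; `describe_magic_module` is a hypothetical helper and the wording of its return values is mine.

```python
def describe_magic_module(module):
    """Guess which 'magic' flavour a module object is, by its attributes."""
    if hasattr(module, "from_buffer"):
        return "python-magic style (from_buffer available)"
    if hasattr(module, "open"):
        return "file/libmagic bindings style (no from_buffer)"
    return "unknown"

# Usage, assuming some 'magic' module is installed:
#   import magic
#   print(describe_magic_module(magic), magic.__file__)
```

Printing `magic.__file__` alongside the result shows which installation Python actually picked up.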
Hi,
http://vxvault.siri-urz.net/URL_List.php is returning HTTP error code 500 at the time of writing.
This makes maltrieve.py fail with a traceback:
2014-03-26 12:09:49 -1216760128 urlopen() returned error Internal Server Error
Traceback (most recent call last):
File "maltrieve.py", line 275, in <module>
main()
File "maltrieve.py", line 236, in main
for url in get_URL('http://vxvault.siri-urz.net/URL_List.php'):
TypeError: 'bool' object is not iterable
Cheers,
pello
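The TypeError suggests that the fetch helper returned False after the HTTP 500 and the caller then tried to iterate over it. A Python 3 sketch of a helper that always returns something iterable is shown below; `fetch_lines` and its injectable `opener` parameter are assumptions for illustration, not maltrieve's actual get_URL.

```python
import logging
import urllib.error
import urllib.request

def fetch_lines(url, opener=urllib.request.urlopen):
    """Return the response body as a list of text lines, or [] on failure."""
    try:
        with opener(url) as response:
            body = response.read()
    except urllib.error.URLError as exc:  # HTTPError (e.g. 500) is a subclass
        logging.warning("fetch of %s failed: %s", url, exc)
        return []
    return body.decode("utf-8", errors="replace").splitlines()
```

With this shape, `for url in fetch_lines(feed_url):` simply does nothing when a feed is down.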
Hello Good sir!
I've been using this flawlessly for a couple of weeks now. I used to have an issue where it would freeze because all the threads got hung on a download issue or similar, so I changed the thread count to 500 and haven't heard a peep from it since. However, today it's been "finishing" prematurely at the exact same spot: URL 569. I'm really not the expert here, but the line in question references a problem with not knowing the type of some file from the mcbrtext feed. I tried commenting out the entire mcbrtext feed, and it froze again at URL 569, this time referencing cleanmxtext. I commented that feed out too, and then it just freezes at URL 569. I'm not sure whether this is some file hosted by both feeds that the crawler doesn't like, but if so, the issue shouldn't last long, as I'm sure they'll take it out of their feeds after a few days.
Thanks for your help in advance!
Gel
2013-10-08 10:14:56 4036 Adding new URL to queue: http://1380677010.keaitz.com/chat/bosom/lb_bosom_6.exe
2013-10-08 10:14:57 4036 urlopen() returned error Gone
Traceback (most recent call last):
File "F:\Crawler\maltrieve.py", line 269, in <module>
main()
File "F:\Crawler\maltrieve.py", line 246, in main
for url in mcbrtext.read().splitlines():
AttributeError: 'NoneType' object has no attribute 'read'
F:\Crawler>_
On a run with over 1000 URIs, I find that each thread seems to eventually cause an exception (example below). Is this being caused by malformed URLs without http://, or is this coincidental?
2014-05-22 16:36:49 140342549030656 1026 items remaining in queue
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 51, in get_malware
mal = get_URL(url)
File "/malware/maltrieve/malutil.py", line 17, in get_URL
response.getcode())
UnboundLocalError: local variable 'response' referenced before assignment
2014-05-22 16:37:12 140342538540800 urlopen() returned error [Errno 110] Connection timed out
2014-05-22 16:37:12 140342538540800 Fetched URL julieandrews.us/ from queue
2014-05-22 16:37:12 140342538540800 1025 items remaining in queue
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 51, in get_malware
mal = get_URL(url)
File "/malware/maltrieve/malutil.py", line 17, in get_URL
response.getcode())
UnboundLocalError: local variable 'response' referenced before assignment
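One plausible reading of the UnboundLocalError is that urlopen() raised before `response` was bound (a scheme-less URL such as `julieandrews.us/` would do that), and the error-handling path then called `response.getcode()` anyway. The Python 3 sketch below keeps every use of `response` in the `else` branch so that cannot happen; `get_status` and its `opener` parameter are hypothetical.

```python
import urllib.error
import urllib.request

def get_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code, or None when the request fails."""
    try:
        response = opener(url)
    except (ValueError, urllib.error.URLError):
        # ValueError: unknown/missing URL scheme; URLError: network failure.
        # `response` was never bound on this path, so it must not be touched.
        return None
    else:
        with response:
            return response.getcode()
```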
kmaxwell@leibniz:~/src/maltrieve$ date
Thu Aug 28 11:33:45 CDT 2014
kmaxwell@leibniz:~/src/maltrieve$ tail -f maltrieve.log
2014-08-28 09:56:42 139828685420352 "GET /dmjqxshzxk HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /yinjingdaxiao HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /chzgjqvod HTTP/1.1" 200 281
2014-08-28 09:56:42 139828685420352 "GET /xxrttk HTTP/1.1" 200 281
2014-08-28 09:56:43 139828685420352 "GET /hhchrxyx HTTP/1.1" 200 281
2014-08-28 10:06:26 139828685420352 "GET /ddqs HTTP/1.1" 200 281
2014-08-28 10:06:28 139828685420352 "GET /rbavanyxat HTTP/1.1" 200 281
2014-08-28 10:06:29 139828685420352 "GET /rbdmmntpw HTTP/1.1" 200 281
2014-08-28 10:06:32 139828685420352 "GET /kbllqjllswdyhmfdy HTTP/1.1" 200 281
2014-08-28 10:06:36 139828685420352 "GET /yzhsqqvod HTTP/1.1" 200 281
^C
kmaxwell@leibniz:~/src/maltrieve$
running on Ubuntu 13.04 (running on VMware Workstation)
it downloaded just short of 500 items
and then:
2013-07-15 17:37:55 -1239418048 Fetched URL hXXp://removebugs.com/4YNN66tx.exe from queue
2013-07-15 17:37:56 -1239418048 Fetched URL hXXp://angelibo.com/1372809886_0.10307000.exe from queue
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 51, in get_malware
mal = get_URL(url)
File "/home/beamzer/maltrieve-master/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 373, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
followed by a couple more downloads and another:
2013-07-15 17:37:57 -1239418048 Fetched URL hXXp://host0r.net/xs.exe from queue
2013-07-15 17:37:58 -1239418048 Fetched URL hXXp://www.sineglu.it/jUnejSe.exe from queue
2013-07-15 17:37:58 -1239418048 Fetched URL hXXp://sourcehonduras.net/load/magic_school_bus_videos_online_free.exe from queue
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 51, in get_malware
mal = get_URL(url)
File "/home/beamzer/maltrieve-master/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1045, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 373, in _read_status
raise BadStatusLine(line)
BadStatusLine: ''
and after that it seems to hang :-(
lsof shows:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
python 10353 beamzer cwd DIR 8,1 4096 525583 /home/beamzer/maltrieve-master
python 10353 beamzer rtd DIR 8,1 4096 2 /
python 10353 beamzer txt REG 8,1 2688640 1048748 /usr/bin/python2.7
python 10353 beamzer mem REG 8,1 161156 917896 /lib/i386-linux-gnu/libexpat.so.1.6.0
python 10353 beamzer mem REG 8,1 83816 917882 /lib/i386-linux-gnu/libresolv-2.17.so
python 10353 beamzer mem REG 8,1 22100 917886 /lib/i386-linux-gnu/libnss_dns-2.17.so
python 10353 beamzer mem REG 8,1 9660 917838 /lib/libnss_mdns4_minimal.so.2
python 10353 beamzer mem REG 8,1 47080 917774 /lib/i386-linux-gnu/libnss_files-2.17.so
python 10353 beamzer mem REG 8,1 153048 917593 /lib/i386-linux-gnu/liblzma.so.5.0.0
python 10353 beamzer mem REG 8,1 537016 917908 /lib/i386-linux-gnu/libgcrypt.so.11.7.0
python 10353 beamzer mem REG 8,1 1396864 1053656 /usr/lib/i386-linux-gnu/libxml2.so.2.9.0
python 10353 beamzer mem REG 8,1 83204 1055168 /usr/lib/i386-linux-gnu/libexslt.so.0.8.16
python 10353 beamzer mem REG 8,1 243152 1055174 /usr/lib/i386-linux-gnu/libxslt.so.1.1.27
python 10353 beamzer mem REG 8,1 9664 917840 /lib/libnss_mdns4.so.2
python 10353 beamzer mem REG 8,1 58344 132814 /usr/lib/python2.7/lib-dynload/pyexpat.i386-linux-gnu.so
python 10353 beamzer mem REG 8,1 1255612 139605 /usr/lib/python2.7/dist-packages/lxml/etree.so
python 10353 beamzer mem REG 8,1 88476 131818 /usr/lib/python2.7/lib-dynload/datetime.i386-linux-gnu.so
python 10353 beamzer mem REG 8,1 350296 917993 /lib/i386-linux-gnu/libssl.so.1.0.0
python 10353 beamzer mem REG 8,1 1734784 917994 /lib/i386-linux-gnu/libcrypto.so.1.0.0
python 10353 beamzer mem REG 8,1 26256 1182510 /usr/lib/i386-linux-gnu/gconv/gconv-modules.cache
python 10353 beamzer mem REG 8,1 48328 132825 /usr/lib/python2.7/lib-dynload/_json.i386-linux-gnu.so
python 10353 beamzer mem REG 8,1 2932160 1180296 /usr/lib/locale/locale-archive
python 10353 beamzer mem REG 8,1 267816 917771 /lib/i386-linux-gnu/libm-2.17.so
python 10353 beamzer mem REG 8,1 1770984 917776 /lib/i386-linux-gnu/libc-2.17.so
python 10353 beamzer mem REG 8,1 95764 917900 /lib/i386-linux-gnu/libz.so.1.2.7
python 10353 beamzer mem REG 8,1 9816 917883 /lib/i386-linux-gnu/libutil-2.17.so
python 10353 beamzer mem REG 8,1 13856 917880 /lib/i386-linux-gnu/libdl-2.17.so
python 10353 beamzer mem REG 8,1 124637 917784 /lib/i386-linux-gnu/libpthread-2.17.so
python 10353 beamzer mem REG 8,1 13644 918512 /lib/i386-linux-gnu/libgpg-error.so.0.8.0
python 10353 beamzer mem REG 8,1 32788 132671 /usr/lib/python2.7/lib-dynload/_ssl.i386-linux-gnu.so
python 10353 beamzer mem REG 8,1 15336 132662 /usr/lib/python2.7/lib-dynload/_hashlib.i386-linux-gnu.so
python 10353 beamzer mem REG 8,1 134376 917884 /lib/i386-linux-gnu/ld-2.17.so
python 10353 beamzer 0u CHR 136,1 0t0 4 /dev/pts/1
python 10353 beamzer 1u CHR 136,1 0t0 4 /dev/pts/1
python 10353 beamzer 2u CHR 136,1 0t0 4 /dev/pts/1
python 10353 beamzer 3u IPv4 1273846 0t0 TCP 192.168.253.128:33955->61.187.182.21:http (ESTABLISHED)
python 10353 beamzer 4u IPv4 1272841 0t0 TCP 192.168.253.128:46383->222.186.33.73:http (ESTABLISHED)
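The two ESTABLISHED connections in the lsof output are consistent with threads blocked on sockets that never answer, and the BadStatusLine tracebacks show an exception type that escapes a handler written only for URL errors. A Python 3 sketch that addresses both is given below; `fetch`, the 30-second default, and the injectable `opener` are assumptions, not maltrieve's code.

```python
import http.client
import urllib.request

def fetch(url, timeout=30, opener=urllib.request.urlopen):
    """Return the body as bytes, or None if the request fails or stalls."""
    try:
        # An explicit timeout keeps a silent server from blocking the thread forever.
        with opener(url, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException):
        # OSError covers URLError and socket timeouts;
        # HTTPException covers BadStatusLine and related protocol errors.
        return None
```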
Via email:
I'm trying to run maltrieve on my Raspberry Pi. It worked fine before the update and lxml, but now I get an error that looks like this:
root@raspberrypi:/home/maltrieve# python maltrieve.py
Traceback (most recent call last):
File "maltrieve.py", line 39, in <module>
from lxml import etree
ImportError: /usr/local/lib/python2.7/dist-packages/lxml/etree.so: undefined symbol: clock_gettime
I have installed python-lxml_2.3.2-1_armhf.deb with apt-get install python-lxml,
and I have beautifulsoup4 installed.
2013-10-19 08:41:51 139698606126848 470 items remaining in queue
2013-10-19 08:41:51 139698698569536 urlopen() returned error Not Found
2013-10-19 08:41:52 139698698569536 urlopen() returned error [Errno -2] Name or service not known
Traceback (most recent call last):
File "maltrieve.py", line 264, in <module>
main()
File "maltrieve.py", line 240, in main
urlquerysoup=BeautifulSoup(urlquerytext)
File "/usr/local/lib/python2.7/dist-packages/bs4/__init__.py", line 162, in __init__
elif len(markup) <= 256:
TypeError: object of type 'NoneType' has no len()
user@host:/opt/maltrieve$
2013-07-28 17:29:30 140439226496768 Found file c69f3c87af23eabc25c7f820b89c2a0d at URL http://yaishhu.ru/download/AlawarUniversalCrack2012.exe
2013-07-28 17:29:30 140439226496768 Going to put file in directory /tmp/malware
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/c69f3c87af23eabc25c7f820b89c2a0d'
2013-07-28 17:29:44 140439234889472 Found file d54ada4546030349eede3bf40f4d889b at URL http://yourtube.eb2a.com/Wan.gif
2013-07-28 17:29:44 140439234889472 Going to put file in directory /tmp/malware
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/d54ada4546030349eede3bf40f4d889b'
2013-07-28 17:29:52 140439218104064 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:29:52 140439218104064 Fetched URL http://ximxamli.ru/angrim2.exe from queue
2013-07-28 17:29:52 140439209711360 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:29:52 140439209711360 Fetched URL http://ximxamli.ru/kecik02.exe from queue
2013-07-28 17:30:13 140439218104064 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:13 140439218104064 Fetched URL http://ximxamli.ru/jabinv1.exe from queue
2013-07-28 17:30:13 140439209711360 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:13 140439209711360 Fetched URL http://ximxamli.ru/angrim2.exe from queue
2013-07-28 17:30:33 140439218104064 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:33 140439218104064 Fetched URL http://ximxamli.ru/inkr001.exe from queue
2013-07-28 17:30:33 140439209711360 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:33 140439209711360 Fetched URL http://ximxamli.ru/jabinv1.exe from queue
2013-07-28 17:30:53 140439209711360 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:53 140439209711360 Fetched URL http://ximxamli.ru/inkr001.exe from queue
2013-07-28 17:30:53 140439218104064 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:30:53 140439218104064 Fetched URL http://ximirsex.ru/rasta01.exe from queue
2013-07-28 17:31:14 140439209711360 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:31:14 140439209711360 Fetched URL http://xia.57cx.com/setupp_005.exe from queue
2013-07-28 17:31:14 140439218104064 urlopen() returned error [Errno -2] Name or service not known
2013-07-28 17:31:14 140439218104064 Fetched URL http://x.uzzf.com/Hash.exe from queue
2013-07-28 17:31:18 140439218104064 Found file aaad24486871657504efeaf56600f3cb at URL http://x.uzzf.com/Hash.exe
2013-07-28 17:31:18 140439218104064 Going to put file in directory /tmp/malware
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/aaad24486871657504efeaf56600f3cb'
2013-07-28 17:37:09 140439209711360 Found file b0f28c542f727314ee918a7883a25249 at URL http://xia.57cx.com/setupp_005.exe
2013-07-28 17:37:09 140439209711360 Going to put file in directory /tmp/malware
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 60, in get_malware
with open(os.path.join(dumpdir, md5), 'wb') as f:
IOError: [Errno 2] No such file or directory: '/tmp/malware/b0f28c542f727314ee918a7883a25249'
Send HTML pages to thug for analysis.
Add http://www.nictasoft.com/ace/malware-urls/ as a source
IOError: [Errno 24] Too many open files: 'urls.json'
It seems that the process is running too many simultaneous file downloads.
This can be worked around by raising the limit in /etc/security/limits.conf to 5000; the current limit (ulimit -n) is 1000.
Maybe a queuing system, or making sure files are closed after download, would help?
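Errno 24 means the process hit its per-process file descriptor limit. The report does not establish where the descriptors go, but two general mitigations are closing every file deterministically with `with` and bounding how many workers run at once. A Python 3 sketch under those assumptions follows; `save_one`, `save_all`, and the default of four workers are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def save_one(dumpdir, name, data):
    # The with-block closes the file even if write() raises.
    with open(os.path.join(dumpdir, name), "wb") as f:
        f.write(data)
    return name

def save_all(dumpdir, samples, max_workers=4):
    """Write (name, bytes) pairs using at most max_workers threads."""
    os.makedirs(dumpdir, exist_ok=True)
    # The pool caps concurrency, so at most max_workers files are open at once.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(save_one, dumpdir, n, d) for n, d in samples]
        return [f.result() for f in futures]
```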
Allow user to specify a file for input (perhaps of URLs to fetch and process).
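A rough Python 3 sketch of what such an option could look like is shown here. The flag name `--inputfile`, the one-URL-per-line format, and the `#` comment convention are all assumptions made for illustration.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Fetch URLs listed in a file")
    # Hypothetical flag; the request above does not name one.
    parser.add_argument("-i", "--inputfile", help="text file with one URL per line")
    return parser.parse_args(argv)

def read_urls(path):
    """Return the non-empty, non-comment lines of a text file."""
    urls = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls
```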
Check the public block lists from Select Real Security for possible other sources.
The one thing I'd most love to see added, even knowing it's a huge pain, would be pulling samples from Contagio. I know it wouldn't be easy, but this is the single most valuable public source I know.
Feel free to punt this back to me. I don't mean to ask too much, and I can submit a pull request like a big boy, but I wanted it on the list in case someone else is more ambitious than I am before the weekend.
Great tool! Definitely going on one of my VPSs soon.
Add http://support.clean-mx.de/clean-mx/xmlviruses.php? as a source.
Samples sometimes stored in working directory, not dump directory. Need repro steps.
Use setuptools or similar.
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 46, in get_malware
mal = get_URL(url)
File "/home/ubuntu/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1174, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 790, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc3' in position 40: ordinal not in range(128)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 46, in get_malware
mal = get_URL(url)
File "/home/ubuntu/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1174, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 790, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 40-42: ordinal not in range(128)
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 46, in get_malware
mal = get_URL(url)
File "/home/ubuntu/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1174, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 790, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 30: ordinal not in range(128)
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 46, in get_malware
mal = get_URL(url)
File "/home/ubuntu/maltrieve/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 400, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 418, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1174, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 790, in send
self.sock.sendall(data)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc3' in position 33: ordinal not in range(128)
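These UnicodeEncodeErrors arise because the request line sent over the socket must be ASCII, while the URL passed to urlopen() contained characters such as u'\xf1'. One common remedy is to percent-encode the path and query (and IDNA-encode the host) before fetching. The Python 3 sketch below does that; `to_ascii_url` is a hypothetical helper, and it deliberately ignores any user:password part of the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_url(url):
    """Return an ASCII-only form of url (userinfo, if any, is dropped)."""
    parts = urlsplit(url)
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else "%s:%d" % (host, parts.port)
    # '%' stays in the safe set so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```

Non-ASCII characters are encoded as UTF-8 before escaping, which is the default behaviour of `quote`.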
Testing things out, the maltrieve.cfg didn't seem to successfully set all options. Specifically, it causes an error in generating the dumpdir, and it always defaults to /tmp/malware. Am I missing something?
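For comparison, here is a minimal Python 3 sketch of reading a `dumpdir` option with a fallback. The section name `Maltrieve` is an assumption (the report does not show the file's contents), as is the `load_dumpdir` helper. Note that configparser treats section names as case-sensitive while option names are not, so a mismatched section header silently yields the default.

```python
import configparser

DEFAULT_DUMPDIR = "/tmp/malware"

def load_dumpdir(cfg_text):
    """Return the configured dumpdir, or the default if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback is returned when either the section or the option is missing.
    return parser.get("Maltrieve", "dumpdir", fallback=DEFAULT_DUMPDIR)
```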
When processing is done, does not exit cleanly.
support for database insertion of data from the sources
Add http://www.malware.com.br/cgi/submit?action=list as a source.
I currently don't see an option to specify the file type: PE files (.exe, .dll), images (.png), and documents (.doc, .pdf). It would be useful to retrieve only a certain type of malware based on a command-line argument. I am working on it and will submit a pull request.
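As a rough illustration of the idea, and not the implementation the author mentions working on, a filter could compare the first bytes of each download against well-known file signatures. This Python 3 sketch uses only the standard library; the `SIGNATURES` table and `matches_type` are hypothetical names.

```python
# Leading "magic" bytes for a few formats named in the request.
SIGNATURES = {
    "pe": (b"MZ",),                                  # .exe and .dll alike
    "png": (b"\x89PNG\r\n\x1a\n",),
    "pdf": (b"%PDF",),
    "doc": (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",),   # legacy OLE container
}

def matches_type(data, wanted):
    """Return True if data starts with a signature for the wanted type."""
    return data.startswith(SIGNATURES.get(wanted, ()))
```

Signature checks like this cannot tell an .exe from a .dll, so a real filter would need to inspect more than the header.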
Update docs to show required versions of Python and any dependencies. (Port to Python 3.x?)
Please add the ability to stop downloading with Ctrl+C in the console. It would also be good to write the log to the working directory by default. Thanks, Sl8v
via email
Rather than just store our data in pickled objects, investigate using an embedded DB like SQLite3 for storing hashes and URLs.
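A minimal sketch of what an SQLite-backed store might look like, written for Python 3 and the standard-library sqlite3 module. The table name, column names, the function names open_store and record_sample, and the choice of SHA-256 are all assumptions made for illustration; none of them come from maltrieve itself.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store. Table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " sha256 TEXT PRIMARY KEY,"
        " url TEXT NOT NULL)"
    )
    return conn


def record_sample(conn, url, payload):
    """Record the hash of payload with its URL; return True if the hash is new."""
    digest = hashlib.sha256(payload).hexdigest()
    known = conn.execute(
        "SELECT 1 FROM samples WHERE sha256 = ?", (digest,)
    ).fetchone()
    if known:
        return False  # already seen this content, nothing to store
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples (sha256, url) VALUES (?, ?)", (digest, url)
        )
    return True
```

Passing a file path instead of ":memory:" would persist the table between runs, which is the role the pickled objects play today.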
In case it's not running locally.
lxml is not playing well with requests. Data incoming.
Command-line options are useful in testing and trying new things, but for general usage we should have the ability to use a configuration file.
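One possible shape for this, sketched for Python 3 with the standard-library configparser and argparse modules: defaults are overridden by the configuration file, which is in turn overridden by the command line. The section name "Maltrieve", the option names, the flags, and the function name load_settings are hypothetical; only the /tmp/malware default is taken from the reports above.

```python
import argparse
import configparser


def load_settings(argv, cfg_text=""):
    """Merge settings: built-in defaults < config file < command line."""
    settings = {"dumpdir": "/tmp/malware", "logfile": "maltrieve.log"}

    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)  # a real tool would call parser.read(path)
    if parser.has_section("Maltrieve"):
        settings.update(parser["Maltrieve"])

    cli = argparse.ArgumentParser()
    cli.add_argument("-d", "--dumpdir")
    cli.add_argument("-l", "--logfile")
    args = cli.parse_args(argv)
    # Only options actually given on the command line override the file.
    settings.update({k: v for k, v in vars(args).items() if v is not None})
    return settings
```

For example, load_settings(["-d", "/opt/x"], "[Maltrieve]\ndumpdir = /srv/samples\n") would pick /opt/x, while the same call without the flag would pick /srv/samples.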
If not stuffing into a DB directly, sort into bins by file type.
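A rough sketch of binning by file type, assuming the type is guessed from the leading "magic" bytes of each download rather than from the URL's extension. The signature list is illustrative and far from exhaustive, and the names SIGNATURES, classify, and bin_path are hypothetical.

```python
import os

# Leading bytes for a few common formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"MZ", "pe"),                                  # Windows .exe / .dll
    (b"%PDF", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole"),   # legacy .doc and similar
]


def classify(payload):
    """Return a bin label for payload based on its first bytes."""
    for magic, label in SIGNATURES:
        if payload.startswith(magic):
            return label
    return "unknown"


def bin_path(dumpdir, payload, name):
    """Build the path dumpdir/<bin>/<name> for a downloaded sample."""
    return os.path.join(dumpdir, classify(payload), name)
```

The same classify function could also back a command-line filter that keeps only selected types.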
Rather than just dump into a local directory, integrate with a dedicated malware database (e.g. VxCage or similar).
When pulling down malware, we should also log all relevant headers and similar metadata from the distributor for later analysis.
Add http://malwareurls.joxeankoret.com/normal.txt as a source
When trying to run with Python 2.7.6 with command line of only ./maltrieve.py, I get this output:
./maltrieve.py: line 19: import: command not found
./maltrieve.py: line 20: import: command not found
./maltrieve.py: line 21: import: command not found
./maltrieve.py: line 22: import: command not found
./maltrieve.py: line 23: import: command not found
./maltrieve.py: line 24: import: command not found
./maltrieve.py: line 25: import: command not found
./maltrieve.py: line 26: import: command not found
./maltrieve.py: line 27: import: command not found
./maltrieve.py: line 28: import: command not found
./maltrieve.py: line 29: import: command not found
./maltrieve.py: line 30: import: command not found
./maltrieve.py: line 31: import: command not found
from: can't read /var/mail/MultiPartForm
from: can't read /var/mail/threading
from: can't read /var/mail/Queue
from: can't read /var/mail/lxml
from: can't read /var/mail/bs4
./maltrieve.py: line 41: syntax error near unexpected token `('
./maltrieve.py: line 41: `def get_malware(q, dumpdir):'
If I run it as python ./maltrieve.py I get:
File "./maltrieve.py", line 171
global config = ConfigParser.ConfigParser()
^
SyntaxError: invalid syntax
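The SyntaxError above comes from combining a global declaration with an assignment, which Python's grammar does not allow; the declaration and the assignment need separate statements. A minimal sketch of the pattern, assuming the assignment lives inside a function (here a hypothetical main) and using the Python 3 module name configparser in place of Python 2's ConfigParser:

```python
import configparser  # spelled ConfigParser on Python 2

config = None  # module-level name, filled in by main()


def main():
    global config                          # declaration only
    config = configparser.ConfigParser()   # assignment on its own line
    return config
```

At module level the global line is unnecessary and a plain assignment is enough.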
Handle exceptions like the following:
2013-08-10 20:34:53 -1270569872 6003 items remaining in queue
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 50, in get_malware
mal = get_URL(url)
File "/home/pi/maltrieve-master/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 401, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 419, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 447, in readline
data = self._sock.recv(self._rbufsize)
error: [Errno 104] Connection reset by peer
Exception in thread Thread-5:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 50, in get_malware
mal = get_URL(url)
File "/home/pi/maltrieve-master/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 401, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 419, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 447, in readline
data = self._sock.recv(self._rbufsize)
error: [Errno 104] Connection reset by peer
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "maltrieve.py", line 50, in get_malware
mal = get_URL(url)
File "/home/pi/maltrieve-master/malutil.py", line 7, in get_URL
response = urllib2.urlopen(url.encode("utf8"))
File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 401, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 419, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
r = h.getresponse(buffering=True)
File "/usr/lib/python2.7/httplib.py", line 1034, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 447, in readline
data = self._sock.recv(self._rbufsize)
error: [Errno 104] Connection reset by peer
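In the tracebacks above, the socket error escapes get_malware and ends the worker thread. A minimal sketch of one way to contain such errors, written for Python 3: each URL is fetched inside its own try block, so a reset connection is logged and the thread moves on to the next item. The names worker and fake_fetch are hypothetical, and fake_fetch stands in for a real downloader.

```python
import logging
import queue
import threading


def worker(q, fetch, results):
    """Drain q, calling fetch(url); log failures instead of letting the thread die."""
    while True:
        try:
            url = q.get_nowait()
        except queue.Empty:
            return
        try:
            results.append((url, fetch(url)))
        except (OSError, ValueError) as exc:
            # On Python 3, OSError covers connection resets and refusals;
            # ValueError covers malformed URLs and Unicode encoding errors.
            logging.warning("skipping %r: %s", url, exc)
        finally:
            q.task_done()


def fake_fetch(url):
    """Hypothetical stand-in for a downloader; fails for URLs containing 'bad'."""
    if "bad" in url:
        raise ConnectionResetError(104, "Connection reset by peer")
    return b"payload"
```

Starting the workers with threading.Thread(target=worker, args=(q, fake_fetch, results), daemon=True) would additionally keep them from blocking interpreter exit.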
Hi, any help would be great.
I just pulled down a new build today, and when I run the following command:
"sudo python maltrieve.py"
I get the following error:
File "maltrieve.py", line 171
global config = ConfigParser.ConfigParser()
^
SyntaxError: invalid syntax
I'm running Ubuntu 14.04 LTS Desktop.
My install process is:
sudo apt-get install libxml2-dev libxslt-dev python-dev lib32z1-dev python-dev python-lxml
sudo apt-get install python-pip
sudo pip install beautifulsoup4
sudo apt-get install git-core
sudo git clone https://github.com/technoskald/maltrieve.git
Thank you,
Add http://urlquery.net as a source.
Is there a reason I keep getting these exceptions when running the tool?
Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 505, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/local/bin/maltrieve.py", line 52, in get_malware
mal = get_URL(url)
File "/usr/local/maltrieve/malutil.py", line 17, in get_URL
response.getcode())
UnboundLocalError: local variable 'response' referenced before assignment
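malutil.py is not shown here, so the following is only a guess at the pattern: this error typically appears when the call that should assign response raises, and the except block then refers to response anyway. A minimal Python 3 sketch that reads the status code from the exception instead; the function name get_url and the opener parameter are hypothetical, with opener added only to make the function easy to exercise without a network.

```python
import urllib.error
import urllib.request


def get_url(url, opener=urllib.request.urlopen):
    """Return the response body for url, or None if the request fails."""
    try:
        response = opener(url)
    except urllib.error.HTTPError as exc:
        # The status code lives on the exception; `response` was never bound.
        print("HTTP error %d for %s" % (exc.code, url))
        return None
    except (OSError, ValueError) as exc:
        # Connection failures (including URLError) and malformed URLs.
        print("request failed for %s: %s" % (url, exc))
        return None
    return response.read()
```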