Comments (10)
Hey,
I can answer the question above. This isn't so much about the IOC itself (it is an IOC) as about obfuscation.
It's an obfuscated url.
‘FhFtFtp://cFa.tFrFadeFlaFtFinosF.Fco/jFsF90F.FbinF?’
= http://ca.tradelatinos.co/js90.bin?
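For this particular string, dropping every `F` happens to recover the URL. Here is a minimal sketch, assuming the filler character is known in advance. `strip_filler` is a hypothetical helper for illustration, not something iocextract provides:

```python
def strip_filler(text, filler="F"):
    """Remove every occurrence of a known filler character (hypothetical helper)."""
    return text.replace(filler, "")


obfuscated = "FhFtFtp://cFa.tFrFadeFlaFtFinosF.Fco/jFsF90F.FbinF?"
print(strip_filler(obfuscated))  # http://ca.tradelatinos.co/js90.bin?
```

This would mangle any URL that legitimately contains the filler character, so it only makes sense as a one-off for a sample you have already inspected.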
from iocextract.
Some unicode issues, looks like the regex needs to be tightened:
https://secure.comodo.net/CPS0C��U���<0:08�6�4�2http://crl.comodoca.com/COMODORSACodeSigningCA.crl0t+�����h0f0>+��0��2http://crt.comodoca.com/COMODORSACodeSigningCA.crt0$+��0���http://ocsp.comodoca.com0���U����0���[email protected]
http://crl.comodoca.com/COMODORSACertificationAuthority.crl0q+�����e0c0;+��0��/http://crt.comodoca.com/COMODORSAAddTrustCA.crt0$+��0���http://ocsp.comodoca.com0
https://www.digicert.com/CPS0��d+����0��V���RAny
http://crl3.digicert.com/DigiCertAssuredIDCA-1.crl08�6�4�2http://crl4.digicert.com/DigiCertAssuredIDCA-1.crl0w+�����k0i0$+��0���http://ocsp.digicert.com0A+��0��5http://cacerts.digicert.com/DigiCertAssuredIDCA-1.crt0
http://www.digicert.com/ssl-cps-repository.htm0��d+����0��V���RAny
http://ocsp.digicert.com0C+��0��7http://cacerts.digicert.com/DigiCertAssuredIDRootCA.crt0����U���z0x0:�8�6�4http://crl3.digicert.com/DigiCertAssuredIDRootCA.crl0:�8�6�4http://crl4.digicert.com/DigiCertAssuredIDRootCA.crl0���U�������+������ߢ�W
from iocextract.
Similar to "Extracts part of the match as a second URL" cases above:
185.189.58[.]222
Extracts as:
http://58.222
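One way to catch this kind of partial match, sketched with the standard library only: refang the candidate and accept it as an IP only if it parses as a full IPv4 address. `refang_ipv4` is a hypothetical name, and handling only the `[.]` defang is an assumption made to keep the example short:

```python
import ipaddress


def refang_ipv4(candidate):
    """Undo the [.] defang and return the address only if it is a complete IPv4."""
    refanged = candidate.replace("[.]", ".")
    try:
        return str(ipaddress.IPv4Address(refanged))
    except ipaddress.AddressValueError:
        # Fragments such as "58.222" do not have four octets.
        return None


print(refang_ipv4("185.189.58[.]222"))  # 185.189.58.222
print(refang_ipv4("58.222"))            # None
```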
from iocextract.
Some more information on some of the bugs we're seeing here:
| Actual output | Expected output | Bug description |
|---|---|---|
| http:// NOTICE | None | Not sure if we can fix this, it does match the regex. |
| https://redacted.sf-api.eu/</BaseUrl | https://redacted.sf-api.eu/ | See if we can get this working with the existing punctuation filter |
| https://ln.sync[.]com/dl/f6772eb20/d8yt6kez-9q7eef3m-ai27ebms-8zcufi5f (Please | https://ln.sync[.]com/dl/f6772eb20/d8yt6kez-9q7eef3m-ai27ebms-8zcufi5f | Extra cruft after the URL |
| http://as rsafinderfirewall[.]com/Es3tC0deR3name.exe): | http://rsafinderfirewall[.]com/Es3tC0deR3name.exe | Unicode space (\xa0) should end the URL; end punctuation not being stripped |
| http://domain rsafinderfirewall[.]com | http://rsafinderfirewall[.]com | Unicode space should end the URL |
| http://example,\xa0c0pywins.is-not-certified[.]com | http://c0pywins.is-not-certified[.]com | Unicode space should end the URL |
| webClient.DownloadString(‘https://a.pomf[.]cat/ntluca.txt | https://a.pomf[.]cat/ntluca.txt | Junk getting through the bracket regex before the prefix |
| http://HtTP:\\193[.]29[.]187[.]49\qb.doc\u201d | HtTP:\\193[.]29[.]187[.]49\qb.doc | Handle backslashes as a defang/refang; include unicode quote as punctuation in regexes |
| http://tintuc[.]vietbaotinmoi[.]com\u201d | http://tintuc[.]vietbaotinmoi[.]com | Include unicode quote as punctuation in regexes |
| espn[.]com.\u201d | | Include unicode quote as punctuation in regexes |
| http://calendarortodox[.]ro/serstalkerskysbox.png” | | Include unicode quote as punctuation in regexes |
| tFtp://cFa.tFrFa | ??? | No idea... investigate the source to see what this was supposed to be |
| h\u2013p://dl[.]dropboxusercontent[.]com/s/rlqrbc1211quanl/accountinvoice.htm | | This is actually correct, but the refang function needs to handle the unicode en dash (\u2013). |
| hxxp://paclficinsight.com\xa0POST /new1/pony/gate.php | hxxp://paclficinsight.com | Just stop on the \xa0 unicode space |
| http://at\xa0redirect.turself-josented[.]com | | |
| KDFB.DownloadFile('hxxps://authenticrecordsonline[.]com/costman/dropcome.exe', | | |
| at\xa0hxxp://paclficinsight[.]com/new1/pony/china.jpg | | |
| hxxp://<redacted>/28022018/pz.zip.\xa0 | hxxp://<redacted>/28022018/pz.zip | No way to recover the redacted part unfortunately... just drop the \xa0 and pass the rest even though this is useless as an IOC |
| hxxp:// 23.89.158.69/gtop | | Same \xa0 issue |
| h00p://bigdeal.my/gH9BUAPd/js.js"\uff1e\uff1c/script\uff1e | h00p://bigdeal.my/gH9BUAPd/js.js | More unicode regex tightening |
| hxxp://smilelikeyoumeanit2018[.]com[.]br/contact-server/, | | Comma should be stripped |
| hxxp:// feeds.rapidfeeds[.]com/88604/ | | |
| hxxp://www.xxx.xxx.xxx.gr/1.txt\u2019 | | |
| h00p://119 | | Piece of an IP URL... should probably filter these out somehow, maybe this is solved by whatever solves the "Extracts part of the match as a second URL" cases |
| h00p://218.84 | | |
| hxxp:// "www.hongcherng.com"/rd/rd | | |
| http://http%3a%2f%2f117%2e18%2e232%2e200%2f | | Extra scheme for some reason... |
| http://http%3a%2f%2fgaytoday%2ecom%2f | | |
| h00p://http://turbonacho(.)com/ocsr.html"\uff1e | | Extra scheme and unicode issues |
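Several rows in the table above come down to two rules: stop at the first whitespace character (including `\xa0`), then strip trailing punctuation, including unicode quotes. A rough sketch of that post-processing follows. `trim_candidate` and the `TRAILING_JUNK` set are assumptions made up for illustration; they are not iocextract's regexes or its actual fix:

```python
import re

# Characters treated as trailing punctuation rather than part of the URL.
# This set is an assumption drawn from the rows above, not the library's list.
TRAILING_JUNK = ".,:;)\"'\u2018\u2019\u201c\u201d\uff1c\uff1e"


def trim_candidate(candidate):
    """Cut a URL candidate at the first whitespace, then strip trailing junk."""
    # For str patterns in Python 3, \s also matches the no-break space \xa0.
    head = re.split(r"\s", candidate, maxsplit=1)[0]
    return head.rstrip(TRAILING_JUNK)


samples = [
    "hxxp://paclficinsight.com\xa0POST /new1/pony/gate.php",
    "http://tintuc[.]vietbaotinmoi[.]com\u201d",
    "hxxp://smilelikeyoumeanit2018[.]com[.]br/contact-server/,",
]
for sample in samples:
    print(trim_candidate(sample))
```

This only deals with junk after the URL. Rows where the junk comes before the real host (for example `http://at\xa0redirect...`) or where markup follows a quote would still need changes on the regex side.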
from iocextract.
This is the source of the cFa.tFrFa IOC: https://malware.news/t/technical-teardown-analysing-malspam-attack/11149. There's some obfuscation here that's beyond what we can handle as a defang. I think this one can be ignored. The real indicator is listed later in the post anyway.
from iocextract.
Thanks :) Unfortunately, the text we receive is split up, so we can't regex out the full obfuscated URL:
‘iFlFe(‘FhFtFtp://cFa.tFrFa’ +
‘deFlaFtFinosF.Fco/jF’ +
On top of that, the every-other-character obfuscation is more complicated than the simple defangs this library was meant to cover, so there's no good way to parse it out. That said, the deobfuscated URL also appears later in the same text, and we do parse that out correctly. We just get an extra false-positive URL, tFtp://cFa.tFrFa, that an analyst would have to manually remove or ignore. Not a big issue, just something I noticed while combing through some test data.
from iocextract.
Oh, to clarify, we're not looking at/extracting from the original file here, only the RSS feeds of a bunch of security blogs. That probably wasn't clear at all in the issue context.
from iocextract.
No problem, and agreed: it appears to be outside the scope of the tool. Good job, I'm sure I'll use this in the future 😀.
As a side note, if you want some good regexes, check out the source code of CyberChef, GCHQ's tool. You have many covered already, though. I'll contribute where I can.
from iocextract.
Thanks for the tip!
CyberChef regex for future reference: https://github.com/gchq/CyberChef/blob/master/src/core/operations/Extract.js. Their IPv6 regex seems more advanced than ours for sure.
from iocextract.
Closing via #24, which fixes most of the remaining bugs from this issue.
from iocextract.