
lookyloo / lookyloo


Lookyloo is a web interface that allows users to capture a website page and then display a tree of domains that call each other.

Home Page: https://www.lookyloo.eu

License: Other

Python 69.07% CSS 0.99% JavaScript 6.07% HTML 23.69% Dockerfile 0.14% Shell 0.04%
information-security privacy web-security dfir capture scraping lookyloo

lookyloo's Introduction

Lookyloo icon

Lookyloo is a web interface that captures a webpage and then displays a tree of the domains that call each other.

Gitter

What's in a name?!

Lookyloo ...

Same as Looky Lou; often spelled as Looky-loo (hyphen) or lookylou

1. A person who just comes to look.
2. A person who goes out of the way to look at people or something, often causing crowds and disruption.
3. A person who enjoys watching other people's misfortune. Oftentimes car onlookers that stare at car accidents.

In L.A., usually the lookyloos cause more accidents by not paying full attention to what is ahead of them.

Source: Urban Dictionary

No, really, what is Lookyloo?

Lookyloo is a web interface that allows you to capture and map the journey of a website page.

Find all you need to know about Lookyloo on our documentation website.

Here's an example of a Lookyloo capture of the site github.com:

Screenshot of Lookyloo capturing GitHub

REST API

The API is self-documented with Swagger. You can play with it on the demo instance.

Installation

Please refer to the install guide.

Python client

pylookyloo is the recommended client to interact with a Lookyloo instance.

It is available on PyPI, so you can install it using the following command:

pip install pylookyloo

For more details on pylookyloo, read the overview docs, the documentation of the module itself, or the code in this GitHub repository.

Notes regarding using S3FS for storage

Directory listing

TL;DR: it is slow.

If you have many captures (say, more than 1000/day) and store them in an S3 bucket mounted with s3fs-fuse, doing a directory listing in bash (ls) will most probably lock I/O for every process trying to access any file in the whole bucket. The same is true if you access the filesystem using Python methods (iterdir, scandir, ...).

A workaround is to use the Python s3fs module, as it talks to the S3 API directly and does not go through the mounted filesystem to list directories. You can configure the s3fs credentials in config/generic.json under the s3fs key.

Warning: this will not save you if you run ls on a directory that contains a lot of captures.

Versioning

By default, a MinIO bucket (the backend for s3fs) has versioning enabled, which means it keeps a copy of every version of every file you store. This becomes a problem if you have a lot of captures: the index files are updated on every change, and the maximum number of versions is 10,000. So by the time you have more than 10,000 captures in a directory, you'll get I/O errors when you try to update the index file. And you absolutely do not care about that versioning in Lookyloo.

To check if versioning is enabled (can be either enabled or suspended):

mc version info <alias_in_config>/<bucket>

The command below will suspend versioning:

mc version suspend <alias_in_config>/<bucket>

I'm stuck, my file is raising I/O errors

This happens when your index was updated more than 10,000 times while versioning was enabled.

This is how to check whether you're in this situation:

  • Error message from bash (unhelpful):
$ (git::main) rm /path/to/lookyloo/archived_captures/Year/Month/Day/index
rm: cannot remove '/path/to/lookyloo/archived_captures/Year/Month/Day/index': Input/output error
  • Check with python
from lookyloo.default import get_config
import s3fs

# Load the s3fs credentials from config/generic.json (key "s3fs")
s3fs_config = get_config('generic', 's3fs')
s3fs_client = s3fs.S3FileSystem(key=s3fs_config['config']['key'],
                                secret=s3fs_config['config']['secret'],
                                endpoint_url=s3fs_config['config']['endpoint_url'])

s3fs_bucket = s3fs_config['config']['bucket_name']
# Delete the index directly through the S3 API, bypassing the FUSE mount
s3fs_client.rm_file(s3fs_bucket + '/Year/Month/Day/index')
  • Error from python (somewhat more helpful):
OSError: [Errno 5] An error occurred (MaxVersionsExceeded) when calling the DeleteObject operation: You've exceeded the limit on the number of versions you can create on this object
  • Solution: run this command to remove all older versions of the file
mc rm --non-current --versions --recursive --force <alias_in_config>/<bucket>/Year/Month/Day/index

Contributing to Lookyloo

To learn more about contributing to Lookyloo, see our contributor guide.

Code of Conduct

At Lookyloo, we pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community. You can access our Code of Conduct here or on the Lookyloo docs site.

Support

  • To engage with the Lookyloo community contact us on Gitter.
  • Let us know how we can improve Lookyloo by opening an issue.
  • Follow us on Twitter.

Security

To report vulnerabilities, see our Security Policy.

Credits

Thank you very much Tech Blog @ willshouse.com for the up-to-date list of UserAgents.

License

See our LICENSE.

lookyloo's People

Contributors

adrima01, adulau, antoniabk, arhamyss, buildbricks, cudeso, dependabot[bot], docarmorytech, dssecret, fafnerkeyzee, felalex57, numbuh474, of-cag, rafiot, steveclement, sw-mschaefer, th4nat0s, vmdhhh


lookyloo's Issues

SVG interactions

Main hostname tree:

  • click on icon (i.e. JS) -> displays box with all URLs loading a JS
  • click on hostname -> display all the related URLs (same format as hostnames: line 1: URL, Line 2: icons)

Overlay box:

  • click on icon (i.e. JS) -> download the content

CSV export

Will be nice ( yes again )....

To have the capacity to export the data: JSON is an option, but most of the time CSV is the format usable by most people.

HIT, Called by, [type... javascript, cookie, etc..]

voilà :)
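A minimal sketch of what such an export could look like, using Python's standard csv module. The column names (hit, called_by, resource_type) follow the suggestion above and are assumptions, not Lookyloo's actual data model.

```python
import csv
import io

def to_csv(nodes):
    """Render a list of node dicts as CSV text (hypothetical schema)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['hit', 'called_by', 'resource_type'])
    writer.writeheader()
    for node in nodes:
        writer.writerow(node)
    return buf.getvalue()

# Hypothetical capture rows, following the HIT / Called by / type columns
nodes = [
    {'hit': 'https://example.com/a.js', 'called_by': 'https://example.com/', 'resource_type': 'javascript'},
    {'hit': 'https://example.com/c', 'called_by': 'https://example.com/', 'resource_type': 'cookie'},
]
result = to_csv(nodes)
print(result)
```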

Collapse/expand tree/pop up window ambiguity

Expand/collapse tree currently links to windows, but the text controls the pop-up window. Put the text and the tree circle on the same horizontal rule, give them both a similar border, and drop the inheritance line from between them (or possibly from the right-hand side of the new border?).

Option to disable or rename session cookies

LookyLoo sets a session cookie (boringly named session). This is an issue if LookyLoo is being used behind a reverse proxy with an access authorization system that also happens to set a cookie named session -- the effect is that:

  1. request comes to the reverse proxy; reverse proxy does its magic and sets its session cookie to persist the authorization status;
  2. request is sent further to the upstream (i.e. LookyLoo).
  3. LookyLoo sets its own session cookie, since the one set by the reverse proxy does not conform to whatever LookyLoo expects
  4. response is returned to the client -- with the LookyLoo session cookie overwriting the reverse proxy cookie
  5. upon the next request, the whole dance starts over

This results in no session persistence and LookyLoo not working properly behind such a reverse proxy. It would be swell if it were possible to change the name of the session cookie set by LookyLoo so as not to clash with a potential reverse proxy cookie.

The cookie seems not necessary -- blocking Set-Cookie on the reverse proxy (so that it does not reach the browser) does not seem to result in loss of functionality.


For the record, a quick and dirty workaround for nginx is:

  1. make sure the reverse proxy session cookie is not sent back to LookyLoo upstream;
  2. make sure that any Set-Cookie header set by LookyLoo is blocked from reaching the user browser.

There does not seem to be a way of modifying cookie headers sent to upstreams directly in the nginx config, so point 1 would either have to use Lua (as in our case) or some other method; point 2 can be done with the proxy_hide_header Set-Cookie; nginx config directive.
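The second half of that workaround can be sketched in nginx config (the Lua part for stripping the request cookie is deployment-specific and omitted here; the upstream address is an example):

```nginx
location / {
    proxy_pass http://localhost:5100/;
    # Block LookyLoo's Set-Cookie from reaching the browser, so it cannot
    # overwrite the reverse proxy's own "session" cookie.
    proxy_hide_header Set-Cookie;
}
```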

Documentation: where does LookyLoo keep the scraped data

It would be helpful to have information on where LookyLoo keeps the scraped data -- this is required, for example, to set up volume mounts in Docker so that scraped data persists across containers being recreated.

Mockups

  • Heritable display of tree node (two types: URL & type) -> need to represent inheritance from host-name node
  • Confirmation box for save

A Folding search

Hello,

It would be nice to have a "search" which will find and unfold only the relevant path to the result of the search.

Screenshots

It would be an amazing improvement if screenshots of each of the HTML pages retrieved in the process of scraping were available via the interface for inspection (this would be very informative when researching a targeted phishing attack, for instance).

MISP Integration

Lookups:

  • Domains
  • URLs & Part of URL
  • Hashes of JS/exe, ...
  • Cookies

Push:

  • Domains
  • URLs & Part of URL
  • Any content (JS/exe, ...)
  • Cookies

Duplicates

  • Same cookies set by multiple websites
  • Same JavaScript / Executable / Json / ...
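The resource-duplicate case above amounts to grouping captured bodies by a content hash and keeping the groups seen on more than one hostname. A sketch under an assumed, simplified data model (Lookyloo's real capture structure differs):

```python
import hashlib
from collections import defaultdict

# Hypothetical (hostname, response body) pairs from several captures
resources = [
    ('tracker.example', b'var t=1;'),
    ('cdn.example', b'var t=1;'),
    ('other.example', b'body{}'),
]

# Group hostnames by the SHA-256 of the body they served
by_hash = defaultdict(set)
for hostname, body in resources:
    by_hash[hashlib.sha256(body).hexdigest()].add(hostname)

# Keep only bodies served from more than one hostname
duplicates = {h: hosts for h, hosts in by_hash.items() if len(hosts) > 1}
```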

Nginx Gateway Timeout

Hello,

I am running Lookyloo in Production, and have nginx running.

Whenever I submit a URL for scanning, I get a page returned saying:

504 Gateway Time-out
nginx/1.14.0 (Ubuntu)

Here is the settings under vim /etc/nginx/sites-enabled/lookyloo

server {
    listen 80;
    server_name lookyloo;

    location / {
        proxy_pass_header Server;
        proxy_set_header Host $http_host;
        proxy_redirect off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 10;
        proxy_read_timeout 10;
        proxy_pass http://localhost:5100/;
    }
}

I can't find a solution to this issue, are you able to assist?
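One likely culprit (not a confirmed diagnosis) is the 10-second proxy_read_timeout in the config above: a capture can easily take longer than that, so nginx gives up on the upstream mid-capture and returns 504. A sketch of the adjusted location block:

```nginx
location / {
    proxy_pass_header Server;
    proxy_set_header Host $http_host;
    proxy_redirect off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Scheme $scheme;
    proxy_connect_timeout 10;
    # Give the upstream enough time to finish a capture before timing out
    proxy_read_timeout 300;
    proxy_pass http://localhost:5100/;
}
```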

BS4 missing from requirements

In a pristine Debian stable Python 3 installation, Lookyloo is not able to start because the Beautiful Soup 4 Python module is missing from the requirements.

Export all domains

It would be nice to export all the domains at once to compare them between runs.
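Comparing the domain sets of two runs is plain set arithmetic once an export exists. A sketch with hypothetical domain lists (Lookyloo has no such export yet):

```python
# Domains seen in two hypothetical runs of the same capture
run_one = {'example.com', 'cdn.example.net', 'tracker.example.org'}
run_two = {'example.com', 'cdn.example.net', 'ads.example.io'}

# Domains that newly appeared or disappeared between the runs
appeared = run_two - run_one
disappeared = run_one - run_two
```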

Report lookup redirects to index despite tree_uuid created

I observed the following behavior using https://www.circl.lu/urlabuse/

  1. Go to https://www.circl.lu/urlabuse/
  2. Insert a Link and hit Run lookup
  3. Click the Link 'See on Lookyloo'
  4. You are redirected to the index

The link contains a valid tree_uuid but it seems that lookup_report_dir doesn't return a valid report_dir and thus redirects you to the index.

After some moments the report is viewable.

Expected behavior:
Show an "in progress" notice while keeping the URL intact to enable manual refresh (F5), or redirect to the finished report once it is done.

Integration of URL Abuse

The goal is to asynchronously fire requests to URL Abuse after the scraping is over and while the tree is displayed:

  • Every URL will be sent to every relevant endpoint
  • Every domain will be resolved and sent to every relevant endpoint

Errors when setting up lookyloo.service

Hello,
Is anyone able to share their copy of /etc/systemd/system/lookyloo.service ?

Here is mine:

[Unit]
Description=uWSGI instance to serve lookyloo
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/lookyloo
Environment=PATH="/usr/bin/python"
ExecStart=/opt/lookyloo/bin/start.py
Environment=LOOKYLOO_HOME=/opt/lookyloo

[Install]
WantedBy=multi-user.target

And I'm getting the following error:

# sudo systemctl status lookyloo
● lookyloo.service - uWSGI instance to serve lookyloo
   Loaded: loaded (/etc/systemd/system/lookyloo.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2019-04-04 13:47:44 CEST; 2min 48s ago
  Process: 3857 ExecStart=/opt/lookyloo/bin/start.py (code=exited, status=126)
 Main PID: 3857 (code=exited, status=126)

Apr 04 13:47:44 server systemd[1]: Started uWSGI instance to serve lookyloo.
Apr 04 13:47:44 server systemd[1]: lookyloo.service: Main process exited, code=exited, status=126/n/a
Apr 04 13:47:44 server start.py[3857]: /usr/bin/env: ‘python3’: Not a directory
Apr 04 13:47:44 server systemd[1]: lookyloo.service: Failed with result 'exit-code'.
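The log points at the PATH line: Environment=PATH="/usr/bin/python" replaces the search path with a single file, so /usr/bin/env cannot traverse it to find python3 (hence "Not a directory" and exit status 126). An untested sketch of a corrected unit, keeping the paths from the unit above:

```ini
[Unit]
Description=Lookyloo
After=network.target

[Service]
User=root
Group=root
WorkingDirectory=/opt/lookyloo
# Keep a real search path instead of pointing PATH at a single binary;
# /usr/bin/env needs directories to search for python3.
Environment=PATH=/usr/local/bin:/usr/bin:/bin
Environment=LOOKYLOO_HOME=/opt/lookyloo
ExecStart=/opt/lookyloo/bin/start.py

[Install]
WantedBy=multi-user.target
```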

Add basic user agent support

A few user agents, and free text box for folks who want to shoot themselves in the foot. (with a link to info on user agents so they can avoid their feet if they like)

Search box for UUID (hostname or url node)

Each node (hostname tree and URL tree) has a UUID; adding a search box for a UUID on the main page -> load the tree and put a red box around the node.

Dependencies:

  • Dump a pickled tree to keep the UUIDs after first generation
  • For each pickle, dump the list of all UUIDs (Hostname/URL) in the directory for searching later

Requirements:

  • Force delete pickle for a tree (needs confirm box)
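The two dependencies above can be sketched with the standard library: pickle the tree once, and alongside it dump the set of node UUIDs so later searches never need to unpickle the full tree. File names and the tree structure here are assumptions for illustration.

```python
import pickle
import tempfile
import uuid
from pathlib import Path

capture_dir = Path(tempfile.mkdtemp())

# Hypothetical captured tree with one root and one child node
tree = {'root': {'uuid': str(uuid.uuid4()),
                 'children': [{'uuid': str(uuid.uuid4())}]}}

# Dump the pickled tree, plus a small side index of every node UUID
all_uuids = {tree['root']['uuid']} | {c['uuid'] for c in tree['root']['children']}
(capture_dir / 'tree.pickle').write_bytes(pickle.dumps(tree))
(capture_dir / 'uuids.pickle').write_bytes(pickle.dumps(all_uuids))

# Searching only touches the small UUID index, not the pickled tree
indexed = pickle.loads((capture_dir / 'uuids.pickle').read_bytes())
found = tree['root']['uuid'] in indexed
```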

Link overlay box to source node

When the user clicks on a hostname, or an icon, it loads an overlay box that can be moved around.

The box needs to be connected to the originating node.

docker-compose fails on initializing AsyncScraper

Hi,

today I wanted to setup a docker container and faced the following issue. All previous 16/19 steps went well. Could someone have a look and advise how to fix it? Thank you.

Step 17/19 : run nohup pipenv run async_scrape.py
---> Running in 0197ffd4a2bc
Loading .env environment variables…
09:06:05 AsyncScraper INFO:Initializing AsyncScraper
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 538, in connect
sock = self._connect()
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 861, in _connect
sock.connect(self.path)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/bin/async_scrape.py", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "/root_lookyloo/lookyloo/bin/async_scrape.py", line 36, in <module>
m = AsyncScraper()
File "/root_lookyloo/lookyloo/bin/async_scrape.py", line 24, in __init__
self.lookyloo = Lookyloo(loglevel=loglevel, only_global_lookups=only_global_lookups)
File "/root_lookyloo/lookyloo/lookyloo/lookyloo.py", line 45, in __init__
if not self.redis.exists('cache_loaded'):
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/client.py", line 1307, in exists
return self.execute_command('EXISTS', *names)
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/client.py", line 836, in execute_command
conn = self.connection or pool.get_connection(command_name, **options)
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 1071, in get_connection
connection.connect()
File "/root/.local/share/virtualenvs/lookyloo-lb761Agm/lib/python3.6/site-packages/redis/connection.py", line 543, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 2 connecting to unix socket: /root_lookyloo/lookyloo/cache/cache.sock. No such file or directory.
ERROR: Service 'lookyloo' failed to build: The command '/bin/sh -c nohup pipenv run async_scrape.py' returned a non-zero code: 1

show redirects vertically rather than horizontally?

Because they don't return resources to the browser, I think redirects are qualitatively different from other reference types like script and CSS sources and iframes, but they currently manifest in the same way, as depth in the tree. Since redirects typically happen before resources are loaded, there would generally be lots of extra vertical space available in the earlier parts of the tree, so perhaps they could be oriented vertically to emphasize this difference? For example cnn.com (https://lookyloo.circl.lu/tree/5ea5cebb-9223-42db-bdeb-34543b237b05) shows

cnn.com --> www.cnn.com --> www.cnn.com --> edition.cnn.com --> ... resources ...

would it be possible to get them to render more like this

cnn.com
   V
www.cnn.com
   V
www.cnn.com
   V
edition.cnn.com --> ... resources ...

Anonymous submit.

It would be nice to have a "don't remember me" button which allows the scanned website to not be published. ( PORN^WGDPR need )

Missing icons

File types:

  • Text
  • Audio
  • Empty content
  • POSTed in request
  • CSS
  • JSON
  • HTML
  • EXE
  • Image
  • Font
  • octet-stream
  • Video
  • Livestream
  • Link comes from an Iframe
  • No Mimetype (empty string)
  • No known type (no corresponding icon)
  • Suspected phishing (#190) -> fish + question mark?

Buttons:

  • Download URL content
  • Display URLs related to the domain

Scraping improvements

  • Proxy support
  • Pass a pre-generated cookie
  • Initial referrer
  • Locale of the browser
  • Login creds <= how to pass them properly in the webpage will be challenging (solved by passing a valid cookie)

Add collections

The possibility to "group" scan results.

Perhaps via tags or similar.

e.g: cdn.foo.example could be a group of all the sites using that cdn.

But perhaps thinking about "real" correlations would be more efficient.
