
ahmia / search


Ahmia - Search Engine for onion services.

License: BSD 3-Clause "New" or "Revised" License

Python 4.98% HTML 78.38% CSS 3.36% JavaScript 11.82% Shell 0.17% TeX 1.17% XSLT 0.09% Perl 0.03%

search's People

Contributors

bsloan, chrismacnaughton, copiesofcopies, dependabot[bot], juhanurmi, mdhash, razorfinger, skrish13, wtf

search's Issues

Add more statistics

We should start a conversation on what statistics should be added.

What about the searches themselves (top keywords), aka the "trends project"?

Add a screenshot to an indexed item

We should store a screenshot of an indexed page.
I'm not sure whether we should put the image path in the index or the picture as a base64 string.
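To make the trade-off concrete, here is a minimal sketch of the two document shapes. The field names `screenshot_path` and `screenshot_b64` and the example URL are assumptions for illustration, not the project's actual index mapping. It also shows that base64 inflates the stored data by roughly a third.

```python
import base64


def index_doc_with_path(url: str, screenshot_path: str) -> dict:
    """Option A: keep the image on disk and index only its path."""
    return {"url": url, "screenshot_path": screenshot_path}


def index_doc_with_base64(url: str, image_bytes: bytes) -> dict:
    """Option B: embed the image itself in the document as a base64 string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"url": url, "screenshot_b64": encoded}


# Stand-in for real image data: 3072 arbitrary bytes (hypothetical).
fake_image = bytes(range(256)) * 12

doc_a = index_doc_with_path("http://example.onion/", "/screenshots/example.png")
doc_b = index_doc_with_base64("http://example.onion/", fake_image)

# base64 emits 4 characters for every 3 input bytes.
print(len(fake_image), len(doc_b["screenshot_b64"]))
```

Storing the path keeps index documents small but requires a separate file store. Embedding the string keeps everything in one place at the cost of larger documents.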

Index of Onions

Thank you for the project! Does the ahmia.fi project offer a list of working onions you've scraped?
How often do these onions change?

Installation Documentation

This is not a major issue, but the documentation is out of date with respect to the latest version, and there is very little guide-style information available. It would be great to have something more detailed if possible.

Bang syntax

The !bang syntax of DuckDuckGo is a great feature. It lets you run a search on another website that has its own search engine. We should let people propose bangs for searching other hidden services.

Should we make a list of initially supported bangs? For instance, DuckDuckGo has an .onion service, so why not expose it with a !ddg command?
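A rough sketch of how bang resolution could work, assuming a simple registry that maps a bang name to a search URL template. The `BANGS` table, the `resolve_bang` helper, and the `example.onion` address are all hypothetical placeholders, not part of the existing code.

```python
from urllib.parse import quote_plus

# Hypothetical registry: bang name -> search URL template.
BANGS = {
    "example": "http://example.onion/search?q={query}",
}


def resolve_bang(raw_query, bangs=BANGS):
    """Return a redirect URL if the query starts with a known !bang, else None."""
    query = raw_query.strip()
    if not query.startswith("!"):
        return None
    # Split "!name rest of query" into the bang name and the search terms.
    name, _, rest = query[1:].partition(" ")
    template = bangs.get(name.lower())
    terms = rest.strip()
    if template is None or not terms:
        return None
    return template.format(query=quote_plus(terms))


print(resolve_bang("!example hidden wiki"))
print(resolve_bang("plain search"))
```

A query with an unknown bang or without search terms returns `None`, so the caller can fall back to a normal search.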

API in CSV, JSON and RSS

Please offer an API so it would be easy to retrieve results using IRC (or XMPP) chat bots and other applications.
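As an illustration of the request, the sketch below serializes the same result list to JSON and CSV using only the standard library. The result fields (`url`, `title`) and the function names are assumptions, and RSS is left out for brevity.

```python
import csv
import io
import json


def results_to_json(results):
    """Serialize search results as a JSON document."""
    return json.dumps({"results": results}, indent=2)


def results_to_csv(results, fields=("url", "title")):
    """Serialize search results as CSV with a header row; extra keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Hypothetical sample data.
sample = [
    {"url": "http://example.onion/", "title": "Example, with a comma"},
    {"url": "http://another.onion/", "title": "Another"},
]

print(results_to_json(sample))
print(results_to_csv(sample))
```

A chat bot could request the CSV form and split it line by line, while other applications would likely prefer the JSON form.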

Fake Real comparison

There seems to be one big cloner (and some smaller ones).
Luckily, the big cloner also clones link directories, replacing the real onion links with his portfolio of cloned onions.
Simply diffing the real list against the cloned list will give a list of clones.

I have forked your site (just trying to learn my way around Python and Django) and was planning to implement an automated script that regularly checks cloned link directories and marks them as "clone" in the same fashion as the "banned" sites, if time permits, that is.
The example below is by Daniel Winzen, operator of the real link directory used in the script.

<?php
// Fetch both lists through the local Tor SOCKS proxy.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_PROXY, '127.0.0.1:9050');
curl_setopt($ch, CURLOPT_PROXYTYPE, 7); // 7 = CURLPROXY_SOCKS5_HOSTNAME (proxy resolves hostnames)
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 25);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);

// Real link directory.
curl_setopt($ch, CURLOPT_URL, 'http://tt3j2x4k5ycaa5zt.onion/onions.php?format=scamtest');
$links = explode("\n", curl_exec($ch));

// Cloned link directory.
curl_setopt($ch, CURLOPT_URL, 'http://tt3j277rncfaqmj7.onion/onions.php?format=scamtest');
$scam_links = explode("\n", curl_exec($ch));
curl_close($ch);

// Capture group 3 holds the 16-character onion address.
$pattern = '~(^(https?://)?([a-z2-7]{16})(\.onion(/.*)?)?$)~i';
$i = $scam = 0;
// The lists are compared line by line, so they must have the same length.
if (count($links) === count($scam_links)) {
    foreach ($links as $link) {
        if ($link !== $scam_links[$i]
            && preg_match($pattern, $link, $real)
            && preg_match($pattern, $scam_links[$i], $clone)) {
            $address = strtolower($real[3]);       // real address
            $scam_address = strtolower($clone[3]); // clone
            // add clone to database
            ++$scam;
        }
        ++$i;
    }
}
echo "$i onions checked\n";
echo "$scam onions were scam\n";
?>
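Since the fork is a Python/Django project, the same comparison could be sketched in Python roughly as follows. This is a minimal sketch only: fetching the two lists through Tor is omitted, `find_clones` is a hypothetical helper name, and the sample addresses are made up.

```python
import re

# Same idea as the PHP pattern: capture the 16-character onion address.
ONION_RE = re.compile(
    r"^(?:https?://)?([a-z2-7]{16})(?:\.onion(?:/.*)?)?$", re.IGNORECASE
)


def find_clones(real_links, suspect_links):
    """Compare two lists line by line; return (real, clone) address pairs that differ."""
    if len(real_links) != len(suspect_links):
        return []  # positions would not line up, so no comparison is attempted
    clones = []
    for real, suspect in zip(real_links, suspect_links):
        if real == suspect:
            continue
        real_match = ONION_RE.match(real.strip())
        clone_match = ONION_RE.match(suspect.strip())
        if real_match and clone_match:
            clones.append((real_match.group(1).lower(), clone_match.group(1).lower()))
    return clones


# Made-up sample data standing in for the two downloaded lists.
real = ["http://aaaaaaaaaaaaaaaa.onion/", "http://bbbbbbbbbbbbbbbb.onion/"]
fake = ["http://aaaaaaaaaaaaaaaa.onion/", "http://cccccccccccccccc.onion/"]
print(find_clones(real, fake))
```

The returned pairs could then be used to mark the second address of each pair as "clone" in the database.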

Make a google-trend like interface

This is related to #22.
This issue concerns only making the interface. Any idea of a good chart library?
Since this tool should be dynamic, what about making it a one page web-app?

Add stats to indexed items to compute a pertinence score

The idea is to be able to compute a score for an indexed onion site/page.
I'm not sure whether the score should be computed at indexing time or at search time. I need to do some research first.
The stats we have (popularity, backlink, number of clicks) can be useful to compute that score.
Also, icey proposed to work on the uptime stat, which could be useful.
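One possible shape for such a score is a weighted sum of the available stats, with each counter damped by a logarithm so that very large values do not dominate. The weights, the stat names, and the formula itself are assumptions for discussion, not a decided design.

```python
import math

# Assumed weights; these would need tuning against real data.
WEIGHTS = {"popularity": 0.5, "backlinks": 0.3, "clicks": 0.2}


def pertinence_score(stats, weights=WEIGHTS):
    """Weighted sum of log-damped counters; missing or negative stats count as zero."""
    return sum(
        weight * math.log1p(max(stats.get(name, 0), 0))
        for name, weight in weights.items()
    )


# Hypothetical stats for two indexed pages.
page_a = {"popularity": 10, "backlinks": 3, "clicks": 40}
page_b = {"popularity": 10, "backlinks": 30, "clicks": 40}
print(pertinence_score(page_a), pertinence_score(page_b))
```

Computed at indexing time, the score could be stored with the document. Computed at search time, it could use fresher click counts at the cost of extra work per query.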

Evaluate repositories organization

I'm thinking of the following structure:
ahmia/ <- Org name
ahmia <- Repository containing the django app of ahmia.fi, documentation on how to install it and how to run it (with apache, nginx config samples)
onion-elastic-bot <- Crawler and install guide

I'm open to suggestions. Is the tools directory still used?

Write more tests

Tests need to be written for the django app and the crawler.
The coverage should be displayed on each project index.

json_html description update

hi there,

I have forked your site and am checking it out.
All seems to be working fine except for one issue.
I'm not sure whether the code here on GitHub has not been updated (since it seems to work on ahmia.fi) or whether there is something wrong with my Ubuntu 14.04 Python setup.

The test_hidden_services.py script updates the official description.json perfectly,

but the json_html seems to be problematic.
The log indicates that the description is updated. However, the only thing filled in the PostgreSQL description table is the "title" column, which contains http://blahblah.onion instead of the title extracted from the HTML; the other fields are blank instead of NULL after the update.

I am trying to find out what is wrong (I have limited skills but am enjoying the exercise), but I have been on a wild goose chase for a while now and can't find the source of the problem.

When I test "def analyze_front_page(raw_html):" manually, it seems to produce the correct JSON output.
Again, I'm not sure if it is a known problem or simply my setup.
