ahmia-site's People

Contributors

chamalis, dependabot[bot], iriahi, juhanurmi, kriyszig, mdhash, mik317, obliviousparadigm, razorfinger, skrish13, wtf

ahmia-site's Issues

Dependency conflicts

ERROR: Cannot install -r requirements/common.txt (line 6) and cffi==1.7.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested cffi==1.7.0
    cryptography 39.0.1 depends on cffi>=1.12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts
ERROR: Cannot install -r requirements/common.txt (line 28) and idna==2.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested idna==2.1
    requests 2.28.2 depends on idna<4 and >=2.5
    The user requested idna==2.1
    requests 2.28.1 depends on idna<4 and >=2.5
    The user requested idna==2.1
    requests 2.28.0 depends on idna<4 and >=2.5
    The user requested idna==2.1
    requests 2.27.1 depends on idna<4 and >=2.5; python_version >= "3"
    The user requested idna==2.1
    requests 2.27.0 depends on idna<4 and >=2.5; python_version >= "3"
    The user requested idna==2.1
    requests 2.26.0 depends on idna<4 and >=2.5; python_version >= "3"
    The user requested idna==2.1
    requests 2.25.1 depends on idna<3 and >=2.5
    The user requested idna==2.1
    requests 2.25.0 depends on idna<3 and >=2.5
    The user requested idna==2.1
    requests 2.24.0 depends on idna<3 and >=2.5
    The user requested idna==2.1
    requests 2.23.0 depends on idna<3 and >=2.5
    The user requested idna==2.1
    requests 2.22.0 depends on idna<2.9 and >=2.5
    The user requested idna==2.1
    requests 2.21.0 depends on idna<2.9 and >=2.5
    The user requested idna==2.1
    requests 2.20.1 depends on idna<2.8 and >=2.5
    The user requested idna==2.1
    requests 2.20.0 depends on idna<2.8 and >=2.5

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

updating python packages

Installing the requirements with pip into the virtualenv currently fails (as can be seen in the Travis output #16) with "Error: could not determine PostgreSQL version from '10.1'", similar to psycopg #594.

The cryptography package also seems to need a newer version.

Updating to psycopg2==2.7.4 and cryptography==2.1.4 fixes this issue.

Should I make a new branch in order to test these versions after ahmia setup, or submit directly to master?

Blacklist

Where is the blacklist of hashes located?

I'm trying to build a tool in which a person enters an onion URL; the tool automatically hashes it and then searches the hashes that you have generated on your website.
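A minimal sketch of such a lookup tool follows. It assumes the blacklist is a plain collection of MD5 hex digests of the bare onion domain (for example "example.onion"); that format is an assumption and should be checked against the published blacklist. The helper names `onion_domain`, `domain_hash` and `is_blacklisted` are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def onion_domain(url):
    """Extract the lowercase host (e.g. 'abc.onion') from a URL or bare domain."""
    # urlsplit only fills in the hostname when a scheme separator is present.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host.lower()


def domain_hash(url):
    """MD5 hex digest of the onion domain (ASSUMED blacklist format)."""
    return hashlib.md5(onion_domain(url).encode("utf-8")).hexdigest()


def is_blacklisted(url, blacklist):
    """Return True if the hashed domain appears in the set of blacklist hashes."""
    return domain_hash(url) in blacklist
```

If nothing ever matches, it may be worth checking whether the path, scheme, or a trailing newline is being hashed along with the domain, since any extra byte changes the digest.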

Server is down!

Not sure if this is the right place to report it, but ahmia.fi replies with a 502 error.

search speed

The search is too slow these days on both Tor and clearnet. In most cases it does not return anything for a long while.

Doesn't seem to populate the database all the way

For an unknown reason, the PostgreSQL database does not seem to be populated with all of the tables when running python3 ahmia/manage.py migrate:

django.db.utils.ProgrammingError: relation "ahmia_searchquery" does not exist
LINE 1: ...urrences", "ahmia_searchquery"."search_term" FROM "ahmia_sea...

Consider Updating?

From what I can see, the dependencies you are using are extremely outdated:
Python 3.6 as opposed to 3.12
Django 1.11 as opposed to 5.0
Elasticsearch 6 as opposed to 8
If there is a particular reason behind this choice (for example, later versions not working nicely with Tor), I would love to be informed, because I am actively considering contributing to this project.

Try out weighted fields search and compare with current results

The current search uses a copy_to field that combines title, meta and anchors with equal weight (as far as I can tell). [1]

A multi_match search is then performed on that combined field, discarding stemming, etc. [2]

Try ordering the results with weighted coefficients instead, e.g.:

...
"multi_match": {
    "query": query,
    "type": "most_fields",
    "fields": [
        # "fancy",
        # "fancy.stemmed",
        # "fancy.shingles",
        "title^4", "meta^2", "content^2", "anchors^2",
        # TODO: find a way to use the stemmed and shingles filters here
    ],
    "minimum_should_match": "75%",
    "cutoff_frequency": 0.01,
}
...

and compare the results.

[1] https://github.com/ahmia/ahmia-index/blob/master/mappings_tor.json#L139
[2] https://github.com/ahmia/ahmia-site/blob/master/ahmia/search/views.py#L82
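As a rough sketch, the weighted query proposed above could be built by a small helper, with a second helper giving a simple number for comparing the old and new orderings. The function names `build_weighted_query` and `top_k_overlap` are hypothetical, the boost values are the ones from the proposal, and the overlap metric is only one possible way to compare results.

```python
def build_weighted_query(query, boosts=None):
    """Build a request body for a weighted multi_match search."""
    if boosts is None:
        # Boost values copied from the proposal; they are a starting point to tune.
        boosts = {"title": 4, "meta": 2, "content": 2, "anchors": 2}
    # Elasticsearch expresses per-field boosts as "field^boost".
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "query": {
            "multi_match": {
                "query": query,
                "type": "most_fields",
                "fields": fields,
                "minimum_should_match": "75%",
                # Taken from the proposal; deprecated in later Elasticsearch releases.
                "cutoff_frequency": 0.01,
            }
        }
    }


def top_k_overlap(old_ids, new_ids, k=10):
    """Fraction of the top-k results shared by two rankings (order ignored)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(old_ids[:k]) & set(new_ids[:k])) / k
```

A low overlap only shows that the ranking changed, not that it improved, so a manual look at a handful of representative queries would still be needed.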

Blacklist testing

Hi!
I'm working on a deepnet search engine that indexes only legal content, and I would like to use your blacklist to filter out unwanted websites.

The problem is that, while crawling the deepnet, no onion URL has ever matched the blacklist. Either my crawler never encounters any banned content (sadly doubtful), or I really suck at coding (quite probable).

Does anyone know a real banned onion URL that I could test against the blacklist? (An old, inaccessible v2 onion URL would be perfect for that.)

Or would it be possible to add a dummy onion URL to the blacklist for testing purposes, like "trytoencodethistoseeifitsmatchahmiasblacklist.onion"?

Thank you for the answer and for the project :)

Improve exceptions logging

The way logger.exception(e) writes to the logs is somewhat bloated and suboptimal:

  • Avoid writing the whole stack trace
  • Include the timestamp
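A minimal sketch of the two points above, using only the standard logging module. The names `configure_logger` and `log_exception_brief`, the logger name "ahmia" and the exact format string are assumptions for illustration, not the project's existing setup.

```python
import logging
import sys


def configure_logger(name="ahmia", stream=None):
    """Return a logger whose records start with a timestamp."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream or sys.stderr)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def log_exception_brief(logger, exc):
    """Log only the exception type and message, without the stack trace."""
    # Unlike logger.exception(), logger.error() does not attach the
    # traceback unless exc_info is passed explicitly.
    logger.error("%s: %s", type(exc).__name__, exc)
```

Dropping the traceback makes the logs shorter but also harder to debug, so one option is to keep the full trace at DEBUG level only.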

Improve Add onion form code

The onion add form returns the same generic message for both the "already exists" case and the "invalid" case.

It would be better to distinguish between them.
Also replace TemplateView with FormView, which is a better fit for this use case.
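The distinction between the two cases can be sketched independently of Django. The example below assumes that only v3 onion addresses (56 base32 characters followed by ".onion") are accepted, and it uses an in-memory set in place of the database; `AddResult`, `add_onion` and `MESSAGES` are hypothetical names.

```python
import re
from enum import Enum

# Assumption: only v3 onion domains (56 base32 chars + ".onion") are valid.
ONION_V3_RE = re.compile(r"[a-z2-7]{56}\.onion")


class AddResult(Enum):
    ADDED = "added"
    ALREADY_EXISTS = "already_exists"
    INVALID = "invalid"


# A separate user-facing message per outcome instead of one generic message.
MESSAGES = {
    AddResult.ADDED: "Your onion address was added.",
    AddResult.ALREADY_EXISTS: "This onion address is already in the index.",
    AddResult.INVALID: "This is not a valid onion address.",
}


def add_onion(domain, known):
    """Validate the domain, then add it to the set of known domains."""
    domain = domain.strip().lower()
    if not ONION_V3_RE.fullmatch(domain):
        return AddResult.INVALID
    if domain in known:
        return AddResult.ALREADY_EXISTS
    known.add(domain)
    return AddResult.ADDED
```

In a FormView, the INVALID case would naturally become a form validation error, while ALREADY_EXISTS could be reported from the form's success path with its own message.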

Adjust results ordering based on popularity (backlinks)

Include a popularity metric that has some influence on the ordering of results.

It will be based on backlinks from other onion addresses. If spam link farms turn out to skew the metric, we could add a detection mechanism to reduce the influence of those spam websites.

We have to find an appropriate formula to combine that rating with the Elasticsearch relevance score that is already applied.

Compare with the current results to find out if we managed to improve results ordering.
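One candidate formula, given purely as an illustration: multiply the Elasticsearch score by a logarithmically damped backlink count, so that a site with many backlinks gets a bounded boost instead of dominating the ranking. The function names, the `weight` parameter and the shape of the hit dictionaries are assumptions.

```python
import math


def combined_score(es_score, backlinks, weight=0.5):
    """Blend the relevance score with a log-damped backlink count."""
    # log1p(0) == 0, so pages without backlinks keep their original score.
    return es_score * (1.0 + weight * math.log1p(backlinks))


def rerank(hits, weight=0.5):
    """Sort hits (dicts with 'score' and 'backlinks') by the combined score."""
    return sorted(
        hits,
        key=lambda hit: combined_score(hit["score"], hit["backlinks"], weight),
        reverse=True,
    )
```

Setting `weight` to 0 reproduces the current ordering, which makes side-by-side comparison straightforward. A comparable formula could probably also be applied server-side with an Elasticsearch function_score query.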
