jazzband / django-robots

A Django app for managing robots.txt files following the robots exclusion protocol

Home Page: https://django-robots.readthedocs.io

License: BSD 3-Clause "New" or "Revised" License

django-robots's Introduction

Django Robots

This is a basic Django application to manage robots.txt files following the robots exclusion protocol, complementing the Django Sitemap contrib app.

For installation instructions, see the documentation install section; for instructions on how to use this application, and on what it provides, see the file "overview.txt" in the "docs/" directory or on ReadTheDocs: https://django-robots.readthedocs.io/

Supported Django versions

  • Django 4.0
  • Django 3.2
  • Django 3.1
  • Django 2.2

For older Django versions (1.6-1.10), use django-robots==3.0. For Django 2 and above, use django-robots>=4.0.0.

Supported Python versions

  • Python 3.7, 3.8, 3.9, 3.10

django-robots's People

Contributors

alekam, barttc, blag, browniebroke, freakboy3742, gbezyuk, hwkns, i-trofimtschuk, jan-szejko-steelseries, jazzband-bot, jezdez, jnns, jpadilla, jscott1971, kragniz, lpomfrey, martync, mattaustin, mkai, msamoylov, nautatva, petrdlouhy, pilt, pre-commit-ci[bot], scream4ik, sergioisidoro, smithdc1, tony, umarmughal824, yakky

django-robots's Issues

Needs version bump and new deploy to PyPI

I just pip-installed this, but it doesn't include the fields fix in robots/forms.py.

I've worked around this by changing the entry in my requirements.txt file to -e git://github.com/jezdez/django-robots.git#egg=robots to install it from GitHub.

Django 1.4 compatibility

Unfortunately I'm still stuck with Django 1.4 for one of my projects, and the latest release (1.1) breaks compatibility. The specific thing that breaks it: 41ac348

Beyond that it seems to work just fine, so it would be great if that setting were version-dependent.

How to use?

I can't find anything in the docs about how to use django-robots rules.
I need the following robots.txt:
User-agent: *
Disallow: /
Allow: /about
Allow: /catalog

What should I do to get a robots.txt with this content?
Thank you.
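A minimal sketch of how those rules could be created, assuming the Rule and Url models that django-robots exposes (rules are normally managed through the Django admin, and field names may differ slightly between versions):

    # python manage.py shell
    from django.contrib.sites.models import Site
    from robots.models import Rule, Url

    rule = Rule.objects.create(robot="*")          # applies to every user agent
    rule.sites.add(Site.objects.get_current())     # attach the rule to the current site

    # Disallow everything, then explicitly allow /about and /catalog
    rule.disallowed.add(Url.objects.create(pattern="/"))
    rule.allowed.add(Url.objects.create(pattern="/about"))
    rule.allowed.add(Url.objects.create(pattern="/catalog"))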

Garbled text file instead of robots.txt in browser

This package is configured properly on my local machine. However, when I pushed it to our staging server (Apache + uWSGI), it behaves strangely.

When I go to https://test.mysite.com/robots.txt, it downloads a file of the same name (robots.txt), but I cannot open it in gedit, and cat shows garbled text in it.

I Googled this but couldn't find anything specific.

What could be the problem here?

Feature request: wagtail support

The current django-robots implementation depends on django.contrib.sites. Wagtail uses its own Site model, so django-robots is currently not compatible with Wagtail. It would be nice to be able to use django-robots in Django applications that cannot use the 'sites' framework, such as Wagtail sites.

3.1.0 pypi release version

Hi, I'm trying to install the new version but I can't find it.

Why isn't the new version on PyPI yet?

Host

Hello.
Please, if possible, add a Host field; SEO specialists are asking for it.

     User-agent: *
     Disallow: /admin/
     + --> Host: example.com
     Sitemap: http://example.com/sitemap.xml

sitemap not being included in robots.txt

I have tried both default discovery and specifying it manually with the ROBOTS_SITEMAP_URLS setting, but I am still unable to get the sitemap to show up in robots.txt.
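For reference, a sketch of the settings involved, assuming ROBOTS_SITEMAP_URLS takes a list of absolute sitemap URLs (check the settings documentation for your version):

    # settings.py (sketch)
    INSTALLED_APPS = [
        # ...
        "django.contrib.sites",
        "django.contrib.sitemaps",
        "robots",
    ]

    ROBOTS_SITEMAP_URLS = [
        "https://www.example.com/sitemap.xml",
    ]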

Firefox Character Encoding

Firefox console shows this error:

"The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature."

Current code:

url(r'^robots.txt/', include('robots.urls')),


Alternatives?

path('robots.txt', TemplateView.as_view(template_name="robots.txt", content_type='text/plain')),
url(r'^robots.txt', lambda x: HttpResponse("User-Agent: *\nDisallow:", content_type="text/plain"), name="robots_file"),


How do you combine include('robots.urls') with the alternatives, or is that not possible?

The Content-Type should probably be "text/plain; charset=utf-8" (a value that declares the charset), not a bare 'text/plain'. See the sketch after the curl output below.

curl -s -D - 127.0.0.1:8000/robots.txt/
HTTP/1.1 200 OK
Date: Tue, 14 Apr 2020 01:05:52 GMT
Server: WSGIServer/0.2 CPython/3.7.3
Content-Type: text/plain
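A sketch of how the alternatives above could declare the charset explicitly, with the imports they need (these replace include('robots.urls') rather than combining with it; the content_type values are the point of the example):

    # urls.py (sketch)
    from django.http import HttpResponse
    from django.urls import path
    from django.views.generic import TemplateView

    urlpatterns = [
        # Static template, charset declared in the Content-Type header
        path("robots.txt",
             TemplateView.as_view(template_name="robots.txt",
                                  content_type="text/plain; charset=utf-8")),
        # Or a hard-coded response with the same explicit charset:
        # path("robots.txt",
        #      lambda request: HttpResponse("User-Agent: *\nDisallow:",
        #                                   content_type="text/plain; charset=utf-8"),
        #      name="robots_file"),
    ]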

Django 2.0 support

https://github.com/develtech/django-robots/tree/django-2.0

  • Update tox 8f5b5f6

  • Update travis 8f5b5f6

  • #82 Drop 1.10 and below? 86cf942

  • b85ae48 #79 urlresolvers update

  • Test fix: d6acbfd

      File "/Users/me/work/python/django-robots/tests/test_utils/urls.py", line 16, in <module>
        url(r'^jsi18n/(?P<packages>\S+?)/$', django.views.i18n.javascript_catalog),  # NOQA
    AttributeError: module 'django.views.i18n' has no attribute 'javascript_catalog'
    
  • Test fix: 6a5baf7

      File "/Users/me/work/python/django-robots/tests/test_utils/urls.py", line 16, in <module>
     url(r'^admin/', include(admin.site.urls)),  # NOQA
      File "/Users/me/work/python/django-robots/.venv/lib/python3.6/site-packages/django/urls/conf.py", line 27, in include
        'provide the namespace argument to include() instead.' % len(arg)
     django.core.exceptions.ImproperlyConfigured: Passing a 3-tuple to include() is not supported. Pass a 2-tuple containing the list of patterns and app_name, and provide the namespace argument to include() instead.
    
  • ab18b86 Inside tests: Using User.is_authenticated() and User.is_anonymous() as methods rather than properties is no longer supported. source:

    ERROR: test_view_site_2 (tests.test_views.ViewTest)
    Traceback (most recent call last):
      File "/Users/me/work/python/django-robots/tests/test_views.py", line 82, in test_view_site_2
        request = self.get_request(path='/', user=AnonymousUser(), lang='en')
      File "/Users/me/work/python/django-robots/tests/base.py", line 31, in get_request
        if user.is_authenticated():
    TypeError: 'bool' object is not callable
    

    It should be valid to use it as a property in Django 1.10+ source
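A sketch of what the corresponding fixes could look like, assuming the test-suite layout quoted in the tracebacks above (paths and patterns are illustrative):

    # tests/test_utils/urls.py (sketch)
    from django.contrib import admin
    from django.urls import re_path
    from django.views.i18n import JavaScriptCatalog

    urlpatterns = [
        # javascript_catalog() was removed; the class-based view replaces it
        re_path(r"^jsi18n/(?P<packages>\S+?)/$", JavaScriptCatalog.as_view()),
        # Passing admin.site.urls directly avoids the 3-tuple include() error
        re_path(r"^admin/", admin.site.urls),
    ]

    # tests/base.py (sketch): is_authenticated is a property in Django 1.10+,
    # so the check becomes `if user.is_authenticated:` instead of calling it.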

Warning from Django 1.7.7

/Library/Python/2.7/site-packages/robots/forms.py:7: RemovedInDjango18Warning: Creating a ModelForm without either the 'fields' attribute or the 'exclude' attribute is deprecated - form RuleAdminForm needs updating
class RuleAdminForm(forms.ModelForm):
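The usual fix for this warning is to declare the fields explicitly on the form's Meta class; a sketch, assuming RuleAdminForm covers the Rule model:

    # robots/forms.py (sketch)
    from django import forms
    from robots.models import Rule

    class RuleAdminForm(forms.ModelForm):
        class Meta:
            model = Rule
            fields = "__all__"  # Django < 1.6 doesn't know this sentinel; see
                                # the "Tests no longer pass" issue further down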

Django deprecation warning

When using django-robots under Django 3.1.7 I'm getting the following deprecation warning:

  File "/home/.../lib/python3.8/site-packages/robots/models.py", line 9, in <module>
    class Url(models.Model):
  File "/home/.../lib/python3.8/site-packages/robots/models.py", line 16, in Url
    _("pattern"),
  File "/home/.../lib/python3.8/site-packages/django/utils/translation/__init__.py", line 144, in ugettext_lazy
    warnings.warn(
django.utils.deprecation.RemovedInDjango40Warning: django.utils.translation.ugettext_lazy() is deprecated in favor of django.utils.translation.gettext_lazy().
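A sketch of the fix: ugettext_lazy() was removed in Django 4.0, and the replacement is a drop-in import change; field definitions such as models.CharField(_("pattern"), ...) stay the same.

    # robots/models.py (sketch)
    from django.utils.translation import gettext_lazy as _  # was ugettext_lazy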

Add tests

The project currently lacks tests; it would be beneficial to add some.

Drop 1.10 and below?

source: https://docs.djangoproject.com/en/2.0/releases/2.0/

Following the release of Django 2.0, we suggest that third-party app authors drop support for all versions of Django prior to 1.11. At that time, you should be able to run your package’s tests using python -Wd so that deprecation warnings do appear. After making the deprecation warning fixes, your app should be compatible with Django 2.0.

The backstory is that Django 1.11 is LTS (supported until April 2020).

Need to run syncdb twice?

Hi,

I recently added this to my app, and it seems like the robots_* tables aren't created the first time syncdb is run, but they are the second time.

Is this correct? I don't have a problem with this, I'm just wondering if this is normal or not.

Thanks,
Shige

Sitemap url scheme in robots.txt

Hi,

I use HTTPS on my project, but in my robots.txt file I can't get the sitemap URL to use the https scheme.

In the Site admin I have the domain name without a scheme and the display name with the https scheme. Even if both have no scheme, the sitemap URL still uses http, despite
ROBOTS_USE_SCHEME_IN_HOST = True
How can I set up the sitemap URL to use https?

(The sitemap itself looks correct with protocol = 'https'.)

Why use South

First, sorry for my English.

South is excellent for managing schema changes under version control.
Consider the following: so far the django-robots models are fine. But if tomorrow you need to delete a field from some model and add another one that is non-null and has no default, syncdb is no help: it does not handle removals and it does not add defaults for new fields.
When I then update the application, a DatabaseError is of course raised, since the fields you added were never created in my database.
With South, I just run a migrate and I'm done.

I hope you understand.

See more: http://south.aeracode.org/docs/about.html

django-robots without SITE_ID set gives an error on Django 2.0

To reproduce:

  1. Make sure the Django version is 2.x or above.
  2. Make sure SITE_ID is not set.
  3. Make sure ROBOTS_SITE_BY_REQUEST is not set in settings.

The /robots.txt endpoint gives the following error:

You're using the Django "sites framework" without having set the SITE_ID setting. Create a site in your database and set the SITE_ID setting or pass a request to Site.objects.get_current() to fix this error.

It would be nice if Django's default behaviour worked when SITE_ID is not set.
The error comes from this line:

return Site.objects.get_current()
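For reference, a sketch of the two settings-level workarounds with the current behaviour (ROBOTS_SITE_BY_REQUEST is the setting referenced above; check your version for its exact semantics):

    # settings.py (sketch)
    SITE_ID = 1                     # pin the site; create the matching Site in the admin

    # or, instead of SITE_ID:
    ROBOTS_SITE_BY_REQUEST = True   # resolve the site from the incoming request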

Django 2.0: No module named 'django.core.urlresolvers'

File ".../lib/python3.6/site-packages/robots/views.py", line 3, in <module>
    from django.core.urlresolvers import NoReverseMatch, reverse
ModuleNotFoundError: No module named 'django.core.urlresolvers'

This error happens in Django 2.0.
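A sketch of the import fix: the same names moved to django.urls, which also silences the deprecation warning reported in the next issue.

    # robots/views.py (sketch)
    from django.urls import NoReverseMatch, reverse  # was django.core.urlresolvers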

Django 2.0 warning importing from urlresolvers

With robots 3.0 and Django 1.11 with deprecation warnings enabled, I see

robots/views.py:3: RemovedInDjango20Warning: Importing from django.core.urlresolvers is deprecated in favor of django.urls.

Feature request: ordering of allow and disallow rules

Being able to set the order of Allow and Disallow rules would be helpful.
As far as I know Google does not care about the order, but other crawlers do, as specified in the draft robots.txt RFC:
http://www.robotstxt.org/norobots-rfc.txt -> 3.2.2 The Allow and Disallow lines

To evaluate if access to a URL is allowed, a robot must attempt to
match the paths in Allow and Disallow lines against the URL, in the
order they occur in the record. The first match found is used.

Is it possible to have a permissive default?

We've installed the app on our production site, where we did not want to disallow any crawling.

Unfortunately, with the disallow-all default we had an issue with Google, which basically stopped all search traffic as soon as we upgraded the application. We lost 60% of our traffic instantly.

Swappable site model

Hi,

As stated in issue #63, adding support for Wagtail isn't in scope for this package, and I agree with that. I was wondering, though, whether it's feasible to make the site model configurable via a setting (just like Django's swappable User model). That way we could easily integrate with any package that uses a different model for its sites; a rough sketch of the idea is below.

I hope you'll agree. If so, we can provide a PR for this.
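A purely hypothetical sketch of how such a setting could be resolved, mirroring Django's AUTH_USER_MODEL pattern; the ROBOTS_SITE_MODEL name is made up and nothing like this exists in django-robots today:

    # hypothetical helper inside django-robots
    from django.apps import apps
    from django.conf import settings

    def get_site_model():
        # fall back to django.contrib.sites when no override is configured
        model_path = getattr(settings, "ROBOTS_SITE_MODEL", "sites.Site")
        return apps.get_model(model_path)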

Django 3.0 Support

Thanks for this package and your work maintaining it.

  • OS version and name: Ubuntu 18.04
  • python version: 3.8
  • django version: 3.0
  • django-robots version: 3.1.0

I am attempting to upgrade my Django project to Python 3.8 and Django 3.0, and when I run my test suite, I receive this error from robots/models.py on line 3.

ImportError: cannot import name 'python_2_unicode_compatible' from 'django.utils.encoding'

I believe that Django 3.0 removes this due to the drop in Python 2 support.

https://docs.djangoproject.com/en/3.0/releases/3.0/#removed-private-python-2-compatibility-apis

To achieve Django 3.0 support, I believe this will have to be removed from django-robots.
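A sketch of the fix: on Python 3 the decorator is a no-op, so it can simply be dropped and __str__ kept (field arguments below are illustrative):

    # robots/models.py (sketch)
    from django.db import models
    from django.utils.translation import gettext_lazy as _

    class Url(models.Model):
        pattern = models.CharField(_("pattern"), max_length=255)

        def __str__(self):  # no @python_2_unicode_compatible needed
            return self.pattern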

"ImportError: No module named robots"

Hello. In the testing phase it works OK, but in deployment I get this error: "ImportError: No module named robots".

I'm using Django 1.6
Otherwise it works great.

Not Compatible With Content Security Policy

In Chrome, inspecting the robots.txt response shows:

<pre style="word-wrap: break-word; white-space: pre-wrap;">User-agent: *
Disallow:

Host: example.com

</pre>

Since this is inline CSS, it violates content security policy.

But I noticed this is the same for robots.txt files from other websites when viewed with Chrome's inspector.

Maybe something to alleviate this would be to hide possible CSP errors on the console.

Otherwise don't worry about doing anything with this issue. Just wanted people to be aware.

Feature request: @block_robots decorator for views

It would be nice if django-robots included a decorator to block robots from views based on User-Agent (like robots.txt does). It would help Django apps outright prevent robots, even misbehaving ones that don't follow robots.txt, from accessing views that they shouldn't. A rough sketch of the idea is below.
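A purely hypothetical sketch of what such a decorator could look like; it is not part of django-robots, and the default patterns are illustrative:

    import re
    from functools import wraps

    from django.http import HttpResponseForbidden

    def block_robots(patterns=(r"bot", r"crawler", r"spider")):
        # Reject requests whose User-Agent matches a blocked pattern.
        def decorator(view):
            @wraps(view)
            def wrapped(request, *args, **kwargs):
                ua = request.META.get("HTTP_USER_AGENT", "")
                if any(re.search(p, ua, re.IGNORECASE) for p in patterns):
                    return HttpResponseForbidden("Robots are not allowed here.")
                return view(request, *args, **kwargs)
            return wrapped
        return decorator

    # usage:
    # @block_robots()
    # def my_view(request): ...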

Possible to generate incompatible format with Python RobotFileParser

I've been parsing my own robots.txt file with Python and found an interesting compatibility scenario:

If you create multiple Robot records with the same user-agent, they are spaced apart by a blank line, causing Python's RobotFileParser to miss subsequent lines if you read it in. I'm looking at Robots v3 and Python 3.5. Is this something you'd want to change or document?

https://github.com/python/cpython/blob/3.5/Lib/urllib/robotparser.py

Example robots.txt generated:

User-agent: *
Disallow: /one

Disallow: /two

Host: example.com

The work-around is simple -- you create a single Robot record with both rules so that robots.txt has no blank line:

User-agent: *
Disallow: /one
Disallow: /two

Host: example.com

To reproduce:

from urllib.robotparser import RobotFileParser

robots = RobotFileParser('http://example.com/robots.txt')
robots.read()
# With the blank-line-separated file above, the parser never sees the second
# Disallow entry, so this reports /two as fetchable.
robots.can_fetch(useragent='', url='/two')

No module named 'six'

File "C:\Users\user\Project\Developments\Web\myproject\venv\lib\site-packages\robots\models.py", line 5, in
from six import python_2_unicode_compatible, u <------

[Warning][Django1.10]RemovedInDjango110Warning

Hi, I tested this package today and I get the following warnings:

RemovedInDjango110Warning: Support for string view arguments to url() is deprecated and will be removed in Django 1.10 (got rules_list). Pass the callable instead.
  url(r'^$', 'rules_list', name='robots_rule_list'),

/run/media/salahaddin/Data/Proyectos/Trabajo/ccct/lib/python3.5/site-packages/robots/urls.py:8: RemovedInDjango110Warning: django.conf.urls.patterns() is deprecated and will be removed in Django 1.10. Update your urlpatterns to be a list of django.conf.urls.url() instances instead.
  url(r'^$', 'rules_list', name='robots_rule_list'),

Thanks for your work! :D

Tests no longer pass in older Django versions

When you have Django 1.5.* and run the tests, django-robots 1.1 fails with this message:

File "/lib/python2.7/site-packages/robots/forms.py", line 7, in <module>
  class RuleAdminForm(forms.ModelForm):
File "/lib/python2.7/site-packages/django/forms/models.py", line 221, in __new__
  raise FieldError(message)
FieldError: Unknown field(s) (a, l, _) specified for Rule

This is related to #30, where fields = '__all__' was added to support Django 1.8; older versions treat the string as an iterable of field names, hence the 'a', 'l', '_' in the error.

SITEMAP_URLS

Hi, I have a multi-host project and I need one sitemap per host.
When I open "example.com" I get "example.com/sitemap", which is good, but when I open "example2.com" I also get "example.com/sitemap".

Can you help me with this?

[BUG] HOST url scheme without https

Hello!
I am using Django + nginx + HTTPS, and I have set ROBOTS_USE_SCHEME_IN_HOST = True.

I get this result in robots.txt: Host: http://site.com
But I expected: Host: https://site.com

Maybe it happens because nginx proxies traffic to gunicorn over plain HTTP:

location / {
        proxy_pass http://web:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

How can I fix it?
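The usual fix when nginx terminates TLS is to forward the original scheme and tell Django to trust that header; a sketch, assuming django-robots derives the scheme from the request as the report above suggests:

    # nginx: add inside the location block
    #     proxy_set_header X-Forwarded-Proto $scheme;

    # settings.py
    SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")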

Field deprecated warning

/home/tulipan/Proyectos/IspanyolHaber/lib/python3.4/site-packages/robots/forms.py:7: RemovedInDjango18Warning: Creating a ModelForm without either the 'fields' attribute or the 'exclude' attribute is deprecated - form RuleAdminForm needs updating
  class RuleAdminForm(forms.ModelForm):

I get this warning when I run the Django server.
Please take a look at it.

Python 3 Support

Hi,

Is Python 3 supported? If so, it would be nice to add the corresponding classifiers to setup.py so that others would know and tooling could detect it. :)

Add Local doc testing (My bad on pushing to master)

My bad on 7303dff 5e81712 6122cdd. I am new to jazzband projects and realized I can push to master, but can't amend+force. I rarely make this mistake, but it happened this morning. Mea culpa on that.

Next time I'll keep it in a PR.

Let's get local Sphinx builds working so we can preview doc changes without having to wait for RTD to update.
