
django-seo-js's Introduction

This is the codebase for stevenskoczen.com, including the Steven Manual. This is the project where I've taken the code skills I use to great effect professionally and artistically, and turned them on myself and my life. Nothing here is generalized, or for some broader purpose. This is my project to be selfish, to help me wring every bit out of the seconds of my life.

You might find it useful, but I'd bet probably not. See, unlike most of my other projects, the Steven Manual has only one code rule: this code is never going to be reused, repurposed, or serve some greater general good. Hacks are totally fine here. Duct-tape is great. Hideous memory usage, design, and abuse of programming paradigms are all done and applauded. There is only one user, and they're quite understanding - because they're me.

django-seo-js's People

Contributors

bhoop77, chazcb, haos616, mattrobenolt, pcraciunoiu, pidelport, rchrd2, sarahboyce, skoczen, yjaaidi, zekzekus


django-seo-js's Issues

Remove Googlebot from user agent check to prevent cloaking penalty

https://github.com/skoczen/django-seo-js/blob/master/django_seo_js/settings.py#L49

Googlebot, Yahoo, and bingbot should be removed from that list as soon as possible to prevent anyone from being penalized for cloaking. Those three crawlers support the escaped-fragment protocol, so you do not have to match them by user agent. If you do match them by user agent, they could conclude you are cloaking and penalize your website in the search results.
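Until the defaults change, a minimal settings.py sketch of the workaround the reporter describes (the remaining entries are illustrative; since SEO_JS_USER_AGENTS replaces the built-in defaults rather than extending them, you have to list everything you still want):

# settings.py -- illustrative override of the default user-agent list.
SEO_JS_USER_AGENTS = [
    # "Googlebot", "Yahoo" and "bingbot" intentionally omitted: they support
    # the ?_escaped_fragment_= protocol, so matching them by user agent
    # risks being treated as cloaking.
    "Ask Jeeves",
    "baiduspider",
    "facebookexternalhit",
    "twitterbot",
    "linkedinbot",
]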

Add Django 4.x support

Breaks with Django 4.x

TypeError: MiddlewareMixin.__init__() missing 1 required positional argument: 'get_response'
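For context, a minimal sketch of the signature Django 4.x expects from anything built on MiddlewareMixin (the class name is illustrative, not the library's actual fix): Django 4.0 removed the default value for get_response, so it must be accepted and forwarded.

from django.utils.deprecation import MiddlewareMixin

class ExampleSEOMiddleware(MiddlewareMixin):
    def __init__(self, get_response):
        # Django 4.x requires get_response; forward it to MiddlewareMixin.
        super().__init__(get_response)

    def process_request(self, request):
        # Backend dispatch would happen here; returning None lets the
        # request continue through the normal middleware chain.
        return None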

Django with django-seo-js returning "invalid code length" error to browser and curl

Hey,

All of a sudden, Google, Facebook, and other crawlers started reporting my website as unavailable. We're using django-seo-js==0.2.4 and the paid version of Prerender.io. Testing with _escaped_fragment_ in the browser and with cURL showed that responses are, for some reason, coming back invalid:

curl: (61) Error while processing content unencoding: invalid code lengths set

I made no changes to anything affecting Prerender.io or the configuration of this library; this just started happening out of the blue.

Digging deeper into the code, Prerender.io appears to cache the content correctly, and calling self.backend.get_response_for_url(url) also returns a response with the correctly rendered HTML content, including getting the response from Prerender and transforming the requests response into a Django HttpResponse object.

When that gets returned, though, for some reason both the browser and curl think it's invalid.

I've done plenty of debugging, but I'm a bit at a loss here; all I can think of is that base.py:56 is too naive with r['content-length'] = len(response.content), or it's some type of gzip issue where headers or encodings are getting passed on that shouldn't be.
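One possible mitigation, sketched below under the assumption that the problem is gzip-related headers being copied onto a body that requests has already decompressed (the helper name is hypothetical, not the library's actual code):

from django.http import HttpResponse

def build_django_response(prerender_response):
    """Hypothetical helper: wrap a requests.Response from Prerender."""
    final = HttpResponse(prerender_response.content)
    for key, value in prerender_response.headers.items():
        # The body is already decoded by requests, so forwarding
        # Content-Encoding: gzip (or a stale Content-Length) makes browsers
        # and curl fail to decode the response.
        if key.lower() in ("content-encoding", "transfer-encoding", "content-length"):
            continue
        final[key] = value
    final["Content-Length"] = str(len(final.content))
    return final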

Ultimately, though, my site is currently not crawlable, and that's obviously a major issue for us.

Allow non-200 status codes.

v0.1.3 refuses to pass on any non-200 status code responses. This isn't right, since things like 304s, 404s, etc. should also be passed through.

The big unknown as of yet is what to do about 500s.
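A rough sketch of the pass-through behavior being proposed, assuming the backend wraps Prerender's response in a Django HttpResponse (the status list and helper name are illustrative; handling of 5xx is left open, as above):

from django.http import HttpResponse

# Statuses that seem safe to forward as-is; what to do with 5xx is still open.
PASS_THROUGH_STATUSES = {200, 301, 302, 304, 404, 410}

def response_for_prerender(r):
    if r.status_code not in PASS_THROUGH_STATUSES:
        # Fall back to letting Django render the page normally.
        return None
    return HttpResponse(r.content, status=r.status_code)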

django-seo-js does not appear to work with Facebook

I have been trying to use django-seo-js and prerender.io to display a preview of my React SPA on Facebook. However, it appears that the Facebook crawler is not rerouted to prerender.io.

I have the following code in my settings.py file:

SEO_JS_USER_AGENTS = [
    "Googlebot",
    "Yahoo",
    "bingbot",
    "Badiu",
    "Ask Jeeves",
    "facebookexternalhit/1.1",
    "facebookexternalhit",
    "twitterbot"
]
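A quick diagnostic sketch, not the library's exact matching logic (which may differ by version): check in a Django shell whether the Facebook crawler's user-agent string would match your configured list under a simple case-insensitive substring test.

# The UA string Facebook's crawler commonly sends (illustrative).
fb_ua = "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"

agents = ["Googlebot", "Yahoo", "bingbot", "Badiu", "Ask Jeeves",
          "facebookexternalhit/1.1", "facebookexternalhit", "twitterbot"]

# Prints the entries that would match; if this is empty the list is the
# problem, otherwise look at which middleware is enabled and its ordering.
print([a for a in agents if a.lower() in fb_ua.lower()])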

Googlebot

Googlebot agent does not seem to be rerouted to the prerender solution (hosted or default)

According to Prerender.io

In May of 2018, Google introduced "Dynamic Rendering" for serving normal JavaScript to your users and serving Prerendered pages to search engines. This is exactly what we do at Prerender.io.

The change introduced by Google for Dynamic Rendering means they will stop crawling the ?_escaped_fragment_= URLs and you can now serve a prerendered page to Googlebot by checking their user agent directly. That might sound like cloaking, but Google introduced a policy change where they are allowing you to send prerendered pages to Googlebot by checking their user agent.

Due to this Dynamic Rendering announcement, we have changed our middleware to add Googlebot to the list of user agents being checked directly.

That being said, when looking at the useragent.py file:

    # These first three should be disabled, since they support escaped
    # fragments, and leaving them enabled will penalize a website as "cloaked".
    # "Googlebot",
    # "Yahoo",
    # "bingbot",
Googlebot is still commented out in django-seo-js. To fix this, I had to add it to the list manually in settings.py:

    SEO_JS_USER_AGENTS = (
        "Googlebot",
        # "Yahoo",
        # "bingbot",
        "Ask Jeeves",
        "baiduspider",
        "facebookexternalhit",
        "twitterbot",
        "rogerbot",
        "linkedinbot",
        "embedly",
        "quora link preview",
        "showyoubot",
        "outbrain",
        "pinterest",
        "developers.google.com/+/web/snippet",
    )

Since the value of this variable is not appended to an existing list, we have to add them all.

Add html5 push state middleware

It looks like the hashbang middleware checks for escaped_fragment but the html5 push state section of the escaped fragment protocol says that URLs like:

http://www.example.com/user/1 with the <meta name="fragment" content="!"> meta tag will be accessed like:

http://www.example.com/user/1?_escaped_fragment_=

The current hashbang middleware might catch that case just fine, but I had someone ask a question where it wasn't working, because they had taken the hashbang middleware out since they used HTML5 push state.

If the hashbang middleware would work for this case, then maybe that middleware should just be changed to escapedfragment or something instead of hashbang.

Thanks!

UserAgentMiddleware causes issues with sitemap.xml

If the UserAgentMiddleware is enabled, then requests by bots for sitemap.xml are passed to the prerender backend, which doesn't seem to fetch sitemaps correctly and just returns empty HTML instead. This means that Google etc. cannot view the sitemap. Perhaps this behaviour should be mentioned in the readme? Or a setting made available that allows the UserAgentMiddleware to be bypassed for particular URLs?
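A possible workaround sketch, assuming your version honors the SEO_JS_IGNORE_URLS setting referenced elsewhere in this tracker (its matching semantics are discussed in a separate issue):

# settings.py -- serve the sitemap directly, never via the prerender backend.
SEO_JS_IGNORE_URLS = [
    "/sitemap.xml",
]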

Should default user agents be updated to include certain social media sites?

Would it be good to add the following to the default user agents?

'facebookexternalhit',
'twitterbot',
'linkedinbot',
'embedly',
'showyoubot',
'pinterest',
'developers.google.com/+/web/snippet'

In cases where you want to share on some of these networks, you need to manually add these bots so that they are treated correctly. I thought it might be a good idea to add a couple of them, in particular Pinterest, Facebook, Twitter, and LinkedIn.

Others that prerender.io uses can be seen here, but I'm sure not all of them are required for a 'default' install: https://github.com/prerender/prerender-node/blob/master/index.js#L31

SEO_JS_IGNORE_URLS is misleading

Hi,

I ran into an issue trying to use the setting SEO_JS_IGNORE_URLS.

For each of them, it checks whether url in request.path; shouldn't it be url == request.path?

If for example you want to exclude exactly the page 'foo', you would define SEO_JS_IGNORE_URLS=['/foo'], but it would match any page containing 'foo' in the url.

Also, a regex-based option would be useful.
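A sketch of the behavior being requested (SEO_JS_IGNORE_URL_PATTERNS and should_ignore are hypothetical names, not something the library currently provides):

import re

SEO_JS_IGNORE_URLS = ["/foo"]                              # exact paths only
SEO_JS_IGNORE_URL_PATTERNS = [re.compile(r"^/foo(/|$)")]   # optional regexes

def should_ignore(path):
    # Exact comparison instead of the current "url in request.path"
    # substring check, plus regex support for prefix-style rules.
    if path in SEO_JS_IGNORE_URLS:
        return True
    return any(pattern.search(path) for pattern in SEO_JS_IGNORE_URL_PATTERNS)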

my test doesn't work

I have a prerender server running locally.

I am trying to debug it before deploying.

Here is my Django urls.py:

urlpatterns = patterns('',
    url(r'^$', IndexView.as_view()),
    url(r'partials/(?P<template_name>.+\.html?$)', PartialView.as_view()),
    url(r'^admin/', include(admin.site.urls)),
    url(r'^accounts/', include('allauth.urls')),
    url(r'^tinymce/', include('tinymce.urls')),
    url(r'^api/', include('api.urls')),
    url(r'^dashboard/', include('dashboard_urls')),
)

and my settings.py

MIDDLEWARE_CLASSES = (
    'django_seo_js.middleware.HashBangMiddleware',  # If you're using #!
    #'django_seo_js.middleware.UserAgentMiddleware',  # If you want to detect by user agent
) + MIDDLEWARE_CLASSES

INSTALLED_APPS += ('django_seo_js',)

SEO_JS_ENABLED = True

SEO_JS_BACKEND = "django_seo_js.backends.PrerenderHosted"
SEO_JS_PRERENDER_URL = "http://127.0.0.1:3000"
SEO_JS_PRERENDER_RECACHE_URL = "http://127.0.0.1:3000/recache"

And this is what I am putting in my browser URL:

http://localhost:8000/?_escaped_fragment_=/e/aoidufy/

but what I get back is

http://localhost:8000/?_escaped_fragment_=/e/aoidufy/#!/

which shouldn't be the case.

Can't update S3 Cache with update_cache_for_url

I can't get the update_cache_for_url helper to work.
I'm using the self-hosted version of Prerender.

According to prerender.py, the update_url method calls the RECACHE_URL as defined in the settings and passes the url to update as a POST parameter.

However according to the prerender doc, you just have to do a POST request to the prerender server without any POST parameter:

POST http://my-prerender-server.com/<url-to-update>

Link : https://github.com/prerender/prerender#s3htmlcache

This worked when I tried it with curl.
Is this something specific to the self-hosted version?
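For comparison, a sketch of the recache call the linked Prerender docs describe, which is what worked with curl (the server and page URL below are placeholders):

import requests

prerender_server = "http://my-prerender-server.com"   # placeholder
url_to_update = "http://example.com/some/page"        # placeholder

# Per the prerender S3 HTML cache docs: POST to <server>/<url-to-update>,
# with no POST body parameters.
requests.post("%s/%s" % (prerender_server, url_to_update))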

Also, I don't understand the purpose of the RECACHE_URL (I assume it's for other providers?).

301 redirects shouldn't be followed

I ran into an issue while trying to handle 301 redirects:

When the prerender server (either self-hosted or on prerender.io) sends a 301 redirection, django-seo-js responds with a 200, it seems to follow redirections.

IMHO this is not the expected behaviour; Google needs to know that the URL has changed. I think the simple fix is to add allow_redirects=False in the call to requests.get:

r = self.requests.get(render_url, headers=headers, allow_redirects=False)

Is there any scenario where we want redirects to be followed? If so, we can add a global setting, otherwise this should be disabled by default.
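A sketch of how such an optional setting could look (SEO_JS_FOLLOW_REDIRECTS and the helper are hypothetical; the default reflects the disabled-by-default behaviour suggested above):

from django.conf import settings
import requests

def fetch_prerendered(render_url, headers):
    # Hypothetical setting; defaults to not following redirects so that
    # 301/302 responses from the prerender server are passed through.
    follow = getattr(settings, "SEO_JS_FOLLOW_REDIRECTS", False)
    return requests.get(render_url, headers=headers, allow_redirects=follow)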

Moving django-seo-js forward! :)

Hey everyone,

So it's super-clear that I haven't had time to keep this project rolling forward and that's hurting a lot of people who use this project. It's also clear that we have smart, talented people who are interested and have time to take it forward.

So I'm doing the obvious thing: giving all of you the ability to move it forward, and getting out of your way.

@pjdelport and @pcraciunoiu - I've just added you as contributors to this repo.

I'm happy to have you all decide how and who should carry this forward, and I'm very open to moving it to Jazzband as mentioned in #26.

Please also pass along a pypi username, so I can add one/both of you for releases. Thank you - and everyone for your patience in some too-long delays.

In terms of the current code, I've merged in a few of the biggest PRs, but in digging into testing, I've realized that I have no business testing this, since I don't currently run django-seo-js in any production environments.

Please let me know what else would be helpful in getting people unstuck, and I'm super happy to do it. Thank you all for your patience and understanding. I appreciate you.

prerender not working

I installed django-seo-js in my Django project, and I can see some request records in "crawl-stats" whose user agent is "python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-32-generic", so it seems to be working. But when I test the page capture in the Baidu site platform, it is not working: the captured result still contains template tags like "{{ xx.xx }}".


Just installed, but get error "AttributeError: 'WSGIRequest' object has no attribute 'ENABLED'":

Hello. This library looks promising and like a good time saver. I just installed it on my localhost, and I am getting this error.

"AttributeError: 'WSGIRequest' object has no attribute 'ENABLED'" referring to "File "/venv/lib/python2.7/site-packages/django_seo_js/middleware/useragent.py", line 18, in process_request"

Not sure what the context is, but maybe it needs hasattr(request, 'ENABLED') prepended to the if statement? I.e.:

        if not hasattr(request, 'ENABLED') or not request.ENABLED:
            return

Python 3 Support?

I'm getting the following error on Python 3.4:

from base import SEOBackendBase, RequestsBasedBackend
ImportError: No module named 'base'

Is this due to lack of Python 3 support? A bit confused at this error.
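This is the classic Python 3 failure mode for implicit relative imports, which Python 3 removed. A sketch of the fix inside the backends package, assuming base.py sits next to the module doing the import:

# Python 3 compatible: explicit relative import...
from .base import SEOBackendBase, RequestsBasedBackend

# ...or, equivalently, an absolute import:
# from django_seo_js.backends.base import SEOBackendBase, RequestsBasedBackend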

Why is the requests dependency pinned?

Currently, this library pins requests to requests==2.2.1.

Is there a reason for this? If not, this should be loosened to just requests: it's not a good idea for a library to pin versions that might conflict with downstream projects' requirements.
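A sketch of what the loosened dependency could look like in setup.py (the rest of the setup() call is elided and illustrative):

from setuptools import setup, find_packages

setup(
    name="django-seo-js",
    packages=find_packages(),
    install_requires=[
        "requests",  # unpinned, instead of requests==2.2.1
    ],
)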

Request to prerender fails

I get the following error when trying to test the PrerenderIO backend:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django_seo_js/middleware/escaped_fragment.py", line 22, in process_request
    return self.backend.get_response_for_url(url, request)
  File "/usr/local/lib/python3.7/site-packages/django_seo_js/backends/prerender.py", line 36, in get_response_for_url
    assert r.status_code < 500
AssertionError

and I also get the following error when trying to test the self-hosted backend:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/django_seo_js/middleware/escaped_fragment.py", line 22, in process_request
    return self.backend.get_response_for_url(url, request)
  File "/usr/local/lib/python3.7/site-packages/django_seo_js/backends/prerender.py", line 35, in get_response_for_url
    r = self.session.get(render_url, headers=headers, allow_redirects=False)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=3000): Max retries exceeded with url: /http://127.0.0.1:8000/?_escaped_fragment_=127.0.0.1:8000 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7da6835190>: Failed to establish a new connection: [Errno 111] Connection refused'))

My settings for the first case:

# apps
INSTALLED_APPS += ['django_seo_js']

# middlewares
MIDDLEWARE += [
    'django_seo_js.middleware.EscapedFragmentMiddleware',
    'django_seo_js.middleware.UserAgentMiddleware'
]

# prerender specific settings
SEO_JS_PRERENDER_TOKEN = "my prerender token"

# Whether to run the middlewares and update_cache_for_url.
# Useful to set False for unit testing.
SEO_JS_ENABLED = True # Defaults to *not* DEBUG.

# User-agents to render for, if you're using the UserAgentMiddleware
# Defaults to the most popular.  If you have custom needs, pull from the full list:
# http://www.robotstxt.org/db.html
SEO_JS_USER_AGENTS = [
    "Googlebot",
    "Yahoo",
    "bingbot",
    "Baidu",
    "Ask Jeeves",
]

and for the second:

# apps
INSTALLED_APPS += ['django_seo_js']

# middlewares
MIDDLEWARE += [
    'django_seo_js.middleware.EscapedFragmentMiddleware',
    'django_seo_js.middleware.UserAgentMiddleware'
]

# prerender specific settings
SEO_JS_BACKEND = "django_seo_js.backends.PrerenderHosted"
SEO_JS_PRERENDER_URL = "http://127.0.0.1:3000/"  # Note trailing slash.
SEO_JS_PRERENDER_RECACHE_URL = "http://127.0.0.1:3000/recache"

# Whether to run the middlewares and update_cache_for_url.
# Useful to set False for unit testing.
SEO_JS_ENABLED = True # Defaults to *not* DEBUG.

# User-agents to render for, if you're using the UserAgentMiddleware
# Defaults to the most popular.  If you have custom needs, pull from the full list:
# http://www.robotstxt.org/db.html
SEO_JS_USER_AGENTS = [
    "Googlebot",
    "Yahoo",
    "bingbot",
    "Baidu",
    "Ask Jeeves",
]

I use this prerender docker-image for self-hosted: https://github.com/tvanro/prerender-alpine

What am I doing wrong?
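One way to narrow this down is to hit the prerender server directly, outside Django, using the same URL shape that appears in the traceback (GET <prerender-url><target-url>). The values below come from the second config above and are assumptions about your environment:

import requests

prerender_url = "http://127.0.0.1:3000/"   # from SEO_JS_PRERENDER_URL above
target = "http://127.0.0.1:8000/"          # the page you want rendered

# If this raises ConnectionError, the prerender container isn't reachable
# from where Django runs (common when one of the two is inside Docker and
# the other is not); if it returns 5xx, the AssertionError above is expected.
r = requests.get(prerender_url + target)
print(r.status_code, len(r.content))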
