Comments (4)
Also, this header quite reliably causes blocking by Zillow (PerimeterX). This code:
import { gotScraping } from 'got-scraping';

console.log(
    await gotScraping({
        headers: { 'Content-Type': 'application/json', 'sec-ch-ua': '' },
        body: '{"searchQueryState":{"isMapVisible":true,"filterState":{"sortSelection":{"value":"globalrelevanceex"},"isAllHomes":{"value":true}},"mapBounds":{"north":36.18115,"east":-86.666132,"south":35.807142,"west":-86.891423},"isListVisible":true,"mapZoom":12,"regionSelection":[{"regionId":72192,"regionType":7}],"pagination":{}},"wants":{"cat1":["listResults","mapResults"],"cat2":["total"]},"requestId":1,"isDebugRequest":false}',
        url: 'https://www.zillow.com/async-create-search-page-state',
        method: 'PUT',
    }),
);
works quite reliably, but if you remove the forced sec-ch-ua="", you almost always get blocked.
from fingerprint-suite.
Huh, the code above doesn't seem to work anymore either. Is PerimeterX learning our fingerprints? (For context: I pushed a fix for the ZIP Search scraper that used the workaround above and started a run of ~3000 ZIP codes. It produced results for a while, but now everything is 403 again.)
This still works, though:
curl -i -X PUT "https://www.zillow.com/async-create-search-page-state" \
--data '{"searchQueryState":{"isMapVisible":true,"filterState":{"sortSelection":{"value":"globalrelevanceex"},"isAllHomes":{"value":true}},"mapBounds":{"north":36.18115,"east":-86.666132,"south":35.807142,"west":-86.891423},"isListVisible":true,"mapZoom":12,"regionSelection":[{"regionId":72192,"regionType":7}],"pagination":{}},"wants":{"cat1":["listResults","mapResults"],"cat2":["total"]},"requestId":1,"isDebugRequest":false}' \
-H "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100102 Firefox/122.0" \
-H "Accept: */*" \
-H "Content-Type: application/json"
from fingerprint-suite.
Looks like the value is supposed to be random:
https://stackoverflow.com/questions/64413275/why-does-chrome-use-sec-ch-ua-not-abrandv-99
Maybe we should really just strip that from our data (or maybe just the generated fingerprints).
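To make the two ideas above concrete, here is a minimal sketch. It first splits a `sec-ch-ua` value into its brand entries (so the intentionally odd-looking one is easy to spot), then drops the client-hint headers from a plain header object. Both helper names are hypothetical and are not part of got-scraping or fingerprint-suite, and the sample header value is only illustrative.

```javascript
// Hypothetical helper: split a sec-ch-ua value into { brand, version } entries.
// Simplified parsing: brands containing escaped quotes are not handled.
function parseSecChUa(value) {
    const entryPattern = /"([^"]*)";\s*v="([^"]*)"/g;
    return [...value.matchAll(entryPattern)].map(([, brand, version]) => ({ brand, version }));
}

// Hypothetical helper: return a copy of `headers` without any sec-ch-ua* header.
// Header names are compared case-insensitively.
function stripClientHints(headers) {
    return Object.fromEntries(
        Object.entries(headers).filter(([name]) => !name.toLowerCase().startsWith('sec-ch-ua')),
    );
}

// Illustrative value only; real browsers vary the odd-looking brand and the order.
const sampleValue = '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"';
console.log(parseSecChUa(sampleValue));

// Illustrative header object, not taken from the generator's data.
const sampleHeaders = {
    'User-Agent': 'example-agent',
    'sec-ch-ua': sampleValue,
    'sec-ch-ua-mobile': '?0',
    Accept: '*/*',
};
console.log(stripClientHints(sampleHeaders));
```

Whether stripping is actually a good idea is a separate question; the sketch only shows what "strip that from the generated headers" would mean mechanically.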
from fingerprint-suite.
It seems to me that this issue has forked into two different things:
- The "weird" user agent client hint: This is expected behavior (see Martin's linked post, but also e.g. this draft here). The value of this header is purposefully randomized so that servers don't depend too closely on its actual content. This originally stems from the GREASE principle in TLS.
  - I'm still standing on my hill: we shouldn't manipulate this, as that would differentiate us from the actual browsers.
- Zillow blocking got-scraping: it blocked my Chrome (and curl-impersonate impersonating Chrome, too). It passed with Firefox (and curl-impersonate impersonating Firefox, too). It blocked got-scraping with a (probably) Chrome fingerprint. It passed with @mvolfik's curl with the Firefox UA.
Guess what happens when I run got-scraping with this config:
await gotScraping({
    headers: { 'Content-Type': 'application/json' },
    body: '{"searchQueryState":{"isMapVisible":true,"filterState":{"sortSelection":{"value":"globalrelevanceex"},"isAllHomes":{"value":true}},"mapBounds":{"north":36.18115,"east":-86.666132,"south":35.807142,"west":-86.891423},"isListVisible":true,"mapZoom":12,"regionSelection":[{"regionId":72192,"regionType":7}],"pagination":{}},"wants":{"cat1":["listResults","mapResults"],"cat2":["total"]},"requestId":1,"isDebugRequest":false}',
    url: 'https://www.zillow.com/async-create-search-page-state',
    method: 'PUT',
    headerGeneratorOptions: {
        browsers: ['firefox'],
        devices: ['mobile'], // originally not needed, but I had better probability of not getting blocked with this
    }
})
If I had to guess the root of this, my bet would be on a skewed prior distribution between Chrome and Firefox fingerprints in the PerimeterX database - they just have more Chrome examples, so they can be more specific with the checks. And there, we're still losing with the non-genuine TLS stack etc., see my message from the other thread below:
TLDR: I understand that defeating the antiblocking scripts can be frustrating, but looking into an automatically generated JSON file and trying to point out "weird" looking data is not the way forward. Creating confirmation bias without testing things properly is not the way forward either.
Let's be systematic about this, base our decisions on actual specifications and data, experiment, take notes, and see what works best.
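As one possible way to "take notes", here is a minimal sketch that tallies recorded HTTP status codes per tested configuration and computes a pass rate. The `tallyOutcomes` function, the record shape, and the configuration labels are all assumptions made for illustration, and the sample records are placeholders, not measurements.

```javascript
// Hypothetical helper: summarize trial records per configuration.
// Each record is assumed to look like { config: string, status: number }.
function tallyOutcomes(records) {
    const report = {};
    for (const { config, status } of records) {
        if (!report[config]) {
            report[config] = { total: 0, passed: 0, byStatus: {} };
        }
        const entry = report[config];
        entry.total += 1;
        // Treat any 2xx response as "not blocked".
        if (status >= 200 && status < 300) entry.passed += 1;
        entry.byStatus[status] = (entry.byStatus[status] || 0) + 1;
    }
    for (const entry of Object.values(report)) {
        entry.passRate = entry.passed / entry.total;
    }
    return report;
}

// Placeholder records, NOT real results: 'A' and 'B' stand for any two
// request configurations being compared.
const placeholderRecords = [
    { config: 'A', status: 200 },
    { config: 'A', status: 403 },
    { config: 'B', status: 403 },
    { config: 'B', status: 403 },
];
console.log(tallyOutcomes(placeholderRecords));
```

Keeping the tally separate from the code that sends requests means the same summary works whether the records come from got-scraping, curl, or a real browser.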
from fingerprint-suite.