Comments (10)
Dockerfile
FROM chetan1111/botasaurus:latest
ENV PYTHONUNBUFFERED=1
COPY requirements.txt .
RUN python -m pip install -r requirements.txt
RUN apt-get update && apt-get install -y lsof
RUN mkdir app
WORKDIR /app
COPY . /app
CMD ["python", "run.py", "backend"]
from botasaurus.
The only solution is to upgrade to the latest version; after upgrading, this error will not occur. Upgrade with:
python -m pip install bota botasaurus botasaurus_api botasaurus_driver botasaurus-proxy-authentication botasaurus_server --upgrade
from botasaurus.
Is there no way to fix this on the old version?
from botasaurus.
You need to use the new version to resolve it.
from botasaurus.
The same problem occurs on the new version.
requirements.txt
cchardet==2.1.7
botasaurus-requests==4.0.16
bota==4.0.62
botasaurus==4.0.34
botasaurus_api==4.0.4
botasaurus_driver==4.0.30
botasaurus-proxy-authentication==1.0.16
botasaurus_server==4.0.23
deprecated==1.2.14
After every request:
root@s# ps -A -ostat,pid,ppid | grep -e '[zZ]'
Z 3388440 3388338
Z 3388441 3388338
Z 3388443 3388338
Z 3388445 3388338
Z 3388450 3388338
Z 3388451 3388338
Z 3388452 3388338
Z 3388630 3388338
The number of zombie processes increases with each request.
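For anyone tracking the growth between requests, the `ps` listing above can be tallied per parent process. This is only a minimal sketch: the helper names `parse_zombies` and `zombies_per_parent` are hypothetical, and it assumes the three-column `stat,pid,ppid` layout shown in the output above.

```python
from collections import Counter


def parse_zombies(ps_output: str) -> list[tuple[int, int]]:
    """Return (pid, ppid) pairs for zombie rows in `ps -A -ostat,pid,ppid` output."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expect exactly: STAT PID PPID. Skip headers and malformed rows.
        if len(fields) != 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        # A zombie's state code starts with "Z" (it may carry suffixes such as "Z+").
        if fields[0].upper().startswith("Z"):
            zombies.append((int(fields[1]), int(fields[2])))
    return zombies


def zombies_per_parent(ps_output: str) -> Counter:
    """Count zombies grouped by parent PID."""
    return Counter(ppid for _pid, ppid in parse_zombies(ps_output))


# Sample rows copied from the listing above.
SAMPLE = """STAT PID PPID
Z 3388440 3388338
Z 3388441 3388338
Z 3388443 3388338
S 3388338 1
"""

if __name__ == "__main__":
    print(zombies_per_parent(SAMPLE))
```

Running the same tally before and after a request shows whether the count under a given parent PID keeps growing.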
from botasaurus.
- Can you share code to reproduce it?
- Which OS are you using?
from botasaurus.
code
from botasaurus.browser import browser, Driver
from botasaurus.request import request
from botasaurus_driver.user_agent import UserAgent
from botasaurus_driver.window_size import WindowSize


@request
def scrape_heading_task(requests, botasaurus_request: dict):
    @browser(
        block_images_and_css=True,
        user_agent=botasaurus_request.get("user_agent") or UserAgent.RANDOM,
        window_size=botasaurus_request.get("window_size") or WindowSize.RANDOM,
        max_retry=botasaurus_request.get("max_retry"),
        output=None,
        add_arguments=["--disable-dev-shm-usage", "--no-sandbox"],
        proxy=botasaurus_request.get("proxy") or None,
    )
    def scrape(driver: Driver, data):
        driver.google_get(
            link=botasaurus_request.get("url"),
            bypass_cloudflare=bool(botasaurus_request.get("bypass_cloudflare")),
            wait=botasaurus_request.get("wait"),
        )
        return {"text": driver.page_html, "cookies": driver.get_cookies()}

    try:
        return scrape()
    except Exception as e:
        return {"error": str(e)}
Dockerfile
FROM chetan1111/botasaurus:latest
ENV PYTHONUNBUFFERED=1
COPY requirements.txt .
RUN python -m pip install -r requirements.txt
RUN apt-get update && apt-get install -y lsof xvfb
RUN mkdir app
WORKDIR /app
COPY . /app
CMD ["python", "run.py", "backend"]
OS: Ubuntu 22.04 LTS x86_64
from botasaurus.
- Are you running it in Docker or directly on Ubuntu?
- Also, kindly share a sample call to the function.
from botasaurus.
Running in Docker.
scrapers.py
import os

from botasaurus_server.server import Server
from src.scrape_heading_task import scrape_heading_task

# os.getenv returns a string when the variable is set, so convert it to an int.
Server.rate_limit["browser"] = int(os.getenv("MAX_BROWSERS", 3))
Server.add_scraper(scrape_heading_task)
scrape_heading_task.js
/**
 * @typedef {import('../../frontend/node_modules/botasaurus-controls/dist/index').Controls} Controls
 */

/**
 * @param {Controls} controls
 */
function getInput(controls) {
    controls.link('url', {isRequired: true})
    controls.text('user_agent', {isRequired: false})
    controls.listOfTexts('window_size', {isRequired: false})
    controls.text('proxy', {isRequired: false})
    controls.number('max_retry', {isRequired: false, defaultValue: 2})
    controls.number('bypass_cloudflare', {isRequired: false, defaultValue: 0})
    controls.number('wait', {isRequired: false, defaultValue: 5})
}
call
api = Api(server_url)
data = self.get_data(botasaurus_request)
task = api.create_async_task(
    data=data,
    scraper_name="scrape_heading_task",
)
result = self.get_task_result(
    api,
    task.get("id"),
    botasaurus_request.timeout,
    botasaurus_request.wait,
)
from botasaurus.
This issue occurs only in Docker. To resolve it, run:
python -m pip install bota botasaurus botasaurus_api botasaurus_driver botasaurus-proxy-authentication botasaurus_server --upgrade
With this upgrade, zombie processes are purged periodically and should not exceed 10 at any point.
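For readers wondering what "periodically purged" can mean in practice: a parent process on POSIX systems can reap its exited children with a non-blocking `os.waitpid` call. The sketch below illustrates that general technique only. It is an assumption, not a description of how botasaurus actually implements the purge, and the names `reap_zombie_children` and `demo` are hypothetical.

```python
import os
import subprocess
import sys
import time


def reap_zombie_children() -> list[int]:
    """Reap every already-exited child of this process without blocking (POSIX only)."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately if none has exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # This process has no children at all.
        if pid == 0:
            break  # Children exist, but none has exited yet.
        reaped.append(pid)
    return reaped


def demo(timeout: float = 10.0) -> bool:
    """Spawn a short-lived child, never wait() on it, then reap it by polling."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    deadline = time.monotonic() + timeout
    reaped: list[int] = []
    while time.monotonic() < deadline and child.pid not in reaped:
        reaped += reap_zombie_children()
        time.sleep(0.05)
    return child.pid in reaped


if __name__ == "__main__":
    print("child reaped:", demo())
```

Note that this only reaps direct children of the calling process. Zombies owned by a different parent (for example PID 1 in a container) must be reaped by that parent.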
from botasaurus.