
Zombie processes (botasaurus issue, closed)

Kaiden0001 commented on July 28, 2024
zombie processes

from botasaurus.

Comments (10)

Kaiden0001 commented on July 28, 2024

Dockerfile

FROM chetan1111/botasaurus:latest

ENV PYTHONUNBUFFERED=1

COPY requirements.txt .

RUN python -m pip install -r requirements.txt
RUN apt-get update && apt-get install -y lsof

RUN mkdir app
WORKDIR /app
COPY . /app

CMD ["python", "run.py", "backend"]


Chetan11-dev commented on July 28, 2024

The only solution is to upgrade to the latest version; with it, this error will not occur. Upgrade with:
python -m pip install bota botasaurus botasaurus_api botasaurus_driver botasaurus-proxy-authentication botasaurus_server --upgrade


Kaiden0001 commented on July 28, 2024

Is there no way to fix this on the old version?


Chetan11-dev commented on July 28, 2024

You need to use the new version to resolve it.


Kaiden0001 commented on July 28, 2024

Same problem on the new version.

requirements.txt

cchardet==2.1.7
botasaurus-requests==4.0.16
bota==4.0.62
botasaurus==4.0.34
botasaurus_api==4.0.4
botasaurus_driver==4.0.30
botasaurus-proxy-authentication==1.0.16
botasaurus_server==4.0.23
deprecated==1.2.14

After every request, more zombie processes appear:

root@s# ps -A -ostat,pid,ppid | grep -e '[zZ]'
Z    3388440 3388338
Z    3388441 3388338
Z    3388443 3388338
Z    3388445 3388338
Z    3388450 3388338
Z    3388451 3388338
Z    3388452 3388338
Z    3388630 3388338

The count grows with each request.
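
To track this over time, here is a minimal Python sketch that parses the same `ps -A -o stat,pid,ppid` output shown above and groups the zombies by parent PID. The function names are illustrative only and are not part of botasaurus.

```python
import subprocess
from collections import Counter


def parse_zombies(ps_output: str) -> list[tuple[int, int]]:
    """Return (pid, ppid) pairs for every line whose STAT column starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Expect exactly three columns: STAT, PID, PPID. The header line is
        # skipped because "STAT" does not start with "Z".
        if len(fields) == 3 and fields[0].startswith("Z"):
            zombies.append((int(fields[1]), int(fields[2])))
    return zombies


def zombies_by_parent(ps_output: str) -> Counter:
    """Count zombies per parent PID, to see which process is not reaping them."""
    return Counter(ppid for _pid, ppid in parse_zombies(ps_output))


def current_ps_output() -> str:
    """Run ps with the same columns as the command in the comment above."""
    result = subprocess.run(
        ["ps", "-A", "-o", "stat,pid,ppid"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `zombies_by_parent(current_ps_output())` before and after a request shows whether the count under a single parent keeps growing, as it does in the listing above where every zombie has the same PPID.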


Chetan11-dev commented on July 28, 2024
  • Can you share code to reproduce it?
  • Which OS are you using?


Kaiden0001 commented on July 28, 2024

code

from botasaurus.browser import browser, Driver
from botasaurus.request import request
from botasaurus_driver.user_agent import UserAgent
from botasaurus_driver.window_size import WindowSize


@request
def scrape_heading_task(requests, botasaurus_request: dict):
    @browser(
        block_images_and_css=True,
        user_agent=botasaurus_request.get("user_agent") or UserAgent.RANDOM,
        window_size=botasaurus_request.get("window_size") or WindowSize.RANDOM,
        max_retry=botasaurus_request.get("max_retry"),
        output=None,
        add_arguments=["--disable-dev-shm-usage", "--no-sandbox"],
        proxy=botasaurus_request.get("proxy") or None,
    )
    def scrape(driver: Driver, data):
        driver.google_get(
            link=botasaurus_request.get("url"),
            bypass_cloudflare=bool(botasaurus_request.get("bypass_cloudflare")),
            wait=botasaurus_request.get("wait"),
        )
        return {"text": driver.page_html, "cookies": driver.get_cookies()}

    try:
        return scrape()
    except Exception as e:
        return {"error": str(e)}

Dockerfile

FROM chetan1111/botasaurus:latest

ENV PYTHONUNBUFFERED=1

COPY requirements.txt .

RUN python -m pip install -r requirements.txt
RUN apt-get update && apt-get install -y lsof xvfb

RUN mkdir app
WORKDIR /app
COPY . /app

CMD ["python", "run.py", "backend"]

OS: Ubuntu 22.04 LTS x86_64


Chetan11-dev commented on July 28, 2024
  • Are you running it in Docker or directly on Ubuntu?
  • Also, kindly share a sample call to the function.


Kaiden0001 commented on July 28, 2024

Running in Docker.

scrapers.py

import os

from botasaurus_server.server import Server
from src.scrape_heading_task import scrape_heading_task

# os.getenv returns a string when the variable is set, so convert to int.
Server.rate_limit["browser"] = int(os.getenv("MAX_BROWSERS", "3"))
Server.add_scraper(scrape_heading_task)

scrape_heading_task.js

/**
 * @typedef {import('../../frontend/node_modules/botasaurus-controls/dist/index').Controls} Controls
 */

/**
 * @param {Controls} controls
 */
function getInput(controls) {
    controls.link('url', {isRequired: true})
    controls.text('user_agent', {isRequired: false})
    controls.listOfTexts('window_size', {isRequired: false})
    controls.text('proxy', {isRequired: false})
    controls.number('max_retry', {isRequired: false, defaultValue: 2})
    controls.number('bypass_cloudflare', {isRequired: false, defaultValue: 0})
    controls.number('wait', {isRequired: false, defaultValue: 5})
}

call

        api = Api(server_url)

        data = self.get_data(botasaurus_request)

        task = api.create_async_task(
            data=data,
            scraper_name="scrape_heading_task",
        )
        result = self.get_task_result(
            api,
            task.get("id"),
            botasaurus_request.timeout,
            botasaurus_request.wait,
        )
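
The `get_task_result` helper used above is not shown in the thread. A polling helper of that kind might look roughly like the sketch below. Everything in it is an assumption for illustration: the name `wait_for_task`, the `fetch_task` callable (which would wrap whatever API call returns the task as a dict), and the set of terminal status strings.

```python
import time
from typing import Callable

# Assumed terminal states; adjust to whatever the task API actually reports.
TERMINAL_STATUSES = ("completed", "failed", "aborted")


def wait_for_task(
    fetch_task: Callable[[], dict],
    timeout: float,
    wait: float,
) -> dict:
    """Poll fetch_task() every `wait` seconds until the task reaches a
    terminal status, or raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task did not finish within {timeout} seconds")
        time.sleep(wait)
```

Passing the fetch step in as a callable keeps the sketch independent of any particular client library, so the same loop can be exercised with a fake in tests.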


Chetan11-dev commented on July 28, 2024

This issue occurs only in Docker. To resolve it, run:

python -m pip install bota botasaurus botasaurus_api botasaurus_driver botasaurus-proxy-authentication botasaurus_server --upgrade

With this, the zombie processes are purged periodically and will not exceed 10 at any point.
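
For readers wondering what "periodically purged" can mean in practice, here is a minimal sketch of the general technique on POSIX systems: a background thread that calls `os.waitpid(-1, os.WNOHANG)` to collect any child that has already exited. This is not taken from the botasaurus source; the names `reap_zombies` and `start_reaper` and the 30-second interval are assumptions for illustration.

```python
import os
import threading


def reap_zombies() -> int:
    """Collect every child of this process that has already exited.

    Returns the number of children reaped. Never blocks.
    """
    reaped = 0
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped += 1
    return reaped


def start_reaper(interval_seconds: float = 30.0) -> threading.Event:
    """Run reap_zombies() every interval_seconds in a daemon thread.

    Returns an Event; call .set() on it to stop the thread.
    """
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval_seconds):
            reap_zombies()

    threading.Thread(target=loop, daemon=True, name="zombie-reaper").start()
    return stop
```

Two caveats. This only reaps children of the calling process, which in a container includes orphans re-parented to it when it runs as PID 1. It also collects exit statuses that other code in the same process, such as `subprocess.Popen` objects, may expect to read itself. Separately from any library change, Docker's `--init` flag runs a minimal init as PID 1 that reaps orphaned processes; whether that helps here depends on which process the zombies are parented to.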
