Spider framework (cfadmin issue, closed)

cfadmin-cn commented on June 1, 2024
Spider framework

from cfadmin.

Comments (11)

CandyMi commented on June 1, 2024

Hi, @gcclua
You can try using httpc:multi_request like this:

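local Log = require("logging"):new()
local http = require "httpc.class"  -- module name as used later in this thread
local hc = http:new {}
local ok, response_array = hc:multi_request {
  [1] = {
    domain = "https://yourdomain/path",  -- an http:// or https:// URL
    method = "get",                      -- one of "get", "post", "json", "file"
    headers = { { key1 = "value1", key2 = "value2" } },  -- shape as given in the original comment
    body = "your body"                   -- use files instead of body for file uploads
  },
  -- [2] = { ... }, one table per additional request
}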
Log:DEBUG(ok, response_array)

multi_request is a concurrent request method: it opens multiple coroutines to request the array items concurrently, and the responses are also returned as an array.

All usage examples can be found in script/test_httpc.lua. :)

gcclua commented on June 1, 2024

Thanks, I will try it. I also have a new idea: run multiple instances of cf and write the logic loop inside main.lua:

while true do 
    -- get spider new task from the center cf 
    local hc = http:new {} 
    local SpiderTaskList = hc:get(CenterUrl) 
    local SpiderTaskSet = ParseAndFetchURLFrom(SpiderTaskList) 
    if IsEmpty(SpiderTaskSet) == false then 
        -- do each task
        for k,v in pairs(SpiderTaskSet) do 
            local code, body = hc:get(v) 
            print(code, body) 
        end 
    else 
        Sleep(2) 
    end 
end

Would this give us a distributed spider? All of the logic would then run inside core_sys_init, before core_sys_run.
Are there potential problems with that?

CandyMi commented on June 1, 2024

Good idea; there is no problem with using it this way.

One thing you may need to know: with httpc.class (hc), you should close the connection when a request returns nil. :)

gcclua commented on June 1, 2024

OK, thank you

CandyMi commented on June 1, 2024

@gcclua
Addendum:
When you use httpc.class, you need to call the hc:close() method explicitly; otherwise file descriptors (fd) may leak. Even an explicit call to collectgarbage will not release the fd.
Don't forget it. :)
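
One way to make sure hc:close() always runs, even when a request raises an error, is to wrap the work in a small helper. This is only a sketch: with_client and new_client are hypothetical names and not part of cfadmin, and the only thing assumed about the object is that it has a close method, as described above. It also assumes Lua 5.2 or newer for table.pack and table.unpack. A stand-in client is used so the sketch runs outside cfadmin.

```lua
-- Hypothetical helper (not part of cfadmin): create a client, run `work`
-- with it, and always call client:close(), even if `work` raises an error.
local function with_client(new_client, work)
  local client = new_client()
  -- pcall catches any error so that close() still runs afterwards.
  local results = table.pack(pcall(work, client))
  client:close()
  if not results[1] then
    error(results[2], 0)  -- re-raise the original error unchanged
  end
  return table.unpack(results, 2, results.n)
end

-- Stand-in client so the sketch runs outside cfadmin.
local fake = { closed = false }
function fake:get(url) return 200, "body of " .. url end
function fake:close() self.closed = true end

local code, body = with_client(function() return fake end, function(hc)
  return hc:get("http://example.com/")
end)
print(code, body, fake.closed)
```

Inside cfadmin, the factory would presumably be something like function() return require("httpc.class"):new {} end, using the module name shown elsewhere in this thread.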

gcclua commented on June 1, 2024

Way 1:

while true do 
    -- get spider new task from the center cf 
    local hc = http:new {} 
    local SpiderTaskList = hc:get(CenterUrl) 
    local SpiderTaskSet = ParseAndFetchURLFrom(SpiderTaskList) 
    if IsEmpty(SpiderTaskSet) == false then 
        -- do each task
        for k,v in pairs(SpiderTaskSet) do 
            local code, body = hc:get(v) 
            print(code, body) 
        end 
    else 
        Sleep(2) 
    end 
    hc:close() --Close fd 
end

Way 2:

while true do 
    -- get spider new task from the center cf 
    local hc = http:new {} 
    local SpiderTaskList = hc:get(CenterUrl) 
    local SpiderTaskSet = ParseAndFetchURLFrom(SpiderTaskList) 
    if IsEmpty(SpiderTaskSet) == false then 
        -- do each task
        for k,v in pairs(SpiderTaskSet) do 
            local hc = http:new {} 
            local code, body = hc:get(v) 
            print(code, body) 
            hc:close() --Close fd 
        end 
    else 
        Sleep(2) 
    end 
    hc:close() --Close fd 
end

Which is the correct way?

CandyMi commented on June 1, 2024

Based on how your code is written, you can write it like this:

local httpc = require "httpc"

local http_cls = require "httpc.class"

while true do 
    -- get spider new task from the center cf 
    local SpiderTaskList = httpc.get(CenterUrl) 
    local SpiderTaskSet = ParseAndFetchURLFrom(SpiderTaskList) 
    if noEmpty(SpiderTaskSet) then
      local hc = http_cls:new {}  -- one httpc object, reused for every task
      for k,v in pairs(SpiderTaskSet) do
          local code, body = hc:get(v)
          print(code, body)
      end
      hc:close() -- close httpc explicitly so the fd is released
    else
      Sleep(2)
    end
end

Note that all requests made through the same hc object must be guaranteed to target the same domain.
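
If the restriction above is read as "one hc object per domain", the task URLs could be grouped before any requests are made. The following is a minimal sketch under that assumption; group_by_domain is a hypothetical helper, not a cfadmin function, and it treats the scheme plus host (and port, if any) as the "domain".

```lua
-- Hypothetical helper: group URLs by scheme and host, so that each group
-- can be handled by its own httpc.class object.
local function group_by_domain(urls)
  local groups = {}
  for _, url in ipairs(urls) do
    -- Capture "http://host" or "https://host" (including any port).
    local origin = url:match("^(https?://[^/?#]+)")
    if origin then  -- URLs without an http(s) prefix are skipped
      groups[origin] = groups[origin] or {}
      table.insert(groups[origin], url)
    end
  end
  return groups
end

local groups = group_by_domain {
  "http://a.example/1",
  "http://a.example/2",
  "https://b.example/x",
}
for origin, list in pairs(groups) do
  print(origin, #list)
end
```

Each group could then be processed with its own object that is closed once the group is done.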

multi_request does not have this limitation, and you do not have to call the hc:close method explicitly. For example:

local httpc = require "httpc"

while true do 
  -- get spider new task from the center cf 
  local SpiderTaskList = httpc.get(CenterUrl) 
  local SpiderTaskSet = ParseAndFetchURLFrom(SpiderTaskList) 
  if noEmpty(SpiderTaskSet) then
    local requests = {}
    for k,v in pairs(SpiderTaskSet) do
      requests[#requests + 1] = { domain = v }
    end
    local ok, response_array = httpc.multi_request(requests)
    print(ok, response_array)
  else
    Sleep(2)
  end
end

This works because multi_request automatically closes every fd internally after each request is completed (the Lua GC then reclaims the objects).

This is the difference between them.

gcclua commented on June 1, 2024

Great, httpc.multi_request is simple to use. I want the results to be returned one by one, with a timeout limit, where a timeout marks the task as failed. That is difficult to implement in a single thread. Does cf offer any help with this?

CandyMi commented on June 1, 2024

httpc.multi_request accepts a timeout parameter on each request, which sets the maximum time that request may take. For example:

httpc.multi_request {
  {
    domain = "your domain",
    timeout = 15
  }
}

You can refer to this.
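
For a list of spider task URLs, the per-request timeout can be filled in while the request array is built. This is a minimal sketch rather than cfadmin code: build_requests is a hypothetical helper, and it uses only the domain, method and timeout fields that appear in this thread.

```lua
-- Hypothetical helper: turn a list of URLs into a multi_request array,
-- giving every request the same timeout.
local function build_requests(urls, timeout)
  local requests = {}
  for _, url in ipairs(urls) do
    requests[#requests + 1] = { domain = url, method = "get", timeout = timeout }
  end
  return requests
end

local requests = build_requests({ "http://example.com/a", "http://example.com/b" }, 15)
print(#requests, requests[1].domain, requests[1].timeout)
-- Inside cfadmin, the array would then be passed on as in the earlier snippet:
--   local ok, response_array = require("httpc").multi_request(requests)
```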

gcclua commented on June 1, 2024

OK, I will use httpc.multi_request

CandyMi commented on June 1, 2024

@gcclua
Very happy to help you! : )
I suggest you use the latest version I just released, which fixes some bugs in the httpc library.
