go-rod / rod

A Devtools driver for web automation and scraping

Home Page: https://go-rod.github.io

License: MIT License

Go 97.61% HTML 1.47% JavaScript 0.80% CSS 0.01% Dockerfile 0.12%
cdp chrome-headless chrome-devtools chrome-devtools-protocol headless web-scraping automation scraper devtools devtools-protocol

rod's People

Contributors

alexferrari88, alingse, andrew-field, aurkenb, carmel, egonelbre, fly-playgroud, hhhapz, infalmo, jaekook, knowlet, kukaki, kvii, lu4nx, lu4p, madflow, moredure, normalpunch, oderwat, sanfenzuicom, t-dynamos, tofuliang, tyron, wings-xue, xujinzheng, yangjuncode, yasarluo, youshy, ysmood, zbysir

rod's Issues

Get Xpath

I'm using this function to get xpath of the element in DOM in Chrome:

function getXPathForElement(element) {
    const idx = (sib, name) => sib 
        ? idx(sib.previousElementSibling, name||sib.localName) + (sib.localName == name)
        : 1;
    const segs = elm => !elm || elm.nodeType !== 1 
        ? ['']
        : elm.id && document.getElementById(elm.id) === elm
            ? [`id("${elm.id}")`]
            : [...segs(elm.parentNode), `${elm.localName.toLowerCase()}[${idx(elm)}]`];
    return segs(element).join('/');
}

I have been trying to run it with page.Eval in Rod, but with no luck. Any pointers on how I should proceed?
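For reference, the path-building logic of getXPathForElement can be sketched in Go. This is only an illustration of the algorithm, not Rod code: the Node type and its Add method are hypothetical stand-ins for DOM elements, and ids are assumed to be unique (the JavaScript version checks this with document.getElementById).

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal stand-in for a DOM element.
type Node struct {
	Name     string
	ID       string
	Parent   *Node
	Children []*Node
}

// Add appends a child element and returns it.
func (n *Node) Add(name, id string) *Node {
	c := &Node{Name: name, ID: id, Parent: n}
	n.Children = append(n.Children, c)
	return c
}

// index returns the 1-based position of n among siblings with the same name.
func index(n *Node) int {
	if n.Parent == nil {
		return 1
	}
	i := 0
	for _, s := range n.Parent.Children {
		if s.Name == n.Name {
			i++
		}
		if s == n {
			break
		}
	}
	return i
}

// XPath walks up from n and stops at the nearest ancestor that has an id.
func XPath(n *Node) string {
	if n == nil {
		return ""
	}
	var segs []string
	for cur := n; cur != nil; cur = cur.Parent {
		if cur.ID != "" {
			segs = append([]string{fmt.Sprintf(`id("%s")`, cur.ID)}, segs...)
			return strings.Join(segs, "/")
		}
		segs = append([]string{fmt.Sprintf("%s[%d]", cur.Name, index(cur))}, segs...)
	}
	return "/" + strings.Join(segs, "/")
}

func main() {
	root := &Node{Name: "html"}
	body := root.Add("body", "")
	body.Add("div", "")
	content := body.Add("div", "content")
	content.Add("span", "")
	span2 := content.Add("span", "")
	body.Add("p", "")
	p2 := body.Add("p", "")

	fmt.Println(XPath(span2)) // id("content")/span[2]
	fmt.Println(XPath(p2))    // /html[1]/body[1]/p[2]
}
```

The id shortcut keeps paths short and stable: anything below an element with an id is addressed relative to that element rather than from the document root.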

How to get the text of an element

I am trying to get the text of an element via page.Element(selector).Text(), but it returns an empty string in my case.
If I run document.getElementById("receipt-result").textContent in the browser, I get the full text of the element.
What method should I use to get a similar result?

Getting error while running the binary

I am following the installation steps. While trying to create the binary, I get the error below:

standard_init_linux.go:211: exec user process caused "no such file or directory"

[image]

Below is the output of the docker version command:
[image]

I am running it on Ubuntu. I have installed Docker and cloned the repo. Please let me know if I am missing anything.

Clarify how WaitRequestIdle() should be used.

Rod v0.28.0 used.
Hi I am trying to use WaitRequestIdle in following way:

package main

import (
	"fmt"
	"time"

	"github.com/ysmood/rod"
	"github.com/ysmood/rod/lib/launcher"
)

func main() {
	url := launcher.New().
		Headless(false).
		Launch()

	b := rod.New().ControlURL(url).Connect().Timeout(30 * time.Second)
	defer b.Close()

	page := b.Page("https://google.com")
	w := page.WaitRequestIdle()

	page.Navigate("https://wikipedia.org")
	fmt.Println(time.Now())
	w()
	fmt.Println(time.Now())

	page.Navigate("https://gmail.com")
	fmt.Println(time.Now())
	w()
	fmt.Println(time.Now())
}

It seems w() really waits only after the first navigation, but not after the second. If I create a second instance before the second navigation, it waits both times. But I am not sure this is the intended behaviour.
Am I using it correctly?

Remote monitoring with screenshots from a port for headless mode

When running headless mode, it's better to have a way to see what's going on with the browser.

We can start an HTTP server to serve a debug port to let the user use a browser to watch the screenshots of the headless browser.

The monitor page will have a dropdown list to choose which tab to watch.

How to get cookies?

Hello,

I'm struggling to get cookies from a page. I've seen how rod sets a cookie using *rod.Page, and I've read this doc, but I can't manage to pull all the cookies for a given URL.

Here's the snippet:

func RodGetCookies(p *rod.Page, urls []string) {
	result := p.Call("Network.getCookies", cdp.Object{
		"urls": urls,
	})

	fmt.Printf("%+v", result)
}

...
	page, err := browser.PageE("https://github.com")
	if err != nil {
		panic(err)
	}

	RodGetCookies(page, []string{"*.github.com"})
...

the output:

{"cookies":[]}

thank you
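One thing worth checking: in the Chrome DevTools Protocol, the urls parameter of Network.getCookies is a list of URLs whose applicable cookies are returned, not a list of wildcard patterns, so "*.github.com" would likely match nothing. Leaving urls out falls back to the URLs of the page and its subframes. A minimal sketch of building the params with that assumption follows; getCookiesParams is a hypothetical helper, not part of Rod.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// getCookiesParams is a hypothetical helper that builds the JSON params for
// Network.getCookies and rejects entries that are not absolute http(s) URLs.
func getCookiesParams(urls ...string) (string, error) {
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil {
			return "", err
		}
		if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
			return "", fmt.Errorf("not an absolute URL: %q", raw)
		}
	}
	b, err := json.Marshal(map[string][]string{"urls": urls})
	return string(b), err
}

func main() {
	params, err := getCookiesParams("https://github.com")
	fmt.Println(params, err)

	// A wildcard pattern is rejected before it ever reaches the browser.
	_, err = getCookiesParams("*.github.com")
	fmt.Println(err)
}
```

With the snippet from the question, this would correspond to passing []string{"https://github.com"} instead of the wildcard.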

Type helper for low-level cdp request

We don't need full type support. A better approach is to add an optional type helper that helps us type the JSON objects Chrome uses.

Because the helper is optional, the user can still interact with the latest Chrome API even when the type helper is outdated.

The schema to use is on this page: GET /json/protocol/

The way https://github.com/mafredri/cdp and https://github.com/chromedp/chromedp did it is not good enough. Their implementations are tightly coupled with their high-level libs, so their types are hard to use as a standalone lib when someone only wants the types to generate a request JSON payload.

How to use array json for Network.setCookies

Thanks for your program. It's very easy to use.
I don't know how to pass cookies in JSON array format to page.Call, e.g. page.Call("Network.setCookies", cdp.Array{}).

arrayJson=`[
    {
        "domain": ".test.com",
        "expirationDate": 158255720,
        "hostOnly": true,
        "httpOnly": true,
        "name": "datr",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": false,
        "session": false,
        "storeId": "0",
        "value": "5798798Q2B-",
        "id": 1
    },
    {
        "domain": ".test.com",
        "expirationDate": 15985720,
        "hostOnly": true,
        "httpOnly": true,
        "name": "bs",
        "path": "/",
        "sameSite": "no_restriction",
        "secure": false,
        "session": false,
        "storeId": "0",
        "value": "8DfXXcNRyf9zIM57nG4Maa1c",
        "id": 2
    }
]`
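As far as the Chrome DevTools Protocol goes, Network.setCookies expects an object with a cookies field rather than a bare array, and each entry uses the CookieParam field names (expires instead of expirationDate, sameSite as Strict, Lax or None). The sketch below converts an exported array like the one above into that shape using only the standard library. The type and function names are hypothetical, and the sameSite mapping is an assumption about the export format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exportedCookie covers the fields of the JSON array that are used here.
type exportedCookie struct {
	Domain         string  `json:"domain"`
	ExpirationDate float64 `json:"expirationDate"`
	HTTPOnly       bool    `json:"httpOnly"`
	Name           string  `json:"name"`
	Path           string  `json:"path"`
	SameSite       string  `json:"sameSite"`
	Secure         bool    `json:"secure"`
	Session        bool    `json:"session"`
	Value          string  `json:"value"`
}

// cookieParam follows the CDP Network.CookieParam field names.
type cookieParam struct {
	Name     string  `json:"name"`
	Value    string  `json:"value"`
	Domain   string  `json:"domain,omitempty"`
	Path     string  `json:"path,omitempty"`
	Secure   bool    `json:"secure"`
	HTTPOnly bool    `json:"httpOnly"`
	SameSite string  `json:"sameSite,omitempty"`
	Expires  float64 `json:"expires,omitempty"`
}

// sameSiteNames maps the exported values to CDP values (assumed mapping).
// Note: Chrome may reject SameSite=None cookies that are not Secure.
var sameSiteNames = map[string]string{
	"no_restriction": "None",
	"lax":            "Lax",
	"strict":         "Strict",
}

// setCookiesParams converts an exported cookie array into the JSON params
// object for Network.setCookies.
func setCookiesParams(exported string) (string, error) {
	var in []exportedCookie
	if err := json.Unmarshal([]byte(exported), &in); err != nil {
		return "", err
	}
	out := make([]cookieParam, 0, len(in))
	for _, c := range in {
		p := cookieParam{
			Name: c.Name, Value: c.Value, Domain: c.Domain, Path: c.Path,
			Secure: c.Secure, HTTPOnly: c.HTTPOnly,
			SameSite: sameSiteNames[c.SameSite],
		}
		if !c.Session {
			p.Expires = c.ExpirationDate // session cookies carry no expiry
		}
		out = append(out, p)
	}
	b, err := json.Marshal(map[string][]cookieParam{"cookies": out})
	return string(b), err
}

func main() {
	params, err := setCookiesParams(`[{"domain":".test.com","name":"bs","value":"x","path":"/","session":true}]`)
	fmt.Println(params, err)
}
```

With the API used in the question, the equivalent would presumably be a cdp.Object with a "cookies" key holding the array, rather than a top-level cdp.Array.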

Launch browser outside docker

Put a launch service inside the docker image ysmood/rod to help launch chrome programmatically via a port.

The service will proxy this port to the real browser devtools port.

Show only iFrame

Good morning, sir,

I would like to make a system that allows the user to solve ReCaptcha himself. I've already taken the iFrame, but I'd like to know if it's possible to display only that (e.g. embed it in an electron app, or just display the captcha alone).

Here is my current code:

frame01 := page.Timeout(3 * time.Minute).ElementX("/html/body/div[3]/div[2]/iframe").Frame()
// Doesn't work
frame01.WindowFullscreen()

Thanks for your help!

Friendly API to hijack requests

Traditionally we can use a dedicated HTTP proxy server to proxy all requests and hijack whatever we want. But when it comes to HTTPS, getting around it is painful. This helper will make it totally transparent for the user to deal with any type of request.

The result will be something like this:

go browser.Hijack("/user/:id", func(ctx rod.ProxyContext) (stop bool) {
    ctx.JSON(my_mock_user)
    return false
})

We will use the Fetch domain. For now, Fetch.takeResponseBodyAsStream can only handle the response as a stream, and WebSocket is not supported.

Does rod support custom request headers?

I saw a Header() function on the launcher, but I don't know how to use it.
I want to add a custom request header, x-lpm-country: us.
Could you add an example for it?
Thank you.

stuck in a `chrome://` page

	// clear everything (cookies/caches/localstorage...) of every domain without restarting the browser.
	u := launcher.New().
		Set("disable-extensions", "true").
		//Set("start-maximized", "true").
		Set("no-sandbox", "true").
		Set("disable-web-security", "true").
		Delete("enable-automation").
		Headless(false).
		Launch()

	browser := rod.New().ControlURL(u).Connect()
	defer browser.Close()

	page := browser.Pages()[0]

	page.Navigate("chrome://settings/privacy")

	wait := page.WaitRequestIdle()
	wait()

	time.Sleep(time.Second)

	// stuck here
	clearBrowsingData := page.Element("#clearBrowsingData")
	clearBrowsingData.Click()

	time.Sleep(time.Second)

	clearBrowsingDataConfirm := page.Element("#clearBrowsingDataConfirm")
	clearBrowsingDataConfirm.Click()

	// page.Screenshot("")

How to use "PageAddScriptToEvaluateOnNewDocument"?

Could you give some example code for "PageAddScriptToEvaluateOnNewDocument"?
I want to change the value of "window.navigator.webdriver".

The next line will cause automation to stop
page.Eval(() => Object.defineProperties(navigator, 'webdriver', {get: () => undefined}))

Can you add a function to add JS script for all browser's pages/tabs

Is your feature request related to a problem? Please describe.
I use this code to fake the browser fingerprint:

proto.PageEnable{}.Call(page)

proto.PageAddScriptToEvaluateOnNewDocument{
	Source: fingerprint,
}.Call(page)

The JS code is from https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth
I don't know why proto.PageEnable is needed, but without it proto.PageAddScriptToEvaluateOnNewDocument does not work.
proto.PageAddScriptToEvaluateOnNewDocument only works on the current page/tab; it does not work on new pages/tabs opened by clicking a link.

Describe the solution you'd like

Can you add a function that adds a JS script to all of the browser's pages/tabs, something like a hook on the browser's pages or on the network?
Thanks.

page.Call('Page.addScriptToEvaluateOnNewDocument',param) is not working

I want to call the addScriptToEvaluateOnNewDocument method when I open a new page, but it is not working:

out := page.Call("Page.addScriptToEvaluateOnNewDocument", cdp.Object{
	"source": "window.navigator.chrome = window.chrome = {...window.chrome, runtime: {},};",
})

How can I do it like puppeteer?

const page = await browser.newPage()

await page.evaluateOnNewDocument(
"window.navigator.chrome = window.chrome = {...window.chrome, runtime: {},};"
);

HandleAuth, alert not closing

Hello, I want to use a proxy with auth, but I can't log in.
When I add the header in Chromium with the proxy, I get an alert asking for the username and password. If I use your HandleAuth method, nothing happens; I still need to click the submit button to authenticate.
How can I do this in code?

P.S. Sorry for my English)

Abstraction for global singleton states

To prevent the race condition of using domains, we need a central hub to handle the status of domains.

Events, mouse, keyboard, etc all belong to this class.

Cannot get element inside Iframe.

I cannot get an element inside an iframe. When the last line of code is invoked, the browser just hangs, though I can see the element in it. When printing FrameID and IsIframe I get an ID and true, so I assume the iframe instance was created. How can I debug this to find the reason?

url := launcher.New().
	Headless(false).
	Launch()

browser := rod.New().ControlURL(url).Connect()
defer browser.Close()

page := browser.Page("test_url")
fr := page.Element("#iframe_selector").Frame()

fmt.Println(fr.FrameID)
fmt.Println(fr.IsIframe())

fmt.Println(fr.Element("#element_selector").Describe().Type)

Can not download chrome at Launch()

Describe the bug
Can not download chrome at Launch()

To Reproduce
Do not set the launcher Bin, so that it downloads Chrome. I am in China, so the download URL is forwarded to taobao.org, but the download does not succeed.

Expected behavior
opening zip archive for reading: creating reader: zip: not a valid zip file[rod/lib/launcher] Download chromium from: https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win/757680/chrome-win.zip [rod/lib/launcher]
I looked at the directory https://npm.taobao.org/mirrors/chromium-browser-snapshots/Win/; it does not have version 757680, the latest is 737027.

Rod Version: v0.26.1

OS: Windows

ScreenshotFullPage doesn't work

Describe the bug

Error when calling ScreenshotFullPage or ScreenshotE with FullPage=true

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1718b81]

goroutine 1 [running]:
github.com/ysmood/rod.(*Page).ScreenshotE(0xc001e16090, 0x18ce801, 0xc0004c9c18, 0x0, 0x0, 0x0, 0x0, 0x0)
	/Users/arturkondas/go/pkg/mod/github.com/ysmood/rod@v0.35.0/page.go:293 +0x281

To Reproduce

This is the current code:

body, err := page.ScreenshotE(true, &proto.PageCaptureScreenshot{})
if err != nil {
	log.Fatal(err)
}

Expected behavior
The action shouldn't fail.

Rod Version: v0.35.0

OS: Mac

Add method to clear field

May I propose an improvement? Could a method to clear a field be added?
Or, if there is a way to do it now, please tell me.

How to drag with rod?

I tried to solve a drag captcha with rod:

The captcha will be loaded in an iframe, I can click buttons in the iframe but cannot drag.

	el := page.Element(`#iframe`)
	iframe := el.WaitVisible().Frame()

	drag := iframe.Element(`#drag`).WaitVisible()

	// copied from Click
	box, err := drag.ScrollIntoView().BoxE()
	if err != nil {
		panic(err)
	}

	x := box.Left + box.Width/2
	y := box.Top + box.Height/2

	err = iframe.Mouse.MoveE(x, y, 1)
	if err != nil {
		panic(err)
	}

	iframe.Mouse.Down(proto.InputMouseButtonLeft)
	iframe.Mouse.Move(1, 0)
	iframe.Mouse.Move(1, 0)
	iframe.Mouse.Move(2, 0)
	...
	iframe.Mouse.Up(proto.InputMouseButtonLeft)

       

It seems that only Down and Up succeed; Move does nothing and returns no error.

Any idea? Thanks!
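One possible cause, not confirmed in this thread: if Mouse.Move takes absolute coordinates, as MoveE(x, y, 1) is used above, then Move(1, 0) sends the pointer to the point (1, 0) instead of nudging it one pixel to the right. Under that assumption, a drag needs absolute waypoints computed from the starting position. The sketch below only computes those waypoints; point and dragPath are hypothetical names, and no Rod calls are made.

```go
package main

import "fmt"

// point is an absolute position in the same coordinate space as the box.
type point struct{ X, Y float64 }

// dragPath returns `steps` absolute waypoints that lead from (x, y) to
// (x+dx, y+dy) in equal increments. The start itself is not included.
func dragPath(x, y, dx, dy float64, steps int) []point {
	if steps < 1 {
		steps = 1
	}
	path := make([]point, 0, steps)
	for i := 1; i <= steps; i++ {
		t := float64(i) / float64(steps)
		path = append(path, point{X: x + dx*t, Y: y + dy*t})
	}
	return path
}

func main() {
	// Drag 40 pixels to the right from (100, 50) in 4 moves.
	for _, p := range dragPath(100, 50, 40, 0, 4) {
		fmt.Printf("move to (%.0f, %.0f)\n", p.X, p.Y)
	}
	// move to (110, 50)
	// move to (120, 50)
	// move to (130, 50)
	// move to (140, 50)
}
```

Each waypoint would then be passed to the move call between Down and Up, starting from the x and y computed from the element's box.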

Support query elements inside a closed shadow DOM

<html>
    <body>
        <p>outside</p>        

        <div id='s'></div>
    </body>
    <script>
        let s = document.querySelector('div').attachShadow({mode: 'closed'})
        let p = document.createElement('p')
        p.innerText = 'inside'
        s.appendChild(p)
    </script>
</html>

Full page screenshot?

Is there a way in which we could get a screenshot of the whole page full size, not a screenshot of the window size?

Sometimes Chrome magically stops responding to RPC

Describe the bug
Sometimes Chrome magically stops responding to RPC. It might be related to this ticket. Not sure if we can get around it in Rod.

To Reproduce
Not yet able to reproduce it reliably.

Expected behavior
Chrome should either crash or always respond to an RPC.

Using TorBrowser

The FAQ states:

Q: Does it support other browsers like Firefox or Edge

Rod should work with any browser that supports Chrome DevTools Protocol. For now, Firefox is supporting this protocol, and Edge will adopt chromium as their backend, so it seems like most major browsers will support it in the future except for Safari.

So, I wonder how to make Rod use other browsers such as Firefox? Or, Tor Browser (based on Firefox).

Why do I need this? I need a different IP for each instance. It is possible to run multiple Tor Browsers and set the proxy, but that requires extra memory (Rod's Chrome + Tor Browser). So it would be nice to run Rod directly on top of Tor Browser.

How should I use a proxy with a username and password?

I found that Chrome doesn't support SOCKS5 proxies with auth, but when I use an HTTP proxy it still does not work.

proxyString:="http://username:password@ip:port"
Set("proxy-server", proxyString) 

If I remove the username and password, the page shows an alert asking for them. Is there a simple way to use a proxy with auth?
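The behavior described here is consistent with Chrome not using credentials embedded in the proxy-server value, though that is an assumption rather than something confirmed in this thread. If so, one approach is to split the proxy URL: pass only scheme, host and port to proxy-server, and supply the username and password separately, for example through an auth handler such as the HandleAuth method mentioned in another issue above. A minimal sketch of the split, where splitProxy is a hypothetical helper and the address is a placeholder:

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// splitProxy separates a proxy URL into the server part (scheme://host:port)
// and the credentials embedded in it, if any.
func splitProxy(raw string) (server, user, pass string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", "", "", errors.New("proxy URL needs a scheme and a host")
	}
	if u.User != nil {
		user = u.User.Username()
		pass, _ = u.User.Password()
	}
	return u.Scheme + "://" + u.Host, user, pass, nil
}

func main() {
	server, user, pass, err := splitProxy("http://alice:secret@127.0.0.1:8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(server) // http://127.0.0.1:8080
	fmt.Println(user)   // alice
	fmt.Println(pass)   // secret
}
```

The server value is what would go into Set("proxy-server", ...), while user and pass are kept for answering the authentication prompt.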
