rct567 / domquery

PHP library for easy 'jQuery like' DOM traversing and manipulation.

License: MIT License

DomQuery

DomQuery is a PHP library that allows you to easily traverse and modify the DOM (HTML/XML). As a library it aims to provide 'jQuery like' access to the PHP DOMDocument class (http://php.net/manual/en/book.dom.php).

Installation

Install the latest version with

$ composer require rct567/dom-query

Basic Usage

Read attributes and properties:

use Rct567\DomQuery\DomQuery;

$dom = new DomQuery('<div><h1 class="title">Hello</h1></div>');

echo $dom->find('h1')->text(); // output: Hello
echo $dom->find('div')->prop('outerHTML'); // output: <div><h1 class="title">Hello</h1></div>
echo $dom->find('div')->html(); // output: <h1 class="title">Hello</h1>
echo $dom->find('div > h1')->class; // output: title
echo $dom->find('div > h1')->attr('class'); // output: title
echo $dom->find('div > h1')->prop('tagName'); // output: h1
echo $dom->find('div')->children('h1')->prop('tagName'); // output: h1
echo (string) $dom->find('div > h1'); // output: <h1 class="title">Hello</h1>
echo count($dom->find('div, h1')); // output: 2

Traversing nodes (result set):

use Rct567\DomQuery\DomQuery;

$dom = new DomQuery('<a>1</a> <a>2</a> <a>3</a>');
$links = $dom->children('a');

foreach($links as $elm) {
    echo $elm->text(); // output 123
}

echo $links[0]->text(); // output 1
echo $links->last()->text(); // output 3
echo $links->first()->next()->text(); // output 2
echo $links->last()->prev()->text(); // output 2
echo $links->get(0)->textContent; // output 1
echo $links->get(-1)->textContent; // output 3

Factory method (create instance alternative):

use Rct567\DomQuery\DomQuery;

echo DomQuery::create('<a title="hello"></a>')->attr('title'); // output: hello

jQuery methods available

Traversing > Tree Traversal

  • .find( selector )
  • .children( [selector] )
  • .parent( [selector] )
  • .closest( [selector] )
  • .next( [selector] )
  • .prev( [selector] )
  • .nextAll( [selector] )
  • .prevAll( [selector] )
  • .nextUntil( [selector] )
  • .prevUntil( [selector] )
  • .siblings( [selector] )

Traversing > Miscellaneous Traversing

  • .contents() get children including text nodes
  • .add( selector, [context] ) new result with added elements that match selector
  • .addBack()

Traversing > Filtering

  • .is( selector )
  • .filter ( selector ) reduce to those that match the selector
  • .not( selector ) remove elements from the set of matched elements
  • .has( selector ) reduce to those that have a descendant that matches the selector
  • .first( [selector] )
  • .last( [selector] )
  • .slice( [offset] [, length]) like array_slice in php, not js/jquery
  • .eq( index )
  • .map( callable(elm,i) )

* [selector] can be a css selector or an instance of DomQuery|DOMNodeList|DOMNode
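
The .slice() entry above notes that it follows PHP's array_slice rather than JavaScript's slice. As a reminder of what that implies for the second argument, here is a minimal sketch using only the built-in array_slice on an ordinary array (it does not call DomQuery, and the variable names are illustrative):

```php
<?php
// Plain-PHP illustration of array_slice(offset, length) semantics.
$items = ['a', 'b', 'c', 'd', 'e'];

// Offset only: everything from index 1 onwards.
$from_second = array_slice($items, 1);        // ['b', 'c', 'd', 'e']

// Second argument is a LENGTH, not an end index.
// JavaScript's ['a','b','c','d','e'].slice(1, 2) would give ['b'].
$two_from_second = array_slice($items, 1, 2); // ['b', 'c']

// A negative offset counts from the end.
$last_two = array_slice($items, -2);          // ['d', 'e']
```

So a call written with jQuery habits, where the second number is an end position, would select a different range than expected.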

Manipulation > DOM Insertion & removal

  • .text( [text] )
  • .html( [html_string] )
  • .append( [content],... )
  • .prepend( [content],... )
  • .after( [content],... )
  • .before( [content],... )
  • .appendTo( [target] )
  • .prependTo( [target] )
  • .replaceWith( [content] )
  • .wrap( [content] )
  • .wrapAll( [content] )
  • .wrapInner( [content] )
  • .remove( [selector] )

* [content] can be html or an instance of DomQuery|DOMNodeList|DOMNode

Attributes | Manipulation

  • .attr( name [, val] )
  • .prop( name [, val] )
  • .css( name [, val] )
  • .removeAttr( name )
  • .addClass( name )
  • .hasClass( name )
  • .toggleClass ( name )
  • .removeClass( [name] )

* addClass, removeClass, toggleClass and removeAttr also accept an array or space-separated names

Miscellaneous > DOM Element Methods | Traversing | Storage

  • .get( index )
  • .each ( callable(elm,i) )
  • .data ( key [, val] )
  • .removeData ( [name] )
  • .index ( [selector] )
  • .toArray()
  • .clone()

Supported selectors

  • .class
  • #foo
  • parent > child
  • foo, bar multiple selectors
  • prev + next elements matching "next" that are immediately preceded by a sibling "prev"
  • prev ~ siblings elements matching "siblings" that are preceded by "prev"
  • * all selector
  • [name="foo"] attribute value equal foo
  • [name*="foo"] attribute value contains foo
  • [name~="foo"] attribute value contains word foo
  • [name^="foo"] attribute value starts with foo
  • [name$="foo"] attribute value ends with foo
  • [name|="foo"] attribute value equal to foo, or starting foo followed by a hyphen (-)
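
To make the attribute selectors above more concrete, the following sketch shows XPath expressions with the same meaning for three of them, run through PHP's built-in DOMDocument and DOMXPath. These expressions are hand-written equivalents for illustration; they are an assumption and not necessarily what DomQuery generates internally.

```php
<?php
// Three inputs whose name attribute relates to "foo" in different ways.
$doc = new DOMDocument();
$doc->loadHTML('<div><input name="foo"><input name="foobar"><input name="barfoo"></div>');
$xpath = new DOMXPath($doc);

// [name="foo"]  : attribute value equals foo
$equals = $xpath->query('//*[@name="foo"]')->length;               // 1

// [name*="foo"] : attribute value contains foo
$contains = $xpath->query('//*[contains(@name, "foo")]')->length;  // 3

// [name^="foo"] : attribute value starts with foo
$starts = $xpath->query('//*[starts-with(@name, "foo")]')->length; // 2
```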

Pseudo selectors

  • :empty
  • :even
  • :odd
  • :first-child
  • :last-child
  • :only-child
  • :nth-child(n)
  • :parent elements that have at least one child node
  • :first
  • :last
  • :header selects h1, h2, h3 etc.
  • :not(foo) elements that do not match selector foo
  • :has(foo) elements containing at least one element that matches foo selector
  • :contains(foo) elements that contain text foo
  • :root element that is the root of the document

Other (non jQuery) methods

  • findOrFail( selector ) find descendants of each element in the current set of matched elements, or throw an exception
  • loadContent(content, encoding='UTF-8') load html/xml content
  • xpath(xpath_query) Use xpath to find descendants of each element in the current set of matched elements
  • getOuterHtml() get resulting html describing all the elements (same as (string) $dom, or $elm->prop('outerHTML'))

XML support

  • XML content will automatically be loaded 'as XML' if an XML declaration is found (property xml_mode will be set to true)
  • This in turn will also make saving (rendering) happen 'as XML'. You can set property xml_mode to false to prevent this.
  • To prevent content with an XML declaration from loading 'as XML', you can set property xml_mode to false and then use the loadContent($content) method.
  • Namespaces are automatically registered (no need to do it manually)

Escaping meta chars in selector to find elements with namespace:

$dom->find('namespace\\:h1')->text();
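
A note on the escaping in the line above: inside a PHP string literal, the two backslashes collapse into one, so the selector text that is actually passed along contains a single backslash before the colon. A small plain-PHP check (no DomQuery involved; variable names are illustrative):

```php
<?php
// In a single-quoted string, '\\' is an escaped backslash.
$selector = 'namespace\\:h1';

$length = strlen($selector);                       // 13 characters, one backslash
$same_as_single = ($selector === 'namespace\:h1'); // true: '\:' is not an escape
$backslashes = substr_count($selector, '\\');      // 1
```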

About

Requirements

  • Works with PHP 7.2 or above (try v0.8 for older PHP versions)
  • Requires libxml PHP extension (enabled by default)

Inspiration/acknowledgements

Contributors

antalaron, jago86, joshua-graham-adelphi, rct567

domquery's Issues

PHP 8 Support

Are you able to release a version that supports PHP 8?

getAttribute on src of img returns base64 (not your fault at all!)

love this library!

46 stars? what a joke. This is the best shit i've come across in a while

following question:

if I do

/** @var \DOMElement $el */
echo $el->getAttribute('src');

I get:


The HTML is though:

<img src="./mysystem/pictures for the system/AMV/image3.jpg">
what I want?

I want to get

./mysystem/pictures for the system/AMV/image3.jpg returned

$element->each() -> create new DomQuery -> children() -> wrong results.

use Rct567\DomQuery\DomQuery;
$dom = new DomQuery('<div><p class="myclass"><span>My words</span></p></div>');

var_dump((string) $dom->find('p')->children());
// Dumps: '<span>My words</span>' - CORRECT.

$dom->find('p')->each(function($node) use ($dom) {
  var_dump((string) DomQuery::create($node)->children());
  // Dumps: '<div><p class="myclass"><span>My words</span></p></div>' - WRONG.
 
  // Workaround.
  var_dump((string) $dom->find($node)->children());
  // Dumps: '<span>My words</span>' - CORRECT.
});

Select Class Error

I am trying to select an element by class but get an error. My code:

require '../../Phper/Slcquery/vendor/autoload.php';
use Rct567\DomQuery\DomQuery;

$resp = "<div class='Test'> <a> Hello World </a> </div>";

$dom = new DomQuery("<div>".$resp."</div>");

$links = $dom->find('.Test');

echo $links->html();

Fatal error: Uncaught Exception: Expression //.Test is malformed. in D:\Xampp\htdocs
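
The message shows that the selector .Test reached the XPath engine untranslated, and //.Test is not valid XPath. For comparison, here is a sketch of a valid XPath class test run directly with PHP's DOMXPath. The expression follows the class-matching pattern visible in the error message of the "Odd new issue" report in this list; treating it as what the library is meant to produce is an assumption.

```php
<?php
// Same markup as in the report, parsed with the built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML("<div><div class='Test'> <a> Hello World </a> </div></div>");
$xpath = new DOMXPath($doc);

// A class selector has to be spelled out as an attribute test in XPath 1.0.
$expr = "//*[contains(concat(' ', normalize-space(@class), ' '), ' Test ')]";
$matches = $xpath->query($expr);

$count = $matches->length;                    // 1
$text = trim($matches->item(0)->textContent); // "Hello World"
```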

Latest version on Composer

This is great now that I have the latest Git clone, but I spent hours trying to get things to work before eventually seeing that I was using an old version after installing with Composer.

Exception - Class 'DomQuery' not found

Hi,
I have installed this using composer like every other project but on the first line itself I got this error,

Exception - Class 'DomQuery' not found

I guess some namespace needs to be defined ?

Thanks,

Creation of dynamic property Rct567\DomQuery\DomQuery::$dom_xpath is deprecated

Creation of dynamic property Rct567\DomQuery\DomQuery::$dom_xpath is deprecated in vendor/rct567/dom-query/src/Rct567/DomQuery/DomQueryNodes.php on line 171.

Hello, thanks for the library. I get this error when running PHP 8.2, where dynamic properties are deprecated (https://php.watch/versions/8.2/dynamic-properties-deprecated).

To trigger the error:

$dom = new DomQuery('<div><h1 class="title">Hello</h1></div>');
$dom->find('randomtag')->html();
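
For context, PHP 8.2 raises this deprecation when code assigns to a property that the class never declared. The usual remedy is to declare the property. A minimal sketch with a hypothetical class (NodesWithDeclaredProperty and createXpath are illustrative names, not the library's actual code):

```php
<?php
// Hypothetical stand-in class; it only mirrors the property name from the report.
class NodesWithDeclaredProperty
{
    /** @var DOMXPath|null Declared up front, so assigning it is not "dynamic". */
    public $dom_xpath = null;

    public function createXpath(DOMDocument $document): DOMXPath
    {
        // Assigning to a declared property does not trigger the 8.2 deprecation.
        $this->dom_xpath = new DOMXPath($document);
        return $this->dom_xpath;
    }
}

$nodes = new NodesWithDeclaredProperty();
$xpath = $nodes->createXpath(new DOMDocument());
$is_declared = property_exists(NodesWithDeclaredProperty::class, 'dom_xpath'); // true
```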

Composer

Greetings,

Packagist/Composer lists your library as "Abandoned".

See Packagist.

And the latest version composer installs is 0.8 (OLD).

Can you please update your package on composer/packagist?

Also, in your composer.json file the PHP requirements should use a double bar ( ie. || ) to represent OR. The single bar was deprecated many years ago.

Regards,
Wyk

not method not working well

Hello,

The not() method does not work if no node matches the selector.

For Example:

$html2 = "<div>".PHP_EOL;
$html2 .= "  <div>".PHP_EOL;
$html2 .= "  </div>".PHP_EOL;
$html2 .= "</div>".PHP_EOL;

$html3 = "<div>".PHP_EOL;
$html3 .= "</div>".PHP_EOL;

$testDom2 = new DomQuery($html2);
$testDom3 = new DomQuery($html3);

$testDom2->find('div')->not('div div')->append("<span>test</span>");
$testDom3->find('div')->not('div div')->append("<span>test</span>");

this will result in:

<div>
  <div>
  </div>
<span>test</span>
</div>


<div>
</div>

but in my opinion the right result would be:

<div>
  <div>
  </div>
<span>test</span>
</div>


<div>
<span>test</span>
</div>

A possible fix would be to add all nodes to the result list if none has to be excluded:

public function not($selector)
{
    //...
    if ($selection->length > 0) {
        //...
    } else {
        // nothing matched the selector, so nothing is excluded: keep every node
        foreach ($this->nodes as $node) {
            $result->addDomNode($node);
        }
    }
    //...
}
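
The behaviour being asked for can be modelled on plain arrays: excluding an empty set should return the original set unchanged. The sketch below is only a model of that expectation (excludeNodes is a hypothetical helper working on strings, not DomQuery code):

```php
<?php
/**
 * Return $nodes without any entry that also appears in $excluded.
 * Hypothetical helper used to model what not() is expected to do.
 */
function excludeNodes(array $nodes, array $excluded): array
{
    return array_values(array_filter($nodes, function ($node) use ($excluded) {
        return !in_array($node, $excluded, true);
    }));
}

$all = ['outer-div', 'inner-div'];

// Something matches the exclusion: only the rest remains.
$without_inner = excludeNodes($all, ['inner-div']); // ['outer-div']

// Nothing matches the exclusion: the full set should survive.
$nothing_excluded = excludeNodes(['outer-div'], []); // ['outer-div']
```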

Odd new issue: "Uncaught Exception"

I've been using DomQuery for some time, and recently (I'm not sure how recently) this exception reared its head. It looks like it has to do with the CSS selector to XPath conversion. The original selector was: "ul.specifications li:nth-child(1)"

It seems to happen on every text I submit as a CSS selector.

PHP Fatal error: Uncaught Exception: Expression //ul[contains(concat(' ', normalize-space(@class), ' '), ' specifications ')]/li:nth-child(1) is malformed. in /var/www/ig/common/ext/vendor/rct567/dom-query/src/Rct567/DomQuery/DomQueryNodes.php:661

Thank you!
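
In the expression from the error message, the class part has been converted to XPath but :nth-child(1) has been left as CSS, which is why the XPath engine rejects it. For reference, a hand-written XPath with the same meaning as the original selector can be checked with PHP's DOMXPath. This is a sketch of one possible equivalent and an assumption about the intended result, not the library's own output; the sample markup is invented.

```php
<?php
// Sample markup invented for this illustration.
$doc = new DOMDocument();
$doc->loadHTML('<ul class="specifications"><li>one</li><li>two</li></ul>');
$xpath = new DOMXPath($doc);

// "ul.specifications li:nth-child(1)": any li below the ul that is
// the first element child of its parent (no preceding element siblings).
$expr = "//ul[contains(concat(' ', normalize-space(@class), ' '), ' specifications ')]"
      . "//li[count(preceding-sibling::*) = 0]";
$matches = $xpath->query($expr);

$count = $matches->length;              // 1
$first = $matches->item(0)->textContent; // "one"
```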

replaceWith

replaceWith works with a DOMNode but does not work with a DomQuery element.

Return wrong html

I have html with script inline:

<script>
{
    const btn = document.querySelector("#test3");
    btn.addEventListener("click", e => {
        document.querySelector("#test3div").innerHTML = `
            <div class="module">
                <h2>lorem ipsum</h2>
                <p>lorem <strong>lorem </strong></p>
                <button class="button">klik</button>
            </div>
        `;
    });
}
</script>

I use your class:

function generateContentTable($content) {
    $dom = new DomQuery($content);
    return $dom->find('html')->html();
}

where $content is the full HTML. This generates the wrong HTML:

<script>
{
    const btn = document.querySelector("#test3");
    btn.addEventListener("click", e => {
        document.querySelector("#test3div").innerHTML = `
            <div class="module">
                <h2>lorem ipsum
                <p>lorem <strong>lorem </strong>
                <button class="button">klik
            </script>
</div>
        `;
    });
}

UTF-8 Encoding not applied if content contains inline svg graphics

When scraping a website that contains inline SVG graphics, loadContent() fails to apply the correct encoding: the <?xml version="1.0" encoding="UTF-8"?> declaration of the inlined graphic prevents the encoding header from being added in

if (!$this->xml_mode && $encoding && stripos($content, '<?xml') === false) {

a quick fix would be to just rely on the already set $this->xml_print_pi property:

        if (!$this->xml_print_pi && $encoding) {
            $content = '<?xml encoding="'.$encoding.'">'.$content; // add pi node to make libxml use the correct encoding
            $xml_pi_node_added = true;
        }
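
The effect of the original condition can be reproduced with plain string functions. In the sketch below the variables stand in for the properties and arguments named in the report, the sample content is invented, and the position-based check at the end is only an illustrative alternative, not something taken from the library:

```php
<?php
// Stand-ins for $this->xml_mode and the $encoding argument.
$xml_mode = false;
$encoding = 'UTF-8';

// HTML content with an XML declaration buried inside an inline SVG.
$content = '<div><?xml version="1.0" encoding="UTF-8"?><svg></svg></div>';

// The condition from the report: any "<?xml" anywhere blocks the encoding header.
$adds_encoding_header = !$xml_mode && $encoding && stripos($content, '<?xml') === false; // false

// Illustrative alternative: only a declaration at the very start counts.
$declaration_at_start = stripos(ltrim($content), '<?xml') === 0; // false
```

With this content the first check skips the encoding header even though the document itself does not begin with an XML declaration.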

->each returns DOMElement instead of DomQuery

is that on purpose?

e.g. I cannot do

$dq->find('select[name="myname"]')->each(function ($el, $i) {
    /** @var \DOMElement $el */
    $linktag = $el->parent(); // fails: parent() doesn't exist on DOMElement
});

Support PHP 5.6?

Your work is awesome. Thanks!

One thing I want to ask.

Could you make DomQuery run on PHP 5.6?

I use it to build a Joomla extension, and users are using PHP 5.6 a lot.
I can make a PR if you want; there are just a few syntax errors.

white space causes find to malfunction

problem:

$this->dq->find('[data-id="e1eaziw"]')->eq(1);

works fine, returns the 2nd instance of that data-id attribute

$this->dq->find('[data-id="e1eaziw "]')->eq(1);

will return first instance.

looks like this bug is generally around...

AppendTo doesn't work properly

$html = new DomQuery('<div class="container">');
$div = DomQuery::create('<div id="el1">')->appendTo($html);
DomQuery::create('<div id="el2">')->appendTo($div);
echo $html;

Expected:

<div class="container"><div id="el1"><div id="el2"></div></div></div>

Got:

<div class="container"><div id="el1"></div><div id="el2"></div></div>

JS function works as expected:

var html = $('<div class="container">');
var div = $('<div id="el1">').appendTo(html);
$('<div id="el2">').appendTo(div);
console.log(html[0].outerHTML);

Fatal error when selector matches a php function name

In the HTML I needed to query for a <header> element. This results in a fatal exception: Fatal error: Uncaught TypeError: header(): Argument #1 ($header) must be of type string, DOMElement given

In filter(), not() and is() the passed selector is checked with is_callable($selector). Because a native PHP function header() exists, this check returns true and breaks the code.

It would probably make sense to guard against this by making sure the selector is not a string before checking is_callable: if (!is_string($selector) && \is_callable($selector)). This still allows passing an actual function or closure while preventing matches with existing built-in functions.
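
The collision and the proposed guard can be checked in plain PHP without the library. The variable names below are illustrative:

```php
<?php
// A selector string that happens to share its name with a built-in function.
$selector = 'header';

// Unguarded check: true, because header() exists as a PHP function.
$naive = is_callable($selector);

// Guarded check from the proposal: strings are never treated as callbacks.
$guarded = !is_string($selector) && is_callable($selector); // false

// A real closure still passes the guarded check.
$closure = function ($elm) {
    return true;
};
$guarded_closure = !is_string($closure) && is_callable($closure); // true
```

One consequence of this guard is that a callback can no longer be given as a function-name string; closures and array callables are unaffected.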

Question

Hi! How can I parse an HTML file with PHP code inside?
Is it possible to do this?
$dom = new DomQuery('<div><h1 class="title"><?php echo "Hello";?></h1></div>');
Thanks

clone() does not work with is()

use Rct567\DomQuery\DomQuery;
$dom = new DomQuery('<div><p class="myclass">My words</p></div>');
$dom_clone = $dom->clone();
$is_has_the_class = $dom_clone->find('.myclass')->first()->is('.myclass');
// Expected: TRUE, but is FALSE

$dom = new DomQuery('<div><p class="myclass">My words</p></div>');
$dom_clone = new DomQuery((string) $dom); // re-parse the rendered HTML instead of cloning
$is_has_the_class = $dom_clone->find('.myclass')->first()->is('.myclass');
// Expected: TRUE, correct.
