scotteh / php-dom-wrapper

Simple DOM wrapper library to manipulate and traverse HTML documents similar to jQuery

License: BSD 3-Clause "New" or "Revised" License

PHP 100.00%
dom-wrapper-library traversal php php-dom-wrapper traverse-html-documents manipulation dom html parser autoloader

php-dom-wrapper's People

Contributors

dependabot-preview[bot], dependabot-support, iamyukihiro, karneds, matthijskooijman, monsieurv, peter279k, rafwell, scotteh, shaneiseminger, wadjei

php-dom-wrapper's Issues

Your Tests are lying ;)

Hi,

While researching active uses of PHPUnit's assertEqualXMLStructure, I stumbled over the following use in your tests:

$this->assertEqualXMLStructure($expected->children()->first(), $doc->children()->first(), true);

There are more uses, but I just reference this as an explicit example.
This test does not test what you expect it to, because assertEqualXMLStructure does not look at attribute values: the assertion only checks that an attribute exists, not what it contains.

Stripped down example, not using your actual code:

<?php

class FooTest extends PHPUnit\Framework\TestCase {

    public function testBar() {
        $expected = new DOMDocument();
        $expected->loadXML('<div class="example foo bar test" />');

        $actual = new DOMDocument();
        $actual->loadXML('<div class="example test" />');

        $this->assertEqualXMLStructure($expected->documentElement, $actual->documentElement, true);
    }
}

Result: PHPUnit reports a warning because the method is deprecated, but all (partly implicit) assertions pass:

theseer@nyda /tmp/x5 $ ~/storage/php/phpunit/phpunit/phpunit .
PHPUnit 9.5-g85d4c053a by Sebastian Bergmann and contributors.

W                                                                   1 / 1 (100%)

Time: 00:00.066, Memory: 4.00 MB

There was 1 warning:

1) FooTest::testBar
assertEqualXMLStructure() is deprecated and will be removed in PHPUnit 10.

WARNINGS!
Tests: 1, Assertions: 3, Warnings: 1.
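
For comparison, here is a minimal sketch of a check that does take attribute values into account, using only PHP's built-in DOM extension rather than PHPUnit. It canonicalizes both elements with DOMNode::C14N() and compares the resulting strings. The helper name sameXml is made up for illustration; this is one possible approach, not what the test suite has to use.

```php
<?php
// Hypothetical helper: two elements are "the same" when their canonical XML
// (C14N) serializations are identical, which includes attribute values.
function sameXml(DOMElement $expected, DOMElement $actual): bool
{
    return $expected->C14N() === $actual->C14N();
}

$expected = new DOMDocument();
$expected->loadXML('<div class="example foo bar test" />');

$actual = new DOMDocument();
$actual->loadXML('<div class="example test" />');

// The class attribute values differ, so this comparison fails as it should.
var_dump(sameXml($expected->documentElement, $actual->documentElement));

// Comparing an element with itself succeeds.
var_dump(sameXml($expected->documentElement, $expected->documentElement));
```

In a PHPUnit test the same idea could be wrapped in assertTrue() or assertSame() on the two canonical strings.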

children() (and thus html()) does not return (direct child) textnodes

I was working with a few elements, trying to figure out if the element was empty, or creating a new element and copying all contents from the old element into it. For this, I used html(), e.g.

if ($e->html() == '')
    do_stuff...

and

$new_e->html($e->html());

I found that this worked fine for elements containing other elements (e.g. <a><img/></a>), but not for elements containing (only) text (e.g. <a>foo</a>). In the latter case, html() would return the empty string. Text contained inside child nodes was returned properly; only text directly below the examined element was missing.

I dug into this and found that this is because html() iterates over children() and generates HTML for each child, but children() does not return text nodes, only elements. I traced this to this line:

$node->findXPath('child::*')

The * node test used there matches all elements, which does not include text nodes. Changing this line to:

$node->findXPath('child::node()')

fixes this and lets children() (and thus also html()) also return text nodes.

All this was tested using version 0.6.3, but for lack of a ready PHP7.1 installation, I was not able to test with the latest master, or prepare a (tested) pull request. However, the relevant code is identical, so I assume the same applies to master as well.

Should this be fixed? Or is this behaviour intentional?
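
The difference between the two XPath expressions can be reproduced without the library. The sketch below uses PHP's built-in DOMXPath on a small made-up element that mixes text and element children.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<a>foo<img/>bar</a>');

$xpath = new DOMXPath($doc);
$a = $doc->documentElement;

// child::* selects element children only.
$elementsOnly = $xpath->query('child::*', $a);

// child::node() selects every child node, text nodes included.
$allNodes = $xpath->query('child::node()', $a);

echo $elementsOnly->length, "\n"; // 1 (the <img/> element)
echo $allNodes->length, "\n";     // 3 (text "foo", <img/>, text "bar")
```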

Fatal Error - Interrupting execution

Hi, I'm getting this error. I don't know whether this is a library error or my own, but I think it's library related.
I'm using try/catch blocks, but it still interrupts execution.
Any ideas why?

Composer.json require line: "scotteh/php-dom-wrapper": "^1.1",

PHP Fatal error: Uncaught Error: Call to a member function contents() on null in /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php:694
Stack trace:
#0 /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php(723): DOMWrap\NodeList->getHtml()
#1 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(142): DOMWrap\NodeList->html()
#2 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(278): Site->singlePost('http://www.kadi...')
#3 /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/NodeList.php(161): Site->{closure}(Object(DOMWrap\Element))
#4 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(279): DOMWrap\NodeList->map(Object(Closure))
#5 /Applications/MAMP/htdocs/grawler/grawler.php(34): Site->pagination()
#6 {main}
thrown in /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php on line 694

Shortened code:

private function xPathOrCss($exp)
{
    return (($exp[0] == "/" or $exp[0] == "(") ? "findXPath" : "find");
}

public function pagination()
{
    try
    {
        $this->post->setSourceUrl($url);
        $raw = $this->curl->get($this->post->getSourceUrl());
        if ($this->curl->error or empty($raw) or $raw == false)
        {
            msg_error("Socket error on " . __CLASS__ . "/" . __METHOD__ . "():" . __LINE__ . " {$this->post->getSourceUrl()} with code {$this->curl->errorCode} and '{$this->curl->errorMessage}' message");
            return false;
        }
        $doc = new Document();
        $doc->html($raw);
        $content = "";
        $contentSelector = conf("selectors", "selector_content");
        if (!empty($contentSelector))
        {
            $contentSelector = $doc->{$this->xPathOrCss($contentSelector)}($contentSelector);
            // Site.php line 142 (see stack trace):
            if (!is_null($contentSelector) and method_exists($contentSelector, "html") and !empty($contentSelector->html())) $content = $contentSelector->html();
        }
        $this->post->setContent($content);
        unset($content);
    }
    catch (Exception $e)
    {
        msg_error("Single Post Exception : " . $e->getMessage());
    }
}
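
A likely reason the try/catch does not help, independent of the library: since PHP 7, calling a method on null throws an Error, and Error is not a subclass of Exception, so catch (Exception $e) never sees it. Catching Throwable covers both. The sketch below uses made-up function names to show the difference.

```php
<?php
function callOnNull(): string
{
    $node = null;
    // Throws Error: "Call to a member function contents() on null"
    return $node->contents();
}

function tryToCatch(): string
{
    try {
        try {
            return callOnNull();
        } catch (Exception $e) {
            // Never reached: Error does not extend Exception.
            return 'caught as Exception';
        }
    } catch (Throwable $e) {
        // Throwable is the common parent of Exception and Error.
        return 'caught as ' . get_class($e);
    }
}

echo tryToCatch(), "\n"; // caught as Error
```

In the code above, changing the catch clause to catch (\Throwable $e) should at least keep the script running, although the empty selection that leads to the null is still worth guarding against.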

disable attribute encoding

Hi,
I am trying to set the href attribute to {{modelItem.finalUrl}}, as a variable for a PHP template.
The library automatically encodes the { characters. Is there any way to disable this?

$element->attr($attr, "{{modelItem.$bindValue}}");
result:
href="%7B%7BmodelItem.finalUrl%7D%7D"
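
One blunt workaround, assuming the percent-encoding is applied when the document is serialized (an assumption about the cause, not something confirmed in this thread), is to restore the template delimiters in the output string. restoreMustache is a hypothetical helper name.

```php
<?php
// Hypothetical helper: turn percent-encoded "{{" and "}}" back into
// template delimiters in an already serialized HTML string.
function restoreMustache(string $html): string
{
    return str_ireplace(['%7B%7B', '%7D%7D'], ['{{', '}}'], $html);
}

$serialized = '<a href="%7B%7BmodelItem.finalUrl%7D%7D">link</a>';

echo restoreMustache($serialized), "\n";
// <a href="{{modelItem.finalUrl}}">link</a>
```

This touches every occurrence in the string, so it is only safe when the encoded sequences cannot legitimately appear elsewhere in the markup.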

Suitability question: are relative selectors supported?

use DOMWrap\Document;

$html = <<<HTML
<ul>
    <li>
        <div>
            <p>Outer 1</p>
            <ul>
                <li>Inner</li>
            </ul>
        </div>
    </li>
    <li>Outer 2</li>
    <li>Outer 3</li>
</ul>
HTML;

$doc = new Document();
$doc->html($html);

// Accepts this selector, or errors out?
$nodes = $doc->find('> li');

// Or maybe this one, which is more standard?
$node = $doc->find(':scope > li');

// Returns '3' or '4'?
var_dump($nodes->count());
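
For reference, the two possible answers correspond to a child query and a descendant query. The sketch below shows both with PHP's built-in DOMXPath on the same markup; it says nothing about which CSS forms find() accepts.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<ul><li><div><p>Outer 1</p><ul><li>Inner</li></ul></div></li>'
    . '<li>Outer 2</li><li>Outer 3</li></ul>'
);

$xpath = new DOMXPath($doc);
$root = $doc->documentElement; // the outer <ul>

// Relative to the context node: direct <li> children only.
$direct = $xpath->query('./li', $root)->length;

// Relative to the context node: all <li> descendants.
$all = $xpath->query('.//li', $root)->length;

echo $direct, "\n"; // 3
echo $all, "\n";    // 4
```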

PHP 8.1 "Deprecated" errors

Hello, when running with php 8.1 we get deprecations reported:

Deprecated: Return type of DOMWrap\Collections\NodeCollection::offsetGet($offset) should either be compatible with ArrayAccess::offsetGet(mixed $offset): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 56

Deprecated: Return type of DOMWrap\Collections\NodeCollection::current() should either be compatible with Iterator::current(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 126

Deprecated: Return type of DOMWrap\Collections\NodeCollection::next() should either be compatible with Iterator::next(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 146

Deprecated: Return type of DOMWrap\Collections\NodeCollection::key() should either be compatible with Iterator::key(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 136

Deprecated: Return type of DOMWrap\Collections\NodeCollection::rewind() should either be compatible with Iterator::rewind(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 156

Are there any plans to fix this?
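
The messages name two ways out: declare return types compatible with the interfaces, or add the #[\ReturnTypeWillChange] attribute. A minimal sketch combining both on a made-up class (TinyCollection is not part of the library): methods whose interface return type is void or bool get a real return type, and those returning mixed, which only exists from PHP 8.0, get the attribute so the code still parses on PHP 7.

```php
<?php
final class TinyCollection implements ArrayAccess, Iterator
{
    private $items;
    private $position = 0;

    public function __construct(array $items)
    {
        $this->items = array_values($items);
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = $value;
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
        $this->items = array_values($this->items); // keep keys contiguous
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->items[$this->position];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->position;
    }

    public function next(): void
    {
        $this->position++;
    }

    public function rewind(): void
    {
        $this->position = 0;
    }

    public function valid(): bool
    {
        return isset($this->items[$this->position]);
    }
}

$collection = new TinyCollection(['a', 'b']);
foreach ($collection as $index => $item) {
    echo $index, ': ', $item, "\n";
}
```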

Deprecated: mb_convert_encoding

On PHP 8.2 and later, the library produces this deprecation warning:

mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead
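
The message itself points at mb_encode_numericentity as a replacement. A minimal sketch of that approach, assuming the input is already UTF-8 and the goal is to turn non-ASCII characters into numeric entities before handing the markup to the DOM parser (toNumericEntities is a made-up name; the mbstring extension is required):

```php
<?php
// Encode every code point from U+0080 upward as a decimal numeric entity.
// ASCII, including markup characters such as < and >, is left untouched.
function toNumericEntities(string $utf8): string
{
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo toNumericEntities("<p>caf\u{E9}</p>"), "\n"; // <p>caf&#233;</p>
```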

Using replaceWith across Documents?

Hi,

Is it possible to use replaceWith and other similar methods across Documents? Are there any potential hiccups?

I have some code I'd like to insert which would be far easier to write and insert as HTML than to build each element and set all attributes independently. I'm wondering if I can load it in a separate Document and then transfer it, or is there an easier way to do this?

Thanks!
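
At the level of PHP's built-in DOM classes, a node belongs to the document that created it, and moving it across documents goes through importNode(). The sketch below shows that step with plain DOMDocument objects; whether replaceWith performs the import internally is not confirmed here.

```php
<?php
$target = new DOMDocument();
$target->loadXML('<body><p id="old">old</p></body>');

$source = new DOMDocument();
$source->loadXML('<section class="new"><h2>Title</h2></section>');

// importNode() creates a copy owned by $target; true requests a deep copy.
$imported = $target->importNode($source->documentElement, true);

// Swap the old paragraph for the imported subtree.
$old = $target->getElementsByTagName('p')->item(0);
$old->parentNode->replaceChild($imported, $old);

echo $target->saveXML($target->documentElement), "\n";
// <body><section class="new"><h2>Title</h2></section></body>
```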

mb_convert_encoding(): Unable to detect character encoding

Hi, I get this error for some URLs at this line:

$html = mb_convert_encoding($html, 'UTF-8', $charset);

Here is the stack trace:

ErrorException: mb_convert_encoding(): Unable to detect character encoding in /home3/vendor/scotteh/php-dom-wrapper/src/Document.php:107
Stack trace:
#0 [internal function]: Illuminate\Foundation\Bootstrap\HandleExceptions->handleError(2, 'mb_convert_enco...', '/home3/onir12/f...', 107, Array)
#1 /home3/vendor/scotteh/php-dom-wrapper/src/Document.php(107): mb_convert_encoding('\n<!DOCTYPE html...', 'UTF-8', 'auto')
#2 /home3/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php(680): DOMWrap\Document->setHtml('\n<!DOCTYPE html...')
#3 /home3/vendor/scotteh/php-goose/src/Crawler.php(84): DOMWrap\Document->html('\n<!DOCTYPE html...')
#4 /home3/vendor/scotteh/php-goose/src/Crawler.php(53): Goose\Crawler->getDocument('\n<!DOCTYPE html...')
#5 /home3/vendor/scotteh/php-goose/src/Client.php(42): Goose\Crawler->crawl('http://www.dail...', '\n<!DOCTYPE html...')
#6 /home3/routes/api.php(491): Goose\Client->extractContent('http://www.dail...')
#7 /home3/vendor/laravel/framework/src/Illuminate/Routing/Route.php(189): Illuminate\Routing\Router->{closure}()
#8 /home3/vendor/laravel/framework/src/Illuminate/Routing/Route.php(163): Illuminate\Routing\Route->runCallable()
#9 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(572): Illuminate\Routing\Route->run()
#10 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(30): Illuminate\Routing\Router->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#11 /home3/vendor/laravel/framework/src/Illuminate/Routing/Middleware/SubstituteBindings.php(41): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#12 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Routing\Middleware\SubstituteBindings->handle(Object(Illuminate\Http\Request), Object(Closure))
#13 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#14 /home3/vendor/laravel/framework/src/Illuminate/Routing/Middleware/ThrottleRequests.php(49): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#15 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Routing\Middleware\ThrottleRequests->handle(Object(Illuminate\Http\Request), Object(Closure), '60', '1')
#16 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#17 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(102): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#18 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(574): Illuminate\Pipeline\Pipeline->then(Object(Closure))
#19 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(533): Illuminate\Routing\Router->runRouteWithinStack(Object(Illuminate\Routing\Route), Object(Illuminate\Http\Request))
#20 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(511): Illuminate\Routing\Router->dispatchToRoute(Object(Illuminate\Http\Request))
#21 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(176): Illuminate\Routing\Router->dispatch(Object(Illuminate\Http\Request))
#22 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(30): Illuminate\Foundation\Http\Kernel->Illuminate\Foundation\Http\{closure}(Object(Illuminate\Http\Request))
#23 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TransformsRequest.php(30): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#24 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\TransformsRequest->handle(Object(Illuminate\Http\Request), Object(Closure))
#25 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#26 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TransformsRequest.php(30): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#27 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\TransformsRequest->handle(Object(Illuminate\Http\Request), Object(Closure))
#28 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#29 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/ValidatePostSize.php(27): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#30 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\ValidatePostSize->handle(Object(Illuminate\Http\Request), Object(Closure))
#31 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#32 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/CheckForMaintenanceMode.php(46): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#33 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\CheckForMaintenanceMode->handle(Object(Illuminate\Http\Request), Object(Closure))
#34 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#35 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(102): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#36 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(151): Illuminate\Pipeline\Pipeline->then(Object(Closure))
#37 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(116): Illuminate\Foundation\Http\Kernel->sendRequestThroughRouter(Object(Illuminate\Http\Request))
#38 /home3/onir12/public_html/index.php(53): Illuminate\Foundation\Http\Kernel->handle(Object(Illuminate\Http\Request))
#39 {main}  
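
The trace shows mb_convert_encoding() being called with 'auto' as the source encoding, which fails when detection comes up empty. A defensive sketch, not the library's code: detect against an explicit candidate list in strict mode and fall back to a default when nothing matches. The function name, the candidate list and the fallback are all assumptions chosen for illustration.

```php
<?php
function toUtf8(
    string $html,
    array $candidates = ['UTF-8', 'ISO-8859-1'],
    string $fallback = 'UTF-8'
): string {
    // Strict mode rejects encodings in which the bytes are not valid.
    $detected = mb_detect_encoding($html, $candidates, true);

    // mb_detect_encoding() returns false when no candidate fits.
    $from = $detected !== false ? $detected : $fallback;

    return mb_convert_encoding($html, 'UTF-8', $from);
}

$latin1 = "caf\xE9"; // "café" encoded as ISO-8859-1, invalid as UTF-8
echo toUtf8($latin1) === "caf\u{E9}" ? "converted\n" : "unchanged\n";
```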

Charset detection bug when decoding unformatted html...

A bit of an unusual edge case, but I have found the library can incorrectly interpret the charset if the html content is in a single line and doesn't declare the charset explicitly but does contain a charset declaration within another part of the document, for example, as part of an href property.

I have concocted an example, based on a document which caused us problems here...

<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta property="Content-Type" content="application/xhtml+xml"/> <title>News Story</title></head><body><section class="news-header"><p>By News Provider</p></section><section class="text-headlines"><h1>Lorem Ipsum</h1></section><section class="news-body"><p>Sed laoreet orci vel nunc imperdiet, non ultricies orci bibendum. Fusce mi elit, vehicula non lacinia eu, luctus sed lectus. Donec at finibus mauris, ut fringilla libero. Cras maximus lacus sit amet elementum imperdiet. Interdum et malesuada fames ac ante ipsum primis in faucibus. Proin pellentesque purus in arcu fermentum sagittis. <a href="http://example.com/ExternalLink?id=7104846651&amp;rd=down&amp;charset=UTF-8&amp;affiliate_index=1234567&amp;method=affiliate_data">Suspendisse nisi mi</a>, vulputate eu orci sed, aliquam interdum sem. In fringilla suscipit enim at scelerisque. Integer accumsan tortor aliquet, congue lorem id, sagittis velit. Pellentesque pulvinar lacus ac arcu cursus, vitae eleifend tortor pellentesque. Nunc at elementum risus, fringilla venenatis ante. Morbi maximus lacus non tincidunt tincidunt. Etiam venenatis mattis nisl, non vulputate felis accumsan eget. Duis vel varius libero.</p></section><body></html>
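
The effect can be reproduced with a shortened version of that document. The first pattern below is the one quoted in another report against this library; its ".*?" does not stop at the end of the meta tag, so on a single-line document it runs on to the href. The second pattern is only a sketch of a stricter variant that keeps the match inside the tag.

```php
<?php
$html = '<head><meta property="Content-Type" content="application/xhtml+xml"/></head>'
    . '<body><a href="http://example.com/?rd=down&amp;charset=UTF-8&amp;x=1">link</a></body>';

// Loose: ".*?" may cross tag boundaries as long as there is no newline.
$loose = '@<meta.*?charset=["]?([^"\s]+)@im';

// Sketch: "[^>]*" cannot leave the <meta ...> tag.
$bounded = '@<meta[^>]*charset=["\']?([^"\'\s>]+)@i';

var_dump(preg_match($loose, $html, $m)); // int(1), a false positive
var_dump($m[1]);                         // taken from the href
var_dump(preg_match($bounded, $html));   // int(0)
```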

Appending to wrong element

I'm running into an issue with items being appended to the wrong element.

I am trying to wrap the below html into a fieldset and change the label to a legend.

The code being used

$doc = new Document();
$doc->setLibxmlOptions(LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc->html( "<label class='label'>Test Label</label><div class='fields'><input/></div>" );

$label = $doc->find('.label')->first();
$fieldset = $doc->createElement('fieldset');
$legend = $doc->createElement( 'legend' );

$legend->append($label->contents());
$label->remove();

$fields = $doc->find('.fields');
$fields->detach();

$fieldset->append($legend);
$fieldset->append($fields);

$doc->append($fieldset);

echo htmlentities($doc->saveHTML($doc) );

die();

The resulting html:

<label class="label">Test Label</label><fieldset><legend>Test Label<div class="fields"><input></div> </legend></fieldset>

But $fields was appended to $fieldset, so why is it inside of the legend?

This is removing text using html() function, I think it's a bug

Using this code

dump($p->html());

$result = $p->html();
$p->html($result);

dump($p->html());

I get this for the first dump:

"<strong>Cine lenses</strong> are a type of lens designed explicitly for videography and filmmaking. While they can also be used for photography, they are not as commonly used for this purpose. <strong>Cinema lenses</strong> are distinguished from regular camera lenses by their construction and features. They typically have a longer focus throw, which allows for more precise focusing, and a de-clicked aperture, which enables smoother transitions between different aperture settings. A Cine lens also typically have a more robust build quality, making them better suited for heavy use. While cine lenses can be used for photography, they are not necessarily the best option for every situation."

And this for the second:
"<strong>Cine lenses</strong><strong>Cinema lenses</strong>"

Is it me or a bug?

How can I use this library to convert a tag-balanced HTML fragment into a node list idiomatically, reliably and 1:1?

What is the idiomatic way to use this library to convert a tag-balanced HTML fragment in a string into a node list, in a reliable 1:1 manner that doesn't require checking for multiple corner cases?

$nodeList = what_goes_here("Some text <span>a tag</span> some more text");

// $nodeList should now contain the exact structure [ TEXT, <span> [ TEXT ] </span>, TEXT ]
// as starkly opposed to [ <p> [ TEXT, <span> [ TEXT ] </span>, TEXT ] </p> ]
// which is what I obtain from ->create("Some text <span>a tag</span> some more text")

EDIT: the issue seems to be that there is no way to specify LIBXML_HTML_NOIMPLIED as a global policy. Even if you set the option after creating the document and before loading contents, various manipulation functions will create other document objects internally for processing, and they won't propagate the LIBXML_HTML_NOIMPLIED option to them; looks like they couldn't even do that at all, because there is no Document::getLibxmlOptions().
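
Outside the library, one common way to get this 1:1 structure with PHP's built-in DOM extension is to parse the fragment inside a throwaway wrapper element and return the wrapper's children. A minimal sketch, assuming the fragment is tag-balanced as stated; fragmentToNodes is a made-up name and character-encoding handling is left out.

```php
<?php
/** @return DOMNode[] top-level nodes of the fragment, in document order */
function fragmentToNodes(string $fragment): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The first <div> in document order is the wrapper added above.
    $wrapper = $doc->getElementsByTagName('div')->item(0);

    return iterator_to_array($wrapper->childNodes);
}

$nodes = fragmentToNodes('Some text <span>a tag</span> some more text');
foreach ($nodes as $node) {
    echo $node->nodeName, "\n";
}
```

The wrapper keeps the parser from adding a paragraph around the leading text, since the text is no longer a direct child of body.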

getAttr() returns empty for 0 values

<div data-item-id="0"></div>

Using attr('data-item-id') on this html will return an empty value.

$result = $node->getAttribute($name);
  if (empty($result)) {
      return '';
  }
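
The cause is that PHP's empty() treats the string "0" as empty. A sketch of a check that keeps "0", written against the built-in DOMElement API; attrOrEmpty is a made-up name and not the library's method.

```php
<?php
// Return the attribute value as-is, or '' only when the attribute is absent.
function attrOrEmpty(DOMElement $node, string $name): string
{
    return $node->hasAttribute($name) ? $node->getAttribute($name) : '';
}

$doc = new DOMDocument();
$doc->loadXML('<div data-item-id="0"></div>');
$div = $doc->documentElement;

var_dump(empty('0'));                        // bool(true)
var_dump(attrOrEmpty($div, 'data-item-id')); // string(1) "0"
var_dump(attrOrEmpty($div, 'missing'));      // string(0) ""
```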

Access level to DOMWrap\Traits\ManipulationTrait::inputAsNodeList() must be public (as in class DOMWrap\Traits\TraversalTrait)

I might be doing something wrong (that is likely, I'm not a regular PHP developer), but when trying to use the library, I get the following error:
Access level to DOMWrap\Traits\ManipulationTrait::inputAsNodeList() must be public (as in class DOMWrap\Traits\TraversalTrait)

I'm using the library in a Laravel based CMS called Statamic which includes autoload. The error occurs when trying to use the library as per the instructions use DOMWrap\Document; and then doing $doc = new Document();.

This is the whole file, if it can be of any help.

<?php

namespace Statamic\Addons\MyAddon;

use DOMWrap\Document;

use Statamic\Extend\Widget;

class ReadMeWidget extends Widget
{
    public function html()
    {

        $doc = new Document();
    }
}

Thanks in advance for any help!

PHP Docs for type hinting

I've been playing with this lib and it's really helpful. One issue I've experienced:

I am using PhpStorm and noticed that find() isn't showing the object type of its return value, which breaks the type hinting.

It seems the problem starts in the Document class. The traits are used in this order:
use CommonTrait; use TraversalTrait; use ManipulationTrait;
The find method is declared in both the traversal and manipulation traits (I didn't understand the need); the ManipulationTrait declaration is just a reference to TraversalTrait, where find is properly defined.

I think either the ManipulationTrait methods should be fully documented (if there is a case where you really need them), or TraversalTrait should be used last in Document, so that its methods "win" and the type hinting works as expected.

I would lean towards removing the method from ManipulationTrait as it is defined as abstract and has no implementations except for the TraversalTrait (AFAIK). But you might have some plan I haven't understood.

Loading in html from file

In your example I noticed you used '$doc->html($html);' to load in html from a variable. But I was wondering if you have a built in function for loading in html from a file instead? If not what do you propose I use?
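
Failing a dedicated method, one option is to read the file into a string with file_get_contents() and pass that string on, which with this library would be the $doc->html($html) call from the example. The sketch below shows the reading step with PHP's built-in DOMDocument so that it runs on its own; the temporary file stands in for a real HTML file.

```php
<?php
// Create a small HTML file to read back (stand-in for an existing file).
$path = tempnam(sys_get_temp_dir(), 'dom');
file_put_contents(
    $path,
    '<html><head><title>From file</title></head><body><p>Hello</p></body></html>'
);

$html = file_get_contents($path);
if ($html === false) {
    throw new RuntimeException("Cannot read $path");
}

$doc = new DOMDocument();
$doc->loadHTML($html);

$title = $doc->getElementsByTagName('title')->item(0)->textContent;
echo $title, "\n"; // From file

unlink($path);
```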

Why do getHtml() and getOuterHtml() only work on the first child?

Today, I tried returning a NodeList of DomElements from a function, to allow callers to either include it in a DOM tree, or call ->getOuterHtml() on it to process the elements as an HTML string.

However, I found that this actually only returns the HTML of the first element of the nodelist:

public function getOuterHtml(): string {
    return $this->document()->saveHTML(
        $this->collection()->first()
    );
}

The same thing happens for getHTML():

public function getHtml(): string {
    return $this->collection()->first()->contents()->reduce(function($carry, $node) {
        return $carry . $this->document()->saveHTML($node);
    }, '');
}

Why is this? Wouldn't it make more sense to add an additional reduce() instead of the first() to iterate all elements of the collection?
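
Until that is settled, the concatenation can be done by the caller. A minimal sketch using the built-in DOM classes; outerHtmlOfAll is a made-up helper, not a library method.

```php
<?php
// Serialize every node in the list, not just the first one.
function outerHtmlOfAll(DOMNodeList $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b</li></ul>');

echo outerHtmlOfAll($doc->getElementsByTagName('li')), "\n";
// <li>a</li><li>b</li>
```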

parent() returns symfony class and not dom-wrapper class

Hi, thanks for some great code!

I found an error. The parent() function does not return the correct class.

The first example raises an error:

foreach($doc->find('img') as $img){
    $img->parent()->children();
}

The second one works:

foreach($doc->find('img') as $img){
    $img->parents()->first()->children();
}

Regards,
Casper

Odd find() behavior

Hi,
First off, I'm just getting started with PHP. Please forgive any sloppy code, I'm just poking around :-) I am familiar with jQuery though, and from that perspective the find() function behaves oddly.

I would expect an element when I do a find(). But instead I get the contents of all the elements that match. For my case I specifically need the elements (I need to process their attributes). I have tried some things on the given example code (to eliminate any issue with the gigantic html that I'm providing it)

$html = '<ul>
    <li class="test"><div my-attribute="value">First</div></li>
    <li>Second</li>
    <li class="test">Third</li>
</ul>';

$doc = new Document();
$doc->html($html);
$nodes = $doc->find('.test');
$output = $nodes->attr('my-attribute');

var_dump($nodes);
var_dump($output);
// Returns '3'
var_dump($nodes->count());

Chaining the commands ($nodes = $doc->find('.test')->attr('my-attribute');) gives the same problem. When I call (...)->find('.test'); I would expect it to return <div my-attribute="value">First</div> and Third. Instead it returns First and Third. This would mean that any further manipulation is impossible after a find().

Is this intended behavior? Is there a workaround available to get the content of any element matching a specific class?
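
One thing to note about the example: the class test sits on the li elements, while my-attribute sits on the div inside the first one, so reading the attribute from the matched li elements yields nothing. The sketch below shows the distinction with PHP's built-in DOMXPath; in CSS terms the second query corresponds to something like ".test > div[my-attribute]".

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<ul><li class="test"><div my-attribute="value">First</div></li>'
    . '<li>Second</li><li class="test">Third</li></ul>'
);
$xpath = new DOMXPath($doc);

// XPath equivalent of the CSS class selector ".test" on <li> elements.
$hasTest = 'contains(concat(" ", normalize-space(@class), " "), " test ")';

$items = $xpath->query("//li[$hasTest]");
echo $items->length, "\n";                                         // 2
echo var_export($items->item(0)->getAttribute('my-attribute'), true), "\n"; // ''

// Select the element that actually carries the attribute.
$divs = $xpath->query("//li[$hasTest]/div[@my-attribute]");
echo $divs->item(0)->getAttribute('my-attribute'), "\n";           // value
```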

Fatal error: with DOMDocument::prepend(...$nodes): void in

Hi,
I get the following error:
Fatal error: Declaration of DOMWrap\Traits\ManipulationTrait::prepend($input): DOMWrap\Document must be compatible with DOMDocument::prepend(...$nodes): void in /var/www/.../admin/src/dist/php/Dom/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php on line 276

Ubuntu 20.04/18.04 parsing doesn't work

I created a crawler using Goutte, which worked fine on my Windows machine. After putting it on my Ubuntu 18.04 server, it stopped working; I tried to solve it for about 2 hours and didn't manage to get it working. I rewrote everything with PHP DOM Wrapper and the same behavior appeared. After that I tested it on my WSL Ubuntu 20.04 and again the same behavior appeared.

But what is the problem here? I want to stick with this package because it works better, but I can't get it to work on my Ubuntu machine/WSL. I do not see any errors, and php-xml is installed.

2.0 migration guide

Hi,

I'm seeking to update our dependency to the latest major version, but I can't see what may have changed and what code I need to update.
Is there a changelog or migration guide somewhere I can read? I can't seem to find one.

Thank you

Partial HTML?

I'm running it on a partial HTML fragment. How do I prevent it from wrapping the fragment in doctype/html/body tags?
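
At the libxml level, the two options that control this are LIBXML_HTML_NOIMPLIED (no implied html/body elements) and LIBXML_HTML_NODEFDTD (no default doctype); another report in this tracker passes them through $doc->setLibxmlOptions(). The sketch below shows their effect with the built-in DOMDocument, assuming a fragment with a single root element, which is the case these options handle most predictably.

```php
<?php
$doc = new DOMDocument();

// Without the options, loadHTML() would add a doctype, <html> and <body>.
$doc->loadHTML(
    '<div><p>Hi</p></div>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

$out = trim($doc->saveHTML());
echo $out, "\n"; // <div><p>Hi</p></div>
```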

update symfony/css-selector to v5.0

Currently I cannot use scotteh/php-goose on a symfony 5.0 project, because scotteh/php-dom-wrapper has a dependency on symfony/css-selector 4.* only

mb_convert_encoding() throws illegal encoding character code specified when reading specific website

When attempting to run the URL (http://futurememes.blogspot.jp/2017/01/cognitive-easing-human-identity-crisis.html?m=1) through encoding, the following error is generated:

mb_convert_encoding(): Illegal character encoding specified
/..../vendor/scotteh/php-dom-wrapper/src/Document.php:102
/.../vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php:680
/.../vendor/scotteh/php-goose/src/Crawler.php:84

It appears to be because of line 98 in php-dom-wrapper/src/Document.php:

if (preg_match('@<meta.*?charset=["]?([^"\s]+)@im', $html, $matches)) {

The aforementioned site has a meta tag as such

<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>

Which resolves to a $matches array of

array(2) {
  [0]=>
  string(40) "<meta content='text/html; charset=UTF-8'"
  [1]=>
  string(6) "UTF-8'"
}

So basically, the regular expression matching doesn't handle single quotes and includes the single quote in the result which is causing the error.
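
A sketch of a more tolerant extraction, offered as one possible fix rather than the library's actual code: accept either quote style before the value and capture only characters that can appear in an encoding name. extractCharset is a made-up function name.

```php
<?php
// Return the charset named in a <meta> tag, or null when none is found.
function extractCharset(string $html): ?string
{
    $pattern = '@<meta[^>]*charset\s*=\s*["\']?\s*([A-Za-z0-9._:\-]+)@i';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractCharset("<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>"));
// string(5) "UTF-8"

var_dump(extractCharset('<meta charset="utf-8">'));
// string(5) "utf-8"

var_dump(extractCharset('<meta name="viewport" content="width=device-width">'));
// NULL
```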

Easier way to add and manipulate new nodes

Every now and then, I find myself writing code that creates a new element, sets some data on it, and adds it to the document. E.g. something like this:

            $div = $doc->createElement('div');
            $div->setAttr('id', 'foo');
            $doc->findXPath('/body')->append($div);

I wonder if this could be more compact with php-dom-wrapper? jQuery seems to have appendTo() for this. Similar to that, I would like to write:

DOMWrap\create('div')->setAttr('id', 'foo')->appendTo($doc->findXPath('/body'));

Note that this contains two new parts: DOMWrap\create to create a new element and appendTo to append an element to an existing element.

appendTo is probably easy, the create might need some more thought:

  • Its name might need improvement; something short like jQuery's $() is nice, but probably not essential.
  • Maybe it should (also) accept an HTML string, or have a second variant that does so (so you can do things like DOMWrap\create('<div>foo</div>') as well).
  • Since there is no reference to the existing document, this will need to create a new document, and appendTo will have to move the elements over to the existing document (which I think already happens with the current append and similar methods).
  • Maybe the existing code could already do this using new DomWrap\Document()->html('<div>'), but that is a bit verbose perhaps?

Love to hear your thoughts on this :-)
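
To make the idea concrete, here is a rough sketch of the create-and-append step as a single helper on top of PHP's built-in DOM classes. createIn is a made-up name and signature, not a proposal for the final API; it sidesteps the cross-document question by creating the element in the parent's own document.

```php
<?php
// Create an element with attributes, append it to $parent, and return it
// so that further calls can be chained on the new element.
function createIn(DOMNode $parent, string $tag, array $attributes = []): DOMElement
{
    $doc = $parent instanceof DOMDocument ? $parent : $parent->ownerDocument;

    $element = $doc->createElement($tag);
    foreach ($attributes as $name => $value) {
        $element->setAttribute($name, $value);
    }
    $parent->appendChild($element);

    return $element;
}

$doc = new DOMDocument();
$doc->loadXML('<body/>');

createIn($doc->documentElement, 'div', ['id' => 'foo'])
    ->appendChild($doc->createTextNode('hi'));

echo $doc->saveXML($doc->documentElement), "\n";
// <body><div id="foo">hi</div></body>
```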
