scotteh / php-dom-wrapper
Simple DOM wrapper library to manipulate and traverse HTML documents similar to jQuery
License: BSD 3-Clause "New" or "Revised" License
Hi,
while researching active uses of PHPUnit's assertEqualXMLStructure, I stumbled over the following use in your test:
There are more uses, but I just reference this one as an explicit example.
This test is not testing what you expect it to be testing, because assertEqualXMLStructure does not look at attribute values: the assertion method only checks that an attribute exists, not what it contains.
Stripped-down example, not using your actual code:
<?php
class FooTest extends PHPUnit\Framework\TestCase {
public function testBar() {
$expected = new DOMDocument();
$expected->loadXML('<div class="example foo bar test" />');
$actual = new DOMDocument();
$actual->loadXML('<div class="example test" />');
$this->assertEqualXMLStructure($expected->documentElement, $actual->documentElement, true);
}
}
Result: there is a warning because the method is deprecated, but the (partly implicit) assertions all pass:
theseer@nyda /tmp/x5 $ ~/storage/php/phpunit/phpunit/phpunit .
PHPUnit 9.5-g85d4c053a by Sebastian Bergmann and contributors.
W 1 / 1 (100%)
Time: 00:00.066, Memory: 4.00 MB
There was 1 warning:
1) FooTest::testBar
assertEqualXMLStructure() is deprecated and will be removed in PHPUnit 10.
WARNINGS!
Tests: 1, Assertions: 3, Warnings: 1.
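A plain-DOM sketch of a check that does look at attribute values: compare the canonical (C14N) serialization of both elements. C14N sorts attributes, so attribute order is ignored but values are not. The helper name sameCanonicalXml() is hypothetical; inside PHPUnit, assertXmlStringEqualsXmlString() is a built-in alternative that compares whole XML strings.

```php
<?php
// Compare two nodes including attribute values by comparing their canonical (C14N) form.
// C14N normalizes attribute order and empty-element syntax, but keeps attribute values.
function sameCanonicalXml(DOMNode $a, DOMNode $b): bool
{
    return $a->C14N() === $b->C14N();
}

$expected = new DOMDocument();
$expected->loadXML('<div class="example foo bar test" />');
$actual = new DOMDocument();
$actual->loadXML('<div class="example test" />');

// Unlike assertEqualXMLStructure(), this notices the differing class values.
var_dump(sameCanonicalXml($expected->documentElement, $actual->documentElement)); // bool(false)
```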
I was working with a few elements, trying to figure out whether an element was empty, or creating a new element and copying all contents of the old element into it. For this, I used html(), e.g.
if ($e->html() == '')
do_stuff...
and
$new_e->html($e->html());
I found that this worked fine for elements containing other elements (e.g. <a><img/></a>), but not for elements containing only text (e.g. <a>foo</a>). In the latter case, html() returned an empty string. Text inside child elements was returned properly; only text directly below the examined element was missing.
I dug into this and found that html() iterates over children() and generates HTML for each child, but children() does not return text nodes, only elements. I traced this to this line:
php-dom-wrapper/src/Traits/TraversalTrait.php
Line 259 in c104cf5
The * selector used there matches all elements, which does not include text nodes. Changing this line to:
$node->findXPath('child::node()')
fixes this and lets children() (and thus also html()) return text nodes as well.
All this was tested using version 0.6.3. Lacking a ready PHP 7.1 installation, I was not able to test with the latest master or prepare a (tested) pull request. However, the relevant code is identical, so I assume the same applies to master as well.
Should this be fixed? Or is this behaviour intentional?
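The difference between the two XPath steps can be reproduced with plain DOMXPath, independent of the library. A minimal sketch; countChildren() is a hypothetical helper, and LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD are used only so that the <a> becomes the document element.

```php
<?php
// Count the children of the root element that match a given XPath step.
function countChildren(string $html, string $step): int
{
    $doc = new DOMDocument();
    // NOIMPLIED/NODEFDTD: no implied <html>/<body> and no doctype, so <a> is the root.
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($doc);
    return $xpath->query($step, $doc->documentElement)->length;
}

echo countChildren('<a>foo</a>', 'child::*'), PHP_EOL;      // 0: '*' only matches elements
echo countChildren('<a>foo</a>', 'child::node()'), PHP_EOL; // 1: the text node "foo"
```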
I am using this library. I didn't call any remove function of this library, but it shows this error:
Uncaught Error: Call to a member function isRemoved() on null
Hi, I'm getting this error. I don't know whether this is a library error or mine, but I think it's library related.
I'm using try/catch blocks, but it still interrupts execution.
Any ideas why?
Composer.json require line: "scotteh/php-dom-wrapper": "^1.1",
PHP Fatal error: Uncaught Error: Call to a member function contents() on null in /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php:694
Stack trace:
#0 /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php(723): DOMWrap\NodeList->getHtml()
#1 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(142): DOMWrap\NodeList->html()
#2 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(278): Site->singlePost('http://www.kadi...')
#3 /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/NodeList.php(161): Site->{closure}(Object(DOMWrap\Element))
#4 /Applications/MAMP/htdocs/grawler/inc/Grawler/Site.php(279): DOMWrap\NodeList->map(Object(Closure))
#5 /Applications/MAMP/htdocs/grawler/grawler.php(34): Site->pagination()
#6 {main}
thrown in /Applications/MAMP/htdocs/grawler/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php on line 694
Shortened code:
private function xPathOrCss($exp)
{
return (($exp[0] == "/" or $exp[0] == "(") ? "findXPath" : "find");
}
public function pagination()
{
try
{
$this->post->setSourceUrl($url);
$raw = $this->curl->get($this->post->getSourceUrl());
if ($this->curl->error or empty($raw) or $raw == false)
{
msg_error("Socket error on " . __CLASS__ . "/" . __METHOD__ . "():" . __LINE__ . " {$this->post->getSourceUrl()} with code {$this->curl->errorCode} and '{$this->curl->errorMessage}' message");
return false;
}
$doc = new Document();
$doc->html($raw);
$content = "";
$contentSelector = conf("selectors", "selector_content");
if (!empty($contentSelector))
{
$contentSelector = $doc->{$this->xPathOrCss($contentSelector)}($contentSelector);
// Site.php line 142:
if (!is_null($contentSelector) and method_exists($contentSelector, "html") and !empty($contentSelector->html())) $content = $contentSelector->html();
}
$this->post->setContent($content);
unset($content);
}
catch (Exception $e)
{
msg_error("Single Post Exception : " . $e->getMessage());
}
}
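The catch (Exception $e) block above cannot catch this failure: since PHP 7, calling a method on null throws \Error, which is not a subclass of \Exception. A minimal sketch showing which branch fires; describeFailure() is a hypothetical helper. Separately from the catch, checking that the match list is non-empty (for example with count(), which other reports here call on a NodeList) before calling html() would probably avoid the error altogether.

```php
<?php
// Runs $work and reports which catch branch handled a failure, if any.
function describeFailure(callable $work): string
{
    try {
        $work();
        return 'no error';
    } catch (\Exception $e) {
        return 'caught Exception: ' . get_class($e);
    } catch (\Throwable $e) {
        // \Error (e.g. "Call to a member function ... on null") only lands here.
        return 'caught Throwable: ' . get_class($e);
    }
}

// Stand-in for an empty match: there is no node, so the method call is made on null.
echo describeFailure(function () {
    $node = null;
    return $node->html();
}), PHP_EOL; // caught Throwable: Error
```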
Elements get duplicated when using append(), prepend(), before(), after()
Add the previous matched element to the current set.
https://api.jquery.com/addBack/
Like the regular JavaScript elem.outerHTML, this would get the HTML of the currently matched elements themselves, not just the inside of the matched elements.
Hi,
I am trying to set the href attribute to {{modelItem.finalUrl}}, as a variable for a PHP template.
The library automatically encodes the { characters. Is there any way to disable this?
$element->attr($attr, "{{modelItem.$bindValue}}");
result:
href="%7B%7BmodelItem.finalUrl%7D%7D"
use DOMWrap\Document;
$html = <<<HTML
<ul>
<li>
<div>
<p>Outer 1</p>
<ul>
<li>Inner</li>
</ul>
</div>
</li>
<li>Outer 2</li>
<li>Outer 3</li>
</ul>
HTML;
$doc = new Document();
$doc->html($html);
// Accepts this selector, or errors out?
$nodes = $doc->find('> li');
// Or maybe this one, which is more standard?
$node = $doc->find(':scope > li');
// Returns '3' or '4'?
var_dump($nodes->count());
Hello, when running with PHP 8.1 we get deprecation notices:
Deprecated: Return type of DOMWrap\Collections\NodeCollection::offsetGet($offset) should either be compatible with ArrayAccess::offsetGet(mixed $offset): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 56
Deprecated: Return type of DOMWrap\Collections\NodeCollection::current() should either be compatible with Iterator::current(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 126
Deprecated: Return type of DOMWrap\Collections\NodeCollection::next() should either be compatible with Iterator::next(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 146
Deprecated: Return type of DOMWrap\Collections\NodeCollection::key() should either be compatible with Iterator::key(): mixed, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 136
Deprecated: Return type of DOMWrap\Collections\NodeCollection::rewind() should either be compatible with Iterator::rewind(): void, or the #[\ReturnTypeWillChange] attribute should be used to temporarily suppress the notice in [...]/vendor/scotteh/php-dom-wrapper/src/Collections/NodeCollection.php on line 156
Are there any plans to fix this?
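A sketch of the two fixes the notice itself names, on a hypothetical ExampleCollection rather than the library's NodeCollection: methods whose new return type is bool, void or int (all understood by PHP 7.1) get real return types, while the ones now declared as mixed get #[\ReturnTypeWillChange], which PHP 7 parses as a comment and PHP 8.1 honours. Declaring : mixed directly would instead require PHP 8.0.

```php
<?php
// Hypothetical collection illustrating PHP 8.1-compatible ArrayAccess/Iterator signatures.
class ExampleCollection implements ArrayAccess, Iterator, Countable
{
    private $items;
    private $keys = [];
    private $position = 0;

    public function __construct(array $items = [])
    {
        $this->items = $items;
    }

    public function offsetExists($offset): bool
    {
        return isset($this->items[$offset]);
    }

    #[\ReturnTypeWillChange]
    public function offsetGet($offset)
    {
        return $this->items[$offset] ?? null;
    }

    public function offsetSet($offset, $value): void
    {
        if ($offset === null) {
            $this->items[] = $value; // $collection[] = $value
        } else {
            $this->items[$offset] = $value;
        }
    }

    public function offsetUnset($offset): void
    {
        unset($this->items[$offset]);
    }

    public function rewind(): void
    {
        $this->keys = array_keys($this->items); // snapshot of the keys for this loop
        $this->position = 0;
    }

    public function valid(): bool
    {
        return $this->position < count($this->keys);
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        return $this->items[$this->keys[$this->position]];
    }

    #[\ReturnTypeWillChange]
    public function key()
    {
        return $this->keys[$this->position];
    }

    public function next(): void
    {
        $this->position++;
    }

    public function count(): int
    {
        return count($this->items);
    }
}
```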
Since PHP 8.2, the library produces this warning:
mb_convert_encoding(): Handling HTML entities via mbstring is deprecated; use htmlspecialchars, htmlentities, or mb_encode_numericentity/mb_decode_numericentity instead
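mb_encode_numericentity() is one of the replacements the deprecation message itself suggests. A minimal sketch, assuming the old mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8') call was there to turn non-ASCII UTF-8 characters into numeric entities before parsing; toNumericEntities() is a hypothetical name. Because the result is pure ASCII, DOMDocument::loadHTML() decodes it correctly whatever input encoding it assumes.

```php
<?php
// Encode every code point from U+0080 upward as a decimal entity (&#N;).
function toNumericEntities(string $html): string
{
    // Convert map: [start, end, offset, mask] covering all non-ASCII code points.
    return mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');
}

echo toNumericEntities('<div>Tést</div>'), PHP_EOL; // <div>T&#233;st</div>

$doc = new DOMDocument();
$doc->loadHTML(toNumericEntities('<div>Tést</div>'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo $doc->documentElement->textContent, PHP_EOL; // Tést (DOM strings are UTF-8)
```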
Hi,
Is it possible to use replaceWith and other similar methods across Documents? Are there any potential hiccups?
I have some markup I'd like to insert which would be far easier to write and insert as HTML than to build each element and set all attributes independently. I'm wondering whether I can load it in a separate Document and then transfer it, or whether there is an easier way to do this.
Thanks!
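In plain DOM, a node belongs to exactly one DOMDocument, so moving markup between documents means copying it with DOMDocument::importNode(). A sketch, assuming DOMWrap\Document extends DOMDocument (the DOMDocument::prepend() signature clash in another report suggests it does); importHtmlFragment() is a hypothetical helper, and non-ASCII markup would still need the usual encoding care.

```php
<?php
// Parse $html in a scratch document and deep-copy its top-level nodes into $target.
function importHtmlFragment(DOMDocument $target, string $html): DOMDocumentFragment
{
    $scratch = new DOMDocument();
    // The wrapper <div> gives several top-level nodes one parent; it is not imported itself.
    $scratch->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $fragment = $target->createDocumentFragment();
    foreach ($scratch->documentElement->childNodes as $child) {
        $fragment->appendChild($target->importNode($child, true)); // true = deep copy
    }
    return $fragment;
}

$doc = new DOMDocument();
$doc->loadHTML('<div id="box"><p>old</p></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$old = $doc->getElementsByTagName('p')->item(0);
// Replacing a node with a fragment inserts all of the fragment's children in its place.
$old->parentNode->replaceChild(importHtmlFragment($doc, '<span class="new">one</span><b>two</b>'), $old);
echo $doc->saveHTML($doc->documentElement), PHP_EOL;
```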
Hi, I get this error for some URLs at this line:
$html = mb_convert_encoding($html, 'UTF-8', $charset);
Here is the stack trace:
ErrorException: mb_convert_encoding(): Unable to detect character encoding in /home3/vendor/scotteh/php-dom-wrapper/src/Document.php:107
Stack trace:
#0 [internal function]: Illuminate\Foundation\Bootstrap\HandleExceptions->handleError(2, 'mb_convert_enco...', '/home3/onir12/f...', 107, Array)
#1 /home3/vendor/scotteh/php-dom-wrapper/src/Document.php(107): mb_convert_encoding('\n<!DOCTYPE html...', 'UTF-8', 'auto')
#2 /home3/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php(680): DOMWrap\Document->setHtml('\n<!DOCTYPE html...')
#3 /home3/vendor/scotteh/php-goose/src/Crawler.php(84): DOMWrap\Document->html('\n<!DOCTYPE html...')
#4 /home3/vendor/scotteh/php-goose/src/Crawler.php(53): Goose\Crawler->getDocument('\n<!DOCTYPE html...')
#5 /home3/vendor/scotteh/php-goose/src/Client.php(42): Goose\Crawler->crawl('http://www.dail...', '\n<!DOCTYPE html...')
#6 /home3/routes/api.php(491): Goose\Client->extractContent('http://www.dail...')
#7 /home3/vendor/laravel/framework/src/Illuminate/Routing/Route.php(189): Illuminate\Routing\Router->{closure}()
#8 /home3/vendor/laravel/framework/src/Illuminate/Routing/Route.php(163): Illuminate\Routing\Route->runCallable()
#9 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(572): Illuminate\Routing\Route->run()
#10 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(30): Illuminate\Routing\Router->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#11 /home3/vendor/laravel/framework/src/Illuminate/Routing/Middleware/SubstituteBindings.php(41): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#12 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Routing\Middleware\SubstituteBindings->handle(Object(Illuminate\Http\Request), Object(Closure))
#13 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#14 /home3/vendor/laravel/framework/src/Illuminate/Routing/Middleware/ThrottleRequests.php(49): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#15 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Routing\Middleware\ThrottleRequests->handle(Object(Illuminate\Http\Request), Object(Closure), '60', '1')
#16 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#17 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(102): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#18 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(574): Illuminate\Pipeline\Pipeline->then(Object(Closure))
#19 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(533): Illuminate\Routing\Router->runRouteWithinStack(Object(Illuminate\Routing\Route), Object(Illuminate\Http\Request))
#20 /home3/vendor/laravel/framework/src/Illuminate/Routing/Router.php(511): Illuminate\Routing\Router->dispatchToRoute(Object(Illuminate\Http\Request))
#21 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(176): Illuminate\Routing\Router->dispatch(Object(Illuminate\Http\Request))
#22 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(30): Illuminate\Foundation\Http\Kernel->Illuminate\Foundation\Http\{closure}(Object(Illuminate\Http\Request))
#23 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TransformsRequest.php(30): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#24 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\TransformsRequest->handle(Object(Illuminate\Http\Request), Object(Closure))
#25 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#26 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/TransformsRequest.php(30): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#27 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\TransformsRequest->handle(Object(Illuminate\Http\Request), Object(Closure))
#28 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#29 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/ValidatePostSize.php(27): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#30 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\ValidatePostSize->handle(Object(Illuminate\Http\Request), Object(Closure))
#31 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#32 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Middleware/CheckForMaintenanceMode.php(46): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#33 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(148): Illuminate\Foundation\Http\Middleware\CheckForMaintenanceMode->handle(Object(Illuminate\Http\Request), Object(Closure))
#34 /home3/vendor/laravel/framework/src/Illuminate/Routing/Pipeline.php(53): Illuminate\Pipeline\Pipeline->Illuminate\Pipeline\{closure}(Object(Illuminate\Http\Request))
#35 /home3/vendor/laravel/framework/src/Illuminate/Pipeline/Pipeline.php(102): Illuminate\Routing\Pipeline->Illuminate\Routing\{closure}(Object(Illuminate\Http\Request))
#36 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(151): Illuminate\Pipeline\Pipeline->then(Object(Closure))
#37 /home3/vendor/laravel/framework/src/Illuminate/Foundation/Http/Kernel.php(116): Illuminate\Foundation\Http\Kernel->sendRequestThroughRouter(Object(Illuminate\Http\Request))
#38 /home3/onir12/public_html/index.php(53): Illuminate\Foundation\Http\Kernel->handle(Object(Illuminate\Http\Request))
#39 {main}
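Frame #1 shows 'auto' being passed straight through as the source charset. A hedged sketch of one way to avoid that: detect strictly from a short, explicit candidate list and fall back to ISO-8859-1, which accepts any byte sequence. convertToUtf8() and the candidate list are assumptions, not the library's behaviour.

```php
<?php
// Convert $html to UTF-8 without ever handing 'auto' to mb_convert_encoding().
function convertToUtf8(string $html, ?string $charset = null): string
{
    if ($charset === null || strtolower($charset) === 'auto') {
        // Strict detection rejects invalid UTF-8; ISO-8859-1 is the catch-all fallback.
        $charset = mb_detect_encoding($html, ['UTF-8', 'ISO-8859-1'], true) ?: 'ISO-8859-1';
    }
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

// "caf\xE9" is Latin-1 for "café"; it is not valid UTF-8, so ISO-8859-1 is used.
var_dump(convertToUtf8("caf\xE9", 'auto') === 'café'); // bool(true)
```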
A bit of an unusual edge case, but I have found that the library can misinterpret the charset if the HTML content is on a single line and doesn't declare the charset explicitly, but does contain a charset declaration elsewhere in the document, for example as part of an href attribute.
I have concocted an example, based on a document which caused us problems:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta property="Content-Type" content="application/xhtml+xml"/> <title>News Story</title></head><body><section class="news-header"><p>By News Provider</p></section><section class="text-headlines"><h1>Lorem Ipsum</h1></section><section class="news-body"><p>Sed laoreet orci vel nunc imperdiet, non ultricies orci bibendum. Fusce mi elit, vehicula non lacinia eu, luctus sed lectus. Donec at finibus mauris, ut fringilla libero. Cras maximus lacus sit amet elementum imperdiet. Interdum et malesuada fames ac ante ipsum primis in faucibus. Proin pellentesque purus in arcu fermentum sagittis. <a href="http://example.com/ExternalLink?id=7104846651&rd=down&charset=UTF-8&affiliate_index=1234567&method=affiliate_data">Suspendisse nisi mi</a>, vulputate eu orci sed, aliquam interdum sem. In fringilla suscipit enim at scelerisque. Integer accumsan tortor aliquet, congue lorem id, sagittis velit. Pellentesque pulvinar lacus ac arcu cursus, vitae eleifend tortor pellentesque. Nunc at elementum risus, fringilla venenatis ante. Morbi maximus lacus non tincidunt tincidunt. Etiam venenatis mattis nisl, non vulputate felis accumsan eget. Duis vel varius libero.</p></section><body></html>
I'm running into an issue with items being appended to the wrong element.
I am trying to wrap the HTML below in a fieldset and change the label to a legend.
The code being used:
$doc = new Document();
$doc->setLibxmlOptions(LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$doc->html( "<label class='label'>Test Label</label><div class='fields'><input/></div>" );
$label = $doc->find('.label')->first();
$fieldset = $doc->createElement('fieldset');
$legend = $doc->createElement( 'legend' );
$legend->append($label->contents());
$label->remove();
$fields = $doc->find('.fields');
$fields->detach();
$fieldset->append($legend);
$fieldset->append($fields);
$doc->append($fieldset);
echo htmlentities($doc->saveHTML($doc) );
die();
The resulting html:
<label class="label">Test Label</label><fieldset><legend>Test Label<div class="fields"><input></div> </legend></fieldset>
But $fields was appended to $fieldset, so why is it inside of the legend?
Using this code:
dump($p->html());
$result = $p->html();
$p->html($result);
dump($p->html());
I get this for the first dump:
"<strong>Cine lenses</strong> are a type of lens designed explicitly for videography and filmmaking. While they can also be used for photography, they are not as commonly used for this purpose. <strong>Cinema lenses</strong> are distinguished from regular camera lenses by their construction and features. They typically have a longer focus throw, which allows for more precise focusing, and a de-clicked aperture, which enables smoother transitions between different aperture settings. A Cine lens also typically have a more robust build quality, making them better suited for heavy use. While cine lenses can be used for photography, they are not necessarily the best option for every situation."
And this for the second:
"<strong>Cine lenses</strong><strong>Cinema lenses</strong>"
Is it me or a bug?
What is the idiomatic way to use this library to convert a tag-balanced HTML fragment in a string into a node list, in a reliable 1:1 manner that doesn't require checking for multiple corner cases?
$nodeList = what_goes_here("Some text <span>a tag</span> some more text");
// $nodeList should now contain the exact structure [ TEXT, <span> [ TEXT ] </span>, TEXT ]
// as starkly opposed to [ <p> [ TEXT, <span> [ TEXT ] </span>, TEXT ] </p> ]
// which is what I obtain from ->create("Some text <span>a tag</span> some more text")
EDIT: the issue seems to be that there is no way to specify LIBXML_HTML_NOIMPLIED as a global policy. Even if you set the option after creating the document and before loading contents, various manipulation functions create other document objects internally for processing, and they don't propagate the LIBXML_HTML_NOIMPLIED option to them. It looks like they couldn't do that at all, because there is no Document::getLibxmlOptions().
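For comparison, plain DOMDocument can produce the exact [ TEXT, <span>, TEXT ] structure by parsing the fragment inside a throwaway container with LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD and returning the container's children. A minimal sketch; fragmentNodes() and the <div> wrapper are assumptions, and non-ASCII input would need separate encoding handling.

```php
<?php
// Parse an HTML fragment into its top-level nodes, with no implied <html>/<body>/<p>.
function fragmentNodes(string $html): array
{
    $doc = new DOMDocument();
    // The wrapper <div> keeps leading text from being wrapped; it is discarded below.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    // The returned nodes keep their owner document alive after this function returns.
    return iterator_to_array($doc->documentElement->childNodes);
}

foreach (fragmentNodes('Some text <span>a tag</span> some more text') as $node) {
    echo $node->nodeName, PHP_EOL; // #text, span, #text
}
```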
$doc = new \DOMWrap\Document();
$doc->append("<div>Tést</div>");
$doc->append($doc->create("<div></div>")->text("Tést"));
echo $doc->html();
Result:
Tést
Tést
<div data-item-id="0"></div>
Using attr('data-item-id') on this HTML returns an empty value instead of "0", apparently because of this check (empty("0") is true in PHP):
$result = $node->getAttribute($name);
if (empty($result)) {
return '';
}
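PHP's empty() treats the string "0" as empty, so a guard like the one above turns a legitimate "0" into ''. Since DOMElement::getAttribute() already returns '' for a missing attribute, its result can be returned unchanged. A plain-DOM sketch; attrValue() is a hypothetical name, not the library's method.

```php
<?php
// Return an attribute value, keeping "0"; a missing attribute yields ''.
function attrValue(DOMElement $node, string $name): string
{
    return $node->getAttribute($name);
}

var_dump(empty('0')); // bool(true): why the empty() check discards "0"

$doc = new DOMDocument();
$doc->loadHTML('<div data-item-id="0"></div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
var_dump(attrValue($doc->documentElement, 'data-item-id')); // string(1) "0"
```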
I might be doing something wrong (that is likely; I'm not a regular PHP developer), but when trying to use the library I get the following error:
Access level to DOMWrap\Traits\ManipulationTrait::inputAsNodeList() must be public (as in class DOMWrap\Traits\TraversalTrait)
I'm using the library in a Laravel-based CMS called Statamic, which includes autoloading. The error occurs when trying to use the library as per the instructions, with use DOMWrap\Document; and then $doc = new Document();.
This is the whole file, if it can be of any help.
<?php
namespace Statamic\Addons\MyAddon;
use DOMWrap\Document;
use Statamic\Extend\Widget;
class ReadMeWidget extends Widget
{
public function html()
{
$doc = new Document();
}
}
Thanks in advance for any help!
I've been playing with this lib and it's really helpful. One issue I've run into:
I am using PhpStorm and noticed that find() isn't showing the return object type, which breaks type hinting.
The problem seems to start in the Document class. The traits are used in this order:
use CommonTrait; use TraversalTrait; use ManipulationTrait;
The find method is in both the traversal and manipulation traits (I didn't understand the need), and ManipulationTrait just has a reference to TraversalTrait, where find is properly defined.
I think either the ManipulationTrait methods should be fully documented (if there is a case where you really need them), or, if you use TraversalTrait last in Document, its methods will "win" and type hinting works as expected.
I would lean towards removing the method from ManipulationTrait, as it is defined as abstract and has no implementations except in TraversalTrait (AFAIK). But you might have some plan I haven't understood.
In your example I noticed you used $doc->html($html); to load HTML from a variable. Do you have a built-in function for loading HTML from a file instead? If not, what do you propose I use?
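The examples only show html() taking a string, so one option is to read the file with file_get_contents() first and pass the result in, which presumably keeps DOMWrap's own charset handling: $doc->html(file_get_contents($path)). A plain-DOM sketch of that read-then-parse approach; loadHtmlFile() is a hypothetical name, and DOMDocument::loadHTMLFile() also exists if the wrapper's processing isn't needed.

```php
<?php
// Read an HTML file and parse it; throws if the file cannot be read.
function loadHtmlFile(string $path): DOMDocument
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read HTML file: $path");
    }
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Failed to read HTML file: $path");
    }
    $doc = new DOMDocument();
    // Collect libxml parse warnings (e.g. unknown HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}
```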
Today, I tried returning a NodeList of DOMElements from a function, to allow callers to either include it in a DOM tree or call ->getOuterHtml() on it to process the elements as an HTML string.
However, I found that this actually only returns the HTML of the first element of the NodeList:
php-dom-wrapper/src/Traits/ManipulationTrait.php
Lines 684 to 688 in 5243c72
The same thing happens for getHtml():
php-dom-wrapper/src/Traits/ManipulationTrait.php
Lines 693 to 697 in 5243c72
Why is this? Wouldn't it make more sense to use reduce() instead of first() to iterate over all elements of the collection?
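What the suggested reduce() would amount to, sketched with plain DOM: concatenate each node's serialization instead of serializing only the first one. outerHtmlAll() is a hypothetical helper; the loop is equivalent to a reduce over the collection.

```php
<?php
// Concatenate the outer HTML of every node in $nodes (DOMNodeList, array, ...).
function outerHtmlAll(iterable $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        // saveHTML($node) serializes the node itself, including its own tag.
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

$doc = new DOMDocument();
$doc->loadHTML('<ul><li>a</li><li>b</li></ul>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
echo outerHtmlAll($doc->getElementsByTagName('li')), PHP_EOL; // <li>a</li><li>b</li>
```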
Hi, thanks for some great code!
I found an error: the parent() function does not return the correct class.
The first example produces an error:
foreach($doc->find('img') as $img){
$img->parent()->children();
}
The second one works:
foreach($doc->find('img') as $img){
$img->parents()->first()->children();
}
Regards,
Casper
Hi,
First off, I'm just getting started with PHP. Please forgive any sloppy code; I'm just poking around :-) I am familiar with jQuery, though, which is why the find() function's behavior seems odd to me.
I would expect elements when I do a find(), but instead I get the contents of all the elements that match. In my case I specifically need the elements (I need to process their attributes). I have tried some things on the given example code (to rule out any issue with the gigantic HTML I'm feeding it):
$html = '<ul>
<li class="test"><div my-attribute="value">First</div></li>
<li>Second</li>
<li class="test">Third</li>
</ul>';
$doc = new Document();
$doc->html($html);
$nodes = $doc->find('.test');
$output = $nodes->attr('my-attribute');
var_dump($nodes);
var_dump($output);
// Returns 2 (two elements have class "test")
var_dump($nodes->count());
Chaining the commands ($nodes = $doc->find('.test')->attr('my-attribute');) gives the same problem. When I do (...)->find('.test'), I would expect it to return <div my-attribute="value">First</div> and Third. Instead it returns First and Third. This would mean that any further manipulation is impossible after a find().
Is this intended behavior? Is there a workaround to get the content of any element matching a specific class?
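For reference, a plain-DOM sketch of the intended lookup: select every element with the class, then look at or below each match for the element carrying the attribute. attributeUnderClass() is a hypothetical helper, the XPath is a common translation of the CSS selector ".test", and $class/$attr are assumed to contain no quotes.

```php
<?php
// For each element with class $class, return the first $attr value found at or below it.
function attributeUnderClass(string $html, string $class, string $attr): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($doc);

    // XPath equivalent of the CSS class selector (whitespace-separated class match).
    $byClass = sprintf("//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]", $class);

    $values = [];
    foreach ($xpath->query($byClass) as $match) {
        // Each $match is a DOMElement, so its attributes and descendants are available.
        $holder = $xpath->query(sprintf('descendant-or-self::*[@%s]', $attr), $match)->item(0);
        $values[] = $holder instanceof DOMElement ? $holder->getAttribute($attr) : null;
    }
    return $values;
}

var_dump(attributeUnderClass(
    '<ul><li class="test"><div my-attribute="value">First</div></li><li class="test">Third</li></ul>',
    'test',
    'my-attribute'
)); // ['value', null]
```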
Should there be a __toString() method on each Element object? That way I could just print it for debugging.
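In plain DOM, DOMDocument::registerNodeClass() makes element objects come back as a subclass, which can add __toString(). A minimal sketch of that mechanism; DebugElement is a hypothetical class name, and this is not how DOMWrap's own Element class is defined.

```php
<?php
// Element subclass whose string form is its outer HTML.
class DebugElement extends DOMElement
{
    public function __toString(): string
    {
        return (string) $this->ownerDocument->saveHTML($this);
    }
}

$doc = new DOMDocument();
// From now on, elements of this document are returned as DebugElement instances.
$doc->registerNodeClass(DOMElement::class, DebugElement::class);
$p = $doc->createElement('p', 'hi');
$doc->appendChild($p);
echo $p, PHP_EOL; // <p>hi</p>
```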
I'm coming from Simple HTML DOM, which could be used just by requiring a single file, simple_html_dom.php.
Is there any way to do the same with PHP DOM Wrapper?
I tested it today for my WordPress project and found that this lib uses symfony/css-selector, which requires PHP 8.1: https://github.com/symfony/css-selector/blob/0dd5e36b80e1de97f8f74ed7023ac2b837a36443/composer.json
Hi,
I get the following error:
Fatal error: Declaration of DOMWrap\Traits\ManipulationTrait::prepend($input): DOMWrap\Document must be compatible with DOMDocument::prepend(...$nodes): void in /var/www/.../admin/src/dist/php/Dom/vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php on line 276
I created a crawler using Goutte that worked fine on my Windows machine. After putting it on my Ubuntu 18.04 server it stopped working; I tried to solve it for about 2 hours and didn't manage to get it working. I rewrote everything with PHP DOM Wrapper and the same behavior appeared. After that I tested it on my WSL Ubuntu 20.04, and again the same behavior appeared.
But what is the problem here? I want to stick with this package because it works better, but I can't get it to work on my Ubuntu machine or WSL. I do not see any errors, and php-xml is installed.
Hi,
I'm seeking to update our dependency to the latest major version, but I can't see what may have changed and what code I need to update.
Is there a changelog or migration guide somewhere I can read? I can't seem to find one.
Thank you
I'm running it on partial HTML. How do I prevent it from wrapping the fragment in doctype/html/body tags?
Currently I cannot use scotteh/php-goose in a Symfony 5.0 project, because scotteh/php-dom-wrapper depends on symfony/css-selector 4.* only.
When attempting to run the URL (http://futurememes.blogspot.jp/2017/01/cognitive-easing-human-identity-crisis.html?m=1) through encoding, the following error is generated:
mb_convert_encoding(): Illegal character encoding specified
/..../vendor/scotteh/php-dom-wrapper/src/Document.php:102
/.../vendor/scotteh/php-dom-wrapper/src/Traits/ManipulationTrait.php:680
/.../vendor/scotteh/php-goose/src/Crawler.php:84
It appears to be caused by line 98 in php-dom-wrapper/src/Document.php:
if (preg_match('@<meta.*?charset=["]?([^"\s]+)@im', $html, $matches)) {
The aforementioned site has this meta tag:
<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>
which produces this $matches array:
array(2) {
[0]=>
string(40) "<meta content='text/html; charset=UTF-8'"
[1]=>
string(6) "UTF-8'"
}
So basically, the regular expression doesn't handle single quotes: it includes the closing single quote in the result, which causes the error.
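A hedged sketch of a tighter pattern that addresses both this report and the single-line href case above: it accepts double, single or no quotes, stops at quotes, whitespace, ';', '/' or '>', and uses [^>]*? instead of .*? so the match cannot leave the <meta> tag. sniffCharset() and the exact pattern are assumptions, not the library's code.

```php
<?php
// Return the charset declared inside a <meta> tag, or null if none is found.
function sniffCharset(string $html): ?string
{
    // [^>]*? keeps the search inside one tag; the class excludes quotes and delimiters.
    if (preg_match('@<meta\b[^>]*?charset\s*=\s*["\']?([^"\'\s;/>]+)@i', $html, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(sniffCharset("<meta content='text/html; charset=UTF-8' http-equiv='Content-Type'/>")); // string(5) "UTF-8"
```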
Every now and then, I find myself writing code that creates a new element, sets some data on it, and adds it to the document. E.g. something like this:
$div = $doc->createElement('div');
$div->setAttr('id', 'foo');
$doc->findXPath('//body')->append($div);
I wonder if this could be more compact with php-dom-wrapper? Jquery seems to have appendTo() for this. Similar to that, I would like to write:
DOMWrap\create('div')->setAttr('id', 'foo')->appendTo($doc->findXPath('//body'));
Note that this contains two new parts: DOMWrap\create to create a new element, and appendTo to append an element to an existing element.
appendTo is probably easy; create might need some more thought:
$() is nice, but probably not essential.
DOMWrap\create('<div>foo</div>') as well.
new DomWrap\Document()->html('<div>'), but that is a bit verbose perhaps?
Love to hear your thoughts on this :-)
I've never come across this kind of Composer install. Is there no option to use composer require?