Comments (3)
@kirillv There are benchmarks that illustrate usage. There's also usage in the HttpFilteringEngine library. What more are you looking for?
@TechnikEmpire It would be great to have separate C++ examples showing how to find, extract, and modify HTML content with your selectors. I found that it is possible to serialize back to HTML in C++ code (Gumbo doesn't have this feature), but there are no standalone examples of it. Thanks in advance!
@kirillv True, although the serialization is adapted from an official sample in the Gumbo repo. That single file is actually under a different license, Apache 2 I believe, but the original author, Kevin Hendrix, gave me permission to take it under the MIT license (he gave that permission in a bug thread I opened on the Gumbo repo).
Anyway you're right, because the serialization is actually the place where you perform mutations. You use the selectors to grab things and then initiate the serialization. During that serialization, your selected nodes will be given back to you through a simple interface where you can either:
- Modify their values.
- Return nothing, effectively deleting the node and all of its children.
- Inject completely different, hand-written HTML instead.
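To make the three options above concrete, here is a minimal, self-contained sketch of a serializer that hands selected nodes back through a callback. It does not use the real gq API: `Node`, `Action`, `Mutation`, `OnSelected`, `Serialize` and `RunExample` are all hypothetical names invented for illustration, and the toy serializer does no escaping.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy DOM node for illustration; not a gq or Gumbo type.
struct Node {
    std::string tag;                          // empty for text nodes
    std::string text;                         // used only by text nodes
    std::map<std::string, std::string> attrs; // attribute name -> value
    std::vector<Node> children;
};

// What the callback asks the serializer to do with a selected node.
enum class Action { Keep, Remove, Replace };

struct Mutation {
    Action action;
    std::string replacementHtml; // used only when action == Replace
};

// Hypothetical callback type: may edit the node in place, then says what to emit.
using OnSelected = std::function<Mutation(Node&)>;

// Serializes the subtree, consulting the callback for every selected node.
std::string Serialize(Node& node,
                      const std::unordered_set<const Node*>& selected,
                      const OnSelected& onSelected) {
    if (selected.count(&node) != 0) {
        Mutation m = onSelected(node);
        if (m.action == Action::Remove) return "";                 // drops node and children
        if (m.action == Action::Replace) return m.replacementHtml; // hand-written HTML instead
        // Action::Keep falls through and serializes the (possibly modified) node.
    }
    if (node.tag.empty()) {
        assert(node.children.empty()); // text nodes are leaves in this toy model
        return node.text;
    }
    std::string out = "<" + node.tag;
    for (const auto& kv : node.attrs) out += " " + kv.first + "=\"" + kv.second + "\"";
    out += ">";
    for (Node& child : node.children) out += Serialize(child, selected, onSelected);
    out += "</" + node.tag + ">";
    return out;
}

// Builds <div><a href="/old">hi</a><span>ad</span><p></p></div>, selects the
// three children of the div, and applies one option from the list to each.
std::string RunExample() {
    Node root; root.tag = "div";
    Node link; link.tag = "a"; link.attrs["href"] = "/old";
    Node linkText; linkText.text = "hi"; link.children.push_back(linkText);
    Node ad; ad.tag = "span";
    Node adText; adText.text = "ad"; ad.children.push_back(adText);
    Node para; para.tag = "p";
    root.children = {link, ad, para};

    std::unordered_set<const Node*> selected;
    for (const Node& child : root.children) selected.insert(&child);

    OnSelected callback = [](Node& n) -> Mutation {
        if (n.tag == "a") { n.attrs["href"] = "/new"; return {Action::Keep, ""}; } // modify value
        if (n.tag == "span") return {Action::Remove, ""};                          // delete subtree
        return {Action::Replace, "<em>injected</em>"};                             // inject HTML
    };
    return Serialize(root, selected, callback);
}
```

In this sketch `RunExample()` returns `<div><a href="/new">hi</a><em>injected</em></div>`: the link is modified, the span and its text are dropped, and the paragraph is replaced by the injected markup.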
Anyway I will get to this and StahpIt/HttpFilteringEngine eventually, I'm just swamped with private work right now.
Update
One more thing. This mutation API is rather limited in the sense that it's meant for one-off transformations of parsed HTML. It's not fully dynamic: you can't keep applying sequential mutations to the same document. To do that, you'd need to work in passes, where you serialize in one pass, create a new document from that serialized string, rinse and repeat.
The reason for this is that some really heavy-duty hash maps and similar structures are constructed when you parse a document, and this only happens once. It's slightly expensive, and static (once compiled for a document, it doesn't get recompiled). The purpose is to speed up selection dramatically. All tag names and tag property keys and values are indexed through unordered_map and map, and also in a scoped manner, so that complex selectors are blazing fast (this is where all the speed comes from). The only downside is that it's rigid: it's only done once per parsed document. Mutations cannot currently be reflected in this tree.
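As a rough illustration of why a build-once index is fast to query but goes stale after a mutation, here is a small self-contained sketch. It is not gq's actual data structure: `Element` and `IndexedDocument` are hypothetical, the scoped indexing mentioned above is not modeled, and constructing a second `IndexedDocument` from the mutated elements stands in for "serialize, then parse a new document".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical flat element record; not a gq or Gumbo type.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attrs;
};

// Toy document whose lookup tables are built exactly once, at construction.
class IndexedDocument {
public:
    explicit IndexedDocument(std::vector<Element> elements)
        : elements_(std::move(elements)) {
        for (std::size_t i = 0; i < elements_.size(); ++i) {
            byTag_[elements_[i].tag].push_back(i);
            for (const auto& kv : elements_[i].attrs) {
                byAttr_[kv.first][kv.second].push_back(i);
            }
        }
    }

    // Lookup by tag name; returns element indices, empty if no match.
    std::vector<std::size_t> FindByTag(const std::string& tag) const {
        auto it = byTag_.find(tag);
        return it == byTag_.end() ? std::vector<std::size_t>() : it->second;
    }

    // Lookup by exact attribute key and value.
    std::vector<std::size_t> FindByAttr(const std::string& key,
                                        const std::string& value) const {
        auto keyIt = byAttr_.find(key);
        if (keyIt == byAttr_.end()) return {};
        auto valueIt = keyIt->second.find(value);
        if (valueIt == keyIt->second.end()) return {};
        return valueIt->second;
    }

    // Mutable access. Edits made here are NOT reflected in the index.
    Element& At(std::size_t i) {
        assert(i < elements_.size());
        return elements_[i];
    }

    const std::vector<Element>& Elements() const { return elements_; }

private:
    std::vector<Element> elements_;
    std::unordered_map<std::string, std::vector<std::size_t>> byTag_;
    std::unordered_map<std::string,
                       std::map<std::string, std::vector<std::size_t>>> byAttr_;
};
```

With this sketch, changing `doc.At(1).tag` from `"a"` to `"span"` leaves `doc.FindByTag("a")` unchanged, because the index was built before the edit. Constructing `IndexedDocument next(doc.Elements())` plays the role of the next pass: the new document's index is built from the mutated content and sees the change.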