Examples about gq HOT 3 OPEN

kirillv commented on June 3, 2024
Examples

from gq.

Comments (3)

TechnikEmpire commented on June 3, 2024

@kirillv There are benchmarks that illustrate usage. There's also usage in the HttpFilteringEngine library. What more are you looking for?

kirillv commented on June 3, 2024

@TechnikEmpire It would be great to have separate C++ examples showing how to find, extract, and modify HTML content (with your selectors). I found that it is possible to serialize back to HTML in C++ code (Gumbo doesn't have such a feature), but there are no separate examples of this. Thanks in advance!

TechnikEmpire commented on June 3, 2024

@kirillv True, although the serialization is adapted from an official sample in the Gumbo repo. That single file is actually under a different license (Apache 2.0, I believe), but the original author, Kevin Hendricks, gave me permission to use it under the MIT license (he gave that permission in a bug thread I opened on the Gumbo repo).

Anyway you're right, because the serialization is actually the place where you perform mutations. You use the selectors to grab things and then initiate the serialization. During that serialization, your selected nodes will be given back to you through a simple interface where you can either:

  1. Modify their values.
  2. Return nothing, effectively deleting the node and all of its children.
  3. Inject completely different, hand-written HTML instead.
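To make the three options above concrete, here is a minimal, self-contained sketch of the idea. It does not use gq's actual API: `Node`, `Action`, `Mutation`, `Mutator`, `AddChild`, and `SerializeWithMutations` are all hypothetical names invented for illustration, and the "selection" is just a set of node pointers standing in for whatever a selector would match.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, minimal element tree. This is NOT gq's node type.
struct Node {
    std::string tag;
    std::string text;  // text emitted before any children
    std::vector<std::unique_ptr<Node>> children;
};

// What the callback asks the serializer to do with a selected node.
enum class Action { Keep, Remove, Replace };

struct Mutation {
    Action action;
    std::string replacementHtml;  // used only with Action::Replace
};

// Called once per selected node during serialization; may edit the node in place.
using Mutator = std::function<Mutation(Node&)>;

// Appends a child element and returns a non-owning pointer to it.
Node* AddChild(Node& parent, std::string tag, std::string text) {
    auto child = std::make_unique<Node>();
    child->tag = std::move(tag);
    child->text = std::move(text);
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}

std::string SerializeWithMutations(Node& node,
                                   const std::unordered_set<const Node*>& selected,
                                   const Mutator& mutate) {
    if (selected.count(&node) != 0) {
        assert(mutate && "a mutator is required when nodes are selected");
        const Mutation m = mutate(node);
        if (m.action == Action::Remove) return "";                  // option 2: drop node and children
        if (m.action == Action::Replace) return m.replacementHtml;  // option 3: inject other HTML
        // option 1 (Keep): fall through and serialize the possibly modified node
    }
    std::string out = "<" + node.tag + ">" + node.text;
    for (auto& child : node.children) {
        out += SerializeWithMutations(*child, selected, mutate);
    }
    out += "</" + node.tag + ">";
    return out;
}

// Builds <div><p>a</p><span>b</span><em>c</em></div> and applies one mutation of each kind.
std::string RunMutationDemo() {
    Node root;
    root.tag = "div";
    Node* p = AddChild(root, "p", "a");
    Node* span = AddChild(root, "span", "b");
    Node* em = AddChild(root, "em", "c");
    const std::unordered_set<const Node*> selected = {p, span, em};

    Mutator mutate = [](Node& n) -> Mutation {
        if (n.tag == "p") { n.text = "A"; return {Action::Keep, ""}; }
        if (n.tag == "span") return {Action::Remove, ""};
        return {Action::Replace, "<b>x</b>"};
    };
    return SerializeWithMutations(root, selected, mutate);
}
```

In this sketch the `<p>` keeps its place with new text, the `<span>` disappears along with anything under it, and the `<em>` is swapped for hand-written markup. The real library's interface may look quite different.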

Anyway I will get to this and StahpIt/HttpFilteringEngine eventually, I'm just swamped with private work right now.

Update
One more thing. This mutation API is rather limited in the sense that it's meant for one-off transformations of parsed HTML. It's not fully dynamic: you can't keep applying sequential mutations to the same document. To apply sequential mutations, you'd need to work in passes: serialize in one pass, create a new document from the serialized string, rinse and repeat.
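The pass-based approach could be sketched roughly as follows. This is only an illustration of the control flow, not gq code: `Pass`, `ApplyInPasses`, and `ReplaceAll` are hypothetical names, and each pass here is a plain string transformation standing in for "parse the string into a new document, select, serialize with mutations".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One pass takes serialized HTML and returns newly serialized HTML.
// In real use, each pass would parse its input into a fresh document first.
using Pass = std::function<std::string(const std::string&)>;

// Feeds the output of each pass into the next one.
std::string ApplyInPasses(std::string html, const std::vector<Pass>& passes) {
    for (const Pass& pass : passes) {
        assert(pass && "each pass must be callable");
        html = pass(html);
    }
    return html;
}

// Stand-in for a real mutation: replaces every occurrence of `from` with `to`.
std::string ReplaceAll(std::string s, const std::string& from, const std::string& to) {
    assert(!from.empty());
    std::size_t pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // skip past the inserted text
    }
    return s;
}

// The second pass only matches because the first pass already renamed <i> to <em>.
std::string RunPassesDemo() {
    const std::vector<Pass> passes = {
        [](const std::string& html) {
            return ReplaceAll(ReplaceAll(html, "<i>", "<em>"), "</i>", "</em>");
        },
        [](const std::string& html) { return ReplaceAll(html, "<em>x</em>", ""); },
    };
    return ApplyInPasses("<p><i>x</i></p>", passes);
}
```

The point of the sketch is the dependency between passes: a later pass sees only the serialized result of the earlier ones, which is why each pass needs its own freshly parsed document.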

The reason for this is that some really heavy-duty hash maps and similar structures are constructed when you parse a document, and this only happens once. It's slightly expensive, and static (once compiled for a document, it doesn't get recompiled). The purpose is to speed up selection dramatically. All tag names, tag property keys and values are indexed through unordered_map and map, and in a scoped manner, so that complex selectors are blazing fast (this is where all the speed comes from). The only downside is that it's rigid: the indexing is done only once per parsed document. Mutations cannot currently be reflected in this tree.
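As a rough illustration of why a parse-time index is both fast and rigid, here is a small sketch. It is not gq's internal data structure: `Element`, `StaticIndex`, `ByTag`, and `ByAttr` are hypothetical names, the document is flattened to a vector for brevity, and the scoping described above is not modeled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical flattened element; a real parser would hand back a tree.
struct Element {
    std::string tag;
    std::map<std::string, std::string> attrs;
};

// Built once from a parsed document. Lookups never walk the elements again.
class StaticIndex {
public:
    explicit StaticIndex(const std::vector<Element>& elements) {
        for (std::size_t i = 0; i < elements.size(); ++i) {
            byTag_[elements[i].tag].push_back(i);
            for (const auto& kv : elements[i].attrs) {
                byAttr_[kv.first][kv.second].push_back(i);
            }
        }
    }

    // Positions of all elements with the given tag name.
    const std::vector<std::size_t>& ByTag(const std::string& tag) const {
        auto it = byTag_.find(tag);
        return it == byTag_.end() ? Empty() : it->second;
    }

    // Positions of all elements whose attribute `key` equals `value`.
    const std::vector<std::size_t>& ByAttr(const std::string& key,
                                           const std::string& value) const {
        auto k = byAttr_.find(key);
        if (k == byAttr_.end()) return Empty();
        auto v = k->second.find(value);
        return v == k->second.end() ? Empty() : v->second;
    }

private:
    static const std::vector<std::size_t>& Empty() {
        static const std::vector<std::size_t> kEmpty;
        return kEmpty;
    }

    std::unordered_map<std::string, std::vector<std::size_t>> byTag_;
    std::unordered_map<std::string, std::map<std::string, std::vector<std::size_t>>> byAttr_;
};

// Renames an element after indexing and reports how many "div" hits the index still returns.
std::size_t DivCountAfterRename() {
    std::vector<Element> elements = {
        Element{"div", {{"class", "a"}}},
        Element{"p", {{"class", "a"}}},
        Element{"div", {}},
    };
    const StaticIndex index(elements);
    assert(index.ByTag("div").size() == 2);
    elements[0].tag = "span";          // mutate after the index was built
    return index.ByTag("div").size();  // the index is unaware of the rename
}
```

Because the index is filled in a single pass and never revisited, a lookup is just a map access, but any edit made afterwards is invisible to it. That is the trade-off described above, and why sequential mutations need a re-parse.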

