scinfu / swiftsoup

SwiftSoup: Pure Swift HTML Parser, with best of DOM, CSS, and jquery (Supports Linux, iOS, Mac, tvOS, watchOS)

Home Page: https://scinfu.github.io/SwiftSoup/

License: MIT License

Ruby 0.21% Swift 99.77% C 0.02%
swift swiftsoup parse html-document dom extract selector html

swiftsoup's Introduction

SwiftSoup

SwiftSoup is a pure Swift library, cross-platform (macOS, iOS, tvOS, watchOS and Linux!), for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. SwiftSoup implements the WHATWG HTML5 specification, and parses HTML to the same DOM as modern browsers do.

  • Scrape and parse HTML from a URL, file, or string
  • Find and extract data, using DOM traversal or CSS selectors
  • Manipulate the HTML elements, attributes, and text
  • Clean user-submitted content against a safe white-list, to prevent XSS attacks
  • Output tidy HTML

SwiftSoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag-soup; in every case it will create a sensible parse tree.

Swift

  • Swift 5: SwiftSoup >= 2.0.0
  • Swift 4.2: SwiftSoup 1.7.4

Installation

CocoaPods

SwiftSoup is available through CocoaPods. To install it, simply add the following line to your Podfile:

pod 'SwiftSoup'

Carthage

SwiftSoup is also available through Carthage. To install it, simply add the following line to your Cartfile:

github "scinfu/SwiftSoup"

Swift Package Manager

SwiftSoup is also available through Swift Package Manager. To install it, simply add the dependency to your Package.swift file:

...
dependencies: [
    .package(url: "https://github.com/scinfu/SwiftSoup.git", from: "2.6.0"),
],
targets: [
    .target( name: "YourTarget", dependencies: ["SwiftSoup"]),
]
...

Try

Try out the simple online CSS selectors site:

SwiftSoup Test Site

Try out the example project by opening Terminal and typing:

pod try SwiftSoup

To parse an HTML document:

do {
    let html = "<html><head><title>First parse</title></head>"
        + "<body><p>Parsed HTML into a doc.</p></body></html>"
    let doc: Document = try SwiftSoup.parse(html)
    let text: String = try doc.text() // text of the whole document
    print(text)
} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error")
}

The parser makes every attempt to create a clean parse from the HTML you provide, whether or not it is well-formed. It handles:

  • Unclosed tags (e.g. <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
  • Implicit tags (e.g. a naked <td>Table data</td> is wrapped into a <table><tr><td>...)
  • Reliably creating the document structure (html containing a head and body, and only appropriate elements within the head)
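
As a minimal sketch of the unclosed-tag and document-structure cases, the following counts the paragraphs the parser produces. The helper name paragraphCount is made up for this illustration and is not part of SwiftSoup.

```swift
import SwiftSoup

// Hypothetical helper (not a SwiftSoup API): counts the <p> elements
// found after parsing `html`.
func paragraphCount(in html: String) throws -> Int {
    let doc = try SwiftSoup.parse(html)
    return try doc.select("p").size()
}

do {
    // Two unclosed <p> tags are parsed as two separate paragraphs.
    let count = try paragraphCount(in: "<p>Lorem <p>Ipsum")
    print(count) // 2

    // Even for a bare fragment, the document has a head and a body.
    let doc = try SwiftSoup.parse("<p>Lorem <p>Ipsum")
    print(doc.head() != nil, doc.body() != nil) // true true
} catch {
    print(error)
}
```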

The object model of a document

  • Documents consist of Elements and TextNodes
  • The inheritance chain is: Document extends Element extends Node. TextNode extends Node.
  • An Element contains a list of child Nodes, and has one parent Element. Elements also provide a filtered list of child Elements only.
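
The difference between child nodes and child elements can be seen in a short sketch. The helper childCounts is hypothetical; it assumes the Node.getChildNodes(), Element.children() and Element.textNodes() accessors.

```swift
import SwiftSoup

// Hypothetical helper: for the first element matching `selector`, returns
// how many child nodes, child elements, and direct text nodes it has.
func childCounts(ofFirst selector: String, in html: String) throws
    -> (nodes: Int, elements: Int, textNodes: Int) {
    let doc = try SwiftSoup.parse(html)
    guard let element = try doc.select(selector).first() else {
        return (0, 0, 0)
    }
    return (element.getChildNodes().count,
            element.children().size(),
            element.textNodes().count)
}

do {
    // The div holds <p>, a text node ("two"), and another <p>.
    let counts = try childCounts(ofFirst: "div",
                                 in: "<div><p>One</p>two<p>Three</p></div>")
    print(counts.nodes, counts.elements, counts.textNodes) // 3 2 1
} catch {
    print(error)
}
```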

Extract attributes, text, and HTML from elements

Problem

After parsing a document, and finding some elements, you'll want to get at the data inside those elements.

Solution

  • To get the value of an attribute, use the Node.attr(_ key: String) method
  • For the text on an element (and its combined children), use Element.text()
  • For HTML, use Element.html(), or Node.outerHtml() as appropriate
do {
    let html: String = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>"
    let doc: Document = try SwiftSoup.parse(html)
    let link: Element = try doc.select("a").first()!
    
    let text: String = try doc.body()!.text() // "An example link."
    let linkHref: String = try link.attr("href") // "http://example.com/"
    let linkText: String = try link.text() // "example"
    
    let linkOuterH: String = try link.outerHtml() // "<a href="http://example.com/"><b>example</b></a>"
    let linkInnerH: String = try link.html() // "<b>example</b>"
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Description

The methods above are the core of the element data access methods. There are others as well:

  • Element.id()
  • Element.tagName()
  • Element.className() and Element.hasClass(_ className: String)

All of these accessor methods have corresponding setter methods to change the data.
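
A short sketch of these accessors and two of the matching setters. The summary(of:) helper is invented for illustration only.

```swift
import SwiftSoup

// Hypothetical helper: formats an element as "tag#id.classes".
func summary(of element: Element) throws -> String {
    let classes = try element.className()
    return "\(element.tagName())#\(element.id()).\(classes)"
}

do {
    let doc = try SwiftSoup.parse("<div id=\"main\" class=\"box round\"></div>")
    if let div = try doc.select("div").first() {
        let before = try summary(of: div)
        print(before)                  // div#main.box round
        print(div.hasClass("round"))   // true

        // Setters: change the id attribute and add a class.
        try div.attr("id", "content")
        try div.addClass("wide")
        print(div.id(), div.hasClass("wide")) // content true
    }
} catch {
    print(error)
}
```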

Parse a document from a String

Problem

You have HTML in a Swift String, and you want to parse that HTML to get at its contents, or to make sure it's well formed, or to modify it. The String may have come from user input, a file, or from the web.

Solution

Use the static SwiftSoup.parse(_ html: String) method, or SwiftSoup.parse(_ html: String, _ baseUri: String).

do {
    let html = "<html><head><title>First parse</title></head>"
        + "<body><p>Parsed HTML into a doc.</p></body></html>"
    let doc: Document = try SwiftSoup.parse(html)
    let text: String = try doc.text() // text of the whole document
    print(text)
} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error")
}

Description

The parse(_ html: String, _ baseUri: String) method parses the input HTML into a new Document. The base URI argument is used to resolve relative URLs into absolute URLs, and should be set to the URL where the document was fetched from. If that's not applicable, or if you know the HTML has a base element, you can use the parse(_ html: String) method.

For any string you pass in, the parser aims to produce a successful, sensible parse, with a Document containing (at least) a head and a body element. The method is still marked throws, so call it with try.

Once you have a Document, you can get at the data using the appropriate methods in Document and its superclasses Element and Node.
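
To illustrate what the base URI is for, here is a sketch that resolves relative links into absolute ones. The helper absoluteLinks and the example.com URLs are assumptions made for the example; it relies on Node.absUrl(_:), which resolves an attribute value against the document's base URI.

```swift
import SwiftSoup

// Hypothetical helper: returns every link in `html` as an absolute URL,
// resolved against `baseUri`.
func absoluteLinks(in html: String, baseUri: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html, baseUri)
    return try doc.select("a[href]").array().map { try $0.absUrl("href") }
}

do {
    let html = "<a href='/docs/intro.html'>Intro</a> <a href='post.html'>Post</a>"
    let links = try absoluteLinks(in: html, baseUri: "https://example.com/blog/")
    for link in links {
        print(link)
    }
    // Expected under the assumptions above:
    // https://example.com/docs/intro.html
    // https://example.com/blog/post.html
} catch {
    print(error)
}
```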

Parsing a body fragment

Problem

You have a fragment of body HTML (e.g. a div containing a couple of p tags, as opposed to a full HTML document) that you want to parse. Perhaps it was provided by a user submitting a comment, or editing the body of a page in a CMS.

Solution

Use the SwiftSoup.parseBodyFragment(_ html: String) method.

do {
    let html: String = "<div><p>Lorem ipsum.</p>"
    let doc: Document = try SwiftSoup.parseBodyFragment(html)
    let body: Element? = doc.body()
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Description

The parseBodyFragment method creates an empty shell document, and inserts the parsed HTML into the body element. If you used the normal SwiftSoup.parse(_ html: String) method, you would generally get the same result, but explicitly treating the input as a body fragment ensures that any bozo HTML provided by the user is parsed into the body element.

The Document.body() method returns the document's body element as an optional Element; it is roughly equivalent to taking the first result of doc.getElementsByTag("body").
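
A small sketch of inspecting what ended up inside the body after a fragment parse. The helper bodyChildTags is hypothetical.

```swift
import SwiftSoup

// Hypothetical helper: parses `html` as a body fragment and returns the
// tag names of the body's direct child elements.
func bodyChildTags(ofFragment html: String) throws -> [String] {
    let doc = try SwiftSoup.parseBodyFragment(html)
    guard let body = doc.body() else { return [] }
    return body.children().array().map { $0.tagName() }
}

do {
    // The unclosed <div> is closed for us and placed in the body.
    print(try bodyChildTags(ofFragment: "<div><p>Lorem ipsum.</p>")) // ["div"]
    print(try bodyChildTags(ofFragment: "<p>One<p>Two"))             // ["p", "p"]
} catch {
    print(error)
}
```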

Stay safe

If you are going to accept HTML input from a user, you need to be careful to avoid cross-site scripting attacks. See the documentation for the Whitelist based cleaner, and clean the input with SwiftSoup.clean(_ bodyHtml: String, _ whitelist: Whitelist).

Sanitize untrusted HTML (to prevent XSS)

Problem

You want to allow untrusted users to supply HTML for output on your website (e.g. as comment submission). You need to clean this HTML to avoid cross-site scripting (XSS) attacks.

Solution

Use the SwiftSoup HTML Cleaner with a configuration specified by a Whitelist.

do {
    let unsafe: String = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>"
    let safe: String = try SwiftSoup.clean(unsafe, Whitelist.basic())!
    // now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

If you supply a whole HTML document, with a <head> tag, the clean(_: String, _: String, _: Whitelist) method will just return the cleaned body HTML. You can clean both <head> and <body> by providing a Whitelist for each tag.

do {
    let unsafe: String = """
    <html>
        <head>
            <title>Hey</title>
            <script>console.log('hi');</script>
        </head>
        <body>
            <p>Hello, world!</p>
        </body>
    </html>
    """

    let headWhitelist: Whitelist = {
        do {
            let customWhitelist = Whitelist.none()
            try customWhitelist
                .addTags("meta", "style", "title")
            return customWhitelist
        } catch {
            fatalError("Couldn't init head whitelist")
        }
    }()

    let unsafeDocument: Document = try SwiftSoup.parse(unsafe)
    let safe: String = try SwiftSoup.Cleaner(headWhitelist: headWhitelist, bodyWhitelist: .relaxed())
                            .clean(unsafeDocument)
                            .html()
    // now: <html><head><title>Hey</title></head><body><p>Hello, world!</p></body></html>
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Discussion

A cross-site scripting attack against your site can really ruin your day, not to mention your users'. Many sites avoid XSS attacks by not allowing HTML in user submitted content: they enforce plain text only, or use an alternative markup syntax like wiki-text or Markdown. These are seldom optimal solutions for the user, as they lower expressiveness, and force the user to learn a new syntax.

A better solution may be to use a rich text WYSIWYG editor (like CKEditor or TinyMCE). These output HTML, and allow the user to work visually. However, their validation is done on the client side: you need to apply server-side validation to clean up the input and ensure the HTML is safe to place on your site. Otherwise, an attacker can bypass the client-side JavaScript validation and inject unsafe HTML directly into your site.

The SwiftSoup whitelist sanitizer works by parsing the input HTML (in a safe, sand-boxed environment), and then iterating through the parse tree and only allowing known-safe tags and attributes (and values) through into the cleaned output.

It does not use regular expressions, which are inappropriate for this task.

SwiftSoup provides a range of Whitelist configurations to suit most requirements; they can be modified if necessary, but take care.

The cleaner is useful not only for avoiding XSS, but also in limiting the range of elements the user can provide: you may be OK with textual a, strong elements, but not structural div or table elements.
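
As a sketch of limiting the range of elements, the following builds a custom Whitelist that only lets a and strong through. The helper name cleanInline and the choice of allowed protocols are assumptions for this example, not a recommended policy.

```swift
import SwiftSoup

// Hypothetical helper: keeps only <a href> (http/https) and <strong>.
// Everything else is dropped, but the text inside it is kept.
func cleanInline(_ dirty: String) throws -> String {
    let inlineOnly = try Whitelist.none()
        .addTags("a", "strong")
        .addAttributes("a", "href")
        .addProtocols("a", "href", "http", "https")
    return try SwiftSoup.clean(dirty, inlineOnly) ?? ""
}

do {
    let dirty = "<div><strong>Bold</strong> and "
        + "<a href='http://example.com/' onclick='x()'>link</a>"
        + "<table><tr><td>cell</td></tr></table></div>"
    // The div and table markup and the onclick attribute are removed.
    print(try cleanInline(dirty))
} catch {
    print(error)
}
```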

See also

  • See the XSS cheat sheet and filter evasion guide, as an example of how regular-expression filters don't work, and why a safe whitelist parser-based sanitizer is the correct approach.
  • See the Cleaner reference if you want to get a Document instead of a String return
  • See the Whitelist reference for the different canned options, and to create a custom whitelist
  • The nofollow link attribute

Set attribute values

Problem

You have a parsed document that you would like to update attribute values on, before saving it out to disk, or sending it on as an HTTP response.

Solution

Use the attribute setter methods Element.attr(_ key: String, _ value: String), and Elements.attr(_ key: String, _ value: String).

If you need to modify the class attribute of an element, use the Element.addClass(_ className: String) and Element.removeClass(_ className: String) methods.

The Elements collection has bulk attribute and class methods. For example, to add a rel="nofollow" attribute to every a element inside a div:

do {
    try doc.select("div.comments a").attr("rel", "nofollow")
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Description

Like the other methods in Element, the attr methods return the current Element (or Elements when working on a collection from a select). This allows convenient method chaining:

do {
    try doc.select("div.masthead").attr("title", "swiftsoup").addClass("round-box")
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Set the HTML of an element

Problem

You need to modify the HTML of an element.

Solution

Use the HTML setter methods in Element:

do {
    let doc: Document = try SwiftSoup.parse("<div>One</div><span>One</span>")
    let div: Element = try doc.select("div").first()! // <div>One</div>
    try div.html("<p>lorem ipsum</p>") // <div><p>lorem ipsum</p></div>
    try div.prepend("<p>First</p>")
    try div.append("<p>Last</p>")
    print(div)
    // now div is: <div><p>First</p><p>lorem ipsum</p><p>Last</p></div>
    
    let span: Element = try doc.select("span").first()! // <span>One</span>
    try span.wrap("<li><a href='http://example.com/'></a></li>")
    print(doc)
    // now: <html><head></head><body><div><p>First</p><p>lorem ipsum</p><p>Last</p></div><li><a href="http://example.com/"><span>One</span></a></li></body></html>
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Discussion

  • Element.html(_ html: String) clears any existing inner HTML in an element, and replaces it with parsed HTML.
  • Element.prepend(_ first: String) and Element.append(_ last: String) add HTML to the start or end of an element's inner HTML, respectively
  • Element.wrap(_ around: String) wraps HTML around the outer HTML of an element.

See also

You can also use the Element.prependElement(_ tag: String) and Element.appendElement(_ tag: String) methods to create new elements and insert them into the document flow as a child element.
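
A minimal sketch of building child elements this way, rather than from an HTML string. The helper appendItems is made up for the example.

```swift
import SwiftSoup

// Hypothetical helper: appends one <li> per string to `list`.
func appendItems(_ items: [String], to list: Element) throws {
    for item in items {
        // appendElement returns the newly created child element.
        try list.appendElement("li").text(item)
    }
}

do {
    let doc = try SwiftSoup.parse("<ul></ul>")
    if let list = try doc.select("ul").first() {
        try appendItems(["Second", "Third"], to: list)
        try list.prependElement("li").text("First")

        print(list.children().size()) // 3
        for item in list.children().array() {
            print(try item.text())    // First, Second, Third (one per line)
        }
    }
} catch {
    print(error)
}
```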

Setting the text content of elements

Problem

You need to modify the text content of an HTML document.

Solution

Use the text setter methods of Element:

do {
    let doc: Document = try SwiftSoup.parse("<div></div>")
    let div: Element = try doc.select("div").first()! // <div></div>
    try div.text("five > four") // <div>five &gt; four</div>
    try div.prepend("First ")
    try div.append(" Last")
    // now: <div>First five &gt; four Last</div>
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Discussion

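The text setter methods mirror the HTML setter methods (see "Set the HTML of an element"):

  • Element.text(_ text: String) clears any existing inner HTML in an element, and replaces it with the supplied text.
  • Element.prepend(_ first: String) and Element.append(_ last: String) add content to the start or end of an element's inner HTML, respectively. A plain string such as "First " becomes a text node.

The text passed to Element.text(_ text: String) should be supplied unencoded: characters like <, > etc. will be treated as literals, not HTML. Element.prepend and Element.append parse their argument as HTML, so use prependText(_ text: String) and appendText(_ text: String) when the added text may contain such characters.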
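
The difference between adding text and adding HTML can be sketched as follows. The helper childElementCount is hypothetical; it contrasts appendText (literal text) with append (parsed HTML).

```swift
import SwiftSoup

// Hypothetical helper: adds `content` to an empty <div>, either as literal
// text or as parsed HTML, and returns how many child elements result.
func childElementCount(afterAdding content: String, asText: Bool) throws -> Int {
    let doc = try SwiftSoup.parse("<div></div>")
    guard let div = try doc.select("div").first() else { return 0 }
    if asText {
        try div.appendText(content) // stored as a single text node
    } else {
        try div.append(content)     // parsed as HTML
    }
    return div.children().size()
}

do {
    print(try childElementCount(afterAdding: "<b>bold</b>", asText: true))  // 0
    print(try childElementCount(afterAdding: "<b>bold</b>", asText: false)) // 1
} catch {
    print(error)
}
```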

Use DOM methods to navigate a document

Problem

You have an HTML document that you want to extract data from. You know generally the structure of the HTML document.

Solution

Use the DOM-like methods available after parsing HTML into a Document.

do {
    let html: String = "<a id=1 href='?foo=bar&mid&lt=true'>One</a> <a id=2 href='?foo=bar&lt;qux&lg=1'>Two</a>"
    let els: Elements = try SwiftSoup.parse(html).select("a")
    for link: Element in els.array() {
        let linkHref: String = try link.attr("href")
        let linkText: String = try link.text()
    }
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Description

Elements provide a range of DOM-like methods to find elements, and extract and manipulate their data. The DOM getters are contextual: called on a parent Document they find matching elements under the document; called on a child element they find elements under that child. In this way you can winnow in on the data you want.
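
To make "contextual" concrete, this sketch runs the same getter on the whole document and on one element. The helper linkCounts and the sample markup are assumptions for illustration.

```swift
import SwiftSoup

// Hypothetical helper: counts <a> elements in the whole document and
// under the element with the given id.
func linkCounts(in html: String, underId id: String) throws -> (whole: Int, scoped: Int) {
    let doc = try SwiftSoup.parse(html)
    let whole = try doc.getElementsByTag("a").size()
    let scoped = try doc.getElementById(id)?.getElementsByTag("a").size() ?? 0
    return (whole, scoped)
}

do {
    let html = "<div id=\"nav\"><a href=\"/a\">A</a></div>"
        + "<div id=\"main\"><a href=\"/b\">B</a><a href=\"/c\">C</a></div>"
    let counts = try linkCounts(in: html, underId: "main")
    print(counts.whole, counts.scoped) // 3 2
} catch {
    print(error)
}
```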

Finding elements

  • getElementById(_ id: String)
  • getElementsByTag(_ tag: String)
  • getElementsByClass(_ className: String)
  • getElementsByAttribute(_ key: String) (and related methods)
  • Element siblings: siblingElements(), firstElementSibling(), lastElementSibling(), nextElementSibling(), previousElementSibling()
  • Graph: parent(), children(), child(_ index: Int)

Element data

  • attr(_ key: String) to get and attr(_ key: String, _ value: String) to set attributes
  • attributes() to get all attributes
  • id(), className() and classNames()
  • text() to get and text(_ value: String) to set the text content
  • html() to get and html(_ value: String) to set the inner HTML content
  • outerHtml() to get the outer HTML value
  • data() to get data content (e.g. of script and style tags)
  • tag() and tagName()

Manipulating HTML and text

  • append(_ html: String), prepend(_ html: String)
  • appendText(_ text: String), prependText(_ text: String)
  • appendElement(_ tagName: String), prependElement(_ tagName: String)
  • html(_ value: String)

Use selector syntax to find elements

Problem

You want to find or manipulate elements using a CSS or jQuery-like selector syntax.

Solution

Use the Element.select(_ selector: String) and Elements.select(_ selector: String) methods:

do {
    let doc: Document = try SwiftSoup.parse("...")
    let links: Elements = try doc.select("a[href]") // a with href
    let pngs: Elements = try doc.select("img[src$=.png]")
    // img with src ending .png
    let masthead: Element? = try doc.select("div.masthead").first()
    // div with class=masthead
    let resultLinks: Elements = try doc.select("h3.r > a") // a elements that are direct children of h3.r
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Description

SwiftSoup elements support a CSS (or jQuery) like selector syntax to find matching elements, that allows very powerful and robust queries.

The select method is available in a Document, Element, or in Elements. It is contextual, so you can filter by selecting from a specific element, or by chaining select calls.

Select returns a list of Elements (as Elements), which provides a range of methods to extract and manipulate the results.
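
Chaining select calls narrows the search step by step, as in this sketch. The helper hrefs(in:within:) is hypothetical.

```swift
import SwiftSoup

// Hypothetical helper: first selects `scope`, then selects links with an
// href inside that result, and returns their href values.
func hrefs(in html: String, within scope: String) throws -> [String] {
    let doc = try SwiftSoup.parse(html)
    return try doc.select(scope).select("a[href]").array().map { try $0.attr("href") }
}

do {
    let html = "<div class=\"masthead\"><a href=\"/home\">Home</a></div>"
        + "<a href=\"/other\">Other</a><a>No href</a>"
    print(try hrefs(in: html, within: "div.masthead")) // ["/home"]
    print(try hrefs(in: html, within: "body"))         // ["/home", "/other"]
} catch {
    print(error)
}
```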

Selector overview

  • tagname: find elements by tag, e.g. a
  • ns|tag: find elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
  • #id: find elements by ID, e.g. #logo
  • .class: find elements by class name, e.g. .masthead
  • [attribute]: elements with attribute, e.g. [href]
  • [^attr]: elements with an attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
  • [attr=value]: elements with attribute value, e.g. [width=500] (also quotable, like [data-name='launch sequence'])
  • [attr^=value], [attr$=value], [attr*=value]: elements with attributes that start with, end with, or contain the value, e.g. [href*=/path/]
  • [attr~=regex]: elements with attribute values that match the regular expression; e.g. img[src~=(?i)\.(png|jpe?g)]
  • *: all elements, e.g. *
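
A few of the attribute selectors above, tried against a small made-up document. The helper attributeMatchCount is hypothetical; note that the backslash in the regular expression has to be doubled inside a Swift string literal.

```swift
import SwiftSoup

// Hypothetical helper: number of elements in `html` matching `selector`.
func attributeMatchCount(_ selector: String, in html: String) throws -> Int {
    return try SwiftSoup.parse(html).select(selector).size()
}

do {
    let html = "<img src=\"a.png\"><img src=\"b.JPG\"><img src=\"c.gif\">"
        + "<a href=\"/path/x\" data-id=\"7\">x</a>"
    let selectors = [
        "img[src$=.png]",                // src ends with .png
        "img[src~=(?i)\\.(png|jpe?g)]",  // src matches the regex, ignoring case
        "[href*=/path/]",                // href contains /path/
        "[^data-]",                      // any attribute name starting with data-
        "[data-id=7]"                    // exact attribute value
    ]
    for selector in selectors {
        let count = try attributeMatchCount(selector, in: html)
        print(selector, count)
    }
} catch {
    print(error)
}
```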

Selector combinations

  • el#id: elements with ID, e.g. div#logo
  • el.class: elements with class, e.g. div.masthead
  • el[attr]: elements with attribute, e.g. a[href]
  • Any combination, e.g. a[href].highlight
  • ancestor child: child elements that descend from ancestor, e.g. .body p finds p elements anywhere under a block with class "body"
  • parent > child: child elements that descend directly from parent, e.g. div.content > p finds p elements; and body > * finds the direct children of the body tag
  • siblingA + siblingB: finds sibling B element immediately preceded by sibling A, e.g. div.head + div
  • siblingA ~ siblingX: finds sibling X element preceded by sibling A, e.g. h1 ~ p
  • el, el, el: group multiple selectors, find unique elements that match any of the selectors; e.g. div.masthead, div.logo
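
The combinators are easiest to tell apart side by side. This sketch counts matches for each against one made-up fragment; the helper combinatorMatchCount is hypothetical.

```swift
import SwiftSoup

// Hypothetical helper: number of elements in `html` matching `selector`.
func combinatorMatchCount(_ selector: String, in html: String) throws -> Int {
    return try SwiftSoup.parse(html).select(selector).size()
}

do {
    let html = "<div class=\"content\"><h1>Title</h1><p>one</p>"
        + "<section><p>nested</p></section><p>two</p></div>"
    print(try combinatorMatchCount("div.content p", in: html))   // 3: any depth
    print(try combinatorMatchCount("div.content > p", in: html)) // 2: direct children only
    print(try combinatorMatchCount("h1 + p", in: html))          // 1: immediately after h1
    print(try combinatorMatchCount("h1 ~ p", in: html))          // 2: any later sibling of h1
    print(try combinatorMatchCount("h1, section", in: html))     // 2: either selector
} catch {
    print(error)
}
```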

Pseudo selectors

  • :lt(n): find elements whose sibling index (i.e. its position in the DOM tree relative to its parent) is less than n; e.g. td:lt(3)
  • :gt(n): find elements whose sibling index is greater than n; e.g. div p:gt(2)
  • :eq(n): find elements whose sibling index is equal to n; e.g. form input:eq(1)
  • :has(selector): find elements that contain elements matching the selector; e.g. div:has(p)
  • :not(selector): find elements that do not match the selector; e.g. div:not(.logo)
  • :contains(text): find elements that contain the given text. The search is case-insensitive; e.g. p:contains(swiftsoup)
  • :containsOwn(text): find elements that directly contain the given text
  • :matches(regex): find elements whose text matches the specified regular expression; e.g. div:matches((?i)login)
  • :matchesOwn(regex): find elements whose own text matches the specified regular expression

Note that the above indexed pseudo-selectors are 0-based: the first element is at index 0, the second at 1, etc.
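
A short sketch of several pseudo-selectors, including the 0-based indexing. The helper pseudoMatchCount and the sample markup are assumptions for illustration.

```swift
import SwiftSoup

// Hypothetical helper: number of elements in `html` matching `selector`.
func pseudoMatchCount(_ selector: String, in html: String) throws -> Int {
    return try SwiftSoup.parse(html).select(selector).size()
}

do {
    let html = "<table><tr><td>a</td><td>b</td><td>c</td></tr></table>"
        + "<div><p>SwiftSoup rocks</p></div><div><span>plain</span></div>"
    print(try pseudoMatchCount("td:lt(2)", in: html))              // 2: indexes 0 and 1
    print(try pseudoMatchCount("td:eq(0)", in: html))              // 1: the first cell
    print(try pseudoMatchCount("td:gt(0)", in: html))              // 2: indexes 1 and 2
    print(try pseudoMatchCount("div:has(p)", in: html))            // 1
    print(try pseudoMatchCount("div:not(.logo)", in: html))        // 2
    print(try pseudoMatchCount("p:contains(swiftsoup)", in: html)) // 1: case-insensitive
} catch {
    print(error)
}
```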

Examples

To parse an HTML document from String:

let html = "<html><head><title>First parse</title></head><body><p>Parsed HTML into a doc.</p></body></html>"
guard let doc: Document = try? SwiftSoup.parse(html) else { return }

Get all text nodes:

guard let elements = try? doc.getAllElements() else { return html }
for element in elements {
    for textNode in element.textNodes() {
        [...]
    }
}

Set CSS using SwiftSoup:

try doc.head()?.append("<style>html {font-size: 2em}</style>")

Get HTML value

let html = "<div class=\"container-fluid\">"
    + "<div class=\"panel panel-default \">"
    + "<div class=\"panel-body\">"
    + "<form id=\"coupon_checkout\" action=\"http://uat.all.com.my/checkout/couponcode\" method=\"post\">"
    + "<input type=\"hidden\" name=\"transaction_id\" value=\"4245\">"
    + "<input type=\"hidden\" name=\"lang\" value=\"EN\">"
    + "<input type=\"hidden\" name=\"devicetype\" value=\"\">"
    + "<div class=\"input-group\">"
    + "<input type=\"text\" class=\"form-control\" id=\"coupon_code\" name=\"coupon\" placeholder=\"Coupon Code\">"
    + "<span class=\"input-group-btn\">"
    + "<button class=\"btn btn-primary\" type=\"submit\">Enter Code</button>"
    + "</span>"
    + "</div>"
    + "</form>"
    + "</div>"
    + "</div>"
guard let doc: Document = try? SwiftSoup.parse(html) else { return } // parse html
let elements = try doc.select("[name=transaction_id]") // query
let transaction_id = try elements.get(0) // select first element
let value = try transaction_id.val() // get value
print(value) // 4245

How to remove all the html from a string

guard let doc: Document = try? SwiftSoup.parse(html) else { return } // parse html
guard let txt = try? doc.text() else { return }
print(txt)

How to get and update XML values

let xml = "<?xml version='1' encoding='UTF-8' something='else'?><val>One</val>"
guard let doc = try? SwiftSoup.parse(xml, "", Parser.xmlParser()) else { return }
guard let element = try? doc.getElementsByTag("val").first() else { return } // Find first element
try element.text("NewValue") // Edit Value
let valueString = try element.text() // "NewValue"

How to get all <img src>

do {
    let doc: Document = try SwiftSoup.parse(html)
    let srcs: Elements = try doc.select("img[src]")
    let srcsStringArray: [String?] = srcs.array().map { try? $0.attr("src").description }
    // do something with srcsStringArray
} catch Exception.Error(_, let message) {
    print(message)
} catch {
    print("error")
}

Get all href of <a>

let html = "<a id=1 href='?foo=bar&mid&lt=true'>One</a> <a id=2 href='?foo=bar&lt;qux&lg=1'>Two</a>"
guard let els: Elements = try? SwiftSoup.parse(html).select("a") else { return }
for element: Element in els.array() {
    print(try? element.attr("href"))
}

Output:

"?foo=bar&mid&lt=true"
"?foo=bar<qux&lg=1"

Escape and Unescape

let text = "Hello &<> Å å π 新 there ¾ © »"

print(Entities.escape(text))
print(Entities.unescape(text))


print(Entities.escape(text, OutputSettings().encoder(String.Encoding.ascii).escapeMode(Entities.EscapeMode.base)))
print(Entities.escape(text, OutputSettings().charset(String.Encoding.ascii).escapeMode(Entities.EscapeMode.extended)))
print(Entities.escape(text, OutputSettings().charset(String.Encoding.ascii).escapeMode(Entities.EscapeMode.xhtml)))
print(Entities.escape(text, OutputSettings().charset(String.Encoding.utf8).escapeMode(Entities.EscapeMode.extended)))
print(Entities.escape(text, OutputSettings().charset(String.Encoding.utf8).escapeMode(Entities.EscapeMode.xhtml)))

Output:

"Hello &amp;&lt;&gt; Å å π 新 there ¾ © »"
"Hello &<> Å å π 新 there ¾ © »"


"Hello &amp;&lt;&gt; &Aring; &aring; &#x3c0; &#x65b0; there &frac34; &copy; &raquo;"
"Hello &amp;&lt;&gt; &angst; &aring; &pi; &#x65b0; there &frac34; &copy; &raquo;"
"Hello &amp;&lt;&gt; &#xc5; &#xe5; &#x3c0; &#x65b0; there &#xbe; &#xa9; &#xbb;"
"Hello &amp;&lt;&gt; Å å π 新 there ¾ © »"
"Hello &amp;&lt;&gt; Å å π 新 there ¾ © »"

Author

Nabil Chatbi

Note

SwiftSoup was ported to Swift from the Java library jsoup.

License

SwiftSoup is available under the MIT license. See the LICENSE file for more info.

swiftsoup's People

Contributors

0xtim, aehlke, aloisius, beerpiss, cendolinside123, chickdan, danramteke, fassko, garthsnyder, gemiren, hactar, jeffreyca, kdubb, khaptonstall, lutzifer, magetron, matadan, mkscrg, mshibanami, muxinqi, paccos, ptrkstr, rlovelett, romanpodymov, roslund, samalone, scinfu, siuying, snq-2001, valentinperignon

swiftsoup's Issues

Is it possible to use SwiftSoup from Obj-C?

I'm working on a legacy project that mixes Obj-C and Swift code, and want to replace string matching with SwiftSoup.

I'm pulling in SwiftSoup as a dependency with CocoaPods, and can use it from Swift without issue. Is there a way I can use SwiftSoup in the Obj-C code as well?

Parser parseBodyFragment memory leak

Hi,
It is parsing the HTML correctly, but when I was debugging the application it looks like Parser has a memory leak.
The leak is reported on the Parser.parseBodyFragment line. Please check the screenshot.

screen shot 2017-06-26 at 2 25 03 am

Missing canned Whitelist

I tried using your code sample in my project.

do{
    let unsafe: String = "<p><a href='http://example.com/' onclick='stealCookies()'>Link</a></p>"
    let safe: String = try SwiftSoup.clean(unsafe, Whitelist.basic())!
    // now: <p><a href="http://example.com/" rel="nofollow">Link</a></p>
}catch Exception.Error(let type, let message){
    print(message)
}catch{
    print("error")
}

There's an error: Type 'Whitelist' has no member 'basic'. SwiftSoup.Whitelist.basic() also isn't defined.

Looks like there should be Whitelist with some pre-canned rules: https://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html

How to get rid of new line symbol generated by `try html()`

Hello! I have a string:

hello<span class='smile'>TT</span>ios<span class='smile'>QQ</span>world

And I want to transform it to 5 custom objects:

[TheString, TheSmile, TheString, TheSmile, TheString]

This way I get an Element from the string:

let element = try SwiftSoup.parse(string).body()!

Most of the time I do the parsing recursively, so I don't know the initial html string which constructed the Element, but I have found the way to get this string:

let initialHtmlString = try html()

And I got this:

hello\n<span class='smile'>TT</span>ios\n<span class='smile'>QQ</span>world

Then I take each child's try outerHtml() and cut it out from the string. In the end I got 5 objects:

TheString = hello\n
TheSmile = TT
TheString = ios\n
TheSmile = QQ
TheString = world

The problem is: 'try html()' adds new line symbol which affects the final result. Is there a way to get the result without undesired new line symbols?

Thanks

Thai language characters

Thai language is not displaying properly in UIWebview.  We are running the html string through SwiftSoup, and the resulting string is manipulating the Thai characters.  Here is the test example I've created:

"<br>\r\n<br>\r\n<font size=\"2\" face=\"sans-serif\">Lorem Ipsum บังคับ Amet นั่ง, consectetur adipiscing Elit Maecenas หุบเขาญี่ปุ่นเพียงอย่างเดียว ตอนนี้เวลาที่จะลงทุนโปรตีน เสียงนุ่มของสมาชิกในการไว้ทุกข์ VespaCam หุบเขากล่อง การเชื่อมโยงเว็บไซต์ Semper ใน ประสิทธิภาพการทำงานของเส้นผ่าศูนย์กลางใด ๆ แต่การดูแลผู้เล่นฟุตบอลสำนักฟุตบอลสมาร์ทและเรียงลำดับความต้องการ สดนำไปหักลดหย่อนเส้นผ่าศูนย์กลาง ID Ut Libero แต่ละช่วงเวลา lorem ถั่วลิสงได้บังคับ แต่ dui ราคา malesuada แต่ละเครือข่ายแครอทวันหยุดสุดสัปดาห์ ไม่มีน้ำซุปหรือซอส ullamcorper ullamcorper กลัวใครใหญ่กว่าลูกศรสารต้านอนุมูลอิสระ</font>"

We then take this string and run it through SwiftSoup.parse using this code:

let originalDocument = try SwiftSoup.parse(testString.replacingOccurrences(of: "\r\n", with: "\n"))

This is the result output:

Document <html><head></head><body><br> <br> <font size="2" face="sans-serif">Lorem Ipsum บงคบ Amet นง, consectetur adipiscing Elit Maecenas หบเขาญปนเพยงอยางเดยว ตอนนเวลาทจะลงทนโปรตน เสยงนมของสมาชกในการไวทกข VespaCam หบเขากลอง การเชอมโยงเวบไซต Semper ใน ประสทธภาพการทงานของเสนผาศนยกลางใด ๆ แตการดแลผเลนฟตบอลสนกฟตบอลสมารทและเรยงลดบความตองการ สดนไปหกลดหยอนเสนผาศนยกลาง ID Ut Libero แตละชวงเวลา lorem ถวลสงไดบงคบ แต dui ราคา malesuada แตละเครอขายแครอทวนหยดสดสปดาห ไมมนซปหรอซอส ullamcorper ullamcorper กลวใครใหญกวาลกศรสารตานอนมลอสระ</font></body>
</html>

Does not build with SPM

The CleanerTest file is outside of the SwiftSoupTests directory meaning that it cannot be used with Swift Package Manager.

SwiftSoup in Xcode 9

Hi there,

I can't import SwiftSoup into my Xcode 9 project.
When typing in "import SwiftSoup" I get the following error: "No such module 'SwiftSoup'" and 34 other compiler errors...

Is it planned to update SwiftSoup for Xcode 9? Is there a timeline for when it will be ready?

How can I join '\n' with .text()?

eg: let html: String = "

One one

<span style="color:#333333;font-family:"font-size:16px;background-color:#FFFFFF;">Two two
<span style="color:#333333;font-family:"font-size:16px;background-color:#FFFFFF;"> Three three
<span style="color:#333333;font-family:"font-size:16px;background-color:#FFFFFF;"> Four four"
let doc: Document = try SwiftSoup.parse(html)
return try doc.text()
Then I want to join each text with '\n', like this: "One one \n Two two \n Three three..."
How can I do this? Thank you for your reply.

int has no member multipliedReportingOverflow

I am seeing this error anywhere this method is used throughout SwiftSoup.

Just for the sake of being thorough: this error began after installing Tesseract and running pod update.

I removed Tesseract and reinstalled, and the error was still occurring.

I opened a new project:

ran pod init
added SwiftSoup and set the package to version 10.3
ran pod install
opened the new workspace
attempted to build a blank project with just SwiftSoup, and am still receiving the same errors

I have completely reinstalled CocoaPods, I have cleared my caches, and I have attempted to clean the cache; however, none of this has helped.

If this is user error, I am sorry.

Extract video links from HTML?

First, thanks for the great library.
I'd like to extract video files from HTML documents, for example by searching for .mp4. How can I do this with SwiftSoup?
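A minimal sketch of one way to do this, assuming the videos are referenced by `href` or `src` attributes that end in `.mp4`. It uses the `[attr$=value]` suffix selector; the helper name `mp4Links` is hypothetical. Videos injected by JavaScript at runtime would not show up in the parsed HTML.

```swift
import SwiftSoup

/// Hypothetical helper: collects attribute values ending in ".mp4" from
/// links and from <video>/<source> tags.
func mp4Links(in html: String) throws -> [String] {
    let doc: Document = try SwiftSoup.parse(html)
    var urls: [String] = []
    // [attr$=value] matches attributes whose value ends with the given suffix.
    for link in try doc.select("a[href$=.mp4]").array() {
        urls.append(try link.attr("href"))
    }
    for media in try doc.select("video[src$=.mp4], source[src$=.mp4]").array() {
        urls.append(try media.attr("src"))
    }
    return urls
}
```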

How to implement this Jsoup code in Swiftsoup?

How to implement this Jsoup code in Swiftsoup?

String URL = "http://www.example.com";
String USER_AGENT = "mozilla";
Response res;
Document doc;

res = Jsoup.connect(URL).userAgent(USER_AGENT)
        .timeout(15000)
        .data("id_bab", id_bab)
        .method(Method.POST)
        .followRedirects(true)
        .ignoreContentType(true)
        .execute();

if (res.statusCode() == 200) {
    doc = res.parse();
}
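As far as I know, SwiftSoup has no counterpart to `Jsoup.connect`, so the sketch below assumes the request is made with Foundation's `URLSession` and only the response body is handed to SwiftSoup. The function names `makeRequest` and `fetchDocument` are hypothetical. `URLSession` follows redirects by default.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif
import SwiftSoup

/// Builds a form-encoded POST request roughly mirroring the Jsoup call above.
func makeRequest(url: URL, userAgent: String, idBab: String) -> URLRequest {
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.timeoutInterval = 15  // seconds, like .timeout(15000)
    request.setValue(userAgent, forHTTPHeaderField: "User-Agent")
    request.setValue("application/x-www-form-urlencoded", forHTTPHeaderField: "Content-Type")
    // URLComponents is used here only to percent-encode the form field.
    var form = URLComponents()
    form.queryItems = [URLQueryItem(name: "id_bab", value: idBab)]
    request.httpBody = form.percentEncodedQuery?.data(using: .utf8)
    return request
}

/// Sends the request and parses the body only when the status code is 200.
func fetchDocument(_ request: URLRequest, completion: @escaping (Document?) -> Void) {
    URLSession.shared.dataTask(with: request) { data, response, _ in
        guard let http = response as? HTTPURLResponse, http.statusCode == 200,
              let data = data,
              let html = String(data: data, encoding: .utf8) else {
            completion(nil)
            return
        }
        completion(try? SwiftSoup.parse(html, request.url?.absoluteString ?? ""))
    }.resume()
}
```

The body is decoded as UTF-8 here; a page in another encoding would need a different `String.Encoding`.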

Add newer simulators to targeted devices to build with Carthage

  • tvOS
  • watchOS

Otherwise can't update with Carthage

» carthage update --platform watchOS
*** Fetching SwiftSoup
*** Checking out SwiftSoup at "1.6.2"
*** xcodebuild output can be found in /var/folders/wl/qnr2g12d0_b8gt90wzr0hlq40000gn/T/carthage-xcodebuild.NSZJeb.log
*** Building scheme "SwiftSoup-watchOS" in SwiftSoup.xcodeproj
Build Failed
	Task failed with exit code 70:
	/usr/bin/xcrun xcodebuild -project /Users/kristaps/Documents/ios/TartuWeatherProvider/Carthage/Checkouts/SwiftSoup/SwiftSoup.xcodeproj -scheme SwiftSoup-watchOS -configuration Release -derivedDataPath /Users/kristaps/Library/Caches/org.carthage.CarthageKit/DerivedData/9.2_9C40b/SwiftSoup/1.6.2 -sdk watchsimulator -destination platform=watchOS\ Simulator,id=5AE2A1D5-AF63-4E0B-BD98-E549C495F9AE -destination-timeout 3 ONLY_ACTIVE_ARCH=NO CODE_SIGNING_REQUIRED=NO CODE_SIGN_IDENTITY= CARTHAGE=YES build (launched in /Users/kristaps/Documents/ios/TartuWeatherProvider/Carthage/Checkouts/SwiftSoup)

This usually indicates that project itself failed to compile. Please check the xcodebuild log for more details: /var/folders/wl/qnr2g12d0_b8gt90wzr0hlq40000gn/T/carthage-xcodebuild.NSZJeb.log

Get all text nodes?

How can I get all text nodes in a document, process/change the text and get the resulting document?
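A minimal sketch, assuming it is enough to visit every element and rewrite its direct text-node children. `rewriteTextNodes` is a hypothetical helper name; `textNodes()`, `getWholeText()`, and `text(_:)` are used as I understand them from the jsoup-style API.

```swift
import SwiftSoup

/// Hypothetical helper: applies `transform` to every text node and returns
/// the resulting HTML.
func rewriteTextNodes(in html: String, transform: (String) -> String) throws -> String {
    let doc: Document = try SwiftSoup.parse(html)
    for element in try doc.getAllElements().array() {
        // textNodes() returns only the text nodes that are direct children,
        // so each text node is visited exactly once across the loop.
        for textNode in element.textNodes() {
            textNode.text(transform(textNode.getWholeText()))
        }
    }
    return try doc.html()
}
```

For example, `try rewriteTextNodes(in: html) { $0.uppercased() }` would upper-case the visible text while leaving tags and attributes alone.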

Shell Script Invocation Error

/Users/alex/Library/Developer/Xcode/DerivedData/Forecast-fwijjcxzzzlsqafojqsrnnuecwpm/Build/Products/Debug-iphoneos/Forecast.app/Frameworks/SwiftSoup.framework: unknown error -1=ffffffffffffffff
Command /bin/sh failed with exit code 1

Running on iPhone 5 (iOS 10.0).
All other devices on iOS 11+ work correctly.

How to manage SwiftSoup with Carthage? The build fails.

The SwiftSoup scheme builds fine in the project.
However, when I try to manage it with Carthage, the build fails.

I'm not sure what causes this error: the project itself or Carthage?

Below is some of the error output logged by Carthage. I hope you can give me some suggestions.

Xcode 8.2.1 (8C1002)
Swift 3.0.2
Carthage 0.23
SwiftSoup tag 1.4.2

Undefined symbols for architecture arm64:
  "_FE9SwiftSoupScG9AmpersandSc", referenced from:
      function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
  "_FE9SwiftSoupScG10BackslashFSc", referenced from:
      function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
  "_FE9SwiftSoupScG8LessThanSc", referenced from:
      function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
ld: symbol(s) not found for architecture arm64
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Ld /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup normal armv7
    cd /Users/umisky/Desktop/Reimu/Carthage/Checkouts/SwiftSoup
    export IPHONEOS_DEPLOYMENT_TARGET=9.0
    export PATH="/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/usr/bin:/Applications/Xcode.app/Contents/Developer/usr/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
    /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang -arch armv7 -dynamiclib -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS10.2.sdk -L/Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Products/Release-iphoneos -F/Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Products/Release-iphoneos -filelist /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup.LinkFileList -install_name @rpath/SwiftSoup.framework/SwiftSoup -Xlinker -rpath -Xlinker @executable_path/Frameworks -Xlinker -rpath -Xlinker @loader_path/Frameworks -miphoneos-version-min=9.0 -dead_strip -Xlinker -object_path_lto -Xlinker /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup_lto.o -fembed-bitcode -Xlinker -bitcode_verify -Xlinker -bitcode_hide_symbols -Xlinker -bitcode_symbol_map -Xlinker /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Products/Release-iphoneos -fobjc-link-runtime -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/iphoneos -Xlinker -add_ast_path -Xlinker /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup.swiftmodule -single_module -compatibility_version 1 -current_version 1 -Xlinker -dependency_info -Xlinker /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup_dependency_info.dat -o 
/Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/armv7/SwiftSoup
Undefined symbols for architecture armv7:
  "_FE9SwiftSoupScG9AmpersandSc", referenced from:
      function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
  "_FE9SwiftSoupScG8LessThanSc", referenced from:
      function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
ld: symbol(s) not found for architecture armv7

** BUILD FAILED **


The following build commands failed:
	Ld /Users/umisky/Library/Caches/org.carthage.CarthageKit/DerivedData/SwiftSoup/1.2.9/Build/Intermediates/SwiftSoup.build/Release-iphoneos/SwiftSoup.build/Objects-normal/arm64/SwiftSoup normal arm64
(1 failure)

Example for inlining style for email HTML

I have a use case that involves reading a single-file HTML with <style> tags, and inlining the styles in the body's included tags. This would be for email HTML, where classes are not usually allowed, but inline styles are.
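A rough sketch of how this could look, assuming the stylesheet contains only flat `selector { declarations }` rules. The CSS handling is deliberately naive (no `@media` blocks, comments, or specificity ordering), and `inlineStyles` is a hypothetical helper, not a SwiftSoup feature.

```swift
import Foundation
import SwiftSoup

/// Hypothetical helper: copies simple rules from <style> tags into style
/// attributes on the matching elements, then removes the <style> tags.
func inlineStyles(_ html: String) throws -> String {
    let doc: Document = try SwiftSoup.parse(html)
    let trimSet = CharacterSet(charactersIn: "; \n\t\r")
    for styleTag in try doc.select("style").array() {
        // data() returns the raw contents of a <style> element.
        for rule in styleTag.data().components(separatedBy: "}") {
            let parts = rule.components(separatedBy: "{")
            guard parts.count == 2 else { continue }
            let selector = parts[0].trimmingCharacters(in: .whitespacesAndNewlines)
            let declarations = parts[1].trimmingCharacters(in: trimSet)
            guard !selector.isEmpty, !declarations.isEmpty else { continue }
            // Skip selectors the query parser rejects (for example a:hover).
            guard let matches = try? doc.select(selector) else { continue }
            for element in matches.array() {
                // Existing inline declarations go last so they still win.
                let existing = try element.attr("style")
                let merged = existing.isEmpty ? declarations : declarations + ";" + existing
                try element.attr("style", merged)
            }
        }
        try styleTag.remove()
    }
    return try doc.html()
}
```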

'substring(with:)' is deprecated: Please use String slicing subscript. (Swift 4)

There are some errors in String.swift under the new Xcode 9 (Swift 4) that can be solved easily (I've commented out the old code):

func indexOf(_ substring: String, _ offset: Int ) -> Int {
        if(offset > characters.count) {return -1}

        let maxIndex = self.characters.count - substring.characters.count
        if(maxIndex >= 0) {
            for index in offset...maxIndex {
                let rangeSubstring = self.characters.index(self.startIndex, offsetBy: index)..<self.characters.index(self.startIndex, offsetBy: index + substring.characters.count)
                if self[rangeSubstring] == substring {
                //if self.substring(with: rangeSubstring) == substring {
                    return index
                }
            }
        }
        return -1
    }

and:

static func split(_ value: String, _ offset: Int, _ count: Int) -> String {
        let start = value.index(value.startIndex, offsetBy: offset)
        let end = value.index(value.startIndex, offsetBy: count+offset)
        let range = start..<end
        //return value.substring(with: range)
        return String(value[range])
    }

Change Log

Hello! Your framework is popular enough to start logging all changes! This is my favourite Change Log as example. Thanks in advance!

How can I get the checked state of an input?

Hello,

if I have some HTML:

How can I determine if the input is checked or not? I've tried checking if the element has the attribute "checked" but that doesn't work, and when I print out all the attributes, checked isn't in the list.

Leading blank space after parsing

Lib version: 1.4.2
Swift 3
iPhone 7 - iOS 10.3

We have the following HTML; after parsing, the text contains extra leading blank spaces.

HTML:
<html><body><div>\r\n<div dir=\"ltr\">\r\n<div id=\"divtagdefaultwrapper\"><font face=\"Calibri,Helvetica,sans-serif\" size=\"3\" color=\"black\"><span style=\"font-size:12pt;\" id=\"divtagdefaultwrapper\">\r\n<div style=\"margin-top:0;margin-bottom:0;\">&nbsp;TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\"><br>\r\n\r\n</div>\r\n<div style=\"margin-top:0;margin-bottom:0;\">TEST</div>\r\n</span></font></div>\r\n</div>\r\n</div>\r\n</body></html>

Code to parse:
let doc: Document = try SwiftSoup.parse(contentHtml)
var text = try doc.text()

After parsing:
"TEST\r\n TEST\r\n TEST\r\n \r\n\r\n\r\n TEST\r\n TEST\r\n TEST\r\n \r\n\r\n\r\n \r\n\r\n\r\n TEST\r\n TEST\r\n TEST\r\n \r\n\r\n\r\n \r\n\r\n\r\n \r\n\r\n\r\n \r\n\r\n\r\n \r\n\r\n\r\n \r\n\r\n\r\n \r\n\r\n\r\n TEST"

Dynamic library linker warning

When building an extension that uses SwiftSoup I get a linker warning:

ld: warning: linking against a dylib which is not safe for use in application extensions: /Users/.../Carthage/Build/iOS/SwiftSoup.framework/SwiftSoup

Memory leak

SwiftSoup seems to have some leaked objects (Attributes, Document, Tag etc.), when I use it.

Is this due to the library or my implementation?

Parsing never finishes for real websites

Example:

import Foundation
import Just
import SwiftSoup

let url: String = "http://comcast.net"
let response: HTTPResult = Just.get(url)
let data: Data? = response.content
let html = String(data: data!, encoding: String.Encoding.utf8)

do {
    let doc: Document = try SwiftSoup.parse(html!)
    print(try doc.text())
}
catch {
    print("Error")
}

It never prints anything; it just runs forever.

Could not cast value of type 'Swift.Optional<SwiftSoup.Element>'

I am randomly downloading html pages and using SwiftSoup to parse them (1.6.0)

The odd page just gives me the following

Could not cast value of type 'Swift.Optional<SwiftSoup.Element>' (0x103049718) to 'SwiftSoup.Element' (0x100b4ded0).
2018-02-15 19:21:05.242422+0000 burf[42885:2056856] Could not cast value of type 'Swift.Optional<SwiftSoup.Element>' (0x103049718) to 'SwiftSoup.Element' (0x100b4ded0).

HTML page url "http://policewb.gov.in/"

Seems to be something in

func replaceActiveFormattingElement(_ out: Element, _ input: Element)throws {
        try formattingElements = replaceInQueue(formattingElements as! Array<Element>, out, input)// TODO: test; "as!" is not nice
    }

Is there any way I can handle this issue without it killing my program?

Parser doesn't work when the string has 0 nodes

public static func parseBodyFragment(_ bodyHtml: String, _ baseUri: String) throws -> Document {
    let doc: Document = Document.createShell(baseUri)
    if let body: Element = doc.body() {
        let nodeList: Array<Node> = try parseFragment(bodyHtml, body, baseUri)
        //var nodes: [Node] = nodeList.toArray(Node[nodeList.size()]) // the node list gets modified when re-parented
        if nodeList.count > 0 {
            for i in 1..<nodeList.count {
                try nodeList[i].remove()
            }
            for node: Node in nodeList {
                try body.appendChild(node)
            }
        }
    }
    return doc
}

I have added the nodeList.count > 0 check, because the range 1..<nodeList.count is invalid when the list is empty.

Carthage Build Fails for Mac Platform

I added this to my Cartfile:

github "scinfu/SwiftSoup"

And then did this:

carthage update --platform mac

And it failed with:

*** Skipped building SwiftSoup due to the error:
Dependency "SwiftSoup" has no shared framework schemes for any of the platforms: Mac

Any suggestions? Thanks! :)

Question regarding SwiftSoup.parse(_:_:)

@param baseUri The URL where the HTML was retrieved from. Used to resolve relative URLs to absolute URLs, that occur before the HTML declares a <base href> tag.

Is this parameter intended to resolve the value of any href attribute into an absolute URL?
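A small sketch of how the parameter is typically used, assuming SwiftSoup follows jsoup here: the parser leaves `href` values as written, and the base URI is only applied when an absolute URL is requested through `absUrl`. The helper name `resolvedLinks` and the example URLs are hypothetical.

```swift
import SwiftSoup

/// Returns each link's raw href next to the URL resolved against baseUri.
func resolvedLinks(html: String, baseUri: String) throws -> [(raw: String, absolute: String)] {
    let doc: Document = try SwiftSoup.parse(html, baseUri)
    var result: [(raw: String, absolute: String)] = []
    for link in try doc.select("a[href]").array() {
        // attr("href") returns the value as written in the HTML;
        // absUrl("href") resolves it against the document's base URI.
        let raw = try link.attr("href")
        let absolute = try link.absUrl("href")
        result.append((raw: raw, absolute: absolute))
    }
    return result
}
```

With `baseUri` set to `https://example.com/`, a link written as `/docs/page.html` would keep that raw value, while `absUrl` would be expected to return the full URL.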

Sample code throws error

When trying to use the following sample code:

do {
    let html = "<title>First parse</title>"
        + "<p>Parsed HTML into a doc.</p>"
    let doc: Document = try SwiftSoup.parse(html)
    return try doc.text()
} catch Exception.Error(let type, let message) {
    print(message)
} catch {
    print("error")
}

Xcode returns the following error on the return try line:
Unexpected non-void return value in void function
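This error usually means the snippet was pasted into a function that does not declare a return type (for example `viewDidLoad()`), so `return try doc.text()` has nowhere to return to. A minimal sketch of one fix is to wrap the sample in a function that returns `String?`; the name `parsedText` is hypothetical.

```swift
import SwiftSoup

/// The same sample, wrapped in a function that declares a String? return type.
func parsedText() -> String? {
    do {
        let html = "<title>First parse</title>"
            + "<p>Parsed HTML into a doc.</p>"
        let doc: Document = try SwiftSoup.parse(html)
        return try doc.text()
    } catch Exception.Error(_, let message) {
        print(message)
    } catch {
        print("error")
    }
    // Reached only when parsing threw an error.
    return nil
}
```

Alternatively, keep the code where it is and replace the `return` with an assignment or a `print`.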

outerHtml() adds spaces and new lines incorrectly

(the answer to issue #26 does not solve my problem)
Hi, when I use the parse() method to parse an HTML file and then write the DOM's outer HTML to a new file, the output contains extra whitespace. For example, the raw HTML has:
<div><p>a<p></div>

but outerHtml() produces:

<div>
    <p>
         a
    </p>
</div>

In this case, the position of 'a' changes when the HTML page is rendered.

Is there any method or workaround to solve this problem?

SwiftSoup.parse(_:_:) fails to correctly parse this iframe

<iframe src=\"https://www.googletagmanager.com/ns.html?id=GTM-M48W9J\"\n\t\t                  height=\"0\" width=\"0\" style=\"display:none;visibility:hidden\"></iframe>

Here are its attributes:

[
    "src": SwiftSoup.Attribute,
    "": SwiftSoup.BooleanAttribute,
    "height": SwiftSoup.Attribute,
    "width": SwiftSoup.Attribute,
    "style": SwiftSoup.Attribute
 ]

It crashes in ParseSettings.normalizeAttributes() on this line:

try attr.setKey(key: attr.getKey().lowercased())

because attr.getKey() is empty.

Code to reproduce:

SwiftSoup.parse(html, "https://9to5mac.com")

What is the iOS minimum version to support and what architecture?

I receive an error while making a release build, with the message below:
Undefined symbols for architecture armv7:
"_FE9SwiftSoupScG9AmpersandSc", referenced from:
function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
"_FE9SwiftSoupScG8LessThanSc", referenced from:
function signature specialization <Arg[0] = Owned To Guaranteed> of SwiftSoup.TokeniserState.read (SwiftSoup.Tokeniser, SwiftSoup.CharacterReader) throws -> () in TokeniserState.o
ld: symbol(s) not found for architecture armv7
clang: error: linker command failed with exit code 1 (use -v to see invocation)

(Debug works fine; only the release build generates the error.)

CPU is high

Thank you for this HTML parser; it's a very smart tool, like jsoup. But I found that CPU usage is very high when I use it.

I compared it with other tools based on libxml2 (like Fuzi and Kanna). In my project, Fuzi and Kanna use 10%-15% CPU when parsing one website, but SwiftSoup uses 60% or more.

Still, I like this project because I like jsoup too.

Example_ not working

Hello,
Thanks for the project. I ran pod install on Example_, but when I run the project nothing appears. How can I try it? Please help.

Convert Document back to HTML string?

The README needs a code example of how to take the entire Document instance and generate its full HTML back into a String, including the html, head, and body sections. In the interim, how do I do it? The html() method seems to remove the head. I'm trying to sanitize HTML and then load it into a WKWebView. Perhaps the clean method is removing it?

            let str = try doc.html()
            let html = "<html>\(str)</html>"
            
            let wl:Whitelist = try Whitelist.relaxed().addEnforcedAttribute("a", "rel", "nofollow")
            let safe: String = try SwiftSoup.clean(html, wl)!  // the output starts with a div tag instead of html!
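One possible workaround, sketched under the assumption that `SwiftSoup.clean` works on body fragments (which would explain output starting with a div): clean the input first, then place the safe markup into a fresh document shell and serialise that. `sanitizedFullHtml` is a hypothetical helper, and the original head content is not carried over.

```swift
import SwiftSoup

/// Hypothetical helper: cleans the input as a body fragment, then wraps the
/// result in a new document shell so html, head, and body are present again.
func sanitizedFullHtml(_ dirtyHtml: String) throws -> String? {
    let whitelist: Whitelist = try Whitelist.relaxed().addEnforcedAttribute("a", "rel", "nofollow")
    // clean(_:_:) returns the cleaned body HTML, or nil.
    guard let safeBody = try SwiftSoup.clean(dirtyHtml, whitelist) else { return nil }
    // createShell builds an empty document with html, head, and body elements.
    let shell: Document = Document.createShell("")
    try shell.body()?.html(safeBody)
    // outerHtml() on the document serialises the whole tree.
    return try shell.outerHtml()
}
```

The returned string could then be passed to `WKWebView.loadHTMLString(_:baseURL:)`.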

Linux Swift 3.1 Build Failing

SwiftSoup does not build on Linux with Swift 3.1 due to Apple helpfully renaming RegularExpression to NSRegularExpression on Linux in 3.1... I have no idea why either! I'll get a PR together which sorts this out and updates the CI scripts to run on 3.1 across everything
