facelessuser / soupsieve

A modern CSS selector implementation for BeautifulSoup

Home Page: https://facelessuser.github.io/soupsieve/

License: MIT License

Python 100.00%

soup-sieve css-selector html html5 xml python css4 beautifulsoup css

soupsieve's Introduction

Soup Sieve

Overview

Soup Sieve is a CSS selector library designed to be used with Beautiful Soup 4. It aims to provide selecting, matching, and filtering using modern CSS selectors. Soup Sieve currently provides selectors from the CSS level 1 specifications up through the latest CSS level 4 drafts and beyond (though some are not yet implemented).

Soup Sieve was written with the intent to replace Beautiful Soup's builtin select feature, and as of Beautiful Soup version 4.7.0, it now is 🎊. Soup Sieve can also be imported in order to use its API directly for more controlled, specialized parsing.

Soup Sieve has implemented most of the CSS selectors up through the latest CSS draft specifications, though there are a number that don't make sense in a non-browser environment. Selectors that cannot provide meaningful functionality simply do not match anything. Some of the supported selectors are:

.classes
#ids
[attributes=value]
parent child
parent > child
sibling ~ sibling
sibling + sibling
:not(element.class, element2.class)
:is(element.class, element2.class)
parent:has(> child)
and many more
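
As a rough illustration of a few of the selectors listed above, here is a minimal sketch. It assumes Beautiful Soup 4.7.0 or later with Soup Sieve installed; the sample markup and variable names are made up for the example.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

# Made-up sample document for illustration only.
html = """
<div id="main">
  <p class="intro">First</p>
  <p>Second</p>
  <ul><li class="item">A</li><li>B</li></ul>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Through Beautiful Soup, whose select() delegates to Soup Sieve.
intro = soup.select("p.intro")
children = soup.select("#main > p")

# Through the Soup Sieve API directly.
lists_with_item = sv.select("ul:has(> li.item)", soup)
plain_paragraphs = sv.select("p:not(.intro)", soup)
is_intro = sv.match("p.intro", intro[0])

print([p.get_text() for p in children])
```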

Installation

You must have Beautiful Soup already installed:

pip install beautifulsoup4

In most cases, assuming you've installed version 4.7.0 or later, that should be all you need to do. If you installed via some alternative method and Soup Sieve was not automatically installed, you can install it directly:

pip install soupsieve

If you want to manually install it from source, first ensure that build is installed:

pip install build

Then navigate to the root of the project and build the wheel and install (replacing <ver> with the current version):

python -m build -w
pip install dist/soupsieve-<ver>-py3-none-any.whl

Documentation

Documentation is found here: https://facelessuser.github.io/soupsieve/.

License

MIT

soupsieve's Issues

:blank (maybe)

This selector is currently noted as "at risk" in the specifications. Basically it isn't well defined yet. It will most likely be fleshed out more at some point, but it cannot be implemented in its current state. Once it is actually fleshed out more, we can take a stab at implementing it.

Implement [attr!=value]

This should be equivalent to :not([attr=value]). It is not in the CSS specification, but it would be nice.
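
A small sketch of the intended equivalence, assuming a Soup Sieve release in which [attr!=value] has already been implemented (a later issue in this list says it is supported). The markup is made up for the example.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

soup = BeautifulSoup(
    '<div><p id="a">1</p><p id="b">2</p><p>3</p></div>', "html.parser"
)

# The spelled-out form and the non-standard shorthand should select the same tags,
# including tags that have no id attribute at all.
negated = sv.select('p:not([id="a"])', soup)
shorthand = sv.select('p[id!="a"]', soup)

print([p.get_text() for p in shorthand])
```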

Possibly support a QUIRKS flag for bs4

In an issue on the BS4 google groups, a discussion took place in regards to the change in how quoted attributes are treated (Soup Sieve follows the spec, while the old BS4 method is super lax), and that maybe a deprecation path should be provided. There are three cases that were noted:

td.+.class: this case shows that the old BS4 select method would allow a class with no class name to be defined and would treat it as if no class was defined. CSS would not match anything here; Soup Sieve does not match either, but instead opts to fail. If we implemented a QUIRKS mode, it would ignore classes like this, and most likely ids (#) as well.
> p: BS4, before Soup Sieve, would treat this kind of like a relative selector. This was most likely an oversight rather than by design, but in at least one project it was exploited. And where there is one, there are many. QUIRKS mode would most likely inject :scope for the user.

Note: If BS4 actually matched the siblings of the element select was called on in the case of + div, we will not emulate that. Select should match downwards, not laterally. This point I will not budge on, but I'm pretty sure BS4 didn't match these...probably.
[attribute={}]: BS4 used to have a very lax rule for the attribute value. It would allow almost anything unquoted as long as it wasn't a double quote or closing square bracket. We cannot relax to this degree. We will continue to recognize both single and double quoted values, but within an unquoted value we will allow a great deal more, except whitespace.

QUIRKS mode would only exist for as long as BS4 requires it. It is not guaranteed that we will do this, but it is being considered.
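
To make the "inject :scope" idea concrete, here is a minimal sketch. The helper name inject_scope is hypothetical and is not part of Soup Sieve or Beautiful Soup. It only looks at the start of the whole pattern and does not try to handle each selector in a comma-separated list.

```python
import re

# A pattern that begins with a combinator, such as "> p" or "+ div".
LEADING_COMBINATOR = re.compile(r"^\s*[>+~]")


def inject_scope(pattern):
    """Prefix a pattern that starts with a combinator with ':scope'.

    Hypothetical helper: patterns that already start with a selector
    are returned unchanged.
    """
    if LEADING_COMBINATOR.match(pattern):
        return ":scope " + pattern.lstrip()
    return pattern


print(inject_scope("> p"))
```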

:valid and :invalid support

Support for this would require quite a bit of work. We would need to write proper validators for each kind of input type. I am not sure when this will get done, but it is large enough to be a case unto itself.

~~This would include :in-range and :out-of-range, though as :in-range and :out-of-range is more simple, it is possible that could get implemented first.~~ Moved to a separate issue.

Allow a list of content in :contains()

The :contains() selector came about a long time ago and was abandoned in the CSS spec. We currently mimic contains as it was described originally.

It is important to note that since the original spec is dead, there will be no updates. I imagine that, had :contains() not died way back when, it would likely have been expanded to allow a comma-separated list of content: p:contains("some text", "some other text").

As :contains() currently supports valid identifiers or quoted values, a comma-separated list would contain a list of valid identifiers or quoted values. :contains() would match if any of the items in the list match.

With things like :not() and :lang() moving towards comma-separated lists, and new functional selectors supporting comma-separated lists out of the box (except for things like :nth-child(), :dir(), etc.), I think this evolution for :contains() makes sense.
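
The proposed "match if any item matches" rule can be sketched in a few lines. The function name contains_any is made up for illustration and says nothing about how Soup Sieve extracts text from a tag.

```python
def contains_any(text, *items):
    """Return True if at least one of the items occurs in text.

    Hypothetical sketch of the proposed list semantics: an empty
    list of items matches nothing.
    """
    return any(item in text for item in items)


print(contains_any("here is some text", "some text", "some other text"))
```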

Exclude all tests from the package

Please, exclude all tests from the package. Otherwise they'll get installed in site-packages/tests.

--- setup.py.orig	2019-02-19 09:37:15.000000000 +0000
+++ setup.py
@@ -51,7 +51,7 @@ setup(
     author='Isaac Muse',
     author_email='[email protected]',
     url='https://github.com/facelessuser/soupsieve',
-    packages=find_packages(exclude=['tests', 'tools']),
+    packages=find_packages(exclude=['tests', 'tests.*', 'tools']),
     install_requires=get_requirements(),
     license='MIT License',
     classifiers=[
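
A sketch of why the extra 'tests.*' pattern matters, assuming exclude patterns are compared against full dotted package names with shell-style wildcards, which is how I understand find_packages to behave. The package list, including the subpackage name tests.level1, is made up.

```python
from fnmatch import fnmatchcase


def is_excluded(package, patterns):
    """Return True if the dotted package name matches any exclude pattern."""
    return any(fnmatchcase(package, pattern) for pattern in patterns)


# Hypothetical set of discovered packages.
packages = ["soupsieve", "tests", "tests.level1", "tools"]

before = [p for p in packages if not is_excluded(p, ["tests", "tools"])]
after = [p for p in packages if not is_excluded(p, ["tests", "tests.*", "tools"])]

print(before)
print(after)
```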

:in-range and :out-of-range

Originally this was planned to be done with :valid and :invalid, but these can be handled separately.

:dir() selector for XML (maybe)

For XML, you can use its namespace to do dir. I don't have time right now to look into it, but maybe in the future.

Python 2 support for official integration into Beautiful Soup 4

As Soup Sieve is planned for inclusion in Beautiful Soup 4, we need to add Python 2 support.

For the most part, there shouldn't be any huge changes. I am assuming we can work primarily in Unicode. With that, we will have to be aware of wide and narrow characters when converting CSS Unicode escapes. I assume we'll use surrogate pairs for wide characters on narrow systems. Outside of that, it should be fairly straightforward.
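
The surrogate-pair arithmetic mentioned above can be sketched as follows. This is written for Python 3 for illustration, the function name and the narrow flag are made up, and it does not apply the CSS rules for invalid code points (zero, surrogates, or values above U+10FFFF).

```python
def css_escape_to_text(hex_digits, narrow=False):
    """Convert the hex digits of a CSS escape (e.g. '1F600') to text.

    With narrow=True, code points above U+FFFF are returned as a
    UTF-16 surrogate pair, as a narrow Python 2 build would need.
    """
    codepoint = int(hex_digits, 16)
    if narrow and codepoint > 0xFFFF:
        offset = codepoint - 0x10000
        high = 0xD800 + (offset >> 10)    # top 10 bits
        low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
        return chr(high) + chr(low)
    return chr(codepoint)


pair = css_escape_to_text("1F600", narrow=True)
print([hex(ord(ch)) for ch in pair])
```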

User definable custom selectors

The idea of allowing user definable custom selectors is a cool idea that is currently in draft: https://drafts.csswg.org/css-extensions/#typedef-custom-selector.

The idea is essentially to allow creating a custom pseudo-class as an alias for a more complex expression.

@custom-selector :--heading h1, h2, h3, h4, h5, h6;

:--heading { /* styles for all headings */ }
:--heading + p { /* more styles */ }

There seem to be some open issues proposing a lot more complexity than what is shown here, but I think it may be reasonable to allow stuff like this, and then a user could construct aliases for whatever they would like. That way we don't ever actually need to support such things directly ourselves.
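
For comparison, here is how the same alias might look from Python. This is a sketch that assumes a later Soup Sieve release that accepts a custom keyword argument mapping :--name aliases to selectors; the markup is made up.

```python
from bs4 import BeautifulSoup
import soupsieve as sv

html = "<div><h1>Title</h1><p>Lead</p><h2>Sub</h2><p>Body</p><span>Other</span></div>"
soup = BeautifulSoup(html, "html.parser")

# Alias playing the role of "@custom-selector :--heading h1, h2, ...".
custom = {":--heading": "h1, h2, h3, h4, h5, h6"}

headings = sv.select(":--heading", soup, custom=custom)
after_heading = sv.select(":--heading + p", soup, custom=custom)

print([tag.get_text() for tag in after_heading])
```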

When testing HTML, test all HTML parsers (namespace will be handled differently)

Rework testing so that when we test HTML, we test all parsers, and when we test namespace specific stuff, we test HTML5 and XHTML.

:nth-of-type and :nth-child works only up to the 9th element

Not sure where to file this bug as I'm using soupsieve via BeautifulSoup4, but here goes:

Using BeautifulSoup4 4.7.1, with SoupSieve 1.6.2, under Python 3.6.

import bs4

source = """<html><body>
<div>1</div>
<div>2</div>
<div>3</div>
<div>4</div>
<div>5</div>
<div>6</div>
<div>7</div>
<div>8</div>
<div>9</div>
<div>10</div>
<div>11</div>
<div>12</div>
<div>12</div>
<div>13</div>
<div>14</div>
<div>15</div>
<div>16</div>
</body></html>"""
soup = bs4.BeautifulSoup(source, 'lxml')  # same result with html5lib

print(soup.select("div:nth-of-type(9)"))  # Expect 9, is 9
print(soup.select("div:nth-child(9)"))  # Expect 9, is 9
print(soup.select("div:nth-of-type(10)"))  # Expect 10, is 15
print(soup.select("div:nth-child(10)"))  # Expect 10, is 15
print(soup.select("div:nth-of-type(11)"))  # Expect 11, is 16
print(soup.select("div:nth-child(11)"))  # Expect 11, is 16
print(soup.select("div:nth-of-type(12)"))  # Expect 12, finds nothing
print(soup.select("div:nth-child(12)"))  # Expect 12, finds nothing

It seems to work well with single-digit index, but either returns empty or the wrong element for 10 and upwards.

(Also filed this on BS4 LaunchPad.)
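
The report does not say what caused the wrong results. As an independent reference for what multi-digit arguments should mean, here is a small sketch of parsing an+b expressions and testing positions. The names parse_nth and nth_matches are made up and are not Soup Sieve internals.

```python
import re

# Either "an", "an+b", "an-b" (a may be empty, "+", or "-"), or a bare integer b.
NTH = re.compile(
    r"^(?:(?P<a>[+-]?\d*)n\s*(?:(?P<sign>[+-])\s*(?P<b>\d+))?|(?P<only_b>[+-]?\d+))$"
)


def parse_nth(expr):
    """Parse an nth expression such as '10', '2n+1', '-n+3', 'odd' into (a, b)."""
    expr = expr.strip().lower()
    if expr == "odd":
        return 2, 1
    if expr == "even":
        return 2, 0
    m = NTH.match(expr)
    if m is None:
        raise ValueError("invalid nth expression: {!r}".format(expr))
    if m.group("only_b") is not None:
        return 0, int(m.group("only_b"))
    a_text = m.group("a")
    a = int(a_text + "1") if a_text in ("", "+", "-") else int(a_text)
    b = int(m.group("sign") + m.group("b")) if m.group("b") is not None else 0
    return a, b


def nth_matches(position, a, b):
    """Return True if the 1-based position equals a*n + b for some n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


a, b = parse_nth("10")
print([pos for pos in range(1, 18) if nth_matches(pos, a, b)])
```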

Better selector documentation?

It isn't bad, but it is very brief. It more or less just says, hey, you could use these, with a brief example and not much context. I don't think I need to dedicate a page to each selector, but maybe it would be nice to have each selector have a section that describes the selector in more detail and provides an example with HTML and Soup Sieve parsing it.

It'd be a lot of work, but if I don't get any more Soup Sieve bugs, I could take my time on it. There probably aren't many more features to add per se. There are some selectors in the backlog, but I can't/am not implementing them right now.

Figure out why Appveyor still can use old pip after upgrading pip

We must somehow be using a secondary pip, or calling the wrong Python when calling tox.

We are careful to set our desired Python first in the path, and then use python -m (calling the Python installation we set first in path) to ensure we use its tox. Yet, after upgrading to pip 18.1, we still get an install error for pip 7.x. It doesn't happen all the time, but it is frustrating as the cause is not understood. We need to get to the bottom of this moving forward so that we can have reliable, automated Windows testing.

type attribute and case sensitivity.

Apparently, it is quite common to have the type attribute specifically treated as case insensitive. This seems to be the only attribute that follows this convention. For this reason, the s sensitivity flag has been added to the CSS4 spec: *[type="submit" s]. We should treat the type value as case insensitive, and also support s to force case sensitivity. In addition, we should ensure that case is not enforced for the flag itself.
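
The rule described above could be sketched like this. The function attribute_equals and its parameters are hypothetical and only illustrate the decision logic: explicit flags win, and otherwise only type in HTML documents is compared case insensitively.

```python
def attribute_equals(name, actual, expected, flag=None, is_html=True):
    """Compare an attribute value the way [name="expected" flag] might.

    Hypothetical sketch: 'i' forces case-insensitive comparison, 's'
    forces case-sensitive comparison, and the flag itself may be written
    in either case. Without a flag, only 'type' in HTML is insensitive.
    """
    flag = flag.lower() if flag else None
    if flag == "i":
        insensitive = True
    elif flag == "s":
        insensitive = False
    else:
        insensitive = is_html and name.lower() == "type"
    if insensitive:
        return actual.lower() == expected.lower()
    return actual == expected


print(attribute_equals("type", "SUBMIT", "submit"))
```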

Derive SelectorSyntaxError from Exception instead of SyntaxError

SyntaxError is a builtin exception meant for Python syntax errors, and it is specialized for that task. To avoid confusion for people, we should derive our new SelectorSyntaxError from the general Exception instead. For the 1.0 series, we will leave it as SyntaxError to avoid breakage, but for 2.0, we will make this change.

Reference #105

More descriptive syntax exceptions and maybe add a DEBUG flag

When raising syntax exceptions, some of the exceptions could be phrased better, and also, character position may be nice to display.

It may also be nice to provide a DEBUG flag that will cause verbose messages describing how the CSS pattern is tokenized.
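
One possible shape for an exception that reports the character position, as a sketch only. The class below is a stand-in written for this example, not Soup Sieve's actual implementation, and describe_position is a made-up helper.

```python
def describe_position(pattern, index):
    """Return (context, line, col) for a character index in a pattern.

    The context is the offending line followed by a caret under the column.
    """
    line_start = pattern.rfind("\n", 0, index) + 1
    line_end = pattern.find("\n", index)
    if line_end == -1:
        line_end = len(pattern)
    line = pattern.count("\n", 0, index) + 1
    col = index - line_start
    context = pattern[line_start:line_end] + "\n" + " " * col + "^"
    return context, line, col


class SelectorSyntaxError(Exception):
    """Stand-in selector syntax error that records where parsing failed."""

    def __init__(self, msg, pattern=None, index=None):
        self.line = None
        self.col = None
        self.context = None
        if pattern is not None and index is not None:
            self.context, self.line, self.col = describe_position(pattern, index)
            msg = "{}\n  line {}:\n{}".format(msg, self.line, self.context)
        super().__init__(msg)


error = SelectorSyntaxError("Combinator must have a selector before it", "> b", 0)
print(error)
```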

select_one and select

I like the functions soup.select() and soup.select_one(). Maybe you can add select_one to this repo? If the result is None, I can easily know that there are no matching objects, just like https://docs.python.org/3/library/re.html#re.search. In fact, I don't know why you use limit to control the number of results. I think people would need one or all of the results.
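
Until such a function exists, the requested behavior can be approximated with limit, as in this sketch. The helper name select_first is made up so that it does not claim to be part of the library.

```python
from bs4 import BeautifulSoup
import soupsieve as sv


def select_first(pattern, tag):
    """Return the first match for pattern under tag, or None if nothing matches."""
    results = sv.select(pattern, tag, limit=1)
    return results[0] if results else None


soup = BeautifulSoup("<div><p>one</p><p>two</p></div>", "html.parser")
first = select_first("p", soup)
missing = select_first("table", soup)

print(first, missing)
```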

Compiled object cannot be pickled if it contains a NullSelector object

We forgot to register NullSelector to be pickled. So if a compiled selector object contains a NullSelector, it will not pickle.

Possibly allow creating a custom parser with preset namespaces etc.

It might be nice to spawn a custom parser where you can just add your namespace mapping once, and all calls will pick it up. You could still do a one off call and manually feed them in, but you could also just create a custom parser and then just call as normal:

import soupsieve as sv

parser = sv.custom_parser()
parser.set_namespaces({"ns": "http:/ns.com"})
parser.select(':header.class', soup)

:dir() support

This is another selector that we just need to define well. Once we understand the details, we can start work on this.

Use proper CSS identifier patterns for class, id, tags, etc.

Right now we allow things like #3 which isn't really allowed in CSS. We should use proper patterns that mimic CSS appropriately.

:defined support

:defined is a selector that is not in the CSS spec, but is mentioned in the HTML5 living spec. It is implemented by most browsers in some form or another. Ultimately it selects non-custom elements, or custom elements that have been registered. Custom elements are elements with hyphens in their names. If a tag with : in it is encountered, the tag is usually treated as having a prefix, and the fact that the tag has hyphens is ignored.

We would select all non custom tags as described above, and since we cannot register custom elements to any registry in BeautifulSoup or SoupSieve, that will be all.

In XML, this has no meaning and :defined will match nothing? Or maybe everything? Maybe we'll just declare it an HTML selector as it is browser specific so it matches nothing in XML.

Implement 'closest' api call

Implement a closest API function. It would function no differently than noted here.

Summary, in short, given a selector and a tag, closest would return the closest tag ancestor (including the tag given) that matches the selector. See link above for examples.

This is super easy to implement.
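
A minimal sketch of the described behavior, built only on compile and match. The function name closest_ancestor is made up for this example, and it stops at the document object rather than trying to match it.

```python
from bs4 import BeautifulSoup
import soupsieve as sv


def closest_ancestor(pattern, tag):
    """Return tag or its nearest ancestor that matches pattern, else None."""
    matcher = sv.compile(pattern)
    current = tag
    # Walk upwards; the BeautifulSoup document object itself is never a match.
    while current is not None and not isinstance(current, BeautifulSoup):
        if matcher.match(current):
            return current
        current = current.parent
    return None


soup = BeautifulSoup(
    '<div class="box"><section><span id="x">hi</span></section></div>',
    "html.parser",
)
span = soup.find(id="x")
box = closest_ancestor("div.box", span)

print(box.name)
```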

Names of iter functions

Right now we have two iter functions: commentsiter and selectiter. For 1.0, should they be called comments_iter and select_iter? Or maybe icomments and iselect?

We have some flexibility before 1.0 release, so we should really settle on something. commentsiter and selectiter just seem hard to read.

Internal Cleanup

Rework the parse engine to not assume that just because it is a "pseudo" class that it needs to close it. This will make some of our snippets less confusing as we have to include a ) at the end as we send replacement patterns through.

HTML detection for `type` selector not working correctly

Fix issue noted here: MechanicalSoup/MechanicalSoup#263.

We were checking against the html namespace for type case insensitivity instead of whether the document was XML or not.

Fix unsupported pseudo error

There is nothing wrong with throwing an error for an unsupported pseudo, but currently, a supported pseudo with bad syntax may get caught and throw this unsupported error. This can be confusing as it isn't exactly true. We should probably still catch these, but then compare the name against our supported list and issue a more appropriate error for supported pseudo classes.

Possible other non-official selectors

There are a couple selectors that might be useful that are found in JQuery's custom selector engine, but there are some that I absolutely will not support. We already support [attr!=value] and the old rejected CSS :contains(). Let's start off with what we won't support:

:selected is already covered by CSS4's :checked. If I need to specifically target options, it is easy enough to do manually.

I'm not sure if I care to implement these, but these are possibilities:

:parent: This seems like it may be useful. Essentially it would be an alias for :has(> *|*). It's a possibility.
:header: Who doesn't hate doing this: h1, h2, h3, h4, h5, h6? This might be useful and would simply be an alias for :is(h1, h2, h3, h4, h5, h6).
There exists some for input types, like :checkbox for input[type=checkbox], etc. Button would be mildly more complicated: :button ~= :is(button, input[type=button]).
:input would be a shortcut for all inputs: input, button, select, and textarea. I guess :is(input, button, select, textarea).

I'll at least leave this open for discussion.

Most of these can now be implemented with the coming custom selector support. You could have :--parent to implement parent, etc.

The JQuery selectors that cannot be supported with custom selectors would be the following:

:first, :last, :even, :odd, :eq, :nth, :lt and :gt will not be supported in any way, shape, or form. This is because they require us to preserve the order of the selectors in a compound selector and bubble up a list of elements that match each one, allowing these to then filter them. This adds a lot of complexity and code that I am not willing to take on.

This actually wouldn't be as difficult as once thought. The key would be that we need to create a dictionary with each entry related to a unique indexing pseudo-class. There we could track the count for each one and simply apply a mod to tell if we should return it. These types of selectors would have to be evaluated at the very end.

It would only work effectively in one direction (the positive direction), so something like :eq(-1) would not be feasible as we don't accumulate a list of elements before yielding them, we yield them as we find them. I have no intention of changing this mechanic. The complexity of managing :eq(-1) nested in pseudo-classes such as :is() and :not() would be super complicated. But if we allowed positive numbers only, this would be easy to do. Positive values would be the only indexes I would consider.

Anyways, I would most likely have to receive requests for such a feature first.
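
The counting idea for positive indexes can be sketched with generators, which keeps the "yield as we find them" behavior and never buffers matches. The names eq and every are made up, and the sketch ignores nesting inside :is() or :not().

```python
from itertools import count


def eq(matches, index):
    """Yield only the match at zero-based index, stopping as soon as it is found."""
    if index < 0:
        raise ValueError("only non-negative indexes are supported")
    for position, item in enumerate(matches):
        if position == index:
            yield item
            return


def every(matches, step, offset=0):
    """Yield matches whose position is offset, offset + step, ... (a simple mod test)."""
    for position, item in enumerate(matches):
        if position >= offset and (position - offset) % step == 0:
            yield item


print(list(eq("abcde", 2)))
print(list(every("abcde", 2)))       # like :even
print(list(every("abcde", 2, 1)))    # like :odd
```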

:read-only and :read-write support

This is planned, but before it can be implemented, we need to have a well defined understanding of what can be "read only" and what is "read write". Once we understand this, we should be able to implement this no problem.

Clean up test suite

I want to break up tests more to single cases, and maybe abstract the testing of different parsers and quirks etc. Not a pressing issue, but something I do want to look at.

Compare attributes namespaces to attribute namespace value

Turns out, Beautiful Soup wraps its attribute keys in a special string like object that exposes prefix and namespace:

>>> list(soup.use.attrs.keys())[0]
'xlink:href'
>>> list(soup.use.attrs.keys())[0].namespace
'http://www.w3.org/1999/xlink'

We were only checking the prefix as we didn't know this existed. We should be checking the namespace as the prefix doesn't matter. You could redefine prefixes anywhere in the document.

[attr~=value] spec issue

According to the spec

[att~=val]
Represents an element with the att attribute whose value is a whitespace-separated list of words, one of which is exactly "val". If "val" contains whitespace, it will never represent anything (since the words are separated by spaces). Also if "val" is the empty string, it will never represent anything.

We are currently not enforcing this, but we should (the whitespace part).
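
The quoted rule can be written out directly, as a sketch. The function name matches_word is made up; whitespace here means the CSS set of space, tab, line feed, carriage return, and form feed.

```python
import re

CSS_WHITESPACE = re.compile(r"[ \t\r\n\f]+")


def matches_word(attribute_value, word):
    """Return True if word is one of the whitespace-separated words in the value.

    A word that is empty or that itself contains whitespace never matches.
    """
    if not word or CSS_WHITESPACE.search(word):
        return False
    return word in CSS_WHITESPACE.split(attribute_value)


print(matches_word("btn btn-primary", "btn"))
```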

Add support for more selectors that are HTML only or may never match due to environment

We could probably implement some things like: :hover, :active, :focus, :target, :visited, etc. Our environment (not in a browser) doesn't actually have these states, so we could just have them never match. The library cssselect apparently does this. We can pick up any CSS4 selectors (if any like this) and do the same.

We could also implement HTML only selectors, or selectors that are really only defined for HTML. We would have to make some assumptions based on our environment. For instance, :link would match all elements as in our environment, all links are unvisited.

Basically, we want to work towards having all selectors defined in a way that makes sense for our environment if possible.

This is an initial list that is targeted. It may change. Not all undefined selectors are being targeted. Some may be implemented at a later time.

Get rid of document flags

I think we can simplify this and remove document flags.

Combinator || for column relations

I've decided to move this to a separate issue as || is technically an "at risk" selector feature. This is very low priority, and if it doesn't make it into the spec, it will not be implemented. There still needs to be more clarification in the spec, or at least a reference implementation we can work off of.

Allow identifier tokens to start with `--` per the latest css-syntax-3 draft

Misc selectors

:playing / :paused is for media stuff. We can't play or pause in our environment, so this will just match nothing (#39).
:local-link this will most likely match nothing (#39).
:user-invalid there can be no user interaction in our environment, so this will match nothing (#39).
:scope I really need to understand what this really is. From what I understand right now, it would mainly just match :root, but I need to research this more, or understand what the future goal for this is. It matches :root if the document is under select or match, or matches the tag if a tag is under match or select. It is used to make a pattern relative to the tag under evaluation. (#38)

XML default namespace leads to TypeError: __init__() keywords must be strings

This is a bug with handling valid XML namespaces; soupsieve assumes all namespaces have a prefix:

<prefix:tag xmlns:prefix="...">

but the prefix can be omitted to define a default namespace:

<tag xmlns="...">

meaning that any element without a prefix: prepended to the tag name is in that namespace. See section 6.2 of the XML namespaces 1.1 spec.

During parsing, lxml passes in a default namespace under the None key, e.g. {None: "..."}, and unique keys are accumulated in the soup._namespaces dictionary. soupsieve assumes the dictionary only ever has string keys, so an XML document with a default namespace leads to an exception.

Test case (using BeautifulSoup 4.7 for convenience):

>>> from bs4 import BeautifulSoup, __version__
>>> __version__
'4.7.0'
>>> sample = b'''\
... <?xml version="1.1"?>
... <!-- unprefixed element types are from "books" -->
... <book xmlns='urn:loc.gov:books'
...       xmlns:isbn='urn:ISBN:0-395-36341-6'>
...     <title>Cheaper by the Dozen</title>
...     <isbn:number>1568491379</isbn:number>
... </book>
... '''
>>> soup = BeautifulSoup(sample, 'xml')
>>> soup._namespaces
{'xml': 'http://www.w3.org/XML/1998/namespace', None: 'urn:loc.gov:books', 'isbn': 'urn:ISBN:0-395-36341-6'}
>>> soup.select_one('title')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/venvs/stackoverflow-latest/lib/python3.7/site-packages/bs4/element.py", line 1345, in select_one
    value = self.select(selector, namespaces, 1, **kwargs)
  File "/Users/mj/Development/venvs/stackoverflow-latest/lib/python3.7/site-packages/bs4/element.py", line 1377, in select
    return soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "/Users/mj/Development/venvs/stackoverflow-latest/lib/python3.7/site-packages/soupsieve/__init__.py", line 108, in select
    return compile(select, namespaces, flags).select(tag, limit)
  File "/Users/mj/Development/venvs/stackoverflow-latest/lib/python3.7/site-packages/soupsieve/__init__.py", line 50, in compile
    namespaces = ct.Namespaces(**(namespaces))
TypeError: __init__() keywords must be strings

where <title>Cheaper by the Dozen</title> was expected.
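
The failure can be reproduced without any parser, since it comes from unpacking a dictionary with a None key as keyword arguments. The sketch below also shows one possible normalization, mapping None to an empty-string prefix; that choice is an assumption for illustration, not necessarily the fix that was applied. Both function names are made up.

```python
def make_namespaces(**kwargs):
    """Stand-in for a constructor that receives namespaces as keyword arguments."""
    return dict(kwargs)


namespaces = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    None: "urn:loc.gov:books",
    "isbn": "urn:ISBN:0-395-36341-6",
}

# Keyword unpacking requires string keys, so the None key raises TypeError.
try:
    make_namespaces(**namespaces)
    error = None
except TypeError as exc:
    error = str(exc)


def normalize_namespaces(mapping):
    """Replace a None prefix (default namespace) with an empty string."""
    return {("" if prefix is None else prefix): uri for prefix, uri in mapping.items()}


normalized = make_namespaces(**normalize_namespaces(namespaces))
print(error)
print(normalized[""])
```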

An easy way to set priority?

https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity
"Using !important, however, is bad practice and should be avoided because it makes debugging more difficult by breaking the natural cascading in your stylesheets."
But sometimes when I use complex words to select, it is hard for me to review.
Can we use something like parentheses (eg:3*(3+4)=21)?

css_parser regex broken for python < 2.7.5

Python: 2.7.3
OS: RHEL
Dependency graph: MechanicalSoup → beautifulsoup4 → soupsieve

With beautifulsoup4 4.7.0 soupsieve is installed as a dependency. Seems like python 2.7.3 had some issues in parsing one of the regular expressions in css_parser.py (RE_LANG to be specific). Do you have any plans on supporting python 2.7.4 or lower?

Here is the full stack trace for this issue:

File "~/virtualenvs/<venv>/lib/python2.7/site-packages/mechanicalsoup/__init__.py", line 2, in <module>
  from .browser import Browser
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/mechanicalsoup/browser.py", line 2, in <module>
  import bs4
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/bs4/__init__.py", line 34, in <module>
  from .builder import builder_registry, ParserRejectedMarkup
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/bs4/builder/__init__.py", line 7, in <module>
  from bs4.element import (
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/bs4/element.py", line 12, in <module>
  import soupsieve
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/soupsieve/__init__.py", line 30, in <module>
  from . import css_parser as cp
File "~/virtualenvs/<venv>/lib/python2.7/site-packages/soupsieve/css_parser.py", line 144, in <module>
  RE_LANG = re.compile(r'(?:(?P<value>{value})|(?P<split>{ws}*,{ws}*))'.format(ws=WSC, value=VALUE), re.X)
File "~/virtualenvs/<venv>/lib/python2.7/re.py", line 190, in compile
  return _compile(pattern, flags)
File "~/virtualenvs/<venv>/lib/python2.7/re.py", line 242, in _compile
  raise error, v # invalid expression
sre_constants.error: nothing to repeat

Add Python 3.8 tests to CI

Add Python 3.8 to Travis and Appveyor, but allow it to fail to alert us if things change. Py 3.8 related issues were raised in #54.

Help: is soupsieve case-insensitive?

In [122]: xml = """<Envelope><Header>...</Header></Envelope>"""

In [123]: s = BeautifulSoup(xml, "xml")

In [124]: s.select("header")
Out[124]: [<Header>...</Header>]

In [125]: s.select("Header")
Out[125]: []

Before, BeautifulSoup accepted (and I think required) case-sensitive tag name in selector.

Now that BeautifulSoup uses soupsieve, it seems that only lower-case selectors are supported.

I'm really not sure why or if I can change this behaviour.

Selectors '> tag', '+ tag', and '~ tag'

'>+~' symbols at the beginning of the selectors.
These selectors worked in Beautiful Soup 4.6.x.
But in 4.7.x there is no support for such selectors.

For example, the code below causes a soupsieve.util.SelectorSyntaxError exception.

from bs4 import BeautifulSoup
BeautifulSoup('<a>test<b>test2</b></a>').a.select('> b')

Result:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Programs\Programming\Python-3\lib\site-packages\bs4\element.py", line 1376, in select
    return soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\__init__.py", line 112, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\__init__.py", line 63, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\css_parser.py", line 205, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\css_parser.py", line 1010, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\css_parser.py", line 888, in parse_selectors
    sel, m, has_selector, selectors, relations, is_pseudo, index
  File "D:\Programs\Programming\Python-3\lib\site-packages\soupsieve\css_parser.py", line 713, in parse_combinator
    index
soupsieve.util.SelectorSyntaxError: The combinator '>' at postion 0, must have a selector before it
  line 1:
> b
^
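
One way to express the same intent with a spec-compliant pattern is to anchor it with :scope, as in this sketch. It assumes a Soup Sieve version that supports :scope, and the extra nested markup is added only to show that deeper descendants are not returned.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<a>test<b>test2</b><i><b>deep</b></i></a>", "html.parser")

# ":scope" stands for the tag that select() is called on, so this
# selects only the <b> tags that are direct children of <a>.
direct = soup.a.select(":scope > b")

print([tag.get_text() for tag in direct])
```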

Unit tests/code coverage

Finish up testing uncovered areas of code. This is a requirement for 1.0.0.

Release 1.0

This has been pulled out of the PySpelling project to be released as its own project. There is a bit of work to do before it can go on PyPI.

Modifying :placeholder-shown to handle select > option in certain scenarios

Over at csswg, it appears they plan to add :placeholder-select.

RESOLVED: Accept and add the :placeholder-select pseudo class and add a note for ::placeholder that we're interested in working on it

I don't think they plan to extend :placeholder-shown to include select options. Anyways, we'll have to wait until something is actually published before we even consider changing :placeholder-shown or adding :placeholder-select, but I want to at least track this so I remember to look at it in the future.

:placeholder-shown will be modified based on the wording in the CSS level 4 spec at this time.

When we speak of placeholders and select, we are specifically referring to this case: https://html.spec.whatwg.org/multipage/form-elements.html#placeholder-label-option.

Readme when showing supported css selectors.

Before and many more

After and many more

Selectors :nth-col() and :nth-last-col()

I don't know if these will make it into version 1.0 or not. There are no real implementations of this available. Some things seem to still be in flux, such as its renaming recently.

~~In parent || child is parent always compared against col and child against td? What happens if specify something else: p || span?~~

Same question applies to :nth-col(), is the implied target td:nth-col() if you do something like .class:nth-col()?

There are a number of questions that I have which will need to be understood before this is implemented. There is also a bit of complexity involved here.

child must be verified to meet cell requirement and selector requirements.
parent must meet column requirement and selector requirements.
We would most likely:
- verify that child is a td
- verify location in row
- crawl up and verify whether there are col tags in table header
- capture appropriate column span considerations
- determine td column and span based on captured info
- calculate nth position and whether the td fits within the relation.

Document CSS compiled selector structure.

For contributors coming to add improvements, it would be helpful to document the internal structure of how we construct our CSS selector structure. This should be in the development page.