Comments (18)
Specificity really only applies when you are in a stylesheet, and you are trying to see which style (based on the specificity of the selectors) gets applied. It really doesn't apply in the context we are using them in SoupSieve.
I guess I would have to see an example of what you are trying to do, and why it doesn't work.
from soupsieve.
Just use requests.get(url='https://m.banzhuer.com/booklist/5393_39/') and you get an HTML file. In it you can find <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a><li><a class="xbk this tb">1521 - 1560章</a><li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a><li>.
<a class="xbk this tb">1521 - 1560章</a> is the current index, and I want to find the next index. It should be easy (<a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a> is exactly what I want). But there is a problem: this code repeats 4 times in the HTML, so I have to filter out the duplicates, which is very hard for me.
There is another problem. I want to find all the indexes, but each of them repeats 4 times. I have to say, this website is disgusting.
from soupsieve.
I feel like you should be able to do something like li:nth-child(3) a.xbk, but I still don't exactly understand what you are trying to target.
from soupsieve.
- I want to get the current index, and <a class="xbk this tb">1521 - 1560章</a> is what I want. But that tag repeats 4 times (you can check it, and even their parents look the same, too). I can't find a good way to filter.
- I want to get the next index, and <a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a> is what I want (just behind the current index). But that tag repeats 4 times, too. I can't find a good way to filter.
- I want to get all of the indexes, but I also can't filter them.
from soupsieve.
Are you able to retrieve the current index without issue? If so, I may have an idea.
from soupsieve.
I can get the index easily. The problem is that there are clones. But I don't want to use something like set() in Python. Can I do it using only soupsieve?
from soupsieve.
If you are able to get the current index and extract its text, you could then maybe construct another selector: li:contains("1521 - 1560章") + li > a.xbk.
So something like:
select('li:contains("{}") + li > a.xbk'.format(current.text))
You'd have to play around and figure out what works best. This is really outside the scope of Soup Sieve support, as I'm more interested in addressing bugs and features, and not applications of the library, but maybe this helps.
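To make the suggestion above concrete, here is a minimal sketch. It assumes well-formed markup with closed li tags (the real page may nest differently depending on the parser), and it uses :-soup-contains(), the newer spelling of :contains() in recent Soup Sieve releases. The HTML snippet is a trimmed, hypothetical stand-in for the page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one pagination block of the page.
HTML = """
<ul>
<li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
<li><a class="xbk this tb">1521 - 1560章</a></li>
<li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: find the current index (the link carrying all three classes).
current = soup.select_one("a.xbk.this.tb")

# Step 2: build a selector for the li right after the li containing that text.
selector = 'li:-soup-contains("{}") + li > a.xbk'.format(current.text)

# select_one returns only the first match, so repeated blocks are ignored.
next_link = soup.select_one(selector)
print(next_link["href"])
```

Because select_one stops at the first match, the repeated copies of the block do not need to be filtered in this case.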
from soupsieve.
But how can I get all of the indexes? I mean I can get them easily, but the problem is that it repeats in the html. I can't filter.
from soupsieve.
I didn't look too closely, but while it repeats, I thought their parents were different.
I might be wrong, but you could get all indexes with the parent li.
Anyway, not all problems can be solved with just selectors. Sometimes you may have to use additional logic if the HTML is constructed in a way that does not easily lend itself to simple selectors.
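As an illustration of what such additional logic might look like, the sketch below collects every index link and drops repeats by href while keeping document order. The markup is a hypothetical reduction of the page (two identical div.fenye blocks instead of four), and the variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical reduction of the page: the same pagination block appears twice.
HTML = """
<div class="fenye"><ul>
<li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
<li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul></div>
<div class="fenye"><ul>
<li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
<li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The selector does the matching; a dict keyed by href removes the repeats.
# Dicts preserve insertion order, so the first occurrence of each link wins.
unique = {}
for link in soup.select("li > a.xbk[href]"):
    unique.setdefault(link["href"], link.text)

index_links = list(unique.items())
print(index_links)
```

Keying on href rather than on the tag object is a design choice for this sketch: the repeated tags are distinct objects in the tree, so only their attributes identify them as the same index.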
from soupsieve.
And here is my way to filter, to get the next index. (The code for the current index, the next index, and all indexes is executed in different places, so you can't get the tag of the current index and then get the next index; it has to be a single selector that gets the result.)
select('div[class="fenye"]:first-child div[class="showpage r3"]:first-child > ul li > a[class="xbk this tb"] + li > a[class="xbk"]')
That is very hard for me to read. But maybe you can help me by giving another way.
from soupsieve.
And maybe you can provide the priority of the function in the document?
from soupsieve.
If you have a complicated selector, you can annotate it with CSS comments. It can be helpful when you come back to understand what it is you were doing.
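>>> import soupsieve as sv
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<p class="a">Cat</p><p class="b">Dog</p><p class="c">Mouse</p>', 'html.parser')
>>> selector = """
... /* This isn't complicated, but we're going to annotate it anyways.
... This is the a class */
... .a,
... /* This is the b class */
... .b,
... /* This is the c class */
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]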
Like with regular expressions, sometimes a selector solution can be complex. Complex doesn't always mean it's a bad solution; sometimes it is just necessary.
Unfortunately, I don't have time to analyze the HTML you've provided in great detail and provide a complete solution.
from soupsieve.
I'm also not sure what you mean by function priority.
from soupsieve.
li > a[class="xbk this tb"] ~ li > a[class="xbk"] means ((li > a[class="xbk this tb"]) ~ li) > a[class="xbk"] (I use parentheses to point out the priority; the parenthesized form is not valid and can't be executed by soupsieve). So maybe you can document the priority (like * / + - in maths)?
from soupsieve.
A complex selector (one with combinators such as >, +, etc.) is evaluated from right to left. Check out this answer on Stack Overflow: https://stackoverflow.com/a/8135729/3609487.
Soup Sieve basically does the same thing. The element under consideration is evaluated with a[class="xbk"], then it checks that it has the parent li, then it checks that li comes after the sibling a[class="xbk this tb"], and that sibling is a child of li.
Soup Sieve doesn't try to spell out the entire CSS spec. It is expected that the user will reference CSS rules if they are confused about how a selector should work, but I will consider the suggestion.
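The reading described above can be checked with a small sketch. It assumes well-formed markup with closed li tags, where no li is ever a sibling of an a, so the selector as written matches nothing. One way to express the grouping the question seems to intend is :has() with a relative selector; this is offered as a possible alternative, not as the recommended fix for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for one pagination block of the page.
HTML = """
<ul>
<li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
<li><a class="xbk this tb">1521 - 1560章</a></li>
<li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Read right to left: an a[class="xbk"] whose parent li follows a sibling
# a[class="xbk this tb"]. Here every li only has li siblings, so no match.
as_written = soup.select('li > a[class="xbk this tb"] ~ li > a[class="xbk"]')

# Intended grouping: an li that *contains* the current link, followed by
# a later sibling li whose child is a plain a[class="xbk"].
intended = soup.select('li:has(> a[class="xbk this tb"]) ~ li > a[class="xbk"]')

print(as_written)
print([a["href"] for a in intended])
```

With a lenient parser and unclosed li tags, the tree can nest differently, which may explain why a selector that looks wrong on paper still matches on the real page.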
from soupsieve.
Thx.
But why from right to left? Commonly, the left one would be the parent, and the right one would be the child. If you search for the right one first, there may be too many results. But if you check the parents first and then search for the children among those results, I think it can save time.
from soupsieve.
The tree is crawled from parent to child, but each tag is matched. When the tag is matched, we start from the right. First we want to know if it is even the element that we want, then we look at ancestry.
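A toy model may make this clearer. The sketch below is not Soup Sieve's implementation; it is a dependency-free illustration with a hypothetical Node class that supports only tag names and the descendant combinator. The tree is walked from parent to child, and each visited node is tested starting with the rightmost part of the selector before any ancestors are examined.

```python
class Node:
    """Minimal stand-in for a parsed tag (hypothetical, for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def matches(node, chain):
    """Test a descendant chain such as ['ul', 'li', 'a'] from the right."""
    # Cheapest check first: is this even the kind of element we want?
    if node.name != chain[-1]:
        return False
    # Only then climb the ancestry for the remaining parts, right to left.
    ancestor = node.parent
    for wanted in reversed(chain[:-1]):
        while ancestor is not None and ancestor.name != wanted:
            ancestor = ancestor.parent
        if ancestor is None:
            return False
        ancestor = ancestor.parent
    return True


def select(root, chain):
    """Crawl parent to child in document order, matching every node."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if matches(node, chain):
            found.append(node)
        stack.extend(reversed(node.children))
    return found


root = Node("div")
inner_link = root.add("ul").add("li").add("a")
outer_link = root.add("a")  # an a that is not inside ul li

found = select(root, ["ul", "li", "a"])
print([node.name for node in found])
```

Most nodes fail the first, rightmost comparison immediately, so their ancestry is never inspected; only candidates that already look right pay for the climb.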
from soupsieve.
Thanks a lot.
from soupsieve.