Comments (18)

facelessuser avatar facelessuser commented on August 16, 2024

Specificity really only applies when you are in a stylesheet and you are trying to see which style (based on the specificity of the selectors) gets applied. It really doesn't apply in the context in which we use selectors in Soup Sieve.

I guess I would have to see an example of what you are trying to do, and why it doesn't work.

from soupsieve.

yjqiang avatar yjqiang commented on August 16, 2024

Just call requests.get(url='https://m.banzhuer.com/booklist/5393_39/') and you get an HTML file. In it you can find <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a><li><a class="xbk this tb">1521 - 1560章</a><li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a><li>.
<a class="xbk this tb">1521 - 1560章</a> is the current index, and I want to find the next index. It should be easy (<a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a> is exactly what I want). But there is a problem: this markup is repeated 4 times in the HTML, so I have to filter the results, and that is very hard for me.
There is another problem. I want to find all the indexes, but each of them is repeated 4 times as well. I have to say, this website is disgusting.

facelessuser avatar facelessuser commented on August 16, 2024

I feel like you should be able to do something like li:nth-child(3) a.xbk, but I still don't exactly understand what you are trying to target.

yjqiang avatar yjqiang commented on August 16, 2024
  1. I want to get the current index.
    <a class="xbk this tb">1521 - 1560章</a> is what I want. But that tag is repeated 4 times (you can check it; even the parents look the same). I can't find a good way to filter.
  2. I want to get the next index.
    <a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a> is what I want (it comes right after the current index). But that tag is repeated 4 times, too. I can't find a good way to filter.
  3. I want to get all of the indexes.

But I can't filter those either.

facelessuser avatar facelessuser commented on August 16, 2024

Are you able to retrieve the current index without issue? If so, I may have an idea.

yjqiang avatar yjqiang commented on August 16, 2024

I can get the index easily. The problem is that there are duplicates, and I don't want to use something like set() in Python. Can I do it using only soupsieve?

facelessuser avatar facelessuser commented on August 16, 2024

If you are able to get the current index, and extract that text, you could then maybe construct another selector li:contains("1521 - 1560章") + li > a.xbk.

So something like:

select('li:contains("{}") + li > a.xbk'.format(current.text))

You'd have to play around and figure out what works best. This is really outside the scope of Soup Sieve support, as I'm more interested in addressing bugs and features, and not applications of the library, but maybe this helps.
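
The two-step idea above can be sketched as follows. This is a minimal sketch, not a tested solution for the real page: the markup is a simplified, hypothetical stand-in for the snippet quoted earlier, with closing </li> tags added. It also assumes a Soup Sieve release that provides :-soup-contains(), which, as far as I know, newer releases prefer over the :contains() spelling used in the comment.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup modeled on the snippet quoted in this thread.
HTML = """
<ul>
  <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
  <li><a class="xbk this tb">1521 - 1560章</a></li>
  <li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Step 1: find the current index link and read its text.
current = soup.select_one("a.xbk.this.tb")

# Step 2: build a selector for the link inside the <li> that directly
# follows the <li> containing that text.
selector = 'li:-soup-contains("{}") + li > a.xbk'.format(current.text)
next_link = soup.select_one(selector)

print(next_link["href"])
```

Because select_one() returns only the first match, the repeated copies of the block on the real page would not matter here, provided the first copy is the one you want.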

yjqiang avatar yjqiang commented on August 16, 2024

But how can I get all of the indexes? I mean, I can get them easily, but the problem is that they are repeated in the HTML, and I can't filter out the duplicates.

facelessuser avatar facelessuser commented on August 16, 2024

I didn't look too closely, but while it repeats, I thought their parents were different.

I might be wrong, but you could get all indexes with parent li.

Anyways, not all problems can be solved with just selectors. Sometimes you may have to use additional logic if the HTML is constructed in a way that does not easily lend itself to simple selectors.
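
As one illustration of "additional logic", the duplicates can be dropped in plain Python after selecting. This is only a sketch under assumptions: the markup is hypothetical and repeats the pager block twice (the real page is said to repeat it 4 times), and the deduplication key (href plus text) is my own choice, not something the thread specifies.

```python
from bs4 import BeautifulSoup

# Hypothetical pager block; the real page reportedly repeats it 4 times.
BLOCK = """
<div class="fenye"><ul>
  <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
  <li><a class="xbk this tb">1521 - 1560章</a></li>
  <li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul></div>
"""

soup = BeautifulSoup(BLOCK * 2, "html.parser")
all_links = soup.select("li > a.xbk")

# A dict keeps insertion order, so the first occurrence of each
# (href, text) pair is kept and document order is preserved.
unique = {}
for link in all_links:
    key = (link.get("href"), link.get_text(strip=True))
    unique.setdefault(key, link)

indexes = [text for (_href, text) in unique]
print(indexes)
```

Unlike set(), this keeps the indexes in page order, which is usually what a pager needs.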

yjqiang avatar yjqiang commented on August 16, 2024

Here is how I filter to get the next index. (The code for the current index, the next index, and all of the indexes runs in different places, so I can't first get the tag of the current index and then get the next one from it; it has to be a single selector that returns the result.)
select('div[class="fenye"]:first-child div[class="showpage r3"]:first-child > ul li > a[class="xbk this tb"] + li > a[class="xbk"]')
That is very hard for me to read. Maybe you can help me by suggesting another way.

yjqiang avatar yjqiang commented on August 16, 2024

And maybe you could describe the priority of the functions in the documentation?

facelessuser avatar facelessuser commented on August 16, 2024

If you have a complicated selector, you can annotate it with CSS comments. It can be helpful when you come back to understand what it is you were doing.

>>> selector = """
... /* This isn't complicated, but we're going to annotate it anyways.
...    This is the a class */
... .a,
... /* This is the b class */
... .b,
... /* This is the c class */
... .c
... """
>>> sv.select(selector, soup)
[<p class="a">Cat</p>, <p class="b">Dog</p>, <p class="c">Mouse</p>]

Like with regular expressions, sometimes a selector solution can be complex. Complex doesn't always mean it's a bad solution; sometimes that is just necessary.
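
Applied to the kind of markup discussed in this thread, an annotated selector list might look like the sketch below. The HTML is a simplified, hypothetical stand-in (with closing </li> tags), and the selectors are one possible reading of "current index" and "next index", not a verified solution for the real page.

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# Simplified, hypothetical markup modeled on the snippet quoted in this thread.
HTML = """
<ul>
  <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
  <li><a class="xbk this tb">1521 - 1560章</a></li>
  <li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

selector = """
/* The current index link */
a.xbk.this.tb,
/* The link in the item right after the item holding the current link */
li:has(> a.xbk.this.tb) + li > a.xbk
"""

# Matches come back in document order: current link first, then the next one.
texts = [tag.get_text() for tag in sv.select(selector, soup)]
print(texts)
```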

Unfortunately, I don't have time to analyze the HTML you've provided in great detail and provide a complete solution.

facelessuser avatar facelessuser commented on August 16, 2024

I'm also not sure what you mean by function priority.

yjqiang avatar yjqiang commented on August 16, 2024

li > a[class="xbk this tb"] ~ li > a[class="xbk"] means ((li > a[class="xbk this tb"]) ~ li) > a[class="xbk"] (I use parentheses to show the priority; the parenthesized form is not valid and can't be executed by soupsieve). So maybe you could document the priority (like * / + - in math)?

facelessuser avatar facelessuser commented on August 16, 2024

A complex selector (one with combinators such as >, +, etc.) is evaluated from right to left. Check out this answer on Stack Overflow: https://stackoverflow.com/a/8135729/3609487.

Soup Sieve basically does the same thing. The element under consideration is evaluated with a[class="xbk"], then it checks that it has the parent li, then it checks that li comes after the sibling a[class="xbk this tb"], and that sibling is a child of li.

Soup Sieve doesn't try to spell out the entire CSS spec. It is expected that the user will reference CSS rules if they are confused about how a selector should work, but I will consider the suggestion.
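
To make the reading described above concrete, here is a small sketch on simplified, hypothetical markup with closing </li> tags. It compares the selector as written in the question with a regrouped variant that uses :has(); the variant is my suggestion, not something proposed in the thread.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup modeled on the snippet quoted in this thread.
HTML = """
<ul>
  <li><a href="/booklist/5393_38/" class="xbk">1481 - 1520章</a></li>
  <li><a class="xbk this tb">1521 - 1560章</a></li>
  <li><a href="/booklist/5393_40/" class="xbk">1561 - 1600章</a></li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Read right to left: an a.xbk, whose parent is an li, which follows a
# sibling a.xbk.this.tb, which is a child of an li. In this markup an li
# is never a sibling of an a, so nothing matches.
as_written = soup.select("li > a.xbk.this.tb ~ li > a.xbk")

# Moving the "contains the current link" test onto the li itself keeps
# the sibling step between two li elements.
regrouped = soup.select("li:has(> a.xbk.this.tb) ~ li > a.xbk")

print(len(as_written), [a["href"] for a in regrouped])
```

On the real page the items appear to lack closing </li> tags, so the tree shape, and therefore which of these selectors matches, may depend on the parser used.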

yjqiang avatar yjqiang commented on August 16, 2024

Thanks.
But why from right to left? Usually the left part is the parent and the right part is the child. If you search for the right part first, there may be too many results. But if you check the parents first and then search for the children among those results, I think it could save time.

facelessuser avatar facelessuser commented on August 16, 2024

The tree is crawled from parent to child, but each tag is matched. When the tag is matched, we start from the right. First we want to know if it is even the element that we want, then we look at ancestry.
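
The idea of crawling top-down while matching each tag right to left can be shown with a toy matcher. This is an illustrative sketch only, not Soup Sieve's actual implementation: the Node class, matches(), and select() are hypothetical names, and only the child combinator (>) on tag names is modeled.

```python
class Node:
    """A tiny stand-in for a parsed tag (hypothetical, for illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

    def add(self, name):
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def matches(node, steps):
    """Test one node against a child chain such as ["li", "a"] (li > a)."""
    # Start at the rightmost step: is this even the element we want?
    # Only then walk up and check the ancestry, one parent per step.
    for name in reversed(steps):
        if node is None or node.name != name:
            return False
        node = node.parent
    return True


def select(root, steps):
    """Crawl the tree from parent to child and test every node."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if matches(node, steps):
            found.append(node)
        # Push children in reverse so they are visited in document order.
        stack.extend(reversed(node.children))
    return found


root = Node("ul")
a_in_li = root.add("li").add("a")
a_in_div = root.add("div").add("a")

found = select(root, ["li", "a"])
print(len(found), found[0] is a_in_li)
```

Most nodes are rejected at the first comparison, so the ancestry is only examined for nodes that already look like the target.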

yjqiang avatar yjqiang commented on August 16, 2024

Thanks a lot.
