
Comments (9)

abmyii commented on May 25, 2024

Sure! As I said, I'll be working on this project and once I collect enough to warrant a plug-in, I'll submit one. Right now I only have ParagraphField and I don't think that's enough to make a plug-in for.

from ruia.

howie6879 commented on May 25, 2024

Hi @abmyii:

Maybe we should remove .strip()


abmyii commented on May 25, 2024

Maybe we should remove .strip()

Will do.

BTW, this is my paragraph field, in case it is useful:

from ruia import TextField


class ParagraphField(TextField):
    """
    This field is used to get paragraphs.
    """

    def _parse_element(self, element):
        strings = [node for node in element.itertext()]
        string = "".join(strings)
        return string if string else self.default

    def extract(self, *args, **kwargs):
        # Join lines (after stripping them)
        return "\n".join(
            _.strip() for _ in super().extract(*args, **kwargs) if _.strip()
        ).strip()
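
To make the behaviour of the field above concrete, here is a minimal sketch that does not depend on Ruia. It assumes the standard library's xml.etree.ElementTree as a stand-in for the lxml elements the field receives (both provide itertext()), and the helper name element_text and the sample markup are made up for illustration.

```python
import xml.etree.ElementTree as ET


def element_text(element, default=""):
    """Concatenate all text inside an element, including text in nested tags."""
    text = "".join(element.itertext())
    return text if text else default


# Hypothetical sample markup: one paragraph with a nested tag, one blank
# paragraph, and one plain paragraph.
root = ET.fromstring(
    "<div><p> First <b>bold</b> line. </p><p>   </p><p>Second line.</p></div>"
)

# Step 1: what _parse_element does for each matched element.
paragraphs = [element_text(p) for p in root.iter("p")]

# Step 2: what extract does afterwards: strip each piece, drop blank ones,
# and join the rest with newlines.
result = "\n".join(p.strip() for p in paragraphs if p.strip())
print(result)
```

The nested <b> text is kept because itertext() walks the whole subtree, and the whitespace-only paragraph is dropped by the `if p.strip()` filter.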


howie6879 commented on May 25, 2024

I think customized extraction like this should be implemented by the developer in a clean function, for example:

from ruia import Item, TextField


class TargetItem(Item):
    paragraph = TextField(css_selector="")

    async def clean_paragraph(self, value):
        # TODO
        pass


abmyii commented on May 25, 2024

I did that before, but having three or four paragraph fields, each with an identical clean function under a different name, isn't very pleasing:

async def clean_<name1>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()

async def clean_<name2>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()

async def clean_<name3>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()
...
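
One way to cut down on the repetition above, short of a custom field, is to write the coroutine once and bind it under several names. The sketch below is self-contained and uses a plain class as a stand-in for an Item subclass. The class name and field names are hypothetical, and it assumes (not verified here) that Ruia looks up clean_<field> methods by attribute name on the item.

```python
import asyncio


async def clean_paragraph(self, values):
    """Strip each line, drop blank ones, and join the rest with newlines."""
    return "\n".join(v.strip() for v in values if v.strip())


class ArticleCleaners:
    """Hypothetical stand-in for an Item subclass; the field names are made up."""

    # Bind the same coroutine function under each clean_<field> name.
    clean_summary = clean_paragraph
    clean_body = clean_paragraph
    clean_footer = clean_paragraph


cleaners = ArticleCleaners()
cleaned = asyncio.run(cleaners.clean_body(["  first line ", "", "second line  "]))
print(cleaned)
```

This keeps the logic in one place, but each field still needs its own assignment line, so a dedicated field class may remain the tidier option when there are many paragraph fields.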


howie6879 commented on May 25, 2024

If this is repeated often, I recommend that developers define it themselves, just as you do now:

from ruia import TextField


class ParagraphField(TextField):
    """
    This field is used to get paragraphs.
    """

    def _parse_element(self, element):
        strings = [node for node in element.itertext()]
        string = "".join(strings)
        return string if string else self.default

    def extract(self, *args, **kwargs):
        # Join lines (after stripping them)
        return "\n".join(
            _.strip() for _ in super().extract(*args, **kwargs) if _.strip()
        ).strip()

It might not be a good idea for this to come directly from Ruia, since Ruia should only provide basic functions. What do you think?


abmyii commented on May 25, 2024

Ruia should only provide basic functions

True. I can't tell whether this belongs in Ruia or not, but I'm sure you know better. I'll submit the PR for removing .strip() and then this should be done.
Thanks for your input!


howie6879 commented on May 25, 2024

Thank you for your understanding. You can write a plug-in. This can also achieve what you want and help more people.


howie6879 commented on May 25, 2024

If you are interested in this job, we can do this in python-ruia.

