Comments (9)
Sure! As I said, I'll be working on this project and once I collect enough to warrant a plug-in, I'll submit one. Right now I only have ParagraphField and I don't think that's enough to make a plug-in for.
from ruia.
Hi @abmyii:
Maybe we should remove .strip()
Maybe we should remove .strip()
Will do.
BTW, this is my paragraph field, in case it is useful:
from ruia import TextField


class ParagraphField(TextField):
    """
    This field is used to get paragraphs.
    """

    def _parse_element(self, element):
        # Concatenate every text node under the element, including nested tags
        strings = [node for node in element.itertext()]
        string = "".join(strings)
        return string if string else self.default

    def extract(self, *args, **kwargs):
        # Join lines (after stripping them)
        return "\n".join(
            _.strip() for _ in super().extract(*args, **kwargs) if _.strip()
        ).strip()
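To see what the two methods above do without installing Ruia, here is a minimal standalone sketch. It uses the standard library's `xml.etree.ElementTree` in place of the lxml elements Ruia works with, and the helper names `element_text` and `join_paragraph_lines`, as well as the sample markup, are hypothetical and not part of Ruia.

```python
import xml.etree.ElementTree as ET


def element_text(element, default=""):
    """Mirror of _parse_element: concatenate every text node under the element."""
    text = "".join(element.itertext())
    return text if text else default


def join_paragraph_lines(values):
    """Mirror of extract: strip each value, drop empty ones, join with newlines."""
    return "\n".join(v.strip() for v in values if v.strip()).strip()


# Hypothetical markup standing in for a scraped page.
root = ET.fromstring(
    "<div>"
    "<p>  First <b>bold</b> paragraph. </p>"
    "<p>   </p>"
    "<p>Second paragraph.</p>"
    "</div>"
)

# One string per <p>, then the whitespace-only paragraph is dropped on join.
paragraphs = [element_text(p) for p in root.iter("p")]
result = join_paragraph_lines(paragraphs)
print(result)
```

Note that the join step iterates over its input, so it only makes sense when the field yields a list of strings (one per matched element) rather than a single string.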
I think extracting customized target content should be implemented by the developer in the field's clean function, for example:
from ruia import Item, TextField


class TargetItem(Item):
    paragraph = TextField(css_selector="")

    async def clean_paragraph(self, value):
        # TODO
        pass
I did that before, but having 3 or 4 paragraph fields that all need an identical clean function (differing only in name) isn't very pleasing:
async def clean_<name1>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()

async def clean_<name2>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()

async def clean_<name3>(self, values):
    return "\n".join(_.strip() for _ in values if _.strip()).strip()
...
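One way to cut the repetition shown above, short of a custom field, is to write the coroutine once and bind it under several names. The sketch below is plain Python: `ArticleItem` is a hypothetical stand-in class, not a real `ruia.Item`, and it is only an assumption that Ruia would pick up hooks assigned this way the same as ones defined with `async def` in the class body.

```python
import asyncio


def join_paragraph_lines(values):
    """Strip each value, drop the empty ones, and join the rest with newlines."""
    return "\n".join(v.strip() for v in values if v.strip()).strip()


async def clean_paragraph(self, values):
    """Single shared clean hook, written once."""
    return join_paragraph_lines(values)


class ArticleItem:  # hypothetical stand-in; a real item would subclass ruia.Item
    # The same coroutine function is reused under each clean_<name> attribute.
    clean_summary = clean_paragraph
    clean_body = clean_paragraph
    clean_footer = clean_paragraph


item = ArticleItem()
cleaned = asyncio.run(item.clean_body(["  one ", "", " two  "]))
print(cleaned)
```

Because a function assigned as a class attribute becomes a bound method on instances, each `clean_<name>` still receives `self` as usual.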
If this is repeated often, I recommend that developers handle it themselves, just as you do now:
from ruia import TextField


class ParagraphField(TextField):
    """
    This field is used to get paragraphs.
    """

    def _parse_element(self, element):
        # Concatenate every text node under the element, including nested tags
        strings = [node for node in element.itertext()]
        string = "".join(strings)
        return string if string else self.default

    def extract(self, *args, **kwargs):
        # Join lines (after stripping them)
        return "\n".join(
            _.strip() for _ in super().extract(*args, **kwargs) if _.strip()
        ).strip()
It might not be a good idea for this to come directly from Ruia. Ruia should only provide basic functions. What do you think?
Ruia should only provide basic functions
True. I can't tell whether this belongs in Ruia or not, but I'm sure you know better. I'll submit the PR for removing .strip() and this should be done.
Thanks for your input!
Thank you for your understanding. You can write a plug-in. This can also achieve what you want and help more people.
If you are interested in this work, we can do it in python-ruia.