Comments (7)
I am also looking for the ability to follow links or parse the next page. Sometimes the first URL is not the one you want to parse (for example, when you want the first result of a search page rather than the search page itself).
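To make the request concrete, here is a minimal sketch of "follow the first result" using only the standard library. It assumes, purely for illustration, that each result link sits inside an `<h3>` element; `FirstLinkParser` and `first_result_url` are hypothetical names and are not part of toapi.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FirstLinkParser(HTMLParser):
    """Records the href of the first <a> found inside an <h3> element."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.first_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
        elif tag == "a" and self.in_h3 and self.first_href is None:
            self.first_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False


def first_result_url(search_html, base_url):
    """Return the absolute URL of the first search result, or None."""
    parser = FirstLinkParser()
    parser.feed(search_html)
    if parser.first_href is None:
        return None
    # Resolve relative links against the URL of the search page.
    return urljoin(base_url, parser.first_href)
```

The returned URL could then be fetched in a second request and its HTML handed to whatever item parsing is in use.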
from toapi.
There is a solution here:
import requests
from toapi import Api

api = Api(url)
app = api.server.app

@app.route('/post_page/')
def post_method():
    # Inspect the source site's AJAX POST request to work out `url` and `data`.
    res = requests.post(url, data)
    return item.parse(res.text)
This example could help you.
https://github.com/gaojiuli/toapi/blob/master/examples/hackernews_page.py
@gaojiuli Thanks!
@gaojiuli Where do data and item come from? I tried:
@app.route('/posts')
def post_method(*args, **kwargs):
    print(args)
    print(kwargs)
but they are empty.
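toapi's built-in fetch_page_source() method does not handle POST requests, so we need to add a Flask route ourselves to get this behaviour.
Here is a fairly detailed example.
Suppose I need to fetch the data at this URL with a POST request and then parse it the toapi way.
- Writing the items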
from toapi import Item, XPath


class Search(Item):
    '''
    Parse the title, id, link and summary of each book
    from the search results page.
    '''
    title = XPath('//h3/a/text()')
    book_id = XPath('//h3/a/@href')
    url = XPath('//h3/a/@href')
    content = XPath('//p[2]/text()')

    def clean_title(self, title):
        return ''.join(title)

    def clean_book_id(self, book_id):
        return book_id.split('-')[1]

    def clean_url(self, url):
        # Drop the query string; also safe when the href has no '?'.
        return url.split('?', 1)[0]

    class Meta:
        source = XPath('//li[@class="pbw"]')
        # Leave route empty to avoid registering the route twice.
        route = {}
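As a quick illustration of what the two href-based clean steps produce, here is the same logic as plain functions applied to a made-up href. The sample value is hypothetical and the real site's link format may differ. The query string is dropped with `split('?', 1)[0]` so that an href without a `?` comes back unchanged.

```python
def clean_book_id(href):
    """Take the second dash-separated field of the href as the book id."""
    return href.split('-')[1]


def clean_url(href):
    """Drop the query string, if there is one."""
    return href.split('?', 1)[0]


# Hypothetical href, only for illustration.
sample_href = 'thread-12345-1-1.html?highlight=python'

print(clean_book_id(sample_href))  # 12345
print(clean_url(sample_href))      # thread-12345-1-1.html
```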
- Registering the route
import json

import requests
from toapi import Api

from items.search import Search
from settings import MySettings

api = Api('', settings=MySettings)
api.register(Search)


@api.server.app.route('/search/<keyword>')
def search_page(keyword):
    '''
    Search the 91bay new-books forum.
    '''
    # Form fields sent with the site's search POST request.
    data = {
        'searchsel': 'forum',
        'mod': 'forum',
        'srchtype': 'title',
        'srchtxt': keyword,
    }
    r = requests.post(
        'http://91baby.mama.cn/search.php?searchsubmit=yes', data)
    r.encoding = 'utf8'
    html = r.text
    results = {}
    items = [Search]
    # Parse the page with toapi.
    for item in items:
        parsed_item = api.parse_item(html, item)
        results[item.__name__] = parsed_item
    # Return the result as JSON.
    return api.server.app.response_class(
        response=json.dumps(results, ensure_ascii=False),
        status=200,
        mimetype='application/json'
    )


if __name__ == '__main__':
    api.serve()
This way we can parse the POST response by visiting http://127.0.0.1:5000/search/keyword
Because this approach is not supported by toapi itself, the cache feature cannot be used.
Hi @Ehco1996
You can also handle caching yourself.
The documentation covers it here:
- cached: demo
- api_cached
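For readers who cannot reach the linked docs, here is a minimal sketch of hand-rolled caching around the POST fetch. It does not use toapi's cache API; `TTLCache` is a hypothetical helper, and the 60-second default lifetime is an arbitrary assumption.

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=60, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (expiry_time, value)

    def get_or_set(self, key, producer):
        """Return the cached value for `key`, calling `producer()` on a miss."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = producer()
        self._store[key] = (now + self.ttl, value)
        return value
```

Inside a custom route, the POST call could be wrapped as `cache.get_or_set(keyword, lambda: fetch_html(keyword))`, where `fetch_html` stands for whatever function performs the `requests.post` call.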