Comments (3)
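@pkwenda For batch image crawling, see HuabanImgDemo. Also, for crawling images, videos, and similar media, I suggest keeping the approach as consistent as possible with how page content is crawled. The crawl job is only responsible for finding the image or video URLs; once found, they are handed straight to a dedicated file download handler. That keeps the design clearer too.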
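To make the split concrete, here is a minimal sketch of that separation. The names `FileDownloadHandler` and `ImageUrlExtractor` are hypothetical and are not taken from web-bee; the regex is a deliberately naive stand-in for whatever parsing the crawler really uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical handler: receives media URLs and owns all download logic. */
interface FileDownloadHandler {
    void handle(String mediaUrl);
}

/** Hypothetical crawl step: only finds image URLs, never downloads anything. */
final class ImageUrlExtractor {
    // Naive match on the src attribute of <img> tags; a real crawler would use an HTML parser.
    private static final Pattern IMG_SRC =
            Pattern.compile("<img[^>]*\\ssrc=[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns every image URL found in the given HTML, in document order. */
    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    /** Finds the URLs and hands each one to the download handler. */
    static void crawl(String html, FileDownloadHandler handler) {
        for (String url : extract(html)) {
            handler.handle(url);
        }
    }
}
```

With this shape the crawler can be tested against plain HTML strings, and the handler can change (threads, progress reporting, retries) without touching the crawl code.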
from web-bee.
I thought your idea over carefully, and I think it works.
URLConnection urlConnection = destUrl.openConnection();
InputStream inputStream = urlConnection.getInputStream();
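A stream obtained through URL comes with no content-type, so the MIME type suffix cannot be matched automatically, which defeats the goal of automation. It also conflicts with the earlier Setting. On my side I am refactoring Task and will provide a newRequest function backed by HttpClient, so that your dedicated file download handler can correctly build requests from a URL internally.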
- The code that obtains the MIME type automatically is written: here
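As a rough illustration of the automatic suffix matching, the sketch below maps a Content-Type header value to a file suffix. `MimeTypeSuffix` and its small table are assumptions made for this example, not the code linked above; the header value would come from whichever HTTP client response the handler ends up using.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: picks a file suffix from a Content-Type header value. */
final class MimeTypeSuffix {
    // Deliberately small table for illustration; extend as needed.
    private static final Map<String, String> SUFFIXES = new HashMap<>();
    static {
        SUFFIXES.put("image/jpeg", ".jpg");
        SUFFIXES.put("image/png", ".png");
        SUFFIXES.put("image/gif", ".gif");
        SUFFIXES.put("video/mp4", ".mp4");
    }

    /** Returns the suffix for the header value, or "" when it is absent or unknown. */
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return "";
        }
        // Drop parameters such as "; charset=UTF-8" before the lookup.
        int semicolon = contentType.indexOf(';');
        String mime = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        String suffix = SUFFIXES.get(mime.trim().toLowerCase(Locale.ROOT));
        return suffix == null ? "" : suffix;
    }
}
```

Returning an empty string for unknown types leaves the caller free to fall back to another strategy, such as the suffix already present in the URL.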
if (destPath.endsWith("/")) {
destPath = destPath.substring(0, destPath.length() - 1);
}
byte[] buffer = new byte[1024];
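I'll take this part. I will extract it without affecting your functionality; I think whatever can be extracted should be, since it will probably be needed in many places.
From now on, let's discuss download issues in #33. The same goes for all future issues: I think we should discuss on GitHub as much as possible. With several people collaborating, that keeps a single feature from splitting into divergent paths early on.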
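One possible shape for the extracted code, assuming it ends up as a small static utility. `DownloadSupport` is a hypothetical name; the two methods simply wrap the trailing-slash check and the 1024-byte buffer from the snippet above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical utility holding the pieces that several downloaders could share. */
final class DownloadSupport {
    /** Removes a single trailing "/" so a file name can be appended consistently. */
    static String stripTrailingSlash(String destPath) {
        if (destPath.endsWith("/")) {
            return destPath.substring(0, destPath.length() - 1);
        }
        return destPath;
    }

    /** Copies the stream in 1024-byte chunks and returns the number of bytes written. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

Neither method closes the streams it is given, so the caller keeps ownership and can wrap the call in try-with-resources.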
@wangtonghe @trto1987 @biezhi
cheer up 😄
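@pkwenda OK. I only just saw this. Yesterday I also found that downloading straight from the URL does not give the file type directly, so for now I get the file type by inspecting the first bytes of the stream, or something along those lines. It does not feel very elegant, though. Since you are providing one, I'll just use yours. Let's discuss further.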
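For reference, checking the leading bytes could look roughly like the sketch below. It is not the code used in the project; `MagicBytes` is a hypothetical name and it only knows the well-known JPEG, PNG, and GIF signatures.

```java
/** Hypothetical helper: guesses a file type from the first bytes of a stream. */
final class MagicBytes {
    /** Returns "jpg", "png", or "gif", or "" when the signature is not recognised. */
    static String detect(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "jpg";
        }
        if (startsWith(head, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(head, 0x47, 0x49, 0x46, 0x38)) { // ASCII "GIF8"
            return "gif";
        }
        return "";
    }

    // Compares unsigned byte values against the expected signature.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The caller has to read the first bytes without losing them, for example through a `BufferedInputStream` with `mark` and `reset`, which is part of why this route feels less tidy than reading a response header.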