winsoft666 / zoe
C++ File Download Library.
License: GNU General Public License v3.0
When requesting data, the libcurl options CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER are explicitly disabled; see EntryHandler::requestFileInfo and Slice::start:
CHECK_SETOPT1(curl_easy_setopt(curl_, CURLOPT_SSL_VERIFYHOST, 0L));
CHECK_SETOPT1(curl_easy_setopt(curl_, CURLOPT_SSL_VERIFYPEER, 0L));
To quote the cURL documentation:
WARNING: disabling verification of the certificate allows bad guys to man-in-the-middle the communication without you knowing it. Disabling verification makes the communication insecure. Just having encryption on a transfer is not enough as you cannot be sure that you are communicating with the correct end-point.
Download fails when the file size cannot be obtained.
Hi,
EnableSaveSliceToTmp: 0 or 1, optional. Whether to save slice files to the system temp directory; on Windows this is the path returned by the GetTempPath API, on Linux it is /var/tmp/.
You could use O_TMPFILE when creating the temp file. O_TMPFILE is a Linux-specific flag for open() that creates an already-unlinked temporary file, so it never needs to be removed explicitly via unlink. This means you could create a temp file without a name.
https://kernelnewbies.org/Linux_3.11#head-8be09d59438b31c2a724547838f234cb33c40357
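Setting a maximum download speed is now supported (setMaxDownloadSpeed),
but the network is sometimes poor (for example, between China and the US).
So I would like the CURLOPT_LOW_SPEED_LIMIT code in slice.cpp :: start to be adjusted so that a minimum download speed and a timeout can be configured:
1. If the user sets them, a slice that stalls for network reasons is cancelled automatically and promptly instead of waiting at low speed indefinitely.
2. If the user does not set them, the download keeps waiting at low speed.
PS: The problem I actually hit is that a download is sometimes very slow, but if I pause it and immediately restart it, it becomes fast again. So I need an automatic abort on low speed, after which I immediately restart the download and the speed recovers. This depends on the network and on the server's connection policy (some servers lower the priority of, and throttle, connections that have been open for a long time, and favor newly established connections).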
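The opt-in behaviour requested above could be sketched roughly as follows. libcurl already offers CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME for this purpose; the class below is only a stand-alone illustration of the same rule, and the name LowSpeedDetector and its parameters are hypothetical, not part of zoe.

```cpp
#include <cstdint>

// Hypothetical helper (not part of zoe): reports a stall once the measured
// speed has stayed below `limitBytesPerSec` for `windowSeconds` consecutive
// one-second samples. A limit of 0 means the user did not opt in.
class LowSpeedDetector {
 public:
  LowSpeedDetector(std::int64_t limitBytesPerSec, int windowSeconds)
      : limit_(limitBytesPerSec), window_(windowSeconds) {}

  // Feed one sample per second; returns true when the transfer should abort.
  bool addSample(std::int64_t bytesThisSecond) {
    if (limit_ <= 0 || window_ <= 0) return false;  // feature disabled
    if (bytesThisSecond < limit_) {
      ++slowSeconds_;
    } else {
      slowSeconds_ = 0;  // any fast second resets the window
    }
    return slowSeconds_ >= window_;
  }

 private:
  std::int64_t limit_;
  int window_;
  int slowSeconds_ = 0;
};
```

A caller that sees addSample return true would cancel the slice and restart it, which matches the pause-and-resume workaround described in the PS.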
bool includeInvalidChar = false;
for (int i = 0; i < fullFileName.length(); i++) {
  char c = fullFileName[i];
  if (c == '\\' || c == '/' || c == ':' || c == '*' || c == '?' || c == '<' ||
      c == '>' || c == '|' || c == '"') {
    includeInvalidChar = true;
    break;
  }
}
if (includeInvalidChar) {
  return false;
}
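This part of file_util.cpp was added in commit ce2b1c2.
When a forward slash or backslash appears in the filename, it should be treated as a directory separator; before this change I had always used filenames such as "directory/file.xx" without problems.
Windows does say that slashes and backslashes cannot appear in a file name, but they should still be usable in a path name; this check breaks file names that include a path.
(I think adding a separate target directory parameter would be very cumbersome.)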
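One possible way to keep the character check while still accepting paths is to validate only the last path component. The sketch below is an illustration under that assumption; isValidTargetPath is a hypothetical name, not a function from file_util.cpp.

```cpp
#include <string>

// Hypothetical helper: validates only the last path component, so that
// "directory/file.xx" is accepted while "directory/fi*le.xx" is rejected.
bool isValidTargetPath(const std::string& fullPath) {
  // Everything after the last '/' or '\\' is the file name proper.
  const std::size_t sep = fullPath.find_last_of("/\\");
  const std::string name =
      (sep == std::string::npos) ? fullPath : fullPath.substr(sep + 1);
  if (name.empty()) return false;  // path ends in a separator: no file name
  // Characters Windows forbids in a file name (separators handled above).
  return name.find_first_of(":*?<>|\"") == std::string::npos;
}
```

Because only the final component is inspected, a drive prefix such as C:\ in the directory part does not trip the ':' check.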
Hello,
the vcpkg CONTROL file for teemo should specify Build-Depends: curl[non-http,openssl] instead of Build-Depends: curl[non-http].
As it is, builds will fail on systems without a system-wide OpenSSL installation. The OpenSSL dependency was met automatically while cpprestsdk was still a dependency of this project, but it has since been removed.
If the second file is smaller than the first, last is always still the first file's size. Or am I wrong? I use teemo in a shared_ptr.
if (options_ && slice_manager_) {
  int64_t now = slice_manager_->totalDownloaded();
  static int64_t last = already_download_;  // here: static int64
  if (now >= last) {
    int64_t downloaded = now - last;
    last = now;
    options_->speed_functor(downloaded);
  }
}
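A function-local static is initialized only once per process, so a second download would keep comparing against the first download's counter. One way to avoid that, sketched here with a hypothetical SpeedTracker class rather than the library's actual code, is to hold the baseline in a per-instance member.

```cpp
#include <cstdint>

// Hypothetical stand-in for the speed-notification logic: `last_` is a
// member, so each downloader instance starts from its own baseline.
class SpeedTracker {
 public:
  explicit SpeedTracker(std::int64_t alreadyDownloaded)
      : last_(alreadyDownloaded) {}

  // Returns the bytes downloaded since the previous call (0 if the
  // counter did not advance).
  std::int64_t update(std::int64_t totalDownloaded) {
    if (totalDownloaded < last_) return 0;
    const std::int64_t delta = totalDownloaded - last_;
    last_ = totalDownloaded;
    return delta;
  }

 private:
  std::int64_t last_;
};
```

With this shape, a tracker created for a second, smaller file reports its own progress instead of staying silent until it passes the first file's total.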
i wanna be a zoe main after this project 🤣🤣🤣🤣
Your most recent change adopted fallocate.
But as far as I know, macOS does not support fallocate or posix_fallocate; Linux does. macOS is a Unix, not Linux.
See:
https://github.com/aria2/aria2/blob/15cad965eb75c8b7f11bc2fc94354d1873bf6261/src/AbstractDiskWriter.cc
# elif defined(__APPLE__) && defined(__MACH__)
  const auto toalloc = offset + length - size();
  fstore_t fstore = {F_ALLOCATECONTIG | F_ALLOCATEALL, F_PEOFPOSMODE, 0,
                     toalloc, 0};
  if (fcntl(fd_, F_PREALLOCATE, &fstore) == -1) {
    // Retry non-contig.
    fstore.fst_flags = F_ALLOCATEALL;
    if (fcntl(fd_, F_PREALLOCATE, &fstore) == -1) {
      int err = errno;
      throw DL_ABORT_EX3(
          err,
          fmt("fcntl(F_PREALLOCATE) of %" PRId64 " failed. cause: %s",
              fstore.fst_length, util::safeStrerror(err).c_str()),
          error_code::FILE_IO_ERROR);
    }
  }
  // This forces the allocation on disk.
  ftruncate(fd_, offset + length);
# elif HAVE_FALLOCATE
For macOS, aria2 uses fcntl with F_PREALLOCATE; I also mentioned this in issue 7.
Please test on macOS to see whether further fixes are needed.
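I did not read the code very closely and could not find the cleanup, so I am asking.
1. EntryHandler::asyncTaskProcess(), line 217: multi = curl_multi_init();
Is curl_multi_cleanup never called on multi_ at the end?
2. slice::stop does clean up curl_.
However, stop() is only triggered for slices judged to have downloaded successfully. So for slices that failed, is the curl_ inside never cleaned up (curl_multi_remove_handle + curl_easy_cleanup)?
Please walk me through it and tell me how the handles in question 1 and question 2 are cleaned up.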
Hello, downloading a file larger than 4 GB fails in my test with SLICE_DOWNLOAD_FAILED.
teemo_tool_verbose.log output:
[teemo] URL: http://192.168.1.2/vs2015.3.ent_chs.iso.
[teemo] Content MD5: .
[teemo] Redirect URL: .
[teemo] Thread number: 2.
[teemo] Disk Cache Size: 268435456.
[teemo] Target file path: D:\TestDown\vs2015.3.ent_chs.iso.
[teemo] Load exist slice success.
<1> [0~10485759] (10485759), Disk: 10485760, Buffer: 0
<2> [10485760~20971519] (10485759), Disk: 10485760, Buffer: 0
<3> [20971520~31457279] (10485759), Disk: 10485760, Buffer: 0
<4> [31457280~41943039] (10485759), Disk: 10485760, Buffer: 0
<5> [41943040~52428799] (10485759), Disk: 10485760, Buffer: 0
<6> [52428800~62914559] (10485759), Disk: 10485760, Buffer: 0
<7> [62914560~73400319] (10485759), Disk: 10485760, Buffer: 0
<8> [73400320~83886079] (10485759), Disk: 10485760, Buffer: 0
<9> [83886080~94371839] (10485759), Disk: 10485760, Buffer: 0
<10> [94371840~104857599] (10485759), Disk: 10485760, Buffer: 0
<11> [104857600~115343359] (10485759), Disk: 10485760, Buffer: 0
<12> [115343360~125829119] (10485759), Disk: 10485760, Buffer: 0
<13> [125829120~136314879] (10485759), Disk: 10485760, Buffer: 0
<14> [136314880~146800639] (10485759), Disk: 10485760, Buffer: 0
<15> [146800640~157286399] (10485759), Disk: 10485760, Buffer: 0
<16> [157286400~167772159] (10485759), Disk: 10485760, Buffer: 0
<17> [167772160~178257919] (10485759), Disk: 10485760, Buffer: 0
<18> [178257920~188743679] (10485759), Disk: 10485760, Buffer: 0
<19> [188743680~199229439] (10485759), Disk: 10485760, Buffer: 0
......
[teemo] CURLOPT_RANGE: -1113587514--1103101953.
[teemo] CURLOPT_RANGE: -1103101754--1092616193.
[teemo] CURLOPT_RANGE: -1092615994--1082130433.
[teemo] CURLOPT_RANGE: -1082130234--1071644673.
[teemo] CURLOPT_RANGE: -1071644474--1061158913.
[teemo] CURLOPT_RANGE: -1061158714--1050673153.
[teemo] CURLOPT_RANGE: -1050672954--1040187393.
[teemo] CURLOPT_RANGE: -1040187194--1029701633.
[teemo] CURLOPT_RANGE: -1029701434--1019215873.
[teemo] CURLOPT_RANGE: -1019215674--1008730113.
[teemo] CURLOPT_RANGE: -1008729914--998244353.
[teemo] CURLOPT_RANGE: -998244154--987758593.
[teemo] CURLOPT_RANGE: -987758394--977272833.
[teemo] CURLOPT_RANGE: -977272634--966787073.
[teemo] CURLOPT_RANGE: -966786874--956301313.
[teemo] CURLOPT_RANGE: -956301114--945815553.
[teemo] CURLOPT_RANGE: -945815354--935329793.
[teemo] CURLOPT_RANGE: -935329594--924844033.
[teemo] CURLOPT_RANGE: -924843834--914358273.
[teemo] CURLOPT_RANGE: -914358074--903872513.
[teemo] CURLOPT_RANGE: -903872314--893386753.
[teemo] CURLOPT_RANGE: -893386554--882900993.
[teemo] CURLOPT_RANGE: -882900794--872415233.
[teemo] CURLOPT_RANGE: -872415034--861929473.
[teemo] CURLOPT_RANGE: -861929274--851443713.
[teemo] CURLOPT_RANGE: -851443514--840957953.
[teemo] CURLOPT_RANGE: -840957754--830472193.
[teemo] CURLOPT_RANGE: -830471994--819986433.
[teemo] CURLOPT_RANGE: -819986234--809500673.
[teemo] CURLOPT_RESUME_FROM_LARGE: -802725888.
[teemo] Downloading end.
[teemo] Start flushing cache to disk.
[teemo] Slice total size error.
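The negative offsets in the log are consistent with 64-bit byte offsets being narrowed to a signed 32-bit integer somewhere along the way; for instance, -802725888 is what 3492241408 becomes after such a narrowing. That is only an inference from the log, not a confirmed diagnosis. A minimal sketch of formatting a range from 64-bit values, using hypothetical helper names:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats an HTTP byte range ("begin-end") from
// 64-bit offsets, so values above 2 GiB and 4 GiB survive intact.
std::string makeRange(std::int64_t begin, std::int64_t end) {
  return std::to_string(begin) + "-" + std::to_string(end);
}

// Illustrates the suspected bug: the same offset squeezed into 32 bits.
std::int32_t truncateTo32(std::int64_t offset) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(offset));
}
```

Keeping every offset, slice size and printf-style format specifier 64-bit wide from end to end is the usual way to rule this class of bug out.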
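I looked at the download progress index file.
The two temporary suffixes .teemo and .efdindex are too long. Windows requires the full file path to be shorter than 256 characters, so temporary suffixes this long are a problem. I recommend changing them to something like .td and .ti, i.e. three-character temporary suffixes.
The efdindex file stores the download progress of every slice of the file.
As far as I can see, this file is only refreshed to save progress on two occasions: when the download succeeds and when it is paused (EntryHandler::asyncTaskProcess finishDownloadProgress --> flushIndexFile).
That made me think: when downloading a 10 GB file in 100 slices, if after 99 slices the computer suddenly reboots for an automatic update, blue-screens, the program crashes, or the power fails, the program exits abruptly without running flushIndexFile. When the download is resumed after the reboot, the progress of all 99 slices is lost, so everything has to be downloaded again and resuming is impossible.
Only if you click pause after 99 slices is flushIndexFile triggered, and only then will the next resume read the progress of those 99 slices and download just the one remaining slice.
So I recommend running flushIndexFile periodically, or after each slice finishes downloading.
Finally, UNSURE_DOWNLOAD_COMPLETED does not seem to be used anywhere; please remove it.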
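The two triggers suggested above (a timer, or the completion of a slice) could be combined in one small policy object. This is a sketch under assumed names; IndexFlushPolicy is hypothetical and the actual call to flushIndexFile is left to the caller.

```cpp
#include <chrono>

// Hypothetical policy object: decides when the index file should be flushed.
class IndexFlushPolicy {
 public:
  using Clock = std::chrono::steady_clock;

  IndexFlushPolicy(Clock::duration interval, Clock::time_point start)
      : interval_(interval), lastFlush_(start) {}

  // Call whenever progress changes; returns true if the caller should
  // flush now (a slice just completed, or the interval has elapsed).
  bool shouldFlush(Clock::time_point now, bool sliceJustCompleted) {
    if (sliceJustCompleted || now - lastFlush_ >= interval_) {
      lastFlush_ = now;
      return true;
    }
    return false;
  }

 private:
  Clock::duration interval_;
  Clock::time_point lastFlush_;
};
```

Passing the current time in as an argument keeps the policy easy to test; in production the caller would pass Clock::now().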
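It would be great if it could be built as a static library with MinGW and, without depending on other libraries, be used to make a self-update tool.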
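Changes inside the Start() method
3.1 if (verbose_functor_) {
std::stringstream ss_verbose; .... Only build the log message when there is somewhere to output it.
3.2 multi_ = curl_multi_init(); should be moved before thread_num_ = uncomplete_slice_num;. First check whether all slices are already complete, and only initialize multi_ when something still needs downloading.
3.3 if (!slice->InitCURL(multi_, each_slice_download_speed)): the logic here should be changed so that the URL can be updated, because some download URLs are time-limited and change. Currently, as soon as the URL changes, the old index file is discarded and the download restarts from scratch. For time-limited URLs (URLs containing a timestamp) this is unreasonable, because the download cannot be resumed.
3.4 if (progress_functor_) {
progress_notify_thread_ = std::async(std::launch::async, [this]() { Add this check so that the asynchronous progress-update thread is only created when a progress callback is set. Without a callback, no thread needs to be created.
3.5 if (speed_functor_) { speed_notify_thread_ Likewise here: if no callback is needed, do not create the thread.
3.6 if (file_size_ == -1 || (file_size_ > 0 && total_capacity == file_size_)) { if (!CombineSlice()) { The logic here should be optimized.
The current logic is: once all slices are downloaded, merge the files, which either succeeds or fails.
It should become: trigger a merge after each slice finishes or on a timer, merging repeatedly, until everything is downloaded and the final merge runs, which either succeeds or fails.
Goal 1: for a large download (10 GB), the user's disk may not have enough free space (12 GB left); downloading everything (slice files take 10 GB) plus merging (final file takes 10 GB) will fail. [I noticed that you clean up everything at the end rather than cleaning up each slice as it is merged.] Merging while downloading can report the failure early (the download errors out at the 5 GB mark instead of only after everything is downloaded; either way the cause is insufficient space).
Goal 2: merging while downloading avoids a long merge at the end of the download. Some mechanical hard drives merge at only 30 MB/s (reading 30 MB/s + writing 30 MB/s), so a 10 GB file takes 6 minutes to merge. Do you want the user to stare at a download speed of 0 and a progress of 100%, then wait 6 minutes with no feedback at all?
Optimization 3: this is not the main point, but after merging a slice you could immediately delete that slice's file and record the result in the index file. That way a user with only 10 GB of free space can download a 10 GB file, rather than only a 5 GB file (5 GB of slice files + 5 GB final file == 10 GB total).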
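It should be changed to:
Simply put, decouple the thread count from the slice count. Whatever thread count the user sets, when the file is large, force slices of 100 MB, so a 10 GB file = 100 slices. Then start 4 threads (the user's setting) that take turns downloading all slices in order.
With this logic, no matter how much has been downloaded, stopping and resuming always downloads the remainder with the full thread count; only the last slice (100 MB) ends up as a single-threaded download as before. The user can also freely change the thread count when resuming.
Finally, large downloads should disconnect and reconnect frequently anyway (small, numerous slices), because some servers slow down users whose connections have been open for a long time and give the bandwidth to new users.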
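The fixed-size slicing described above can be sketched independently of the thread count. The function name splitIntoSlices and the inclusive-range representation are assumptions made for illustration, not the library's data model.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: cuts [0, fileSize) into inclusive byte ranges of at
// most sliceSize bytes, independent of how many threads will download them.
std::vector<std::pair<std::int64_t, std::int64_t>> splitIntoSlices(
    std::int64_t fileSize, std::int64_t sliceSize) {
  std::vector<std::pair<std::int64_t, std::int64_t>> slices;
  if (fileSize <= 0 || sliceSize <= 0) return slices;
  for (std::int64_t begin = 0; begin < fileSize; begin += sliceSize) {
    // The last slice may be shorter than sliceSize.
    const std::int64_t end = std::min(begin + sliceSize, fileSize) - 1;
    slices.emplace_back(begin, end);
  }
  return slices;
}
```

Worker threads would then each take the next unfetched range from this list, so the number of workers can change between runs without affecting the slice layout.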
The current approach is: when downloading, first create a temporary file (TargetFile::createNew)
and allocate the full file size (CreateFixedSizeFile):
Seek(f, fixed_size - 1, SEEK_SET)
fwrite("", 1, 1, f)
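If the file to download is large (10 GB) and is saved to a USB 2.0 external hard drive, this takes a long time (really this is about mechanical hard drives being slow to write).
I recommend switching to each system's API for fast file space allocation:
macOS: F_PREALLOCATE
Option 1: fast preallocation of the file size with 0 MB actually occupied (milliseconds)
Option 2: fast preallocation of the file size with the full size actually occupied (milliseconds)
Linux: fallocate
Fast preallocation of the file size with the full size actually occupied (milliseconds)
Windows
NTFS partition: SetEndOfFile
Fast preallocation of the file size with the full size actually occupied (milliseconds)
FAT32 partition: no way to do it; fast allocation is not supported
The concrete code can all be found on GitHub.
Finally, when fast allocation really is impossible, check whether the file is large; if it is, skip the allocation and start downloading directly.
Compared with the previous fwrite("", 1, 1, f), this saves a great deal of time and reduces disk thrashing,
because on every system fwrite("", 1, 1, f) behaves by filling the entire file with zeros, i.e. writing the whole file to disk once.
On a mechanical hard drive: 10 GB / 30 MB/s = about 5 minutes of waiting without downloading, plus disk thrashing.
On an SSD: downloading a 10 GB file requires writing 20 GB of data, which shortens the drive's lifespan.
Please consider adopting this soon; it is worthwhile.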
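As a portable starting point, C++17's std::filesystem::resize_file sets a file's logical size without the program writing the contents itself. Whether disk blocks are actually reserved depends on the platform and file system, so this corresponds at best to the "0 MB actually occupied" option and is not a substitute for the platform calls listed above. The function name below is hypothetical.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

// Hypothetical helper: creates (or truncates) `path` and sets its logical
// size without writing the contents. Returns false on any failure.
bool createFixedSizeFileFast(const std::filesystem::path& path,
                             std::uintmax_t size) {
  {
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    if (!f) return false;
  }  // the stream is closed here, before the file is resized
  std::error_code ec;
  std::filesystem::resize_file(path, size, ec);
  return !ec;
}
```

A caller that needs guaranteed space would still have to follow this with a platform-specific reservation and fall back gracefully when that is unavailable.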
In the following call stack:
entry_handler.cpp: EntryHandler.requestFileInfo()
...
entry_handler.cpp: EntryHandler.asyncTaskProcess()
...
teemo.cpp: Teemo.start()
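the code checks if (!fetch_file_info_curl), which is obviously nullptr the first time. Note that the constructor of ScopedCurl, the type of fetch_file_info_curl_, creates a new CURL* via curl_easy_init, but no proxy is set on this CURL* during the file-info request. (When I last searched for where you set the proxy, it was also in this method that I saw the line you had commented out.)
As a result, the download fails with Result 31, FETCH_FILE_INFO_FAILED.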
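I took a quick look; the current slice download logic is:
When a slice is created: status_(UNFETCH)
SliceManager loops, grabs a slice that is UNFETCH, and starts downloading it
slice::start marks it as DOWNLOADING
Once a slice has succeeded or failed, it is left alone and the next UNFETCH slice is picked up; after all slices have been processed, the file is checked to see whether the download succeeded.
So slice downloads have no retry-on-error mechanism of their own.
When I download an mp4 file in slices, after a while I open the .teemo file in a player to play it while it is still downloading.
At present, as soon as one slice fails to download, playback cannot continue. So I would like a failed slice to be retried immediately (say, up to 3 times), and only when that really does not work should the download move on to the next slice.
Also, from a logic standpoint,
So whether for play-while-downloading or for normal downloads, an automatic retry mechanism for failed slices is needed.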
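The retry behaviour requested above amounts to a bounded retry loop around a single slice download. A minimal sketch, where retryUpTo is a hypothetical helper and the callable stands in for whatever actually downloads one slice:

```cpp
#include <functional>

// Hypothetical helper: runs `attempt` until it returns true or
// `maxAttempts` tries have been made. Returns whether it succeeded.
bool retryUpTo(int maxAttempts, const std::function<bool()>& attempt) {
  for (int i = 0; i < maxAttempts; ++i) {
    if (attempt()) return true;
  }
  return false;
}
```

Only when retryUpTo returns false would the slice be marked as failed and the next slice picked up; a real implementation would probably also want a short delay between attempts.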
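While replacing CURL with teemo I found that teemo does not support setting a proxy, and there is nothing proxy-related in teemo's source.
My goal is to let users decide flexibly, via a configuration file, whether to use a proxy and which proxy address to use, just as the CURL Easy API allows.
It seems that adding a few options and handling them in the CURL setup would be enough to support this.
The way Java's URLConnection sets a Proxy would suit teemo's style well; I hope proxy settings can be supported in the future.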
Building this source on Linux with curl 7.29.0, running make fails with the following errors:
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp: In function ‘void teemo::{anonymous}::locking_function(int, int, const char*, int)’:
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:39:23: error: ‘pthread_mutex_lock’ was not declared in this scope
#define MUTEX_LOCK(x) pthread_mutex_lock(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:48:5: note: in expansion of macro ‘MUTEX_LOCK’
MUTEX_LOCK(mutex_buf[n]);
^~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:39:23: note: suggested alternative: ‘pthread_mutex_t’
#define MUTEX_LOCK(x) pthread_mutex_lock(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:48:5: note: in expansion of macro ‘MUTEX_LOCK’
MUTEX_LOCK(mutex_buf[n]);
^~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:40:25: error: ‘pthread_mutex_unlock’ was not declared in this scope
#define MUTEX_UNLOCK(x) pthread_mutex_unlock(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:50:5: note: in expansion of macro ‘MUTEX_UNLOCK’
MUTEX_UNLOCK(mutex_buf[n]);
^~~~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:40:25: note: suggested alternative: ‘pthread_mutex_t’
#define MUTEX_UNLOCK(x) pthread_mutex_unlock(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:50:5: note: in expansion of macro ‘MUTEX_UNLOCK’
MUTEX_UNLOCK(mutex_buf[n]);
^~~~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp: In function ‘long unsigned int teemo::{anonymous}::id_function()’:
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:41:19: error: ‘pthread_self’ was not declared in this scope
#define THREAD_ID pthread_self()
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:54:26: note: in expansion of macro ‘THREAD_ID’
return ((unsigned long)THREAD_ID);
^~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:41:19: note: suggested alternative: ‘pthread_key_t’
#define THREAD_ID pthread_self()
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:54:26: note: in expansion of macro ‘THREAD_ID’
return ((unsigned long)THREAD_ID);
^~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp: In function ‘int teemo::{anonymous}::THREAD_setup()’:
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:37:24: error: ‘pthread_mutex_init’ was not declared in this scope
#define MUTEX_SETUP(x) pthread_mutex_init(&(x), NULL)
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:63:5: note: in expansion of macro ‘MUTEX_SETUP’
MUTEX_SETUP(mutex_buf[i]);
^~~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:37:24: note: suggested alternative: ‘pthread_mutex_t’
#define MUTEX_SETUP(x) pthread_mutex_init(&(x), NULL)
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:63:5: note: in expansion of macro ‘MUTEX_SETUP’
MUTEX_SETUP(mutex_buf[i]);
^~~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp: In function ‘int teemo::{anonymous}::THREAD_cleanup()’:
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:38:26: error: ‘pthread_mutex_destroy’ was not declared in this scope
#define MUTEX_CLEANUP(x) pthread_mutex_destroy(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:76:5: note: in expansion of macro ‘MUTEX_CLEANUP’
MUTEX_CLEANUP(mutex_buf[i]);
^~~~~~~~~~~~~
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:38:26: note: suggested alternative: ‘pthread_mutexattr_t’
#define MUTEX_CLEANUP(x) pthread_mutex_destroy(&(x))
^
/data2/workspace/outer_third_party/teemo/src/curl_utils.cpp:76:5: note: in expansion of macro ‘MUTEX_CLEANUP’
MUTEX_CLEANUP(mutex_buf[i]);
^~~~~~~~~~~~~
make[2]: *** [src/CMakeFiles/teemo.dir/curl_utils.cpp.o] Error 1
make[1]: *** [src/CMakeFiles/teemo.dir/all] Error 2
make: *** [all] Error 2
I see that the source indeed does not include the thread.h header. Could you take a look? I am not sure whether I am using it incorrectly.
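"Was not declared in this scope" errors for pthread_mutex_lock, pthread_self and friends are what one would expect when <pthread.h> is not included before the macros are expanded; that is an inference from the compiler output, not something verified against the repository. A self-contained sketch with the same macro definitions as in the log (mutexRoundTrip is a hypothetical smoke test):

```cpp
#include <cstddef>    // NULL
#include <pthread.h>  // declares pthread_mutex_* and pthread_self

// Same macro definitions as quoted in the compiler output above.
#define MUTEX_SETUP(x) pthread_mutex_init(&(x), NULL)
#define MUTEX_CLEANUP(x) pthread_mutex_destroy(&(x))
#define MUTEX_LOCK(x) pthread_mutex_lock(&(x))
#define MUTEX_UNLOCK(x) pthread_mutex_unlock(&(x))

// Hypothetical smoke test: returns true if a mutex can be set up, locked,
// unlocked and destroyed without error.
bool mutexRoundTrip() {
  pthread_mutex_t m;
  if (MUTEX_SETUP(m) != 0) return false;
  const bool ok = MUTEX_LOCK(m) == 0 && MUTEX_UNLOCK(m) == 0;
  return MUTEX_CLEANUP(m) == 0 && ok;
}
```

Depending on the toolchain, linking may additionally require the -pthread flag.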
Could you add support for SHA512?
On Ubuntu 18.04.4, using the library fails with CREATE_TARGET_FILE_FAILED.
After switching to the root user, the same error is still reported.
Total: 93ms
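1. Downloads should support passing custom HTTP headers.
Some download links require a specific User-Agent, some check cookies, and some use the Referer header for hotlink protection,
so passing custom headers should be supported.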
EntryHandler::requestFileInfo
Slice::start
This library is always used through in-program API calls:
std::shared_future Teemo::start(
const utf8string& url,
const utf8string& target_file_path,
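No checks at all are performed here on the target_file_path that is passed in.
In short, the incoming target_file_path should be checked, normalized, and cleaned up, so that whatever the user passes, the download can proceed wherever possible instead of immediately returning a path error.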