loadchange / amemv-crawler

🙌 Easily download all the videos from TikTok (amemv) — download all videos of a specified Douyin (抖音) account; a Douyin crawler.

License: MIT License

Python 70.39% JavaScript 28.07% Dockerfile 1.54%

amemv-crawler's People

Contributors: loadchange

amemv-crawler's Issues

There seems to be a fairly high chance of downloading "the video is gone" (视频不见啦) placeholder clips

I'm not sure since which version, but the chance of downloading an empty ~121 KB "the video is gone" placeholder video has gone up: out of a few hundred videos, ten or twenty end up like this. They have to be found one by one, deleted, and re-downloaded, and sometimes the re-download is another placeholder, so it takes repeated attempts. (A sketch for locating such files follows below.)

(screenshot: 2018-07-20 11:39:05)
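A minimal sketch for locating likely placeholder downloads by file size, assuming the clips were saved as .mp4 files under a download directory and that the placeholders all sit near the ~121 KB mentioned above; the directory name and size window are illustrative, not values taken from the script:

```python
import os

# Hypothetical download directory; adjust to wherever the ripper saved the videos.
DOWNLOAD_DIR = "download"
# The report above mentions ~121 KB placeholders; allow some slack around that size.
PLACEHOLDER_MIN, PLACEHOLDER_MAX = 110 * 1024, 130 * 1024

def find_placeholder_videos(root=DOWNLOAD_DIR):
    """Return paths of .mp4 files whose size falls inside the placeholder range."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".mp4"):
                continue
            path = os.path.join(dirpath, name)
            if PLACEHOLDER_MIN <= os.path.getsize(path) <= PLACEHOLDER_MAX:
                suspects.append(path)
    return suspects

if __name__ == "__main__":
    for path in find_placeholder_videos():
        print("possible 'video is gone' placeholder:", path)
```

The listed files can then be deleted and the affected accounts re-crawled; since a re-download sometimes yields another placeholder, re-running the check after each pass is a reasonable workflow.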

The algorithm was updated on the evening of July 2nd — how can it be handled now?

The algorithm was updated on the evening of July 2nd; how do we work around it? The relevant obfuscated module is:
/!douyin_falcon:node_modules/byted-acrawler/dist/runtime.js/
__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime",function(l,e){Function(function(l){return'�e(e,a,r){�(b[e]||(b[e]=t("x,y","�x "+e+" y"�)(r,a)}�a(e,a,r){�(k[r]||(k[r]=t("x,y","�new xy"�)(e,a)}�r(e,a,r){�n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t�)s[n="$"+t]=r[n];for(t=0,b=s�=a�;t<b;t�)s[t]=a[t];�c(e,0,s)}�c(t,b,k){�u(e){v[x�]=e}�f�{�g=�,t�ing(b�g)}�l�{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(�h,y,d,g,v=[],x=0;;)switch(g=�){case 1:u(!�)�4:�f��5:u(�(e){�a=0,r=e�;���{�c=a<r;�c&&u(e[a�]),c}}(���6:y=�,u(�(y��8:if(g=�,l��g,g=�,y===c)b+=g;else if(y!==l)�y�9:�c�10:u(s(���11:y=�,u(�+y)�12:for(y=f�,d=[],g=0;g<y�;g�)d[g]=y.charCodeAt(g)^g+y�;u(String.fromCharCode.apply(null,d��13:y=�,h=delete �[y]�14:���59:u((g=�)?(y=x,v.slice(x-=g,y�:[])�61:u(�[�])�62:g=�,k[0]=65599k[0]+k[1].charCodeAt(g)>>>0�65:h=�,y=�,�[y]=h�66:u(e(t[b�],�,���67:y=�,d=�,u((g=�).x===c?r(g.y,y,k):g.apply(d,y��68:u(e((g=t[b�])<"<"?(b--,f�):g+g,�,���70:u(!1)�71:�n�72:�+f��73:u(parseInt(f�,36��75:if(�){b��case 74:g=�<<16>>16�g�76:u(k[�])�77:y=�,u(�[y])�78:g=�,u(a(v,x-=g+1,g��79:g=�,u(k["$"+g])�81:h=�,�[f�]=h�82:u(�[f�])�83:h=�,k[�]=h�84:�!0�85:�void 0�86:u(v[x-1])�88:h=�,y=�,�h,�y�89:u(��{�e�{�r(e.y,arguments,k)}�e.y=f�,e.x=c,e}�)�90:�null�91:�h�93:h=��0:��;default:u((g<<16>>16)-16)}}�n=this,t=n.Function,s=Object.keys||�(e){�a={},r=0;for(�c in e)a[r�]=c;�a�=r,a},b={},k={};�r'.replace(/[�-�]/g,function(e){return l[15&e.charCodeAt(0)]})}("v[x++]=�v[--x]�t.charCodeAt(b++)-32�function �return �))�++�.substr�var �.length�()�,b+=�;break;case �;break}".split("�")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&eff�kx[!cs"l".Pq%widthl"@q&heightl"vrgetContextx$"2d[!cs#l#,;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2qshadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$j�l s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$bb^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"j�l s&l&z0l!$ +["cs'(0l#i'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0syWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!�+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l &+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl'g,)gk}ejo{�cm,)|ynLijem["cl$b%@d<l&zl'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ',[Object.defineProperty(e,"__esModule",{value:!0})])});

_signature is unstable

After the algorithm update on July 2nd, the _signature obtained with the new fuck-byted-acrawler.js sometimes works and sometimes fails. This time I have confirmed that the headers I use are identical to the headers in amemv-video-ripper.py. What could be the cause? Thanks~
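One pragmatic workaround, assuming a rejected signature simply shows up as an empty result list in the JSON response (as reported in several issues here), is to regenerate the signature and retry a few times. A sketch; the parameter set mirrors the URLs quoted in these issues, and the field name aweme_list is an assumption about the response shape:

```python
import subprocess
import requests

def generate_signature(user_id):
    """Call fuck-byted-acrawler.js with node and return the printed signature."""
    out = subprocess.check_output(["node", "fuck-byted-acrawler.js", str(user_id)])
    return out.decode().strip()

def fetch_posts(user_id, headers, max_attempts=5):
    """Regenerate the signature and retry while the API keeps returning an empty list."""
    url = "https://www.douyin.com/aweme/v1/aweme/post/"
    data = {}
    for _ in range(max_attempts):
        params = {"user_id": user_id, "count": 21, "max_cursor": 0, "aid": 1128,
                  "_signature": generate_signature(user_id)}
        data = requests.get(url, params=params, headers=headers).json()
        if data.get("aweme_list"):   # an empty list usually means the signature was rejected
            break
    return data
```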

Crawling via the user share template gets rate-limited by Douyin

When crawling via the user template (share/user), after more than about 20 consecutive requests the server starts returning an empty string, and you have to wait for a while (for example an hour) before crawling again.
The music template, however, is not rate-limited.
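A minimal throttling sketch for the symptom described above (an empty response body after too many consecutive share/user requests). The initial delay, the doubling, and the one-hour cap are guesses based on this report, not documented limits:

```python
import time
import requests

def fetch_user_page(url, headers, max_backoff=3600):
    """GET a share/user page, sleeping and retrying whenever the body comes back empty."""
    delay = 60  # start with a one-minute pause; the report suggests up to ~1 hour may be needed
    while True:
        resp = requests.get(url, headers=headers)
        if resp.text.strip():
            return resp.text
        print("empty response, probably rate-limited; sleeping %d s" % delay)
        time.sleep(delay)
        delay = min(delay * 2, max_backoff)
```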

Error when running

I can't read JS — how should I fix this? Thanks.
Environment: Windows 7, Python 3.7

E:\Code\Python3\amemv-crawler-master\fuck-byted-acrawler.js:5
let e = {}
^
SyntaxError: Unexpected identifier
at exports.runInThisContext (vm.js:73:16)
at Module._compile (module.js:443:25)
at Object.Module._extensions..js (module.js:478:10)
at Module.load (module.js:355:32)
at Function.Module._load (module.js:310:12)
at Function.Module.runMain (module.js:501:10)
at startup (node.js:129:16)
at node.js:814:3
Traceback (most recent call last):
File ".\amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File ".\amemv-video-ripper.py", line 128, in init
self.scheduling()
File ".\amemv-video-ripper.py", line 147, in scheduling
self.download_videos(params)
File ".\amemv-video-ripper.py", line 159, in download_videos
video_count = self._download_user_media(number, dytk)
File ".\amemv-video-ripper.py", line 201, in _download_user_media
signature = self.generateSignature(str(user_id))
File ".\amemv-video-ripper.py", line 138, in generateSignature
return p.readlines()[0]
IndexError: list index out of range
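The SyntaxError: Unexpected identifier on let e = {} indicates that fuck-byted-acrawler.js is being executed by a Node.js build too old for ES2015 syntax, and the resulting empty output is what then triggers the IndexError inside generateSignature. A defensive sketch that checks the Node version and fails with a clear message; the version threshold (6) is a rough assumption about when V8 accepted let outside strict mode, and the wrapper is illustrative rather than the repository's exact code:

```python
import subprocess

def generate_signature(value):
    """Run fuck-byted-acrawler.js via node, failing loudly instead of raising IndexError."""
    version = subprocess.check_output(["node", "--version"]).decode().strip()
    major = int(version.lstrip("v").split(".")[0])
    if major < 6:
        raise RuntimeError("Node.js %s is too old for the ES2015 syntax (let) used in "
                           "fuck-byted-acrawler.js; please upgrade Node.js" % version)
    out = subprocess.check_output(["node", "fuck-byted-acrawler.js", str(value)])
    lines = out.decode().strip().splitlines()
    if not lines:
        raise RuntimeError("fuck-byted-acrawler.js produced no output")
    return lines[0]
```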

After running the script it crawls a few videos and then stops; after a while it errors out. What should I do?

Downloading v0200f9a0000bc1ahsmbn5vdr6i8f2h0.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f9a0000bc1ahsmbn5vdr6i8f2h0&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading ba5798aee12b4bff801598d4abf5cb99.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=ba5798aee12b4bff801598d4abf5cb99&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading 4b784f8aaef2407f935a40e18a6a8811.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=4b784f8aaef2407f935a40e18a6a8811&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.
Downloading f9d5c6cde2fa418f883fc65793c8280d.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=f9d5c6cde2fa418f883fc65793c8280d&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading f88b386af271417987be0b6df7a6065e.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=f88b386af271417987be0b6df7a6065e&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading v0200fbd0000bcjo8tcthbi90j14pcag.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200fbd0000bcjo8tcthbi90j14pcag&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Downloading v0200f9a0000bck76427u0r58fotk7ug.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200f9a0000bck76427u0r58fotk7ug&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.
Downloading v0200fbd0000bc6kaaelg9jt2h43h2ng.mp4 from https://aweme.snssdk.com/aweme/v1/play/?video_id=v0200fbd0000bc6kaaelg9jt2h43h2ng&line=0&ratio=720p&media_type=4&vr_type=0&test_cdn=None&improve_bitrate=0.

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 196, in get_subj_alt_name
ext = cert.extensions.get_extension_for_class(
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\utils.py", line 159, in inner
result = func(instance)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\x509.py", line 138, in extensions
self._backend, self._x509
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 238, in parse
value = handler(backend, ext_data)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 417, in _decode_subject_alt_name
_decode_general_names_extension(backend, ext)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1210, in init
self._general_names = GeneralNames(general_names)
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1163, in init
if not all(isinstance(x, GeneralName) for x in general_names):
File "C:\ProgramData\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1163, in
if not all(isinstance(x, GeneralName) for x in general_names):
File "C:\ProgramData\Anaconda3\lib\abc.py", line 182, in instancecheck
if subclass in cls._abc_cache:
File "C:\ProgramData\Anaconda3\lib_weakrefset.py", line 75, in contains
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 379, in
CrawlerScheduler(content)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 119, in init
self.scheduling()
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 131, in scheduling
self.download_challenge_videos(challenge)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 143, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 280, in _download_challenge_media
video_count = get_aweme_list()
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 276, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
[Previous line repeated 947 more times]
File "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py", line 269, in get_aweme_list
res = requests.get(url, headers=self.headers)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\connection.py", line 337, in connect
cert = self.sock.getpeercert()
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 348, in getpeercert
'subjectAltName': get_subj_alt_name(x509)
File "C:\ProgramData\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 202, in get_subj_alt_name
except (x509.DuplicateExtension, x509.UnsupportedExtension,
AttributeError: module 'cryptography.x509' has no attribute 'UnsupportedExtension'
[Finished in 331.1s with exit code 1]
[shell_cmd: python -u "E:\amemv-crawler-master\amemv-crawler-master\amemv-video-ripper.py"]
[dir: E:\amemv-crawler-master\amemv-crawler-master]
[path: C:\Program Files\curl-7.60.0\I386;C:\ffmpeg\bin;C:\ProgramData\Anaconda3;C:\ProgramData\Anaconda3\Scripts;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\bin]
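The root cause here is that get_aweme_list recurses once per page (about 950 frames in this run), so the default recursion limit is exhausted deep inside the SSL stack; the AttributeError about cryptography.x509.UnsupportedExtension is a secondary effect of an older urllib3 referencing an attribute that newer cryptography versions removed, and it only surfaces while the first error is being handled. A hedged sketch of an iterative replacement for the pagination; the has_more field and the helper's shape are assumptions, only cursor and aweme_list are suggested by the script itself:

```python
import requests

def get_aweme_list(url, headers, base_params):
    """Iterate over result pages with a cursor instead of recursing once per page."""
    videos = []
    cursor = 0
    while True:
        params = dict(base_params, cursor=cursor)
        content_json = requests.get(url, headers=headers, params=params).json()
        videos.extend(content_json.get("aweme_list", []))
        if not content_json.get("has_more"):   # assumed pagination flag
            break
        cursor = content_json.get("cursor", 0)
    return videos
```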

Works great! 👍👍👍 Could a flag be added to skip downloading a user's favorites (Favorite)?

I downloaded and set this up as soon as I saw the repo, and it works very well: no watermark and fast downloads. Thanks for the hard work!
One issue, though: a user may have only a few works of their own but a huge number of liked videos (like me), so the crawler spends most of its time downloading the user's favorites, while often only the author's own works are wanted.
Could a switch be added to skip downloading favorites, for example something like: python amemv-video-ripper.py -nofavorite (a sketch of such a flag follows below).
Thanks again for your effort; please accept our kneeling admiration ^_^
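A minimal sketch of how such a switch could be wired in with argparse; the flag name --no-favorite and the two download helpers are placeholders for illustration, not functions that exist in the current script:

```python
import argparse

def download_posts(account):
    """Placeholder for downloading the user's own works."""
    print("downloading posts of", account)

def download_favorites(account):
    """Placeholder for downloading the user's liked videos."""
    print("downloading favorites of", account)

parser = argparse.ArgumentParser(description="amemv-video-ripper options (sketch)")
parser.add_argument("accounts", nargs="+", help="Douyin account ids or share URLs")
parser.add_argument("--no-favorite", action="store_true",
                    help="download only the user's own posts, skip their favorites")
args = parser.parse_args()

for account in args.accounts:
    download_posts(account)
    if not args.no_favorite:
        download_favorites(account)
```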

_signature error

Hi, I also ran into the _signature field while crawling user info, but the signature obtained from fuck-byted-acrawler.js doesn't work. What might be going wrong?
User URL: https://www.douyin.com/share/user/2613650662
Data URL: https://www.douyin.com/aweme/v1/aweme/post/?user_id=2613650662&count=21&max_cursor=0&aid=1128&_signature=vsISThAa5eS8bwWIdK-OVL7CEl
JS that generates _signature: the /!douyin_falcon:node_modules/byted-acrawler/dist/runtime.js/ section of https://s3a.bytecdn.cn/ies/resource/falcon/douyin_falcon/pkg/third_ee38eac.js
Thanks!~

_signature success-rate problem

Running under Node with the author's _signature JS, there is a high probability that the list returned by the API is empty.

Out of 1k+ videos, the download errors out after 600+; repeated attempts give the same result

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 356, in connect
ssl_context=context)
File "/usr/lib/python3.6/site-packages/urllib3/util/ssl
.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 814, in init
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 445, in send
timeout=timeout
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 367, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 849, in validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 356, in connect
ssl_context=context)
File "/usr/lib/python3.6/site-packages/urllib3/util/ssl
.py", line 359, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/usr/lib/python3.6/ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "/usr/lib/python3.6/ssl.py", line 814, in init
self.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "/usr/lib/python3.6/ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File "amemv-video-ripper.py", line 112, in init
res = requests.get(url, headers=self.headers)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 495, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
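Connection reset by peer after several hundred downloads usually means the server is dropping connections under sustained load. A hedged mitigation sketch that mounts urllib3's Retry on a requests session and pauses between downloads; the retry counts, backoff factor and sleep interval are guesses rather than known limits:

```python
import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session():
    """Build a session that retries transient connection errors with exponential backoff."""
    retry = Retry(total=5, backoff_factor=2,
                  status_forcelist=(500, 502, 503, 504))
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session

session = make_session()
# Hypothetical usage: pause between video downloads to reduce pressure on the server.
# for url in video_urls:
#     resp = session.get(url, headers=headers, timeout=30)
#     ...save resp.content...
#     time.sleep(1)
```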

Is the JS obfuscation in fuck-byted-acrawler.js your own doing? When I jump into it I see garbled characters — how did you decode it?

Is the JS obfuscation in fuck-byted-acrawler.js your own doing? When I jump into it I see garbled characters…
__M.define("douyin_falcon:node_modules/byted-acrawler/dist/runtime", function(l, e) {
Function(function(l) {
return '�e(e,a,r){�(b[e]||(b[e]=t("x,y","�x "+e+" y"�)(r,a)}�a(e,a,r){�(k[r]||(k[r]=t("x,y","�new xy"�)(e,a)}�r(e,a,r){�n,t,s={},b=s.d=r?r.d+1:0;for(s["$"+b]=s,t=0;t<b;t�)s[n="$"+t]=r[n];for(t=0,b=s�=a�;t<b;t�)s[t]=a[t];�c(e,0,s)}�c(t,b,k){�u(e){v[x�]=e}�f�{�g=�,t�ing(b�g)}�l�{try{y=c(t,b,k)}catch(e){h=e,y=l}}for(�h,y,d,g,v=[],x=0;;)switch(g=�){case 1:u(!�)�4:�f��5:u(�(e){�a=0,r=e�;���{�c=a<r;�c&&u(e[a�]),c}}(���6:y=�,u(�(y��8:if(g=�,l��g,g=�,y===c)b+=g;else if(y!==l)�y�9:�c�10:u(s(���11:y=�,u(�+y)�12:for(y=f�,d=[],g=0;g<y�;g�)d[g]=y.charCodeAt(g)^g+y�;u(String.fromCharCode.apply(null,d��13:y=�,h=delete �[y]�14:���59:u((g=�)?(y=x,v.slice(x-=g,y�:[])�61:u(�[�])�62:g=�,k[0]=65599k[0]+k[1].charCodeAt(g)>>>0�65:h=�,y=�,�[y]=h�66:u(e(t[b�],�,���67:y=�,d=�,u((g=�).x===c?r(g.y,y,k):g.apply(d,y��68:u(e((g=t[b�])<"<"?(b--,f�):g+g,�,���70:u(!1)�71:�n�72:�+f��73:u(parseInt(f�,36��75:if(�){b��case 74:g=�<<16>>16�g�76:u(k[�])�77:y=�,u(�[y])�78:g=�,u(a(v,x-=g+1,g��79:g=�,u(k["$"+g])�81:h=�,�[f�]=h�82:u(�[f�])�83:h=�,k[�]=h�84:�!0�85:�void 0�86:u(v[x-1])�88:h=�,y=�,�h,�y�89:u(��{�e�{�r(e.y,arguments,k)}�e.y=f�,e.x=c,e}�)�90:�null�91:�h�93:h=��0:��;default:u((g<<16>>16)-16)}}�n=this,t=n.Function,s=Object.keys||�(e){�a={},r=0;for(�c in e)a[r�]=c;�a�=r,a},b={},k={};�r'.replace(/[�-�]/g, function(e) {
return l[15 & e.charCodeAt(0)]
})
}("v[x++]=�v[--x]�t.charCodeAt(b++)-32�function �return �))�++�.substr�var �.length�()�,b+=�;break;case �;break}".split("�")))()('gr$Daten Иb/s!l y͒yĹg,(lfi~ah`{mv,-n|jqewVxp{rvmmx,&eff�kx[!cs"l".Pq%widthl"@q&heightl"vr
getContextx$"2d[!cs#l#,;?|u.|uc{uq$fontl#vr(fillTextx$$龘ฑภ경2<[#c}l#2qshadowBlurl#1q-shadowOffsetXl#$$limeq+shadowColorl#vr#arcx88802[%c}l#vr&strokex[ c}l"v,)}eOmyoZB]mx[ cs!0s$l$Pb<k7l l!r&lengthb%^l$1+s$j�l s#i$1ek1s$gr#tack4)zgr#tac$! +0o![#cj?o ]!l$b%s"o ]!l"l$bb^0d#>>>s!0s%yA0s"l"l!r&lengthb<k+l"^l"1+s"j�l s&l&z0l!$ +["cs'(0l#i'1ps9wxb&s() &{s)/s(gr&Stringr,fromCharCodes)0syWl ._b&s o!])l l Jb<k$.aj;l .Tb<k$.gj/l .^b<k&i"-4j!�+& s+yPo!]+s!l!l Hd>&l!l Bd>&+l!l &+l!l 6d>&+l!l &+ s,y=o!o!]/q"13o!l q"10o!],l 2d>& s.{s-yMo!o!]0q"13o!]*Ld<l 4d#>>>b|s!o!l q"10o!],l!& s/yIo!o!].q"13o!],o!]*Jd<l 6d#>>>b|&o!]+l &+ s0l-l!&l-l!i'1z141z4b/@d<l"b|&+l-l(l!b^&+l-l&zl'g,)gk}ejo{�cm,)|ynLijem["cl$b%@d<l&zl'l $ +["cl$b%b|&+l-l%8d<@b|l!b^&+ q$sign ', [Object.defineProperty(e, "__esModule", {
value: !0
})])
});

Problem downloading from a challenge share link

It can download roughly a dozen videos, but then it errors out. It looks like the maximum recursion depth is exceeded, but I don't know the deeper cause.
Traceback (most recent call last):
File "amemv-video-ripper.py", line 412, in
CrawlerScheduler(content)
File "amemv-video-ripper.py", line 128, in init
self.scheduling()
File "amemv-video-ripper.py", line 150, in scheduling
self.download_challenge_videos(challenge)
File "amemv-video-ripper.py", line 165, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "amemv-video-ripper.py", line 309, in _download_challenge_media
video_count = get_aweme_list()
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
File "amemv-video-ripper.py", line 305, in get_aweme_list
return get_aweme_list(contentJson.get('cursor'), video_count)
[Previous line repeated 947 more times]
File "amemv-video-ripper.py", line 298, in get_aweme_list
res = requests.get(url, headers=self.headers)
File "D:\Anaconda3\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "D:\Anaconda3\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "D:\Anaconda3\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "D:\Anaconda3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "D:\Anaconda3\lib\site-packages\urllib3\connection.py", line 337, in connect
cert = self.sock.getpeercert()
File "D:\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 348, in getpeercert
'subjectAltName': get_subj_alt_name(x509)
File "D:\Anaconda3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 196, in get_subj_alt_name
ext = cert.extensions.get_extension_for_class(
File "D:\Anaconda3\lib\site-packages\cryptography\utils.py", line 158, in inner
result = func(instance)
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\x509.py", line 137, in extensions
self._backend, self._x509
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 249, in parse
value = handler(backend, ext_data)
File "D:\Anaconda3\lib\site-packages\cryptography\hazmat\backends\openssl\decode_asn1.py", line 428, in _decode_subject_alt_name
_decode_general_names_extension(backend, ext)
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 1008, in init
self._general_names = GeneralNames(general_names)
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 964, in init
if not all(isinstance(x, GeneralName) for x in general_names):
File "D:\Anaconda3\lib\site-packages\cryptography\x509\extensions.py", line 964, in
if not all(isinstance(x, GeneralName) for x in general_names):
File "D:\Anaconda3\lib\abc.py", line 182, in instancecheck
if subclass in cls._abc_cache:
File "D:\Anaconda3\lib_weakrefset.py", line 75, in contains
return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison

Can video attribute information be crawled?

Is it possible to also crawl the video's attribute information, such as publish time, video width and height, like count, comment count and play count? (See the sketch below.)
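The JSON the ripper already fetches carries this metadata next to the download URL. A sketch of pulling a few attributes out of one aweme entry; the field names (desc, create_time, statistics.digg_count, video.width, and so on) are assumptions based on commonly observed Douyin responses and may differ across API versions:

```python
import datetime

def extract_metadata(aweme):
    """Pick a few attribute fields out of a single aweme JSON object (field names assumed)."""
    stats = aweme.get("statistics", {})
    video = aweme.get("video", {})
    return {
        "description":  aweme.get("desc"),
        "published_at": datetime.datetime.fromtimestamp(aweme.get("create_time", 0)),
        "width":        video.get("width"),
        "height":       video.get("height"),
        "likes":        stats.get("digg_count"),
        "comments":     stats.get("comment_count"),
        "plays":        stats.get("play_count"),
    }

# Hypothetical usage inside the download loop:
# for aweme in content_json.get("aweme_list", []):
#     print(extract_metadata(aweme))
```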

spider based on douyin id

Can the ID shown by the app no longer be used for crawling? It feels like the crawler now has to dynamically generate a very short-lived ID from a share link and use that to pull data from the API.

Changing the cookie helped the first time; after downloading N videos, a new error now appears.

requests.exceptions.SSLError: HTTPSConnectionPool(host='api.amemv.com', port=443): Max retries exceeded with url: /aweme/v1/challenge/search/?ac=WIFI&app_name=aweme&vid=2ED370A7-F09C-4C9E-90F5-872D57F3127C&as=a1c5600cb7576a7e273418&device_type=iPhone8,2&os_api=18&build_number=17805&version_code=1.7.8&ts=1524105474&app_version=1.7.8&channel=App%20Store&device_platform=iphone&mas=008c37d4eaf9b158c3d1b7e3fc0d66008dc45306aae0ff5380d6a8&screen_width=1242&search_source=challenge&iid=28175672430&idfa=00000000-0000-0000-0000-000000000000&openudid=20dae85eeac1da35a69e2a0ffeaeef41c78a2e97&device_id=46166717995&count=20&keyword=%E7%BE%8E%E9%A3%9F%E7%BE%8E%E9%A3%9F%E7%BE%8E%E9%A3%9F&cursor=0&aid=1128 (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:661)'),))
I also tried one of my colleague's cookies — same result.
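CERTIFICATE_VERIFY_FAILED is a local TLS trust problem rather than a cookie problem, so swapping cookies will not help. A small diagnostic sketch that reports which CA bundle and OpenSSL build are in use and retries the failing host with certifi's bundle pinned explicitly; upgrading certifi/requests (or the Python build itself) is the usual fix, and this snippet is only a hedged starting point:

```python
import ssl
import certifi
import requests

# Quick diagnostic: which CA bundle and OpenSSL version are actually being used?
print("certifi bundle:", certifi.where())
print("OpenSSL:", ssl.OPENSSL_VERSION)

# Retry the failing host with certifi's bundle pinned explicitly.
resp = requests.get("https://api.amemv.com/", verify=certifi.where(), timeout=30)
print("status:", resp.status_code)
```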

_signature generation analysis

"_signature — this field is generated by douyin_falcon:node_modules/byted-acrawler/dist/runtime": how did you work that out? Just curious.

The generated signature doesn't work

I downloaded your fuck-byted-acrawler.js locally and generated a signature with the command below:

node fuck-byted-acrawler.js 58585956426

Then I substituted it into the URL:

https://www.douyin.com/aweme/v1/aweme/post/?user_id=58585956426&count=21&max_cursor=0&aid=1128&_signature=8Z4HuwAAq4AOYfhE08pWY.GeB6

The result returned is an empty array.

This error occurs when running with PyCharm

'node' is not recognized as an internal or external command, operable program or batch file.
Traceback (most recent call last):
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 411, in
CrawlerScheduler(content)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 128, in init
self.scheduling()
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 150, in scheduling
self.download_challenge_videos(challenge)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 165, in download_challenge_videos
video_count = self._download_challenge_media(challenge)
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 276, in _download_challenge_media
signature = self.generateSignature(str(challenge_id))
File "C:/Users/500/Desktop/github/amemv-crawler-master/amemv-crawler-master/amemv-video-ripper.py", line 138, in generateSignature
return p.readlines()[0]
IndexError: list index out of range
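The error above is the Windows shell reporting that node is not a recognized command, i.e. Node.js is missing from the PATH that PyCharm launches the script with, so the subprocess prints nothing and p.readlines()[0] raises IndexError. A small pre-flight check (the message text is illustrative):

```python
import shutil
import sys

def require_node():
    """Abort early with a clear message instead of failing later inside generateSignature."""
    node = shutil.which("node")
    if node is None:
        sys.exit("Node.js was not found on PATH. Install it from https://nodejs.org/ "
                 "and restart PyCharm so the updated PATH is picked up.")
    return node

if __name__ == "__main__":
    print("using node at:", require_node())
```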
