The script seems to have stopped working? (about getcomic, CLOSED)

abcfy2 commented on September 18, 2024
The script seems to have stopped working?

from getcomic.

Comments (12)

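abcfy2 commented on September 18, 2024

Tencent has probably changed the way the Base64 data on the page is decoded.

Take any comic reader page, for example: http://ac.qq.com/ComicView/index/id/634670/cid/2

Its page source always contains something like this: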

<script>
    var DATA        = 'eyJjb21pYyI6ecyJpZCI6NjM0NjcwLCJ0aXRsZbcSI6Ilx1NGYwYVx1NzUzOFx1NjYxZlx1NTM5ZiBFREVOUyBaRVJPIiwcffiY29sbGVjdCI6IjMwMTUiLCJpc0phcGFuQ29taWMfiOmZhbHNlLCJpc0xpZ2h0Tm92ZWwiOmZhbHNlLCJpc0xpZ2h0Q29taWMiOmZhbHNlLCJpc0ZpbmlzaCI6ZmFsc2UsImlzUm9hc3RhYmxlIjp0cnVlLCJlSWQiOiJLbEJQVEVKQlZGUlZDUXNmQWdZQ0FROEpIRUl5In0sImNoYXB0ZXIiOnsiY2lkIjoyLCJjVGl0bGUiOiJcdTY1NmNcdThiZjdcdTY3MWZcdTVmODUiLCJjU2VxIjoiMSIsInZpcFN0YXR1cyI6MSwicHJldkNpZCI6MCwibmV4dENpZCI6NCwiYmxhbmtGaXJzdCI6MSwiY2FuUmVhZCI6dHJ1ZX0sInBpY3R1cmUiOlt7InBpZCI6IjcwMjgiLCJ3aWR0aCI6MTEwMCwiaGVpZ2h0Ijo1NDgsInVybCI6Imh0dHBzOlwvXC9tYW5odWEucXBpYy5jblwvbWFuaHVhX2RldGFpbFwvMFwvMjZfMTBfNTZfZTZjMDhhMzMxNGE4MTY4MThmOGI0NTM4OTY0ODAwZjVfNzAyOC5qcGdcLzAifV0sImFkcyI6eyJ0b3AiOiIiLCJsZWZ0IjpbXSwiYm90dG9tIjp7InRpdGxlIjoiXHU5MDFhXHU3MDc1XHU1OTgzXHU2NzA5XHU1OGYwXHU2ZjJiXHU3NTNiIiwicGljIjoiaHR0cHM6XC9cL21hbmh1YS5xcGljLmNuXC9vcGVyYXRpb25cLzBcLzA3XzEyXzM2X2UyNjY2ZGQ4NTFiMzY1M2NlMDAxMjRkMDk2ZjdlYjEyXzE1NDE1NjU0MDYxODguanBnXC8wIiwidXJsIjoiaHR0cHM6XC9cL3YucXEuY29tXC94XC9wYWdlXC94MDc4NjJrd2VsaS5odG1sIiwid2lkdGgiOiI2NTAiLCJoZWlnaHQiOiIxMTAifX0sImFydGlzdCI6eyJhdmF0YXIiOiJodHRwOlwvXC90aGlyZHFxLnFsb2dvLmNuXC9nP2I9c2RrJms9NjlpY1gwNzFZT0xRZ0R2RVJ1MmhMVHcmcz02NDAmdD0xNDgzMzY2MTE5IiwibmljayI6Ilx1OGJiMlx1OGMwOFx1NzkzZVx1NTMxN1x1NGVhYyIsInVpbkNyeXB0IjoiYUc5elZ6SXplV2RFY0hnMldVUXlVbkUyWm14WWR6MDkifX0=',
        PRELOAD_NUM = 2,
        NOTICE_TIME = 15,
        ROAST_SIZE  = 300,
        ROAST_PRE   = 5,
        ROAST_VIEW  = 11,
        DANMU_TIME  = 10000;
</script>

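The DATA variable actually holds JSON describing the chapter's images, but it cannot be decoded directly. Under the old scheme, removing the first character left a standard Base64 string. That appears to have changed: the string no longer decodes as Base64, so the next step is to set breakpoints in the page and find the actual decoding algorithm.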
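For reference, the old scheme described above (drop the first character, then Base64-decode the rest) could be sketched in Python roughly as follows. The function name `decode_data_legacy` is made up for illustration, and the re-padding step is an assumption added for robustness, not something the page is known to need. As noted, this no longer works against the current pages.

```python
import base64
import json


def decode_data_legacy(data: str) -> dict:
    """Decode DATA under the OLD scheme: skip one leading junk character."""
    payload = data[1:]
    # Assumption: re-pad in case the remaining length is not a multiple of 4.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.b64decode(payload).decode("utf-8"))
```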

ckz1211 commented on September 18, 2024

Is there a way to find the part of the page's JS functions that decrypts the Base64?

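abcfy2 commented on September 18, 2024

Last time, I set breakpoints on the JS in Chrome's developer tools and watched how the variables changed to find which function decoded DATA. After that, reading just that piece of code was enough to work out how DATA is decoded.

From a first look, the decoding should happen in this JS file; what remains is to find exactly which piece of code decodes DATA: http://ac.gtimg.com/media/js/ac.page.chapter.view_v2.4.0.js?v=20170622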

ckz1211 commented on September 18, 2024

So the breakpoint approach is quite limited? There is no way to find directly which function does the decryption?

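abcfy2 commented on September 18, 2024

It's not that hard; it just takes patience. Pretty-print the JS in the developer tools and set breakpoints function by function, and you will usually find it quickly.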

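abcfy2 commented on September 18, 2024

In fact, I have already found it:
[image]

As I expected, it is in that JS file. The _v variable holds the JSON decoded from DATA, so tracing upward from it should quickly lead to the actual decoding algorithm.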

abcfy2 commented on September 18, 2024

Got it. With breakpoints plus pretty-printing, I have found the algorithm. The JS code is roughly as follows:

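var B = new Base(),
    T = W['DATA'].split(''),             // obfuscated Base64 payload, as an array of characters
    N = W['nonce'],                      // the nonce string from the same page
    len,
    locate,
    str;
N = N.match(/\d+[a-zA-Z]+/g);            // tokens of the form <digits><letters>
len = N.length;
while (len--) {                          // walk the tokens from last to first
  locate = parseInt(N[len]) & 255;       // leading number, masked to 0..255, is the offset
  str = N[len].replace(/\d+/g, '');      // the remaining letters give the noise length
  T.splice(locate, str.length);          // drop that many characters at the offset
}
T = T.join('');
_v = JSON.parse(B.decode(T));            // Base64-decode, then parse the JSON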

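The key is the T and N variables. T's value comes from DATA in the page, and N comes from window.nonce in the page. Running them through the decryption routine above restores the correct Base64, so translating this JS code into Python is all that is needed.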
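A minimal Python sketch of the noise-stripping loop above, translated more or less line for line. The name `strip_noise` is hypothetical, and real DATA and nonce values have to be scraped from the page. This is only an illustration of the loop, not necessarily how getComic.py implements it.

```python
import re


def strip_noise(data: str, nonce: str) -> str:
    """Remove the junk characters that the nonce says were spliced into DATA."""
    chars = list(data)
    # Same pattern as the JS: one or more digits followed by one or more letters.
    tokens = re.findall(r"[0-9]+[a-zA-Z]+", nonce)
    # The JS loop uses while (len--), so tokens are handled from last to first.
    for token in reversed(tokens):
        # parseInt(token) reads the leading digits; & 255 keeps the low byte.
        locate = int(re.match(r"[0-9]+", token).group()) & 255
        # Stripping the digits leaves the letters; only their count matters.
        noise_len = len(re.sub(r"[0-9]+", "", token))
        # Equivalent of T.splice(locate, noise_len).
        del chars[locate:locate + noise_len]
    return "".join(chars)
```

The result is still a Base64 string, not JSON; it has to be decoded afterwards.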

And this B.decode function is rather sneaky: on the surface it just reverses Base64, but its entry point starts with a regex replacement:

function Base() {
  _keyStr = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";
  this.decode = function(c) {
    var a = "",
    b, d, h, f, g, e = 0;
    for (c = c.replace(/[^A-Za-z0-9\+\/\=]/g, ""); e < c.length;) b = _keyStr.indexOf(c.charAt(e++)),
    d = _keyStr.indexOf(c.charAt(e++)),
    f = _keyStr.indexOf(c.charAt(e++)),
    g = _keyStr.indexOf(c.charAt(e++)),
    b = b << 2 | d >> 4,
    d = (d & 15) << 4 | f >> 2,
    h = (f & 3) << 6 | g,
    a += String.fromCharCode(b),
    64 != f && (a += String.fromCharCode(d)),
    64 != g && (a += String.fromCharCode(h));
    return a = _utf8_decode(a)
  };
  _utf8_decode = function(c) {
    for (var a = "",
    b = 0,
    d = c1 = c2 = 0; b < c.length;) d = c.charCodeAt(b),
    128 > d ? (a += String.fromCharCode(d), b++) : 191 < d && 224 > d ? (c2 = c.charCodeAt(b + 1), a += String.fromCharCode((d & 31) << 6 | c2 & 63), b += 2) : (c2 = c.charCodeAt(b + 1), c3 = c.charCodeAt(b + 2), a += String.fromCharCode((d & 15) << 12 | (c2 & 63) << 6 | c3 & 63), b += 3);
    return a
  }
}

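Apart from the initial c = c.replace(/[^A-Za-z0-9\+\/\=]/g, ""), the rest is an ordinary Base64 decode, so a standard Base64 decoder can do the job. However, after T has been processed it must still go through that regex replacement to strip the illegal characters before it decodes correctly.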
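In Python, that last step might look roughly like this: apply the same character filter as B.decode, then hand the result to the standard library decoder. `decode_cleaned` is a hypothetical name, and the re-padding line is an assumption for strings whose length is not a multiple of 4.

```python
import base64
import json
import re


def decode_cleaned(base64_str: str) -> dict:
    """Filter out non-Base64 characters, then decode to JSON."""
    # Same filter as the JS: keep only A-Z, a-z, 0-9, '+', '/' and '='.
    cleaned = re.sub(r"[^A-Za-z0-9+/=]", "", base64_str)
    # Assumption: re-pad if filtering left a length that is not a multiple of 4.
    cleaned += "=" * (-len(cleaned) % 4)
    # The JS decoder finishes with a UTF-8 decode, so do the same here.
    return json.loads(base64.b64decode(cleaned).decode("utf-8"))
```

Python's `base64.b64decode` already discards non-alphabet characters when `validate` is left at its default of False, so the explicit filter is mainly there to mirror the JS and make the intent visible.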

I will fix this tomorrow.

abcfy2 commented on September 18, 2024

Fixed. Thanks for the report.

ckz1211 commented on September 18, 2024

It has broken again... http://ac.qq.com/Comic/comicInfo/id/634670

abcfy2 commented on September 18, 2024

It looks like the http://ac.qq.com/Comic/comicInfo/id/{} endpoint has been retired. When I have time, I will look into how the page now gets the comic's chapter list.

nhacvina commented on September 18, 2024

getComic.py -u http://ac.qq.com/Comic/ComicInfo/id/634393

Downloading chapter 0001: Preview
Download failed, retry 1
Download failed, retry 2
Download failed, retry 3
Download failed, retry 4
Traceback (most recent call last):
  File "./getComic.py", line 338, in <module>
    main(url, path, lst, one_folder)
  File "./getComic.py", line 294, in main
    imgList = getImgList(contentList[i - 1]['url'])
  File "./getComic.py", line 105, in getImgList
    img_detail_json = __decode_data(data, nonce)
  File "./getComic.py", line 166, in __decode_data
    json_str = base64.b64decode(base64_str).decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8c in position 7: invalid start byte

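abcfy2 commented on September 18, 2024

Known issue. I don't have time right now to dig into the page changes. It feels like Tencent is watching this project: whenever I change something, they change theirs right away. PRs to fix this are welcome.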
