bfabiszewski / libmobi


419.0 419.0 70.0 47.19 MB

C library for handling Kindle (MOBI) formats of ebook documents

License: GNU Lesser General Public License v3.0

Shell 0.54% C 96.15% Makefile 0.46% M4 1.31% Roff 0.79% CMake 0.74%

c ebook kindle library

libmobi's People


b-rich torinkwok maciejgad linuxsong etopian gabb99 a3linux hunaocode clee aksdfauytv kerasking gregko htaohongtao dean-wong zhangfh yinyue200 godblessu gujiaxi huxiaomao sonywork whiskey0201 gale320 haoyi guoyu07 morbatex yongweihu liudeng lnhieuvn knifer anxiaoweidui badibadiola faceweb simon987 ncrusher74 qq641472246 crackercat ababook codetheweb kaich houlin occia xuxu5112 xiaogdgenuine dungpt0393 qanu-survey 2h1p1n9x quickdict mbrukman nchungdev dreshetnyak mahaloz zinwalin zha0 quinnan-gill ofanweio gerhobbelt sourcecafe fingerart andrewiethoff anezih joeyschmoe cutelicense baowuwolf eric-wish tianziyao lixiaoyu0123 nolan57 scantist-ossops-m2

libmobi's Issues

Another out-of-bounds read vulnerability caused by an incomplete check inside `mobi_decompress_huffman_internal`

A detailed bug report is available here; developers can access it by logging in.

Bug: Integer overflow parsing record offsets

There is an error when parsing record offsets in mobi_load_rec. If the next record's offset is lower than the previous one, the subtraction produces a negative size that wraps around the unsigned integer, so the malloc in mobi_load_recdata can request an enormous allocation.

        if (curr->next != NULL) {
            next = curr->next;
            size = next->offset - curr->offset; // <- integer overflow here
        } else {
           ....stripped
        }

        curr->size = size;
        ret = mobi_load_recdata(curr, file); // -> malloc(curr->size); -> enormous malloc
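A minimal sketch of the missing check, assuming a simplified record list: the `Record` type and `compute_record_size` are hypothetical stand-ins, not libmobi's actual code. The offsets are validated before subtracting, so a next offset lower than the current one is rejected instead of wrapping around to a huge size. For the last record it assumes the record extends to the end of the file, since that branch is stripped in the snippet above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record list (not libmobi's real type). */
typedef struct Record {
    uint32_t offset;      /* start of the record's data within the file */
    size_t size;          /* computed length of the record's data */
    struct Record *next;  /* following record, or NULL for the last one */
} Record;

/* Sets curr->size from the next record's offset (or, by assumption, the
   end of the file for the last record). Returns 0 on success and -1 if
   the offsets go backwards or point past the end of the file, which is
   exactly the case where "next->offset - curr->offset" would wrap around. */
int compute_record_size(Record *curr, size_t file_size) {
    size_t end = (curr->next != NULL) ? (size_t)curr->next->offset : file_size;
    if (curr->offset > end || end > file_size) {
        return -1;  /* malformed offsets: refuse instead of allocating */
    }
    curr->size = end - curr->offset;
    return 0;
}
```

Only after this returns 0 would it be safe to pass `curr->size` to an allocation such as the one in mobi_load_recdata.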

Here is a sample that shows this behaviour:
sample.zip

Can't convert PDB files to EPUB

Please take a look: I can't convert this PDB file to EPUB. The error is:
"Error while loading document (Unsupported document format)"
PDB.zip

I am confused about a function.

I am puzzled by _buffer_get_varlen. Why does it read only 7 bits per byte? I am also confused by the condition "Stops when byte has bit 7 set". Shouldn't it simply read 8 bits at a time?
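The short answer is that bit 7 is not data: it is reserved as a flag marking the last byte of the number, so each byte carries only its low 7 bits. The sketch below decodes the forward direction only, assuming that the first byte holds the most significant 7-bit group and that at most 4 bytes are read, as the function's doc comment says. `decode_forward_varlen` is a hypothetical name, not libmobi's API.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes a forward-encoded variable-length integer: each byte contributes
   its low 7 bits (most significant group first) and the byte with bit 7
   (0x80) set is the last one. At most 4 bytes are read. Stores the number
   of bytes consumed in *consumed, which is 0 if the input ran out first. */
uint32_t decode_forward_varlen(const unsigned char *data, size_t len, size_t *consumed) {
    uint32_t val = 0;
    *consumed = 0;
    for (size_t i = 0; i < len && i < 4; i++) {
        unsigned char byte = data[i];
        val = (val << 7) | (byte & 0x7f);  /* 7 payload bits per byte */
        if ((byte & 0x80) || i == 3) {     /* stop flag set, or 4-byte limit */
            *consumed = i + 1;
            return val;
        }
    }
    return 0;  /* truncated input: no byte with the stop flag was found */
}
```

For example, the bytes 0x02 0x81 decode to (2 << 7) | 1 = 257: the first byte has bit 7 clear, so reading continues, and the second has it set, so reading stops.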

Can't get images from MOBI

Hello @bfabiszewski
I am using your other library, QLMobi, combined with libmobi to parse HTML and images from MOBI books.
Most books work great, but for some books I cannot get the media images.
I have tried to fix it but cannot figure out the cause. I hope you can help; I think this is the last problem for me.
Both QLMobi and libmobi are great, nearly perfect libraries.
Thank you very much for your great work!
World of Warcraft - Dawn of the Aspects Part I.mobi.zip

Also, I am the developer of Alook Browser - 2x Speed (https://itunes.apple.com/us/app/alook-web-browser-2x-speed/id1261944766?mt=8). If you are using iOS, here is a promotional code: JWYTH3FE4JJK
Forgive my poor English.
Best Regards.

Trying to get in touch regarding a security issue

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

AddressSanitizer: heap-buffer-overflow at buffer.c:212

Using our fuzzer, we found several heap-buffer-overflow errors when libmobi is compiled with AddressSanitizer and run with the command mobitool -i7m $file. Someone else also found a few others here.

We will list them separately in the following issue threads; this is the first one.

POC (proof-of-crash) files:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_1.mobi
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A212_2.mobi

gdb output:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_1.mobi.gdb.txt
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c:212_2.mobi.gdb.txt

AddressSanitizer: heap-buffer-overflow at util.c:2759

POC files:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_util.c%3A2759_1.mobi
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_util.c%3A2759_2.mobi

gdb output:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_util.c%3A2759_1.mobi.gdb.txt
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_util.c%3A2759_2.mobi.gdb.txt

Tarball is missing for the latest releases (e.g. 0.6)

GitHub-provided tarballs are pretty bad because they are about 45 megabytes in size.

Error converting MOBI ebook to EPUB

The MOBI file converts to EPUB without errors, but the resulting EPUB is malformed: it can't be opened by iBooks or many Android EPUB readers. I checked the EPUB file with calibre-edit and got the errors below:

ERROR: Parsing failed: xmlParseEntityRef: no name, line 1, column 807    [OEBPS/part00000.html]
INFO: File too large    [OEBPS/part00000.html]

123_test.epub.zip
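libxml2 reports "xmlParseEntityRef: no name" when an & is not followed by a valid entity name, for example a bare "&" in running text. Whether that is the cause in this file is not confirmed. As a sketch of one possible fix, the hypothetical helper below rewrites each & that does not start a syntactically valid reference (such as &amp; or &#38;) to &amp;. It does not check whether a named entity is actually defined in XML.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if amp (pointing at '&') starts a syntactically valid
   reference: "&name;", "&#123;" or "&#x1F;". */
static int starts_reference(const char *amp) {
    const char *p = amp + 1;
    if (*p == '#') {
        p++;
        int hex = (*p == 'x' || *p == 'X');
        if (hex) p++;
        const char *digits = p;
        while (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p)) p++;
        return p > digits && *p == ';';
    }
    if (!isalpha((unsigned char)*p)) return 0;
    while (isalnum((unsigned char)*p)) p++;
    return *p == ';';
}

/* Returns a newly allocated copy of in with bare '&' escaped as "&amp;",
   or NULL on allocation failure. The caller frees the result. */
char *escape_bare_ampersands(const char *in) {
    size_t extra = 0;
    for (const char *p = in; *p; p++) {
        if (*p == '&' && !starts_reference(p)) extra += 4;  /* "&" -> "&amp;" */
    }
    char *out = malloc(strlen(in) + extra + 1);
    if (out == NULL) return NULL;
    char *o = out;
    for (const char *p = in; *p; p++) {
        if (*p == '&' && !starts_reference(p)) {
            memcpy(o, "&amp;", 5);
            o += 5;
        } else {
            *o++ = *p;
        }
    }
    *o = '\0';
    return out;
}
```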

How to get the HTML content with the specified sequence number as fast as possible?

When I parse a big file, MOBI_RET mobi_parse_rawml(MOBIRawml *rawml, const MOBIData *m) is too slow.

Mobi file can't parse

Parsing fails for this MOBI file, but it can be opened in the Kindle app.
The file is attached.
Thank you for your great work!
Best regards.
World of Warcraft - Dawn of the Aspects Part I.mobi.zip

Out-of-bounds read vulnerability caused by an incomplete check inside `mobi_decompress_huffman_internal`

Developers can access the bug details here.

Chinese characters in the printed output are displayed as garbled text

64-bit Simplified Chinese version

How can I use this library to get the MOBI cover image?
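One approach, sketched here under stated assumptions: MOBI files record the cover's position in the EXTH header as record type 201 (CoverOffset), a 4-byte big-endian index relative to the first image/resource record. The helper below only scans an EXTH block that has already been located in record 0. Locating that block, finding the first resource record, and the name `exth_find_cover_offset` are assumptions for illustration, not libmobi's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXTH_TAG_COVER_OFFSET 201u  /* CoverOffset record type */

static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Scans an EXTH block ("EXTH", header length, record count, then records
   of type + size + data). Returns 1 and stores the CoverOffset value in
   *out, or 0 if the tag is absent or the block is malformed. */
int exth_find_cover_offset(const unsigned char *exth, size_t len, uint32_t *out) {
    if (len < 12 || memcmp(exth, "EXTH", 4) != 0) {
        return 0;
    }
    uint32_t count = be32(exth + 8);  /* number of EXTH records */
    size_t pos = 12;                  /* records follow the 12-byte header */
    for (uint32_t i = 0; i < count; i++) {
        if (len - pos < 8) return 0;              /* truncated record header */
        uint32_t type = be32(exth + pos);
        uint32_t size = be32(exth + pos + 4);     /* includes the 8-byte header */
        if (size < 8 || size > len - pos) return 0;
        if (type == EXTH_TAG_COVER_OFFSET && size >= 12) {
            *out = be32(exth + pos + 8);
            return 1;
        }
        pos += size;
    }
    return 0;
}
```

The cover image would then be the record at (first resource record index + *out).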

missing libmobi.rc

libmobi.rc is missing from the repository.

AZW3 file generates a table of contents that does not work

I have an AZW3 file that I cannot post publicly, but I could send it to you by email for testing. When converted to EPUB, it generates a non-functional table of contents (TOC): the chapter names are correct, but the links do not work. The TOC entries look like this:

  <navPoint id="toc-2" playOrder="2">
   <navLabel>
    <text>CAP&amp;Iacute;TULO II: Otra mudanza ca&amp;oacute;tica</text>
   </navLabel>
   <content src="part00000.html#"/>
  </navPoint>

Note that '' is missing a target ID after the # character. The same happens with internal links in the ebook text that point to the chapter titles. The same file converts fine to EPUB with other tools, e.g. Calibre.

BTW, I tried to email you privately about this first, but the email does not go through and sits in the retry queue. The mail server at your .net domain reports that your email address is graylisted...

Greg

AddressSanitizer: heap-buffer-overflow at buffer.c:230

POC files:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A230_1.mobi
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A230_2.mobi

gdb output:
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A230_1.mobi.gdb.txt
https://github.com/ntu-sec/pocs/blob/master/libmobi/hbo_buffer.c%3A230_2.mobi.gdb.txt

[Feature Request] Please add an option to mobitool so the user can extract only .mobi or only .azw3

Currently mobitool -t outputs both .mobi and .azw3.
If a user has thousands of MOBI files and only wants the .azw3 files, generating all the .mobi and .azw3 files and then deleting all the .mobi ones wastes time and disk space.
Please add an option like -k8 or -k7 to mobitool so the user can extract only one of the two types.

README question: can libmobi also create new documents from scratch?

The README lists a lot of features, but they're all apparently centered around reading or modifying an existing file.

Can libmobi also create new ebooks from scratch (for use in EPUB->MOBI conversion software)? If so, another bullet point in the README clarifying that would be useful 🙂

Thanks for creating this cool library!

MOBI support added to Darkthumbs ...

Thanks to your project!

DarkThumbs

Please add .kfx support

Please add support for the new Kindle format, KFX
(sample attached)
sample.zip

MOBI parsing fails, but the file can be opened in the Kindle app

The file is attached.
请停止无效努力.mobi.zip

Thank you for your great work!
Best regards.

Homebrew formula

Homebrew is an awesome package manager for macOS. If you add a Homebrew formula, i.e. libmobi.rb, it will be very convenient to install libmobi on macOS.

MOBI_ATTRNAME_MAXSIZE (100) is not enough for some books

Please increase MOBI_ATTRNAME_MAXSIZE and MOBI_ATTRVALUE_MAXSIZE to 150:

#define MOBI_ATTRNAME_MAXSIZE 150 /**< Maximum length of tag attribute name, like "href" */
#define MOBI_ATTRVALUE_MAXSIZE 150 /**< Maximum length of tag attribute value */

Thanks

Error converting AZW3 ebook to EPUB

The conversion fails at this line:
printf("Could not initialize zip archive\n");
Here is the link to the file I tested.
https://1drv.ms/u/s!AkaVccfysLmAhI5Odqj2pZ1QCMci6g?e=U9lkC3

Can't convert MOBI file to EPUB

Hi,
I use create_epub(const MOBIRawml *rawml, const char *fullpath) to create an EPUB file, but the resulting EPUB has the wrong format and can't be opened. Thanks.
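The report doesn't say why readers reject the file. One frequent cause of EPUBs that fail to open, offered here only as a possibility to check, is the container layout: the EPUB OCF specification requires the first ZIP entry to be an uncompressed file named `mimetype` containing `application/epub+zip`. A minimal check of the first ZIP local file header, with the hypothetical helper name `epub_mimetype_ok`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the first ZIP entry is a stored (uncompressed) file named
   "mimetype" whose contents are exactly "application/epub+zip". */
int epub_mimetype_ok(const unsigned char *zip, size_t len) {
    static const char expected[] = "application/epub+zip";
    const size_t expected_len = sizeof expected - 1;  /* without the NUL */
    if (len < 30 || le32(zip) != 0x04034b50u) {
        return 0;  /* too short, or not a ZIP local file header ("PK\3\4") */
    }
    uint16_t method = le16(zip + 8);     /* 0 = stored, i.e. uncompressed */
    uint32_t size = le32(zip + 18);      /* compressed size */
    uint16_t name_len = le16(zip + 26);
    uint16_t extra_len = le16(zip + 28);
    size_t data_start = 30 + (size_t)name_len + extra_len;
    if (data_start > len || name_len != 8 || memcmp(zip + 30, "mimetype", 8) != 0) {
        return 0;  /* first entry is not named "mimetype" */
    }
    if (method != 0 || size != expected_len || len - data_start < expected_len) {
        return 0;  /* compressed, or wrong/truncated contents */
    }
    return memcmp(zip + data_start, expected, expected_len) == 0;
}
```

This simplified check does not handle entries that use a data descriptor (general purpose flag bit 3), where the sizes in the local header may be zero.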

Amazon azw4 format?

Hi Bartek!
One of the users of my app (@voice Aloud Reader on Google Play) sent me the first AZW4 ebook. Do you think you could add support for this format to your library? Would you need any help with this? I was able to convert the file to EPUB using the latest Calibre, but the formatting is weird: short lines, about 80 characters long, each formatted as <p>...</p>. This could be a problem with the original file or with Calibre's conversion process; I don't know at this time.

OK, I just managed to update my old 3rd-generation Kindle HDX, and it opened the AZW4 file fine, with no formatting problems. Apparently Calibre's conversion is not perfect yet. Please let me know if you have any plans regarding AZW4. Thanks!

Grzesiek

An issue with the implementation of mobi_buffer_get_varlen_internal in src/buffer.c

When you create a MOBIBuffer object:

typedef struct {
    size_t offset; /**< Current offset in respect to buffer start */
    size_t maxlen; /**< Length of the buffer data */
    unsigned char *data; /**< Pointer to buffer data */
    MOBI_RET error; /**< MOBI_SUCCESS = 0 if operation on buffer is successful, non-zero value on failure */
} MOBIBuffer;

The initial value of buf->offset is 0:

MOBIBuffer * mobi_buffer_init_null(unsigned char *data, const size_t len) {
    MOBIBuffer *buf = malloc(sizeof(MOBIBuffer));
    if (buf == NULL) {
        debug_print("%s", "Buffer allocation failed\n");
        return NULL;
    }
    buf->data = data;
    buf->offset = 0;
    buf->maxlen = len;
    buf->error = MOBI_SUCCESS;
    return buf;
}

I think there is a problem when mobi_buffer_get_varlen_internal is called with direction -1 (reading the buffer backwards) and buf->offset equal to 3.
According to its documentation ("Reads maximum 4 bytes from the buffer. Stops when byte has bit 7 set."), with buf->offset at 3 it should read byte 3, byte 2, byte 1, and then byte 0.
But when it comes to reading byte 0, we hit the following check at line 267:
if (buf->offset < 1) {
Since 0 is less than 1, an error is printed and the value accumulated from the 3 bytes read so far is returned instead of all 4
(even though, according to the pull request, it should return 0).

If it needs to read byte 0, it should read it and then return without decrementing buf->offset, because decrementing an offset of 0 underflows and leaves the maximum size_t value in buf->offset. So I suggest checking whether buf->offset is 0 after reading the byte into byte and updating val. If buf->offset is 0,
we should check byte_count and, based on it, decide whether to execute:

                debug_print("%s", "End of buffer\n");
                buf->error = MOBI_BUFFER_END;
                return 0;

or to set byte to stop_flag so that it stops reading and returns val, while keeping buf->offset at 0.
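A minimal sketch of the suggested fix, assuming a simplified stand-in for MOBIBuffer (`SketchBuffer`, with an int error in place of MOBI_RET) and a hypothetical `get_varlen_backward`; it is not libmobi's code. It assumes buf->offset points at the last byte of the encoded value and that bytes read backwards carry increasingly significant 7-bit groups. Byte 0 is read, and the offset is never decremented below 0: if the value still needs more bytes at that point, it reports end of buffer; otherwise it stops and keeps the offset at 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified stand-in for MOBIBuffer. */
typedef struct {
    size_t offset;              /* index of the next byte to read */
    size_t maxlen;              /* length of the buffer data */
    const unsigned char *data;  /* buffer data */
    int error;                  /* 0 on success, non-zero at end of buffer */
} SketchBuffer;

/* Reads a backward-encoded varlen (at most 4 bytes, stops at a byte with
   bit 7 set) and adds the number of bytes read to *len. */
uint32_t get_varlen_backward(SketchBuffer *buf, size_t *len) {
    const uint8_t stop_flag = 0x80;
    const uint8_t mask = 0x7f;
    uint32_t val = 0;
    uint32_t shift = 0;
    uint8_t byte_count = 0;
    uint8_t byte;
    if (buf->offset >= buf->maxlen) {
        buf->error = 1;
        return 0;
    }
    do {
        byte = buf->data[buf->offset];
        val |= (uint32_t)(byte & mask) << shift;  /* later bytes are more significant */
        shift += 7;
        byte_count++;
        (*len)++;
        if (buf->offset == 0) {
            if (!(byte & stop_flag) && byte_count < 4) {
                buf->error = 1;  /* value continues before the buffer start */
                return 0;
            }
            break;  /* value complete: keep offset at 0 instead of wrapping */
        }
        buf->offset--;
    } while (!(byte & stop_flag) && byte_count < 4);
    return val;
}
```

With data {0x85, 0x01} and offset 1, it reads 0x01 and then 0x85 (stop flag set) and returns 1 | (5 << 7) = 641, leaving the offset at 0.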

When I build and install on macOS 10.12.4, the compiler can't find mobi.h

main.c:10:11: fatal error: 'mobi.h' file not found
How can I fix this error?

How to build it as a DLL?

How can I build this as a DLL so that I can consume it from a C# project?

Mobi file can't parse.

Thank you, and sorry to trouble you again.
Best regards.

pg1342-images.mobi.zip

Enabling MOBI_DEBUG on Windows

I am getting a CMake error if I enable MOBI_DEBUG on Windows (VS 2022):

cl : command line error D8021: invalid numeric argument '/Wextra'

toc.ncx is sometimes created with wrong navigation labels

The issue is with the "World of Warcraft - Dawn of the Aspects Part I.mobi" ebook file, submitted in "Mobi file can't parse #10" by @LiuDeng:

The toc.ncx that libmobi generates from this file has wrong links. For example, for "Part I" toc.ncx contains:

However, there is no element with id "0000006908" in part00000.html at all. Instead, the "Part I" header is preceded by:

Could you tell me where and how toc.ncx is constructed? Then maybe I could find a fix on my own.

Greg

Request for CHM support

It would be good if you could add CHM to EPUB conversion.
It should be easy for you to extract a CHM file into a list of HTML files and create an EPUB from them.
Maybe you could use:
https://github.com/jedwing/CHMLib

mobitool -d <file> -o <output directory> ignores -o argument

$ mobitool.exe -d data/googled.mobi -o .
Title: Googled
Author: Ken Auletta
. . . <output snipped> . . .

Dumping rawml...
Saving rawml to data/googled.rawml

Out of bounds write, crash

diff --git a/src/util.c b/src/util.c
index be08b26..8887afd 100644
--- a/src/util.c
+++ b/src/util.c
@@ -1601,7 +1601,7 @@ static MOBI_RET mobi_decompress_content(const MOBIData *m, char *text, FILE *fil
         if (dump) {
             fwrite(decompressed, 1, decompressed_size, file);
         } else {
-            if (text_length > *len) {
+            if (text_length + decompressed_size > *len) {
                 debug_print("%s", "Text buffer too small\n");
                 /* free huff/cdic tables */
                 mobi_free_huffcdic(huffcdic);
-- 
2.7.4
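The patch compares the space already used plus the new chunk against the buffer capacity. A minimal sketch of the same check as a standalone helper (`append_chunk` is hypothetical, not libmobi code), written as `n > cap - used` so the addition itself cannot overflow size_t:

```c
#include <stddef.h>
#include <string.h>

/* Appends n bytes from src to dst, whose capacity is cap and of which
   *used bytes are already filled. Returns 0 on success, or -1 (leaving
   dst and *used untouched) if the chunk would not fit. */
int append_chunk(char *dst, size_t cap, size_t *used, const char *src, size_t n) {
    if (*used > cap || n > cap - *used) {
        return -1;  /* the "Text buffer too small" case */
    }
    memcpy(dst + *used, src, n);
    *used += n;
    return 0;
}
```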