vandry / mairix

This project forked from rc0/mairix

mairix is a program for indexing and searching email messages stored in Maildir, MH or mbox folders

Home Page: http://www.rc0.org.uk/mairix

License: GNU General Public License v2.0

C 78.94% Perl 0.49% Shell 8.97% Makefile 1.50% Yacc 1.41% Lex 0.76% Roff 7.94%

mairix's People

Contributors

clausa, dfandrich, dscho, edgewood, foxharp, jikamens, jsagarribay, makoshark, mika-fischer, mlichvar, okapia, peterjeremy, psoberoi, radhermit, rc0, rhertzog, samueltardieu, slumos, snarkophilus, spwhitton, vandry, weisslj, yurivict

mairix's Issues

Odd numbering of search results with mformat=mh

Hi there,

[using Debian's mairix 0.24-2]

mairix looks amazingly useful; I'm embarrassed to have not tried it many years ago.

One question, on something I can't find any information on in the man pages: search results in MH format ("mformat=mh") produce a folder with really wild message numbers (my last search, for example, got results numbered 7091 and 7094). There's nothing to stop me renumbering the folder after running a search, but it seems odd; is there a reason for this, or a way to make mairix number them consecutively from 1?

Best,
Conrad
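Renumbering after the search, as Conrad mentions, is straightforward because an MH folder stores each message in a file named after its number. The C sketch below is a hypothetical helper, not part of mairix: renumber_mh_folder() collects the all-digit file names, sorts them, and renames them to 1..N in ascending order. It assumes nothing else is writing to the folder, and it does not rewrite .mh_sequences, whose entries would go stale.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the message number encoded in an MH file name, or 0 if the name is
   not a message (MH numbers are positive and have no leading zeros). */
static unsigned long mh_number(const char *name)
{
    if (name[0] < '1' || name[0] > '9')
        return 0;
    for (const char *p = name; *p; p++)
        if (*p < '0' || *p > '9')
            return 0;
    return strtoul(name, NULL, 10);
}

static int cmp_ulong(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a, y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: rename the messages in an MH folder to 1..N,
   preserving their order. Returns the message count, or -1 on error. */
static long renumber_mh_folder(const char *dir)
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (!d)
        return -1;
    unsigned long *nums = NULL;
    size_t n = 0, cap = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        unsigned long num = mh_number(de->d_name);
        if (num == 0)
            continue;                 /* ".", "..", .mh_sequences, etc. */
        if (n == cap) {
            cap = cap ? 2 * cap : 64;
            unsigned long *tmp = realloc(nums, cap * sizeof *nums);
            if (!tmp) { free(nums); closedir(d); return -1; }
            nums = tmp;
        }
        nums[n++] = num;
    }
    closedir(d);
    qsort(nums, n, sizeof *nums, cmp_ulong);

    /* In ascending order the i-th smallest number is >= i, so the target
       name is either the current name or one no remaining message uses. */
    char from[4096], to[4096];
    for (size_t i = 0; i < n; i++) {
        if (nums[i] == i + 1)
            continue;
        snprintf(from, sizeof from, "%s/%lu", dir, nums[i]);
        snprintf(to, sizeof to, "%s/%zu", dir, i + 1);
        if (rename(from, to) != 0) {
            free(nums);
            return -1;
        }
    }
    free(nums);
    return (long)n;
}
```

Because of the ascending order, rename() never overwrites another message; results numbered 7091 and 7094 would become 1 and 2.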

Lots of headers that can't be parsed

I just downloaded the .zip file and compiled mairix. When I run it (the behavior is the same with v0.24), I get many complaints about headers that can't be parsed. For example:

Header 'content-type: image/*; name="20221017_130844_resized.jpg"' in [89420989,90670144) could not be parsed

I'm not a mail wizard, but that looks OK to me.

A more lengthy example:

Header 'content-disposition: inline; filename="image004.png"; size=79197; creation-date=Fri, 06 May 2022 16:51:48 GMT; modification-date=Fri, 06 May 2022 20:09:01 GMT' in [28093802,28267769) could not be parsed

Q1: Is it just me, or is this happening to other people?

Q2: Are these complaints valid, or are they spurious?

Thanks.
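For comparison, here is a hypothetical lenient splitter in C, not mairix's parser: split_params() breaks a header value on semicolons that fall outside double quotes and accepts unquoted values as-is. The date values in the second header are unquoted and contain spaces, commas and colons; RFC 2045 only allows such characters in a parameter value inside a quoted string, so a strict parser can reasonably reject them, while a lenient one like this accepts both example headers.

```c
#include <assert.h>
#include <stddef.h>

/* Callback invoked once per field; the field is NOT NUL-terminated. */
typedef void (*field_cb)(const char *field, size_t len, void *ctx);

/* Hypothetical lenient splitter: split a MIME header value on ';' outside
   double quotes, trim surrounding blanks, skip empty fields (such as the one
   after a trailing ';'). Returns the number of fields; cb may be NULL. */
size_t split_params(const char *s, field_cb cb, void *ctx)
{
    assert(s != NULL);
    size_t count = 0;
    int in_quotes = 0;
    const char *start = s;
    for (const char *p = s; ; p++) {
        if (in_quotes && *p == '\\' && p[1] != '\0') {
            p++;                      /* skip the backslash-escaped char */
            continue;
        }
        if (*p == '"')
            in_quotes = !in_quotes;
        if (*p == '\0' || (*p == ';' && !in_quotes)) {
            const char *b = start, *e = p;
            while (b < e && (*b == ' ' || *b == '\t')) b++;
            while (e > b && (e[-1] == ' ' || e[-1] == '\t')) e--;
            if (e > b) {
                if (cb)
                    cb(b, (size_t)(e - b), ctx);
                count++;
            }
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
    return count;
}
```

Run on the content-disposition value above, this yields five fields (inline, filename, size, creation-date, modification-date), with the commas inside the dates left in place.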

changing max number of mailboxes or max number of messages per mailbox

Back in August 2016 in the rc0/mairix git repository, spwhitton asked about increasing the number of mailboxes. I just replied:

The number of mailboxes and the number of messages in each mailbox are stored in the same unsigned integer (32 bits), with 16 bits used for each number. The number of mailboxes is in the upper 16 bits; the number of messages per mailbox is in the lower 16 bits.

It's not hard to re-proportion the number of bits used for each, i.e. by decreasing the number of mailboxes to increase the number of messages or vice-versa. You mainly need to modify encode_mbox_indices() and decode_mbox_indices() in mbox.c, plus the limit checks that follow them.
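A self-contained C sketch of the packing scheme, with the split controlled by one constant. encode_indices() and decode_indices() are hypothetical stand-ins for mairix's encode_mbox_indices() and decode_mbox_indices(), and uint32_t is used to make the 32-bit width explicit where mairix uses unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Bits given to the per-mailbox message number (valid range 1..31); the
   mailbox number gets the remaining 32 - MSG_BITS bits. 16 reproduces the
   16/16 split described above, 24 the 8/24 split from the diff. */
#define MSG_BITS 24
#define MSG_MASK ((UINT32_C(1) << MSG_BITS) - 1)
#define MB_MASK  ((UINT32_C(1) << (32 - MSG_BITS)) - 1)

/* Hypothetical: pack mailbox index (upper bits) and message index (lower). */
static uint32_t encode_indices(uint32_t mb, uint32_t msg)
{
    assert(mb <= MB_MASK && msg <= MSG_MASK);  /* caller must range-check */
    return (mb << MSG_BITS) | msg;
}

/* Hypothetical: unpack the two indices again. */
static void decode_indices(uint32_t packed, uint32_t *mb, uint32_t *msg)
{
    *mb  = (packed >> MSG_BITS) & MB_MASK;
    *msg = packed & MSG_MASK;
}
```

With MSG_BITS at 24, mailbox indices 0 to 255 and message indices 0 to 16777215 fit. The limit checks in the diff compare counts with >= against the masks, which is slightly stricter than this range requires but safe.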

I did this in an earlier version of mairix, reducing the number of mailboxes to 8 bits and increasing the number of messages to 24 bits.

Of course you have to rebuild all your mairix index files if you do this.

Here's the approximate diff I used for mairix 0.22 mbox.c:

1027c1059,1063
< unsigned int encode_mbox_indices(unsigned int mb, unsigned int msg)/*{{{*/
---
> #define SHIFTBITS 24  /* how many bits to use to count messages */
> #define SHIFTMASKMBS ((1<<(32-SHIFTBITS))-1)
> #define SHIFTMASKMSGS ((1<<SHIFTBITS)-1)
>
> inline unsigned int encode_mbox_indices(unsigned int mb, unsigned int msg)/*{{{*/
1029,1031c1065
<   unsigned int result;
<   result = ((mb & 0xffff) << 16) | (msg & 0xffff);
<   return result;
---
>   return (mb << SHIFTBITS) | msg;
1034c1068
< void decode_mbox_indices(unsigned int index, unsigned int *mb, unsigned int *msg)/*{{{*/
---
> inline void decode_mbox_indices(unsigned int myindex, unsigned int *mb, unsigned int *msg)/*{{{*/
1036,1037c1070,1071
<   *mb = (index >> 16) & 0xffff;
<   *msg = (index & 0xffff);
---
>   *mb = (myindex >> SHIFTBITS) & SHIFTMASKMBS;
>   *msg = (myindex & SHIFTMASKMSGS);
1044,1045c1078,1080
<   if (db->n_mboxen > 65536) {
<     fprintf(stderr, "Too many mboxes (max 65536, you have %d)\n", db->n_mboxen);
---
>   if (db->n_mboxen >= SHIFTMASKMBS) {
>     fprintf(stderr, "Too many mboxes (max %d, you have %d)\n",
>               SHIFTMASKMBS, db->n_mboxen);
1050,1052c1085,1087
<     if (db->mboxen[i].n_msgs > 65536) {
<       fprintf(stderr, "Too many messages in mbox %s (max 65536, you have %d)\n",
<               db->mboxen[i].path, db->mboxen[i].n_msgs);
---
>     if (db->mboxen[i].n_msgs >= SHIFTMASKMSGS) {
>       fprintf(stderr, "Too many messages in mbox %s (max %d, you have %d)\n",
>               db->mboxen[i].path, SHIFTMASKMSGS, db->mboxen[i].n_msgs);

Update NEWS file

Can the NEWS file be updated with a new release? The last release was version 0.24, in 2017. There have been many commits since then, and the 2017 version has many bugs that have since been fixed. Many distributions, such as Arch Linux, package the official release number, so unless NEWS is updated they will keep pointing to the last official release, which is now quite old.

Thank you!

Explicitly handle SIGPIPE

When mairix runs in a pipeline, it does not delete its lock file after receiving SIGPIPE. For example, if less(1) is closed before mairix has finished writing to standard output in

mairix --excerpt-output a:ericpruitt | less

the next run fails with

Database .../mairixdb appears to be locked by (pid,node,user)=(6387,sinister,ericpruitt)
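One common way to address this, sketched in C under assumptions: a SIGPIPE handler that removes the lock file, restores the default action and re-raises the signal, so the process still terminates the way a shell expects. lock_path, remove_lock_and_reraise() and install_lock_cleanup() are hypothetical names; mairix's real locking code may be organized differently. The handler only calls unlink(), signal() and raise(), which POSIX lists as async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: path of the lock file this process created. A fixed buffer
   keeps the handler free of malloc(), which is not async-signal-safe. */
static char lock_path[4096];

/* Remove the lock, restore the default action and re-raise. The signal is
   blocked while the handler runs, so it is delivered (and kills the
   process) as soon as the handler returns. */
static void remove_lock_and_reraise(int sig)
{
    if (lock_path[0] != '\0')
        unlink(lock_path);
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Hypothetical: call right after the lock file has been created. */
static int install_lock_cleanup(const char *path)
{
    assert(path != NULL);
    if (strlen(path) >= sizeof lock_path)
        return -1;
    strcpy(lock_path, path);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = remove_lock_and_reraise;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}
```

An alternative is to ignore SIGPIPE with SIG_IGN, so that writes to the closed pipe fail with EPIPE and an ordinary error path can remove the lock before exiting.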

Release 0.25?

Is there a roadmap or list of work to be done prior to a 0.25 release? I would love to have some of the new features added since 0.24, particularly the XZ archive support added last year. (Especially since I started compressing mbox files several years ago and just now noticed that mairix doesn't support them when I went looking for an email I knew was there and mairix couldn't find it.)

If there are tasks that an unprivileged volunteer could do to hasten a new release, I'd be willing to help out.

Tagging a release

Hi Kim, could you tag a release so that mairix can be packaged easily?
Something like:

git tag -a v0.24 -m "kim's version"

Thanks

Index "BCC" headers

I think "BCC" email headers should be explicitly indexed. I can't think of any major email service provider or user agent that doesn't let users BCC recipients. I propose using "B:" as the search prefix and modifying "a:" so that it also implies "B:".
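A hypothetical sketch of how the proposal could map search prefixes to header fields, as a bitmask in C. t:, c:, f: and a: mirror mairix's existing To, Cc, From and any-address prefixes; mairix already uses b: for message bodies, which is presumably why the proposal picks an uppercase B:. The flag names and prefix_fields() are illustrative only and do not come from mairix's source.

```c
#include <assert.h>

/* Hypothetical header-field flags; mairix's internal names differ. */
enum {
    FIELD_TO   = 1 << 0,
    FIELD_CC   = 1 << 1,
    FIELD_FROM = 1 << 2,
    FIELD_BCC  = 1 << 3
};

/* Map a search prefix letter to the header fields it covers, following the
   proposal: "B:" searches Bcc, and "a:" (any address) now implies Bcc too. */
static unsigned prefix_fields(char prefix)
{
    switch (prefix) {
    case 't': return FIELD_TO;
    case 'c': return FIELD_CC;
    case 'f': return FIELD_FROM;
    case 'B': return FIELD_BCC;
    case 'a': return FIELD_TO | FIELD_CC | FIELD_FROM | FIELD_BCC;
    default:  return 0;            /* not an address prefix */
    }
}
```

Expressing a: as the union of the single-field flags keeps the "also implies B:" rule in one place.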

Segfault in make_nvp

The following minimal example will trigger a segfault during indexing:

Content-Type: application/pdf; name*=UTF-8''filename.pdf;

foo

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x000055555556b085 in make_nvp (src=src@entry=0x555555778ac0 <result>, 
    s=0x55555577f43d " application/pdf; name*=UTF-8''filename.pdf;", 
    s@entry=0x55555577f430 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;", pfx=pfx@entry=0x555555570a1c "content-type:") at nvp.c:279

#0  0x000055555556b085 in make_nvp (src=src@entry=0x555555778ac0 <result>, 
    s=0x55555577f43d " application/pdf; name*=UTF-8''filename.pdf;", 
    s@entry=0x55555577f430 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;", pfx=pfx@entry=0x555555570a1c "content-type:") at nvp.c:279
        current_state = 2
        tok = <optimized out>
        q = 0x55555577f469 ""
        tempsrc = 0x0
        tempdst = 0x0
        qq = <optimized out>
        name = 0x0
        minor = 0x55555577f520 "UTF-8"
        value = 0x0
        copy_start = 0x55555577f468 ";"
        last_action = GOT_NAMEVALUE_CSET
        current_action = <optimized out>
        last_copier = COPY_NOWHERE
        result = <optimized out>
        pfxlen = <optimized out>
#1  0x000055555555d114 in data_to_rfc822 (
    src=src@entry=0x555555778ac0 <result>, 
    data=0x7fcadd6b5000 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;\n\nfoo\n\n", length=64, error=error@entry=0x0) at rfc822.c:1031
        body_start = 0x7fcadd6b503b "foo\n\n"
        header = {next = 0x55555577f480, prev = 0x55555577f480, 
          text = 0x3499309 <error: Cannot access memory at address 0x3499309>}
        x = 0x55555577f480
        nx = <optimized out>
        ct_nvp = 0x0
        cte_nvp = 0x0
        cd_nvp = 0x0
        nvp = <optimized out>
        body_len = <optimized out>
#2  0x000055555555ead4 in make_rfc822 (
    filename=filename@entry=0x55555577f410 "/tmp/bug.mh/1") at rfc822.c:1435
        len = 64
        data = 0x7fcadd6b5000 "Content-Type: application/pdf; name*=UTF-8''filename.pdf;\n\nfoo\n\n"
        result = 0x0
#3  0x000055555555b9e3 in scan_new_messages (imapc=<optimized out>, 
    start_at=<optimized out>, db=<optimized out>) at db.c:755
        msg = 0x0
        len = <optimized out>
        i = 0
#4  update_database (db=0x55555577b6a0, sorted_paths=<optimized out>, 
    n_msgs=<optimized out>, do_fast_index=<optimized out>, 
    imapc=<optimized out>) at db.c:1085
        matched_index = <optimized out>
        i = <optimized out>
        any_new = 1
        n_newly_pruned = <optimized out>
        n_already_dead = <optimized out>
        __PRETTY_FUNCTION__ = "update_database"
#5  0x00005555555574e7 in main (argc=<optimized out>, argv=<optimized out>)

Note that value is 0.

BTW, thank you very much for picking up maintainership! mairix is such a useful tool, and still makes the smallest indexes I know of.
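The trigger is an RFC 2231 extended parameter: name*= followed by charset'language'value, where the language part here is empty. The backtrace shows value still 0x0 at the crash point, so one defensive measure is to split such values in a way that never leaves a piece unset. Below is a hypothetical C sketch of that split (struct ext_value and split_ext_value() are illustrative names, not mairix's make_nvp() code); it leaves out percent-decoding and numbered continuations such as name*0*=.

```c
#include <assert.h>
#include <string.h>

/* Pieces of an RFC 2231 extended value such as UTF-8''filename.pdf.
   Pointers refer into the caller's string; lengths are explicit because
   the pieces are not NUL-terminated. */
struct ext_value {
    const char *charset; size_t charset_len;
    const char *lang;    size_t lang_len;    /* may be empty */
    const char *value;   size_t value_len;   /* still %-encoded */
};

/* Hypothetical: returns 0 on success, -1 if either apostrophe is missing,
   so a caller never sees a NULL piece after success. */
static int split_ext_value(const char *s, struct ext_value *out)
{
    assert(s != NULL && out != NULL);
    const char *q1 = strchr(s, '\'');
    if (!q1)
        return -1;
    const char *q2 = strchr(q1 + 1, '\'');
    if (!q2)
        return -1;
    out->charset = s;     out->charset_len = (size_t)(q1 - s);
    out->lang = q1 + 1;   out->lang_len = (size_t)(q2 - q1 - 1);
    out->value = q2 + 1;  out->value_len = strlen(q2 + 1);
    return 0;
}
```

For the value in the minimal example this gives charset "UTF-8", an empty language and value "filename.pdf"; malformed input is reported instead of being half-parsed.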

"Out of memory" error on broken mailbox

I'm getting an out-of-memory error on a mailbox that mutt messed up. I spent an hour identifying the message that causes the bug and then condensed it into the attached file test-mailbox, which triggers the error below (unzip it first, of course).

This is on a fresh compilation from github HEAD.

$ rm /tmp/mairix.database.*; mairix -v -F -p -f /tmp/.mairixrc 2>&1 
mairix DEVELOPMENT, Copyright (C) 2002-2010 Richard P. Curnow
mairix comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the GNU General Public License for details.

Finding all currently existing messages...
Reading existing database...
Checking message path integrity
Checking to
Checking cc
Checking from
Checking subject
Checking body
Checking attachment_name
Loaded 1 existing messages
Scanning mbox /tmp/test-mailbox : 100% done
1 newly dead messages, 1 messages now dead in total
Out of memory (at rfc822.c:465, -203 bytes)

test-mailbox.zip
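The reported size, -203 bytes, is negative, which suggests that a length computed as the difference of two pointers went below zero on the malformed message, rather than that memory actually ran out. This is an inference from the message, not a confirmed diagnosis. A defensive pattern, sketched in C with hypothetical names (enum span_status, copy_span()), is to validate the span before allocating and to report a malformed message separately from a genuine allocation failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum span_status { SPAN_OK, SPAN_MALFORMED, SPAN_NOMEM };

/* Hypothetical: copy the bytes in [start, end) into a new NUL-terminated
   buffer. A reversed span is rejected before it can become a negative
   allocation size. *out is NULL unless SPAN_OK is returned. */
static enum span_status copy_span(const char *start, const char *end, char **out)
{
    assert(start != NULL && end != NULL && out != NULL);
    *out = NULL;
    if (end < start)
        return SPAN_MALFORMED;        /* would be a negative length */
    size_t len = (size_t)(end - start);
    char *buf = malloc(len + 1);
    if (buf == NULL)
        return SPAN_NOMEM;
    memcpy(buf, start, len);
    buf[len] = '\0';
    *out = buf;
    return SPAN_OK;
}
```

Keeping SPAN_MALFORMED distinct from SPAN_NOMEM lets a caller skip one bad message instead of aborting the whole indexing run.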

Path searching is not documented in mairix.1

I discovered mairix supports searching for paths using "p:..." while looking at the source code. Although it appears in "mairix --help", it isn't documented in the manual, and I've been using mairix for years occasionally bemoaning the inability to refine my search using filenames when the feature has apparently existed for a very long time.

Assertion in db.c fails, leading to abort trap

Hi

 $ mairix -v -p

mairix DEVELOPMENT, Copyright (C) 2002-2010 Richard P. Curnow
mairix comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the GNU General Public License for details.

Finding all currently existing messages...
Reading existing database...
Assertion failed: (nt->match1.highest < n_msgs), function import_toktable2, file db.c, line 412.
Abort trap: 6

The db in question is approx. 24M, if that's relevant; here's what it looks like after a fresh indexing run:

Wrote 32603 messages (652060 bytes of tables, 2899041 bytes of text)
Wrote 0 mbox headers (0 bytes of tables, 0 bytes of paths)
Wrote 0 bytes of mbox message checksums
To: Wrote 11795 tokens (94360 bytes of tables, 170867 bytes of text, 259613 bytes of hit encoding)
Cc: Wrote 3563 tokens (28504 bytes of tables, 48724 bytes of text, 41836 bytes of hit encoding)
From: Wrote 7844 tokens (62752 bytes of tables, 106980 bytes of text, 225559 bytes of hit encoding)
Subject: Wrote 11748 tokens (93984 bytes of tables, 88691 bytes of text, 206157 bytes of hit encoding)
Body: Wrote 579794 tokens (4638352 bytes of tables, 5200170 bytes of text, 7717808 bytes of hit encoding)
Attachment Name: Wrote 5522 tokens (44176 bytes of tables, 129209 bytes of text, 23229 bytes of hit encoding)
(Threading): Wrote 34405 tokens (275240 bytes of tables, 1705835 bytes of text, 301234 bytes of hit encoding)
