
mimemagic's People

Contributors

aliismayilov, boutil, bronson, coldnebo, crisfole, epitron, fedot, gbh, gsar, haines, iangreenleaf, indiebrain, janko, jaredbeck, jcoyne, jellybob, jordan-thoms, junaruga, kachick, mathieumahe, minad, nicklamuro, olleolleolle, pocke, robcherry, rosa, scpike, viraptor, wbond

mimemagic's Issues

CSV files starting with MZ are detected as x-ms-dos-executable

Steps to reproduce:

 echo 'MZABC' > /tmp/csv_detection_test.csv

Then in (rails) console:

MimeMagic.by_magic(File.open('/tmp/csv_detection_test.csv'))
# Or for rails-5 applications
Marcel::MimeType.for Pathname.new("/tmp/csv_detection_test.csv")

Result is x-ms-dos-executable
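
A minimal sketch of why this happens, assuming the DOS executable rule is a plain prefix check for the two bytes MZ at offset 0. The helper name starts_with_magic? is hypothetical and not part of mimemagic:

```ruby
# Hypothetical helper: true when `data` begins with the given magic bytes.
def starts_with_magic?(data, magic)
  data.byteslice(0, magic.bytesize) == magic
end

csv_content = "MZABC\n"
# The CSV text satisfies the prefix rule even though it is not an executable.
puts starts_with_magic?(csv_content, "MZ")
```

Any text file whose first two characters happen to be MZ passes such a check, so a content-only lookup cannot tell it apart from a real executable header.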

xlsx support

I'm using carrierwave-aws and the latest version of mimemagic. When I upload an xlsx file to S3, the upload succeeds, but when I try to download it, a zip file starts downloading instead. As far as I know, the file's content type may be wrong.
Could you please tell me how I can set the correct MIME type for xlsx files while uploading, using the CarrierWave::Uploader::Base config?

MimeMagic by_magic fails on Ruby 1.8.7

Here is the test result:

Using ruby-1.8.7-p352
~/tmp/mimemagic$ rake
bacon -q -Ilib:test test/mimemagic_test.rb
.......F..
Bacon::Error: nil.==("application/zip") failed
./test/mimemagic_test.rb:55: MimeMagic - should recognize by magic
./test/mimemagic_test.rb:53:in `each'
./test/mimemagic_test.rb:53
./test/mimemagic_test.rb:52
./test/mimemagic_test.rb:4

10 tests, 38 assertions, 1 failures, 0 errors
rake aborted!
Command failed with status (1): [bacon -q -Ilib:test test/mimemagic_test.rb...]

Tasks: TOP => default => test
(See full trace by running task with --trace)

OS: Ubuntu 12.04 and Debian

Any suggestions?
P.S. The tests pass on Ruby 1.9.2 and 1.9.3.

PDF magic in 0.1.8 is too loose

I'm stuck on mimemagic 0.1.7 because of the new PDF magic.

This change is the problem:

-    ['application/pdf', [[0, "%PDF-"]]],
+    ['application/pdf', [[0..1024, "%PDF-"]]],

Turns out LOTS of files have %PDF- in the first 1024 bytes. Here are two:

Is there a way to report this broken magic upstream?

(I originally commented on the commit)
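
To illustrate the difference between the two rules in the diff, here is a small sketch. It assumes the range form means "the signature may start at any offset from 0 to 1024"; the helper names are hypothetical and this is not mimemagic's own matching code:

```ruby
PDF_MAGIC = "%PDF-"

# Old rule: the signature must sit exactly at offset 0.
def strict_pdf?(data)
  data.start_with?(PDF_MAGIC)
end

# New rule: the signature may start anywhere in offsets 0..1024,
# so the search window is 1025 start positions plus the signature tail.
def loose_pdf?(data)
  window = data.byteslice(0, 1025 + PDF_MAGIC.bytesize - 1)
  window.include?(PDF_MAGIC)
end

not_a_pdf = "plain text that merely mentions %PDF- somewhere"
puts strict_pdf?(not_a_pdf)
puts loose_pdf?(not_a_pdf)
```

Under that assumption, any file that merely mentions the signature near its start is accepted by the range rule but rejected by the offset-0 rule.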

Changelog and semantic versioning

Hello,

The paperclip gem now has a dependency on mimemagic, so it would be appreciated if the project could adopt semantic versioning and a changelog. This would help the maintainers of paperclip make the dependency less restrictive (currently =0.3.0) so that developers can better keep their project gems updated. I'm not affiliated with the paperclip gem, just bringing it to your attention.

Thank you

XML files are identified as binary

Files with this header

<?xml version="1.0" encoding="ISO-8859-1"?>

are identified as binary and not as XML files.

I am using version 0.3.0.

.xls file detected as application/x-ole-storage

I'm having an issue with an .xls file being detected as application/x-ole-storage when using by_magic:

> file -I test.xls
test.xls: application/vnd.ms-excel; charset=binary
> MimeMagic.by_magic(File.open("test.xls"))
=> #<MimeMagic:0x007fb76df72ea8 @type="application/x-ole-storage", @mediatype="application", @subtype="x-ole-storage">

> MimeMagic.by_path("test.xls")
=> #<MimeMagic:0x007fb76df98c98 @type="application/vnd.ms-excel", @mediatype="application", @subtype="vnd.ms-excel">

Is this a bug or how can I get MimeMagic to recognize the file as application/vnd.ms-excel ?

Filetypes with same extension not recognized

Is it somehow possible to handle different filetypes with the same extension but different magic?
We have this case in our application and we'd like to extend mimemagic to support it.
Before implementing anything, I'd like to know whether you have already investigated this or have any implementation ideas.

RTF files comment is blank

MimeMagic.by_magic(File.open('./test.rtf', 'r')) returns 'application/rtf' but it should also accept 'text/rtf', which, for example, is used by filestack...

The problem is that MimeMagic.new("text/rtf").comment returns "", which can cause some trouble.

I don't know if it's supposed to be this way so feel free to close the bug...

PS: I think this is related to #69 (which I also can reproduce)

Can't identify Photoshop PSD files with `by_magic`

I have a Photoshop PSD file and MimeMagic's by_magic method doesn't identify it.

pry(main)> require 'mimemagic'
true
pry(main)> f = File.open 'LBT-73-15-046_cuff_rings_blue.jpg.psd'
#<File:LBT-73-15-046_cuff_rings_blue.jpg.psd>
-rw-r--r--@ 1 andy  andy  85185  4 Aug 10:21 LBT-73-15-046_cuff_rings_blue.jpg.psd
pry(main)> MimeMagic.by_magic f
nil

A little debugging shows that magic_match_io is returning nil.

However by_extension does identify it:

pry(main)> MimeMagic.by_extension 'psd'
#<MimeMagic:0x007ff6242eaaf0 @type="image/vnd.adobe.photoshop", @mediatype="image", @subtype="vnd.adobe.photoshop">

I'm not familiar with how this gem is supposed to be used so please excuse any silly questions:

  • would you expect by_magic to identify the file here?
  • if not, would you expect callers to chain the various identification methods, e.g:
type = MimeMagic.by_magic(f) || MimeMagic.by_extension(f)

Many thanks in advance.
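
For what it's worth, the chaining idea from the second bullet can be sketched without the gem. PSD files begin with the four bytes 8BPS; the method names and the one-entry extension table below are hypothetical stand-ins, not mimemagic's API:

```ruby
PSD_TYPE = "image/vnd.adobe.photoshop"

# Hypothetical content-based lookup: only knows the PSD signature "8BPS".
def type_by_signature(data)
  PSD_TYPE if data.start_with?("8BPS")
end

# Hypothetical extension-based lookup with a single known extension.
def type_by_extension(path)
  { ".psd" => PSD_TYPE }[File.extname(path).downcase]
end

# Try the content first and fall back to the file name.
def detect_type(path, data)
  type_by_signature(data) || type_by_extension(path)
end

puts detect_type("LBT-73-15-046_cuff_rings_blue.jpg.psd", "no signature here")
```

Note that the extension lookup takes a path string rather than an open file, so the fallback needs the file name as well as the content.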

ZIP file detected as PDF

The attached ZIP file with 2 PDF files inside is recognised by MimeMagic 0.3.2 as application/pdf, while file -I recognises it correctly as application/zip.

Inspecting the ZIP file contents, it can be seen that the PK\003\004 token is present at the start, while %PDF- can be found in the contents.

So, according to the freedesktop.org.xml file, two rules would be applicable:

PDF:

<mime-type type="application/pdf">
    <_comment>PDF document</_comment>
    <acronym>PDF</acronym>
    <expanded-acronym>Portable Document Format</expanded-acronym>
    <generic-icon name="x-office-document"/>
    <magic priority="50">
      <match type="string" value="%PDF-" offset="0:1024"/>
    </magic>
    <glob pattern="*.pdf"/>
    <alias type="application/x-pdf"/>
    <alias type="image/pdf"/>
    <alias type="application/acrobat"/>
</mime-type>

ZIP:

<mime-type type="application/zip">
    <_comment>Zip archive</_comment>
    <alias type="application/x-zip-compressed"/>
    <alias type="application/x-zip"/>
    <generic-icon name="package-x-generic"/>
    <magic priority="40">
      <match type="string" value="PK\003\004" offset="0"/>
    </magic>
    <glob pattern="*.zip"/>
</mime-type>

The ZIP rule has priority 40, so I'm inferring – and correct me if I'm wrong – that it should have higher priority than the PDF rule (priority 50).

Looking at the generated table, though, the rules have inverse order:

This would explain the gem's behaviour. Is my reasoning correct? Might this be a bug in the table generation script?

sample.zip
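
On the priority question: as far as I understand the shared-mime-info specification, a larger priority value means the rule is tried earlier, so PDF (50) would be checked before ZIP (40), and the observed result would follow from the XML itself rather than from the table generator. A minimal sketch under that assumption; the Rule struct and the detect helper are hypothetical, not mimemagic's code:

```ruby
# Rules copied from the two XML snippets above; the struct is hypothetical.
Rule = Struct.new(:type, :priority, :offsets, :value)

RULES = [
  Rule.new("application/zip", 40, 0..0, "PK\003\004"),
  Rule.new("application/pdf", 50, 0..1024, "%PDF-")
].freeze

# True when the rule's value starts at one of the allowed offsets.
def rule_matches?(rule, data)
  length = rule.offsets.size + rule.value.bytesize - 1
  window = data.byteslice(rule.offsets.begin, length)
  !window.nil? && window.include?(rule.value)
end

# Assumption: rules with a larger priority value are tried first.
def detect(data, rules = RULES)
  rules.sort_by { |rule| -rule.priority }.find { |rule| rule_matches?(rule, data) }&.type
end

zip_with_pdf_inside = "PK\003\004" + ("\0" * 30) + "%PDF-1.4"
puts detect(zip_with_pdf_inside)
```

If that reading of the priorities is right, the fix would have to tighten the PDF offset range or special-case archives, not reorder the table.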

freedesktop.org.xml file license

I've historically been the maintainer of shared-mime-info for around 15 years, and script/freedesktop.org.xml looks like it's a copy of the database shipped with shared-mime-info, which is released under the GPL, with shared-mime-info's translators work merged in, and the GPL header removed.
The license that you're shipping mimemagic under (MIT) isn't compatible with shared-mime-info's.
There are a number of possibilities to fix this problem:

  • change the mimemagic license to be GPL compatible
  • parse the XML file that shared-mime-info ships at runtime, and don't ship it in a codebase with an incompatible license

Using a GPL file as a source makes your whole codebase a derived work, making it all GPL, so I think it's pretty important that this problem gets corrected before somebody uses it in a pure MIT codebase, or a closed-source application.

You will also need to re-add the GPL header to the shared-mime-info XML file as a matter of urgency. It was stripped in release tarballs by the tool used to merge translations, but is visible in the .in version of the same file.

Wrong mime for some files

  • test.ppt : detected as application/x-ole-storage; this should be application/vnd.ms-powerpoint
  • test.pps : detected as application/x-ole-storage; this should be application/vnd.ms-powerpoint
  • test.ppsx : detected as application/vnd.openxmlformats-officedocument.presentationml.presentation; this should be application/vnd.openxmlformats-officedocument.presentationml.slideshow
  • test.xls : detected as application/x-ole-storage; this should be application/vnd.ms-excel

Can someone explain how to write and add a custom MIME type? In particular, I don't understand what the first part means (e.g. [0, "PK\003\004"...).
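
A possible reading of such an entry, sketched in plain Ruby: the first element is the byte offset (or a range of allowed start offsets) and the second is the byte string expected there. The optional third element is assumed here to be a list of nested entries, at least one of which must also match. The entry_matches? helper is hypothetical and only mirrors that assumed shape:

```ruby
# Assumed entry shape: [offset, value, children].
# `offset` is an Integer (exact position) or a Range of start positions.
def entry_matches?(data, entry)
  offset, value, children = entry
  first = offset.is_a?(Range) ? offset.begin : offset
  span  = offset.is_a?(Range) ? offset.size : 1
  window = data.byteslice(first, span + value.bytesize - 1)
  return false if window.nil? || !window.include?(value)

  # Assumption: when children are present, at least one must match too.
  children.nil? || children.any? { |child| entry_matches?(data, child) }
end

zip_entry = [0, "PK\003\004"]
puts entry_matches?("PK\003\004rest of archive", zip_entry)
```

Read this way, [0, "PK\003\004"] means "the four bytes P, K, 0x03, 0x04 at the very start of the file", which is the ZIP local file header signature.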

Misidentifies jpeg as "audio/vnd.dts.hd"

I have 10 photos.
file and identify (imagemagick) think all are jpegs, however this gem classifies one of them incorrectly.

I used the "magic" found here:
https://github.com/minad/mimemagic/blob/v0.3.0/lib/mimemagic/tables.rb#L1487
to verify:

$ for i in *.jpg; { echo $i; file $i; identify $i; head -c 18725 $i | grep -Eo 'dX %'; }                                                                                                                        
1470445_01.jpg
1470445_01.jpg: JPEG image data, JFIF standard 1.02
1470445_01.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 85.6KB 0.000u 0:00.000
1470445_02.jpg
1470445_02.jpg: JPEG image data, JFIF standard 1.02
1470445_02.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 74.6KB 0.000u 0:00.000
1470445_03.jpg
1470445_03.jpg: JPEG image data, JFIF standard 1.02
1470445_03.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 73.4KB 0.000u 0:00.000
1470445_04.jpg
1470445_04.jpg: JPEG image data, JFIF standard 1.02
1470445_04.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 69.6KB 0.000u 0:00.000
1470445_05.jpg
1470445_05.jpg: JPEG image data, JFIF standard 1.02
1470445_05.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 82.5KB 0.000u 0:00.000
Binary file (standard input) matches
1470445_06.jpg
1470445_06.jpg: JPEG image data, JFIF standard 1.02
1470445_06.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 68.4KB 0.000u 0:00.000
1470445_07.jpg
1470445_07.jpg: JPEG image data, JFIF standard 1.02
1470445_07.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 49.6KB 0.000u 0:00.000
1470445_08.jpg
1470445_08.jpg: JPEG image data, JFIF standard 1.02
1470445_08.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 30KB 0.000u 0:00.000
1470445_09.jpg
1470445_09.jpg: JPEG image data, JFIF standard 1.02
1470445_09.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 35.9KB 0.000u 0:00.000
1470445_10.jpg
1470445_10.jpg: JPEG image data, JFIF standard 1.02
1470445_10.jpg JPEG 800x600 800x600+0+0 8-bit DirectClass 55.2KB 0.000u 0:00.000

Release a new version

@minad would you mind releasing a new version? I'd like to make a PR against the paperclip library to use your gem.

doesn't sense all tarfiles?

MimeMagic.by_magic isn't able to sense either file on this page as a tarfile: http://www.vim.org/scripts/script.php?script_id=1729

However, both tar and the Unix file utility like them.

$ wget http://www.vim.org/scripts/download_script.php?src_id=6516 -O tt.tar
$ file tt.tar      => tt.tar: tar archive

in irb:

require 'open-uri'
require 'mimemagic'
x = open('http://www.vim.org/scripts/download_script.php?src_id=6516')
MimeMagic.by_magic(x)    => nil
x = open('http://www.vim.org/scripts/download_script.php?src_id=2877')
MimeMagic.by_magic(x)    => application/x-tar 

(the last 2 are just to prove that mimemagic does sense other tarfiles)

Just fyi, mimemagic is serving me great.

Yanking previous versions on Rubygems

Hi @minad, Thank you for maintaining mimemagic!

I have seen builds suddenly failing because mimemagic versions prior to 0.3.5 have been removed from rubygems.org.

Is there a security issue that we should be aware of that required these yanks?

Add support for BMP

Hello,

MimeMagic.by_magic(bmp_file) returns an invalid/invalid content_type,
whereas file --mime-type -b bmp_file returns image/bmp.

I've found a workaround by doing this:

[ [ 'image/bmp',
    { magic: [[0, "BM", [[0..2, 'BM']]]], extensions: %w[bmp], parents: %w[], comment: 'BMP' } ]
].each do |magic|
  MimeMagic.add(*magic)
end

(Could someone check the validity of this for me?)

If you (fellow GitHub user) encounter invalid/invalid content_type issues, I found the signature I needed on Wikipedia:
https://en.wikipedia.org/wiki/List_of_file_signatures

MimeMagic objects can't group_by or be used as hash keys

MimeMagic objects that represent the same type return different .hash values, and aren't equal using .eql?.

This means you can't use them as hash keys, and hence can't group_by them.

This simple monkeypatch fixes it:

class MimeMagic
  def hash; type.hash; end
  # Guard against comparing with objects that are not MimeMagic instances.
  def eql?(other); other.is_a?(MimeMagic) && type == other.type; end
end
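
The effect can be checked without the gem using a stand-in class. FakeMime below is hypothetical and only carries a type attribute; it is not MimeMagic itself:

```ruby
# Hypothetical stand-in for MimeMagic with value-based hashing.
class FakeMime
  attr_reader :type

  def initialize(type)
    @type = type
  end

  # Hash and eql? both delegate to the type string, so equal types collide
  # on purpose and can share one hash key.
  def hash
    type.hash
  end

  def eql?(other)
    other.is_a?(FakeMime) && type == other.type
  end
end

types = [FakeMime.new("image/png"), FakeMime.new("image/png"), FakeMime.new("text/csv")]
groups = types.group_by { |t| t }
puts groups.size
```

With both methods defined, the two image/png objects land in the same group; with only the default Object implementations, each object would form its own group.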

Yanked gem versions break rails install

I noticed that all but the most recent version of this gem got yanked from rubygems fairly recently.

activestorage (part of rails) depends on mimemagic ~> 0.3.2, so I think this could break a lot of people's deployments!
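
To see which releases that constraint accepts, RubyGems' own requirement class can be queried directly. The allowed? helper and the version numbers tried are just illustrative:

```ruby
require "rubygems"

# True when `version` satisfies the given RubyGems constraint.
def allowed?(version, constraint = "~> 0.3.2")
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

%w[0.3.2 0.3.5 0.4.0].each do |version|
  puts "#{version}: #{allowed?(version)}"
end
```

The pessimistic operator admits any 0.3.x release from 0.3.2 upward but rejects 0.4.0, so Bundler cannot resolve the dependency once no 0.3.x version remains available.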

Publish gem changes to rubygems

Hi @minad,

Could you please release a new version? I am using paperclip and I happened to stumble upon a png that contained "xl" in its first 2K characters...thus mistaking it for "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" instead of "image/png". New rules are fine from my usage.

Thanks

Doesn't recognize Apple .pages and some .docx

Hi there, when I submit an Apple .pages file or certain .docx files, it comes back as application/zip despite including mimemagic/overlay. Any ideas why? Attached are the files in question.
unknown_files.zip

edit: also some extra context, the docx file was created by opening up a Google doc and selecting 'download as docx'.

Text files not recognised by magic

Somehow plaintext files are not recognised:

irb>> MimeMagic::VERSION
=> "0.3.2"
irb>> file = '/tmp/foo'
=> "/tmp/foo"
irb>> File.write(file, 'This is a text file')
=> 19
irb>> MimeMagic.by_magic(file)
=> nil

Add Outlook .msg files

mime type: application/vnd.ms-outlook
extension: .msg
possible magic bytes: D0 CF 11 E0 A1 B1 1A E1
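
One caveat, sketched below: those eight bytes are the generic OLE2 compound file header, which legacy .doc, .xls and .ppt files share as well, so on their own they can only say "OLE container", not "Outlook message". The ole_container? helper is hypothetical:

```ruby
# Header signature of an OLE2 compound file (shared by several Office formats).
OLE_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b

# True when the data begins with the compound file signature.
def ole_container?(data)
  data.b.start_with?(OLE_SIGNATURE)
end

sample = OLE_SIGNATURE + ("\0" * 8).b
puts ole_container?(sample)
```

Telling .msg apart from other OLE files would need a second check inside the container, or a fallback to the file extension.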

MP4 audio file is incorrectly recognized as video/mp4

I have an MP4 file which contains an audio stream only, but is wrongly recognized as video/mp4.

MimeMagic.by_magic(File.open('spec/fixtures/files/audio.mp4'))
=> #<MimeMagic:0x00007f96de9317d8 @type="video/mp4", @mediatype="video", @subtype="mp4">

The file command fails, too:

$ file -I spec/fixtures/files/audio.mp4
spec/fixtures/files/audio.mp4: video/mp4; charset=binary

But ffprobe identifies it correctly as audio file:

$ ffprobe spec/fixtures/files/audio.mp4
ffprobe version 4.3.1 Copyright (c) 2007-2020 the FFmpeg developers
  built with Apple clang version 12.0.0 (clang-1200.0.32.28)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1_9 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'spec/fixtures/files/audio.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2021-01-09T17:02:28.000000Z
    com.android.version: 8.1.0
  Duration: 00:03:59.76, start: 0.000000, bitrate: 165 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 163 kb/s (default)
    Metadata:
      creation_time   : 2021-01-09T17:02:28.000000Z
      handler_name    : SoundHandle

What is going wrong here? Can MimeMagic be fixed to recognize this file correctly?

Release a new version with PR #39

XLSX files saved by Google Spreadsheet and OpenOffice/LibreOffice are being recognized as application/zip in version 0.3.3.

While checking the PRs and issues, I found PR #39, which fixed (or tried to fix) this problem. It is already merged into master but not yet released.
Please release a new version with this fix.

Docx recognition

Docx is recognized as application/zip. I think this may be correct due to file structure, but shouldn't it return application/vnd.openxmlformats-officedocument.wordprocessingml.document?

Changing bacon to rspec

Right now this project uses bacon, a small RSpec clone.
The problem is that development of the bacon project has stopped [1].
I wanted to fix an issue related to that in the past, but I could not.

So I think changing this project's testing framework from bacon to RSpec would be better.
If you like the idea, I can contribute to that.

[1] leahneukirchen/bacon#32

Wrong Website URL for the project.

Thanks for this useful library!
The website URL at the top of this repository seems to be incorrect.

The current settings are as follows:

Mime type detection in ruby via file extension or file content 
https://rdoc.info/github/minad/mimemagic/frames/file/README.md

After clicking on the above website, I saw a 404 error page. ("We Couldn't Find That Page")

As is: https://rdoc.info/github/minad/mimemagic/frames/file/README.md
Maybe: https://rdoc.info/github/minad/mimemagic

I hope this will be of some help.

Zip files with pdfs unexpectedly returning application/pdf instead of application/zip

Explanation of the problem

  • I have a .zip file which contains .pdf files and .dwg files.

Expected Behaviour

  • MimeMagic.by_magic(File.read(path_to_zip_file)).to_s should return application/zip.

Actual Behaviour

  • MimeMagic.by_magic(File.read(path_to_zip_file)).to_s returns application/pdf. This is completely unexpected. It should return zip, instead of pdf.

Reproducible test

Any pointers in the right direction would be much appreciated. I'd be happy to make a PR.

chrs

Ben

test/files/images.* considered non free

Hi,

The files test/files/images.* used in the tests come from https://commons.wikimedia.org/wiki/File:Phalaenopsis_%28aka%29.jpg, which is licensed under CC-BY-SA-2.5. This license is considered non-free by the Debian project, and because of that, your project cannot be packaged and distributed as is in this GNU/Linux distribution (and others).

Since the content of the image in itself does not seem to play a role, would you consider using instead an image licensed under CC-BY-SA-3.0 or CC-BY-SA-4.0, which is considered free?

Thanks in advance!

Cédric

Adobe Photoshop files are not detected correctly by magic

It seems PSD files are not detected correctly.

Minimal reproduction steps:

  1. Unzip attached archive
  2. Run the following code
MimeMagic.by_magic(File.open('arrows.psd')) # => nil

Expected:
The code above should return image/vnd.adobe.photoshop.

Actual:
The code above returns nil.

Tested with mimemagic-0.3.2, macOS Sierra.

arrows.zip

MimeMagic.by_magic returns nil for Javascript file

Hi,

I was trying to detect the type for a simple js file and this is what I'm getting:

2.1.2 :028 > @filename = "/home/deployer/ajax.js"
 => "/home/deployer/ajax.js"
2.1.2 :029 > MimeMagic.by_magic(File.open(@filename)).try(:type)
 => nil

Here is more information about the file:

[deployer@production ~]$ ls -la ajax.js
-rw-rw-r--. 1 deployer deployer 316 Dec 22 14:35 ajax.js
[deployer@production ~]$ pwd
/home/deployer
[deployer@production ~]$ cat ajax.js
jQuery(document).ready(function(){

  $("#cart-form form").bind('ajax:success', function(data, response, status, xhr) {
    console.log(xhr.responseText);
    console.log("Great Success!");
  });

  $("#cart-form form").bind('ajax:error', function(data, response, status) {
    console.log("Problem! ");
  });

});

I expected it to return something like "application/javascript", "application/x-javascript" or "text/plain"

This is what I get if I use file:

[deployer@production ~]$ file -b --mime-type ajax.js
text/plain

It seems to be working fine for other extensions:

2.1.2 :024 > @filename = "/home/deployer/prune-var-backups.sh"
 => "/home/deployer/prune-var-backups.sh"
2.1.2 :025 > MimeMagic.by_magic(File.open(@filename)).try(:type)
 => "application/x-shellscript"
2.1.2 :026 > @filename = "/home/deployer/cerdo.jpg"
 => "/home/deployer/cerdo.jpg"
2.1.2 :027 > MimeMagic.by_magic(File.open(@filename)).try(:type)
 => "image/jpeg"

Do you know what could be wrong?

Yanked 0.3.x breaks Rails install

Hey @minad - I see you've addressed a license issue today by yanking all gem versions prior to 0.4.0. Trouble is, Rails itself depends on 0.3.x so this is breaking all CI installs of Rails for me (and probably others too)!

I know this is on Rails to fix and I'll make sure the issue is raised over there too but is there any chance of releasing a version 0.3.x with the correct license while we wait? No worries if not.

Word doc detection limited to 5kb

Hi there,

Looks like the overlay setup only looks in the first 5kb for the signature strings, whereas a sample I have has [Content_Types].xml in the 30,000s.

Is there a particular reason for choosing 5000?

Doesn't detect XLSX when generated by OpenOffice/Googledocs

I just used this gem to detect files by content, and it works perfectly.

The only thing I noticed is that the overlay feature for xlsx and docx only works for files generated by Microsoft Office. When they are exported by Google Docs or OpenOffice, they are not detected properly.

# Stunden.xlsx is exported from google docs
MimeMagic.by_magic(File.open('Stunden.xlsx'))                                                              
 => #<MimeMagic:0x00000003401ea8 @type="application/zip", @mediatype="application", @subtype="zip"> 

# ex03_Grade Report.xlsx saved by Microsoft Excel 
MimeMagic.by_magic(File.open('ex03_Grade Report.xlsx'))                                                      
 => #<MimeMagic:0x000000033ea230 @type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", @mediatype="application", @subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet">

Misdetection of CSV file mime-type

I'm using the paperclip gem, which uses the mimemagic gem for MIME type detection. I have a problem with MIME type detection for a CSV file. The content of the file looks like this:

candidate_id,email
1,[email protected]

The MIME type of a file with such content is misdetected as image/x-quicktime because of this rule: https://github.com/minad/mimemagic/blob/master/lib/mimemagic/tables.rb#L1717

As we can see, the rule returns image/x-quicktime if the file has the idat pattern at byte offset 4. Can we fix this somehow?

For now as a temporary workaround I have just removed this type in initializer file

MimeMagic.remove('image/x-quicktime')
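
The collision is easy to verify by hand: the header word candidate contains idat starting at byte offset 4. A tiny check, with a hypothetical helper name:

```ruby
# Returns the `length` bytes found at `offset` in `data`.
def bytes_at(data, offset, length)
  data.byteslice(offset, length)
end

header = "candidate_id,email"
puts bytes_at(header, 4, 4)
```

Any CSV whose first column name starts with "candidate" would hit the same rule, which is why a four-byte match at a fixed offset is weak evidence on its own.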

by_magic identifies a PDF as spreadsheetml

We've found that a few PDFs we generated get misidentified as application/vnd.openxmlformats-officedocument.spreadsheetml.sheet by MimeMagic.by_magic.

It seems to be because of the pattern in overlay.rb that checks if xl/ appears in the first 5000 bytes.

I can't attach the whole pdf as it contains private information but this is the first 5000 bytes head.pdf

Perhaps it should prefer explicit file-start signatures before falling back to guessing?
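
One way to picture that suggestion, as a sketch only: require the ZIP signature at offset 0 before looking for xl/ in the first 5000 bytes. The looks_like_xlsx? helper is hypothetical and is not how overlay.rb is written:

```ruby
ZIP_MAGIC = "PK\003\004"

# Hypothetical two-step check: the container signature first, then the
# "xl/" path somewhere within the first 5000 bytes.
def looks_like_xlsx?(data)
  head = data.byteslice(0, 5000).b
  return false unless head.start_with?(ZIP_MAGIC)

  head.include?("xl/")
end

pdf_head = "%PDF-1.4\n... /Subtype /Image ... xl/ ..."
puts looks_like_xlsx?(pdf_head)
```

A PDF that merely contains xl/ fails the first step, while a genuine xlsx archive still passes as long as the path appears within the window.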
