Comments (11)

Sicos1977 commented on June 12, 2024

I saw that you fixed it yourself 👍 ... can you tell me what the problem was?

Greetings,
Kees van Spelde

from ifiltertextreader.

Sicos1977 commented on June 12, 2024

I published a new version (1.5.3) to NuGet.

andreas-eriksson commented on June 12, 2024

I'm not sure why the error occurs. Some suggestions indicate that the installed filters could be corrupt, but it happens on my test machine as well.

I am hoping that the fix will make the code work for a few more legacy formats.

Thanks :)

Sicos1977 commented on June 12, 2024

Is it possible to send me the old xls file so that I can investigate it some more? If so then send it to [email protected]

Also, if you want to do really advanced things with extracting data from files, have a look at Tika (https://tika.apache.org/). There is also a .NET port that is generated with IKVM (https://github.com/KevM/tikaondotnet). It's not that iFilters aren't any good, but Tika supports a wider range of file types. I have to do everything myself for the iFilters, while Tika has an Apache team with more developers behind it. It's just a time management problem :-)

andreas-eriksson commented on June 12, 2024

Mail sent.

Thanks for the info, I will definitely investigate Tika.

Sicos1977 commented on June 12, 2024

Also, just to satisfy my own curiosity... what are you using my library for?

andreas-eriksson commented on June 12, 2024

It's used to extract text from documents and then make them searchable with Lucene.

Sicos1977 commented on June 12, 2024

Another thing: you can also use the Java version of Tika. It has a web interface that can be called from .NET. It just depends on what you prefer. I myself prefer .NET over Java.
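
For illustration, here is a minimal sketch of calling that web interface over plain HTTP. It assumes a Tika server is already running locally on its default port (9998) and uses its `PUT /tika` endpoint with an `Accept: text/plain` header. The function names `build_tika_request` and `extract_text` are made up for this example. It is written in Python for brevity; the same request can be sent from .NET with any HTTP client.

```python
import urllib.request

# Assumption: a Tika server is running locally on its default port (9998).
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(data: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that asks Tika for the document's plain text."""
    return urllib.request.Request(
        url,
        data=data,                         # raw bytes of the document
        method="PUT",
        headers={"Accept": "text/plain"},  # request plain text, not XHTML
    )


def extract_text(path: str, url: str = TIKA_URL) -> str:
    """Send the file at `path` to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `extract_text("report.xls")` (a hypothetical file name) would return the text that Tika extracts, which could then be handed to an indexer such as Lucene.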

andreas-eriksson commented on June 12, 2024

Me too.

Tika sure looks interesting, especially since it doesn't seem to have any other dependencies. Would be nice if users didn't have to install Office.

Sicos1977 commented on June 12, 2024

You don't have to install Office for my iFilter library either. There is an iFilter package for it. You can find it over here --> https://www.microsoft.com/en-us/download/details.aspx?id=17062

Sicos1977 commented on June 12, 2024

I also made an MSGReader library to extract information from MSG files. It has no iFilter support, since that is kind of difficult to build in .NET, but with some coding you can probably make it work. You can find it over here --> https://github.com/Sicos1977/MSGReader. Other "extracting" libraries can be found over here --> https://github.com/Sicos1977/OfficeExtractor and https://github.com/Sicos1977/VCardReader.

OfficeExtractor extracts embedded OLE objects from Office files... like an Excel attachment inside a Word document.
