Comments (11)
I saw that you fixed it yourself 👍 ... can you tell me what the problem was?
Greetings,
Kees van Spelde
from ifiltertextreader.
I published a new version (1.5.3) to nuget.
I'm not sure why the error occurs. Some suggestions seem to indicate that installed filters could be corrupt but it happens on my test machine as well.
I am hoping that the fix will make the code work for a few more legacy formats.
Thanks :)
Is it possible to send me the old xls file so that I can investigate it some more? If so then send it to [email protected]
Also, if you want to do more advanced extraction of data from files, have a look at Tika (https://tika.apache.org/). There is also a .NET port generated with IKVM (https://github.com/KevM/tikaondotnet). It's not that iFilters aren't any good, but Tika supports a wider range of file formats. I have to do everything for the iFilters myself, whereas Tika has an Apache team with more developers behind it. It's just a time management problem :-)
Mail sent.
Thanks for the info, I will definitely investigate Tika.
Also, just to satisfy my own curiosity... what are you using my library for?
It's used to extract text from documents and then make them searchable with Lucene.
One more thing: you can also use the Java version of Tika. It has a web interface that can be called from .NET. It just depends on what you prefer. I myself prefer .NET over Java.
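To make the web interface idea concrete, here is a minimal sketch of sending a file to a Tika server over HTTP and getting plain text back. It is written in Python only for brevity; the same PUT request can be made from .NET with HttpClient. The assumptions are that a Tika server is already running locally on its default port 9998 and that the `/tika` endpoint returns plain text when asked via the `Accept` header. The function names `build_tika_request` and `extract_text` are hypothetical helpers, not part of Tika or of this library.

```python
import urllib.request

# Assumption: a Tika server is running locally on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document: bytes, url: str = TIKA_URL) -> urllib.request.Request:
    """Build a PUT request that asks the Tika server for plain text."""
    return urllib.request.Request(
        url,
        data=document,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path: str, url: str = TIKA_URL) -> str:
    """Send the file at `path` to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        request = build_tika_request(f.read(), url)
    # This call needs a reachable Tika server; it raises URLError otherwise.
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The extracted string could then be fed into an indexer such as Lucene, in the same way as the text returned by an iFilter.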
Me too.
Tika sure looks interesting, especially since it doesn't seem to have any other dependencies. Would be nice if users didn't have to install Office.
You don't have to install Office for my iFilter library either. There is an iFilter package for it, which you can find here: https://www.microsoft.com/en-us/download/details.aspx?id=17062
I also made an MSGReader library to extract information from MSG files. It has no iFilter support, since that is rather difficult to build in .NET, but with some coding you can probably make it work. You can find it here: https://github.com/Sicos1977/MSGReader. Other "extracting" libraries can be found here: https://github.com/Sicos1977/OfficeExtractor and https://github.com/Sicos1977/VCardReader.
OfficeExtractor extracts embedded OLE objects from Office files, such as an Excel attachment inside a Word document.