Thank you for opening this. I'm not sure that it fits within the current design goals of Tika, but there may be ways forward. I deeply respect sleuthkit and would be interested in pursuing whatever we can to work together. I may also misunderstand your use case and this PR, so please bear with me.

My major concern is that Tika is intended to process individual files one at a time. Even with a single large docx or PDF, Tika can run out of memory. If we treat an entire filesystem as a single file (with embedded files, obviously), I think we're headed for serious problems.

There are two ways I could see some kind of integration point with Tika.
The above is all high-level. At a lower level, I'm concerned about platform-dependent binary code in Tika. We definitely have it in