SUNScholar/Media Filters/Text Extraction
Step 1
Check the following settings in the "dspace.cfg" file:
#Custom settings for PDFFilter
# If true, all PDF extractions are written to temp files as they are indexed...this
# is slower, but helps ensure that PDFBox software DSpace uses doesn't eat up
# all your memory
#pdffilter.largepdfs = true
# If true, PDFs which still result in an Out of Memory error from PDFBox
# are skipped over...these problematic PDFs will never be indexed until
# memory usage can be decreased in the PDFBox software
pdffilter.skiponmemoryexception = true
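The check above can also be scripted. The following is a minimal Python sketch, not part of DSpace, that reports which of the two PDF filter settings are active (uncommented) in a "dspace.cfg"-style text. The function names and the SAMPLE text are hypothetical, and the parser is an assumption that only handles simple "key = value" lines; it ignores line continuations and other Java properties features.

```python
# Hypothetical helper: reports the PDF filter settings found in dspace.cfg text.
PDF_KEYS = ("pdffilter.largepdfs", "pdffilter.skiponmemoryexception")


def read_settings(text):
    """Collect uncommented `key = value` lines into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (including commented-out settings).
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_pdf_filter(text):
    """Return each PDF filter key with its value, or None if it is not active."""
    settings = read_settings(text)
    return {name: settings.get(name) for name in PDF_KEYS}


# Sample text mirroring the settings shown in Step 1.
SAMPLE = """\
#Custom settings for PDFFilter
#pdffilter.largepdfs = true
pdffilter.skiponmemoryexception = true
"""

if __name__ == "__main__":
    for name, value in check_pdf_filter(SAMPLE).items():
        print(name, "=", value if value is not None else "(not set)")
```

To check a real installation, read the contents of your own "dspace.cfg" file and pass the text to check_pdf_filter instead of SAMPLE.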
Step 2
Enable daily media filter jobs. See link below.
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Daily_Admin