SUNScholar/Media Filters/Text Extraction

From Libopedia

Step 1

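Check the following PDF text extraction settings in the "dspace.cfg" file. A line that starts with "#" is a comment, so a setting only takes effect once its leading "#" is removed: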

#Custom settings for PDFFilter
# If true, all PDF extractions are written to temp files as they are indexed...this
# is slower, but helps ensure that PDFBox software DSpace uses doesn't eat up
# all your memory
#pdffilter.largepdfs = true
# If true, PDFs which still result in an Out of Memory error from PDFBox
# are skipped over...these problematic PDFs will never be indexed until
# memory usage can be decreased in the PDFBox software
pdffilter.skiponmemoryexception = true
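
To confirm which of these settings are actually active, a small script can help. The following is only a minimal sketch in Python, not part of DSpace: the function name read_cfg_settings and the SAMPLE text are made up for illustration, and the parser is deliberately simplified (it handles "key = value" lines and "#" comments only, not every feature of the Java properties format).

```python
def read_cfg_settings(text, keys):
    """Return {key: value} for active settings.

    Keys that are missing or commented out with "#" map to None.
    Simplified parser: only "key = value" lines and "#" comments.
    """
    found = {key: None for key in keys}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        name = name.strip()
        if name in found:
            found[name] = value.strip()
    return found


# Hypothetical sample mirroring the excerpt above.
SAMPLE = """\
#pdffilter.largepdfs = true
pdffilter.skiponmemoryexception = true
"""

settings = read_cfg_settings(
    SAMPLE, ["pdffilter.largepdfs", "pdffilter.skiponmemoryexception"]
)
for name, value in settings.items():
    print(name, "=", value if value is not None else "(not set or commented out)")
```

To check a real installation, pass the contents of your own "dspace.cfg" file (read with open(path).read(), where path is wherever your installation keeps the file) instead of SAMPLE.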

Step 2

Enable daily media filter jobs. See link below.