SUNScholar/Media Filters/Text Extraction


Step 1

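Check the following PDFFilter settings in the "dspace.cfg" file. Note that "pdffilter.largepdfs" is commented out below, so it has no effect unless the leading "#" is removed: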

#Custom settings for PDFFilter
# If true, all PDF extractions are written to temp files as they are indexed...this
# is slower, but helps ensure that PDFBox software DSpace uses doesn't eat up
# all your memory
#pdffilter.largepdfs = true
# If true, PDFs which still result in an Out of Memory error from PDFBox
# are skipped over...these problematic PDFs will never be indexed until
# memory usage can be decreased in the PDFBox software
pdffilter.skiponmemoryexception = true
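To confirm which of these settings are actually active, the short Python sketch below reads the text of dspace.cfg and reports each PDFFilter key. It assumes simple "key = value" lines with "#" marking comment lines; it does not handle line continuations or variable substitution, and the helper names are hypothetical, not part of DSpace.

```python
from __future__ import annotations

# The two PDFFilter keys discussed in Step 1.
PDF_KEYS = ("pdffilter.largepdfs", "pdffilter.skiponmemoryexception")


def parse_cfg(text: str) -> dict[str, str]:
    """Return active key/value pairs, skipping blank and commented-out lines.

    Assumption: "key = value" syntax only; the value may itself contain "=".
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        settings[key.strip()] = value.strip()
    return settings


def pdf_filter_report(text: str) -> dict[str, str | None]:
    """Map each PDFFilter key to its active value, or None if unset or commented out."""
    settings = parse_cfg(text)
    return {key: settings.get(key) for key in PDF_KEYS}
```

For example, `pdf_filter_report(Path("/dspace/config/dspace.cfg").read_text())` (the path is a hypothetical install location) would return `None` for "pdffilter.largepdfs" with the settings shown above, because that line is commented out.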

Step 2

Enable the daily media filter jobs, as described on the page linked below.

http://wiki.lib.sun.ac.za/index.php/SUNScholar/Daily_Admin
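The daily job runs DSpace's "filter-media" command, which applies the configured media filters, including PDF text extraction. The Python sketch below only builds a crontab entry for that command; the install directory and the 01:00 schedule are assumptions, and the linked Daily_Admin page remains the authoritative setup for SUNScholar.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def filter_media_command(dspace_dir: str) -> list[str]:
    """Command list that runs DSpace's media filters from the given install directory.

    dspace_dir is an assumption (e.g. "/dspace"); adjust it to the real installation.
    """
    return [str(PurePosixPath(dspace_dir) / "bin" / "dspace"), "filter-media"]


def crontab_line(dspace_dir: str, hour: int = 1, minute: int = 0) -> str:
    """One daily crontab entry for filter-media; the default time (01:00) is hypothetical."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * * " + " ".join(filter_media_command(dspace_dir))
```

The generated line would be added to the crontab of the account that owns the DSpace installation, so the extracted text files are written with the right permissions.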

News

http://onetransistor.blogspot.co.za/2015/12/ocr-searchable-pdf-linux.html
