Difference between revisions of "SUNScholar/Media Filters/5.X"

From Libopedia
 
(70 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<center>
 
<center>
  '''[[SUNScholar/Media Filters|Back to Media Filters]]'''
+
  '''[[SUNScholar/Media_Filters/Thumbnails|Back to Thumbnails]]'''
 
</center>
 
</center>
  
==<font color="red">'''''PLEASE NOTE''''':</font>==
+
==<font color="red">'''PLEASE NOTE''':</font>==
The media filters have changed by incorporating the use of ImageMagick and Ghostscript. See the link below for details.
+
*The media filters have changed by incorporating the use of ImageMagick and Ghostscript. See the link below for details about enabling media filters.
*https://wiki.duraspace.org/display/DSDOC5x/ImageMagick+Media+Filters
+
https://wiki.duraspace.org/display/DSDOC5x/ImageMagick+Media+Filters
 +
*After a while we noticed our server load increasing radically when doing the nightly media-filter jobs.
 +
*We isolated the problem to the "Branded Preview JPEG" filter.
 +
*This filter has been disabled as these branded previews are not important to us.
  
==Requirements==
+
==Step 1 - Install the Ubuntu software packages==
Check the following and then return.
 
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Install_DSpace/S03#Step_3.2
 
 
 
==Step 1 - Login to the server==
 
http://wiki.lib.sun.ac.za/index.php/SUNScholar/Prepare_Ubuntu/S01
 
 
 
<font color="red">
 
'''Complete ALL of the following as the "dspace" user!'''
 
</font>
 
 
 
==Step 2 - Install the Ubuntu software packages==
 
 
Type the following:
 
Type the following:
  sudo apt-get install xpdf poppler-utils
+
  sudo apt-get install imagemagick ghostscript
  
==Step 3 - Install the java packages==
+
==Step 2 - Configuration==
===Step 3A - Install "jai_imageio.jar"===
+
Edit the ''"dspace.cfg"'' file.
  mkdir $HOME/temp
+
  nano $HOME/{{Source}}/dspace/config/dspace.cfg
 
+
===Enable===
cd $HOME/temp
+
Search for following and change to true:
 
+
  webui.browse.thumbnail.show = true
curl -O http://download.java.net/media/jai-imageio/builds/release/1.1/jai_imageio-1_1-lib-linux-i586.tar.gz
+
  webui.item.thumbnail.show = true
 
+
webui.preview.enabled = true
  tar -xzvf jai_imageio-1_1-lib-linux-i586.tar.gz
 
 
 
<pre>
 
  mvn install:install-file \
 
                    -Dfile=jai_imageio-1_1/lib/jai_imageio.jar  \
 
                    -DgroupId=com.sun.media                    \
 
                    -DartifactId=jai_imageio                    \
 
                    -Dversion=1.0_01                            \
 
                    -Dpackaging=jar                            \
 
                    -DgeneratePom=true
 
</pre>
 
  
===Step 3B - Install "jai_core.jar"===
+
===Dimensions===
mkdir $HOME/temp
+
Check the value for ''thumbnail.maxwidth'' and that it corresponds to the size you want for preview images for the UI.
  
cd $HOME/temp
 
 
wget --no-check-certificate https://m2.duraspace.org/content/repositories/thirdparty/org/fcrepo/jai_core/1.1.2_01/jai_core-1.1.2_01.jar
 
 
<pre>
 
mvn install:install-file \
 
                    -Dfile=jai_core-1.1.2_01.jar  \
 
                    -DgroupId=javax.media                      \
 
                    -DartifactId=jai_core                      \
 
                    -Dversion=1.1.2_01                        \
 
                    -Dpackaging=jar                            \
 
                    -DgeneratePom=true
 
</pre>
 
 
==Step 4 - Configuration==
 
===Step 4A===
 
First, be sure there is a value for ''thumbnail.maxwidth'' and that it corresponds to the size you want for preview images for the UI.
 
 
Edit the ''"dspace.cfg"'' file.
 
nano $HOME/source/config/dspace.cfg
 
 
Search for the following and modify.
 
Search for the following and modify.
 
<pre>
 
<pre>
Line 72: Line 33:
 
</pre>
 
</pre>
  
===Step 4B===
+
===Filters===
Search for "filter.plugins" and replace with the following.
+
Enable filters as follows:
 
<pre>
 
<pre>
filter.plugins = \
+
#Names of the enabled MediaFilter or FormatFilter plugins
        PDF Text Extractor, \
+
filter.plugins = PDF Text Extractor, HTML Text Extractor, Word Text Extractor, \
        PDF Thumbnail, \
+
                PowerPoint Text Extractor, \
        HTML Text Extractor, \
+
                Branded Preview JPEG, \
        Word Text Extractor, \
+
                ImageMagick Image Thumbnail, ImageMagick PDF Thumbnail
        PowerPoint Text Extractor, \  
 
        JPEG Thumbnail, \
 
        Branded Preview JPEG
 
 
</pre>
 
</pre>
  
===Step 4C===
+
===Names===
Change the MediaFilter plugin configuration to remove the old ''"org.dspace.app.mediafilter.PDFFilter"'' and add the new filters ''"org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor"'' and ''"org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail"''. Replace with the following.
+
Assign names for filters as follows:
 
 
 
<pre>
 
<pre>
 +
#Assign 'human-understandable' names to each filter
 
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
 
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
   org.dspace.app.mediafilter.XPDF2Text = PDF Text Extractor, \
+
   org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor, \
  org.dspace.app.mediafilter.XPDF2Thumbnail = PDF Thumbnail, \
 
 
   org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
 
   org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
 
   org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
 
   org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
 
   org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
 
   org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
   org.dspace.app.mediafilter.JPEGFilter = JPEG Thumbnail, \
+
   org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
   org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG
+
  org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter = ImageMagick Image Thumbnail, \
 +
   org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter = ImageMagick PDF Thumbnail
 +
</pre>
 +
===Input Formats===
 +
Assign MIME file types to media filters as follows:
 +
<pre>
 +
#Configure each filter's input format(s)
 +
filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF
 +
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
 +
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
 +
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
 +
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
 +
filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
 +
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF
 
</pre>
 
</pre>
  
===Step 4D===
+
===Permissions===
Then replace ''"filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF"'' with the following:
+
Configure media filter permissions. Search for "filter.org.dspace.app.mediafilter.publicPermission" and modify as follows:
 
<pre>
 
<pre>
filter.org.dspace.app.mediafilter.XPDF2Thumbnail.inputFormats = Adobe PDF
+
#Publicly accessible thumbnails of restricted content.
filter.org.dspace.app.mediafilter.XPDF2Text.inputFormats = Adobe PDF
+
#List the MediaFilter name's that would get publicly accessible permissions
 +
#Any media filters not listed will instead inherit the permissions of the parent bitstream
 +
filter.org.dspace.app.mediafilter.publicPermission = BrandedPreviewJPEGFilter, ImageMagickImageThumbnailFilter, ImageMagickPdfThumbnailFilter
 
</pre>
 
</pre>
  
===Step 4E===
+
===List Emphasis===
Above the comment, "#Custom settings for PDFFilter", add the following:
+
Search for <tt>'''xmlui.theme.mirage.item-list.emphasis'''</tt>. There are two options available namely "metadata" or "file", select "file".
 +
 
 +
See example below.
 
<pre>
 
<pre>
#The paths to the XPDF utilities
+
### Settings for Item lists in Mirage theme ###
xpdf.path.pdftotext = /usr/bin/pdftotext
+
# What should the emphasis be in the display of item lists?
xpdf.path.pdftoppm  = /usr/bin/pdftoppm
+
# Possible values : 'file', 'metadata'. If your repository is
xpdf.path.pdfinfo  = /usr/bin/pdfinfo
+
# used mainly for scientific papers 'metadata' is probably the
 +
# best way. If you have a lot of images and other files 'file'
 +
# will be the best starting point
 +
# (metdata is the default value if this option is not specified)
 +
xmlui.theme.mirage.item-list.emphasis = file
 
</pre>
 
</pre>
  
==Step 4 - Build and Install==
+
Save the ''"dspace.cfg"'' file and exit nano.
To build, type the following:
 
cd $HOME/source
 
  
mvn -U clean package -Pxpdf-mediafilter-support
+
{{NANO}}
To install, type the following: (Replace XXX with your DSpace version number)
 
cd $HOME/source/dspace/target/dspace-XXX-build
 
  
ant update
+
==Step 4 - [[SUNScholar/Rebuild_DSpace|Rebuild DSpace]]==
  
  ant clean_backups
+
==Step 5 - Test the media filters==
 +
Type the following to test. Select an item that has pdf files attached and use it as replacement for "123456789/29097".
 +
  $HOME/bin/dspace filter-media -v -i 123456789/29097
  
==Step 5 - Update dspace rebuild script==
+
==Step 6 - Create new thumbnails==
If the test build works then add the switch"-Pxpdf-mediafilter-support" to the dspace rebuild script, so that:
+
The script is configured to do 1000 items at a time only. This saves on memory and CPU time. Therefore on a large system you may need to run the script several times. Also make sure that the dspace user has full read/write access to all items in the asset store folders.
mvn -U clean package
 
becomes
 
mvn -U clean package -Pxpdf-mediafilter-support
 
  
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Rebuild_DSpace
+
  $HOME/bin/dspace filter-media -v -f -m 1000 -p "ImageMagick PDF Thumbnail"
==Step 6 - Test the media filters==
 
Type the following to test. Select an item that has pdf files attached and use it as replacement for "123456789/29097".
 
  $HOME/bin/dspace filter-media -n -v -i 123456789/29097
 
==Step 7 - Create new thumbnails==
 
The script is configured to do 1000 items at a time only. This saves on memory and CPU time. Therefore on a large system you may need to run the script several times. Also make sure that the dspace user has full read/write access to all items in the assetstore folders.
 
  
  $HOME/bin/dspace filter-media -n -v -f -m 1000 -p "PDF Thumbnail"
+
  $HOME/bin/dspace filter-media -v -f -m 1000 -p "ImageMagick Image Thumbnail"
  
==Step 8 - Add a daily admin task==
+
==Step 7 - Add a daily admin task==
 
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Daily_Admin. Check the '''"filter-media"''' options!
 
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Daily_Admin. Check the '''"filter-media"''' options!
  
Line 148: Line 116:
 
*https://wiki.duraspace.org/display/DSDOC5x/Mediafilters+for+Transforming+DSpace+Content
 
*https://wiki.duraspace.org/display/DSDOC5x/Mediafilters+for+Transforming+DSpace+Content
 
*https://wiki.duraspace.org/display/DSDOC5x/Configuration+Reference#ConfigurationReference-XPDFFilter
 
*https://wiki.duraspace.org/display/DSDOC5x/Configuration+Reference#ConfigurationReference-XPDFFilter
*https://wiki.duraspace.org/display/DSDOC5x/ImageMagick+Media+Filters
+
[[Category:Customisation]]

Latest revision as of 13:36, 26 August 2016


PLEASE NOTE:

  • The media filters now use ImageMagick and Ghostscript. See the link below for details about enabling media filters.
https://wiki.duraspace.org/display/DSDOC5x/ImageMagick+Media+Filters
  • After some time we noticed that server load rose sharply during the nightly media-filter jobs.
  • We isolated the problem to the "Branded Preview JPEG" filter.
  • We have disabled this filter on our server because branded previews are not important to us. To do the same, leave "Branded Preview JPEG" out of the filter.plugins list shown under Filters below.

Step 1 - Install the Ubuntu software packages

Type the following:

sudo apt-get install imagemagick ghostscript
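
After the install it can be useful to confirm that the tools are actually on the PATH. The helper below is a minimal sketch, not part of DSpace; the function name check_tools is made up for illustration, and it assumes the packages provide the commands convert (ImageMagick) and gs (Ghostscript), which is the usual case on Ubuntu.

```shell
# Report which of the named commands are present on the PATH.
# Returns 0 when all are found, 1 when at least one is missing.
# (check_tools is a hypothetical helper name, not a DSpace command.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Usage, assuming the command names mentioned above: check_tools convert gs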

Step 2 - Configuration

Edit the "dspace.cfg" file.

nano $HOME/source/dspace/config/dspace.cfg

Enable

Search for the following settings and set each one to true:

webui.browse.thumbnail.show = true
webui.item.thumbnail.show = true
webui.preview.enabled = true
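
To double-check the three switches after editing, something like the following sketch could be used. The function names cfg_get and check_thumbnail_flags are hypothetical helpers, and the parsing is deliberately simple: it only understands single-line "key = value" entries and ignores commented-out lines.

```shell
# Print the value of a single-line "key = value" property.
# Commented-out lines are ignored; if the key occurs more than once,
# the last value is printed. (Hypothetical helper, not a DSpace tool.)
cfg_get() {
    file=$1
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for sed
    sed -n "/^[[:space:]]*${key_re}[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$file" | tail -n 1
}

# Show the three thumbnail/preview switches and return 1 if any is not "true".
check_thumbnail_flags() {
    rc=0
    for key in webui.browse.thumbnail.show webui.item.thumbnail.show webui.preview.enabled; do
        value=$(cfg_get "$1" "$key")
        echo "$key = ${value:-<not set>}"
        [ "$value" = "true" ] || rc=1
    done
    return "$rc"
}
```

Usage, with the config path from this page: check_thumbnail_flags $HOME/source/dspace/config/dspace.cfg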

Dimensions

Check that thumbnail.maxwidth has a value and that it corresponds to the size you want for preview images in the UI.

Search for the following and modify as needed.

# maximum width and height of generated thumbnails
thumbnail.maxwidth  = 160
thumbnail.maxheight = 160

Filters

Enable filters as follows:

#Names of the enabled MediaFilter or FormatFilter plugins
filter.plugins = PDF Text Extractor, HTML Text Extractor, Word Text Extractor, \
                 PowerPoint Text Extractor, \
                 Branded Preview JPEG, \
                 ImageMagick Image Thumbnail, ImageMagick PDF Thumbnail

Names

Assign names for filters as follows:

#Assign 'human-understandable' names to each filter
plugin.named.org.dspace.app.mediafilter.FormatFilter = \
  org.dspace.app.mediafilter.PDFFilter = PDF Text Extractor, \
  org.dspace.app.mediafilter.HTMLFilter = HTML Text Extractor, \
  org.dspace.app.mediafilter.WordFilter = Word Text Extractor, \
  org.dspace.app.mediafilter.PowerPointFilter = PowerPoint Text Extractor, \
  org.dspace.app.mediafilter.BrandedPreviewJPEGFilter = Branded Preview JPEG, \
  org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter = ImageMagick Image Thumbnail, \
  org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter = ImageMagick PDF Thumbnail
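
Every name listed in filter.plugins needs a matching entry in the names mapping above, and a typo in either place is easy to miss. The sketch below compares the two lists. All function names are hypothetical, and the parsing is an approximation of the properties format: it joins lines that end in a backslash and splits on commas, nothing more.

```shell
# Join lines that end with a backslash into one logical line.
join_continuations() {
    awk '{
        if (sub(/\\$/, "")) { buf = buf $0 }
        else { print buf $0; buf = "" }
    } END { if (buf != "") print buf }' "$1"
}

# Names listed in filter.plugins, one per line.
enabled_filters() {
    join_continuations "$1" |
        sed -n 's/^[[:space:]]*filter\.plugins[[:space:]]*=[[:space:]]*//p' |
        tr ',' '\n' |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$'
}

# Names assigned in the FormatFilter mapping, one per line.
named_filters() {
    join_continuations "$1" |
        sed -n 's/^[[:space:]]*plugin\.named\.org\.dspace\.app\.mediafilter\.FormatFilter[[:space:]]*=[[:space:]]*//p' |
        tr ',' '\n' |
        sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//' |
        grep -v '^$'
}

# Report every enabled name that has no class mapped to it.
check_filter_names() {
    names=$(named_filters "$1")
    rc=0
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        if ! printf '%s\n' "$names" | grep -Fxq -- "$name"; then
            echo "no class mapped to: $name"
            rc=1
        fi
    done <<EOF
$(enabled_filters "$1")
EOF
    return "$rc"
}
```

Usage: check_filter_names $HOME/source/dspace/config/dspace.cfg prints nothing and returns 0 when the two lists agree.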

Input Formats

Assign the input file formats handled by each media filter as follows:

#Configure each filter's input format(s)
filter.org.dspace.app.mediafilter.PDFFilter.inputFormats = Adobe PDF
filter.org.dspace.app.mediafilter.HTMLFilter.inputFormats = HTML, Text
filter.org.dspace.app.mediafilter.WordFilter.inputFormats = Microsoft Word
filter.org.dspace.app.mediafilter.PowerPointFilter.inputFormats = Microsoft Powerpoint, Microsoft Powerpoint XML
filter.org.dspace.app.mediafilter.BrandedPreviewJPEGFilter.inputFormats = BMP, GIF, JPEG, image/png
filter.org.dspace.app.mediafilter.ImageMagickImageThumbnailFilter.inputFormats = BMP, GIF, image/png, JPG, TIFF, JPEG, JPEG 2000
filter.org.dspace.app.mediafilter.ImageMagickPdfThumbnailFilter.inputFormats = Adobe PDF

Permissions

Configure media filter permissions. Search for "filter.org.dspace.app.mediafilter.publicPermission" and modify as follows:

#Publicly accessible thumbnails of restricted content.
#List the MediaFilter names that would get publicly accessible permissions
#Any media filters not listed will instead inherit the permissions of the parent bitstream
filter.org.dspace.app.mediafilter.publicPermission = BrandedPreviewJPEGFilter, ImageMagickImageThumbnailFilter, ImageMagickPdfThumbnailFilter

List Emphasis

Search for xmlui.theme.mirage.item-list.emphasis. Two values are available, "metadata" and "file"; select "file".

See the example below.

### Settings for Item lists in Mirage theme ###
# What should the emphasis be in the display of item lists?
# Possible values : 'file', 'metadata'. If your repository is
# used mainly for scientific papers 'metadata' is probably the
# best way. If you have a lot of images and other files 'file'
# will be the best starting point
# (metadata is the default value if this option is not specified)
xmlui.theme.mirage.item-list.emphasis = file

Save the "dspace.cfg" file and exit nano.


NANO Editor Help
CTL+O = Save the file and then press Enter
CTL+X = Exit "nano"
CTL+K = Delete line
CTL+U = Undelete line
CTL+W = Search for %%string%%
CTL+\ = Search for %%string%% and replace with $$string$$
CTL+C = Show line numbers

More info = http://en.wikipedia.org/wiki/Nano_(text_editor)


Step 4 - Rebuild DSpace

Step 5 - Test the media filters

Type the following to test. Choose an item that has PDF files attached and substitute its handle for "123456789/29097".

$HOME/bin/dspace filter-media -v -i 123456789/29097

Step 6 - Create new thumbnails

The "-m 1000" option limits each run to 1000 items, which saves memory and CPU time. On a large system you may therefore need to run the commands several times. Also make sure that the dspace user has full read/write access to all items in the asset store folders.
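
One way to check the read/write access mentioned above is to list anything the current user cannot both read and write. This is only a sketch: list_inaccessible is a hypothetical helper, and the asset store location used in the usage line is an assumption, so substitute the real path of your installation. Run it as the dspace user.

```shell
# Print every file or directory under $1 that the current user
# cannot both read and write. Prints nothing when access is complete.
# (Hypothetical helper; uses only POSIX find and test.)
list_inaccessible() {
    find "$1" -exec sh -c '
        for p; do
            [ -r "$p" ] && [ -w "$p" ] || printf "%s\n" "$p"
        done
    ' sh {} +
}
```

Usage, assuming the asset store lives at $HOME/assetstore: list_inaccessible $HOME/assetstore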

$HOME/bin/dspace filter-media -v -f -m 1000 -p "ImageMagick PDF Thumbnail"
$HOME/bin/dspace filter-media -v -f -m 1000 -p "ImageMagick Image Thumbnail"
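
The two commands above can be wrapped in a small function so that a batch stops at the first failure. This is a convenience sketch only; run_thumbnail_batch and the DSPACE_BIN variable are names invented here, while the filter-media options are exactly the ones shown above.

```shell
# Path to the DSpace launcher. DSPACE_BIN is a hypothetical override
# variable; by default it points at the location used on this page.
DSPACE_BIN="${DSPACE_BIN:-$HOME/bin/dspace}"

# Run one forced batch (at most 1000 items) for each ImageMagick
# thumbnail filter, stopping at the first command that fails.
run_thumbnail_batch() {
    for plugin in "ImageMagick PDF Thumbnail" "ImageMagick Image Thumbnail"; do
        echo "== $plugin =="
        "$DSPACE_BIN" filter-media -v -f -m 1000 -p "$plugin" || return 1
    done
}
```

Usage: run_thumbnail_batch, repeated as often as needed on a large system.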

Step 7 - Add a daily admin task

See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Daily_Admin. Check the "filter-media" options!

References