Difference between revisions of "SUNScholar/Daily Admin/4.X"

From Libopedia

Latest revision as of 23:59, 9 June 2016


Step 1. Login

http://wiki.lib.sun.ac.za/index.php/SUNScholar/Prepare_Ubuntu/S01

Click on the link above to find out how to log in to the server, then return here.

Step 2. Create "dspace" user crontab

Edit the crontab, by typing the following in a terminal:

su - dspace
crontab -e

If asked to select an editor, choose nano.
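
Because the sample crontab in this step replaces everything already in the "dspace" user's crontab, it may be worth saving a copy first. The following is a minimal sketch, not part of the original procedure: the helper name backup_crontab and the default backup location are assumptions chosen for illustration.

```shell
# Sketch: save the current crontab of the logged-in user before replacing it.
# The helper name and the default backup path are assumptions; pass another
# destination path as the first argument if preferred.
backup_crontab() {
    dest="${1:-$HOME/crontab.backup.$(date +%Y%m%d-%H%M%S)}"
    # "crontab -l" exits non-zero when the user has no crontab yet.
    if crontab -l > "$dest" 2>/dev/null; then
        echo "Existing crontab saved to $dest"
    else
        rm -f "$dest"
        echo "No existing crontab to back up"
    fi
}
```

Run backup_crontab as the "dspace" user before "crontab -e". A saved copy can later be reinstalled by passing the file name to the crontab command.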


NANO Editor Help
CTL+O = Save the file (press Enter to confirm the file name)
CTL+X = Exit "nano"
CTL+K = Delete (cut) the current line
CTL+U = Undelete (paste back) the line
CTL+W = Search for %%string%%
CTL+\ = Search for %%string%% and replace with $$string$$
CTL+C = Show the current line number and cursor position

More info = http://en.wikipedia.org/wiki/Nano_(text_editor)


Sample crontab

Delete the existing contents, then copy and paste the following into the nano text editor and save. See the nano help above.

## SAMPLE CRONTAB FOR A PRODUCTION DSPACE
## You obviously may wish to tweak this for your own installation,
## but this should give you an idea of what you likely wish to schedule via cron.
##
## NOTE: You may also need to add additional sysadmin related tasks to your crontab
## (e.g. zipping up old log files, or even removing old logs, etc).
 
####################
# GLOBAL VARIABLES #
####################
# Deliver cron email to the system administrator
MAILTO="root"

################
# HOURLY TASKS #
################
# (Recommended to be run multiple times per day, if possible)
# At a minimum these tasks should be run daily.

# Regenerate DSpace Sitemaps every 8 hours (12AM, 8AM, 4PM).
# SiteMaps ensure that your content is more findable in Google, Google Scholar, and other major search engines.
0 0,8,16 * * * $HOME/bin/dspace generate-sitemaps > /dev/null
 
###############
# DAILY TASKS #
###############
# (Recommended to be run once per day. Feel free to tweak the scheduled times below.)
 
# Update the OAI-PMH index with the newest content (and re-optimize that index) at midnight every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING OAI-PMH
# (This ensures new content is available via OAI-PMH and ensures the OAI-PMH index is optimized for better performance)
0 0 * * * $HOME/bin/dspace oai import -o > /dev/null
 
# Clean and Update the Discovery indexes at midnight every day
# (This ensures that any deleted documents are cleaned from the Discovery search/browse index)
0 0 * * * $HOME/bin/dspace index-discovery > /dev/null
 
# Re-Optimize the Discovery indexes at 00:30 every day
# (This ensures that the Discovery Solr Index is re-optimized for better performance)
30 0 * * * $HOME/bin/dspace index-discovery -o > /dev/null
 
# Cleanup Web Spiders from DSpace Statistics Solr Index at 01:00 every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING SOLR STATISTICS
# (This removes any known web spiders from your usage statistics)
0 1 * * * $HOME/bin/dspace stats-util -i > /dev/null
 
# Re-Optimize DSpace Statistics Solr Index at 01:30 every day
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING SOLR STATISTICS 
# (This ensures that the Statistics Solr Index is re-optimized for better performance)
30 1 * * * $HOME/bin/dspace stats-util -o > /dev/null
 
# Send out subscription e-mails at 02:00 every day
# (This sends an email to any users who have "subscribed" to a Collection, notifying them of newly added content.)
0 2 * * * $HOME/bin/dspace sub-daily > /dev/null
 
# Run the media filter at 03:00 every day.
# (This task ensures that thumbnails are generated for newly added images,
# and also ensures full text search is available for newly added PDF/Word/PPT/HTML documents)
0 3 * * * $HOME/bin/dspace filter-media -n -v > $HOME/log/media-filter.log 2>&1

# Run any Curation Tasks queued from the Admin UI at 04:00 every day
# (Ensures that any curation task that an administrator "queued" from the Admin UI is executed
# asynchronously behind the scenes)
0 4 * * * $HOME/bin/dspace curate -q admin_ui > /dev/null

# Check for items to release from embargo in DSpace.
# (This applies to embargoes created with DSpace versions <= 3.2)
0 5 * * * $HOME/bin/dspace embargo-lifter > $HOME/log/embargo-release.log 2>&1

# Run DSpace statistical analysis tools (12 months takes approx. 40 seconds)
30 0 * * * $HOME/bin/dspace stat-general  > /dev/null
35 0 * * * $HOME/bin/dspace stat-monthly  > /dev/null

# Generate DSpace statistical analysis reports
00 1 * * * $HOME/bin/dspace stat-report-general  > /dev/null
05 1 * * * $HOME/bin/dspace stat-report-monthly  > /dev/null
 
################
# WEEKLY TASKS #
################
# (Recommended to be run once per week, but can be run more or less frequently, based on your local needs/policies)

# Run the checksum checker at 04:00 every Sunday
# By default it runs through every file (-l) and also prunes old results (-p)
# (This re-verifies the checksums of all files stored in DSpace. If any files have been changed/corrupted, checksums will differ.)
0 4 * * 0 $HOME/bin/dspace checker -l -p > /dev/null
#
# NOTE: LARGER SITES MAY WISH TO USE DIFFERENT OPTIONS. The above "-l" option tells DSpace to check *everything*.
# If your site is very large, you may need to only check a portion of your content per week. The below commented-out task
# would instead check all the content it can within *one hour*. The next week it would start again where it left off.
#0 4 * * 0 $HOME/bin/dspace checker -d 1h -p > /dev/null
  
# Mail the results of the checksum checker (see above) to the configured "mail.admin" at 05:00 every Sunday.
# (This ensures the system administrator is notified whether any checksums were found to be different.)
0 5 * * 0 $HOME/bin/dspace checker-emailer > /dev/null
 
#################
# MONTHLY TASKS #
#################
# (Recommended to be run once per month, but can be run more or less frequently, based on your local needs/policies)

# Permanently delete any bitstreams flagged as "deleted" in DSpace, on the first of every month at 01:00
# (This ensures that any files which were deleted from DSpace are actually removed from your local filesystem.
#  By default they are just marked as deleted, but are not removed from the filesystem.)
0 1 1 * * $HOME/bin/dspace cleanup > /dev/null

# Remove all DSpace/Tomcat log files which are more than 30 days old
# on the first of every month
01 0 1 * * find $HOME/log/*.log.* -mtime +30 -exec rm {} \;
01 0 1 * * find $HOME/tomcat/logs/*.log -mtime +30 -exec rm {} \;
 
################
# YEARLY TASKS #
################
# (Recommended to be run once per year)

# At 2:00AM every January 1, "shard" the DSpace Statistics Solr index.
# This ensures each year has its own Solr index, which improves performance.
# NOTE: ONLY NECESSARY IF YOU ARE RUNNING SOLR STATISTICS
# NOTE: This is scheduled here for 2:00AM so that it happens *after* the daily cleaning & re-optimization of this index.
0 2 1 1 * $HOME/bin/dspace stats-util -s > /dev/null

################
# HOUSEKEEPING #
################
# (Recommended to be run daily. As scheduled below, they run at 02:00 on the first of every month.)

# Delete any ~/config/*/*.old files more than 30 days old (created by "ant update")
0 2 1 * * find $HOME/config -name "*-*-*.old" -mtime +30 -exec rm {} \;
# Delete any ~/*.bak-*-*/ directories more than 30 days old (created by "ant update")
0 2 1 * * find $HOME/*.bak-*-* -maxdepth 0 -type d -mtime +30 -exec rm -rf {} \;

Save and exit the file.
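
After saving, it can help to review what cron will actually run. The sketch below is one possible way to do that: it reads crontab text on standard input, skips comments, blank lines and variable assignments such as MAILTO, and prints the five schedule fields (minute, hour, day of month, month, day of week) next to each command. The helper name show_cron_schedule is an assumption made for this example.

```shell
# Sketch: label the five schedule fields of every active crontab entry.
# Reads crontab text on stdin; the helper name is hypothetical.
show_cron_schedule() {
    awk '
        /^[ \t]*(#|$)/ { next }                           # comments and blank lines
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }   # VAR=value lines (e.g. MAILTO)
        {
            # Fields 1-5 are the schedule; the rest is the command.
            # Runs of spaces inside the command are collapsed to one space.
            cmd = $6
            for (i = 7; i <= NF; i++) cmd = cmd " " $i
            printf "min=%s hour=%s dom=%s mon=%s dow=%s | %s\n", $1, $2, $3, $4, $5, cmd
        }'
}
```

For example, as the "dspace" user: crontab -l | show_cron_schedule. In the output, dow=0 means Sunday and dom=1 means the first day of the month, which makes it easier to check each entry against its comment.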

System Log

To enable logging of cron events, edit the following file:

sudo nano /etc/rsyslog.d/50-default.conf

Enable the cron log by making sure the cron.* line is present and not commented out, as in the example below:

#
# First some standard log files.  Log by facility.
#
auth,authpriv.*                 /var/log/auth.log
*.*;auth,authpriv.none          -/var/log/syslog
cron.*                          -/var/log/cron.log
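
Once the file is saved, a quick check along these lines can confirm that the cron.* line is active rather than commented out. This is only a sketch: the helper name cron_log_enabled is an assumption, and the default path is the file named in this section.

```shell
# Sketch: succeed (exit 0) if the given rsyslog file has an active cron.* rule.
# The helper name is hypothetical; the default path is the file edited above.
cron_log_enabled() {
    conf="${1:-/etc/rsyslog.d/50-default.conf}"
    # An active rule starts with "cron.*" after optional whitespace;
    # a leading "#" means the rule is commented out and will not match.
    grep -Eq '^[[:space:]]*cron\.\*[[:space:]]' "$conf"
}
```

Usage: cron_log_enabled && echo "cron logging is enabled"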

Now restart the rsyslog service as follows:

sudo service rsyslog restart
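
After the restart, entries for the scheduled DSpace jobs should start to appear in /var/log/cron.log as they run. The sketch below filters that log for commands run by the "dspace" user. It assumes the usual Debian/Ubuntu cron log line layout ("CRON[pid]: (user) CMD (command)"); the helper name dspace_cron_runs is hypothetical, and reading the log may require sudo depending on file permissions.

```shell
# Sketch: list cron log entries for jobs run by the "dspace" user.
# Assumes log lines of the form: ... CRON[1234]: (dspace) CMD (command)
# The helper name is hypothetical; pass another log file as the first argument.
dspace_cron_runs() {
    log="${1:-/var/log/cron.log}"
    grep -E 'CRON\[[0-9]+\]: \(dspace\) CMD ' "$log"
}
```

Usage: dspace_cron_runs | tail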

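References
https://wiki.duraspace.org/display/DSDOC4x/Command+Line+Operations
https://wiki.duraspace.org/display/DSDOC4x/Scheduled+Tasks+via+Cron
https://github.com/DSpace/demo.dspace.org/blob/master/scripts/linux/crontab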