Difference between revisions of "SUNScholar/Harvesting"

From Libopedia
 
(45 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
<center>
 
<center>
  '''[[SUNScholar/Customisation|Back to Customisation]]'''
+
  '''[[SUNScholar/Operational_Guide|BACK TO OPERATIONAL GUIDE]]'''
 
</center>
 
</center>
  
;Config
+
===Introduction===
Edit the following file:
+
This wiki page provides a brief explanation of how to harvest items from a collection on another repository system.
nano /home/dspace/'''[http://wiki.lib.sun.ac.za/index.php/SUNScholar/Install_DSpace/S03#Step_3.2 source]'''/dspace/config/modules/oai.cfg
 
#Select whether storage will be the SOLR database or the PostgreSQL database
 
#Define OAI URLs.
 
#Define the OAI folder paths.
 
#Define harvester settings.
 
See sample below.
 
;Sample
 
<pre>
 
#---------------------------------------------------------------#
 
#--------------------XOAI CONFIGURATIONS------------------------#
 
#---------------------------------------------------------------#
 
# These configs are used by the XOAI                            #
 
#---------------------------------------------------------------#
 
  
# Storage: solr | database
+
Also see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Remote_Harvest
storage=database
 
  
# Base solr index
+
===Requirements===
solr.url=http://localhost/solr/oai
+
Check that the remote repository has a valid OAI-PMH interface with which to interact. See the help links below.
# OAI persistent identifier prefix.
 
# Format - oai:PREFIX:HANDLE
 
identifier.prefix = scholar.sun.ac.za
 
# Base url for bitstreams
 
bitstream.baseUrl = http://scholar.sun.ac.za
 
  
# Base Configuration Directory
+
*http://www.openarchives.org/Register/ValidateSite
config.dir = /home/dspace/config/crosswalks/oai
+
*http://validator.oaipmh.com
 +
*http://re.cs.uct.ac.za
  
# Description
+
===Step 1 - Create a collection to receive harvested items===
description.file = /home/dspace/config/crosswalks/oai/description.xml
+
Go to the community on your repository system that will host the collection and create the collection as normal.
  
# Cache enabled?
+
===Step 2 - Configure the collection for harvesting===
cache.enabled = true
+
Now mark the collection as one that harvests its items from another repository, and submit the details of the remote collection.
  
# Base Cache Directory
+
See screenshot below.
cache.dir = /home/dspace/var/oai
 
  
#---------------------------------------------------------------#
+
[[File:Harvesting-collection.png|border]]
#--------------OAI HARVESTING CONFIGURATIONS--------------------#
 
#---------------------------------------------------------------#
 
# These configs are only used by the OAI-ORE related functions  #
 
#---------------------------------------------------------------#
 
  
### Harvester settings
+
===Step 3 - Begin harvesting===
 +
After selecting the type of harvest you wish to do, click on the "Start" harvest button.
  
# Crosswalk settings; the {name} value must correspond to a declared ingestion crosswalk
+
===Step 4 - Schedule automatic harvesting updates===
# harvester.oai.metadataformats.{name} = {namespace},{optional display name}
+
Go to the "control panel" and enable automatic harvesting so that the collection stays synchronised with the remote repository after the initial harvest.
# The display name is only used in the xmlui; for the jspui there are entries in the
 
# Messages.properties in the form jsp.tools.edit-collection.form.label21.select.{name}
 
harvester.oai.metadataformats.dc = http://www.openarchives.org/OAI/2.0/oai_dc/, Simple Dublin Core
 
harvester.oai.metadataformats.qdc = http://purl.org/dc/terms/, Qualified Dublin Core
 
harvester.oai.metadataformats.dim = http://www.dspace.org/xmlns/dspace/dim, DSpace Intermediate Metadata
 
  
# This field works in much the same way as harvester.oai.metadataformats.PluginName
+
See screenshot below.
# The {name} must correspond to a declared ingestion crosswalk, while the
 
# {namespace} must be supported by the target OAI-PMH provider when harvesting content.
 
# harvester.oai.oreSerializationFormat.{name} = {namespace}
 
  
# Determines whether the harvester scheduling process should be started
+
[[File:Harvesting-control.png|border]]
# automatically when the DSpace webapp is deployed.
 
# default: false
 
harvester.autoStart=false
 
  
# Amount of time subtracted from the from argument of the PMH request to account
+
===Documentation===
# for the time taken to negotiate a connection. Measured in seconds. Default value is 120.
+
*http://www.openarchives.org/OAI/2.0/guidelines-harvester.htm
#harvester.timePadding = 120
+
*https://openknowledge.worldbank.org/harvesting-the-okr
 
+
===References===
# How frequently the harvest scheduler checks the remote provider for updates,
+
*https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
# measured in minutes. The default value is 12 hours (or 720 minutes)
+
*https://wiki.duraspace.org/display/DSDOC4x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
#harvester.harvestFrequency = 720
+
*https://wiki.duraspace.org/display/DSDOC3x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
 
+
[[Category:Customisation]]
# The heartbeat is the frequency at which the harvest scheduler queries the local
+
[[Category:Operations]]
# database to determine if any collections are due for a harvest cycle (based on
 
# the harvestFrequency value). The scheduler is optimized to then sleep until the
 
# next collection is actually ready to be harvested. The minHeartbeat and
 
# maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds.
 
# Default minHeartbeat is 30.  Default maxHeartbeat is 3600.
 
#harvester.minHeartbeat = 30
 
#harvester.maxHeartbeat = 3600
 
 
 
# How many harvest process threads the scheduler can spool up at once. Default value is 3.
 
#harvester.maxThreads = 3
 
 
 
# How much time passes before a harvest thread is terminated. The termination process
 
# waits for the current item to complete ingest and saves progress made up to that point.
 
# Measured in hours. Default value is 24.
 
#harvester.threadTimeout = 24
 
 
 
# When harvesting an item that contains an unknown schema or field within a schema what
 
# should the harvester do? Either add a new registry item for the field or schema, ignore
 
# the specific field or schema (importing everything else about the item), or fail with
 
# an error. The default value if undefined is: fail.
 
# Possible values: 'fail', 'add', or 'ignore'
 
harvester.unknownField  = add
 
harvester.unknownSchema = fail
 
 
 
# The webapp responsible for minting the URIs for ORE Resource Maps.
 
# If using oai, the dspace.oai.uri config value must be set.
 
# The URIs generated for ORE ReMs follow the following convention for both cases.
 
# format: [baseURI]/metadata/handle/[theHandle]/ore.xml
 
# Default value is oai
 
#ore.authoritative.source = oai
 
 
 
# A harvest process will attempt to scan the metadata of the incoming items
 
# (dc.identifier.uri field, to be exact) to see if it looks like a handle.
 
# If so, it matches the pattern against the values of this parameter.
 
# If there is a match the new item is assigned the handle from the metadata value
 
# instead of minting a new one. Default value: hdl.handle.net
 
#harvester.acceptedHandleServer = hdl.handle.net, handle.myu.edu
 
 
 
# Pattern to reject as an invalid handle prefix (known test string, for example)
 
# when attempting to find the handle of harvested items. If there is a match with
 
# this config parameter, a new handle will be minted instead. Default value: 123456789.
 
#harvester.rejectedHandlePrefix = 123456789, myTestHandle
 
</pre>
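
The timing settings in the sample above can be illustrated with a short Python sketch. It is not DSpace's implementation; it only mirrors what the comments describe: a collection becomes due once harvestFrequency minutes have passed, and timePadding seconds are subtracted from the "from" argument of the next request. The function names is_due and padded_from are hypothetical, and the defaults are the ones quoted in the comments.

```python
from datetime import datetime, timedelta, timezone

# Defaults quoted in the oai.cfg comments (assumed unchanged).
HARVEST_FREQUENCY_MINUTES = 720  # harvester.harvestFrequency
TIME_PADDING_SECONDS = 120       # harvester.timePadding


def is_due(last_harvest, now, frequency_minutes=HARVEST_FREQUENCY_MINUTES):
    """Return True once at least frequency_minutes have passed since last_harvest."""
    return now - last_harvest >= timedelta(minutes=frequency_minutes)


def padded_from(last_harvest, padding_seconds=TIME_PADDING_SECONDS):
    """Return the OAI-PMH 'from' value: last harvest time minus the padding.

    The result uses the UTC datetime form YYYY-MM-DDThh:mm:ssZ.
    """
    padded = last_harvest - timedelta(seconds=padding_seconds)
    return padded.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    last = datetime(2016, 5, 29, 16, 10, 0, tzinfo=timezone.utc)
    print(is_due(last, last + timedelta(hours=12)))
    print(padded_from(last))
```

The padding means a few records may be fetched twice, which is harmless compared with missing records that changed while the previous request was being set up.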
 
;Daily Task
 
'''[[SUNScholar/Daily_Admin|Click here]]''' to define the following task to update the OAI database daily.
 
/home/dspace/bin/dspace oai import -o
 
;Help
 
*http://wiki.lib.sun.ac.za/index.php/SUNScholar/OAI-PMH
 
;References
 
* https://wiki.duraspace.org/display/DSDOC3x/OAI
 
* https://wiki.duraspace.org/display/DSDOC18/OAI
 
* https://wiki.duraspace.org/display/DSDOC18/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-AutomaticHarvesting(Scheduler)
 
* https://wiki.duraspace.org/display/DSDOC18/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
 

Latest revision as of 16:10, 29 May 2016


Introduction

This wiki page provides a brief explanation of how to harvest items from a collection on another repository system.

Also see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Remote_Harvest

Requirements

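Check that the remote repository exposes a valid OAI-PMH interface before you try to harvest from it. The following validators can help:

*http://www.openarchives.org/Register/ValidateSite
*http://validator.oaipmh.com
*http://re.cs.uct.ac.za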
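
A quick manual check is also possible with the OAI-PMH Identify verb. The following Python sketch builds the request URL and reads the main fields from a response; the function names and the example base URL are hypothetical, and the response must be fetched separately (for example with urllib.request.urlopen).

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def identify_url(base_url):
    """Build an OAI-PMH Identify request URL for the given base URL."""
    return base_url + "?" + urllib.parse.urlencode({"verb": "Identify"})


def parse_identify(xml_data):
    """Return key fields of an Identify response, or raise ValueError."""
    root = ET.fromstring(xml_data)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError("OAI-PMH error: %s" % error.get("code"))
    identify = root.find("oai:Identify", OAI_NS)
    if identify is None:
        raise ValueError("not an OAI-PMH Identify response")
    return {
        "repositoryName": identify.findtext("oai:repositoryName", namespaces=OAI_NS),
        "baseURL": identify.findtext("oai:baseURL", namespaces=OAI_NS),
        "protocolVersion": identify.findtext("oai:protocolVersion", namespaces=OAI_NS),
    }


if __name__ == "__main__":
    # Hypothetical base URL; replace it with the remote repository's OAI endpoint.
    print(identify_url("http://example.org/oai/request"))
```

If parse_identify raises an error for the remote endpoint, the repository is unlikely to be harvestable until its OAI-PMH interface is fixed.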

Step 1 - Create a collection to receive harvested items

Go to the community on your repository system that will host the collection and create the collection as normal.

Step 2 - Configure the collection for harvesting

Now mark the collection as one that harvests its items from another repository, and submit the details of the remote collection.

See screenshot below.

Harvesting-collection.png
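
The details submitted for the remote collection correspond roughly to the arguments of an OAI-PMH ListRecords request: a provider base URL, a set, and a metadata format. The Python sketch below shows how such a request URL is put together. It assumes that the form asks for these three values; the function name, the example URL, and the example set identifier are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix names the metadata format (oai_dc is Simple Dublin Core),
    set_spec restricts the harvest to one set on the remote repository, and
    from_date limits the request to records changed since that date.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical provider URL and set identifier.
    print(list_records_url("http://example.org/oai/request", set_spec="hdl_123_456"))
```

Opening the resulting URL in a browser is a simple way to confirm that the remote set exists and returns records before the harvest is started.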

Step 3 - Begin harvesting

After selecting the type of harvest you wish to do, click on the "Start" harvest button.

Step 4 - Schedule automatic harvesting updates

Go to the "control panel" and enable automatic harvesting so that the collection stays synchronised with the remote repository after the initial harvest.

See screenshot below.

Harvesting-control.png

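Documentation

*http://www.openarchives.org/OAI/2.0/guidelines-harvester.htm
*https://openknowledge.worldbank.org/harvesting-the-okr

References

*https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
*https://wiki.duraspace.org/display/DSDOC4x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
*https://wiki.duraspace.org/display/DSDOC3x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH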