Difference between revisions of "SUNScholar/Harvesting"

From Libopedia
 
(42 intermediate revisions by the same user not shown)
 
<center>
 
<center>
  '''[[SUNScholar/Customisation|Back to Customisation]]'''
+
  '''[[SUNScholar/Operational_Guide|BACK TO OPERATIONAL GUIDE]]'''
 
</center>
 
</center>
  
==[[SUNScholar/Harvesting/3.2|For DSpace 3.2]]==
+
===Introduction===
 +
This wiki page provides a brief explanation of how to harvest items from a collection on another repository system.
  
==Config==
+
  Also see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Remote_Harvest
Edit the following file:
 
  nano /home/dspace/'''[http://wiki.lib.sun.ac.za/index.php/SUNScholar/Install_DSpace/S03#Step_3.2 source]'''/dspace/config/modules/oai.cfg
 
#Select whether storage will be the SOLR database or the PostgreSQL database
 
#Define OAI URLs.
 
#Define the OAI folder paths.
 
#Define harvester settings.
 
See sample below.
 
===Sample===
 
<pre>
 
#---------------------------------------------------------------#
 
#--------------------XOAI CONFIGURATIONS------------------------#
 
#---------------------------------------------------------------#
 
# These configs are used by the XOAI                            #
 
#---------------------------------------------------------------#
 
  
# Storage: solr | database
+
===Requirements===
storage=database
+
Check that the remote repository has a valid OAI-PMH interface with which to interact. See the help links below.
  
# Base solr index
+
*http://www.openarchives.org/Register/ValidateSite
solr.url=http://localhost/solr/oai
+
*http://validator.oaipmh.com
# OAI persistent identifier prefix.
+
*http://re.cs.uct.ac.za
# Format - oai:PREFIX:HANDLE
 
identifier.prefix = scholar.sun.ac.za
 
# Base url for bitstreams
 
bitstream.baseUrl = http://scholar.sun.ac.za
 
  
# Base Configuration Directory
+
===Step 1 - Create a collection to receive harvested items===
config.dir = /home/dspace/config/crosswalks/oai
+
Go to the community on your repository system that will host the collection and create the collection as normal.
  
# Description
+
===Step 2 - Configure the collection for harvesting===
description.file = /home/dspace/config/crosswalks/oai/description.xml
+
Now mark the collection as one that harvests items from another repository, and submit the details of the remote collection.
  
# Cache enabled?
+
See screenshot below.
cache.enabled = true
 
  
# Base Cache Directory
+
[[File:Harvesting-collection.png|border]]
cache.dir = /home/dspace/var/oai
 
  
#---------------------------------------------------------------#
+
===Step 3 - Begin harvesting===
#--------------OAI HARVESTING CONFIGURATIONS--------------------#
+
After selecting the type of harvest you wish to do, click on the "Start" harvest button.
#---------------------------------------------------------------#
 
# These configs are only used by the OAI-ORE related functions  #
 
#---------------------------------------------------------------#
 
  
### Harvester settings
+
===Step 4 - Schedule automatic harvesting updates===
 +
Go to the "control panel" and select automatic harvesting for the collections so that they stay properly synchronised after the initial harvest.
  
# Crosswalk settings; the {name} value must correspond to a declared ingestion crosswalk
+
See screenshot below.
# harvester.oai.metadataformats.{name} = {namespace},{optional display name}
 
# The display name is only used in the xmlui; for the jspui there are entries in the
 
# Messages.properties in the form jsp.tools.edit-collection.form.label21.select.{name}
 
harvester.oai.metadataformats.dc = http://www.openarchives.org/OAI/2.0/oai_dc/, Simple Dublin Core
 
harvester.oai.metadataformats.qdc = http://purl.org/dc/terms/, Qualified Dublin Core
 
harvester.oai.metadataformats.dim = http://www.dspace.org/xmlns/dspace/dim, DSpace Intermediate Metadata
 
  
# This field works in much the same way as harvester.oai.metadataformats.PluginName
+
[[File:Harvesting-control.png|border]]
# The {name} must correspond to a declared ingestion crosswalk, while the
 
# {namespace} must be supported by the target OAI-PMH provider when harvesting content.
 
# harvester.oai.oreSerializationFormat.{name} = {namespace}
 
  
# Determines whether the harvester scheduling process should be started
+
===Documentation===
# automatically when the DSpace webapp is deployed.
+
*http://www.openarchives.org/OAI/2.0/guidelines-harvester.htm
# default: false
+
*https://openknowledge.worldbank.org/harvesting-the-okr
harvester.autoStart=false
+
===References===
 
+
*https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
# Amount of time subtracted from the from argument of the PMH request to account
+
*https://wiki.duraspace.org/display/DSDOC4x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
# for the time taken to negotiate a connection. Measured in seconds. Default value is 120.
+
*https://wiki.duraspace.org/display/DSDOC3x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
#harvester.timePadding = 120
+
[[Category:Customisation]]
 
+
[[Category:Operations]]
# How frequently the harvest scheduler checks the remote provider for updates,
 
# measured in minutes. The default value is 12 hours (or 720 minutes)
 
#harvester.harvestFrequency = 720
 
 
 
# The heartbeat is the frequency at which the harvest scheduler queries the local
 
# database to determine if any collections are due for a harvest cycle (based on
 
# the harvestFrequency value). The scheduler is optimized to then sleep until the
 
# next collection is actually ready to be harvested. The minHeartbeat and
 
# maxHeartbeat are the lower and upper bounds on this timeframe. Measured in seconds.
 
# Default minHeartbeat is 30.  Default maxHeartbeat is 3600.
 
#harvester.minHeartbeat = 30
 
#harvester.maxHeartbeat = 3600
 
 
 
# How many harvest process threads the scheduler can spool up at once. Default value is 3.
 
#harvester.maxThreads = 3
 
 
 
# How much time passes before a harvest thread is terminated. The termination process
 
# waits for the current item to complete ingest and saves progress made up to that point.
 
# Measured in hours. Default value is 24.
 
#harvester.threadTimeout = 24
 
 
 
# When harvesting an item that contains an unknown schema or field within a schema what
 
# should the harvester do? Either add a new registry item for the field or schema, ignore
 
# the specific field or schema (importing everything else about the item), or fail with
 
# an error. The default value if undefined is: fail.
 
# Possible values: 'fail', 'add', or 'ignore'
 
harvester.unknownField  = add
 
harvester.unknownSchema = fail
 
 
 
# The webapp responsible for minting the URIs for ORE Resource Maps.
 
# If using oai, the dspace.oai.uri config value must be set.
 
# The URIs generated for ORE ReMs follow the following convention for both cases.
 
# format: [baseURI]/metadata/handle/[theHandle]/ore.xml
 
# Default value is oai
 
#ore.authoritative.source = oai
 
 
 
# A harvest process will attempt to scan the metadata of the incoming items
 
# (dc.identifier.uri field, to be exact) to see if it looks like a handle.
 
# If so, it matches the pattern against the values of this parameter.
 
# If there is a match the new item is assigned the handle from the metadata value
 
# instead of minting a new one. Default value: hdl.handle.net
 
#harvester.acceptedHandleServer = hdl.handle.net, handle.myu.edu
 
 
 
# Pattern to reject as an invalid handle prefix (known test string, for example)
 
# when attempting to find the handle of harvested items. If there is a match with
 
# this config parameter, a new handle will be minted instead. Default value: 123456789.
 
#harvester.rejectedHandlePrefix = 123456789, myTestHandle
 
</pre>
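
The handle rules described by <code>harvester.acceptedHandleServer</code> and <code>harvester.rejectedHandlePrefix</code> in the sample above can be illustrated with a short Python sketch. This is not DSpace's own code: the function name <code>handle_to_reuse</code>, its parameters and the example handle values are hypothetical, and the sketch assumes that a rejected or unrecognised identifier is simply skipped.

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def handle_to_reuse(
    identifier_uris: Iterable[str],
    accepted_servers: Iterable[str] = ("hdl.handle.net",),
    rejected_prefixes: Iterable[str] = ("123456789",),
) -> Optional[str]:
    """Return a 'prefix/suffix' handle to reuse, or None to mint a new one.

    Hypothetical helper: mirrors the behaviour described in the oai.cfg
    comments, not the actual DSpace implementation.
    """
    accepted = set(accepted_servers)
    rejected = set(rejected_prefixes)
    for uri in identifier_uris:  # values of dc.identifier.uri
        parsed = urlparse(uri)
        if parsed.hostname not in accepted:
            continue  # not served by an accepted handle server
        handle = parsed.path.lstrip("/")
        prefix, sep, suffix = handle.partition("/")
        if not sep or not prefix or not suffix:
            continue  # does not look like prefix/suffix
        if prefix in rejected:
            continue  # known test prefix: do not reuse
        return handle
    return None


# Illustrative values only.
print(handle_to_reuse(["http://hdl.handle.net/10019.1/1234"]))
print(handle_to_reuse(["http://hdl.handle.net/123456789/42"]))
```

With the default settings, the first call yields a handle to reuse, while the second returns <code>None</code> because the prefix matches the rejected test prefix.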
 
 
 
==Daily Task==
 
'''[[SUNScholar/Daily_Admin|Click here]]''' to define the following task to update the OAI database daily.
 
/home/dspace/bin/dspace oai import -o
 
 
 
==Help==
 
*http://wiki.lib.sun.ac.za/index.php/SUNScholar/OAI-PMH
 
 
 
==References==
 
* https://wiki.duraspace.org/display/DSDOC3x/OAI
 
* https://wiki.duraspace.org/display/DSDOC18/OAI
 
* https://wiki.duraspace.org/display/DSDOC18/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-AutomaticHarvesting(Scheduler)
 
* https://wiki.duraspace.org/display/DSDOC18/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH
 
 
 
{{SOLR-HELP}}
 
 
 
{{SOLR-WEBAPP}}
 

Latest revision as of 16:10, 29 May 2016


Introduction

This wiki page provides a brief explanation of how to harvest items from a collection on another repository system.

Also see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Remote_Harvest

Requirements

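Check that the remote repository has a valid OAI-PMH interface with which to interact. See the help links below.

http://www.openarchives.org/Register/ValidateSite

http://validator.oaipmh.com

http://re.cs.uct.ac.za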
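
As a quick manual check, an OAI-PMH interface should answer an Identify request. The Python sketch below builds that request URL and reads a few fields from a response. The base URL, the function names and the sample response are made up for illustration; only the Identify verb and the response namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, made-up Identify response so the sketch runs offline.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2016-05-29T16:10:00Z</responseDate>
  <request verb="Identify">http://remote.example.org/oai/request</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://remote.example.org/oai/request</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>
""".strip()


def identify_url(base_url: str) -> str:
    """Build the Identify request URL for an OAI-PMH base URL."""
    return base_url + "?" + urlencode({"verb": "Identify"})


def parse_identify(xml_text: str) -> dict:
    """Extract basic fields from an Identify response; raise on an OAI error."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        raise ValueError(f"OAI-PMH error {error.get('code')}: {error.text}")
    ident = root.find("oai:Identify", OAI_NS)
    if ident is None:
        raise ValueError("response has no Identify element")
    fields = ("repositoryName", "baseURL", "protocolVersion", "granularity")
    return {
        name: ident.findtext(f"oai:{name}", default="", namespaces=OAI_NS)
        for name in fields
    }


print(identify_url("http://remote.example.org/oai/request"))
print(parse_identify(SAMPLE_RESPONSE))
```

To try this against a live repository, fetch the URL with urllib.request.urlopen and pass the decoded body to parse_identify.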

Step 1 - Create a collection to receive harvested items

Go to the community on your repository system that will host the collection and create the collection as normal.

Step 2 - Configure the collection for harvesting

Now mark the collection as one that harvests items from another repository, and submit the details of the remote collection.

See screenshot below.

Harvesting-collection.png

Step 3 - Begin harvesting

After selecting the type of harvest you wish to do, click on the "Start" harvest button.
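
For orientation, a harvest amounts to OAI-PMH ListRecords requests sent to the remote repository. The Python sketch below shows roughly what such a request could look like, including the effect of the harvester.timePadding setting (default 120 seconds), which is subtracted from the "from" argument. It is an illustration under assumptions, not the harvester's actual code: the function name, the base URL and the set name are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str,
    set_spec: Optional[str] = None,
    last_harvest: Optional[datetime] = None,
    time_padding: int = 120,
) -> str:
    """Build a ListRecords URL; pad the 'from' date back by time_padding seconds."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one remote set
    if last_harvest is not None:
        # Incremental harvest: ask only for records changed since the last
        # run, minus the padding, expressed as a UTC datestamp.
        padded = last_harvest.astimezone(timezone.utc) - timedelta(seconds=time_padding)
        params["from"] = padded.strftime("%Y-%m-%dT%H:%M:%SZ")
    return base_url + "?" + urlencode(params)


# Hypothetical remote repository and set name.
last_run = datetime(2016, 5, 29, 16, 10, 0, tzinfo=timezone.utc)
print(list_records_url("http://remote.example.org/oai/request", "oai_dc",
                       set_spec="col_123456789_1", last_harvest=last_run))
```

A first harvest would omit last_harvest, so no "from" argument is sent and all records in the set are requested.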

Step 4 - Schedule automatic harvesting updates

Go to the "control panel" and select automatic harvesting for the collections so that they stay properly synchronised after the initial harvest.

See screenshot below.

Harvesting-control.png

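Documentation

http://www.openarchives.org/OAI/2.0/guidelines-harvester.htm

https://openknowledge.worldbank.org/harvesting-the-okr

References

https://wiki.duraspace.org/display/DSDOC5x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH

https://wiki.duraspace.org/display/DSDOC4x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH

https://wiki.duraspace.org/display/DSDOC3x/XMLUI+Configuration+and+Customization#XMLUIConfigurationandCustomization-HarvestingItemsfromXMLUIviaOAI-OREorOAI-PMH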