SUNScholar/Operational Guide

Introduction
This wiki page offers advice on the operational management of a research archive that uses DSpace software.

All of the feature customisation and system administration documentation assumes:
 * 1) That DSpace has been installed according to http://wiki.lib.sun.ac.za/index.php/SUNScholar/DSpace
 * 2) That you are working as the "dspace" user, which was created during the installation of the Ubuntu server software.

Community/Collection Hierarchy
In DSpace there are two "containers", namely communities and collections. Communities are containers for collections and sub-communities. Collections are containers for items.

Items are individual submissions of research outputs, such as articles, book chapters and data sets, and each item is assigned a unique persistent identifier (by default, a Handle).

Items also have metadata associated with them that help to create machine and human readable indexes.

What type of hierarchy is best?
This depends on the purpose of the archive: is it a general archive or a subject-based one?

General
At Stellenbosch University library we have a general research archive, so we decided to use the existing institutional organisational structure as the basis for our hierarchy.
 * 1) Faculties are designated as the top level communities.
 * 2) Departments are collections contained by the top-level faculty communities.

 * Research that does not fit easily into either of the above is stored in a top-level "University Collections" community. This is normally research from groups that collaborate across several disciplines. Beneath the top-level "University Collections" community, we created collections for these cross-discipline research groups.
 * In addition, under the top-level "University Collections" community we created a "Work-In-Progress" (WIP) community. Beneath this community we created collections for items that are pending review before being moved or mapped to other collections. For example, our student theses and dissertations are submitted and reviewed in a collection under the top-level "University Collections" community.



Subject
In a subject-based research archive, the top-level communities are created from the central subject themes of the research group.

The taxonomy of these central subject themes is normally supplied by the research group.

See the screenshot below for an example from the RADAR research group who are situated on the Stellenbosch University campus.



Click here to see a list of disciplinary repositories.

The wisdom of experience
At first the hierarchy seemed adequate, but then issues started to appear.

Issue 1 - Harvesters
The main issue was, and still is, harvesting by other websites. Some harvesters are "smart" and can build harvesting filters from the available metadata; others are "dumb" and simply want to harvest a single collection. Unfortunately we have to cater for the dumbest harvesters, so a possible project for us is to create "main" collections under the "University Collections" top-level community, such as:
 * Student masters and doctorates
 * Researcher publications

These collections should satisfy the needs of the "dumb" harvesters, which expect everything to be in one location. However, this seriously undermines the easy browse facility in DSpace, so the idea is to map items from the "main" collections to suitable destination collections. This could be an automated process if we were confident in the standardisation and quality of each item's metadata. Unfortunately this is not always the case, so we will need more "eyeballs" to help with mapping items and with metadata quality control.
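Harvesters usually collect records from DSpace through the OAI-PMH protocol, in which DSpace exposes communities and collections as "sets". A "dumb" harvester that wants a single collection issues a selective-harvesting request for one set, which is why a single "main" collection suits it. The minimal sketch below only builds such a request URL; the base URL and the `col_123456789_2` set identifier are hypothetical placeholders, and the exact set naming depends on your DSpace version and OAI configuration.

```python
from urllib.parse import urlencode

# Hypothetical OAI-PMH endpoint; replace with your repository's base URL.
BASE_URL = "https://repository.example.ac.za/oai/request"


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc",
                     from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    A "dumb" harvester passes a single set_spec (one collection);
    a "smart" harvester may omit it and filter on the returned metadata.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # selective harvesting by set
    if from_date is not None:
        params["from"] = from_date  # selective harvesting by datestamp (YYYY-MM-DD)
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical set identifier for a "main" collection.
    print(list_records_url(BASE_URL, set_spec="col_123456789_2"))
```

A harvester that asks for one set receives only that collection's records, so anything filed elsewhere in the hierarchy is invisible to it unless it is also mapped into the harvested collection.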

Issue 2 - Lower the barrier for item submissions (Easy to submit!)
See: https://wwwf.imperial.ac.uk/blog/openaccess/2015/07/28/making-open-access-simple-the-imperial-college-approach-to-oa

Our well-constructed and detailed collections hierarchy is great for librarians who need to find items quickly, BUT it is a nightmare for users to navigate when they use the default DSpace submission system.

In addition, researchers expect the library to manage and organise the items in the repository. This is essentially information management, and researchers quite rightly expect the library to do it on their behalf rather than be burdened with long, tedious electronic submission forms that have many metadata fields to complete.

A possible solution to this problem involves several actions, namely:
 * 1) Greatly simplify the hierarchy as mentioned above in the "harvesters" discussion and then map items to the relevant collections.
 * 2) Create very simple submission forms for the "main" collections to facilitate quick easy submissions for researchers and students.
 * 3) Create very simple submission workflows that, if possible, include only a one-step review before acceptance into the archive.
 * 4) Solicit the assistance of many more "eyeballs" to do the above mapping and to capture the required metadata after the quick, simple submissions mentioned above.

Together, these actions should solve both of the main issues using the default workflow system in DSpace, provided we can get many more "eyeballs" to help mediate future "quick" submissions and "re-map" existing items.
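The trade-off between automated mapping and human "eyeballs" can be pictured as a simple triage: an item from a "main" collection is mapped automatically only when its metadata matches a known destination exactly, and is otherwise queued for manual review. This is a minimal sketch, not part of DSpace; the `department` field, the lookup table and the collection names are hypothetical, and the actual mapping would still be carried out with DSpace's own item mapping tools.

```python
from dataclasses import dataclass, field

# Hypothetical lookup from a normalised department value to a destination collection.
DEPARTMENT_TO_COLLECTION = {
    "electrical and electronic engineering": "Dept of Electrical and Electronic Engineering",
    "history": "Dept of History",
}


@dataclass
class Triage:
    mapped: dict = field(default_factory=dict)        # item id -> destination collection
    needs_review: list = field(default_factory=list)  # item ids for human "eyeballs"


def triage_items(items):
    """Split items into automatically mappable ones and ones needing review.

    Each item is a dict with an "id" and an optional "department" value
    (a hypothetical metadata field used only for illustration).
    """
    result = Triage()
    for item in items:
        dept = (item.get("department") or "").strip().lower()
        destination = DEPARTMENT_TO_COLLECTION.get(dept)
        if destination is None:
            # Missing or non-standard metadata: do not guess, ask a person.
            result.needs_review.append(item["id"])
        else:
            result.mapped[item["id"]] = destination
    return result


if __name__ == "__main__":
    sample = [
        {"id": "item-1", "department": "History"},
        {"id": "item-2", "department": "Elec. Eng."},
        {"id": "item-3"},
    ]
    t = triage_items(sample)
    print(t.mapped)        # {'item-1': 'Dept of History'}
    print(t.needs_review)  # ['item-2', 'item-3']
```

The better standardised the metadata, the more items fall into the automatic branch; every non-standard value ("Elec. Eng.") or missing field lands in the review queue.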

Leveraging the power of machine indexes
DSpace can create an index from any field in any metadata schema.

It is therefore not necessary to create collections for groupings that can easily be created virtually with DSpace indexes.

For example, it is very useful to know how many research articles are produced by researchers in a department.

To accomplish this, create an index on dc.type as part of the Discovery feature in DSpace. Then, when you browse to a collection and select the "Type" browse-by option, you will be presented with a list of that collection's items by type. See the screenshot below for an example from the Dept of Electrical and Electronic Engineering.

A very useful feature for research repositories would be the ability for users to create "custom" search filters. For example, a user may want to know the research output of a department by type for a particular period, and then export that data in an open, interoperable file format such as CSV.
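The kind of custom filter described above (a department's output counted by type for a period, then exported as CSV) can be sketched outside DSpace on item records that have already been exported for one collection. The record layout (dicts keyed by `dc.type` and `dc.date.issued`) and the function names are assumptions for illustration; this is not an existing DSpace feature or API.

```python
import csv
import io
from collections import Counter


def output_by_type(items, start_year, end_year):
    """Count items per dc.type whose dc.date.issued year falls in [start_year, end_year]."""
    counts = Counter()
    for item in items:
        issued = item.get("dc.date.issued", "")
        # dc.date.issued is usually ISO-like ("2014", "2014-06", "2014-06-30"); use the year.
        year = int(issued[:4]) if issued[:4].isdigit() else None
        if year is not None and start_year <= year <= end_year:
            counts[item.get("dc.type", "Unknown")] += 1
    return counts


def counts_to_csv(counts):
    """Render type counts as CSV text with a header row, sorted by type."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "count"])
    for item_type in sorted(counts):
        writer.writerow([item_type, counts[item_type]])
    return buffer.getvalue()


if __name__ == "__main__":
    records = [
        {"dc.type": "Article", "dc.date.issued": "2013-05-01"},
        {"dc.type": "Article", "dc.date.issued": "2014"},
        {"dc.type": "Thesis", "dc.date.issued": "2014-11"},
        {"dc.type": "Article", "dc.date.issued": "2010"},
    ]
    print(counts_to_csv(output_by_type(records, 2013, 2014)), end="")
    # type,count
    # Article,2
    # Thesis,1
```

Items without a usable date are silently excluded and items without a type are counted as "Unknown", which is exactly where metadata quality shows up in the numbers.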



Please note: The accuracy of these indexes depends on the accuracy of the item metadata, therefore the more "eyeballs" there are ensuring metadata quality, the better.

System and Feature Stability
The installation, customisation and development of DSpace are not standardised. This is understandable given the aim of maximising the reach of DSpace; however, it later causes feature and system instabilities.

Therefore, as a potential user, bear in mind the context in which DSpace was developed. Hopefully DSpace will in future concentrate on standardisation in all aspects and promote system and feature stability.

Feature and system stability can be achieved by adopting a standardised reference architecture and a regular software release cadence as best practice when implementing DSpace as a trusted/accredited institutional digital repository.

For a stable production system, it seems that you should only enable those features which are absolutely necessary for your situation.

If you know of any Java web application developers who strongly support the open access movement, please encourage them to join the DuraSpace community and work on DSpace!

See also the following discussions of Java web development and of vulnerabilities inherited from ageing components:
 * http://programmers.stackexchange.com/questions/102090/why-isnt-java-used-for-modern-web-application-development
 * http://it.slashdot.org/story/15/06/16/1559212/report-aging-java-components-to-blame-for-massively-buggy-open-source-software
 * ITworld: "Software applications have on average 24 vulnerabilities inherited from buggy components" (PDF)