SUNScholar/Self-Hosting Value Proposition

Introduction
This wiki page attempts to introduce the self-hosting value proposition for an institutional research repository.

Self-hosting is an institutional policy adopted by those public academic research institutions that are acutely aware of the critical importance of data sovereignty and data authenticity.

The value being proposed:


 * In the long term it is more economical and sustainable for academic libraries to build capacity and become online open access scholarly communication publishers themselves. Doing so avoids reliance on outside commercial services and the risk of another very expensive "serials bundle crisis" of the kind that was, and still is, precipitated by greedy commercial academic journal publishers.


 * Libraries are uniquely positioned to be in the "forever business" of maintaining the digital scholarly record. Most libraries are already staffed by information professionals, so a library needs only a few hires to transform into a non-profit digital research publisher and long-term digital curator.

Assumptions
The following is assumed:
 * 1) The academic library is the responsible publisher and therefore employs and trains the required personnel.
 * 2) Normal monthly overheads, such as building rental and maintenance, utilities and insurance, are deemed to be absorbed by the academic institution as part of normal operations, with or without this value proposition, and are therefore not a factor. However, they are a BIG factor for commercial publishing entities, and this is where an academic library benefits the most: it and the institution are already absorbing institutional overhead costs!
 * 3) The proposition involves normal annual running costs, with fixed asset costs amortised over a four-year period for the library.
 * 4) The proposition does not cover costs for ad-hoc or continuing digitisation projects. Digitisation is assumed to be costed separately.
 * 5) The open system focuses on providing "green" open access publishing infrastructure and support, that is, the immediate deposit, immediate access (ID/IA) model as per Stevan Harnad's definition.
 * 6) The open system is based strictly on open standards and open source software, so that no intellectual property or service contract costs are payable to third parties at all.
 * 7) The central IT department provides internet infrastructure and data centre support, and a service agreement for these services exists between the library and the central IT department.

Cost per item download (CPD)
To be able to make comparisons with other library publishing systems and to provide a single simple metric, cost per item downloaded (CPD) will be calculated.

Calculation Data Used

 * 1) Number of items downloaded in 2014 as per our Piwik stats = 102,000
 * 2) Production Server Hardware Cost = R250,000 to be amortised over 4 years, the length of a typical hardware warranty.
 * 3) Backup Server Hardware Cost = R100,000 to be amortised over 4 years, the length of a typical hardware warranty.
 * 4) Salary - OSCD - Open Scholarly Communications Director (New Hire) = R500,000
 * 5) Salary - OSCM - Open Scholarly Communications Manager (New Hire) = R350,000
 * 6) Salary - OSCL - Open Scholarly Communications Librarian (New Hire) = R250,000
 * 7) Salary - OSPI - Open Scholarly Publications Infrastructure Manager (New Hire) = R350,000
 * 8) Salary - OSPS - Open Scholarly Publications Software Technologist (New Hire) = R250,000
 * 9) Salary - OSPH - Open Scholarly Publications Hardware Technologist (New Hire) = R250,000

The cost of hardware per annum is therefore:
(R250,000/4) [One production server amortised over 4 years] + ((R100,000/4) x 2) [Two backup servers amortised over 4 years] = R112,500 pa
 * One production server
 * Two backup servers
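
The hardware figure above can be reproduced with a minimal Python sketch. It assumes simple straight-line amortisation (purchase price spread evenly over the four years, with no interest or residual value); the function and variable names are illustrative only and do not come from the SUNScholar setup.

```python
# Straight-line amortisation of server hardware, using the figures listed above.
# Assumption: no financing cost and no residual value after the warranty period.
AMORTISATION_YEARS = 4  # length of a typical hardware warranty

# (description, unit cost in rand, quantity)
hardware = [
    ("production server", 250_000, 1),
    ("backup server", 100_000, 2),
]


def annual_hardware_cost(items, years):
    """Yearly cost when each purchase is spread evenly over `years`."""
    return sum(unit_cost * quantity / years for _, unit_cost, quantity in items)


def initial_capex(items):
    """Total up-front purchase price of all hardware."""
    return sum(unit_cost * quantity for _, unit_cost, quantity in items)


hardware_pa = annual_hardware_cost(hardware, AMORTISATION_YEARS)
capex = initial_capex(hardware)

print(f"Hardware cost per annum: R{hardware_pa:,.0f}")  # R112,500
print(f"Initial CAPEX:           R{capex:,.0f}")        # R450,000
```

Changing the quantities or the amortisation period in the list shows how the annual provision for hardware replacement would move.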

Notes:

 * 1) For production hardware specifications see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Install_Ubuntu/S01#Hardware
 * 2) For backup hardware requirements see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Disaster_Recovery
 * 3) The initial server hardware capital expenditure (CAPEX) of R450,000 ($45,000 approx) has to be financed and planned for. Thereafter, an annual budget provision must be made for server hardware replacement.

The cost of personnel per annum is therefore:
R500,000 [1 x OSCD] + R700,000 [OSCM + OSPI] + (4 x R250,000) [4 x OSCL] + (R250,000 + R250,000) [OSPS + OSPH] = R2,700,000 pa

Operational

 * One scholarly communications director
 * One scholarly communications operational manager
 * Four scholarly communications librarians

Technical

 * One scholarly publications systems technical manager
 * One scholarly publications systems web programmer
 * One scholarly publications systems technology administrator

Notes:

 * 1) For detailed scholarly communications management personnel requirements see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Capacity_Building/Digital_Repository_Content_Management
 * 2) For detailed scholarly publications management personnel requirements see: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Capacity_Building/Digital_Repository_Systems_Management

The total cost per annum is therefore:
R112,500 [Hardware] + R2,700,000 [Personnel] = R2,812,500 pa

The cost per item downloaded on SUNScholar for 2014 is therefore:
R2,812,500 [Total Cost] divided by 102,000 [No of items downloaded] = R27.57, rounded to the nearest cent.
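
For readers who want to substitute their own numbers, the whole calculation can be sketched in a few lines of Python. The salaries and head counts are the best-guess figures listed under "Calculation Data Used"; the dictionary layout and the function name `cost_per_download` are illustrative assumptions, not part of any existing tool.

```python
# Cost per item downloaded (CPD), using the 2014 figures from this page (in rand).
HARDWARE_PA = 112_500       # amortised hardware cost per annum
DOWNLOADS_2014 = 102_000    # items downloaded in 2014 (Piwik stats)

# role code -> (annual salary, head count)
personnel = {
    "OSCD": (500_000, 1),
    "OSCM": (350_000, 1),
    "OSCL": (250_000, 4),
    "OSPI": (350_000, 1),
    "OSPS": (250_000, 1),
    "OSPH": (250_000, 1),
}


def cost_per_download(total_cost, downloads):
    """Total annual cost divided by the number of items downloaded."""
    if downloads <= 0:
        raise ValueError("downloads must be positive")
    return total_cost / downloads


personnel_pa = sum(salary * count for salary, count in personnel.values())
total_pa = HARDWARE_PA + personnel_pa
cpd = cost_per_download(total_pa, DOWNLOADS_2014)

print(f"Personnel per annum: R{personnel_pa:,}")  # R2,700,000
print(f"Total per annum:     R{total_pa:,}")      # R2,812,500
print(f"CPD:                 R{cpd:.2f}")         # R27.57
```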

Observations

 * 1) Assuming that each academic library in South Africa pays R40 million per year to commercial publishers on average, a fully open scholarly communication publishing system costs only about 7% of the annual e-resources budget (R2,812,500 / R40,000,000)!
 * 2) At an average 2014 exchange rate of R10.00 to the US dollar, the CPD is approximately $2.76, or roughly $3. The more often items are downloaded, the cheaper it becomes per item!
 * 3) SUNScholar has approximately 14,000 full-text items, so the cost to publish per item for 2014 was R2,812,500/14,000 = R200 or $20 approx. The more full-text items available for download, the better! This makes more economical use of the resources.
 * 4) If existing library staff who are digitally competent and passionate about open access could be trained and reassigned, then the CPD drops radically, since no new hires are required.
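
The ratios behind these observations can be checked with a short Python sketch that computes them directly from the stated figures. Two scenarios are hypothetical and added only for illustration: doubling the download count, and counting hardware alone as new expenditure (a simplification of observation 4 that ignores the cost of training and reassigning existing staff).

```python
# Figures from this page (2014, in rand unless noted).
TOTAL_PA = 2_812_500            # hardware + personnel per annum
HARDWARE_PA = 112_500           # hardware only
DOWNLOADS = 102_000             # items downloaded in 2014
FULL_TEXT_ITEMS = 14_000        # approximate number of full-text items
ERESOURCES_BUDGET = 40_000_000  # assumed average annual spend on commercial publishers
RAND_PER_USD = 10.0             # average 2014 exchange rate used on this page


def cpd(total_cost, downloads):
    """Cost per item downloaded."""
    return total_cost / downloads


budget_share = TOTAL_PA / ERESOURCES_BUDGET
cpd_usd = cpd(TOTAL_PA, DOWNLOADS) / RAND_PER_USD
cost_per_item = TOTAL_PA / FULL_TEXT_ITEMS

# Hypothetical scenarios, not measured data.
cpd_double_downloads = cpd(TOTAL_PA, DOWNLOADS * 2)
cpd_hardware_only = cpd(HARDWARE_PA, DOWNLOADS)

print(f"Share of e-resources budget: {budget_share:.1%}")
print(f"CPD in US dollars:           ${cpd_usd:.2f}")
print(f"Cost per full-text item:     R{cost_per_item:.2f}")
print(f"CPD with doubled downloads:  R{cpd_double_downloads:.2f}")
print(f"CPD counting hardware only:  R{cpd_hardware_only:.2f}")
```

Because the total cost is essentially fixed, the CPD is inversely proportional to the number of downloads, which is why both more downloads and fewer new hires lower it.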

Conclusion

 * The costs are extremely economical in relation to those of the large commercial publishers.
 * It would be very interesting to see any research, with openly published data, on the costs of commercially published items.
 * I leave it to readers to weigh the pros and cons of self-hosting for themselves and their institution.

Example pros

 * Significantly more budget control, because the institution is not at the mercy of monopolistic commercial academic publishers or monopolistic commercial repository service providers.
 * Physical custody of all academic research and teaching digital assets.
 * Complete control of the publishing workflow, which allows the institution to customise the system to fit the purpose for which it was created, rather than trying to customise a one-size-fits-all system.

Example cons

 * Training of librarians in new scholarly publishing competencies.
 * Training of librarians to advocate for open access research publishing.
 * Hiring persons with expert internet publishing technology skills into academic libraries.
 * Adopting internet publishing technology as standard and supported practice for academic libraries.
 * Convincing academic libraries to become good open source software community members.

News

 * http://bjoern.brembs.net/2016/01/how-much-should-a-scholarly-article-cost-the-taxpayer
 * http://www.jisc.ac.uk/blog/how-to-prepare-for-the-financial-side-of-open-access-17-oct-2014
 * http://www.semantico.com/2014/07/freemium-and-the-forever-business-payment-models-in-scholarly-publishing
 * http://blogs.lse.ac.uk/impactofsocialsciences/2014/04/24/value-of-digital-assets-in-research-centres
 * http://www.nature.com/news/open-access-the-true-cost-of-science-publishing-1.12676
 * http://www.nature.com/news/specials/scipublishing/index.html

Disclaimer
Best guess estimates have been made with regard to annual salaries.

More accurate published data from libraries worldwide would be very helpful.

See graph below for disk usage on our production server.



The disks use RAID6 with a total capacity of 5 TB (shown as 100%) for the /home folder (the red line in the graph), where the digital assets are stored.

The server was upgraded in December 2015/January 2016. The previous server had only 500 GB of disk space, which was used up in only 4 years.