Open Data

From Libopedia

Introduction

To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".

Definitions

  • https://en.wikipedia.org/wiki/Research_data_archiving
    "Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Open_data
    "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Big_data
    "Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09

What is Open Data?

Tim Berners-Lee
The next Web of open, linked data

The Open Data Charter

Standards

Metadata

Why Open Data?

Open data encourages a "knowledge based consensus" using the scientific method.

What is research data?

Watch the following videos for an introduction:

https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
https://vimeo.com/156313024 - Establishing a shared research data service in the UK

Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728

Panton Principles For Research Data

The FAIR principles

Digital Object Identifiers (DOI)

See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI
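A DOI is normally cited as a resolvable link through the public resolver at https://doi.org/. The sketch below shows one way to turn a DOI string into such a link. The function name `doi_to_url` and the list of accepted prefixes are assumptions made for illustration, not part of any library workflow described here, and the validity check is deliberately rough.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Return a resolver link for a DOI (hypothetical helper, not a library API)."""
    doi = doi.strip()
    # Strip common ways a DOI is written so only "prefix/suffix" remains.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Rough check only: DOIs start with "10." and contain a "/" separator.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# 10.1000/182 is the DOI of the DOI Handbook.
print(doi_to_url("doi:10.1000/182"))
```

Storing the bare DOI and generating the link on display keeps archived metadata independent of any one resolver address.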

What is big data?

What are the major differences between research data (RD) and big data (BD)?

Assuming that research data and big data are both open data, what are the differences?

  1. BD datasets are huge, typically at least 1 terabyte (TB); RD datasets are much smaller, typically at most 1 gigabyte (GB).
  2. BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB.
    RD is very structured and uses a SQL database such as PostgreSQL.
  3. BD has many sources; RD usually has one source.
  4. BD is collected in real time; RD is collected after analysis.
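The structural difference in point 2 can be illustrated with a small sketch. It rests on assumptions: Python's built-in `sqlite3` stands in for PostgreSQL so that no server is needed, a plain list of dictionaries stands in for a document store such as MongoDB, and the table, column names and values are invented for illustration.

```python
import sqlite3

# Research data (RD): a fixed schema, so every row has the same typed columns.
# sqlite3 is only a stand-in for PostgreSQL; the table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    " sample_id TEXT NOT NULL,"
    " temperature_c REAL NOT NULL,"
    " recorded_on TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [("S1", 21.5, "2016-05-09"), ("S2", 22.5, "2016-05-09")],
)
rd_mean = conn.execute("SELECT AVG(temperature_c) FROM measurement").fetchone()[0]

# Big data (BD): schemaless documents from several sources, each with its
# own fields. A list of dicts stands in for a NoSQL document collection.
bd_events = [
    {"source": "sensor", "temperature_c": 21.5},
    {"source": "weblog", "url": "/index.html", "status": 200},
    {"source": "social", "text": "hello", "tags": ["a", "b"]},
]
bd_sources = {event["source"] for event in bd_events}
bd_fields = sorted({key for event in bd_events for key in event})

print("RD mean temperature:", rd_mean)
print("BD sources:", len(bd_sources), "distinct fields:", bd_fields)
```

The RD table rejects a row that lacks a column, whereas the BD collection accepts any shape of record and leaves interpretation to the analysis step.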

In short, BD tools are used to "surface" patterns from huge datasets, usually in real time, and to make predictions, whereas RD is used to store the results of BD analysis.

BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.

RD, by contrast, is the research output of the analysis of BD, and is normally archived by the library in collaboration with the research office and central IT.

Possible Open Research Data Archiving Implementation At Stellenbosch University

Data-schematic.png

Data Implementation Diagram Notes

For biomedical data see: https://galaxyproject.org

Software Analysis

For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software

CKAN

Very good installation documentation for Ubuntu 14.04 LTS. Good customisation documentation. No operations documentation is available.

Installation
Customisation
Operations

Dataverse

No installation guide for Ubuntu servers. Good customisation and operational documentation.

Installation
Customisation
Operations

Geonode

Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.

Installation
Customisation
Operations

More Information On Other Open Data Systems

http://wiki.lib.sun.ac.za/index.php/OpenGIS
http://wiki.lib.sun.ac.za/index.php/OpenSurvey
http://wiki.lib.sun.ac.za/index.php/OpenBiology

Electronic Laboratory Notebook (ELN)

Research Data Management Plans (RDMP)

Tools

References

CKAN4RDM Discussion

Catalogs

South African

African

Australian

European

American

International

Open Government Sites

Infrastructure

List of data repositories

Books

Organisations

Training

Conferences

Data Archive Software

Data Munging/Wrangling/Normalisation Software

See: https://en.wikipedia.org/wiki/Data_wrangling for a definition

Data Visualisation Software

Rankings

Integration

Service Providers

Research

Bibliographies

Analysis

News

Graphics