Open Data

From Libopedia

Introduction

Science must be reproducible, so archiving digital experimental data and methods is critically important.

Definitions

  • https://en.wikipedia.org/wiki/Open_data
    "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Big_data
    "Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Research_data_archiving
    "Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09

The FAIR principles

What is Open Data?

Tim Berners-Lee
The next Web of open, linked data

The Open Data Charter

Standards

Metadata

Why Open Data?

Open data encourages a "knowledge based consensus" using the scientific method.

Watch the video below for an example:

What is research data?

Watch the following videos for an introduction:

https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
https://vimeo.com/156313024 - Establishing a shared research data service in the UK

Rdm.png

Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness

Panton Principles For Research Data

Digital Object Identifiers (DOI)

See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI

What is big data?

Big-data-sex.jpg

Img bigdata.png

What are the major differences between research data (RD) and big data (BD)?

  1. BD datasets are huge (>= 1 terabyte (TB)); RD datasets are much smaller (<= 1 gigabyte (GB)).
  2. BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB.
    RD is very structured and uses an SQL database such as PostgreSQL.
  3. BD has many sources; RD usually has one source.
  4. BD is collected in real time; RD is collected after analysis.
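
The four rules of thumb in the list above can be sketched as a simple check. This is only an illustration: the `DatasetProfile` class, its field names, and the majority-vote rule are assumptions made for the example, not part of any tool named on this page, and the 1 TB and 1 GB thresholds are taken as decimal units.

```python
from dataclasses import dataclass

GB = 10**9   # decimal gigabyte in bytes (assumed unit)
TB = 10**12  # decimal terabyte in bytes (assumed unit)


@dataclass
class DatasetProfile:
    """Hypothetical description of a dataset; the field names are illustrative."""
    size_bytes: int
    structured: bool
    source_count: int
    collected_in_real_time: bool


def classify(profile: DatasetProfile) -> str:
    """Return 'BD', 'RD' or 'unclear' by counting which rules of thumb apply."""
    bd_votes = 0
    rd_votes = 0

    # Rule 1: size. Datasets between 1 GB and 1 TB vote for neither side.
    if profile.size_bytes >= TB:
        bd_votes += 1
    elif profile.size_bytes <= GB:
        rd_votes += 1

    # Rule 2: structure.
    if profile.structured:
        rd_votes += 1
    else:
        bd_votes += 1

    # Rule 3: number of sources.
    if profile.source_count > 1:
        bd_votes += 1
    else:
        rd_votes += 1

    # Rule 4: time of collection.
    if profile.collected_in_real_time:
        bd_votes += 1
    else:
        rd_votes += 1

    if bd_votes > rd_votes:
        return "BD"
    if rd_votes > bd_votes:
        return "RD"
    return "unclear"


# Example profiles (made-up values for illustration only).
stream = DatasetProfile(5 * TB, structured=False, source_count=12,
                        collected_in_real_time=True)
survey = DatasetProfile(200 * 10**6, structured=True, source_count=1,
                        collected_in_real_time=False)
print(classify(stream), classify(survey))
```

Real datasets often fall between the two extremes, which is why the sketch returns "unclear" on a tie instead of forcing a label.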

In short, BD tools are used to "surface" patterns in huge datasets and make predictions, usually in real time, whereas RD stores the results of BD analysis.
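
A minimal sketch of that division of labour, under stated assumptions: the event stream, the `reading_summary` table, and all function names are hypothetical, and the standard-library `sqlite3` module stands in for an SQL database such as PostgreSQL. The stream is reduced in a single pass (the BD side), and only the small, structured summary is stored (the RD side).

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def sensor_events() -> Iterator[dict]:
    """Hypothetical event stream; in practice this would be a live feed."""
    yield {"source": "sensor-a", "reading": 21.5}
    yield {"source": "sensor-b", "reading": 19.0}
    yield {"source": "sensor-a", "reading": 22.5}


def summarise(events: Iterable[dict]) -> Dict[str, Tuple[int, float]]:
    """Reduce the stream in one pass to (count, mean reading) per source."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for event in events:
        counts[event["source"]] += 1
        totals[event["source"]] += event["reading"]
    return {src: (counts[src], totals[src] / counts[src]) for src in counts}


def archive(summary: Dict[str, Tuple[int, float]],
            conn: sqlite3.Connection) -> None:
    """Store the summary in a structured table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reading_summary ("
        "source TEXT PRIMARY KEY, n INTEGER NOT NULL, "
        "mean_reading REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO reading_summary VALUES (?, ?, ?)",
        [(src, n, mean) for src, (n, mean) in summary.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
archive(summarise(sensor_events()), conn)
rows = conn.execute(
    "SELECT source, n, mean_reading FROM reading_summary ORDER BY source"
).fetchall()
print(rows)
```

The archived table is tiny compared with the stream that produced it, which matches the size difference between BD and RD noted in the list above.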

BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.

RD, by contrast, is the research output of BD analysis and is normally archived by the library in collaboration with the research office and central IT.

Possible Open Research Data Archiving Implementation At Stellenbosch University

Data-schematic.png

Data Implementation Diagram Notes

For biomedical data see: https://galaxyproject.org

Software Analysis

For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software

CKAN

Very good installation documentation for Ubuntu 12.04 LTS. Good customisation documentation. Operations documentation is not available.

Installation
Customisation
Operations: not available

Dataverse

No installation guide for Ubuntu 12.04 LTS. Good customisation and operational documentation.

Installation
Customisation
Operations

GeoNode

Very good installation documentation for Ubuntu 12.04 LTS. Customisation and operations documentation is available but confusing.

Installation
Customisation
Operations

More Information On Other Open Data Systems

http://wiki.lib.sun.ac.za/index.php/OpenGIS
http://wiki.lib.sun.ac.za/index.php/OpenSurvey
http://wiki.lib.sun.ac.za/index.php/OpenBiology

Electronic Laboratory Notebook (ELN)

Research Data Management Plans (RDMP)

Introduction

Watch the following videos:

https://youtu.be/VhSfw5o1dUo - John Scally: Research Data Management in the Library
https://youtu.be/gYDb-GP1CA4 - The what, why and how of data management planning

References

CKAN4RDM Discussion

Catalogs

South African

African

Australian

European

American

International

Open Government Sites

Infrastructure

List of data repositories

Books

Organisations

Training

Conferences

Data Archive Software

Data Munging/Wrangling/Normalisation Software

See: https://en.wikipedia.org/wiki/Data_wrangling for a definition

Data Visualisation Software

Rankings

Integration

Service Providers

Research

Analysis

News

Graphics