Difference between revisions of "Open Data"

From Libopedia
 
(14 intermediate revisions by the same user not shown)
Line 10: Line 10:
 
*https://en.wikipedia.org/wiki/Open_data<br>"Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
 
*https://en.wikipedia.org/wiki/Open_data<br>"Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
 
*https://en.wikipedia.org/wiki/Big_data<br>"Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
 
*https://en.wikipedia.org/wiki/Big_data<br>"Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
 
==The FAIR principles==
 
*https://www.force11.org/group/fairgroup/fairprinciples
 
  
 
==What is Open Data?==
 
==What is Open Data?==
Line 35: Line 32:
 
*http://www.isa-tools.org
 
*http://www.isa-tools.org
 
*http://opendatacommons.org
 
*http://opendatacommons.org
 +
*https://www.w3.org/TR/dwbp
  
 
===Metadata===
 
===Metadata===
Line 58: Line 56:
 
[[File:Rdm.png|border|555px]]
 
[[File:Rdm.png|border|555px]]
  
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness
+
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728
  
 
===Panton Principles For Research Data===
 
===Panton Principles For Research Data===
Line 67: Line 65:
 
*http://en.wikipedia.org/wiki/Rufus_Pollock
 
*http://en.wikipedia.org/wiki/Rufus_Pollock
 
*http://en.wikipedia.org/wiki/John_Wilbanks
 
*http://en.wikipedia.org/wiki/John_Wilbanks
 +
===The FAIR principles===
 +
*https://www.force11.org/group/fairgroup/fairprinciples
  
 
===Digital Object Identifiers (DOI)===
 
===Digital Object Identifiers (DOI)===
Line 72: Line 72:
  
 
==What is big data?==
 
==What is big data?==
[[File:Big-data-sex.jpg|555px]]
 
 
 
*http://en.wikipedia.org/wiki/Big_data
 
*http://en.wikipedia.org/wiki/Big_data
 
*http://en.wikipedia.org/wiki/Extract,_transform,_load
 
*http://en.wikipedia.org/wiki/Extract,_transform,_load
Line 83: Line 81:
 
*http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
 
*http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
 
*http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
 
*http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
 +
 +
[[File:Big-data-sex.jpg|555px]]
  
 
[[File:Img_bigdata.png|border|555px]]
 
[[File:Img_bigdata.png|border|555px]]
 +
 +
[[File:Big-data-use-cases.jpg|555px]]
  
 
==What are the major differences between research data (RD) and big data (BD)?==
 
==What are the major differences between research data (RD) and big data (BD)?==
Line 113: Line 115:
 
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
 
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
 
====CKAN====
 
====CKAN====
Very good installation documentation on Ubuntu 12.04 LTS. Good customisation documentation. Operations documentation not available.
+
Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.
 
=====Installation=====
 
=====Installation=====
 
*http://docs.ckan.org/en/latest/maintaining/installing/index.html
 
*http://docs.ckan.org/en/latest/maintaining/installing/index.html
Line 119: Line 121:
 
*http://docs.ckan.org/en/latest/sysadmin-guide.html
 
*http://docs.ckan.org/en/latest/sysadmin-guide.html
 
=====Operations=====
 
=====Operations=====
Not available
+
*http://docs.ckan.org/en/latest/user-guide.html
 +
 
 
====Dataverse====
 
====Dataverse====
No installation guide for Ubuntu 12.04 LTS. Good customisation and operational documentation.
+
No installation guide for Ubuntu servers. Good customisation and operational documentation.
 
=====Installation=====
 
=====Installation=====
 
*http://guides.dataverse.org/en/latest/installation/index.html
 
*http://guides.dataverse.org/en/latest/installation/index.html
Line 132: Line 135:
  
 
====Geonode====
 
====Geonode====
Very good installation documentation for Ubuntu 12.04 LTS. Customisation and operations documentation is available but confusing.
+
Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.
 
=====Installation=====
 
=====Installation=====
*http://docs.geonode.org/en/latest/tutorials/admin/install/quick_install.html
+
*http://docs.geonode.org/en/master/tutorials/install_and_admin/quick_install.html
 
=====Customisation=====
 
=====Customisation=====
 
*http://docs.geonode.org/en/latest/reference/index.html
 
*http://docs.geonode.org/en/latest/reference/index.html
Line 155: Line 158:
  
 
==Research Data Management Plans (RDMP)==
 
==Research Data Management Plans (RDMP)==
===Introduction===
 
Watch the following videos:
 
https://youtu.be/VhSfw5o1dUo - John Scally: Research Data Management in the Library
 
https://youtu.be/gYDb-GP1CA4 - The what, why and how of data management planning
 
 
 
===[[Open_Data/RDMP_Tools|Tools]]===
 
===[[Open_Data/RDMP_Tools|Tools]]===
 
Click on the heading above.
 
Click on the heading above.
Line 186: Line 184:
 
===South African===
 
===South African===
 
*http://www.data.gov.za
 
*http://www.data.gov.za
*http://www.code4sa.org
 
*https://www.datafirst.uct.ac.za
 
 
*http://sada.nrf.ac.za
 
*http://sada.nrf.ac.za
 
*http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
 
*http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
Line 193: Line 189:
 
*http://southafrica.opendataforafrica.org
 
*http://southafrica.opendataforafrica.org
 
*http://data.worldbank.org/country/south-africa
 
*http://data.worldbank.org/country/south-africa
 
+
*http://www.code4sa.org
 +
*https://www.datafirst.uct.ac.za
 
===African===
 
===African===
 
*http://www.data.gov.et
 
*http://www.data.gov.et
Line 239: Line 236:
 
==List of data repositories==
 
==List of data repositories==
 
*http://oad.simmons.edu/oadwiki/Data_repositories
 
*http://oad.simmons.edu/oadwiki/Data_repositories
 +
*http://www.nature.com/sdata/policies/repositories
  
 
==Books==
 
==Books==
Line 244: Line 242:
  
 
==Organisations==
 
==Organisations==
 +
*https://www.odpi.org
 
*http://www.opendatafoundation.org
 
*http://www.opendatafoundation.org
 
*http://www.codata.org
 
*http://www.codata.org
Line 260: Line 259:
 
*http://www.odbms.org
 
*http://www.odbms.org
 
*http://learn-rdm.eu
 
*http://learn-rdm.eu
 +
*http://www.od4d.net
  
 
==Training==
 
==Training==
Line 368: Line 368:
  
 
==News==
 
==News==
 +
*http://www.internationaldataweek.org
 
*http://www.opendatascience.com
 
*http://www.opendatascience.com
 
*http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
 
*http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/

Latest revision as of 12:17, 30 September 2016


Introduction

To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".

Definitions

  • https://en.wikipedia.org/wiki/Research_data_archiving
    "Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Open_data
    "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Big_data
    "Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09

What is Open Data?

Tim Berners-Lee
The next Web of open, linked data

The Open Data Charter

Standards

Metadata

Why Open Data?

Open data encourages a "knowledge based consensus" using the scientific method.

Watch the video below for an example:

What is research data?

Watch the following videos for an introduction:

https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
https://vimeo.com/156313024 - Establishing a shared research data service in the UK

Rdm.png

Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728

Panton Principles For Research Data

The FAIR principles

Digital Object Identifiers (DOI)

See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI

What is big data?

Big-data-sex.jpg

Img bigdata.png

Big-data-use-cases.jpg

What are the major differences between research data (RD) and big data (BD)?

Assuming that research data and big data are both open data, what are the differences?

  1. BD datasets are huge (>= 1 terabyte (TB)); RD datasets are much smaller (<= 1 gigabyte (GB)).
  2. BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB.
    RD is highly structured and uses a SQL database such as PostgreSQL.
  3. BD has many sources; RD usually has one source.
  4. BD is collected in real time; RD is collected after analysis.
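
The storage contrast in point 2 can be illustrated with a minimal sketch. It is an assumption-laden stand-in, not a recommended setup: Python's built-in sqlite3 module plays the role of a SQL database such as PostgreSQL, a list of JSON strings plays the role of a document store such as MongoDB, and the table, field, and function names are hypothetical.

```python
import json
import sqlite3


def store_structured(rows):
    """Structured (RD-style) storage: every row must fit a fixed schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement ("
        "sample_id TEXT NOT NULL, temperature_c REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_temperature(conn):
    """A fixed schema lets the database answer aggregate queries directly."""
    (value,) = conn.execute(
        "SELECT AVG(temperature_c) FROM measurement"
    ).fetchone()
    return value


def store_documents(docs):
    """Unstructured (BD-style) storage: each record is a free-form document."""
    return [json.dumps(doc) for doc in docs]


def fields_seen(serialised_docs):
    """Without a schema, the set of fields is only known after reading the data."""
    fields = set()
    for text in serialised_docs:
        fields.update(json.loads(text).keys())
    return sorted(fields)
```

In this sketch the SQL table rejects any row that does not match its two columns, while the document list accepts records with entirely different fields, which is the trade-off the list above describes.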

In short, BD tools are used to "surface" patterns from huge datasets, usually in real time, and to make predictions, whereas RD is used to store the results of BD analysis.

BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.

RD, by contrast, is the research output of the analysis of BD and is normally archived by the library in collaboration with the research office and central IT.

Possible Open Research Data Archiving Implementation At Stellenbosch University

Data-schematic.png

Data Implementation Diagram Notes

For biomedical data see: https://galaxyproject.org

Software Analysis

For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software

CKAN

Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.
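
Once a CKAN instance is installed, its catalogue can be queried over HTTP through the CKAN Action API. The following is a minimal sketch of a dataset search using the `package_search` action. The base URL `https://ckan.example.org` is a placeholder and the helper names are hypothetical; only the `/api/3/action/package_search` path, the `q` and `rows` parameters, and the `success`/`result` response fields come from CKAN itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=5):
    """Build the Action API URL for a dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        return []
    return [item.get("title", "") for item in payload["result"]["results"]]


def search(base_url, query, rows=5):
    """Run the search against a live CKAN instance (requires network access)."""
    with urlopen(package_search_url(base_url, query, rows)) as response:
        return dataset_titles(response.read().decode("utf-8"))
```

For example, `search("https://ckan.example.org", "rainfall")` would return the titles of up to five matching datasets, assuming an instance is reachable at that placeholder address.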

Installation
Customisation
Operations

Dataverse

No installation guide for Ubuntu servers. Good customisation and operational documentation.

Installation
Customisation
Operations

Geonode

Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.

Installation
Customisation
Operations

More Information On Other Open Data Systems

http://wiki.lib.sun.ac.za/index.php/OpenGIS
http://wiki.lib.sun.ac.za/index.php/OpenSurvey
http://wiki.lib.sun.ac.za/index.php/OpenBiology

Electronic Laboratory Notebook (ELN)

Research Data Management Plans (RDMP)

Tools

See the Tools page (Open_Data/RDMP_Tools).

References

CKAN4RDM Discussion

Catalogs

South African

African

Australian

European

American

International

Open Government Sites

Infrastructure

List of data repositories

Books

Organisations

Training

Conferences

Data Archive Software

Data Munging/Wrangling/Normalisation Software

See: https://en.wikipedia.org/wiki/Data_wrangling for a definition

Data Visualisation Software

Rankings

Integration

Service Providers

Research

Bibliographies

Analysis

News

Graphics