Difference between revisions of "Open Data"

From Libopedia
 
(47 intermediate revisions by the same user not shown)
Line 4: Line 4:
  
 
==Introduction==
 
==Introduction==
'''<font color="red">Science must be reproducible, therefore archiving of digital experimental data and methods is critically important.</font>'''
+
'''<font color="red">To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".</font>'''
 +
 
 +
==Definitions==
 +
*https://en.wikipedia.org/wiki/Research_data_archiving<br>"Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
 +
*https://en.wikipedia.org/wiki/Open_data<br>"Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
 +
*https://en.wikipedia.org/wiki/Big_data<br>"Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
  
 
==What is Open Data?==
 
==What is Open Data?==
Line 27: Line 32:
 
*http://www.isa-tools.org
 
*http://www.isa-tools.org
 
*http://opendatacommons.org
 
*http://opendatacommons.org
 +
*https://www.w3.org/TR/dwbp
  
 
===Metadata===
 
===Metadata===
Line 41: Line 47:
  
 
==What is research data?==
 
==What is research data?==
Watch the following animated fictional videos for an introduction:
+
Watch the following videos for an introduction:
 
  https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
 
  https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
 
  https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
 
  https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
Line 47: Line 53:
 
  https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
 
  https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
 
  https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
 
  https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
 
+
https://vimeo.com/156313024 - Establishing a shared research data service in the UK
 
[[File:Rdm.png|border|555px]]
 
[[File:Rdm.png|border|555px]]
  
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness
+
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728
  
 
===Panton Principles For Research Data===
 
===Panton Principles For Research Data===
Line 59: Line 65:
 
*http://en.wikipedia.org/wiki/Rufus_Pollock
 
*http://en.wikipedia.org/wiki/Rufus_Pollock
 
*http://en.wikipedia.org/wiki/John_Wilbanks
 
*http://en.wikipedia.org/wiki/John_Wilbanks
 +
===The FAIR principles===
 +
*https://www.force11.org/group/fairgroup/fairprinciples
  
 
===Digital Object Identifiers (DOI)===
 
===Digital Object Identifiers (DOI)===
Line 64: Line 72:
  
 
==What is big data?==
 
==What is big data?==
[[File:Big-data-sex.jpg]]
 
 
 
*http://en.wikipedia.org/wiki/Big_data
 
*http://en.wikipedia.org/wiki/Big_data
 
*http://en.wikipedia.org/wiki/Extract,_transform,_load
 
*http://en.wikipedia.org/wiki/Extract,_transform,_load
Line 75: Line 81:
 
*http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
 
*http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
 
*http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
 
*http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
 +
 +
[[File:Big-data-sex.jpg|555px]]
  
 
[[File:Img_bigdata.png|border|555px]]
 
[[File:Img_bigdata.png|border|555px]]
 +
 +
[[File:Big-data-use-cases.jpg|555px]]
  
 
==What are the major differences between research data (RD) and big data (BD)?==
 
==What are the major differences between research data (RD) and big data (BD)?==
 +
''Assuming that research data and big data are both open data, what are the differences?''
 
#BD datasets are huge, >= 1 [http://en.wikipedia.org/wiki/Terabyte Terabyte] (TB); RD datasets are much smaller, <= 1 [http://en.wikipedia.org/wiki/Gigabyte Gigabyte] (GB).
 
#BD datasets are huge, >= 1 [http://en.wikipedia.org/wiki/Terabyte Terabyte] (TB); RD datasets are much smaller, <= 1 [http://en.wikipedia.org/wiki/Gigabyte Gigabyte] (GB).
 
#BD is unstructured and uses a [http://en.wikipedia.org/wiki/NoSQL NoSQL] database such as [http://en.wikipedia.org/wiki/MongoDB MongoDB] or [http://en.wikipedia.org/wiki/OrientDB OrientDB].<br>RD is very structured and uses a [http://en.wikipedia.org/wiki/SQL SQL] database such as [http://en.wikipedia.org/wiki/PostgreSQL PostgreSQL].
 
#BD is unstructured and uses a [http://en.wikipedia.org/wiki/NoSQL NoSQL] database such as [http://en.wikipedia.org/wiki/MongoDB MongoDB] or [http://en.wikipedia.org/wiki/OrientDB OrientDB].<br>RD is very structured and uses a [http://en.wikipedia.org/wiki/SQL SQL] database such as [http://en.wikipedia.org/wiki/PostgreSQL PostgreSQL].
Line 104: Line 115:
 
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
 
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
 
====CKAN====
 
====CKAN====
Very good installation documentation on Ubuntu 12.04 LTS. Good customisation documentation. Operations documentation not available.
+
Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.
 
=====Installation=====
 
=====Installation=====
 
*http://docs.ckan.org/en/latest/maintaining/installing/index.html
 
*http://docs.ckan.org/en/latest/maintaining/installing/index.html
Line 110: Line 121:
 
*http://docs.ckan.org/en/latest/sysadmin-guide.html
 
*http://docs.ckan.org/en/latest/sysadmin-guide.html
 
=====Operations=====
 
=====Operations=====
Not available
+
*http://docs.ckan.org/en/latest/user-guide.html
 +
 
 
====Dataverse====
 
====Dataverse====
No installation guide for Ubuntu 12.04 LTS. Good customisation and operational documentation.
+
No installation guide for Ubuntu servers. Good customisation and operational documentation.
 
=====Installation=====
 
=====Installation=====
 
*http://guides.dataverse.org/en/latest/installation/index.html
 
*http://guides.dataverse.org/en/latest/installation/index.html
Line 123: Line 135:
  
 
====Geonode====
 
====Geonode====
Very good installation documentation for Ubuntu 12.04 LTS. Customisation and operations documentation available but are confusing.
+
Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.
 
=====Installation=====
 
=====Installation=====
*http://docs.geonode.org/en/latest/tutorials/admin/install/quick_install.html
+
*http://docs.geonode.org/en/master/tutorials/install_and_admin/quick_install.html
 
=====Customisation=====
 
=====Customisation=====
 
*http://docs.geonode.org/en/latest/reference/index.html
 
*http://docs.geonode.org/en/latest/reference/index.html
Line 138: Line 150:
 
==Electronic Laboratory Notebook (ELN)==
 
==Electronic Laboratory Notebook (ELN)==
 
*https://en.wikipedia.org/wiki/Electronic_lab_notebook
 
*https://en.wikipedia.org/wiki/Electronic_lab_notebook
 +
*http://jupyter.org
 
*http://www.elabftw.net
 
*http://www.elabftw.net
 
*http://www.limswiki.org/index.php/ELN_vendor
 
*http://www.limswiki.org/index.php/ELN_vendor
Line 145: Line 158:
  
 
==Research Data Management Plans (RDMP)==
 
==Research Data Management Plans (RDMP)==
===Introduction===
+
===[[Open_Data/RDMP_Tools|Tools]]===
Watch the following videos:
+
Click on the heading above.
https://youtu.be/VhSfw5o1dUo - John Scally: Research Data Management in the Library
 
https://youtu.be/gYDb-GP1CA4 - The what, why and how of data management planning
 
 
 
 
===[[SUNScholar/References|References]]===
 
===[[SUNScholar/References|References]]===
 
*https://dmp.cdlib.org
 
*https://dmp.cdlib.org
Line 174: Line 184:
 
===South African===
 
===South African===
 
*http://www.data.gov.za
 
*http://www.data.gov.za
*http://www.code4sa.org
 
*https://www.datafirst.uct.ac.za
 
 
*http://sada.nrf.ac.za
 
*http://sada.nrf.ac.za
 
*http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
 
*http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
Line 181: Line 189:
 
*http://southafrica.opendataforafrica.org
 
*http://southafrica.opendataforafrica.org
 
*http://data.worldbank.org/country/south-africa
 
*http://data.worldbank.org/country/south-africa
 
+
*http://www.code4sa.org
 +
*https://www.datafirst.uct.ac.za
 
===African===
 
===African===
 
*http://www.data.gov.et
 
*http://www.data.gov.et
Line 203: Line 212:
 
*http://project.opendatamonitor.eu
 
*http://project.opendatamonitor.eu
 
*http://opendatainception.io
 
*http://opendatainception.io
 +
*http://www.opendatanetwork.com
  
 
===Open Government Sites===
 
===Open Government Sites===
Line 226: Line 236:
 
==List of data repositories==
 
==List of data repositories==
 
*http://oad.simmons.edu/oadwiki/Data_repositories
 
*http://oad.simmons.edu/oadwiki/Data_repositories
 +
*http://www.nature.com/sdata/policies/repositories
  
 
==Books==
 
==Books==
Line 231: Line 242:
  
 
==Organisations==
 
==Organisations==
 +
*https://www.odpi.org
 
*http://www.opendatafoundation.org
 
*http://www.opendatafoundation.org
 
*http://www.codata.org
 
*http://www.codata.org
Line 246: Line 258:
 
*https://researchdata.ands.org.au
 
*https://researchdata.ands.org.au
 
*http://www.odbms.org
 
*http://www.odbms.org
 +
*http://learn-rdm.eu
 +
*http://www.od4d.net
  
 
==Training==
 
==Training==
 +
*https://www.coursera.org/learn/data-management
 +
*https://class.coursera.org/datavisualization-001
 +
*http://learn-rdm.eu
 
*http://www.dataversity.net
 
*http://www.dataversity.net
 
*http://datascienceacademy.com
 
*http://datascienceacademy.com
Line 253: Line 270:
 
*http://datasciencemasters.org
 
*http://datasciencemasters.org
 
*http://datacarpentry.github.io
 
*http://datacarpentry.github.io
*https://class.coursera.org/datavisualization-001
 
 
*https://www.datacamp.com
 
*https://www.datacamp.com
 
*http://schoolofdata.org
 
*http://schoolofdata.org
 
*http://youtu.be/q2aiDJzJPuw
 
*http://youtu.be/q2aiDJzJPuw
 +
*http://opendata.stackexchange.com
  
 
==Conferences==
 
==Conferences==
 +
*http://csvconf.com
 
*http://www.africaopendata.net
 
*http://www.africaopendata.net
 
*http://openvisconf.com
 
*http://openvisconf.com
Line 274: Line 292:
 
*http://datalift.org
 
*http://datalift.org
 
*http://dataconservancy.org
 
*http://dataconservancy.org
 +
*http://www.project-redcap.org
  
 
==Data Munging/Wrangling/Normalisation Software==
 
==Data Munging/Wrangling/Normalisation Software==
Line 281: Line 300:
  
 
==Data Visualisation Software==
 
==Data Visualisation Software==
 +
*http://lisacharlotterost.github.io/2016/05/17/one-chart-tools
 +
*[[Media:2015-data-design.pdf|2015 - Infoactive - Data + Design - A simple introduction to preparing and visualizing information]]
 
*http://www.coolinfographics.com/tools
 
*http://www.coolinfographics.com/tools
 
*http://datavisualization.ch/tools/selected-tools
 
*http://datavisualization.ch/tools/selected-tools
*[[Media:2015-data-design.pdf|2015 - Infoactive - Data + Design - A simple introduction to preparing and visualizing information]]
 
 
*http://www.creativebloq.com/design-tools/data-visualization-712402
 
*http://www.creativebloq.com/design-tools/data-visualization-712402
 
*http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools
 
*http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools
Line 339: Line 359:
 
*http://schoolofdata.org
 
*http://schoolofdata.org
 
*http://www.gigasciencejournal.com
 
*http://www.gigasciencejournal.com
 +
==Bibliographies==
 +
*http://digital-scholarship.org/rdcb/rdcb.htm
  
 
==Analysis==
 
==Analysis==
Line 346: Line 368:
  
 
==News==
 
==News==
 +
*http://www.internationaldataweek.org
 +
*http://www.opendatascience.com
 
*http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
 
*http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
 
*http://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
 
*http://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
Line 364: Line 388:
 
File:Researcher-Data-Insights-Infographic-FINAL.jpg
 
File:Researcher-Data-Insights-Infographic-FINAL.jpg
 
</gallery>
 
</gallery>
 +
[[Category:Help]]

Latest revision as of 12:17, 30 September 2016


Introduction

To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".

Definitions

  • https://en.wikipedia.org/wiki/Research_data_archiving
    "Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Open_data
    "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
  • https://en.wikipedia.org/wiki/Big_data
    "Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09

What is Open Data?

Tim Berners-Lee
The next Web of open, linked data

The Open Data Charter

Standards

Metadata

Why Open Data?

Open data encourages a "knowledge based consensus" using the scientific method.

Watch the video below for an example:

What is research data?

Watch the following videos for an introduction:

https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
https://vimeo.com/156313024 - Establishing a shared research data service in the UK

Rdm.png

Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728

Panton Principles For Research Data

The FAIR principles

Digital Object Identifiers (DOI)

See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI

What is big data?

Big-data-sex.jpg

Img bigdata.png

Big-data-use-cases.jpg

What are the major differences between research data (RD) and big data (BD)?

Assuming that research data and big data are both open data, what are the differences?

  1. BD datasets are huge, >= 1 Terabyte (TB); RD datasets are much smaller, <= 1 Gigabyte (GB).
  2. BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB.
    RD is very structured and uses a SQL database such as PostgreSQL.
  3. BD has many sources; RD usually has one source.
  4. BD is collected in real time; RD is collected after analysis.
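
The structured versus unstructured distinction in point 2 can be illustrated with a minimal sketch. It rests on assumptions that are not part of this page: the table name, column names and sample values are hypothetical, Python's built-in sqlite3 module stands in for a SQL database such as PostgreSQL, and plain JSON documents stand in for records in a document store such as MongoDB.

```python
import json
import sqlite3

# Structured "research data": every row must fit a schema declared up front.
# (Hypothetical table and values; sqlite3 stands in for PostgreSQL.)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "sample_id TEXT NOT NULL, temperature_c REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?)",
    [("S1", 21.5), ("S2", 22.5)],
)
# Because the schema is fixed, aggregate queries work without inspecting rows.
mean_temp = conn.execute(
    "SELECT AVG(temperature_c) FROM measurement"
).fetchone()[0]

# Unstructured "big data": each document may carry a different set of fields.
# (Hypothetical documents; plain JSON stands in for a document store.)
documents = [
    json.loads('{"source": "sensor", "temperature_c": 21.5}'),
    json.loads('{"source": "tweet", "text": "hot today", "tags": ["weather"]}'),
]
# The field names have to be discovered per document rather than assumed.
field_sets = [sorted(doc) for doc in documents]

print(mean_temp)
print(field_sets)
```

The fixed schema is what makes a small research dataset straightforward to archive and re-analyse, whereas the documents need per-record inspection before any analysis can start.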

Basically BD tools are used to "surface" patterns from huge datasets, usually in real time, and make predictions, whereas RD is used to store the results of BD analysis.

BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.

RD, by contrast, is the research output of the analysis of BD, and is normally archived by the library in collaboration with the research office and central IT.

Possible Open Research Data Archiving Implementation At Stellenbosch University

Data-schematic.png

Data Implementation Diagram Notes

For biomedical data see: https://galaxyproject.org

Software Analysis

For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software

CKAN

Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.

Installation
Customisation
Operations

Dataverse

No installation guide for Ubuntu servers. Good customisation and operational documentation.

Installation
Customisation
Operations

Geonode

Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.

Installation
Customisation
Operations

More Information On Other Open Data Systems

http://wiki.lib.sun.ac.za/index.php/OpenGIS
http://wiki.lib.sun.ac.za/index.php/OpenSurvey
http://wiki.lib.sun.ac.za/index.php/OpenBiology

Electronic Laboratory Notebook (ELN)

Research Data Management Plans (RDMP)

Tools

Click on the heading above.

References

CKAN4RDM Discussion

Catalogs

South African

African

Australian

European

American

International

Open Government Sites

Infrastructure

List of data repositories

Books

Organisations

Training

Conferences

Data Archive Software

Data Munging/Wrangling/Normalisation Software

See: https://en.wikipedia.org/wiki/Data_wrangling for a definition

Data Visualisation Software

Rankings

Integration

Service Providers

Research

Bibliographies

Analysis

News

Graphics