Open Data

Introduction
'''To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".'''

Definitions

 * https://en.wikipedia.org/wiki/Research_data_archiving "Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
 * https://en.wikipedia.org/wiki/Open_data "Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
 * https://en.wikipedia.org/wiki/Big_data "Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09

What is Open Data?

 * http://theodi.org/guides/what-open-data
 * http://opendefinition.org
 * http://en.wikipedia.org/wiki/Open_data
 * http://www.socrata.com/open-data-field-guide
 * http://libguides.wits.ac.za/digitisation_preservation_and_digitalcuration


 * Tim Berners-Lee: The next Web of open, linked data

https://www.youtube.com/watch?v=OM6XIICm_qo

The Open Data Charter

 * http://opendatacharter.net

Standards

 * http://www.infodocket.com/2015/12/29/accessingusing-data-on-the-web-new-uses-cases-best-practices-and-recommendations-published-by-w3c
 * http://www.ddialliance.org
 * http://opendatahandbook.org
 * http://isacommons.org
 * http://www.isa-tools.org
 * http://opendatacommons.org
 * https://www.w3.org/TR/dwbp

Metadata

 * http://rd-alliance.github.io/metadata-directory
 * http://dublincore.org/groups/sam
 * http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata
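As an illustration of what a descriptive metadata record for a dataset can look like, here is a minimal sketch that serialises a few Dublin Core elements to XML. The element names and namespace come from the Dublin Core Metadata Element Set 1.1; the `record` wrapper element, the function name and all field values are placeholders made up for this example, not a format prescribed by any of the sites listed above.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a simple XML record from (element, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value
    return root


# Hypothetical dataset description; every value is a placeholder.
example_fields = [
    ("title", "Example survey dataset"),
    ("creator", "Researcher, A."),
    ("date", "2016-05-09"),
    ("type", "Dataset"),
    ("format", "text/csv"),
    ("rights", "CC0 1.0"),
]

record = dublin_core_record(example_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A real archive would normally add an identifier (for example a DOI) and use whichever schema its repository software expects.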

Why Open Data?
Open data encourages a "knowledge based consensus" using the scientific method.

Watch the video below for an example:

File:Knowledge-based-consensus.mp4

What is research data?
Watch the following videos for an introduction:

 * https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
 * https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
 * https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
 * https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
 * https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
 * https://vimeo.com/156313024 - Establishing a shared research data service in the UK

Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728

Panton Principles For Research Data

 * http://en.wikipedia.org/wiki/Panton_Principles
 * http://pantonprinciples.org
 * http://en.wikipedia.org/wiki/Peter_Murray-Rust
 * http://en.wikipedia.org/wiki/Cameron_Neylon
 * http://en.wikipedia.org/wiki/Rufus_Pollock
 * http://en.wikipedia.org/wiki/John_Wilbanks

The FAIR principles

 * https://www.force11.org/group/fairgroup/fairprinciples

Digital Object Identifiers (DOI)
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI
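A DOI is made citable and clickable by prefixing it with the resolver address `https://doi.org/`. The sketch below shows that step with a loose syntax check. The regular expression is only an approximation of the DOI syntax, and the function name and the example DOI are hypothetical.

```python
import re
from urllib.parse import quote

# Loose check (an approximation, not the full DOI syntax):
# "10.", a numeric registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def doi_to_url(doi):
    """Return the resolver URL for a DOI, or raise ValueError."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# Hypothetical DOI, used only to show the shape of the result.
print(doi_to_url("10.1234/example.dataset.1"))
```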

What is big data?

 * http://en.wikipedia.org/wiki/Big_data
 * http://en.wikipedia.org/wiki/Extract,_transform,_load
 * http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
 * http://radar.oreilly.com/2012/01/what-is-big-data.html
 * http://www.mongodb.com/big-data-explained
 * http://bigdatauniversity.com
 * http://www.bigdata.amadeus.com
 * http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
 * http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html







What are the major differences between research data (RD) and big data (BD)?
Assuming that research data and big data are both open data, what are the differences?
 * 1) BD datasets are huge (1 terabyte (TB) or more); RD datasets are much smaller (1 gigabyte (GB) or less).
 * 2) BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB. RD is highly structured and uses a SQL database such as PostgreSQL.
 * 3) BD has many sources; RD usually has one source.
 * 4) BD is collected in real time; RD is collected after analysis.

Basically, BD tools are used to "surface" patterns from huge datasets, usually in real time, and to make predictions, whereas RD is used to store the results of BD analysis.

BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.

RD, by contrast, is the research output of the analysis of BD, and is normally archived by the library in collaboration with the research office and central IT.
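The structured versus unstructured distinction can be sketched in a few lines. This is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for a SQL database such as PostgreSQL, and a list of JSON documents stands in for a NoSQL store such as MongoDB. The table, field names and values are all made up.

```python
import json
import sqlite3

# Structured (RD-style): every row must fit a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    " sample_id TEXT NOT NULL,"
    " temperature_c REAL NOT NULL)"
)
conn.execute("INSERT INTO measurement VALUES (?, ?)", ("S1", 21.5))

# A row with a missing value violates the schema and is rejected.
try:
    conn.execute("INSERT INTO measurement VALUES (?, ?)", ("S2", None))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM measurement").fetchone()[0]

# Unstructured (BD-style): documents from different sources may
# carry completely different fields and are stored as they arrive.
documents = [
    {"source": "sensor-a", "temperature_c": 21.5},
    {"source": "web-log", "path": "/index.html", "status": 200},
]
stored = [json.dumps(doc) for doc in documents]
field_sets = [set(json.loads(text)) for text in stored]

print("row rejected by schema:", rejected)
print("rows stored:", row_count)
print("fields per document:", field_sets)
```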

Possible Open Research Data Archiving Implementation At Stellenbosch University

 * Data-schematic.png

Data Implementation Diagram Notes

 * At the top is the research data catalog/portal/user help web site... http://data.sun.ac.za using MediaWiki with the Semantic Forms extension installed.
 * 1st from the left is general research data... http://open.data.sun.ac.za using: http://ckan.org
 * 2nd from the left is survey data from academic publications... http://pubs.data.sun.ac.za using: http://dataverse.org and https://pkp.sfu.ca/dataverse-network-plugin-release
 * And finally to the right is GIS specific research data... http://maps.data.sun.ac.za using: http://geonode.org

For biomedical data see: https://galaxyproject.org

Software Analysis
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software

CKAN
Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.

Installation

 * http://docs.ckan.org/en/latest/maintaining/installing/index.html

Customisation

 * http://docs.ckan.org/en/latest/sysadmin-guide.html

Operations

 * http://docs.ckan.org/en/latest/user-guide.html
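CKAN also exposes its catalogue through a JSON "Action API", which is useful when evaluating it for day-to-day operations. The sketch below builds a `package_search` request URL and reads dataset titles from a response. No network call is made: the base URL is the proposed (not necessarily live) address from the implementation notes, and the sample response is hand-written to mimic the general shape of a CKAN reply. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build the URL for a CKAN Action API dataset search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [d.get("title", d.get("name")) for d in payload["result"]["results"]]


# Assumed base URL, taken from the proposed implementation above.
url = package_search_url("http://open.data.sun.ac.za", "rainfall")

# Hand-written sample response; the datasets are placeholders.
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "rainfall-2015", "title": "Rainfall 2015"},
            {"name": "rainfall-2016", "title": "Rainfall 2016"},
        ],
    },
})
titles = dataset_titles(sample_response)

print(url)
print(titles)
```

To query a live CKAN site, the URL could be fetched with `urllib.request.urlopen` and the response body passed to `dataset_titles`.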

Dataverse
No installation guide for Ubuntu servers. Good customisation and operational documentation.

Installation

 * http://guides.dataverse.org/en/latest/installation/index.html
 * http://guides.dataverse.org/en/latest/developers/ubuntu.html

Customisation

 * http://guides.dataverse.org/en/latest/developers/index.html

Operations

 * http://guides.dataverse.org/en/latest/user/index.html

Geonode
Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.

Installation

 * http://docs.geonode.org/en/master/tutorials/install_and_admin/quick_install.html

Customisation

 * http://docs.geonode.org/en/latest/reference/index.html

Operations

 * http://docs.geonode.org/en/latest/organizational/index.html

More Information On Other Open Data Systems

 * http://wiki.lib.sun.ac.za/index.php/OpenGIS
 * http://wiki.lib.sun.ac.za/index.php/OpenSurvey
 * http://wiki.lib.sun.ac.za/index.php/OpenBiology

Electronic Laboratory Notebook (ELN)

 * https://en.wikipedia.org/wiki/Electronic_lab_notebook
 * http://jupyter.org
 * http://www.elabftw.net
 * http://www.limswiki.org/index.php/ELN_vendor
 * http://mylabbook.org
 * http://lablog.sourceforge.net
 * https://www.google.com/keep

Tools
Click on the heading above.

CKAN4RDM Discussion

 * http://lists.okfn.org/mailman/listinfo/ckan4rdm
 * http://ckan.org/2013/09/25/edawax
 * http://eprints.lincoln.ac.uk/9778/1/CKANEvaluation.pdf
 * http://ckan.org/2013/11/28/ckan4rdm-st-andrews
 * http://research-computing.wp.st-andrews.ac.uk/2013/11/27/using-ckan-for-research-data-management
 * http://ckan.org/2013/02/27/ckan-for-research-data-management-workshop
 * http://orbital.blogs.lincoln.ac.uk/2013/02/27/ckan-for-rdm-workshop

Catalogs

 * https://mran.revolutionanalytics.com/documents/data

South African

 * http://www.data.gov.za
 * http://sada.nrf.ac.za
 * http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
 * http://www.afdb.org/en/knowledge/statistics/open-data-for-africa/
 * http://southafrica.opendataforafrica.org
 * http://data.worldbank.org/country/south-africa
 * http://www.code4sa.org
 * https://www.datafirst.uct.ac.za

African

 * http://www.data.gov.et
 * http://africaopendata.org
 * http://opendataforafrica.org

Australian

 * https://researchdata.ands.org.au

European

 * http://open-data.europa.eu

American

 * http://www.data.gov

International

 * http://www.re3data.org
 * http://databib.org
 * http://data.okfn.org
 * http://datahub.io
 * http://openspending.org
 * http://data.worldbank.org
 * http://project.opendatamonitor.eu
 * http://opendatainception.io
 * http://www.opendatanetwork.com

Open Government Sites

 * United Nations (http://data.un.org/)
 * U.S. (http://www.data.gov/)
 * List of cities/states with open data (http://simplystatistics.org/2012/01/02/list-of-cities-states-with-open-data-help-me-find/)
 * United Kingdom (http://data.gov.uk/)
 * France (http://www.data.gouv.fr/)
 * Ghana (http://data.gov.gh/)
 * Australia (http://data.gov.au/)
 * Germany (https://www.govdata.de/)
 * Hong Kong (http://www.gov.hk/en/theme/psi/datasets/)
 * Japan (http://www.data.go.jp/)
 * Many more (http://www.data.gov/opendatasites)

Infrastructure

 * http://opendataplatform.org
 * http://www.opencompute.org
 * http://wiki.lib.sun.ac.za/index.php/OpenStack
 * http://blog.backblaze.com/2013/02/20/180tb-of-good-vibrations-storage-pod-3-0
 * http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets
 * http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage

List of data repositories

 * http://oad.simmons.edu/oadwiki/Data_repositories
 * http://www.nature.com/sdata/policies/repositories

Books

 * [[Media:Issues-in-open-research-data.pdf|2014 - Moore - Issues in open research data]]

Organisations

 * https://www.odpi.org
 * http://www.opendatafoundation.org
 * http://www.codata.org
 * http://nedicc.com
 * http://www.dirisa.ac.za
 * http://www.icsu-wds.org
 * http://theodi.org
 * http://www.opendataresearch.org
 * http://opendataday.org
 * http://www.archiveteam.org
 * https://rd-alliance.org
 * http://www.nationaldataservice.org
 * http://cardio.dcc.ac.uk
 * http://www.opengovpartnership.org
 * https://researchdata.ands.org.au
 * http://www.odbms.org
 * http://learn-rdm.eu
 * http://www.od4d.net

Training

 * https://www.coursera.org/learn/data-management
 * https://class.coursera.org/datavisualization-001
 * http://learn-rdm.eu
 * http://www.dataversity.net
 * http://datascienceacademy.com
 * http://altbibl.io/dst4l
 * http://datasciencemasters.org
 * http://datacarpentry.github.io
 * https://www.datacamp.com
 * http://schoolofdata.org
 * http://youtu.be/q2aiDJzJPuw
 * http://opendata.stackexchange.com

Conferences

 * http://csvconf.com
 * http://www.africaopendata.net
 * http://openvisconf.com
 * http://www.scidatacon2014.org

Data Archive Software

 * General Data Management (GDM) - http://ckan.org - http://docs.ckan.org/en/latest - http://www.ohloh.net/p/ckan
 * Research Data Management (RDM) - http://thedata.org - http://thedata.harvard.edu/guides/genindex.html - http://www.ohloh.net/p/dvn
 * http://irods.org
 * http://datadryad.org
 * http://drupal.org/project/dkan
 * http://openglam.org
 * http://opendatakit.org
 * http://www.ihsn.org/home/software/nada
 * http://datalift.org
 * http://dataconservancy.org
 * http://www.project-redcap.org

Data Munging/Wrangling/Normalisation Software
See: https://en.wikipedia.org/wiki/Data_wrangling for a definition
 * http://openrefine.org
 * http://journal.code4lib.org/articles/11013

Data Visualisation Software

 * http://lisacharlotterost.github.io/2016/05/17/one-chart-tools
 * [[Media:2015-data-design.pdf|2015 - Infoactive - Data + Design - A simple introduction to preparing and visualizing information]]
 * http://www.coolinfographics.com/tools
 * http://datavisualization.ch/tools/selected-tools
 * http://www.creativebloq.com/design-tools/data-visualization-712402
 * http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools
 * http://opensource.com/life/15/6/eight-open-source-data-visualization-tools
 * https://en.wikipedia.org/wiki/KNIME
 * http://gephi.github.io
 * http://bokeh.pydata.org
 * http://www.htmlwidgets.org
 * http://www.dbvis.com
 * http://selection.datavisualization.ch
 * http://oicweave.org
 * http://scikit-learn.org
 * http://flowingdata.com
 * http://www.infosthetics.com
 * http://www.informationisbeautiful.net
 * http://hint.fm
 * http://www-958.ibm.com/software/analytics/manyeyes
 * http://misoproject.com
 * http://bl.ocksplorer.org
 * https://keen.io
 * http://www.cytoscape.org
 * http://www.eclipse.org/birt
 * http://www.spagobi.org
 * http://www.pentaho.com
 * http://codap.concord.org
 * http://timeline.knightlab.com
 * https://github.com/datawrapper/datawrapper
 * https://github.com/nnnick/Chart.js
 * https://github.com/mikesall/charted
 * https://github.com/mbostock/d3
 * https://github.com/densitydesign/raw
 * http://dygraphs.com
 * http://leafletjs.com
 * https://developers.google.com/chart
 * https://public.tableau.com

Rankings

 * http://barometer.opendataresearch.org

Integration

 * http://odin-project.eu
 * http://projects.iq.harvard.edu/ojs-dvn/book/faq-ojs-dataverse-integration-project
 * http://pkp.sfu.ca/ojs/docs/userguide/2.3.3/authorSubmission4.html

Service Providers

 * http://arkivum.com

Research

 * http://www.opendataresearch.org
 * http://datascience.codata.org
 * https://unlockingresearch.blog.lib.cam.ac.uk/?p=285
 * http://jlsc-pub.org/10/volume/3/issue/2
 * http://researchdataq.org
 * http://okfnlabs.org
 * http://schoolofdata.org
 * http://www.gigasciencejournal.com

Bibliographies

 * http://digital-scholarship.org/rdcb/rdcb.htm

Analysis

 * https://ropensci.org
 * https://github.com/0xdata/h2o
 * http://africacheck.org

News

 * http://www.internationaldataweek.org
 * http://www.opendatascience.com
 * http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
 * http://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
 * http://fumiopen.blogspot.com/2014/09/guides-to-publishing-open-data.html
 * http://royalsociety.org/policy/projects/science-public-enterprise/report
 * http://duraspace.org/five-flavors-open-access-duraspace-research-data
 * http://www.itbusinessedge.com/blogs/integration/seven-resources-for-learning-about-open-data.html
 * http://radar.oreilly.com/tag/data-economy