Open Data
Contents
- 1 Introduction
- 2 Definitions
- 3 What is Open Data?
- 4 Why Open Data?
- 5 What is research data?
- 6 What is big data?
- 7 What are the major differences between research data (RD) and big data (BD)?
- 8 Possible Open Research Data Archiving Implementation At Stellenbosch University
- 9 Electronic Laboratory Notebook (ELN)
- 10 Research Data Management Plans (RDMP)
- 11 Catalogs
- 12 Infrastructure
- 13 List of data repositories
- 14 Books
- 15 Organisations
- 16 Training
- 17 Conferences
- 18 Data Archive Software
- 19 Data Munging/Wrangling/Normalisation Software
- 20 Data Visualisation Software
- 21 Rankings
- 22 Integration
- 23 Service Providers
- 24 Research
- 25 Bibliographies
- 26 Analysis
- 27 News
- 28 Graphics
Introduction
To assist with the reproducibility of research, the library can archive research publication data, hereafter referred to as "research data".
Definitions
- https://en.wikipedia.org/wiki/Research_data_archiving
"Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
- https://en.wikipedia.org/wiki/Open_data
"Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
- https://en.wikipedia.org/wiki/Big_data
"Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
What is Open Data?
- http://theodi.org/guides/what-open-data
- http://opendefinition.org
- http://en.wikipedia.org/wiki/Open_data
- http://www.socrata.com/open-data-field-guide
- http://libguides.wits.ac.za/digitisation_preservation_and_digitalcuration
- Tim Berners-Lee
- The next Web of open, linked data
The Open Data Charter
Standards
- http://www.infodocket.com/2015/12/29/accessingusing-data-on-the-web-new-uses-cases-best-practices-and-recommendations-published-by-w3c
- http://www.ddialliance.org
- http://opendatahandbook.org
- http://isacommons.org
- http://www.isa-tools.org
- http://opendatacommons.org
- https://www.w3.org/TR/dwbp
Metadata
- http://rd-alliance.github.io/metadata-directory
- http://dublincore.org/groups/sam
- http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata
Why Open Data?
Open data encourages a "knowledge based consensus" using the scientific method.
What is research data?
Watch the following videos for an introduction:
- https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
- https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
- https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
- https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
- https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
- https://vimeo.com/156313024 - Establishing a shared research data service in the UK
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness and http://www.rcuk.ac.uk/media/news/160728
Panton Principles For Research Data
- http://en.wikipedia.org/wiki/Panton_Principles
- http://pantonprinciples.org
- http://en.wikipedia.org/wiki/Peter_Murray-Rust
- http://en.wikipedia.org/wiki/Cameron_Neylon
- http://en.wikipedia.org/wiki/Rufus_Pollock
- http://en.wikipedia.org/wiki/John_Wilbanks
The FAIR principles
Digital Object Identifiers (DOI)
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI
What is big data?
- http://en.wikipedia.org/wiki/Big_data
- http://en.wikipedia.org/wiki/Extract,_transform,_load
- http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
- http://radar.oreilly.com/2012/01/what-is-big-data.html
- http://www.mongodb.com/big-data-explained
- http://bigdatauniversity.com
- http://www.bigdata.amadeus.com
- http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
- http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
What are the major differences between research data (RD) and big data (BD)?
Assuming that research data and big data are both open data, what are the differences?
- BD datasets are huge (>= 1 terabyte (TB)); RD datasets are much smaller (<= 1 gigabyte (GB)).
- BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB; RD is very structured and uses an SQL database such as PostgreSQL.
- BD has many sources; RD usually has one source.
- BD is collected in real time; RD is collected after analysis.
Basically, BD tools are used to "surface" patterns from huge datasets, usually in real time, and to make predictions, whereas RD is used to store the results of BD analysis.
BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department.
RD, by contrast, is the research output of the analysis of BD, and is normally archived by the library in collaboration with the research office and central IT.
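The structured versus unstructured distinction drawn above can be illustrated with a minimal sketch. It makes several assumptions: Python's built-in sqlite3 module stands in for PostgreSQL, a plain list of JSON documents stands in for a NoSQL store such as MongoDB, and every table, column and field name is hypothetical.

```python
import json
import sqlite3


def store_research_data(rows):
    """Load rows into a fixed-schema table (sqlite3 standing in for PostgreSQL)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: every row must supply all three columns.
    conn.execute(
        "CREATE TABLE measurement ("
        "sample_id TEXT NOT NULL, "
        "temperature_c REAL NOT NULL, "
        "recorded_on TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO measurement VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def field_names(documents):
    """Return every key seen across schemaless documents."""
    names = set()
    for doc in documents:
        names.update(doc.keys())
    return names


# Research data (RD): one source, one schema, queried with SQL.
rd_rows = [("S1", 21.5, "2016-05-09"), ("S2", 19.0, "2016-05-10")]
conn = store_research_data(rd_rows)
mean_temperature = conn.execute(
    "SELECT AVG(temperature_c) FROM measurement"
).fetchone()[0]

# Big data (BD): many sources, and each document may carry different fields.
bd_documents = [
    json.loads('{"source": "sensor", "reading": 21.5}'),
    json.loads('{"source": "tweet", "text": "hot today", "tags": ["weather"]}'),
]
bd_fields = field_names(bd_documents)
```

In the structured case the database rejects rows that do not fit the schema, whereas in the document case the set of fields has to be discovered from the data itself.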
Possible Open Research Data Archiving Implementation At Stellenbosch University
Data Implementation Diagram Notes
- At the top is the research data catalog/portal/user help web site... http://data.sun.ac.za using MediaWiki with the Semantic Forms extension installed.
- 1st from the left is general research data... http://open.data.sun.ac.za using: http://ckan.org
- 2nd from the left is survey data from academic publications... http://pubs.data.sun.ac.za using: http://dataverse.org and https://pkp.sfu.ca/dataverse-network-plugin-release
- And finally to the right is GIS specific research data... http://maps.data.sun.ac.za using: http://geonode.org
For biomedical data see: https://galaxyproject.org
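If the general research data site runs CKAN as proposed, datasets could be found programmatically through CKAN's Action API. The sketch below only builds a `package_search` request URL and parses a response body; it makes no network call. The host is the proposed address from the notes above and may not be live, and the search term and sample response are made up for illustration.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    # Fall back to the dataset's URL name when no title is set.
    return [d.get("title") or d["name"] for d in payload["result"]["results"]]


# Proposed host from the implementation notes; treat it as a placeholder.
search_url = build_search_url("http://open.data.sun.ac.za", "soil moisture", rows=5)

# Hypothetical response body in the shape CKAN returns for package_search.
sample_response = json.dumps({
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "soil-survey-2015", "title": "Soil Survey 2015"},
            {"name": "untitled-set", "title": ""},
        ],
    },
})
titles = dataset_titles(sample_response)
```

To query a real CKAN instance, the URL could be fetched with `urllib.request.urlopen` and the decoded body passed to `dataset_titles`.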
Software Analysis
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
CKAN
Very good installation documentation on Ubuntu 14.04 LTS. Good customisation documentation. Operations documentation not available.
Installation
Customisation
Operations
Dataverse
No installation guide for Ubuntu servers. Good customisation and operational documentation.
Installation
- http://guides.dataverse.org/en/latest/installation/index.html
- http://guides.dataverse.org/en/latest/developers/ubuntu.html
Customisation
Operations
GeoNode
Very good installation documentation for Ubuntu servers. Customisation and operations documentation is available but confusing.
Installation
Customisation
Operations
More Information On Other Open Data Systems
- http://wiki.lib.sun.ac.za/index.php/OpenGIS
- http://wiki.lib.sun.ac.za/index.php/OpenSurvey
- http://wiki.lib.sun.ac.za/index.php/OpenBiology
Electronic Laboratory Notebook (ELN)
- https://en.wikipedia.org/wiki/Electronic_lab_notebook
- http://jupyter.org
- http://www.elabftw.net
- http://www.limswiki.org/index.php/ELN_vendor
- http://mylabbook.org
- http://lablog.sourceforge.net
- https://www.google.com/keep
Research Data Management Plans (RDMP)
Tools
References
- https://dmp.cdlib.org
- http://www.dcc.ac.uk/how-discover-requirements
- http://dmponline.dcc.ac.uk
- http://researchdataq.org
- 2016 - OCLC - Making Research Data Management Sustainable
- 2015 - NISO - A Primer for Research Data Management
- 2014 - Lewis - University of Sheffield - Research Data Management Infrastructure - Options
- 2014 - DCC - Data Management Plan Checklist V 4.0 - Flyer
- 2013 - DCC - Data Management Plan Checklist V 4.0
CKAN4RDM Discussion
- http://lists.okfn.org/mailman/listinfo/ckan4rdm
- http://ckan.org/2013/09/25/edawax
- http://eprints.lincoln.ac.uk/9778/1/CKANEvaluation.pdf
- http://ckan.org/2013/11/28/ckan4rdm-st-andrews
- http://research-computing.wp.st-andrews.ac.uk/2013/11/27/using-ckan-for-research-data-management
- http://ckan.org/2013/02/27/ckan-for-research-data-management-workshop
- http://orbital.blogs.lincoln.ac.uk/2013/02/27/ckan-for-rdm-workshop
Catalogs
South African
- http://www.data.gov.za
- http://sada.nrf.ac.za
- http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
- http://www.afdb.org/en/knowledge/statistics/open-data-for-africa/
- http://southafrica.opendataforafrica.org
- http://data.worldbank.org/country/south-africa
- http://www.code4sa.org
- https://www.datafirst.uct.ac.za
African
Australian
European
American
International
- http://www.re3data.org
- http://databib.org
- http://data.okfn.org
- http://datahub.io
- http://openspending.org
- http://data.worldbank.org
- http://project.opendatamonitor.eu
- http://opendatainception.io
- http://www.opendatanetwork.com
Open Government Sites
- United Nations: http://data.un.org/
- U.S.: http://www.data.gov/
- U.S., list of cities/states with open data: http://simplystatistics.org/2012/01/02/list-of-cities-states-with-open-data-help-me-find/
- United Kingdom: http://data.gov.uk/
- France: http://www.data.gouv.fr/
- Ghana: http://data.gov.gh/
- Australia: http://data.gov.au/
- Germany: https://www.govdata.de/
- Hong Kong: http://www.gov.hk/en/theme/psi/datasets/
- Japan: http://www.data.go.jp/
- Many more: http://www.data.gov/opendatasites
Infrastructure
- http://opendataplatform.org
- http://www.opencompute.org
- http://wiki.lib.sun.ac.za/index.php/OpenStack
- http://blog.backblaze.com/2013/02/20/180tb-of-good-vibrations-storage-pod-3-0
- http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets
- http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage
List of data repositories
Books
Organisations
- https://www.odpi.org
- http://www.opendatafoundation.org
- http://www.codata.org
- http://nedicc.com
- http://www.dirisa.ac.za
- http://www.icsu-wds.org
- http://theodi.org
- http://www.opendataresearch.org
- http://opendataday.org
- http://www.archiveteam.org
- https://rd-alliance.org
- http://www.nationaldataservice.org
- http://cardio.dcc.ac.uk
- http://www.opengovpartnership.org
- https://researchdata.ands.org.au
- http://www.odbms.org
- http://learn-rdm.eu
- http://www.od4d.net
Training
- https://www.coursera.org/learn/data-management
- https://class.coursera.org/datavisualization-001
- http://learn-rdm.eu
- http://www.dataversity.net
- http://datascienceacademy.com
- http://altbibl.io/dst4l
- http://datasciencemasters.org
- http://datacarpentry.github.io
- https://www.datacamp.com
- http://schoolofdata.org
- http://youtu.be/q2aiDJzJPuw
- http://opendata.stackexchange.com
Conferences
- http://csvconf.com
- http://www.africaopendata.net
- http://openvisconf.com
- http://www.scidatacon2014.org
Data Archive Software
- General Data Management (GDM) - http://ckan.org - http://docs.ckan.org/en/latest - http://www.ohloh.net/p/ckan
- Research Data Management (RDM) - http://thedata.org - http://thedata.harvard.edu/guides/genindex.html - http://www.ohloh.net/p/dvn
- http://irods.org
- http://datadryad.org
- http://drupal.org/project/dkan
- http://openglam.org
- http://opendatakit.org
- http://www.ihsn.org/home/software/nada
- http://datalift.org
- http://dataconservancy.org
- http://www.project-redcap.org
Data Munging/Wrangling/Normalisation Software
See: https://en.wikipedia.org/wiki/Data_wrangling for a definition
Data Visualisation Software
- http://lisacharlotterost.github.io/2016/05/17/one-chart-tools
- 2015 - Infoactive - Data + Design - A simple introduction to preparing and visualizing information
- http://www.coolinfographics.com/tools
- http://datavisualization.ch/tools/selected-tools
- http://www.creativebloq.com/design-tools/data-visualization-712402
- http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools
- http://opensource.com/life/15/6/eight-open-source-data-visualization-tools
- https://en.wikipedia.org/wiki/KNIME
- http://gephi.github.io
- http://bokeh.pydata.org
- http://www.htmlwidgets.org
- http://www.dbvis.com
- http://selection.datavisualization.ch
- http://oicweave.org
- http://scikit-learn.org
- http://flowingdata.com
- http://www.infosthetics.com
- http://www.informationisbeautiful.net
- http://hint.fm
- http://www-958.ibm.com/software/analytics/manyeyes
- http://misoproject.com
- http://bl.ocksplorer.org
- https://keen.io
- http://www.cytoscape.org
- http://www.eclipse.org/birt
- http://www.spagobi.org
- http://www.pentaho.com
- http://codap.concord.org
- http://timeline.knightlab.com
- https://github.com/datawrapper/datawrapper
- https://github.com/nnnick/Chart.js
- https://github.com/mikesall/charted
- https://github.com/mbostock/d3
- https://github.com/densitydesign/raw
- http://dygraphs.com
- http://leafletjs.com
- https://developers.google.com/chart
- https://public.tableau.com
Rankings
Integration
- http://odin-project.eu
- http://projects.iq.harvard.edu/ojs-dvn/book/faq-ojs-dataverse-integration-project
- http://pkp.sfu.ca/ojs/docs/userguide/2.3.3/authorSubmission4.html
Service Providers
Research
- http://www.opendataresearch.org
- http://datascience.codata.org
- https://unlockingresearch.blog.lib.cam.ac.uk/?p=285
- http://jlsc-pub.org/10/volume/3/issue/2
- http://researchdataq.org
- http://okfnlabs.org
- http://schoolofdata.org
- http://www.gigasciencejournal.com
Bibliographies
Analysis
News
- http://www.internationaldataweek.org
- http://www.opendatascience.com
- http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
- http://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
- http://fumiopen.blogspot.com/2014/09/guides-to-publishing-open-data.html
- http://royalsociety.org/policy/projects/science-public-enterprise/report
- http://duraspace.org/five-flavors-open-access-duraspace-research-data
- http://www.itbusinessedge.com/blogs/integration/seven-resources-for-learning-about-open-data.html
- http://radar.oreilly.com/tag/data-economy