Open Data
Contents
- 1 Introduction
- 2 Definitions
- 3 The FAIR principles
- 4 What is Open Data?
- 5 Why Open Data?
- 6 What is research data?
- 7 What is big data?
- 8 What are the major differences between research data (RD) and big data (BD)?
- 9 Possible Open Research Data Archiving Implementation At Stellenbosch University
- 10 Electronic Laboratory Notebook (ELN)
- 11 Research Data Management Plans (RDMP)
- 12 Catalogs
- 13 Infrastructure
- 14 List of data repositories
- 15 Books
- 16 Organisations
- 17 Training
- 18 Conferences
- 19 Data Archive Software
- 20 Data Munging/Wrangling/Normalisation Software
- 21 Data Visualisation Software
- 22 Rankings
- 23 Integration
- 24 Service Providers
- 25 Research
- 26 Analysis
- 27 News
- 28 Graphics
Introduction
Science must be reproducible; therefore, archiving digital experimental data and methods is critically important.
Definitions
- https://en.wikipedia.org/wiki/Open_data
"Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control." Retrieved: 2016/05/09
- https://en.wikipedia.org/wiki/Big_data
"Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate." Retrieved: 2016/05/09
- https://en.wikipedia.org/wiki/Research_data_archiving
"Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences." Retrieved: 2016/05/09
The FAIR principles
What is Open Data?
- http://theodi.org/guides/what-open-data
- http://opendefinition.org
- http://en.wikipedia.org/wiki/Open_data
- http://www.socrata.com/open-data-field-guide
- http://libguides.wits.ac.za/digitisation_preservation_and_digitalcuration
- Tim Berners-Lee - The next Web of open, linked data
The Open Data Charter
Standards
- http://www.infodocket.com/2015/12/29/accessingusing-data-on-the-web-new-uses-cases-best-practices-and-recommendations-published-by-w3c
- http://www.ddialliance.org
- http://opendatahandbook.org
- http://isacommons.org
- http://www.isa-tools.org
- http://opendatacommons.org
Metadata
- http://rd-alliance.github.io/metadata-directory
- http://dublincore.org/groups/sam
- http://wiki.dublincore.org/index.php/DCMI_Science_And_Metadata
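As a rough illustration of what a Dublin Core description of a dataset looks like, the sketch below builds a minimal record using only Python's standard library. The `make_dc_record` helper and the dataset values are hypothetical; the namespace URI `http://purl.org/dc/elements/1.1/` is the standard one for the fifteen Dublin Core elements. The OAI-PMH `oai_dc` wrapper that repositories usually add is deliberately left out for brevity.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen DCMES elements; any other name is rejected below.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_dc_record(fields):
    """Build a <metadata> element holding one dc:* child per value.

    `fields` maps a Dublin Core element name to a string or a list of
    strings (every Dublin Core element is optional and repeatable).
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


if __name__ == "__main__":
    # Hypothetical dataset description, for illustration only.
    record = make_dc_record({
        "title": "Example rainfall measurements",
        "creator": ["Doe, Jane", "Smith, John"],
        "date": "2016-05-09",
        "type": "Dataset",
        "rights": "CC-BY-4.0",
    })
    print(ET.tostring(record, encoding="unicode"))
```

Restricting names to the fifteen core elements keeps the record valid simple Dublin Core; qualified terms from the DCMI Metadata Terms namespace would need a separate namespace.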
Why Open Data?
Open data encourages a "knowledge-based consensus" using the scientific method.
Watch the video below for an example:
What is research data?
Watch the following videos for an introduction:
- https://youtu.be/RVZbk3GEVSw - Data Sharing, Part 1 of 3 - Request
- https://youtu.be/RtSv0gSbCP8 - Data Sharing, Part 2 of 3 - File Formats
- https://youtu.be/-MIH8PkuUo4 - Data Sharing, Part 3 of 3 - Documentation
- https://youtu.be/nNBiCcBlwRA - How to avoid a data management nightmare
- https://youtu.be/q2aiDJzJPuw - An introduction to the basics of Research Data
- https://vimeo.com/156313024 - Establishing a shared research data service in the UK
Also see: http://www.dcc.ac.uk/resources/briefing-papers/five-steps-research-data-readiness
Panton Principles for Research Data
- http://en.wikipedia.org/wiki/Panton_Principles
- http://pantonprinciples.org
- http://en.wikipedia.org/wiki/Peter_Murray-Rust
- http://en.wikipedia.org/wiki/Cameron_Neylon
- http://en.wikipedia.org/wiki/Rufus_Pollock
- http://en.wikipedia.org/wiki/John_Wilbanks
Digital Object Identifiers (DOI)
See: http://wiki.lib.sun.ac.za/index.php/SUNScholar/Digital_Object_Identifier/DOI
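A DOI has the form `10.<registrant>/<suffix>`, and prefixing it with `https://doi.org/` gives a resolvable link. The sketch below normalises a few common ways a DOI is written and checks it against a pattern along the lines of the one Crossref suggests for modern DOIs. The regular expression is an approximation (it rejects some valid but unusual older DOIs), and the helper names are hypothetical.

```python
import re

# Approximate pattern for modern DOIs: "10." + a 4-9 digit registrant code,
# then "/" and a suffix. Matching is case-insensitive, as DOIs are.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Prefixes commonly pasted in front of the bare DOI.
_PREFIXES = ("https://doi.org/", "http://doi.org/",
             "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(text):
    """Strip whitespace and a known resolver or "doi:" prefix."""
    doi = text.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            return doi[len(prefix):]
    return doi


def doi_to_url(text):
    """Return the https://doi.org/ resolver URL, or raise if malformed."""
    doi = normalise_doi(text)
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {text!r}")
    return "https://doi.org/" + doi


if __name__ == "__main__":
    print(doi_to_url("doi:10.1000/182"))  # prints https://doi.org/10.1000/182
```

Storing the bare DOI and generating the resolver URL on output keeps records independent of any particular resolver hostname.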
What is big data?
- http://en.wikipedia.org/wiki/Big_data
- http://en.wikipedia.org/wiki/Extract,_transform,_load
- http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
- http://radar.oreilly.com/2012/01/what-is-big-data.html
- http://www.mongodb.com/big-data-explained
- http://bigdatauniversity.com
- http://www.bigdata.amadeus.com
- http://www.ft.com/intl/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3gycOps5c
- http://www.lynda.com/Hadoop-tutorials/Techniques-Concepts-Big-Data/158656-2.html
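The Extract, transform, load (ETL) link above describes a common pipeline pattern in big data work. As a minimal sketch, assuming CSV input and an SQLite target (chosen only because both ship with Python), the three stages can be kept as separate functions. The column names and sample data are hypothetical.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Extract: read raw rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: clean fields, convert types and drop incomplete rows."""
    cleaned = []
    for row in rows:
        station = row["station"].strip().upper()
        reading = row["rain_mm"].strip()
        if not station or not reading:
            continue  # skip rows missing a required value
        cleaned.append((station, float(reading)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS rainfall (station TEXT, rain_mm REAL)")
    conn.executemany("INSERT INTO rainfall VALUES (?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    raw = "station,rain_mm\n stb ,12.5\nCPT,\npaarl,3\n"
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw)), conn)
    for row in conn.execute("SELECT station, rain_mm FROM rainfall"):
        print(row)
```

Keeping the stages separate makes each one testable on its own and lets the load target be swapped (for example, for a production database) without touching the cleaning logic.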
What are the major differences between research data (RD) and big data (BD)?
- BD datasets are huge (>= 1 terabyte (TB)), whereas RD datasets are much smaller (<= 1 gigabyte (GB)).
- BD is unstructured and uses a NoSQL database such as MongoDB or OrientDB; RD is highly structured and uses a SQL database such as PostgreSQL.
- BD has many sources; RD usually has one source.
- BD is collected in real time; RD is collected after analysis.
In short, BD tools are used to "surface" patterns in huge datasets, usually in real time, and to make predictions, whereas RD is used to store the results of BD analysis.
BD is part of current research and is usually a new service delivered by the research office in collaboration with the central IT department, whereas RD is the research output of BD analysis and is normally archived by the library in collaboration with the research office and central IT.
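To make the structured/unstructured contrast above concrete, the sketch below stores similar observations twice: once in a fixed-schema relational table (SQLite standing in for PostgreSQL) and once as schema-less JSON documents (a plain Python list standing in for a document store such as MongoDB). The table layout, field names and values are hypothetical.

```python
import json
import sqlite3

# Research-data style: a fixed schema is declared up front; every row has
# the same typed columns, and rows that violate it are rejected.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        sample_id TEXT PRIMARY KEY,
        temperature_c REAL NOT NULL,
        ph REAL NOT NULL
    )""")
conn.execute("INSERT INTO measurement VALUES (?, ?, ?)", ("S1", 21.4, 6.8))

try:
    # Missing pH value: the NOT NULL constraint refuses this row.
    conn.execute("INSERT INTO measurement (sample_id, temperature_c) "
                 "VALUES (?, ?)", ("S2", 19.0))
except sqlite3.IntegrityError as exc:
    print("rejected by schema:", exc)

# Big-data style: each record is a self-describing document; records from
# different sources may carry different fields and nesting.
documents = [
    json.loads('{"source": "sensor", "temp": 21.4, "ts": "2016-05-09T10:00"}'),
    json.loads('{"source": "tweet", "text": "hot today", "geo": {"lat": -33.9}}'),
]

# Queries over documents must cope with fields that may be absent.
temps = [d["temp"] for d in documents if "temp" in d]
print(temps)  # prints [21.4]
```

The trade-off shown here is the usual one: the fixed schema enforces consistency at write time, while the document model accepts heterogeneous input and pushes the consistency checks to query time.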
Possible Open Research Data Archiving Implementation At Stellenbosch University
Data Implementation Diagram Notes
- At the top is the research data catalog/portal/user help website, http://data.sun.ac.za, using MediaWiki with the Semantic Forms extension installed.
- First from the left is general research data, http://open.data.sun.ac.za, using CKAN (http://ckan.org).
- Second from the left is survey data from academic publications, http://pubs.data.sun.ac.za, using Dataverse (http://dataverse.org) and https://pkp.sfu.ca/dataverse-network-plugin-release.
- Finally, on the right is GIS-specific research data, http://maps.data.sun.ac.za, using GeoNode (http://geonode.org).
For biomedical data see: https://galaxyproject.org
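The general research data component in this design is CKAN, which exposes its catalogue through an Action API under `/api/3/action/`. As a sketch, assuming the portal at http://open.data.sun.ac.za is a standard CKAN instance (it may not exist or be reachable), a keyword search with the `package_search` action could look like the following. The helper names are hypothetical, and the network call is kept separate from response parsing so that the parsing can be checked offline.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL; any CKAN 2.x site exposes the same Action API path.
CKAN_BASE = "http://open.data.sun.ac.za"


def package_search_url(base, query, rows=10):
    """Build a package_search request URL for a CKAN site."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def parse_package_search(body):
    """Return (total count, list of (name, title)) from a response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN error: {payload.get('error')}")
    result = payload["result"]
    datasets = [(pkg["name"], pkg.get("title", "")) for pkg in result["results"]]
    return result["count"], datasets


def search(query, base=CKAN_BASE, rows=10):
    """Run a live search (requires network access to the CKAN site)."""
    url = package_search_url(base, query, rows)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_package_search(resp.read().decode("utf-8"))
```

For example, `search("rainfall")` would return the total number of matching datasets together with the first ten `(name, title)` pairs, provided the server is a reachable CKAN instance.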
Software Analysis
For further evaluation criteria, see: http://wiki.lib.sun.ac.za/index.php/List_of_Repository_Software
CKAN
Very good installation documentation on Ubuntu 12.04 LTS. Good customisation documentation. Operations documentation not available.
Installation
Customisation
Operations
Not available
Dataverse
No installation guide for Ubuntu 12.04 LTS. Good customisation and operational documentation.
Installation
- http://guides.dataverse.org/en/latest/installation/index.html
- http://guides.dataverse.org/en/latest/developers/ubuntu.html
Customisation
Operations
Geonode
Very good installation documentation for Ubuntu 12.04 LTS. Customisation and operations documentation is available but confusing.
Installation
Customisation
Operations
More Information On Other Open Data Systems
- http://wiki.lib.sun.ac.za/index.php/OpenGIS
- http://wiki.lib.sun.ac.za/index.php/OpenSurvey
- http://wiki.lib.sun.ac.za/index.php/OpenBiology
Electronic Laboratory Notebook (ELN)
- https://en.wikipedia.org/wiki/Electronic_lab_notebook
- http://www.elabftw.net
- http://www.limswiki.org/index.php/ELN_vendor
- http://mylabbook.org
- http://lablog.sourceforge.net
- https://www.google.com/keep
Research Data Management Plans (RDMP)
Introduction
Watch the following videos:
- https://youtu.be/VhSfw5o1dUo - John Scally: Research Data Management in the Library
- https://youtu.be/gYDb-GP1CA4 - The what, why and how of data management planning
References
- https://dmp.cdlib.org
- http://www.dcc.ac.uk/how-discover-requirements
- http://dmponline.dcc.ac.uk
- http://researchdataq.org
- 2016 - OCLC - Making Research Data Management Sustainable
- 2015 - NISO - A Primer for Research Data Management
- 2014 - Lewis - University of Sheffield - Research Data Management Infrastructure - Options
- 2014 - DCC - Data Management Plan Checklist V 4.0 - Flyer
- 2013 - DCC - Data Management Plan Checklist V 4.0
CKAN4RDM Discussion
- http://lists.okfn.org/mailman/listinfo/ckan4rdm
- http://ckan.org/2013/09/25/edawax
- http://eprints.lincoln.ac.uk/9778/1/CKANEvaluation.pdf
- http://ckan.org/2013/11/28/ckan4rdm-st-andrews
- http://research-computing.wp.st-andrews.ac.uk/2013/11/27/using-ckan-for-research-data-management
- http://ckan.org/2013/02/27/ckan-for-research-data-management-workshop
- http://orbital.blogs.lincoln.ac.uk/2013/02/27/ckan-for-rdm-workshop
Catalogs
South African
- http://www.data.gov.za
- http://www.code4sa.org
- https://www.datafirst.uct.ac.za
- http://sada.nrf.ac.za
- http://www.statssa.gov.za or http://beta2.statssa.gov.za or http://www.sastat.org.za
- http://www.afdb.org/en/knowledge/statistics/open-data-for-africa/
- http://southafrica.opendataforafrica.org
- http://data.worldbank.org/country/south-africa
African
Australian
European
American
International
- http://www.re3data.org
- http://databib.org
- http://data.okfn.org
- http://datahub.io
- http://openspending.org
- http://data.worldbank.org
- http://project.opendatamonitor.eu
- http://opendatainception.io
- http://www.opendatanetwork.com
Open Government Sites
- United Nations - http://data.un.org/
- U.S. - http://www.data.gov/
- List of cities/states with open data - http://simplystatistics.org/2012/01/02/list-of-cities-states-with-open-data-help-me-find/
- United Kingdom - http://data.gov.uk/
- France - http://www.data.gouv.fr/
- Ghana - http://data.gov.gh/
- Australia - http://data.gov.au/
- Germany - https://www.govdata.de/
- Hong Kong - http://www.gov.hk/en/theme/psi/datasets/
- Japan - http://www.data.go.jp/
- Many more - http://www.data.gov/opendatasites
Infrastructure
- http://opendataplatform.org
- http://www.opencompute.org
- http://wiki.lib.sun.ac.za/index.php/OpenStack
- http://blog.backblaze.com/2013/02/20/180tb-of-good-vibrations-storage-pod-3-0
- http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets
- http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage
List of data repositories
Books
Organisations
- http://www.opendatafoundation.org
- http://www.codata.org
- http://nedicc.com
- http://www.dirisa.ac.za
- http://www.icsu-wds.org
- http://theodi.org
- http://www.opendataresearch.org
- http://opendataday.org
- http://www.archiveteam.org
- https://rd-alliance.org
- http://www.nationaldataservice.org
- http://cardio.dcc.ac.uk
- http://www.opengovpartnership.org
- https://researchdata.ands.org.au
- http://www.odbms.org
- http://learn-rdm.eu
Training
- http://learn-rdm.eu
- http://www.dataversity.net
- http://datascienceacademy.com
- http://altbibl.io/dst4l
- http://datasciencemasters.org
- http://datacarpentry.github.io
- https://class.coursera.org/datavisualization-001
- https://www.datacamp.com
- http://schoolofdata.org
- http://youtu.be/q2aiDJzJPuw
- http://opendata.stackexchange.com
Conferences
- http://csvconf.com
- http://www.africaopendata.net
- http://openvisconf.com
- http://www.scidatacon2014.org
Data Archive Software
- General Data Management (GDM)
  - http://ckan.org
  - http://docs.ckan.org/en/latest
  - http://www.ohloh.net/p/ckan
- Research Data Management (RDM)
  - http://thedata.org
  - http://thedata.harvard.edu/guides/genindex.html
  - http://www.ohloh.net/p/dvn
- http://irods.org
- http://datadryad.org
- http://drupal.org/project/dkan
- http://openglam.org
- http://opendatakit.org
- http://www.ihsn.org/home/software/nada
- http://datalift.org
- http://dataconservancy.org
- http://www.project-redcap.org
Data Munging/Wrangling/Normalisation Software
See: https://en.wikipedia.org/wiki/Data_wrangling for a definition
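Data wrangling usually means turning messy raw records into a consistent, analysable form. The sketch below shows a few typical steps: collapsing whitespace, normalising case, parsing several date layouts into ISO 8601, and removing records that become duplicates after normalisation. The input records, the list of accepted date formats and the `wrangle` helper are hypothetical; real projects would typically use one of the dedicated tools or libraries for this.

```python
from datetime import datetime

# Date layouts accepted by this example (an assumption, not a standard).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def parse_date(text):
    """Return the date as an ISO 8601 string, or None if unparseable."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def wrangle(records):
    """Normalise names and dates; drop unparseable and duplicate records."""
    seen = set()
    clean = []
    for rec in records:
        # Collapse runs of whitespace, then use title case.
        name = " ".join(rec["name"].split()).title()
        date = parse_date(rec["date"])
        if date is None:
            continue  # cannot be placed in time, so drop it
        key = (name, date)
        if key in seen:
            continue  # duplicate once normalised
        seen.add(key)
        clean.append({"name": name, "date": date})
    return clean


if __name__ == "__main__":
    raw = [
        {"name": "  jonkershoek   site ", "date": "09/05/2016"},
        {"name": "Jonkershoek Site", "date": "2016-05-09"},
        {"name": "stellenbosch", "date": "20160509"},
        {"name": "unknown", "date": "sometime"},
    ]
    print(wrangle(raw))
```

Normalising before deduplicating matters: the first two raw records only become recognisable as the same observation once their names and dates share one form.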
Data Visualisation Software
- http://lisacharlotterost.github.io/2016/05/17/one-chart-tools
- 2015 - Infoactive - Data + Design - A simple introduction to preparing and visualizing information
- http://www.coolinfographics.com/tools
- http://datavisualization.ch/tools/selected-tools
- http://www.creativebloq.com/design-tools/data-visualization-712402
- http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools
- http://opensource.com/life/15/6/eight-open-source-data-visualization-tools
- https://en.wikipedia.org/wiki/KNIME
- http://gephi.github.io
- http://bokeh.pydata.org
- http://www.htmlwidgets.org
- http://www.dbvis.com
- http://selection.datavisualization.ch
- http://oicweave.org
- http://scikit-learn.org
- http://flowingdata.com
- http://www.infosthetics.com
- http://www.informationisbeautiful.net
- http://hint.fm
- http://www-958.ibm.com/software/analytics/manyeyes
- http://misoproject.com
- http://bl.ocksplorer.org
- https://keen.io
- http://www.cytoscape.org
- http://www.eclipse.org/birt
- http://www.spagobi.org
- http://www.pentaho.com
- http://codap.concord.org
- http://timeline.knightlab.com
- https://github.com/datawrapper/datawrapper
- https://github.com/nnnick/Chart.js
- https://github.com/mikesall/charted
- https://github.com/mbostock/d3
- https://github.com/densitydesign/raw
- http://dygraphs.com
- http://leafletjs.com
- https://developers.google.com/chart
- https://public.tableau.com
Rankings
Integration
- http://odin-project.eu
- http://projects.iq.harvard.edu/ojs-dvn/book/faq-ojs-dataverse-integration-project
- http://pkp.sfu.ca/ojs/docs/userguide/2.3.3/authorSubmission4.html
Service Providers
Research
- http://www.opendataresearch.org
- http://datascience.codata.org
- https://unlockingresearch.blog.lib.cam.ac.uk/?p=285
- http://jlsc-pub.org/10/volume/3/issue/2
- http://researchdataq.org
- http://okfnlabs.org
- http://schoolofdata.org
- http://www.gigasciencejournal.com
Analysis
News
- http://www.opendatascience.com
- http://blog.ouseful.info/2015/08/11/fragments-scraping-tabula-data-from-pdfs/
- http://www.computerworld.com/article/2902920/the-data-science-ecosystem-part-2-data-wrangling.html
- http://fumiopen.blogspot.com/2014/09/guides-to-publishing-open-data.html
- http://royalsociety.org/policy/projects/science-public-enterprise/report
- http://duraspace.org/five-flavors-open-access-duraspace-research-data
- http://www.itbusinessedge.com/blogs/integration/seven-resources-for-learning-about-open-data.html
- http://radar.oreilly.com/tag/data-economy
