Difference between revisions of "SUNScholar/Repository Website Metrics"
Revision as of 09:16, 15 February 2013
Register with the major harvesters
Suggestions from the webometrics ranking editors
See: http://repositories.webometrics.info/en/Best_Practices
All the scientific production, formal and informal, draft or definitive, published or unpublished, should be available from a unique web site. The institutional repository is a very important asset of the institution as a whole, not only of the library. We recommend the following syntax for the institutional repository web address:
http://repository.university.country
- It is very important to avoid changing the institutional domain as it can generate confusion and it has a devastating effect on the visibility values.
- Avoid cumbersome navigation menus based on Flash, Java or JavaScript that can block the robot access.
- For scientists, it is important that the link to the full text be easily citable.
- Therefore, very long URLs should be avoided in all situations.
Good examples of repositories with friendly, persistent URLs, as per the webometrics best practices:
- http://scholar.sun.ac.za
- http://repository.up.ac.za
- http://repository.uwc.ac.za
- http://repository.unam.na
- http://uir.unisa.ac.za
- http://ir.dut.ac.za
- http://ir.polytechnic.edu.na
- http://dar.aucegypt.edu
- http://www.ubrisa.ub.bw
DSpace Google Setup
Google Scholar
- https://wiki.duraspace.org/display/DSPACE/Ensuring+your+instance+is+indexed
- http://roar.eprints.org/help/google_scholar.html
- http://scholar.google.com/intl/en/scholar/inclusion.html
- https://wiki.duraspace.org/display/DSDOC18/Google+Scholar+Metadata+Mappings
- http://web.lib.sun.ac.za/dspace/docs/1.7.2/Google%20Scholar%20Metadata%20Mappings.html
Google Analytics
Open your main DSpace config file (dspace.cfg) and look for the xmlui.google.analytics.key setting. Enter your Google Analytics key there.
Rebuild the DSpace webapps using the custom rebuild script.
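Before rebuilding, it can be worth confirming that the key is actually set. The following is a minimal sketch in Python, not part of DSpace: the helper name read_property and the placeholder key UA-XXXXXX-X are assumptions for illustration, and the parser handles only simple "name = value" lines, not the full Java properties syntax.

```python
def read_property(cfg_text, key):
    """Return the value of `key` from dspace.cfg-style text, or None if absent.

    Simplified on purpose: only 'name = value' lines are understood;
    comment lines starting with '#' and blank lines are skipped.
    """
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if separator and name.strip() == key:
            return value.strip()
    return None


# Hypothetical config excerpt; the key shown is a placeholder, not a real one.
sample_cfg = """
# Google Analytics key
xmlui.google.analytics.key = UA-XXXXXX-X
"""

analytics_key = read_property(sample_cfg, "xmlui.google.analytics.key")
missing_key = read_property(sample_cfg, "some.unset.property")
```

In practice the text would be read from your own dspace.cfg file instead of the inline sample.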
Google Sitemap
First, edit the DSpace config file and set up sitemaps as follows.
#### Sitemap settings #####
# the directory where the generated sitemaps are stored
sitemap.dir = http://scholar.sun.ac.za/sitemaps
#
# Comma-separated list of search engine URLs to 'ping' when a new Sitemap has
# been created. Include everything except the Sitemap URL itself (which will
# be URL-encoded and appended to form the actual URL 'pinged').
#
#sitemap.engineurls = http://www.google.com/webmasters/sitemaps/ping?sitemap=
# Add this to the above parameter if you have an application ID with Yahoo
# (Replace REPLACE_ME with your application ID)
# http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=REPLACE_ME&url=
#
# No known Sitemap 'ping' URL for MSN/Live search
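The config comments above say the Sitemap URL is URL-encoded and appended to each engine URL to form the address that gets pinged. A minimal Python sketch of that construction follows. The function name build_ping_url is hypothetical, and using http://scholar.sun.ac.za/sitemap as the sitemap URL is an assumption for illustration; DSpace performs this step itself.

```python
from urllib.parse import quote


def build_ping_url(engine_url, sitemap_url):
    """Append the URL-encoded sitemap URL to a search engine 'ping' prefix."""
    # safe="" makes quote() encode ':' and '/' as well, so the whole
    # sitemap URL travels as a single query-string value.
    return engine_url + quote(sitemap_url, safe="")


ping_url = build_ping_url(
    "http://www.google.com/webmasters/sitemaps/ping?sitemap=",
    "http://scholar.sun.ac.za/sitemap",
)
```

This shows why the engine URL in the setting must end with the parameter name and "=": everything after it is the encoded sitemap address.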
Once you've enabled your sitemaps, they will be accessible at the following URLs:
- HTML Sitemaps: [dspace.url]/htmlmap
- Google (XML) Sitemaps: [dspace.url]/sitemap
For example, you can view the SUNScholar sitemaps at the links below.
http://scholar.sun.ac.za/htmlmap
http://scholar.sun.ac.za/sitemap
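The two sitemap addresses follow directly from the dspace.url setting. As a small sketch, assuming a hypothetical helper named sitemap_urls, they can be derived like this in Python:

```python
def sitemap_urls(dspace_url):
    """Return the HTML and XML sitemap URLs for a given dspace.url value."""
    base = dspace_url.rstrip("/")  # tolerate a trailing slash in the setting
    return {
        "html": base + "/htmlmap",
        "xml": base + "/sitemap",
    }


urls = sitemap_urls("http://scholar.sun.ac.za")
```

The resulting URLs are the ones to open in a browser, or to submit to a search engine, when checking that the sitemaps are being generated.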
Robots
See below for an example robots.txt file.
User-agent: *
# Disable access to Discovery search and filters
Disallow: /discover
Disallow: /search-filter
# This should be the FULL URL to your HTML Sitemap.
# Make sure to replace "[dspace.url]" with the value of your 'dspace.url' setting in your dspace.cfg file.
Sitemap: http://[dspace.url]/htmlmap
# If you have configured DSpace (Solr-based) Statistics to be publicly accessible,
# then you likely do not want this content to be indexed
# Disallow: /displaystats
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse
# You also may wish to disallow access to the following paths, in order
# to stop web spiders from accessing user-based content:
# Disallow: /advanced-search
# Disallow: /contact
# Disallow: /feedback
# Disallow: /forgot
# Disallow: /login
# Disallow: /register
# Disallow: /search
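To check that rules like these behave as intended before deploying them, Python's standard urllib.robotparser module can evaluate them offline. This is a sketch under assumptions: the host repository.example.org and the item path /handle/123456789/1 are hypothetical, and only the two active Disallow rules from the example are included.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the active rules from the example file; the Sitemap
# host is a hypothetical stand-in for your own dspace.url value.
ROBOTS_TXT = """\
User-agent: *
Disallow: /discover
Disallow: /search-filter
Sitemap: http://repository.example.org/htmlmap
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path, user_agent="*"):
    """Return True if the given crawler may fetch `path` under these rules."""
    return parser.can_fetch(user_agent, path)


discover_allowed = is_allowed("/discover")
filter_allowed = is_allowed("/search-filter")
item_allowed = is_allowed("/handle/123456789/1")  # hypothetical item page
```

Search and filter pages should come back as blocked while item pages stay crawlable, which is the behaviour the harvesters need in order to index full-text records.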
Directories
- http://repositories.webometrics.info
- http://roar.eprints.org
- http://www.arwu.org
- http://www.webometrics.info
- http://en.wikipedia.org/wiki/College_and_university_rankings
- http://www.topuniversities.com/world-university-rankings
References
- https://wiki.duraspace.org/pages/viewpage.action?pageId=34642415
- https://wiki.duraspace.org/display/DSDOC18/Configuration#Configuration-SitemapSettings
- https://wiki.duraspace.org/display/DSDOC17/Configuration#Configuration-SitemapSettings
- http://www.dspace.org/1_6_2Documentation/ch05.html#N142ED
- http://www.dspace.org/1_5_2Documentation/ch03.html#N10B41