SUNScholar/SOLR Statistics
Revision as of 13:17, 20 August 2012
This procedure assumes that you have used the three step process to install DSpace.
Apache SOLR with DSpace >= 1.6.2
Individual item statistics using Apache SOLR were introduced with DSpace version 1.6.2. Below are instructions to set up SOLR statistics for DSpace versions 1.6.2 and 1.7.2.
Enable the SOLR webapp
To achieve this, become the root user by typing the following:
sudo -i
Now we create the shortcut to the SOLR webapp in the default Tomcat webapps folder by typing as follows:
cd /var/lib/tomcat6/webapps
ln -s /home/dspace/webapps/solr
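The link step above can also be wrapped in a small check so that it is safe to run more than once. This is only a sketch: `link_webapp` is a hypothetical helper name, and the paths in the usage comment are the ones used on this page.

```shell
#!/bin/sh
# Sketch: create the webapp symlink only when it is missing.
# link_webapp is a hypothetical helper, not part of DSpace or Tomcat.
link_webapp() {
    src="$1"       # e.g. /home/dspace/webapps/solr
    dest_dir="$2"  # e.g. /var/lib/tomcat6/webapps
    name=$(basename "$src")
    if [ ! -d "$src" ]; then
        echo "missing source: $src" >&2
        return 1
    fi
    if [ -L "$dest_dir/$name" ]; then
        echo "link already exists: $dest_dir/$name"
        return 0
    fi
    ln -s "$src" "$dest_dir/$name"
}
# Usage (as root):
#   link_webapp /home/dspace/webapps/solr /var/lib/tomcat6/webapps
```

Running `ls -l /var/lib/tomcat6/webapps` afterwards should list `solr` as a link pointing to `/home/dspace/webapps/solr`.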
Set up the SOLR server
Update SOLR statistics config
Edit the DSpace config file by typing the following.
nano /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg
Go to the bottom of the file and check the settings against the example below.
#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to #
# track usage statistics. #
#---------------------------------------------------------------#
##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
solr.log.server = http://localhost/solr/statistics
# The location for the Geo Database retrieved on update/installation
solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat
# Timeout for the resolver in the dns lookup
# Time in milliseconds, defaults to 200 for backward compatibility
# Your systems default is usually set in /etc/resolv.conf and varies
# between 2 to 5 seconds, too high a value might result in solr exhausting
# your connection pool
solr.resolver.timeout = 200
# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin=false
# Enable/disable logging of spiders in solr statistics.
# If false, and IP matches an address in solr.spiderips.urls, event is not logged.
# If true, event will be logged with the 'isBot' field set to true
# (see solr.statistics.query.filter.* for query filter options)
# Default value is true.
#solr.statistics.logBots = true
# control solr statistics querying to filter out spider IPs
# false by default
#solr.statistics.query.filter.spiderIp = false
# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
#solr.statistics.query.filter.isBot = true
# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt, \
http://iplists.com/non_engines.txt
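To confirm which values are active without opening the editor, a small lookup function may help. This is a minimal sketch, assuming simple single-line `key = value` entries; `cfg_get` is a hypothetical helper and it does not follow backslash line continuations such as the `solr.spiderips.urls` list.

```shell
#!/bin/sh
# Sketch: print the value of the first uncommented "key = value" line.
# cfg_get is a hypothetical helper, not a DSpace command.
cfg_get() {
    file="$1"
    # Escape dots so a key like solr.log.server is matched literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}
# Usage:
#   cfg_get /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg solr.log.server
```

A key that is commented out with `#`, such as `solr.statistics.logBots` in the example above, produces no output, which means the built-in default applies.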
Update SOLR search config file
Edit the SOLR search config file by typing the following.
nano /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace-solr-search.cfg
##### Search Indexing #####
solr.search.server = http://localhost/solr/search
# Should no solr facet be configured for a certain page, this one will be used as default
#Every solr facet field which ends with _dt will be handled as a date
#Handling as date implies that {field.name}.year will be used for faceting
solr.facets.search=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.site=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.community=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.collection=dc.contributor.author,dc.subject,dc.date.issued_dt
# solr.facets.item=dc.contributor.author,dc.subject,dc.date.issued_dt
# Put any default search filters here, these filters will be applied to any search in discovery
# You can specify multiple filters by separating them using ,
##Default filters are used for every search in discovery, including the separate scope filters below
#solr.default.filterQuery=location:l2
# You can also specify (additional) filter(s)
## for homepage recent submissions
#solr.site.default.filterQuery=
## for community recent submissions
#solr.community.default.filterQuery=
## for collection recent submissions
#solr.collection.default.filterQuery=
## for searches
#solr.search.default.filterQuery=
## for browsing (not used at the moment)
#solr.browse.default.filterQuery=
# The filters which can be selected in the search form
solr.search.filters=dc.title, dc.contributor.author, dc.subject, dc.date.issued.year
# Indexed fields which can be sorted on in our search
solr.search.sort=dc.title, dc.date.issued_dt
#Defines which fields are indexed as dates
#please be aware that for each date field an _dt will be suffixed so that dc.date.issued will become dc.date.issued_dt
#For each date indexed the year will also be stored separately in a {field.name}.year so it can be used for date faceting
solr.index.type.date=dc.date,dc.date.*
#Recent submission size
solr.recent-submissions.size=5
#The indexed field on which we sort so we can determine which items were recently submitted
recent.submissions.sort-option=dc.date.accessioned_dt
#Use the property below to limit the number of facet filters in the side of the search page
#search.facet.max=10
Rebuild DSpace to use the SOLR webapp
Now rebuild your webapps and continue with the following.
Finally, give all users full permissions on the SOLR webapp folder:
chmod -R 0777 /home/dspace/webapps/solr
Test the local SOLR webapp
Install the "lynx" application.
sudo apt-get install lynx
Then type the following in the console:
lynx http://localhost/solr
You should get a response from the SOLR webapp.
If you do not, debug the SOLR webapp until you do, then complete the following.
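The same check can be scripted instead of done interactively. The sketch below uses `curl` rather than `lynx`, which is an assumption (this page only installs `lynx`); `http_status` and `is_ok` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: scripted version of the lynx check, assuming curl is installed.
# http_status and is_ok are hypothetical helpers.
http_status() {
    # Print only the HTTP status code; curl prints 000 if it cannot connect.
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
is_ok() {
    # Treat 2xx and 3xx codes as "the webapp is answering".
    case "$1" in
        2??|3??) return 0 ;;
        *) return 1 ;;
    esac
}
# Usage:
#   is_ok "$(http_status http://localhost/solr)" && echo "SOLR webapp is responding"
```

If nothing is printed, the Tomcat logs are the first place to look.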
Convert the old log files
I prepared the following bash script to do the conversion.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls dspace.log.*`
#echo $ITEM
for i in $ITEM ; do
/home/dspace/bin/dspace stats-log-converter -i $i -o $i.solr
done
After running the script you will have a lot of log files with the .solr extension. This takes quite a while with a lot of log files. Be patient.
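Because the converted files are written next to the originals and also match `dspace.log.*`, running the conversion a second time would pick up the `.solr` files as input. A possible variant that skips them is sketched below; `DSPACE_BIN`, `LOG_DIR` and `convert_logs` are illustrative names, not DSpace settings, and the defaults are the paths used on this page.

```shell
#!/bin/sh
# Sketch: convert each dspace.log.* file once, skipping earlier output.
# DSPACE_BIN, LOG_DIR and convert_logs are illustrative names.
DSPACE_BIN="${DSPACE_BIN:-/home/dspace/bin/dspace}"
LOG_DIR="${LOG_DIR:-/home/dspace/log}"

convert_logs() {
    for f in "$LOG_DIR"/dspace.log.*; do
        [ -f "$f" ] || continue                  # glob matched nothing
        case "$f" in *.solr) continue ;; esac    # already-converted output
        [ -e "$f.solr" ] && continue             # converted on an earlier run
        echo "Converting $f"
        "$DSPACE_BIN" stats-log-converter -i "$f" -o "$f.solr" || return 1
    done
    return 0
}
# Usage:
#   convert_logs
```

Quoting `"$f"` also keeps the loop working if a log file name ever contains spaces.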
Import the converted solr log files
I prepared the following bash script to do the import.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls *.solr`
#echo $ITEM
for i in $ITEM ; do
echo "###################################"
echo "Importing stats for log file:... $i"
/home/dspace/bin/dspace stats-log-importer -i $i
done
This takes quite a while with a lot of log files. Be patient.
Statistics for DSpace version 1.5.2 and lower
General Reports
Make sure that you run the stats programs regularly.
@daily /home/dspace/bin/stat-general
@daily /home/dspace/bin/stat-report-general
@monthly /home/dspace/bin/stat-monthly
@monthly /home/dspace/bin/stat-report-monthly
The above must be added to the crontab for the "dspace" user.
Allow normal users to browse the statistics
Edit the following in the DSpace config file.
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true
# directory where live reports are stored
report.dir = ${dspace.dir}/reports
Help
- http://web.lib.sun.ac.za/dspace/docs/1.7.2/DSpace%20Statistics.html
- https://wiki.duraspace.org/display/DSDOC17/DSpace+Statistics
- http://www.dspace.org/1_6_2Documentation/ch03.html#N10CF2
