Difference between revisions of "SUNScholar/SOLR Statistics"
Revision as of 10:41, 6 July 2012
This procedure assumes that you have used the three step process to install DSpace.
Apache SOLR with DSpace >= 1.6.2
Enable the SOLR webapp
To do this, become the root user by typing the following:
sudo -i
Now create a symbolic link to the SOLR webapp in the default Tomcat webapps folder by typing the following:
cd /var/lib/tomcat6/webapps
ln -s /home/dspace/webapps/solr
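Before moving on, it can help to confirm that the link was created and points where expected. The following is a minimal sketch, not part of the original procedure; the function name check_link is hypothetical, and it assumes the readlink utility is available (it is on a standard Ubuntu install).

```shell
#!/bin/sh
# check_link LINK EXPECTED_TARGET  (hypothetical helper, for illustration)
# Succeeds only if LINK is a symbolic link that resolves to an existing
# directory and whose stored target equals EXPECTED_TARGET.
check_link() {
    link=$1
    expected=$2
    [ -L "$link" ] || { echo "not a symlink: $link"; return 1; }
    [ -d "$link" ] || { echo "broken link: $link"; return 1; }
    target=$(readlink "$link")
    [ "$target" = "$expected" ] || { echo "unexpected target: $target"; return 1; }
    echo "ok: $link -> $target"
}
```

With the paths used in this procedure, the call would be: check_link /var/lib/tomcat6/webapps/solr /home/dspace/webapps/solr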
Setup SOLR server
Update SOLR statistics config
Edit the DSpace config file by typing the following.
nano /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg
Go to the bottom of the file and check the settings against the example below.
#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to #
# track usage statistics. #
#---------------------------------------------------------------#
##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
solr.log.server = http://localhost/solr/statistics
# The location for the Geo Database retrieved on update/installation
solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat
# Timeout for the resolver in the dns lookup
# Time in milliseconds, defaults to 200 for backward compatibility
# Your systems default is usually set in /etc/resolv.conf and varies
# between 2 to 5 seconds, to high a value might result in solr exhausting
# your connection pool
solr.resolver.timeout = 200
# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin=false
# Enable/disable logging of spiders in solr statistics.
# If false, and IP matches an address in solr.spiderips.urls, event is not logged.
# If true, event will be logged with the 'isBot' field set to true
# (see solr.statistics.query.filter.* for query filter options)
# Default value is true.
#solr.statistics.logBots = true
# control solr statistics querying to filter out spider IPs
# false by default
#solr.statistics.query.filter.spiderIp = false
# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
#solr.statistics.query.filter.isBot = true
# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt, \
http://iplists.com/non_engines.txt
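To check a single setting without opening the editor, the value can be pulled out of the config file from the shell. This is a rough sketch under stated assumptions: the helper name get_prop is hypothetical, it only reads simple "key = value" lines, and it does not join backslash-continued values such as solr.spiderips.urls.

```shell
#!/bin/sh
# get_prop FILE KEY  (hypothetical helper, for illustration)
# Prints the value of the first uncommented "KEY = value" line in FILE.
# Limitation: continuation lines ending in "\" are not joined.
get_prop() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                      # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next                   # not a key = value line
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the key
            if (name == key) {
                value = substr($0, pos + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)
                print value
                exit
            }
        }
    ' "$1"
}
```

For example, get_prop /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg solr.log.server should print the statistics URL configured above. A commented-out setting such as #solr.statistics.logBots prints nothing, which indicates that the default is in effect.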
Update SOLR search config file
Edit the SOLR search config file by typing the following.
nano /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace-solr-search.cfg
##### Search Indexing #####
solr.search.server = http://localhost/solr/search
# Should no solr facet be configured for a certain page, this one will be used as default
#Every solr facet field which ends with _dt will be handled as a date
#Handeling as date implies that {field.name}.year will be used for faceting
solr.facets.search=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.site=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.community=dc.contributor.author,dc.subject,dc.date.issued_dt
solr.facets.collection=dc.contributor.author,dc.subject,dc.date.issued_dt
# solr.facets.item=dc.contributor.author,dc.subject,dc.date.issued_dt
# Put any default search filters here, these filters will be applied to any search in discovery
# You can specify multiple filters by separating them using ,
##Default filters are used for every search in discovery, including the separate scope filters below
#solr.default.filterQuery=location:l2
# You can also specify (additional) filter(s)
## for homepage recent submissions
#solr.site.default.filterQuery=
## for community recent submissions
#solr.community.default.filterQuery=
## for collection recent submissions
#solr.collection.default.filterQuery=
## for searches
#solr.search.default.filterQuery=
## for browsing (not used at the moment)
#solr.browse.default.filterQuery=
# The filters which can be selected in the search form
solr.search.filters=dc.title, dc.contributor.author, dc.subject, dc.date.issued.year
# Indexed fields which can sorted on in our search
solr.search.sort=dc.title, dc.date.issued_dt
#Defines whichs fields are indexed as dates
#please be aware that for each date field an _dt will be suffixed so that dc.date.issued will become dc.date.issued_dt
#For each date indexed the year will also be stored separately in a {field.name}.year so it can be used for date faceting
solr.index.type.date=dc.date,dc.date.*
#Recent submission size
solr.recent-submissions.size=5
#The indexed field on which we sort so we can determine which items where recently submitted
recent.submissions.sort-option=dc.date.accessioned_dt
#Use the property below to limit the number of facet filters in the side of the search page
#search.facet.max=10
Rebuild DSpace to use the SOLR webapp
Now rebuild your webapps and continue with the following.
Finally, give SOLR full permissions on its webapp folder:
chmod -R 0777 /home/dspace/webapps/solr
Test the local SOLR webapp
Install the "lynx" application.
sudo apt-get install lynx
Then type the following in the console:
lynx http://localhost/solr
You should see the SOLR webapp page. If not, debug the SOLR webapp until it loads, then continue with the steps below.
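As an alternative to lynx, the same check can be scripted with curl. This is only a sketch: it assumes curl is installed, the helper names http_code and report_status are hypothetical, and treating HTTP 200 as "working" is an assumption about how the webapp answers at this URL.

```shell
#!/bin/sh
# http_code URL  (hypothetical helper)
# Prints the final HTTP status code for URL, following redirects.
# curl prints 000 when no connection could be made.
http_code() {
    curl -s -L -o /dev/null -w '%{http_code}' "$1"
}

# report_status CODE  (hypothetical helper)
# Turns a status code into a short message; fails for anything but 200.
report_status() {
    case $1 in
        200) echo "SOLR is responding" ;;
        000) echo "no connection; is Tomcat running?"; return 1 ;;
        *)   echo "unexpected HTTP status $1"; return 1 ;;
    esac
}
```

To run the check against the local webapp: report_status "$(http_code http://localhost/solr)"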
Convert the old log files
I prepared the following shell script to do the conversion.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls dspace.log.*`
#echo $ITEM
for i in $ITEM ; do
    /home/dspace/bin/dspace stats-log-converter -i $i -o $i.solr
done
After running the script you will have a lot of log files with the .solr extension. This takes quite a while with a lot of log files. Be patient.
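Note that the pattern dspace.log.* also matches the .solr files once they exist, so running the conversion a second time would try to convert its own output. The following variant is a sketch that skips those files and any log that already has a .solr companion. The function name convert_logs and its two parameters are hypothetical; the stats-log-converter invocation is the one used above.

```shell
#!/bin/sh
# convert_logs LOG_DIR DSPACE_CMD  (hypothetical helper, for illustration)
# Runs "DSPACE_CMD stats-log-converter" on each dspace.log.* file in LOG_DIR,
# skipping converter output and logs converted on an earlier run.
convert_logs() {
    log_dir=$1
    dspace_cmd=$2
    for f in "$log_dir"/dspace.log.*; do
        [ -f "$f" ] || continue                # pattern matched nothing
        case $f in *.solr) continue ;; esac    # converter output, not input
        [ -e "$f.solr" ] && continue           # already converted
        echo "Converting $f"
        "$dspace_cmd" stats-log-converter -i "$f" -o "$f.solr"
    done
}
```

With the paths used in this procedure, the call would be: convert_logs /home/dspace/log /home/dspace/bin/dspace. One caveat: an interrupted run can leave a partial .solr file that this sketch would then skip, so delete any such file before running it again.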
Import the converted solr log files
I prepared the following shell script to do the import.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls *.solr`
#echo $ITEM
for i in $ITEM ; do
    echo "###################################"
    echo "Importing stats for log file:... $i"
    /home/dspace/bin/dspace stats-log-importer -i $i
done
This takes quite a while with a lot of log files. Be patient.
Statistics for DSpace version 1.5.2 and lower
General Reports
Make sure that you run the stats programs regularly.
@daily /home/dspace/bin/stat-general
@daily /home/dspace/bin/stat-report-general
@monthly /home/dspace/bin/stat-monthly
@monthly /home/dspace/bin/stat-report-monthly
The above must be added to the crontab for the "dspace" user.
Allow normal users to browse the statistics
Edit the following in the DSpace config file.
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true
# directory where live reports are stored
report.dir = ${dspace.dir}/reports
Help
- http://web.lib.sun.ac.za/dspace/docs/1.7.2/DSpace%20Statistics.html
- https://wiki.duraspace.org/display/DSDOC17/DSpace+Statistics
- http://www.dspace.org/1_6_2Documentation/ch03.html#N10CF2
