SUNScholar/SOLR Statistics
Apache SOLR with DSpace 1.6.2 and upwards
Enable the SOLR webapp
First become the root user by typing the following:
sudo -i
Now create a symbolic link to the SOLR webapp in the default Tomcat webapps folder by typing the following:
cd /var/lib/tomcat6/webapps
ln -s /home/dspace/webapps/solr
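The link step can be wrapped in a small guard so that running it a second time does not fail or overwrite anything. This is only a sketch: the function name link_webapp is illustrative and is not part of DSpace or Tomcat, and the paths in the example call are the ones assumed on this page.

```shell
#!/bin/sh
# link_webapp SOURCE_DIR WEBAPPS_DIR  (hypothetical helper, not a DSpace tool)
# Create a symlink to SOURCE_DIR inside WEBAPPS_DIR unless one already exists.
link_webapp() {
    src=$1
    dest=$2/$(basename "$src")
    if [ -L "$dest" ]; then
        # A symlink with this name is already present; leave it alone.
        echo "already linked: $dest"
    elif [ -e "$dest" ]; then
        # Something that is not a symlink is in the way; do not touch it.
        echo "refusing to overwrite: $dest" >&2
        return 1
    else
        ln -s "$src" "$dest" && echo "linked: $dest -> $src"
    fi
}

# Example call with the paths used on this page (run as root):
# link_webapp /home/dspace/webapps/solr /var/lib/tomcat6/webapps
```

The function only checks that a link with the expected name exists. It does not verify where an existing link points.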
Set up SOLR in the DSpace config file
The SOLR section of the DSpace config file should look something like the following:
#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to #
# track usage statistics. #
#---------------------------------------------------------------#
##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
solr.log.server = http://localhost/solr/statistics
# The location for the Geo Database retrieved on update/installation
solr.dbfile = /home/dspace/config/GeoLiteCity.dat
# If enabled the statistics system will look for an X-Forward header
# if it finds it, it will use this for the user IP Address
# it is enabled by default
useProxies = true
statistics.items.dc.1=dc.identifier
statistics.items.dc.2=dc.date.accessioned
statistics.items.type.1=dcinput
statistics.items.type.2=date
statistics.default.start.datepick = 01/01/1977
solr.spidersfile = /home/dspace/config/spiders.txt
# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin = false
# control solr statistics querying to filter out spider IPs
# false by default
solr.statistics.query.filter.spiderIp = false
# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
solr.statistics.query.filter.isBot = false
# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt, \
http://iplists.com/non_engines.txt
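Before restarting, it can help to confirm which values are actually active in the config file, since a commented-out line is easy to mistake for a live one. The following is a minimal sketch. The helper name get_prop is hypothetical, and the config path in the example call (/home/dspace/config/dspace.cfg) is an assumption based on the other paths on this page.

```shell
#!/bin/sh
# get_prop FILE KEY  (hypothetical helper)
# Print the value of the first uncommented "KEY = value" line in FILE.
# Continuation lines (ending in "\") are not joined; only the first line is shown.
get_prop() {
    # Escape the dots in the key so they match literally in the sed pattern.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Example calls (path is an assumption, adjust to your installation):
# get_prop /home/dspace/config/dspace.cfg solr.log.server
# get_prop /home/dspace/config/dspace.cfg solr.dbfile
```

An empty result means the key is missing or commented out.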
Now restart the server and continue with the following.
Test the local SOLR webapp
Install the "lynx" application.
Then type the following in a console:
lynx http://localhost/solr
You should get the following:
If you do not, debug the solr webapp until it responds, and only then continue with the steps below.
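The same check can be scripted so that it gives a simple pass or fail instead of an interactive browser session. This sketch uses curl rather than lynx, which is a substitution and assumes curl is installed. The function name check_url is illustrative, and the URL in the example is the one used above.

```shell
#!/bin/sh
# check_url URL  (hypothetical helper)
# Return 0 if URL answers with a successful HTTP status, non-zero otherwise.
check_url() {
    # --fail makes curl exit non-zero on HTTP errors such as 404 or 500.
    curl --fail --silent --show-error --max-time 10 --output /dev/null "$1"
}

# Example call:
# if check_url http://localhost/solr; then
#     echo "solr webapp is responding"
# else
#     echo "solr webapp is NOT responding" >&2
# fi
```

A non-zero result only says that the request failed. The Tomcat logs are still the place to find out why.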
Convert the old log files
I prepared the following bash script to do the conversion.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls dspace.log.*`
#echo $ITEM
for i in $ITEM ; do
    /home/dspace/bin/dspace stats-log-converter -i $i -o $i.solr
done
After running the script you will have a lot of log files with the .solr extension. This takes quite a while with a lot of log files. Be patient.
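If the conversion is interrupted and run again, a pattern such as dspace.log.* also matches the .solr files written by the earlier run, so they would be converted a second time. A variant that skips finished work might look like the sketch below. The function name convert_logs and its two arguments are illustrative, and the stats-log-converter call is the same one used in the script above.

```shell
#!/bin/sh
# convert_logs LOG_DIR DSPACE_CMD  (hypothetical helper)
# Convert every dspace.log.* file in LOG_DIR that has no .solr counterpart yet.
convert_logs() {
    log_dir=$1
    dspace_cmd=$2
    for f in "$log_dir"/dspace.log.*; do
        [ -f "$f" ] || continue               # pattern matched nothing
        case $f in *.solr) continue ;; esac   # output of an earlier run
        if [ -e "$f.solr" ]; then
            echo "skipping (already converted): $f"
            continue
        fi
        echo "converting: $f"
        "$dspace_cmd" stats-log-converter -i "$f" -o "$f.solr" || return 1
    done
}

# Example call with the paths used on this page:
# convert_logs /home/dspace/log /home/dspace/bin/dspace
```

A partially written .solr file left by an interrupted run would be treated as finished, so delete any such file before running again.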
Import the converted solr log files
I prepared the following bash script to do the import.
#!/bin/sh
cd /home/dspace/log
ITEM=`ls *.solr`
#echo $ITEM
for i in $ITEM ; do
    echo "###################################"
    echo "Importing stats for log file:... $i"
    /home/dspace/bin/dspace stats-log-importer -i $i
done
This takes quite a while with a lot of log files. Be patient.
Statistics for DSpace version 1.5.2 and lower
General Reports
Make sure that you run the stats programs regularly.
@daily /home/dspace/bin/stat-general
@daily /home/dspace/bin/stat-report-general
@monthly /home/dspace/bin/stat-monthly
@monthly /home/dspace/bin/stat-report-monthly
The above must be added to the crontab for the "dspace" user.
Allow normal users to browse the statistics
Edit the following in the DSpace config file.
###### Statistical Report Configuration Settings ######
# should the stats be publicly available? should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true
# directory where live reports are stored
report.dir = ${dspace.dir}/reports
