Difference between revisions of "SUNScholar/SOLR Statistics"

From Libopedia

Revision as of 10:54, 15 August 2010

General Reports

Make sure that you run the stats programs regularly.

@daily		/home/dspace/bin/stat-general
@daily		/home/dspace/bin/stat-report-general
@monthly	/home/dspace/bin/stat-monthly
@monthly	/home/dspace/bin/stat-report-monthly

The above must be added to the crontab for the "dspace" user.
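
The entries can be added by hand with "crontab -e". As a minimal sketch of a scripted alternative, the helper below merges the four jobs into an existing crontab without duplicating any that are already present. The function name merge_stat_jobs is hypothetical, and the duplicate check is a plain substring match on the command path.

```shell
#!/bin/sh
# Print a crontab that contains the four statistics jobs exactly once.
# Reads the current crontab on stdin, writes the merged crontab to stdout.
merge_stat_jobs() {
    current=$(cat)
    [ -n "$current" ] && printf '%s\n' "$current"
    while read -r schedule cmd; do
        # Skip a job whose command path already appears in the crontab.
        printf '%s\n' "$current" | grep -qF -- "$cmd" && continue
        printf '%s\t%s\n' "$schedule" "$cmd"
    done <<EOF
@daily /home/dspace/bin/stat-general
@daily /home/dspace/bin/stat-report-general
@monthly /home/dspace/bin/stat-monthly
@monthly /home/dspace/bin/stat-report-monthly
EOF
}

# Usage, run as the "dspace" user:
#   crontab -l 2>/dev/null | merge_stat_jobs | crontab -
```

Running the pipeline a second time leaves the crontab unchanged, so it is safe to repeat.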

Apache SOLR with DSpace 1.6.2

Add config for Tomcat

To enable the SOLR Java webapp, add the following inside the <Host> element of the Tomcat server file /etc/tomcat55/server.xml.

	<Context path="/solr" docBase="/home/dspace/webapps/solr" debug="0"
	         reloadable="true" cachingAllowed="false"
	         allowLinking="true"/>

Add the JkMount lines below to the Apache Jakarta connector config files /etc/apache2/conf.d/tomcat.conf and /etc/apache2/conf.d/tomcat-ssl.conf.

See: http://ir.sun.ac.za/wiki/index.php/Prepare_Ubuntu#Step_6._Setup_Apache2_Tomcat_5.5_Jakarta_Connector for more info.

JkMount /solr localhost
JkMount /solr/* localhost

Restart Apache and then Tomcat.

Test the local SOLR webapp

Install the "lynx" application.

Then type the following in a console:

lynx http://localhost/solr

You should get the following:

Lynx-solr.png

If not, debug the SOLR webapp until you do, then continue with the remaining steps.
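
For an unattended check, for example from a monitoring script, the lynx test can be approximated with curl. This is only a sketch: it assumes curl is installed, and the helper names http_status and http_ok are hypothetical. It checks whether the webapp answers at all, not whether the page content is correct.

```shell
#!/bin/sh
# Succeed when an HTTP status code means the webapp answered (2xx or 3xx).
http_ok() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch only the status code of a URL; curl prints 000 if it cannot connect.
http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Usage:
#   http_ok "$(http_status http://localhost/solr/)" && echo "solr is up"
```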

Check DSpace config

The SOLR statistics section of the DSpace config file should look something like the following:

#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to   #
# track usage statistics.                                       #
#---------------------------------------------------------------#

##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
solr.log.server = http://localhost/solr/statistics

# The location for the Geo Database retrieved on update/installation
solr.dbfile = /home/dspace/config/GeoLiteCity.dat

# If enabled the statistics system will look for an X-Forward header
# if it finds it, it will use this for the user IP Address
# it is enabled by default
useProxies = true

statistics.items.dc.1=dc.identifier
statistics.items.dc.2=dc.date.accessioned
statistics.items.type.1=dcinput
statistics.items.type.2=date
statistics.default.start.datepick = 01/01/1977

solr.spidersfile = /home/dspace/config/spiders.txt

# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin = false

# control solr statistics querying to filter out spider IPs
# false by default
solr.statistics.query.filter.spiderIp = false

# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
solr.statistics.query.filter.isBot = false

# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
                      http://iplists.com/inktomi.txt, \
                      http://iplists.com/lycos.txt, \
                      http://iplists.com/infoseek.txt, \
                      http://iplists.com/altavista.txt, \
                      http://iplists.com/excite.txt, \
                      http://iplists.com/misc.txt, \
                      http://iplists.com/non_engines.txt
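
Two of the settings above, solr.dbfile and solr.spidersfile, point at files on disk. The sketch below shows one way to read a single-line property from the config so that such paths can be checked. The function name get_prop is hypothetical, the config file location in the usage comment is an assumption, and values continued over several lines with a backslash (such as solr.spiderips.urls) are not handled.

```shell
#!/bin/sh
# Print the value of a single-line "key = value" property from a config file.
# usage: get_prop KEY FILE
get_prop() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next               # not a property line
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            if (k == key) { print v; exit }
        }' "$2"
}

# Usage (the dspace.cfg path is an assumption):
#   for key in solr.dbfile solr.spidersfile; do
#       f=$(get_prop "$key" /home/dspace/config/dspace.cfg)
#       [ -r "$f" ] || echo "missing: $key -> $f"
#   done
```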

Convert the old log files

I prepared the following shell script to do the conversion.

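#!/bin/sh
# Convert each old DSpace log file into a .solr file for later import.

cd /home/dspace/log || exit 1

for i in dspace.log.* ; do
	# Skip files that are themselves the output of an earlier run.
	case "$i" in *.solr) continue ;; esac
	[ -e "$i" ] || continue
	/home/dspace/bin/dspace stats-log-converter -i "$i" -o "$i.solr"
done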

The conversion takes quite a while when there are many log files, so be patient. When the script finishes, every log file has a converted copy with the .solr extension.
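
To see how far a long conversion has progressed, or whether any file was missed, the log files without a converted copy can be listed. A minimal sketch, assuming the dspace.log.* naming used above; the function name pending_logs is hypothetical.

```shell
#!/bin/sh
# List the log files in a directory that have no converted .solr counterpart.
# usage: pending_logs DIR
pending_logs() {
    for f in "$1"/dspace.log.*; do
        case "$f" in *.solr) continue ;; esac   # ignore converted files
        [ -e "$f" ] || continue                 # glob matched nothing
        [ -e "$f.solr" ] || printf '%s\n' "$f"
    done
}

# Usage:
#   pending_logs /home/dspace/log
```

An empty result means every log file has been converted.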

Import the converted solr log files

I prepared the following shell script to do the import.

#!/bin/sh
# Import every converted .solr log file into the SOLR statistics index.

cd /home/dspace/log || exit 1

for i in *.solr ; do
	[ -e "$i" ] || continue	# nothing to import
	echo "###################################"
	echo "Importing stats for log file:... $i"
	/home/dspace/bin/dspace stats-log-importer -i "$i"
done

This takes quite a while with a lot of log files. Be patient.
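
If the import is interrupted, running the script again processes every .solr file from the start. One possible way to avoid that, sketched here as a variation and not as part of the original procedure, is to move each file aside once its import succeeds. The function name import_once and the "imported" subdirectory are hypothetical.

```shell
#!/bin/sh
# Run a command on every *.solr file in a directory; when the command
# succeeds, move the file into DIR/imported so a later run skips it.
# usage: import_once DIR COMMAND [ARGS...]   (the file name is appended)
import_once() {
    dir=$1; shift
    mkdir -p "$dir/imported" || return 1
    for f in "$dir"/*.solr; do
        [ -e "$f" ] || continue                 # glob matched nothing
        if "$@" "$f"; then
            mv "$f" "$dir/imported/"
        else
            echo "import failed, keeping: $f" >&2
        fi
    done
}

# Usage:
#   import_once /home/dspace/log /home/dspace/bin/dspace stats-log-importer -i
```

Files whose import fails stay in place, so they are picked up by the next run.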

Allow normal users to browse the statistics

Edit the following in the DSpace config file.

###### Statistical Report Configuration Settings ######

# should the stats be publicly available?  should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true

# directory where live reports are stored
report.dir = ${dspace.dir}/reports

Conclusion

Rebuild DSpace and restart Tomcat. When you log in to DSpace you should now see statistics per collection and item. Make sure the campus firewall rules are set correctly, otherwise SOLR will not work properly.
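
To confirm that the index actually holds usage records, the statistics core can be queried for its record count. This sketch assumes the solr.log.server URL shown earlier, that curl is installed, and that SOLR answers in its default XML format with a numFound attribute on the result element. The function name num_found is hypothetical.

```shell
#!/bin/sh
# Extract the numFound attribute from a SOLR XML response read on stdin.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Usage:
#   curl -s 'http://localhost/solr/statistics/select?q=*:*&rows=0' | num_found
```

A count of 0 after the import suggests that the importer wrote to a different SOLR URL or that the import did not complete.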