SUNScholar/SOLR Statistics

From Libopedia

Revision as of 10:38, 6 July 2012

This procedure assumes that you have used the three-step process to install DSpace.

Apache SOLR with DSpace >= 1.6.2

Enable the SOLR webapp

To do this, become the root user by typing the following:

sudo -i

Now create a symbolic link to the SOLR webapp in the default Tomcat webapps folder by typing the following:

cd /var/lib/tomcat6/webapps
ln -s /home/dspace/webapps/solr
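To confirm that the link exists and points where you expect, a small check like the one below can help. This is a sketch only: the function name check_solr_link is hypothetical, and the usage line assumes the same Tomcat 6 and DSpace paths as the commands above.

```shell
# Hypothetical helper: succeed only if LINK is a symbolic link whose
# target is exactly TARGET (as created by "ln -s TARGET").
check_solr_link() {
    link="$1"
    target="$2"
    if [ ! -L "$link" ]; then
        echo "Not a symbolic link: $link" >&2
        return 1
    fi
    actual=$(readlink "$link")
    if [ "$actual" != "$target" ]; then
        echo "$link points to $actual, expected $target" >&2
        return 1
    fi
    echo "OK: $link -> $target"
}
```

For example:

 check_solr_link /var/lib/tomcat6/webapps/solr /home/dspace/webapps/solr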

Set up SOLR in the DSpace config file

Update the DSpace config file

Edit the DSpace config file by typing the following.

nano /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg

Go to the bottom of the file and check the settings against the example below.

#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to   #
# track usage statistics.                                       #
#---------------------------------------------------------------#

##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
# (the URL below has no port, so here Tomcat must answer on port 80)
solr.log.server = http://localhost/solr/statistics

# The location for the Geo Database retrieved on update/installation
solr.dbfile = ${dspace.dir}/config/GeoLiteCity.dat

# Timeout for the resolver in the dns lookup
# Time in milliseconds, defaults to 200 for backward compatibility
# Your system's default is usually set in /etc/resolv.conf and varies
# between 2 and 5 seconds; too high a value might result in solr exhausting
# your connection pool
solr.resolver.timeout = 200

# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin=false

# Enable/disable logging of spiders in solr statistics.
# If false, and IP matches an address in solr.spiderips.urls, event is not logged.
# If true, event will be logged with the 'isBot' field set to true
# (see solr.statistics.query.filter.* for query filter options)
# Default value is true.
#solr.statistics.logBots = true

# control solr statistics querying to filter out spider IPs
# false by default
#solr.statistics.query.filter.spiderIp = false

# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
#solr.statistics.query.filter.isBot = true

# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
                      http://iplists.com/inktomi.txt, \
                      http://iplists.com/lycos.txt, \
                      http://iplists.com/infoseek.txt, \
                      http://iplists.com/altavista.txt, \
                      http://iplists.com/excite.txt, \
                      http://iplists.com/misc.txt, \
                      http://iplists.com/non_engines.txt
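To check a single setting without scrolling through the whole file, you can print its value from the shell. A minimal sketch, assuming simple one-line "key = value" entries: the function name get_cfg_value is hypothetical, commented-out lines are ignored, and for a value continued with a trailing backslash (such as solr.spiderips.urls) only the first line is returned.

```shell
# Hypothetical helper: print the value of KEY from a dspace.cfg-style file.
# Lines starting with # are skipped; spaces around "=" are trimmed.
get_cfg_value() {
    file="$1"
    key="$2"
    awk -v k="$key" '
        /^[ \t]*#/ { next }                   # skip comment lines
        {
            pos = index($0, "=")
            if (pos == 0) next                # not a key = value line
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name != k) next
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print val
            exit                              # first match only
        }
    ' "$file"
}
```

For example:

 get_cfg_value /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg solr.log.server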

Update SOLR config file

Rebuild DSpace to use the SOLR webapp

Now rebuild the DSpace webapps (as in the install procedure) and then continue with the steps below.

Finally, give all users full permissions on the SOLR webapp directory:

 chmod -R 0777 /home/dspace/webapps/solr

Test the local SOLR webapp

Install the "lynx" application.

sudo apt-get install lynx

Then type the following in the console:

lynx http://localhost/solr

You should get the following:

[Image: Lynx-solr.png]

If you do not, debug the SOLR webapp until you do, then complete the following steps.
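If you prefer a non-interactive check (for example from a script, or on a server without lynx), curl can report whether the SOLR URL answers at all. This is a sketch: the function name solr_is_up is hypothetical, it assumes curl is installed, and it treats any HTTP 2xx or 3xx status as "up" without inspecting the page content.

```shell
# Hypothetical helper: succeed if URL answers with an HTTP 2xx or 3xx status.
# A connection failure gives status 000 (or nothing if curl is missing),
# which counts as "down".
solr_is_up() {
    url="$1"
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
    case "$code" in
        2??|3??) echo "UP ($code): $url"; return 0 ;;
        *)       echo "DOWN ($code): $url" >&2; return 1 ;;
    esac
}
```

For example:

 solr_is_up http://localhost/solr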

Convert the old log files

I prepared the following bash script to do the conversion.

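#!/bin/sh
# Convert each old DSpace log file into the SOLR import format.
cd /home/dspace/log || exit 1

for i in dspace.log.* ; do
	[ -e "$i" ] || continue                  # no log files matched
	case "$i" in *.solr) continue ;; esac    # skip output of an earlier run
	/home/dspace/bin/dspace stats-log-converter -i "$i" -o "$i.solr"
done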

After running the script you will have one file with the .solr extension for each converted log file. With many log files the conversion takes quite a while, so be patient.
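Because the conversion can run for a long time, you may want to check its progress from a second terminal. A minimal sketch, assuming the naming used by the script above (each dspace.log.* file gets a matching .solr file); the function name conversion_progress is hypothetical.

```shell
# Hypothetical helper: report how many dspace.log.* files in DIR already
# have a matching .solr file written by stats-log-converter.
conversion_progress() {
    dir="$1"
    total=0
    done_count=0
    for f in "$dir"/dspace.log.*; do
        [ -e "$f" ] || continue                  # glob matched nothing
        case "$f" in *.solr) continue ;; esac    # skip converted output
        total=$((total + 1))
        [ -e "$f.solr" ] && done_count=$((done_count + 1))
    done
    echo "$done_count of $total log files converted"
}
```

For example:

 conversion_progress /home/dspace/log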

Import the converted SOLR log files

I prepared the following bash script to do the import.

#!/bin/sh
# Import each converted .solr file into the SOLR statistics.
cd /home/dspace/log || exit 1

for i in *.solr ; do
	[ -e "$i" ] || continue                  # no .solr files matched
	echo "###################################"
	echo "Importing stats for log file: $i"
	/home/dspace/bin/dspace stats-log-importer -i "$i"
done

The import also takes quite a while with many log files. Be patient.
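If the import is interrupted, re-running the script above imports every .solr file again, which may add the same usage events twice. One way to make it resumable is to record each file after a successful import and skip recorded files on the next run. This is a sketch under assumptions: it assumes stats-log-importer exits with a non-zero status when an import fails (not verified here), and the function name import_pending and the marker file imported.list are hypothetical. The importer command is passed in, so you can dry-run it with echo.

```shell
# Hypothetical helper: import each *.solr file in DIR with IMPORTER,
# recording successes in DIR/imported.list so re-runs skip them.
import_pending() {
    dir="$1"
    importer="$2"    # e.g. "/home/dspace/bin/dspace stats-log-importer"
    list="$dir/imported.list"
    touch "$list"
    for f in "$dir"/*.solr; do
        [ -e "$f" ] || continue                  # no .solr files found
        name=$(basename "$f")
        if grep -qxF "$name" "$list"; then
            echo "Skipping (already imported): $name"
            continue
        fi
        echo "Importing stats for log file: $name"
        # $importer is left unquoted so "program subcommand" splits into words
        if $importer -i "$f"; then
            echo "$name" >> "$list"
        else
            echo "Import failed: $name" >&2
        fi
    done
}
```

For example, a dry run followed by the real import:

 import_pending /home/dspace/log echo
 import_pending /home/dspace/log "/home/dspace/bin/dspace stats-log-importer"

Note that the dry run with echo "succeeds" for every file and therefore records them all; delete imported.list before the real run.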

Statistics for DSpace version 1.5.2 and lower

General Reports

Make sure that you run the stats programs regularly.

@daily		/home/dspace/bin/stat-general
@daily		/home/dspace/bin/stat-report-general
@monthly	/home/dspace/bin/stat-monthly
@monthly	/home/dspace/bin/stat-report-monthly

Add the above entries to the crontab of the "dspace" user (for example with crontab -e while logged in as that user).
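If you set up the crontab from a script rather than with crontab -e, it is safer to add only the entries that are missing, so the script can be re-run without creating duplicates. A sketch under that assumption: the function name add_cron_entries is hypothetical and works on a plain file, which you round-trip through crontab as shown in the usage lines. Only byte-identical lines are treated as duplicates, so an existing entry written with spaces instead of tabs would be added again.

```shell
# Hypothetical helper: append each non-empty line read from stdin to FILE
# unless an identical line is already present.
add_cron_entries() {
    file="$1"
    touch "$file"
    while IFS= read -r line; do
        [ -n "$line" ] || continue
        grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
    done
}
```

For example, as root (repeat the middle line for each entry):

 crontab -u dspace -l > /tmp/dspace.cron 2>/dev/null
 printf '@daily\t\t/home/dspace/bin/stat-general\n' | add_cron_entries /tmp/dspace.cron
 crontab -u dspace /tmp/dspace.cron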

Allow normal users to browse the statistics

Edit the following in the DSpace config file.

###### Statistical Report Configuration Settings ######

# should the stats be publicly available?  should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true

# directory where live reports are stored
report.dir = ${dspace.dir}/reports
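To change a setting such as report.public without opening an editor, the file can be rewritten with awk. A minimal sketch, assuming simple one-line "key = value" entries: the function name set_cfg_value is hypothetical, it copies the original to FILE.bak first, leaves commented-out lines alone, and rewrites every matching line in the form "key = value".

```shell
# Hypothetical helper: set KEY to VALUE in a dspace.cfg-style file,
# keeping a backup copy in FILE.bak.
set_cfg_value() {
    file="$1"
    key="$2"
    value="$3"
    cp "$file" "$file.bak" || return 1
    # Key and value go through the environment so awk does not
    # interpret backslash escapes in them.
    KEY="$key" VALUE="$value" awk '
        /^[ \t]*#/ { print; next }            # leave comments untouched
        {
            pos = index($0, "=")
            if (pos > 0) {
                name = substr($0, 1, pos - 1)
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                if (name == ENVIRON["KEY"]) {
                    print name " = " ENVIRON["VALUE"]
                    next
                }
            }
            print                             # all other lines unchanged
        }
    ' "$file.bak" > "$file"
}
```

For example:

 set_cfg_value /home/dspace/dspace-1.7.2-src-release/dspace/config/dspace.cfg report.public true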
