Difference between revisions of "SUNScholar/SOLR Statistics"

 
(190 intermediate revisions by the same user not shown)
Line 1: Line 1:
=General Reports=
+
<center>
Make sure that you run the stats programs regularly. (Not needed for DSpace 1.6.2 and upwards)
+
'''[[SUNScholar/Statistics|Back to Statistics]]'''
<pre>
+
</center>
@daily /home/dspace/bin/stat-general
 
@daily /home/dspace/bin/stat-report-general
 
@monthly /home/dspace/bin/stat-monthly
 
@monthly /home/dspace/bin/stat-report-monthly
 
</pre>
 
The above must be added to the [[SUNScholar/Daily_Admin|crontab]] for the "dspace" user.
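As a rough sketch of adding those entries without creating duplicates, the helper below appends a line to a crontab file only if it is not already present. The function name `add_cron_line` and the temporary file path are assumptions made for illustration; they are not part of DSpace.

```shell
#!/bin/sh
# Sketch only: append a line to a crontab file unless it is already there.
# add_cron_line is a hypothetical helper, not part of DSpace.
add_cron_line() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage, run as the "dspace" user:
#   crontab -l > /tmp/dspace.cron 2>/dev/null
#   add_cron_line /tmp/dspace.cron '@daily /home/dspace/bin/stat-general'
#   crontab /tmp/dspace.cron
```

Working on a copy and installing it with `crontab <file>` means the live crontab is only replaced once the file is complete.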
 
  
=Apache SOLR with DSpace 1.6.2 and upwards=
+
==Introduction==
==Add config for Tomcat==
+
SOLR statistics are available in DSpace versions >= 1.6.2 and are enabled by default for the XMLUI in DSpace versions >= 3.2.
To enable the SOLR java webapp add the following to the tomcat server file in '''/etc/tomcat55/server.xml'''.
+
==Instructions==
<pre>
+
===[[SUNScholar/Statistics/5.X|For DSpace 5.X]]===
<Context path="/solr" docBase="/home/dspace/webapps/solr" debug="0"
+
===[[SUNScholar/Statistics/4.X|For DSpace 4.X]]===
        reloadable="true" cachingAllowed="false"
 
        allowLinking="true"/>
 
</pre>
 
  
Add the following to the Apache jakarta config files in '''/etc/apache2/conf.d/tomcat.conf''' and '''/etc/apache2/conf.d/tomcat-ssl.conf'''
+
==[[SUNScholar/Statistics/Old Versions|Old Statistics Conversion Help]]==
  
See: http://ir.sun.ac.za/wiki/index.php/Prepare_Ubuntu#Step_6._Setup_Apache2_Tomcat_5.5_Jakarta_Connector for more info.
+
==<font color="red">'''PLEASE NOTE:'''</font>==
<pre>
+
===Step 1 - Optimise statistics database===
JkMount /solr localhost
+
''Before upgrading DSpace, run SOLR statistics optimisation '''$HOME/bin/dspace stats-util -o''' at least once just before the upgrade!''
JkMount /solr/* localhost
+
===Step 2 - Backup statistics database===
</pre>
+
<font color="red">'''If you are upgrading, you need to back up the SOLR database before doing any new configuration.'''</font>
  
Restart Apache and then Tomcat.
+
Type the following to back up the SOLR DB: (If using Ubuntu 12.04 LTS, replace tomcat7 with tomcat6)
 +
sudo service tomcat7 stop
  
=Test the local SOLR webapp=
+
mkdir -p $HOME/backup
Install the "lynx" application.
 
  
Then type the following in a console:
+
  cp -Rv $HOME/solr/ $HOME/backup/
  lynx http://localhost/solr
 
  
You should get the following:
+
sudo service tomcat7 start
 +
===Step 3 - Adding new SOLR statistics database fields===
 +
New statistics fields were introduced in DSpace 4.X, so statistics records carried over in an upgrade from DSpace <= 3.X will not have these fields.
  
[[File:Lynx-solr.png]]
+
''The lack of these new fields in statistics records from previous versions of DSpace [https://jira.duraspace.org/browse/DS-2212 causes major errors when attempting to do statistics maintenance] in DSpace versions >= 4.X after an upgrade from a lower version of DSpace.'' See the link below for a possible fix.
 +
https://gist.github.com/terrywbrady/82bd91b53ea4374b96e4
 +
Because of this problem, we are considering using [[SUNScholar/Elastic_Statistics|Elastic Statistics]] exclusively instead, which seems to be more stable and provides download data, not just visits and views.
  
If you do not, debug the solr webapp until you do, then complete the following.
+
The issue is being discussed on this wiki page: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development
  
==Check DSpace config==
+
Also see the following about work to automatically fix old statistics databases.
The DSpace config should look something like the following:
+
*https://jira.duraspace.org/browse/DS-2486
<pre>
+
*https://jira.duraspace.org/browse/DS-2541
#---------------------------------------------------------------#
+
*https://github.com/DSpace/DSpace/pull/905
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
 
#---------------------------------------------------------------#
 
# These configs are only used by the SOLR interface/webapp to  #
 
# track usage statistics.                                       #
 
#---------------------------------------------------------------#
 
  
##### Usage Logging #####
+
==References==
# set this to be the port you run the dspace "solr" webapp
+
*https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics
# on, by default, we are assuming a test configuration with
+
*https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance
# tomcat still running on port 8080
+
----
solr.log.server = http://localhost/solr/statistics
+
*https://wiki.duraspace.org/display/DSDOC4x/DSpace+Statistics
 
+
*https://wiki.duraspace.org/display/DSDOC4x/Managing+Usage+Statistics
# The location for the Geo Database retrieved on update/installation
+
----
solr.dbfile = /home/dspace/config/GeoLiteCity.dat
+
*https://wiki.duraspace.org/display/DSDOC3x/DSpace+Statistics
 
+
*https://wiki.duraspace.org/display/DSDOC3x/Managing+Usage+Statistics
# If enabled the statistics system will look for an X-Forward header
+
[[Category:Customisation]]
# if it finds it, it will use this for the user IP Address
 
# it is enabled by default
 
useProxies = true
 
 
 
statistics.items.dc.1=dc.identifier
 
statistics.items.dc.2=dc.date.accessioned
 
statistics.items.type.1=dcinput
 
statistics.items.type.2=date
 
statistics.default.start.datepick = 01/01/1977
 
 
 
solr.spidersfile = /home/dspace/config/spiders.txt
 
 
 
# Control if the statistics pages should be only shown to authorized users
 
# If enabled, only the administrators for the DSpaceObject will be able to
 
# view the statistics.
 
# If disabled, anyone with READ permissions on the DSpaceObject will be able
 
# to view the statistics.
 
statistics.item.authorization.admin = false
 
 
 
# control solr statistics querying to filter out spider IPs
 
# false by default
 
solr.statistics.query.filter.spiderIp = false
 
 
 
# control solr statistics querying to look at "isBot" field to determine
 
# if record is a bot. true by default.
 
solr.statistics.query.filter.isBot = false
 
 
 
# URLs to download IP addresses of search engine spiders from
 
solr.spiderips.urls = http://iplists.com/google.txt, \
 
                      http://iplists.com/inktomi.txt, \
 
                      http://iplists.com/lycos.txt, \
 
                      http://iplists.com/infoseek.txt, \
 
                      http://iplists.com/altavista.txt, \
 
                      http://iplists.com/excite.txt, \
 
                      http://iplists.com/misc.txt, \
 
                      http://iplists.com/non_engines.txt
 
</pre>
 
 
 
=Convert the old log files=
 
I prepared the following shell script to do the conversion.
 
<pre>
 
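#!/bin/sh
# Convert every DSpace log file to the SOLR import format.
cd /home/dspace/log || exit 1

for i in dspace.log.*; do
    # Skip output files left over from an earlier run.
    case $i in *.solr) continue ;; esac
    /home/dspace/bin/dspace stats-log-converter -i "$i" -o "$i.solr"
done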
 
</pre>
 
 
 
After running the script you will have a lot of log files with the .solr extension. This takes quite a while with a lot of log files. Be patient.
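Because the conversion is slow, it can help to see which log files still lack a converted copy before re-running it. The following is a minimal sketch under that assumption; `pending_logs` is a hypothetical helper name, not a DSpace command.

```shell
#!/bin/sh
# Sketch only: list dspace.log.* files that have no .solr counterpart yet.
# pending_logs is a hypothetical helper, not a DSpace command.
pending_logs() {
    log_dir=$1
    for f in "$log_dir"/dspace.log.*; do
        [ -f "$f" ] || continue          # glob matched nothing
        case $f in
            *.solr) continue ;;          # already a converted file
        esac
        [ -e "$f.solr" ] || echo "$f"    # not converted yet
    done
}

# Usage, with the converter command from the script above:
#   pending_logs /home/dspace/log | while IFS= read -r f; do
#       /home/dspace/bin/dspace stats-log-converter -i "$f" -o "$f.solr"
#   done
```

This only checks that a .solr file exists, not that the earlier conversion finished cleanly, so delete any partial output before relying on it.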
 
=Import the converted solr log files=
 
I prepared the following shell script to do the import.
 
<pre>
 
#!/bin/sh
# Import every converted log file into the SOLR statistics.
cd /home/dspace/log || exit 1

for i in *.solr; do
    [ -e "$i" ] || continue    # no converted files found
    echo "###################################"
    echo "Importing stats for log file:... $i"
    /home/dspace/bin/dspace stats-log-importer -i "$i"
done
 
</pre>
 
This takes quite a while with a lot of log files. Be patient.
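With many files, a failed import is easy to miss in the output. One possible variation, sketched here, counts the failures and reports them at the end. `import_all` is a hypothetical helper; the importer command is passed in as arguments rather than hard-coded.

```shell
#!/bin/sh
# Sketch only: run an importer command on every *.solr file and report failures.
# import_all is a hypothetical helper; the importer command is passed in.
import_all() {
    log_dir=$1
    shift                                # remaining arguments: importer command
    failed=0
    for f in "$log_dir"/*.solr; do
        [ -f "$f" ] || continue          # glob matched nothing
        echo "Importing stats for log file: $f"
        "$@" "$f" || { echo "FAILED: $f" >&2; failed=$((failed + 1)); }
    done
    echo "Failures: $failed"
    [ "$failed" -eq 0 ]                  # non-zero exit status if anything failed
}

# Usage:
#   import_all /home/dspace/log /home/dspace/bin/dspace stats-log-importer -i
```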
 
 
 
=Allow normal users to browse the statistics=
 
Edit the following in the DSpace config file.
 
<pre>
 
###### Statistical Report Configuration Settings ######
 
 
 
# should the stats be publicly available?  should be set to false if you only
 
# want administrators to access the stats, or you do not intend to generate
 
# any
 
report.public = true
 
 
 
# directory where live reports are stored
 
report.dir = ${dspace.dir}/reports
 
</pre>
 
 
 
=Conclusion=
 
Rebuild DSpace and restart Tomcat. When you log in to DSpace you should now see statistics per collection and item. Make sure the campus firewall rules allow the traffic that solr needs in order to work properly.
 
 
 
'''[[SUNScholar/IR|Back to IR Help]]'''
 

Latest revision as of 16:48, 25 November 2016

Back to Statistics

Introduction

SOLR statistics are available in DSpace versions >= 1.6.2 and are enabled by default for the XMLUI in DSpace versions >= 3.2.

Instructions

For DSpace 5.X

For DSpace 4.X

Old Statistics Conversion Help

PLEASE NOTE:

Step 1 - Optimise statistics database

Before upgrading DSpace, run SOLR statistics optimisation $HOME/bin/dspace stats-util -o at least once just before the upgrade!

Step 2 - Backup statistics database

If you are upgrading, you need to back up the SOLR database before doing any new configuration.

Type the following to back up the SOLR DB: (If using Ubuntu 12.04 LTS, replace tomcat7 with tomcat6)

sudo service tomcat7 stop
mkdir -p $HOME/backup
cp -Rv $HOME/solr/ $HOME/backup/
sudo service tomcat7 start
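If you take more than one backup, a dated destination keeps earlier copies from being overwritten. The following is a minimal sketch of that idea; `backup_solr` and the `solr-<timestamp>` naming are assumptions for illustration, not part of DSpace.

```shell
#!/bin/sh
# Sketch only: copy the SOLR data directory into a dated backup folder.
# backup_solr is a hypothetical helper, not part of DSpace.
backup_solr() {
    src=$1            # e.g. "$HOME/solr"
    dest_root=$2      # e.g. "$HOME/backup"
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_root/solr-$stamp"

    [ -d "$src" ] || { echo "No such directory: $src" >&2; return 1; }
    mkdir -p "$dest_root" || return 1
    cp -R "$src" "$dest" || return 1
    echo "$dest"      # print where the copy was written
}

# Usage (stop Tomcat first, as in the steps above):
#   sudo service tomcat7 stop
#   backup_solr "$HOME/solr" "$HOME/backup"
#   sudo service tomcat7 start
```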

Step 3 - Adding new SOLR statistics database fields

New statistics fields were introduced in DSpace 4.X, so statistics records carried over in an upgrade from DSpace <= 3.X will not have these fields.

The lack of these new fields in statistics records from previous versions of DSpace causes major errors when attempting to do statistics maintenance in DSpace versions >= 4.X after an upgrade from a lower version of DSpace. See the link below for a possible fix.

https://gist.github.com/terrywbrady/82bd91b53ea4374b96e4

Because of this problem, we are considering using Elastic Statistics exclusively instead, which seems to be more stable and provides download data, not just visits and views.

The issue is being discussed on this wiki page: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development

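Also see the following about work to automatically fix old statistics databases:

https://jira.duraspace.org/browse/DS-2486
https://jira.duraspace.org/browse/DS-2541
https://github.com/DSpace/DSpace/pull/905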

References