Difference between revisions of "SUNScholar/SOLR Statistics"

From Libopedia
 
(175 intermediate revisions by the same user not shown)
Line 1: Line 1:
=Apache SOLR with DSpace 1.6.2 and upwards=
+
<center>
 +
'''[[SUNScholar/Statistics|Back to Statistics]]'''
 +
</center>
  
=Enable the SOLR webapp=
+
==Introduction==
To achieve this, become the root user by typing the following:
+
SOLR Statistics was introduced with DSpace versions >= 1.6.2. SOLR statistics are now enabled by default for the XMLUI in DSpace versions >= 3.2.
sudo -i
+
==Instructions==
 +
===[[SUNScholar/Statistics/5.X|For DSpace 5.X]]===
 +
===[[SUNScholar/Statistics/4.X|For DSpace 4.X]]===
  
Now we create the shortcut to the SOLR webapp in the default Tomcat webapps folder by typing as follows:
+
==[[SUNScholar/Statistics/Old Versions|Old Statistics Conversion Help]]==
  
cd /var/lib/tomcat6/webapps
+
==<font color="red">'''PLEASE NOTE:'''</font>==
 +
===Step 1 - Optimise statistics database===
 +
''Before upgrading DSpace, run SOLR statistics optimisation '''$HOME/bin/dspace stats-util -o''' at least once just before the upgrade!''
 +
===Step 2 - Backup statistics database===
 +
<font color="red">'''If you are upgrading, you need to back up the SOLR database before doing any new configuration.'''</font>
  
  ln -s /home/dspace/webapps/solr
+
Type the following to back up the SOLR DB. (If using Ubuntu 12.04 LTS, replace tomcat7 with tomcat6.)
 +
  sudo service tomcat7 stop
  
=Setup SOLR in DSpace config file=
+
  mkdir backup
Edit the DSpace config file by typing the following.
 
  nano /home/dspace/dspace-1.6.2-src-release/dspace/config/dspace.cfg
 
  
Go to the bottom of the file and check the settings, see example below.
+
cp -Rv $HOME/solr/ $HOME/backup/
<pre>
 
#---------------------------------------------------------------#
 
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
 
#---------------------------------------------------------------#
 
# These configs are only used by the SOLR interface/webapp to  #
 
# track usage statistics.                                      #
 
#---------------------------------------------------------------#
 
  
##### Usage Logging #####
+
sudo service tomcat7 start
# set this to be the port you run the dspace "solr" webapp
+
===Step 3 - Adding new SOLR statistics database fields===
# on, by default, we are assuming a test configuration with
+
New statistics fields were introduced in DSpace 4.X, so statistics records carried over from a DSpace version <= 3.X will not have these fields.
# tomcat still running on port 8080
 
solr.log.server = http://localhost/solr/statistics
 
  
# The location for the Geo Database retrieved on update/installation
+
''The lack of these new fields in statistics records from previous versions of DSpace [https://jira.duraspace.org/browse/DS-2212 causes major errors when attempting to do statistics maintenance] in DSpace versions >= 4.X after an upgrade from a lower version of DSpace.'' See the link below for a possible fix.
solr.dbfile = /home/dspace/config/GeoLiteCity.dat
+
https://gist.github.com/terrywbrady/82bd91b53ea4374b96e4
 +
Because of this problem, we are considering using [[SUNScholar/Elastic_Statistics|Elastic Statistics]] exclusively instead, which seems to be more stable and provides download data, not just visits and views.
  
# If enabled the statistics system will look for an X-Forward header
+
The issue is being discussed on this wiki page: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development
# if it finds it, it will use this for the user IP address
 
# it is enabled by default
 
useProxies = true
 
  
#statistics.items.dc.1=dc.identifier
+
Also see the following about work to automatically fix old statistics databases.
#statistics.items.dc.2=dc.date.accessioned
+
*https://jira.duraspace.org/browse/DS-2486
#statistics.items.type.1=dcinput
+
*https://jira.duraspace.org/browse/DS-2541
#statistics.items.type.2=date
+
*https://github.com/DSpace/DSpace/pull/905
#statistics.default.start.datepick = 01/01/1977
 
  
solr.spidersfile = /home/dspace/config/spiders.txt
+
==References==
 
+
*https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics
# Control if the statistics pages should be only shown to authorized users
+
*https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance
# If enabled, only the administrators for the DSpaceObject will be able to
+
----
# view the statistics.
+
*https://wiki.duraspace.org/display/DSDOC4x/DSpace+Statistics
# If disabled, anyone with READ permissions on the DSpaceObject will be able
+
*https://wiki.duraspace.org/display/DSDOC4x/Managing+Usage+Statistics
# to view the statistics.
+
----
statistics.item.authorization.admin = false
+
*https://wiki.duraspace.org/display/DSDOC3x/DSpace+Statistics
 
+
*https://wiki.duraspace.org/display/DSDOC3x/Managing+Usage+Statistics
# control solr statistics querying to filter out spider IPs
+
[[Category:Customisation]]
# false by default
 
solr.statistics.query.filter.spiderIp = false
 
 
 
# control solr statistics querying to look at "isBot" field to determine
 
# if record is a bot. true by default.
 
solr.statistics.query.filter.isBot = false
 
 
 
# URLs to download IP addresses of search engine spiders from
 
solr.spiderips.urls = http://iplists.com/google.txt, \
 
                      http://iplists.com/inktomi.txt, \
 
                      http://iplists.com/lycos.txt, \
 
                      http://iplists.com/infoseek.txt, \
 
                      http://iplists.com/altavista.txt, \
 
                      http://iplists.com/excite.txt, \
 
                      http://iplists.com/misc.txt, \
 
                      http://iplists.com/non_engines.txt
 
</pre>
 
 
 
Now restart the server and continue with the following.
 
 
 
=Test the local SOLR webapp=
 
Install the "lynx" application.
 
 
 
Then type the following in a console:
 
lynx http://localhost/solr
 
 
 
You should get the following:
 
 
 
[[File:Lynx-solr.png]]
 
 
 
If not, debug the SOLR webapp until you do, then complete the following.
 
 
 
=Convert the old log files=
 
I prepared the following bash script to do the conversion.
 
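<pre>
#!/bin/sh
# Convert each old DSpace log file into a file that SOLR can import.

cd /home/dspace/log || exit 1

# Loop over the glob directly instead of parsing the output of ls.
for i in dspace.log.* ; do
    [ -e "$i" ] || continue                  # the glob matched nothing
    case "$i" in *.solr) continue ;; esac    # skip output of an earlier run
    /home/dspace/bin/dspace stats-log-converter -i "$i" -o "$i.solr"
done
</pre>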
 
 
 
After running the script, each log file will have a converted copy with the .solr extension. With many log files the conversion takes quite a while, so be patient.
 
=Import the converted solr log files=
 
I prepared the following bash script to do the import.
 
<pre>
#!/bin/sh
# Import each converted log file into the SOLR statistics database.

cd /home/dspace/log || exit 1

for i in *.solr ; do
    [ -e "$i" ] || continue    # the glob matched nothing
    echo "###################################"
    echo "Importing stats for log file: $i"
    /home/dspace/bin/dspace stats-log-importer -i "$i"
done
</pre>
 
This takes quite a while with a lot of log files. Be patient.
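
Because both steps are slow, it can help to see which log files still need converting before a re-run. The following is only a sketch: pending_logs is a hypothetical helper name (it is not a DSpace command), and it assumes converted files sit next to the originals with the .solr extension, as produced by the conversion script.

```shell
#!/bin/sh
# Sketch: list DSpace log files that do not yet have a converted .solr copy.
# pending_logs is a hypothetical helper, not part of DSpace.
pending_logs() {
    log_dir=$1
    for f in "$log_dir"/dspace.log.* ; do
        # The glob matched nothing.
        if [ ! -e "$f" ]; then continue; fi
        # This is itself a converted file.
        case "$f" in *.solr) continue ;; esac
        # Already converted on an earlier run.
        if [ -e "$f.solr" ]; then continue; fi
        echo "$f"
    done
}

# Example use (left commented out so that nothing runs by accident):
# pending_logs /home/dspace/log | while read -r f ; do
#     /home/dspace/bin/dspace stats-log-converter -i "$f" -o "$f.solr"
# done
```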
 
 
 
=Statistics for DSpace version 1.5.2 and lower=
 
==General Reports==
 
Make sure that you run the stats programs regularly.
 
<pre>
 
@daily /home/dspace/bin/stat-general
 
@daily /home/dspace/bin/stat-report-general
 
@monthly /home/dspace/bin/stat-monthly
 
@monthly /home/dspace/bin/stat-report-monthly
 
</pre>
 
The above must be added to the [[SUNScholar/Daily_Admin|crontab]] for the "dspace" user.
 
 
 
==Allow normal users to browse the statistics==
 
Edit the following in the DSpace config file.
 
<pre>
 
###### Statistical Report Configuration Settings ######
 
 
 
# should the stats be publicly available?  should be set to false if you only
 
# want administrators to access the stats, or you do not intend to generate
 
# any
 
report.public = true
 
 
 
# directory where live reports are stored
 
report.dir = ${dspace.dir}/reports
 
</pre>
 
 
 
=Help=
 
* https://wiki.duraspace.org/display/DSDOC/DSpace+Statistics
 
 
 
'''[[SUNScholar/IR|Back to IR Help]]'''
 

Latest revision as of 16:48, 25 November 2016

Back to Statistics

Introduction

SOLR Statistics was introduced with DSpace versions >= 1.6.2. SOLR statistics are now enabled by default for the XMLUI in DSpace versions >= 3.2.

Instructions

For DSpace 5.X

For DSpace 4.X

Old Statistics Conversion Help

PLEASE NOTE:

Step 1 - Optimise statistics database

Before upgrading DSpace, run SOLR statistics optimisation $HOME/bin/dspace stats-util -o at least once just before the upgrade!

Step 2 - Backup statistics database

If you are upgrading, you need to back up the SOLR database before doing any new configuration.

Type the following to back up the SOLR DB. (If using Ubuntu 12.04 LTS, replace tomcat7 with tomcat6.)

sudo service tomcat7 stop
mkdir backup
cp -Rv $HOME/solr/ $HOME/backup/
sudo service tomcat7 start
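
If several backups are taken over time, a date-stamped copy avoids overwriting an earlier one. The following is a minimal sketch, not an official DSpace procedure: backup_solr is a hypothetical helper name, and the paths in the comments are the ones used in the steps above.

```shell
#!/bin/sh
# Sketch: copy a SOLR data directory into a date-stamped backup folder.
# backup_solr is a hypothetical helper, not a DSpace command.
backup_solr() {
    src=$1          # e.g. "$HOME/solr"
    dest_root=$2    # e.g. "$HOME/backup"
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_root/solr-$stamp"

    if [ ! -d "$src" ]; then
        echo "No such directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest_root" || return 1
    # -R copies recursively, -p preserves timestamps and permissions.
    cp -Rp "$src" "$dest" || return 1
    # Print the location of the new copy.
    echo "$dest"
}

# Example use, with Tomcat stopped first as in the steps above
# (left commented out so that nothing runs by accident):
# backup_solr "$HOME/solr" "$HOME/backup"
```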

Step 3 - Adding new SOLR statistics database fields

New statistics fields were introduced in DSpace 4.X, so statistics records carried over from a DSpace version <= 3.X will not have these fields.

The lack of these new fields in statistics records from previous versions of DSpace causes major errors when attempting to do statistics maintenance in DSpace versions >= 4.X after an upgrade from a lower version of DSpace. See the link below for a possible fix.

https://gist.github.com/terrywbrady/82bd91b53ea4374b96e4

Because of this problem, we are considering using Elastic Statistics exclusively instead, which seems to be more stable and provides download data, not just visits and views.

The issue is being discussed on this wiki page: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development

Also see the following about work to automatically fix old statistics databases.

References