Difference between revisions of "SUNScholar/SOLR Statistics"

From Libopedia
=Apache SOLR with DSpace 1.6.2 and upwards=
==Add config for Tomcat==
To enable the SOLR Java webapp, add the following inside the Host element of the Tomcat server file '''/etc/tomcat6/server.xml'''.
<pre>
<Context path="/solr" docBase="/home/dspace/webapps/solr" debug="0"
        reloadable="true" cachingAllowed="false"
        allowLinking="true"/>
</pre>
Restart Tomcat.
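Tomcat will not serve /solr if docBase points at the wrong directory, so before restarting it can be worth confirming that the path really holds an unpacked webapp. A minimal sketch, assuming the webapp was deployed as an unpacked directory at the docBase shown above; solr_webapp_ok is a hypothetical helper name, and it only checks for the standard servlet descriptor WEB-INF/web.xml.

```shell
#!/bin/sh
# Hypothetical helper: check that DIR looks like an unpacked Java webapp
# (the servlet-standard WEB-INF/web.xml is present) before restarting Tomcat.
solr_webapp_ok() {
    docbase="$1"
    if [ -f "$docbase/WEB-INF/web.xml" ]; then
        echo "OK: $docbase looks like a deployed webapp"
        return 0
    fi
    echo "MISSING: $docbase/WEB-INF/web.xml (check docBase in server.xml)" >&2
    return 1
}

# Example, using the docBase from the Context element above:
# solr_webapp_ok /home/dspace/webapps/solr && sudo service tomcat6 restart
```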
  
=Test the local SOLR webapp=
Install the "lynx" application.

Then type the following in a console:
<pre>
lynx http://localhost/solr
</pre>
You should get the following:

[[File:Lynx-solr.png]]

If you do not, debug the SOLR webapp until you do, then complete the following steps.
==Check DSpace config==
The SOLR statistics section of the DSpace config file should look something like the following:
<pre>
#---------------------------------------------------------------#
#--------------SOLR STATISTICS CONFIGURATIONS-------------------#
#---------------------------------------------------------------#
# These configs are only used by the SOLR interface/webapp to  #
# track usage statistics.                                       #
#---------------------------------------------------------------#

##### Usage Logging #####
# set this to be the port you run the dspace "solr" webapp
# on, by default, we are assuming a test configuration with
# tomcat still running on port 8080
solr.log.server = http://localhost/solr/statistics

# The location for the Geo Database retrieved on update/installation
solr.dbfile = /home/dspace/config/GeoLiteCity.dat

# If enabled the statistics system will look for an X-Forward header
# if it finds it, it will use this for the user IP Address
# it is enabled by default
useProxies = true

statistics.items.dc.1=dc.identifier
statistics.items.dc.2=dc.date.accessioned
statistics.items.type.1=dcinput
statistics.items.type.2=date
statistics.default.start.datepick = 01/01/1977

solr.spidersfile = /home/dspace/config/spiders.txt

# Control if the statistics pages should be only shown to authorized users
# If enabled, only the administrators for the DSpaceObject will be able to
# view the statistics.
# If disabled, anyone with READ permissions on the DSpaceObject will be able
# to view the statistics.
statistics.item.authorization.admin = false

# control solr statistics querying to filter out spider IPs
# false by default
solr.statistics.query.filter.spiderIp = false

# control solr statistics querying to look at "isBot" field to determine
# if record is a bot. true by default.
solr.statistics.query.filter.isBot = false

# URLs to download IP addresses of search engine spiders from
solr.spiderips.urls = http://iplists.com/google.txt, \
                      http://iplists.com/inktomi.txt, \
                      http://iplists.com/lycos.txt, \
                      http://iplists.com/infoseek.txt, \
                      http://iplists.com/altavista.txt, \
                      http://iplists.com/excite.txt, \
                      http://iplists.com/misc.txt, \
                      http://iplists.com/non_engines.txt
</pre>
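To check which value an installation is actually using for one of these keys, a small helper can read it from the config file. A minimal sketch, assuming the file uses the key = value form shown above; cfg_get is a hypothetical name, and the config path in the example (/home/dspace/config/dspace.cfg) is an assumption to adjust for your install. It does not follow backslash line continuations, so multi-line values such as solr.spiderips.urls come back truncated to their first line.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a key = value config file.
# Comment lines are ignored; if a key appears twice, the last value wins.
# Backslash line continuations are NOT followed.
cfg_get() {
    key="$1"
    file="$2"
    # Escape dots so "solr.log.server" is matched literally by sed
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Example (config path is an assumption):
# cfg_get solr.log.server /home/dspace/config/dspace.cfg
```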
 
 
 
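=Convert the old log files=
I prepared the following shell script to do the conversion.
<pre>
#!/bin/sh
# Convert each DSpace log file into the SOLR import format (one .solr file per log)
cd /home/dspace/log || exit 1

for i in dspace.log.* ; do
    # Skip output from an earlier run, and the literal pattern if nothing matched
    case "$i" in
        *.solr) continue ;;
    esac
    [ -f "$i" ] || continue
    /home/dspace/bin/dspace stats-log-converter -i "$i" -o "$i.solr"
done
</pre>
After the script has run, each log file will have a converted copy with the .solr extension. With a lot of log files this takes quite a while, so be patient.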
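Because the conversion is slow, a run may get interrupted. A minimal sketch of a helper that lists the log files which still have no .solr counterpart, assuming the dspace.log.* and .solr naming used in the script above; pending_logs is a hypothetical name. A partially written .solr file left by an interrupted run would count as converted, so remove the most recent one by hand before resuming.

```shell
#!/bin/sh
# Hypothetical helper: list dspace.log.* files in DIR that have no matching
# .solr file yet, e.g. to resume a conversion run that was interrupted.
pending_logs() {
    dir="$1"
    for f in "$dir"/dspace.log.*; do
        [ -f "$f" ] || continue                # pattern matched nothing
        case "$f" in *.solr) continue ;; esac  # skip converted output files
        [ -e "$f.solr" ] || printf '%s\n' "$f"
    done
}

# Example: pending_logs /home/dspace/log
```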
 
=Import the converted solr log files=
I prepared the following shell script to do the import.
<pre>
#!/bin/sh
# Import every converted .solr file into the SOLR statistics core
cd /home/dspace/log || exit 1

for i in *.solr ; do
    [ -f "$i" ] || continue   # nothing matched the pattern
    echo "###################################"
    echo "Importing stats for log file:... $i"
    /home/dspace/bin/dspace stats-log-importer -i "$i"
done
</pre>
With a lot of log files the import also takes quite a while, so be patient.
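To confirm that the import reached the statistics core, you could ask SOLR for the total record count with a select query using rows=0; the XML response reports the total in the numFound attribute of the result element. A minimal sketch, assuming curl is installed, the default XML response format, and the core URL configured in solr.log.server above; num_found is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read a SOLR XML select response on stdin and print
# the numFound value of the main result element (empty if not found).
num_found() {
    sed -n 's/.*<result name="response" numFound="\([0-9]*\)".*/\1/p' | head -n 1
}

# Example (assumes solr.log.server = http://localhost/solr/statistics):
# curl -s 'http://localhost/solr/statistics/select?q=*:*&rows=0' | num_found
```

Running the query before and after an import gives a rough idea of how many records were added.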
 
 
 
=Statistics for DSpace version 1.5.2 and lower=
==General Reports==
Make sure that you run the statistics programs regularly:
<pre>
@daily /home/dspace/bin/stat-general
@daily /home/dspace/bin/stat-report-general
@monthly /home/dspace/bin/stat-monthly
@monthly /home/dspace/bin/stat-report-monthly
</pre>
The above must be added to the [[SUNScholar/Daily_Admin|crontab]] for the "dspace" user.
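One way to add these entries to the "dspace" user's crontab without creating duplicates when the command is run twice is sketched below. add_cron_line is a hypothetical helper; it assumes the standard crontab -l and crontab - commands are available.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and print it with
# LINE appended, unless an identical line is already present.
add_cron_line() {
    line="$1"
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -Fqx -- "$line"; then
        printf '%s\n' "$line"
    fi
}

# Example, run as the "dspace" user (repeat for each entry above):
# crontab -l 2>/dev/null | add_cron_line '@daily /home/dspace/bin/stat-general' | crontab -
```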
 
 
 
==Allow normal users to browse the statistics==
Edit the following in the DSpace config file.
<pre>
###### Statistical Report Configuration Settings ######

# should the stats be publicly available?  should be set to false if you only
# want administrators to access the stats, or you do not intend to generate
# any
report.public = true

# directory where live reports are stored
report.dir = ${dspace.dir}/reports
</pre>
 
 
 
=Conclusion=
Rebuild DSpace and restart Tomcat. When you log in to DSpace, you should now see statistics for each collection and item. Make sure your campus firewall rules allow SOLR to work properly.

'''[[SUNScholar/IR|Back to IR Help]]'''
 

Latest revision as of 16:48, 25 November 2016

Back to Statistics

Introduction

SOLR Statistics is available in DSpace 1.6.2 and later, and is enabled by default for the XMLUI in DSpace 3.2 and later.

Instructions

For DSpace 5.X

For DSpace 4.X

Old Statistics Conversion Help

PLEASE NOTE:

Step 1 - Optimise statistics database

Before upgrading DSpace, run the SOLR statistics optimisation ($HOME/bin/dspace stats-util -o) at least once, just before the upgrade.

Step 2 - Backup statistics database

If you are upgrading, back up the SOLR database before doing any new configuration.

Type the following to back up the SOLR database (on Ubuntu 12.04 LTS, replace tomcat7 with tomcat6):

sudo service tomcat7 stop
mkdir -p $HOME/backup
cp -Rv $HOME/solr/ $HOME/backup/
sudo service tomcat7 start
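The cp command above overwrites the previous copy each time it is run. As an alternative, the sketch below writes a timestamped archive instead, assuming tar with gzip support and that Tomcat has already been stopped as shown; backup_solr is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: archive a SOLR data directory into DEST as a
# timestamped .tar.gz instead of copying over any previous backup.
# Stop Tomcat first (as above) so the index is not being written to.
backup_solr() {
    src="$1"
    dest="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    archive="$dest/solr-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths inside the archive relative to the parent of SRC
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}

# Example: backup_solr "$HOME/solr" "$HOME/backup"
```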

Step 3 - Adding new SOLR statistics database fields

New statistics fields were introduced in DSpace 4.X, so statistics records created by DSpace 3.X or earlier do not contain these fields.

The missing fields in statistics records from earlier versions of DSpace cause major errors when you attempt statistics maintenance in DSpace 4.X or later after an upgrade (see https://jira.duraspace.org/browse/DS-2212). See the link below for a possible fix.

https://gist.github.com/terrywbrady/82bd91b53ea4374b96e4

Because of this problem, we are considering using Elastic Statistics exclusively instead; it seems to be more stable and provides download data, not just visits and views.

The issue is being discussed on this wiki page: https://wiki.duraspace.org/display/DSPACE/DSpace+statistics+-+current+status+and+future+development

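Also see the following about work to automatically fix old statistics databases:

* https://jira.duraspace.org/browse/DS-2486
* https://jira.duraspace.org/browse/DS-2541
* https://github.com/DSpace/DSpace/pull/905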

References