SUNScholar/Elastic Statistics/Import Logs

From Libopedia

Introduction

To get a full statistical history after enabling Elastic Statistics, you need to import the old log data. The steps below describe how to do that.

PLEASE NOTE:

When importing old log data, take care not to create duplicate records.

For example:

If you enabled Elastic Statistics a month ago, do not import the logs for the past month. Those visits were already recorded from the moment Elastic Statistics was enabled, so importing them again would duplicate the records for that period.
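
One way to avoid the overlap is to work out which log files are older than the date on which Elastic Statistics was enabled, and export only those. The sketch below assumes the logs are named dspace.log.YYYY-MM-DD (check your own log directory first); logs_before is a hypothetical helper written for this page, not a DSpace command.

```shell
#!/bin/sh
# Sketch: print the log files dated strictly before a cutoff date.
# ASSUMPTION: log files are named dspace.log.YYYY-MM-DD.
# logs_before is an illustrative helper, not part of DSpace.

logs_before() {
	# $1 = log directory, $2 = cutoff date as YYYY-MM-DD
	for f in "$1"/dspace.log.????-??-?? ; do
		[ -f "$f" ] || continue
		# Keep only the date part of the file name
		d=${f##*dspace.log.}
		# Strip the dashes so the dates can be compared as plain numbers
		if [ "$(echo "$d" | tr -d '-')" -lt "$(echo "$2" | tr -d '-')" ]; then
			echo "$f"
		fi
	done
}
```

For example, logs_before "$HOME/log" 2014-03-01 would list only the logs written before 1 March 2014. Files dated on or after the cutoff can then be moved out of $HOME/log before running the export.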

Step 1 - Export old log data to a suitable import format

  • Create export script:
mkdir -p $HOME/scripts
nano $HOME/scripts/stats-export
  • Copy and paste the following, then save the file:
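#!/bin/sh
# Convert every DSpace log file in $HOME/log to the statistics import format.

cd "$HOME/log" || exit 1
## Uncomment the following to debug ##
#ls dspace.log.*

for i in dspace.log.* ; do
	# Skip files produced by an earlier run of this script
	case "$i" in *.export) continue ;; esac
	# Skip the literal pattern when no log files are present
	[ -f "$i" ] || continue
	echo "###################################"
	echo "Exporting stats for log file:... $i"
	"$HOME/bin/dspace" stats-log-converter -n -i "$i" -o "$i.export"
done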

NANO Editor Help
CTL+O = Save the file and then press Enter
CTL+X = Exit "nano"
CTL+K = Delete line
CTL+U = Undelete line
CTL+W = Search for a string
CTL+\ = Search for a string and replace it with another string
CTL+C = Show the current cursor position (line and column)

More info = http://en.wikipedia.org/wiki/Nano_(text_editor)


  • Make the script executable:
chmod 0775 $HOME/scripts/stats-export
  • Convert the logs:
$HOME/scripts/stats-export
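
Before moving on to the import, it can be useful to confirm that every log file was converted. The following is a minimal sketch of such a check; missing_exports is a hypothetical helper, not a DSpace command. Note that an empty export file may be legitimate for a log that contains no usage events, so treat the output as a list of files to look at rather than a list of errors.

```shell
#!/bin/sh
# Sketch: list log files that have no non-empty .export file next to them.
# missing_exports is an illustrative helper, not part of DSpace.

missing_exports() {
	# $1 = log directory
	for f in "$1"/dspace.log.* ; do
		# Ignore the export files themselves
		case "$f" in *.export) continue ;; esac
		[ -f "$f" ] || continue
		# -s is true only when the file exists and is larger than zero bytes
		[ -s "$f.export" ] || echo "$f"
	done
}
```

Running missing_exports "$HOME/log" after the conversion prints nothing when every log has a matching export file.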

Step 2 - Import prepared log data to Elastic Statistics


PLEASE NOTE:

  • For a busy web site, the imports will take a long time, because a long history of visits is loaded onto the server in a very short space of time. Please be patient: the import may take hours, maybe even days!
  • You can speed up the imports on an Ubuntu server by installing a name service caching daemon before you start the imports. Type: sudo apt-get install nscd
  • The import script can be run in the background in a "detached mode", so that it keeps running after you log out.
  • If the import should fail, then check the "stats-import.log" file to see which imports succeeded. Then remove the import files that have been processed and begin the import again. Repeat this process until all the import files have been processed.
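
The remove-and-retry cycle described above can be automated by moving each file out of the way once its import finishes. The following is a minimal sketch, assuming the dspace launcher returns a non-zero exit status when an import fails (this is an assumption worth verifying on your installation). The names import_pending, IMPORT_CMD and the "imported" directory are hypothetical and used only for illustration; the default command is the importer listed under Import Utility in the References.

```shell
#!/bin/sh
# Sketch: import each *.export file once, moving finished files aside so
# that a re-run only picks up what is left.
# ASSUMPTION: the import command exits non-zero on failure.
# import_pending and IMPORT_CMD are illustrative names, not part of DSpace.

# Command used to import one file; set IMPORT_CMD=true for a dry run
IMPORT_CMD=${IMPORT_CMD:-"$HOME/bin/dspace stats-log-importer-elasticsearch -v -i"}

import_pending() {
	# $1 = directory holding the *.export files
	mkdir -p "$1/imported"
	for f in "$1"/*.export ; do
		[ -f "$f" ] || continue
		echo "Importing: $f" >> "$1/stats-import.log"
		# IMPORT_CMD is left unquoted on purpose so it splits into words
		if $IMPORT_CMD "$f" >> "$1/stats-import.log" 2>&1 ; then
			mv "$f" "$1/imported/"
		else
			echo "FAILED: $f" >> "$1/stats-import.log"
			return 1
		fi
	done
}
```

After a failure, fixing the problem and calling import_pending "$HOME/log" again continues with the file that failed, since everything already imported has been moved to the "imported" directory.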

Procedure

  • Create import script:
mkdir -p $HOME/scripts
nano $HOME/scripts/stats-import
  • Copy and paste the following, then save the file:
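#!/bin/sh
# Import every converted log file in $HOME/log into Elastic Statistics.

cd "$HOME/log" || exit 1
## Uncomment the following to debug ##
#ls *.export

for i in *.export ; do
	# Skip the literal pattern when no export files are present
	[ -f "$i" ] || continue
	# Append (>>) so the log keeps a record of every file, not just the last one
	echo "###################################" >> stats-import.log
	echo "Importing stats for log file:... $i" >> stats-import.log
	# Use the Elasticsearch importer documented under "Import Utility" below
	"$HOME/bin/dspace" stats-log-importer-elasticsearch -v -i "$i"
	echo "Import of:... $i completed. Resting for a while!" >> stats-import.log
	sleep 3
done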
  • Make the script executable:
chmod 0755 $HOME/scripts/stats-import
  • Import the converted log files:
$HOME/scripts/stats-import
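
Because the import can run for hours, it may help to start it detached from the terminal so that it survives a logout. The following is a minimal sketch using the standard nohup utility; run_detached and the stats-import.out file name are illustrative choices, not something this procedure prescribes.

```shell
#!/bin/sh
# Sketch: start a long-running command in the background, immune to hangups.
# run_detached is a hypothetical helper; nohup is a standard utility.

run_detached() {
	# $1 = file that collects the command's output
	# remaining arguments = the command to run
	out=$1
	shift
	nohup "$@" >> "$out" 2>&1 &
	echo "Started in background with PID $!"
}
```

For example, run_detached "$HOME/log/stats-import.out" "$HOME/scripts/stats-import" starts the import in the background, and tail -f $HOME/log/stats-import.log can then be used to follow its progress.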

References

See: https://github.com/osulibraries/kb-stats-csv-import-elasticsearch

Export Utility

$HOME/bin/dspace stats-log-converter
usage: ClassicDSpaceLogConverter
       
 -h,--help        help
 -i,--in <arg>    source file ('-' or omit for standard input)
 -m,--multiple    treat the input file as having a wildcard ending
 -n,--newformat   process new format log lines (1.6+)
 -o,--out <arg>   destination file or directory ('-' or omit for standard
                  output)
 -v,--verbose     display verbose output (useful for debugging)

	ClassicDSpaceLogConverter -i infilename -o outfilename -v (for verbose output)

Import Utility

$HOME/bin/dspace stats-log-importer-elasticsearch
usage: StatisticsImporterElasticSearch
       
 -h,--help       help
 -i,--in <arg>   the input file ('-' or omit for standard input)
 -m,--multiple   treat the input file as having a wildcard ending
 -s,--skipdns    skip performing reverse DNS lookups on IP addresses
 -v,--verbose    display verbose output (useful for debugging)