Wavelet Data Compression

Team:
Sergey Klimenko
Benoit Mours
Andrei Sazonov
The Wavelet Data Compression (WDC) standalone utilities can be used for lossy compression of time series stored in frame files. WDC performs a wavelet transform followed by lossless RDC compression. Losses are introduced in the wavelet domain by scaling the wavelet coefficients (wavelet shrinkage). The amount of loss, the number of wavelet layers and the wavelet order are controlled by command-line parameters. Since a combination of two different types of wavelet decomposition tree (dyadic and binary) is used, there are two corresponding sets of parameters.
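The lossy step described above (scale the wavelet coefficients, round them to integers, then hand the integers to a lossless coder) can be illustrated with a minimal sketch. It uses one level of the Haar wavelet as a stand-in for the WAT filters of order order1/order2, and a hypothetical quantization step `step`; how WDC maps the loss1/loss2 percentages to a scale factor is not described here, so no such mapping is attempted. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in filter: one level of the orthonormal Haar transform
// (the real utilities use longer filters, default order 8).
void haarForward(const std::vector<double>& x,
                 std::vector<double>& approx, std::vector<double>& detail) {
  assert(x.size() % 2 == 0);  // even length assumed
  const double s = std::sqrt(0.5);
  const std::size_t n = x.size() / 2;
  approx.resize(n);
  detail.resize(n);
  for (std::size_t i = 0; i < n; ++i) {
    approx[i] = s * (x[2 * i] + x[2 * i + 1]);
    detail[i] = s * (x[2 * i] - x[2 * i + 1]);
  }
}

std::vector<double> haarInverse(const std::vector<double>& approx,
                                const std::vector<double>& detail) {
  assert(approx.size() == detail.size());
  const double s = std::sqrt(0.5);
  std::vector<double> x(2 * approx.size());
  for (std::size_t i = 0; i < approx.size(); ++i) {
    x[2 * i] = s * (approx[i] + detail[i]);
    x[2 * i + 1] = s * (approx[i] - detail[i]);
  }
  return x;
}

// Lossy step: divide by `step` and round to the nearest integer. These
// integers are what a lossless coder (RDC in WDC) would then pack.
std::vector<long> quantize(const std::vector<double>& c, double step) {
  assert(step > 0.0);
  std::vector<long> q(c.size());
  for (std::size_t i = 0; i < c.size(); ++i) q[i] = std::lround(c[i] / step);
  return q;
}

std::vector<double> dequantize(const std::vector<long>& q, double step) {
  std::vector<double> c(q.size());
  for (std::size_t i = 0; i < q.size(); ++i) c[i] = q[i] * step;
  return c;
}

// Full round trip; returns the largest absolute sample error.
double roundTripMaxError(const std::vector<double>& x, double step) {
  std::vector<double> a, d;
  haarForward(x, a, d);
  std::vector<double> y = haarInverse(dequantize(quantize(a, step), step),
                                      dequantize(quantize(d, step), step));
  double err = 0.0;
  for (std::size_t i = 0; i < x.size(); ++i)
    err = std::fmax(err, std::fabs(x[i] - y[i]));
  return err;
}
```

Because each rounded coefficient is off by at most step/2 and the Haar synthesis is orthonormal, every reconstructed sample in this sketch stays within step/sqrt(2) of the original.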

WatFrComp

The WatFrComp utility reads an input frame file, applies lossy wavelet compression and writes the compressed data to an output frame file. Only data vectors are compressed, so the original frame structure is preserved. The variable "compress" in the corresponding FrVect data structure is set to 255. The current processing strategy is:
a) channels with a data rate < 200 Hz are not processed; the data are stored as is
b) channels with a data rate > 200 Hz but < 1000 Hz are not processed with wavelets, but are compressed with the lossless RDC method
c) channels with a data rate > 1000 Hz but < 16 kHz are processed with the binary wavelet decomposition only (control parameters are Lbt, loss2 and order2)
d) channels with a data rate > 16 kHz are processed first with a dyadic wavelet transform (control parameters are Lwt, loss1 and order1), followed by a binary tree wavelet transform
e) when wavelet decomposition is applied, the loss level for the lowest frequency band is set internally to 0.1 of the loss1 or loss2 value, respectively
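The rate-based strategy a) to e) can be summarized as a small dispatch function. This is a sketch under stated assumptions: the enum and function names are hypothetical, 16 kHz is taken as 16000 Hz, and rates exactly on a threshold are put in the lower class, since the text does not say how boundaries are handled. With these choices the rates in the WatFrStat example (16, 256, 2048 and 16384 Hz) fall into classes a), b), c) and d).

```cpp
#include <cassert>

// Hypothetical labels for the processing paths a) to d).
enum class Path { Raw, RdcOnly, BinaryTree, DyadicThenBinary };

// Thresholds follow the text (200 Hz, 1000 Hz, 16 kHz). Assumption:
// 16 kHz = 16000 Hz and boundary values go to the lower class.
Path choosePath(double rateHz) {
  assert(rateHz >= 0.0);
  if (rateHz <= 200.0) return Path::Raw;             // a) stored as is
  if (rateHz <= 1000.0) return Path::RdcOnly;        // b) lossless RDC only
  if (rateHz <= 16000.0) return Path::BinaryTree;    // c) Lbt, loss2, order2
  return Path::DyadicThenBinary;                     // d) Lwt, loss1, order1 first
}

// Rule e): the lowest frequency band uses one tenth of the configured loss.
double lowestBandLoss(double loss) { return 0.1 * loss; }
```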
Syntax: WatFrComp input_file output_file options

where:

input_file is the name of the input frame file
output_file is the name of the output frame file
options = Lwt Lbt loss1 loss2 order1 order2
 
Lwt - integer, depth of the dyadic wavelet tree, [default = 3]
Lbt - integer, depth of the binary wavelet tree, [default = 4]
loss1 - float, allowed losses [in %] for dyadic wavelet (high frequency band), [default = 1]
loss2 - float, allowed losses [in %] for binary wavelet (low frequency band), [default = 1]
order1 - integer, wavelet order (length of wavelet filter) for dyadic wavelet tree, [default = 8]
order2 - integer, wavelet order (length of wavelet filter) for binary wavelet tree, [default = 8]
Parameters may be omitted. In this case the default values will be used.
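The positional option list with its defaults can be sketched as follows. This is only a hedged illustration of the documented behaviour: the struct and function names are hypothetical, and it assumes that only trailing parameters can be omitted, since the text does not say how a parameter in the middle of the list could be skipped.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Defaults taken from the parameter list of WatFrComp.
struct CompOptions {
  int lwt = 3;
  int lbt = 4;
  double loss1 = 1.0;
  double loss2 = 1.0;
  int order1 = 8;
  int order2 = 8;
};

// `opts` holds the arguments after input_file and output_file, in the
// order Lwt Lbt loss1 loss2 order1 order2; missing ones keep defaults.
CompOptions parseOptions(const std::vector<std::string>& opts) {
  assert(opts.size() <= 6);
  CompOptions o;
  if (opts.size() > 0) o.lwt = std::stoi(opts[0]);
  if (opts.size() > 1) o.lbt = std::stoi(opts[1]);
  if (opts.size() > 2) o.loss1 = std::stod(opts[2]);
  if (opts.size() > 3) o.loss2 = std::stod(opts[3]);
  if (opts.size() > 4) o.order1 = std::stoi(opts[4]);
  if (opts.size() > 5) o.order2 = std::stoi(opts[5]);
  return o;
}
```

In a main() the vector would be built from argv starting at argv[3].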
Special parameter cases:
Lwt=Lbt=0 - no wavelet decomposition is performed
loss1=loss2=0. - data in the wavelet layers are not scaled, only rounded and stored with RDC compression
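To make the roles of Lwt and Lbt concrete, here is a sketch of the frequency bands such trees would produce, assuming the usual layout: a dyadic tree of depth Lwt repeatedly halves the remaining low band, and a binary (full) tree of depth Lbt, applied to the final low band, splits it into 2^Lbt equal parts. For a 16384 Hz channel with the defaults Lwt=3 and Lbt=4 this gives three octave bands above 1024 Hz and sixteen 64 Hz bands below; Lwt=Lbt=0 leaves a single band. The function names are hypothetical and the band layout is an assumption, not taken from the WAT source.

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

using Band = std::pair<double, double>;  // [low edge, high edge] in Hz

// Binary (full) tree of depth lbt: 2^lbt bands of equal width over [lo, hi].
std::vector<Band> binaryBands(double lo, double hi, int lbt) {
  assert(lbt >= 0 && lbt < 31);
  const int n = 1 << lbt;
  const double w = (hi - lo) / n;
  std::vector<Band> bands;
  for (int k = 0; k < n; ++k) bands.push_back({lo + k * w, lo + (k + 1) * w});
  return bands;
}

// Dyadic tree of depth lwt on [0, Nyquist]: each level splits off the upper
// half of the remaining low band. The binary tree of depth lbt is then
// applied to the final low band. Bands are returned from low to high.
std::vector<Band> dyadicThenBinaryBands(double rateHz, int lwt, int lbt) {
  assert(lwt >= 0);
  const double nyquist = rateHz / 2.0;
  const double lowEdge = std::ldexp(nyquist, -lwt);  // nyquist / 2^lwt
  std::vector<Band> bands = binaryBands(0.0, lowEdge, lbt);
  for (double lo = lowEdge; lo < nyquist; lo *= 2.0) bands.push_back({lo, 2.0 * lo});
  return bands;
}
```

With lwt = 0 the same function gives the binary-only layout of case c), e.g. sixteen 64 Hz bands for a 2048 Hz channel with Lbt=4.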

WatFrUnComp

The WatFrUnComp utility reads a WDC-compressed frame file, uncompresses the data and writes the output to a frame file. All parameters required for decompression are stored in the compressed data. Standard frame compression may be applied to the output.

Syntax: WatFrUnComp input_file output_file options

where:

input_file is the name of the input frame file
output_file is the name of the output frame file
options = compression
compression - one of the standard frame compression types (0, 1, 2, 3, ...)

Makefile

A Makefile for use with the GNU make utility is provided for convenient compression and decompression of many files. It takes frame files from an input directory, selects the desired ADC channels, compresses the ADC data with lossy WAT compression, optionally combines the output into fewer files, and compresses each file as a whole with gzip (which effectively compresses the frame service information). The Makefile can also uncompress WAT-compressed files.

Variables to set inside the Makefile:

 
CHANNELS - a space-delimited list of selected ADC channel names, for example, CHANNELS = H2:LSC-AS_Q H2:LSC-LA_NPTRT
TAG - a tag for FrCopy, i.e. an ADC channel name with wildcard, for example TAG = H2:LSC* (a minus sign before a channel name deselects it)
N1, N2 - select from the sorted list of file names those numbered from N1 to N2
WITH_GZIP=yes - if set to 'yes', each output file is compressed as a whole by gzip after WAT compression
BUNCH - controls combining of frames into a single output file; the value may be 'all', '10' or '100'
INDIR - the name of input directory where initial files reside
COPYDIR - the name of directory where to place intermediate files containing only selected ADC channels
COMPDIR - the name of directory where to place frame files with WAT-compressed ADC data, optional gzip compression may be applied
OUTDIR - the name of directory where to place frame files after WAT decompression
ISFX, CSFX, OSFX - suffixes for input, compressed and output frame files; if gzip is applied, the additional suffix .gz is appended
COPY_OPT - FrCopy options, see FrCopy descriptions
COMP_OPT - WAT compression options, see WatFrComp description
UNCOMP_OPT - WAT uncompression options, see WatFrUnComp description
If COPYDIR is equal to INDIR, FrCopy is not executed and frame files are taken from INDIR with all their ADC channels (CHANNELS and TAG are ignored). Selection of ADC channels by the variable CHANNELS does not work in combination with BUNCH (due to FrCopy features). When BUNCH is used, the output file is named after the first frame file in the bunch. When BUNCH is set to 10 or 100, each output file is made of 10 or 100 frames, the first frame being taken from the file whose base name ends in "0" or "00". With BUNCH set to 10 or 100, the numbers N1 and N2 count tens or hundreds of input files.
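The BUNCH=10 and BUNCH=100 grouping can be pictured with the following sketch. It assumes, beyond what the text states, that file base names end in a counter or GPS time, so that files sharing all but the last one or two characters of the base name form one bunch, named after its first file. The helper names are hypothetical and this is not the Makefile's actual logic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Strip directory and extension: "in/H-657913761.F" -> "H-657913761".
std::string baseName(const std::string& path) {
  std::size_t slash = path.find_last_of('/');
  std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
  std::size_t dot = name.find_last_of('.');
  return (dot == std::string::npos) ? name : name.substr(0, dot);
}

// Group files for BUNCH=10 (digits = 1) or BUNCH=100 (digits = 2): files
// whose base names agree except for the last `digits` characters share a
// bunch. std::map keeps bunches sorted; each bunch keeps input order.
std::vector<std::vector<std::string>> makeBunches(
    const std::vector<std::string>& sortedFiles, std::size_t digits) {
  std::map<std::string, std::vector<std::string>> groups;
  for (const std::string& f : sortedFiles) {
    std::string b = baseName(f);
    assert(b.size() > digits);
    groups[b.substr(0, b.size() - digits)].push_back(f);
  }
  std::vector<std::vector<std::string>> bunches;
  for (const auto& kv : groups) bunches.push_back(kv.second);
  return bunches;
}
```

Under this reading, N1 and N2 would then index the returned bunches rather than individual files.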

Syntax: make command options

where commands are:
frcopy - copy frame files from N1 to N2 with selected ADC data
compress - do the 'frcopy' stage, then apply WAT compression
uncompress - uncompress WAT-compressed files
all - do all three stages 'frcopy', 'compress' and 'uncompress'
dir - create the working directories OUTDIR, COMPDIR and COPYDIR if they do not exist
cleancp - clean COPYDIR (if COPYDIR differs from INDIR)
cleancomp - clean the compressed file directory COMPDIR
cleanout - clean the output directory OUTDIR
cleanall - clean all three working directories
and some possible options are:
VAR_NAME=value - override the value of any variable used in the Makefile
-f makefile_name - to specify the name of Makefile, if it is not default 'Makefile' or 'makefile'
-j n - execute n jobs in parallel, suitable to boost processing on multiprocessor systems
Other common make options can be listed with the command 'make -h'.
Caution: directory names containing the colon symbol ":" will confuse make. To process a directory with such a name, first create a soft link to it with a suitable name, then refer to the link instead of the original directory. make version 3.77 has been found not to handle links to directories properly, while 3.78 does. Use of GNU make version 3.78 is advised.

Caution: variables in your shell environment may have the same names as variables inside the Makefile. In this case the values from the environment will override the corresponding settings inside the Makefile, which may lead to unpredictable results. Check your environment with the command 'env'.

Examples of use:

make -j 4 compress BUNCH=10 N1=
will take all files in the input directory, combine frames by 10 in the output and apply WAT compression; 4 jobs will be started in parallel to take advantage of a 4-processor system

make compress TAG="H2:LSC*" BUNCH=all N1=1 N2=20
will take input files 1 to 20, select the channels matching the pattern "H2:LSC*", apply WAT compression and combine all files into a single file

make frcopy TAG="H2:LSC-AS_Q" BUNCH=100 N1=
will take all input files and combine them by 100 into several output files, selecting only the channel "H2:LSC-AS_Q"; WAT compression is not applied

make uncompress
will take all WAT-compressed files (gzipped or not) from the directory given by the variable COMPDIR and uncompress them

make cleanall
will clean up the directories COPYDIR (only if COPYDIR and INDIR differ), COMPDIR and OUTDIR; useful before a new compression run with different settings

WatFrStat

The WatFrStat utility displays compression statistics for a selected frame file. For each channel it lists the channel name, data rate, uncompressed and compressed data length, compression ratio and average number of bits per data sample (bps).

Syntax: WatFrStat input_file

Example of the WatFrStat output:
  browse frame 117184 run:244 GPS time:657913773

 Channel name               |  rate |  uncomp.|   comp. | compress.| bits per |
                            |   Hz  |   bytes |   bytes |    ratio |  sample  |

 H2:ASC-WFS1_IY             |  2048 |    8192 |    1436 |    5.705 |     5.61 |
 H2:ASC-WFS1_IP             |  2048 |    8192 |    1432 |    5.721 |     5.59 |
 H2:LSC-LA_NPTRT            | 16384 |   65536 |    8800 |    7.447 |     4.30 |
 H2:LSC-LA_NPTRR            | 16384 |   65536 |    8760 |    7.481 |     4.28 |
 H2:PSL-FSS_RCTEMP          |    16 |      64 |      64 |    1.000 |    32.00 |
 H2:PSL-FSS_RMTEMP          |    16 |      64 |      64 |    1.000 |    32.00 |
 ...
 H2:SUS-RM_SENSOR_UR        |   256 |     512 |     140 |    3.657 |     4.38 |
 H2:SUS-RM_SENSOR_LL        |   256 |     512 |     268 |    1.910 |     8.38 |
 H2:SUS-RM_SENSOR_UL        |   256 |     512 |     140 |    3.657 |     4.38 |

 total WAT compressed ADC data :  1454592 ->   272300 bytes
 these channels average compression ratio: 5.341873

 lossy WAT compressed ADC data :  1409024 ->   255756 bytes
 these channels average compression ratio: 5.509251

 total WAT unprocessed ADC data :     5120 bytes
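The last two columns can be reproduced from the others. Assuming each frame holds one second of data, as the numbers in the example suggest (8192 bytes at 2048 Hz is 4 bytes per sample), the compression ratio is uncompressed bytes divided by compressed bytes, and bits per sample is 8 * compressed bytes divided by the number of samples (rate times frame length). For H2:ASC-WFS1_IY: 8192 / 1436 = 5.705 and 8 * 1436 / 2048 = 5.61. The function names are hypothetical.

```cpp
#include <cassert>

// Compression ratio as listed by WatFrStat: uncompressed / compressed bytes.
double compressionRatio(double uncompBytes, double compBytes) {
  assert(compBytes > 0.0);
  return uncompBytes / compBytes;
}

// Bits per sample: compressed bits divided by the number of samples.
// Assumption: the frame holds rateHz * frameSeconds samples (1 s frames
// in the WatFrStat example).
double bitsPerSample(double compBytes, double rateHz, double frameSeconds = 1.0) {
  assert(rateHz > 0.0 && frameSeconds > 0.0);
  return 8.0 * compBytes / (rateHz * frameSeconds);
}
```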
 
 

Availability and installation

Binaries for Sun UltraSPARC Solaris are available as a single archive file. These binaries are compiled by GNU gcc/g++ v.2.95 and require the shared library libstdc++.so version 2.10.0. For convenience, the GNU make utility and a statically linked frame copy utility FrCopy are also included in this archive. The latest source code for the utilities is in the file wdc.tar.gz. Building the utilities requires the WAT and Frame libraries. Installation of the Frame library is described in "Frame Library (Fr) User's Manual".

Only the static WAT library is required to build the lossy compression utilities. Building the static WAT library requires GNU make. Its Makefile is tuned by default for the Sun UltraSPARC platform and the g++ compiler. For other platforms only one change in this file is needed: the optimization options "-mcpu=ultrasparc" and "-mcpu=v8" must be replaced with options suitable for your platform. DMT and ROOT are not required for building the static library, so the corresponding variables in the Makefile can be ignored.

After tuning the Makefile, run "make dir" to create the necessary subdirectories, then run "make lib".

Once the Frame and WAT libraries are built, the last stage is simple: unpack wdc.tar.gz, change the lines in the Makefile that specify the Frame and WAT library locations, and run 'make'.
 

Reading WAT-compressed data from frame files in ROOT environment

Until WAT compression is integrated into the Frame library, WAT-compressed data can still be read in the ROOT environment without first uncompressing the whole data file. The function ReadFrFile() was developed for this purpose; it reads both compressed and uncompressed data files transparently. To make this function available, the WAT code must be compiled with the option -D_USE_FR, and the Virgo Frame library must be installed. Using ReadFrFile() in a ROOT session requires loading the shared libraries wavelet.so (WAT) and libFrameROOT.so (Virgo Frame library). The following commands must be issued (or placed in the rootlogon.C file):

gSystem->Load("wavelet.so");
gSystem->Load("libFrameROOT.so");

Syntax: WaveData *ReadFrFile(double t_length, double t_skip, char *cn, char *fn, bool seek=true);

where

 
t_length - length of the requested data portion, in seconds,
t_skip - time to skip from the beginning of the first frame in the file, in seconds,
cn - the name of the selected ADC channel,
fn - the name of the file containing frames with WAT-compressed ADC data,
seek - if true, look for the continuation of the data by guessing the next file name.

The function returns a pointer to a WaveData object (a WAT object) containing the data portion from the selected channel.

Note: if the frame file does not contain enough data to fill the requested array, the remaining samples are set to zero.
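The zero-fill rule in the note can be pictured with a small sketch. The helpers below are hypothetical and are not part of WAT; they assume the request size is t_length times the channel rate, rounded to the nearest sample. For instance, 4.5 s of a channel sampled at 16384 Hz is a request of 73728 samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Number of samples requested for t_length seconds at rateHz
// (rounded to the nearest integer; an assumption).
std::size_t requestedSamples(double tLength, double rateHz) {
  assert(tLength >= 0.0 && rateHz > 0.0);
  return static_cast<std::size_t>(std::llround(tLength * rateHz));
}

// Copy whatever data are available and leave the rest of the array at zero.
std::vector<double> fillRequest(const std::vector<double>& available, std::size_t n) {
  std::vector<double> out(n, 0.0);
  for (std::size_t i = 0; i < n && i < available.size(); ++i) out[i] = available[i];
  return out;
}
```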

Example: w=ReadFrFile(4.5,0.,"H2:LSC-AS_Q","wz/H-657913760.WZ");
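The text does not say how ReadFrFile() guesses the next file name when seek is true. One plausible rule, shown here only as a hypothetical sketch, assumes names of the form prefix-GPS.ext, where GPS is the start time of the file, and that consecutive files are contiguous with a known duration in seconds. The function name guessNextFile is an illustration, not part of WAT.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: "wz/H-657913760.WZ" with 10 s files -> "wz/H-657913770.WZ".
// Assumes the GPS start time is the text between the last '-' and the last '.'.
std::string guessNextFile(const std::string& fn, long long durationSec) {
  std::size_t dash = fn.find_last_of('-');
  std::size_t dot = fn.find_last_of('.');
  assert(dash != std::string::npos && dot != std::string::npos && dash < dot);
  long long gps = std::stoll(fn.substr(dash + 1, dot - dash - 1));
  return fn.substr(0, dash + 1) + std::to_string(gps + durationSec) + fn.substr(dot);
}
```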



Contact

S.Klimenko (klimenko@phys.ufl.edu)



A.Sazonov (sazonov@phys.ufl.edu)

Last update: June 6, 2001