Intel® Integrated Performance Primitives (Intel® IPP) Developer Guide and Reference

ID 790148
Date 10/31/2024
Public
Document Table of Contents

Burrows-Wheeler Transform

Burrows-Wheeler Transform (BWT) does not compress data, but it simplifies the structure of input data and makes more effective further compression. One of the distinctive feature of this method is operation on the block of data (as a rule of size 512kB - 2 mB). The main idea of this method is block sorting which groups symbols with a similar context. Let us consider how BWT works on the input data block ‘abracadabra'. The first step is to create a matrix containing all its possible cyclic permutations. The first row is input string, the second is created by shifting it to the left by one symbol and so on:

abracadabra bracadabraa racadabraab acadabraabr cadabraabra adabraabrac dabraabraca abraabracad braabracada raabracadab aabracadabr

Then all rows are sorted in accordance with the lexicographic order:

 0 aabracadabr  1 abraabracad  2 abracadabra  3 acadabraabr  4 adabraabrac  5 braabracada  6 bracadabraa  7 cadabraabra  8 dabraabraca  9 raabracadab 10 racadabraab

The last step is to write out the last column and the index of the input string: rdarcaaaabb, 2 - this is a result of the forward BWT transform.

Inverse BTW is performed as follows:

elements of the input string are numbered in ascending order

 0  r  1  d  2  a  3  r  4  c  5  a  6  a  7  a  8  a  9  b 10  b

and sorted in accordance with the lexicographic order:

 2  a  5  a  6  a  7  a  8  a  9  b 10  b  4  c  1  d  0  r  3  r

This index array is a vector of the inverse transform (Inv), the further reconstruction of the string is performed in the following manner:

src[] = "rdarcaaaabb";
Inv[] = {2,5,6,7,8,9,10,4,1,0,3};
index = 2; // index of the initial string is known from the forward BWT
for( i = 0; i < len; i++ ) {
  index = Inv[index];
  dst[i] = src[index];
}