Compression Scheme for Faster and Secure Data Transmission Over Networks

B. S. Shajeemohan (1), Dr. V. K. Govindan (2)
(1) Assistant Professor, CSED, LBSCE, Kerala, India.
(2) Professor & HOD, CSED, NITC, Kerala, India.
[email protected] [email protected]

Abstract
Compression algorithms reduce the redundancy in data representation and thereby the storage required for that data. Data compression also offers an attractive approach to reducing communication costs by using the available bandwidth effectively. Over the last decade there has been an unprecedented increase in the amount of digital data (text, images, video, sound, computer programs, etc.) transmitted via networks, especially the Internet and mobile cellular networks. With this trend expected to continue, it makes sense to pursue research on developing algorithms that use the available network bandwidth most effectively by maximally compressing the data. A strategy called Intelligent Dictionary Based Encoding (IDBE), used in conjunction with the Burrows Wheeler Transform (BWT) [1], is discussed to achieve this. It has been observed that preprocessing the text prior to conventional compression improves the compression efficiency considerably. The intelligent dictionary based encoding also provides some level of confidentiality. Experimental results for this compression method are analyzed.

Index Terms: BWT, Data compression, Dictionary Based Encoding, IDBE, Lossless compression, Star Encoding.

1. Introduction
The objective of this paper is to develop a better transformation yielding greater compression and added confidentiality for text data transmitted over mobile cellular networks in the form of short messages (SMS), and over the Internet. The basic idea of compression is to transform the text into some intermediate form which can be compressed with better efficiency. A moderately secure encoding called IDBE (Intelligent Dictionary Based Encoding) is used as a preprocessing stage so as to improve both the compression ratio and the rate of compression. The suggested compression method greatly reduces the transmission time as well as the bandwidth requirements for text messages transmitted over networks.

Proceedings of the International Conference on Mobile Business (ICMB'05) 0-7695-2367-6/05 $20.00 © 2005 IEEE

2. Related work and background

A. Burrows Wheeler Transform (BWT)
The BWT is an algorithm that takes a block of data and reorders it using a sorting algorithm. The resulting output block contains exactly the same data elements it started with, differing only in their ordering. The transformation is reversible, meaning the original ordering of the data elements can be restored with no loss of quality or fidelity. The output of the BWT is usually piped through a move-to-front stage, then a run-length encoder stage, and finally an entropy encoder, normally arithmetic or Huffman coding. The actual command line to perform this sequence looks like this:

BWT < input-file | MTF | RLE | ARI > output-file

Decompression is simply the reverse process:

UNARI < input-file | UNRLE | UNMTF | UNBWT > output-file
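The first two stages of the pipeline above can be sketched as follows. This is a naive, quadratic-time illustration (function names are ours, not from the paper); production BWT implementations work on fixed-size blocks and use suffix arrays for the sort.

```python
def bwt(s, eos="\x00"):
    """Burrows Wheeler Transform: sort all rotations of the block and
    emit the last column. A sentinel byte marks the block's end."""
    s += eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def ibwt(t, eos="\x00"):
    """Inverse BWT: rebuild the sorted rotation table one column at a
    time, then return the row that ends with the sentinel."""
    table = [""] * len(t)
    for _ in range(len(t)):
        table = sorted(t[i] + table[i] for i in range(len(t)))
    row = next(r for r in table if r.endswith(eos))
    return row[:-1]

def mtf(data, alphabet):
    """Move-to-front: emit each symbol's index in a self-organizing
    list. The local repetitions BWT creates become runs of small
    numbers, which the run-length and entropy stages then exploit."""
    symbols = list(alphabet)
    out = []
    for ch in data:
        i = symbols.index(ch)
        out.append(i)
        symbols.insert(0, symbols.pop(i))
    return out
```

For example, `bwt("banana")` yields `"annb\x00aa"`, grouping the repeated characters together, and `ibwt` restores the original string exactly.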

B. Star encoding
An alternate approach is to perform a lossless, reversible transformation on a source file prior to applying an existing compression algorithm. The transformation is designed to make the source file easier to compress. Star encoding [3] is generally used for this type of preprocessing transformation of the source text. Star encoding works by creating a large dictionary of commonly used words expected in the input files. The dictionary must be prepared in advance, and must be known to both the compressor and the decompressor. As an example, a section of text from the Canterbury corpus version of bible.txt looks like this in the original text:

In the beginning God created the heaven and the earth. And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.

Running this text through the star-encoder yields the following text: *n *** ********* *o* *******d *** *****n **d *** *****. *n* *** ***** **s ******* ***m, **d ***d; **d ******** **s **** *** ***e ** *** ***p.*n* *** *****t ** *o* ****d **** *** ***e ** *** ******.

We can clearly see that the encoded data has exactly the same number of characters, but is dominated by stars. A later refinement of star encoding called LIPT (Length Index Preserving Transform) [5][6], whose genesis can be traced to several similar transforms developed by the M-5 Research Group at the Department of Computer Science, University of Central Florida, should also be mentioned here.
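The length-preserving substitution behind the example above can be sketched as follows. This is our own simplification, not the exact encoder of [3]: the most frequent dictionary word of each length maps to all stars, and later words of that length replace the final star with a distinguishing letter, so every replacement keeps the original word length.

```python
from collections import defaultdict

def build_star_dict(words_by_freq):
    """Map each dictionary word (given in descending frequency order)
    to a same-length star pattern. Simplification: handles at most
    27 words per length class."""
    by_len = defaultdict(list)
    for w in words_by_freq:
        if w not in by_len[len(w)]:
            by_len[len(w)].append(w)
    mapping = {}
    for n, ws in by_len.items():
        for i, w in enumerate(ws):
            if i == 0:
                mapping[w] = "*" * n  # most frequent word: all stars
            else:
                # later words: swap the last star for a letter
                mapping[w] = "*" * (n - 1) + "abcdefghijklmnopqrstuvwxyz"[i - 1]
    return mapping

def star_encode(text, mapping):
    # unknown words pass through unchanged; the real encoder also
    # escapes literal '*' characters, which is omitted here
    return " ".join(mapping.get(w, w) for w in text.split(" "))
```

With the dictionary `["the", "and", "was", "in"]`, the words "the", "and" and "was" (all length 3) encode as `***`, `**a` and `**b` respectively.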

3. Intelligent Dictionary Based Encoding
In these circumstances we propose a better encoding strategy, one that offers a higher compression ratio and rate of compression while maintaining the confidentiality of the data sent over the channel, by making use of a dictionary for encoding and decoding. The dictionary is known only to the sender and receiver for a particular session. The basic philosophy of this compression technique is to transform the text into some intermediate form using an Intelligent Dictionary Based Encoding (IDBE) scheme. The transformed text can then be encoded with a BWT stage. These two operations form the preprocessing block of the proposed compression scheme. The preprocessed text is then piped through a move-to-front encoder stage, then a run-length encoder stage, and finally an entropy encoder, normally arithmetic coding. The algorithm we developed is called IDBE [4] and involves a two-step process:

Step 1: Make an intelligent dictionary.
Step 2: Encode the input text data.

The entire process can be summarized as follows.


A. Dictionary Making Algorithm

Start MakeDict with multiple source files as input.
1. Extract all words from the input files.
2. If a word is already in the table, increment its number of occurrences by 1; otherwise add it to the table and set its number of occurrences to 1.
3. Sort the table by frequency of occurrence in descending order.
4. Assign codes using the following method:
   i)  Give the first 218 words the ASCII characters 33 to 250 as their codes.
   ii) Give each remaining word a permutation of two of the ASCII characters (in the range 33-250), taken in order. If any words remain, give each a permutation of three of the ASCII characters, and finally, if required, a permutation of four characters.
5. Create a new table containing only the words and their codes. Store this table as the dictionary in a file.
Stop.
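Step 4 of the algorithm can be sketched as below. The helper name is ours and the word list is assumed to be already sorted by descending frequency: the first 218 words get one-byte codes drawn from ASCII 33-250, and the rest get two-, three- and four-byte combinations of the same symbols, taken in order.

```python
import itertools

def assign_codes(words_by_freq):
    """Assign dictionary codes: 218 one-byte codes, then ordered
    combinations (with repetition) of two, three and four bytes."""
    alphabet = [chr(c) for c in range(33, 251)]  # 218 symbols
    codes = itertools.chain(
        ((a,) for a in alphabet),                # 218 one-byte codes
        itertools.product(alphabet, repeat=2),   # 218**2 two-byte codes
        itertools.product(alphabet, repeat=3),
        itertools.product(alphabet, repeat=4),
    )
    return {w: "".join(c) for w, c in zip(words_by_freq, codes)}
```

Note that 218 + 218^2 = 47,742 codes of at most two bytes already cover a typical word dictionary, so the vast majority of dictionary words encode in one or two bytes.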

B. Encoding Algorithm

Start encode with argument input file inp.
A. Read the dictionary and store all words and their codes in a table.
B. While inp is not empty:
   1. Read characters from inp and form tokens.
   2. If the token is longer than 1 character, then:
      1. Search for the token in the table.
      2. If it is not found:
         1. Write the token as such into the output file.
      Else:
         1. Find the length of the code for the word.
         2. The actual code consists of the length concatenated with the code from the table. The length serves as a marker while decoding and is represented by the ASCII characters 251 to 254, with 251 representing a code of length 1, 252 a code of length 2, and so on.
         3. Write the actual code into the output file.
         4. Read the next character and ignore it if it is a space. If it is any other character, make it the first character of the next token and go back to B, after inserting a marker character (ASCII 255) to indicate the absence of a space.
      Endif
   Else:
      1. Write the 1-character token.
      2. If the character is one of the ASCII characters 251 to 255, write the character once more to show that it is part of the text and not a marker.
   Endif
End (While)
C. Stop.
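The core of the encoding loop can be sketched as follows. This simplified version is our own (whitespace tokenization instead of the paper's character-level scanner, and no ASCII-255 no-space marker); it shows the length-marker scheme, where chr(251) through chr(254) prefix codes of length 1 through 4, a space after a coded word is implicit on decode, and literal bytes in the marker range are escaped by doubling.

```python
def idbe_encode(text, codes):
    """Encode text against an IDBE dictionary. chr(250 + n) marks an
    n-byte code; a space after a coded word is implicit on decode."""
    out = []
    for tok in text.split(" "):
        code = codes.get(tok)
        if code is not None and len(tok) > 1:
            out.append(chr(250 + len(code)) + code)
        else:
            # literal token: double any bytes that collide with the
            # marker range 251-255, then keep its trailing space
            escaped = "".join(c * 2 if 251 <= ord(c) <= 255 else c
                              for c in tok)
            out.append(escaped + " ")
    return "".join(out)
```

With the (hypothetical) dictionary entry "the" -> "!", the phrase "the cat" encodes as chr(251) + "!" followed by the literal token "cat".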

As an example, to demonstrate this, a section of the text from the Canterbury corpus version of bible.txt looks like this in the original:

In the beginning God created the heaven and the earth. And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters. And God said, Let there be light: and there was light. And God saw the light, that it was good: and God divided the light from the darkness. And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day. And God said, Let there be a firmament in the midst of the waters, and let it divide the waters from the waters. And God made the firmament, and divided the waters which were under the firmament from the waters which were above the firmament: and it was so. And God called the firmament Heaven. And the evening and the morning were the second day.

Running the above text through our Intelligent Dictionary Based Encoder (IDBE), which we have implemented in C++, yields the following text:

û©û!ü%;ûNü'Œû!ü"ƒû"û!û˜ÿ. û*û!û˜û5ü"8ü"}ÿ, û"ü2Óÿ; û"ü%Lû5ûYû!ü"nû#û!ü&“ÿ. û*û!ü%Ìû#ûNü&ÇûYû!ü"nû#û!ü#Éÿ. û*ûNûAÿ, ü"¿û]û.ü"’ÿ: û"û]û5ü"’ÿ. û*ûNü"Qû!ü"’ÿ, û'û1û5û²ÿ: û"ûNü(Rû!ü"’û;û!ü%Lÿ. û*ûNûóû!ü"’ü%…ÿ, û"û!ü%Lû-ûóü9[ÿ. û*û!ü'·û"û!ü#¹ûSû!ûºûvÿ. û*ûNûAÿ, ü"¿û]û.û&ü6 û%û!ü#?û#û!ü#Éÿ, û"û«û1ü,-û!ü#Éû;û!ü#Éÿ. û*ûNû‚û!ü6 ÿ, û"ü(Rû!ü#Éû:ûSü"2û!ü6 û;û!ü#Éû:ûSü"‚û!ü6 ÿ: û"û1û5ûeÿ.û*ûNûóû!ü6 ü#Wÿ. û*û!ü'·û"û!ü#¹ûSû!ü"ßûvÿ.

4. Performance analysis

A. Bits Per Character and conversion time
Performance measures such as Bits Per Character (BPC) and conversion time are compared for three cases: simple BWT, BWT with star encoding, and BWT with our proposed Intelligent Dictionary Based Encoding (IDBE). The compression ratios achieved by using IDBE as a preprocessing stage along with the BWT based compression scheme are compared with simple BWT and BWT with star encoding. Benchmark files from the Calgary and Canterbury corpora are used to validate the performance of the compression scheme. In almost every case, better compression is achieved by using IDBE as the preprocessing stage for the BWT based compressor. It is also observed that IDBE gives a better average BPC than LIPT for both the Canterbury and Calgary corpora. The results are very promising.

TABLE I. BPC comparison of simple BWT, BWT with *Encode [11] and BWT with IDBE on the Calgary corpus.

Fig. 1. BPC comparison of simple BWT, BWT with *Encode [11] and BWT with IDBE for the Calgary corpus.

Fig. 3. Conversion time comparison of simple BWT, BWT with *Encoding [11] and BWT with IDBE for the Calgary corpus.

B. Transmission Time Over the Network
The next step of our performance comparison experiments was to measure the difference in data transmission time between the normal, uncompressed file and the file compressed using our compression system. We stored both files on the Web and then recorded the time taken to download each.

Fig. 5. Comparison of transmission time over the Internet (in seconds, per Calgary corpus file) for uncompressed and compressed files using BWT with IDBE.

The results were very encouraging: the total transmission time of the files was greatly reduced. Many other parameters affect the total transmission time of data over a network, such as the traffic rate, the topology, and the underlying protocols; nevertheless, employing compression considerably reduces the transmission time.

5. Conclusion
The reduction in transmission time is directly proportional to the amount of compression. Our results show excellent improvement in text data compression. The scheme also adds some level of confidentiality to the data by employing a coding mechanism involving a secret dictionary known only to the sender and receiver. The concept of sorting data before compression is a powerful one; a preprocessing stage making use of an intelligent dictionary is even more powerful. Over the last six years there have been significant improvements in text compression using BWT. We feel it may well find application in text message transmission over cellular mobile networks as well.

6. References
[1] M. Burrows and D. J. Wheeler, "A Block-Sorting Lossless Data Compression Algorithm", SRC Research Report 124, Digital Systems Research Center, 1994.
[2] H. Kruse and A. Mukharjee, "Data Compression Using Text Encryption", Proc. Data Compression Conference, IEEE Computer Society Press, 1997, p. 447.
[3] H. Kruse and A. Mukharjee, "Preprocessing Text to Improve Compression Ratios", Proc. Data Compression Conference, IEEE Computer Society Press, 1998, p. 556.
[4] V. K. Govindan and B. S. Shajee Mohan, "An Intelligent Text Data Encryption and Compression for High Speed Data Transmission Over Internet", Proceedings of the IIT Kanpur Hackers' Workshop IITKHACK04, February 2004.
[5] F. Awan and A. Mukharjee, "LIPT: A Lossless Text Transform to Improve Compression", Proceedings of the International Conference on Information Technology: Coding and Computing, IEEE Computer Society, Las Vegas, Nevada, April 2001.
[6] F. Awan, N. Zhang, N. Motgi, R. Iqbal and A. Mukharjee, "LIPT: A Reversible Lossless Text Transformation to Improve Compression Performance", Proceedings of the Data Compression Conference, Snowbird, Utah, March 2001.
[7] R. Franceschini and A. Mukharjee, "Data Compression Using Encrypted Text", Proceedings of the Third Forum on Research and Technology, Advances in Digital Libraries, ADL '96.
[8] R. Franceschini, H. Kruse, N. Zhang, R. T. Iqbal and A. Mukharjee, "Lossless Reversible Transformations That Improve Text Compression Ratios", IEEE Transactions on Multimedia Systems, June 2000.
