126x Filetype PDF File size 0.52 MB Source: www.ijettcs.org
International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 4, July – August 2013 ISSN 2278-6856 Enhancement of Security and Embedding Capacity through Huffman Coding in Steganography 1 2 Sanjay Bajpai , Dr. Kanak Saxena 1 Department of Computer applications, Lakshmi Narain College of Technology, RGPV, Bhopal Kalchuri Nagar, Raisen Road, Bhopal (MP), INDIA. 2 Department of computer applications, Samrat Ashok Technological Institute Civil Lines, Vidisha (M.P.), INDIA. Abstract: In this paper, we incorporated Huffman coding also used GA in their proposed block truncation coding algorithm in digital image steganography to enhance both the (BTC) method that can embed about 3 bits in each BTC- data security and embedding capacity. Data is compressed by encoded block in an average. Elham Ghasemi et al [6] variable length compression technique and then embedded in embedded the secret message in Discrete Wavelet digital colored images using modified LSB technique. Pixels Transform coefficients by dividing the image into 4×4 are broken into RGB components and some of the least blocks. In our work, we have taken the digital colored significant bits, based on certain criterion, of these images as the cover and text message as the host. Factors components are used to embed the secret message. Security is extended to a greater extent by using multi-keys for that play a vital role in digital image steganography are embedding and data compression ratio varies from 45% to image distortion, embedding capacity and security of data 80% that depends on the type of data embedded. Here, we [7] and care is taken in the proposed algorithm to have categorized data into three types, small length, larger optimize these factors. We have increased the security of length and data from a particular domain. Our experimental data by using multi-key LSB substitution method and results have shown that distortion in the stego-image is embedding capacity by compartmentalizing the pixels negligible, very difficult to extract the message and into its components. embedding capacity increased to a large extent. Further security and embedding capacity is enhanced by Keywords: Compression ratio, Embedding capacity, incorporating the Huffman coding algorithm and results Distortion show that embedding capacity is increased from 45% to 1. INTRODUCTION 55 % that depends on the type of data chosen for hiding. Remainder of the paper is organized as follows. In section The present work shows the embedding of compressed 2, we shall briefly discuss about the related work and data through Huffman coding in steganography. The Huffman coding algorithm. Section 3 describes the native meaning of word “steganography” is hidden proposed method with analysis. Section 4 includes case writing and is originated from the Greek language. It is studies, experimental results and comparisons with an art and science in which secret data is hid in other previous works. Finally, the conclusion is presented in medium known as cover and the thing which is hidden is section 5. known as host [1]. The basic advantage of steganography is to keep the unwanted persons or intruders away from 2. RELATED WORK the actual fact and this technique is successful since they Most popular and core concept of steganography is to are not able to see the hidden message and only cover is hide the secret message in digital images by changing the visible. Most common method employed in least significant bits of the pixels. Xin Liao and et al. [8] steganography is the LSB substitution method in which focused on finding the edge pixels of the cover image to two or three bits are replaced by the bits of the secret hide the secret message because these are the locations message so that distortion is not visible by human eyes where changes are least visible. Rosziati Ibrahim and et [2], [3] but they are unable to give high embedding al. [9] proposed a SIS (Steganography Imaging System) capacity and because of the same pattern, steganalysis in which they converted the secret message into binary techniques can detect them. El Safy et al. [4] used codes and then embedded the two bits in each pixel. adaptive data embedding technique involving Optimal Mahmud Hasan and et al. [10] proposed a block Pixel Adjustment Process to hide data in the Integer processing mechanism in which they divided the image Wavelet coefficients of the cover image. Recent trends into a number of 4×4 non-overlapping blocks. Most show that genetic algorithm (GA) is commonly used to Frequent Pixels (MFPs) and Second Most Frequent Pixels embed the secret message and Chin-Chen Chang et al [5] (SMFPs) of each block were identified and after deleting Volume 2, Issue 4 July – August 2013 Page 73 International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 4, July – August 2013 ISSN 2278-6856 their occurrences, remaining pixels were over written by Ax the encoded bits of the secret message. L(C, X) P(x)l(x) pili (3) 2.1 Huffman Coding x Ax i 1 Huffman coding is one of the popular entropy encoding Fig.2 shows the result of coding and compression on the algorithms used for lossless data compression [11]. It is given Ensemble X stated below- classified into Static Huffman coding and Adaptive A = { a , b , c , d } x Huffman coding and later is more beneficial than former P = {1/2 , 1/4 , 1/8, 1/8 } x in terms of number of internal nodes, path length and height of the tree in memory representation [12]. To do the encoding, at first characters of the file with their frequency are stored in a list and sorted in the increasing order of frequency. To achieve optimality, Huffman joins the two symbols with lowest probability/frequency and replaces them with a new fictive node whose probability is the sum of the other nodes' probabilities. The two removed symbols are now two nodes in the binary tree. The fictive node is their parent and is not inserted in the binary tree yet but kept in the input list. We repeat this Figure 2 Code and code length of the symbols given process until input list becomes empty and at the same probabilities time binary tree is constructed whose frequency indicates the total number of symbols in the file. Code for each symbol is constructed by traversing the tree from the root to leaf by assigning 0 for left visit and 1 for right visit. Encoded tree for a sample data is shown in Fig.1. Figure 3 Contiguous interval representation of symbols 2.2 Huffman Decoding Decoding procedure can be performed by the binary Figure 1 Encoded tree for ETASNO search algorithm if the Huffman tree is stored in the array. This can be demonstrated by taking the sample For example, here TEA will be encoded as 10 00 010 and data from Fig.3 where codes of 8 symbols are shown in SEA will be encoded as 011 00 010 [12]. These codes are the increasing order [13]. Let the input code be ‘01010’, in the form of bits and space occupied for one symbol is then fifth [(1+8) / 2 = 5] entry ‘01101’ is tested. Since reduced which in general take 8 bits. An ensemble X is a ‘01010’ is smaller than ‘01101’ so the third [(1+4)/2=3] triple (x, A , P ) entry is tested. This process is continued until the given x x x: value of a random variable code is found by reducing the list into half of its size in A: set of possible values for x , A ={a , a , …, a} each iteration or the size of the list becomes only one. x x 1 2 i P : probability for each value , P ={p , p , …, p} x x 1 2 i 3. PROPOSED METHOD where P(x) = P(x = a) = p, p > 0, and pi 1 i i i Shannon information content of x is shown in equation 3.1 Analysis (1) In the proposed work, we are using multi-key LSB h(x) = log2(1/P(x)) (1) substitution method in which we are compartmentalizing and entropy of x is stated in equation (2) the pixels into RGB components to make it more secure H(x) P(x).log 1 (2) and every pixel is capable of hiding one character so p(x) embedding capacity varied linearly, that is x Ax and the expected length L(C,X) of symbol code C for X is N(c)N(p) (4) stated in equation (3) which achieves as much where N(c) is the number of characters that can be compressiion as possible – embedded and N(p) is the number of pixels of the image. We start by accumulating all the pixels of the cover image Volume 2, Issue 4 July – August 2013 Page 74 International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 4, July – August 2013 ISSN 2278-6856 in an array P. In our proposed algorithm, we are without causing any distortion to the cover image and use deploying four keys for embedding and extraction. KEY1 of multi keys make it more secure. (k11, k12, k13) which is a composite key and is used for 3.2 Algorithm (Embedding Process) selecting the bits of the pixel components and KEY2 1. Store all the different characters of a file and their (k21, k22, k23) is used for deciding the pattern of the bits frequency in a two dimensional array T1 in the of the characters comprising the secret message. Two increasing order of frequency and accumulate all the keys KEY3 and KEY4 are formulated based on the size pixels of the image in one dimensional array P of size and texture of the cover image to fix the region for M*N where M×N is the size of the image. embedding and to decide the gap between the pixels if 2. Construct the Huffman binary tree T by merging the needed. elements from left to right until all the elements of the To increase the embedding capacity and to make it array T1 are exhausted using the Huffman coding further secure we incorporated the Huffman compression algorithm. technique. We assumed that the secret message which is 3. Generate the Huffman codes for every character in the to be embedded may vary from a few lines to thousands of array T1 using the Huffman binary tree T and store lines. If message length is very large then Huffman them in another array T2 along with the code length. compression technique plays a vital role in increasing the 4. Calculate the keys KEY3 and KEY4 by incorporating embedding capacity and security. We used the variable the size and texture of the cover image and length of length code to make it further efficient. If T is a tree the secret message. KEY3 fixes the starting position of corresponding to a prefix code then number of bits the cover image and KEY4 fixes the gap between two required to encode a file is pixels. B(T) f (c)dT(c) (5) 5. Set the variable MaxBits to 9. c C 6. Read each code from the array T2 one by one, find its where ƒ(c) denote the frequency of a character c and d (c) T length ‘Len’ and repeat steps from 6 to 10. denote the depth of c’s leaf in tree T and is also the length 7. Repeat steps 8 and 9 until (Len ≥ MaxBits). of the codeword for character c [14]. 8. Select a pixel using keys KEY3 and KEY4 and After calculating the probabilities of different symbols bifurcate it into its components and embed first nine that are occurring in the secret message, Huffman codes bits of the code, 3 bits in each RGB components. are obtained by generating the prefix-free codes and 9. Reduce the length of the code by MaxBits that have placing the symbols at the leaves of a code tree. This been embedded. helps in obtaining the instantaneously decodable code of 10. Embed the remaining bits of the code in next pixel minimal expected length L satisfying the equation using the keys KEY1 and KEY2. 11. Rejoin all the pixels of the array P to form the stego- H(S) L H(S) 1 (6) image where H(S) is the entropy of a source S or the expected information of a symbol [15], [16]. If source S is emitting 3.3 Algorithm (Extraction Process) one of symbols s , s , …, s at each time with probabilities 1. Accumulate pixels of the stego-image in the byte array 1 2 n p1, p2, …, pn respectively independent of the symbols T and extract the keys KEY1, KEY2, KEY3 and emitted at other times then in a very long string of K KEY4. emissions we expect to get Kp , Kp , …, Kp instances of 2. Read the length of the code from the input array T2. 1 2 n the symbols s , s , …, s respectively. These emitted 3. Pick up pixels from the array T using KEY3 and 1 2 n symbols are independent and identically distributed. This KEY4. is supported by the law of large numbers. If the expected 4. Extract the bits from the pixels using keys KEY1 and or mean number of occurrences of symbol s in K 1 KEY2 to generate the code. independent repetitions is Kp where p is the probability 1 1 5. Decode the code to get the symbol using decoding of getting s in a single trial then standard deviation 1 process discussed in 2.2 and append the symbol in around this mean is sqrt{Kp1(1-p1)}. Therefore, fractional String S. one-std spread around the mean is sqrt{(1-p ) ∕ (Kp ), that 1 1 6. Repeat steps 2 to 5 until all the elements of array T2 is, for large K, the number of occurrences of s is 1 are exhausted. relatively tightly concentrated around the mean value of 7. The string S is the extracted message. Kp [15]. In our proposed method, every pixel of the 1 cover image can be embedded and to make it distortion 4. EXPERIMENTAL RESULTS free, embedding is done in all the components of the We selected and categorized data into three types, that is, pixel. Since data is compressed before embedding micro (message length less than 100 characters), macro therefore a message of very long length can be embedded (message length of more than 1000 characters) and Volume 2, Issue 4 July – August 2013 Page 75 International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com Volume 2, Issue 4, July – August 2013 ISSN 2278-6856 texture (message from a particular domain) and tested our For normal messages, embedding capacity is almost results on 50 different digital colored images out of which doubled as can be seen in table 1 where compression ratio 5 results are shown. Different types of images are chosen varies from 47% to 52% and it gives very significant as the cover and robustness is tested by selecting results for small messages where compression ratio varies contiguous pixels of any region of the image to embed the from 75% to 85% and these compression ratios totally message. Images img1c, img2c, img3c, img4c, img5c are depend on the frequencies of the symbols occurring in the the cover images and img1s, img2s, img3s, img4s, img5s message. Compression through Huffman coding not only are the stego-images. Micro message of length 92 is increases the embedding capacity but security is also embedded in image ‘img1c’ in Fig. 4(a) and increased as the symbols are embedded in the form of corresponding stego-image ‘img1s’ is shown in Fig. 4(b). codes. Comparative results in peak position for macro and Macro message of length 1537 and 1687 is embedded in texture type messages are shown in table 2. ‘img2c’ in Fig. 5(a) and ‘img3c’ in Fig. 6(a) respectively and corresponding stego-images ‘img2s’ and ‘img3s’ are Result Images shown in Fig. 5(b) and Fig. 6(b). Similarly, data of same texture of length 1503 and 1714 are embedded in ‘img4c’ and ‘img5c’ whose stego-images are ‘img4s’ and ‘img5s’ that are shown in Fig. 7(a), 7(b), 8(a), 8(b) respectively. Compression ratio by Huffman coding depends on the length and nature of data that is why we took different samples of three types of data whose result is shown in Figure 4(a) img1c (Cover image) Table 1. Maximum performance given by our algorithm is Table 2: Comparative Performance Measurements compared with the algorithms proposed by Mahmud Hasan et al. [10] and our previous designed algorithm which clearly show that our results have been improved as shown in Table 2. Mahmud Hasan et al. [10] compromised with only 32 bits out of 128 bits after performing calculations in their block processing approach and embedded characters of 7 bits. On an average, 4.57 characters can be embedded in 32 bits and considering the complete block of 4×4, 3.50 pixels are required to embed one character. In our previous algorithm, we stored 8 bits of a character per pixel by compartmentalizing it into its components and in no case failure occurred. In the proposed method, maximum 9 bits can be stored per pixel but it is observed that very few symbols are obtained whose code length is greater than 9. Figure 4 (b) img1s (Stego image) In general, code length varies from 4 to 7 for macro and texture type of messages and for micro message, code length varies from 1 to 4. Table 1: Results of Compression on Different type of data Message Message Number No. of Type Length of Bits Compression Bits (in per Ratio Embedde chars) Symbol d Micro 12 1.67 82.46 20 Micro 92 1.83 Figure 5 (a) img2c (Cover image) 77.17 168 Macro 1537 4.22 47.16 6497 Macro 1687 4.24 47.00 7152 Texture 1503 3.97 50.43 5960 Texture 1714 3.80 52.52 6510 Figure 5 (b) img2s (Stego image) Volume 2, Issue 4 July – August 2013 Page 76
no reviews yet
Please Login to review.