November 1, 2002 Submitted to ISIT 2003

Simulation-Based Computation of Information Rates: Upper and Lower Bounds

Dieter Arnold, Aleksandar Kavčić, Hans-Andrea Loeliger, Pascal O. Vontobel, and Wei Zeng

D. Arnold and H.-A. Loeliger are with the Dept. of Information Technology and Electrical Engineering, ETH, CH-8092 Zürich, Switzerland. Email: {arnold, loeliger}@isi.ee.ethz.ch. A. Kavčić and W. Zeng are with the Division of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138. Email: {kavcic, wzeng}@deas.harvard.edu. P. O. Vontobel is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, IL 61801. Email: [email protected]. Supported by Grant NSF CCR 99-84515.

Abstract: It has recently become feasible to compute information rates of finite-state source/channel models with not too many states. Such methods can also be used to compute upper and lower bounds on the information rate of very general (non-finite-state) channels with memory by means of finite-state approximations. We review these methods and present new reduced-state bounds.

1 Introduction

We consider the problem of computing the information rate

$$I(X;Y) = \lim_{n\to\infty} \frac{1}{n}\, I(X_1, \ldots, X_n;\, Y_1, \ldots, Y_n) \qquad (1)$$

between the input process $X = (X_1, X_2, \ldots)$ and the output process $Y = (Y_1, Y_2, \ldots)$ of a time-invariant channel with memory. We will assume that X is Markov or hidden Markov, and we will primarily be interested in the case where the channel input alphabet $\mathcal{X}$ (i.e., the set of possible values of $X_k$) is finite.

For finite-state channels (to be defined in Section 2), a practical method for the computation of (1) was presented independently by Arnold and Loeliger [1], by Sharma and Singh [11], and by Pfister et al. [10]. That method consists essentially of sampling both a long input sequence $x^n = (x_1, \ldots, x_n)$ and the corresponding output sequence $y^n = (y_1, \ldots, y_n)$, followed by the computation of $\log p(y^n)$ (and, if necessary, of $\log p(y^n|x^n)$) by means of a forward sum-product recursion on the joint source/channel trellis. We will review this method in Section 2.

Extensions of such methods to very general (non-finite-state) channels were presented in [2]. These extensions use finite-state approximations of the actual channel. By simulations of the actual source/channel and computations using the finite-state model, both an upper bound and a lower bound on the information rate of the actual channel are obtained. We will review these bounds in Section 3 and give new numerical results. In Section 4, we propose a new upper bound and a generic new lower bound on the information rate, which complement the bounds of [2]. Related earlier and parallel work includes [6], [12], [13], [5], [7], [14]; see [2].

2 Computing I(X;Y) for Finite-State Channels

In this section, we review the method of [1], [11], [10]. We will assume that X, Y, and $S = (S_0, S_1, S_2, \ldots)$ are stochastic processes such that

$$p(x_1, \ldots, x_n, y_1, \ldots, y_n, s_0, \ldots, s_n) = p(s_0) \prod_{k=1}^{n} p(x_k, y_k, s_k \mid s_{k-1}) \qquad (2)$$


for all $n > 0$, with $p(x_k, y_k, s_k \mid s_{k-1})$ not depending on $k$. We will assume that the state $S_k$ takes values in a finite set and that the process S is ergodic; under the stated conditions, a sufficient condition for ergodicity is $p(s_k \mid s_0) > 0$ for all $s_0, s_k$ and all sufficiently large $k$. For the sake of clarity, we will further assume that the channel input alphabet $\mathcal{X}$ is a finite set and that the channel output $Y_k$ takes values in $\mathbb{R}$; none of these assumptions is essential, however. With these assumptions, the left-hand side of (2) should be understood as a probability mass function in $x_k$ and $s_k$, and as a probability density in $y_k$.

Under the stated assumptions, the limit (1) exists. Moreover, the sequence $-\frac{1}{n}\log p(X^n)$ converges with probability 1 to the entropy rate $H(X)$, the sequence $-\frac{1}{n}\log p(Y^n)$ converges with probability 1 to the differential entropy rate $h(Y)$, and $-\frac{1}{n}\log p(X^n, Y^n)$ converges with probability 1 to $H(X) + h(Y|X)$; cf. [4].

From the above remarks, an obvious algorithm for the numerical computation of $I(X;Y) = h(Y) - h(Y|X)$ is as follows:

1. Sample two "very long" sequences $x^n$ and $y^n$.
2. Compute $\log p(x^n)$, $\log p(y^n)$, and $\log p(x^n, y^n)$. If $h(Y|X)$ is known analytically, then it suffices to compute $\log p(y^n)$.
3. Conclude with the estimate

$$\hat{I}(X;Y) = \frac{1}{n}\log p(x^n, y^n) - \frac{1}{n}\log p(x^n) - \frac{1}{n}\log p(y^n) \qquad (3)$$

or, if $h(Y|X)$ is known analytically, $\hat{I}(X;Y) = -\frac{1}{n}\log p(y^n) - h(Y|X)$.

The computations in Step 2 can be carried out by forward sum-product message passing through the factor graph of (2), as illustrated in Fig. 1. Since the graph represents a trellis, this computation is just the forward sum-product recursion of the BCJR algorithm [3]. Consider, for example, the computation of

$$p(y^n) = \sum_{x^n, s^n} p(x^n, y^n, s^n) \qquad (4)$$

with $s^n = (s_0, s_1, \ldots, s_n)$. By straightforward application of the sum-product algorithm [8], we recursively compute the messages (i.e., state metrics)

$$\mu_f(s_k) = \sum_{x_k,\, s_{k-1}} \mu_f(s_{k-1})\, p(x_k, y_k, s_k \mid s_{k-1}) \qquad (5)$$
$$\phantom{\mu_f(s_k)} = \sum_{x^k,\, s^{k-1}} p(x^k, y^k, s_k) \qquad (6)$$

for $k = 1, 2, 3, \ldots$, as illustrated in Fig. 1. The desired quantity (4) is then obtained as

$$p(y^n) = \sum_{s_n} \mu_f(s_n), \qquad (7)$$

the sum of all final state metrics. In practice, the recursion rule (5) is modified to include a suitable scale factor, cf. [2].
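To make the recursion concrete, here is a minimal sketch in Python/NumPy (not from the paper; the array-based model representation, the `out_pdf` callback, and all function names are assumptions). It runs the scaled forward recursion (5)-(7) and returns $-\frac{1}{n}\log p(y^n)$, i.e., an estimate of $h(Y)$ in nats per symbol; accumulating the logarithms of the scale factors is the standard way to avoid numerical underflow.

```python
import numpy as np

def entropy_rate_estimate(y, p_s0, trans_prob, out_pdf):
    """Scaled forward sum-product recursion (5)-(7).

    y:          observed output sequence y^n
    p_s0:       initial state distribution, shape (S,)
    trans_prob: trans_prob[s_prev, x, s] = p(x_k = x, s_k = s | s_{k-1} = s_prev)
    out_pdf:    out_pdf(y_k, s_prev, x, s) = p(y_k | s_{k-1}, x_k, s_k)
    Returns an estimate of h(Y) = -(1/n) log p(y^n), in nats per symbol.
    """
    S, M, _ = trans_prob.shape
    mu = p_s0.astype(float).copy()          # forward state metrics mu_f(s_0)
    log_p_y = 0.0                           # accumulated log of the scale factors = log p(y^k)
    for y_k in y:
        mu_new = np.zeros(S)
        for s_prev in range(S):
            for x in range(M):
                for s in range(S):
                    # one term of (5): mu_f(s_{k-1}) * p(x_k, y_k, s_k | s_{k-1})
                    mu_new[s] += (mu[s_prev] * trans_prob[s_prev, x, s]
                                  * out_pdf(y_k, s_prev, x, s))
        c = mu_new.sum()                    # scale factor
        log_p_y += np.log(c)
        mu = mu_new / c                     # keep the metrics normalized
    return -log_p_y / len(y)
```

Clamping the input sequence (i.e., restricting the sum over $x_k$ to the sampled value) turns the same recursion into a computation of $\log p(x^n, y^n)$, which supplies the remaining terms of (3).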

Figure 1: Computation of $p(y^n)$ by message passing through the factor graph of (2). (The figure shows the trellis with states $S_0, S_1, S_2, S_3, \ldots$, inputs $X_1, X_2, X_3, \ldots$, and observed outputs $y_1, y_2, y_3, \ldots$)

3 Computing Bounds on I(X;Y) for General Channels

Let $p(x^n, y^n)$ be some ergodic source/channel law. Let $q(y^n|x^n)$ be another ergodic channel and define $q_p(y^n) \triangleq \sum_{x^n} p(x^n)\, q(y^n|x^n)$. As described in [2], we then have

$$\underline{I}_q(X;Y) \le I(X;Y) \le \overline{I}_q(X;Y) \qquad (8)$$

with

$$\overline{I}_q(X;Y) = \lim_{n\to\infty} E_{p(\cdot,\cdot)}\!\left[ \frac{1}{n}\log p(Y^n|X^n) - \frac{1}{n}\log q_p(Y^n) \right] \qquad (9)$$

and

$$\underline{I}_q(X;Y) = \lim_{n\to\infty} E_{p(\cdot,\cdot)}\!\left[ \frac{1}{n}\log q(Y^n|X^n) - \frac{1}{n}\log q_p(Y^n) \right]. \qquad (10)$$

Now assume that $p(\cdot|\cdot)$ is some "difficult" (non-finite-state) ergodic channel. As shown in [2], we can compute the bounds $\underline{I}_q(X;Y)$ and $\overline{I}_q(X;Y)$ on the information rate $I(X;Y)$ by the following algorithm:

1. Choose a finite-state source $p(\cdot)$ and an auxiliary finite-state channel $q(\cdot|\cdot)$ so that their concatenation is a finite-state source/channel model as defined in Section 2.
2. Connect the source to the original channel $p(\cdot|\cdot)$ and sample two "very long" sequences $x^n$ and $y^n$.
3. Compute $\log q_p(y^n)$ and, if necessary, $\log p(x^n)$ and $\log q(y^n|x^n) p(x^n)$ by the method described in Section 2.
4. Conclude with the estimates

$$\hat{\overline{I}}_q(X;Y) = -\frac{1}{n}\log q_p(y^n) - h(Y|X) \qquad (11)$$

and

$$\hat{\underline{I}}_q(X;Y) = \frac{1}{n}\log q(y^n|x^n) p(x^n) - \frac{1}{n}\log p(x^n) - \frac{1}{n}\log q_p(y^n). \qquad (12)$$

Note that the term $h(Y|X)$ in the upper bound (11) refers to the original channel and cannot be computed by means of the auxiliary channel.
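As a rough sketch of how steps 1-4 fit together, one might write the following (Python, illustrative only; `sample_source`, `real_channel`, `aux_forward_log_prob`, and the other names are hypothetical placeholders, not from the paper). For Channel 2 in Section 5, for instance, the auxiliary channel $q(\cdot|\cdot)$ is the quantized finite-state model of Fig. 3.

```python
def estimate_bounds(n, sample_source, real_channel, aux_forward_log_prob,
                    source_log_prob, h_y_given_x):
    """Sketch of the algorithm of Section 3: estimates (11) and (12).

    aux_forward_log_prob(y, x=None) is assumed to run the forward recursion of
    Section 2 on the auxiliary finite-state source/channel model, returning
    log q_p(y^n) (or log q(y^n|x^n) p(x^n) when the input sequence x is clamped).
    h_y_given_x is the conditional entropy rate h(Y|X) of the *original* channel.
    """
    x = sample_source(n)                  # step 2: sample x^n from the finite-state source ...
    y = real_channel(x)                   # ... and y^n from the original ("difficult") channel
    log_qp_y = aux_forward_log_prob(y)              # step 3: log q_p(y^n)
    log_q_y_and_x = aux_forward_log_prob(y, x=x)    #         log q(y^n|x^n) p(x^n)
    log_p_x = source_log_prob(x)                    #         log p(x^n)
    upper = -log_qp_y / n - h_y_given_x             # step 4: estimate (11)
    lower = (log_q_y_and_x - log_p_x - log_qp_y) / n  #        estimate (12)
    return lower, upper
```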

4 Reduced-State Bounds

Let $\tilde{\mathcal{S}}_k$ be a subset of the time-$k$ states. If the sum in the recursion rule (5) is modified to

$$\mu_f(s_k) = \sum_{x_k,\; s_{k-1} \in \tilde{\mathcal{S}}_{k-1}} \mu_f(s_{k-1})\, p(x_k, y_k, s_k \mid s_{k-1}), \qquad (13)$$

the sum of the final state metrics will be a lower bound on $p(y^n)$, and the corresponding estimate of $h(Y)$ will be increased. We have proved:

Theorem 1. Omitting states from the computation (5) yields an upper bound on $h(Y)$.

The sets $\tilde{\mathcal{S}}_k$ may be chosen arbitrarily. An obvious strategy is to keep only a fixed number of states with the largest metrics. By a similar argument, one may obtain:

Theorem 2. Merging states in the computation (5) yields a lower bound on $h(Y)$.

So far, however, only the upper bound has proved useful. The upper bound of Theorem 1 can also be applied to non-finite-state channels as follows. Consider, e.g., the autoregressive channel of Fig. 2 and assume that, at time zero, the channel is in some fixed initial state. At time one, there will be two states; at time two, there will be four states, etc. We track all these states according to (5) until there are too many of them, and then we switch to the reduced-state recursion (13).
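The reduced-state recursion is a small change to the earlier sketch: before each trellis step, keep only the `max_states` largest forward metrics and drop the rest, as in (13). Dropping nonnegative terms can only decrease the accumulated probability, so the result upper-bounds $h(Y)$ (Theorem 1). The sketch below (Python/NumPy, illustrative; same assumed model representation as before) is not the authors' implementation.

```python
import numpy as np

def reduced_state_upper_bound(y, p_s0, trans_prob, out_pdf, max_states):
    """Reduced-state recursion (13): an upper bound on h(Y), in nats per symbol."""
    S, M, _ = trans_prob.shape
    mu = p_s0.astype(float).copy()
    log_p_y_lower = 0.0                    # lower-bounds log p(y^n)
    for y_k in y:
        if np.count_nonzero(mu) > max_states:
            # keep only the max_states states with the largest metrics
            keep = np.argsort(mu)[-max_states:]
            pruned = np.zeros_like(mu)
            pruned[keep] = mu[keep]
            mu = pruned
        mu_new = np.zeros(S)
        for s_prev in np.flatnonzero(mu):  # sum over s_{k-1} in the surviving set only
            for x in range(M):
                for s in range(S):
                    mu_new[s] += (mu[s_prev] * trans_prob[s_prev, x, s]
                                  * out_pdf(y_k, s_prev, x, s))
        c = mu_new.sum()                   # product of all scale factors lower-bounds p(y^n)
        log_p_y_lower += np.log(c)
        mu = mu_new / c
    return -log_p_y_lower / len(y)         # upper bound on the full-recursion estimate of h(Y)
```

For the non-finite-state (autoregressive) case described above, the state space grows with time; there, the same pruning idea applies with a dictionary of (state, metric) pairs in place of a fixed-size array.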

5 Numerical Examples

We consider binary-input linear intersymbol interference channels with

$$Y_k = \sum_i g_i X_{k-i} + Z_k, \qquad (14)$$

with $X_i \in \{+1, -1\}$, and where $Z = (Z_1, Z_2, \ldots)$ is white Gaussian noise with variance $\sigma^2$. The fixed channel coefficients $g_i \in \mathbb{R}$, $i \in \mathbb{Z}$, will be specified by their D transform $G(D) = \sum_i g_i D^i$, and we will assume

$$\sum_i g_i^2 = 1. \qquad (15)$$
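As a side note, sampling from (14) under the normalization (15) is straightforward; the sketch below (Python/NumPy, illustrative, with assumed function names) generates the pairs $(x^n, y^n)$ used in the simulations, with independent uniform ±1 inputs. For Channel 1 below, `g` would hold the 11 taps $\gamma/(1+(i-5)^2)$, $i = 0, \ldots, 10$.

```python
import numpy as np

def sample_isi_channel(g, n, sigma, rng=None):
    """Sample (x^n, y^n) from the ISI channel (14) with independent +/-1 inputs."""
    rng = np.random.default_rng() if rng is None else rng
    g = np.asarray(g, dtype=float)
    g = g / np.sqrt(np.sum(g ** 2))           # enforce the normalization (15)
    x = rng.choice([+1.0, -1.0], size=n)      # independent, uniform +/-1 inputs
    # y_k = sum_i g_i x_{k-i} + z_k  (inputs before time 1 taken as zero)
    y = np.convolve(x, g)[:n] + sigma * rng.normal(size=n)
    return x, y
```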

The signal-to-noise ratio (SNR) in the plots is defined as $1/\sigma^2$ (i.e., the noise power is normalized with respect to the channel input). The source process $X = (X_1, X_2, \ldots)$ will be a sequence of independent and uniformly distributed (i.u.d.) random variables taking values in $\{+1, -1\}$.

Channel 1: Memory-10 FIR filter with $G(D) = \gamma \sum_{i=0}^{10} \frac{1}{1+(i-5)^2}\, D^i$, where $\gamma \in \mathbb{R}$ is the scale factor required by (15). Fig. 4 shows the following curves. Bottom: the exact information rate, computed as described in Section 2 (with sampled sequences of length $n = 10^6$). Top: the reduced-state upper bound (RSUB) of Section 4, using the 100 "best" (out of 1024) states. Middle: the reduced-state upper bound applied to the equivalent minimum-phase channel.

The trick behind the middle curve in Fig. 4 is as follows. Let

$$G(D) = \beta \prod_i (1 - \zeta_i D). \qquad (16)$$

Assuming that $G(D)$ has no zeros on the unit circle, the equivalent minimum-phase filter is

$$\tilde{G}(D) = \beta \prod_{i:|\zeta_i|<1} (1 - \zeta_i D) \cdot \prod_{i:|\zeta_i|>1} (D - \zeta_i), \qquad (17)$$

which has all zeros outside the unit circle. It is easy to see that

$$H(D) = \tilde{G}(D)/G(D) \qquad (18)$$
$$\phantom{H(D)} = \frac{\prod_{i:|\zeta_i|>1} (D - \zeta_i)}{\prod_{i:|\zeta_i|>1} (1 - \zeta_i D)} \qquad (19)$$

is an all-pass filter with a stable inverse. Therefore, replacing $G(D)$ by $G(D)H(D) = \tilde{G}(D)$ does not change the information rate of the channel. Minimum-phase polynomials concentrate the signal energy into the leading tap weights [9], which makes the reduced-state bound tighter.

Channel 2: First-order IIR filter as in Fig. 2 with $G(D) = \gamma/(1 - \alpha D) = \gamma(1 + \alpha D + \alpha^2 D^2 + \ldots)$, where $\gamma \in \mathbb{R}$ is the scale factor required by (15). Fig. 5 shows the following curves. Rightmost: the (indistinguishable) upper and lower bounds (AUB and ALB) of Section 3, computed using the finite-state model of Fig. 3 with 512 states, with an optimized uniform quantizer, and with optimized $\sigma'$. Very close to the left: the reduced-state upper bound (RSUB) of Section 4 using only 4 (!) states. Leftmost: the memoryless binary-input (BPSK) channel. Fig. 6 shows information rates vs. the number of trellis states used in the computation (for $\sigma^2 = 1$). Top and bottom: the upper and lower bounds of Section 3 (AUB and ALB). Middle: the reduced-state upper bound (RSUB).

Channel 3: IIR filter of order 6 with $G(D) = \gamma/(1.0000 + 0.3642 D + 0.0842 D^2 + 0.2316 D^3 - 0.2842 D^4 + 0.2084 D^5 + 0.2000 D^6)$. Fig. 7 shows the following curves. Leftmost: BPSK. Middle: the reduced-state upper bound using only 2 (!) states. Rightmost: the reduced-state upper bound using 128 states.
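The minimum-phase conversion (16)-(17) amounts to reflecting the zeros of $G(D)$ that lie inside the unit circle to their conjugate reciprocals and renormalizing. A small sketch follows (Python/NumPy, illustrative, not the authors' code); fixing the gain via (15) sidesteps tracking $\beta$ explicitly.

```python
import numpy as np

def minimum_phase_equivalent(g):
    """Tap vector of the minimum-phase-equivalent filter (17), up to sign.

    g holds the taps g_0, ..., g_m of G(D) = sum_i g_i D^i (no zeros on the
    unit circle assumed, as in the text).
    """
    g = np.asarray(g, dtype=float)
    roots = np.roots(g[::-1])                        # zeros of G(D) in the D-plane
    inside = np.abs(roots) < 1.0
    roots[inside] = 1.0 / np.conj(roots[inside])     # reflect them outside the unit circle
    g_min = np.real(np.poly(roots))[::-1]            # rebuild taps g_0, ..., g_m
    return g_min / np.sqrt(np.sum(g_min ** 2))       # renormalize per (15)
```

Running the reduced-state recursion on `minimum_phase_equivalent(g)` instead of on `g` corresponds to the middle curve of Fig. 4.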

Figure 2: IIR filter channel (binary input $X_k \in \{\pm 1\}$, feedback through D and $\alpha$, AWGN with parameter $\sigma$, output $Y_k$).

Figure 3: A quantized version of Fig. 2 (same structure, with a quantizer in the feedback path and noise parameter $\sigma'$).

Figure 4: Memory-10 FIR filter. Bits/symbol vs. SNR [dB]; curves: RSUB with 100 states for the nonminimum-phase polynomial, RSUB with 100 states for the minimum-phase polynomial, and the full trellis (1024 states).

Figure 5: Bounds for Fig. 2 vs. SNR. Bits/symbol vs. SNR [dB]; curves: BPSK, RSUB with 4 states, AUB with 512 states, ALB with 512 states.

Figure 6: Bounds for Fig. 2 vs. number of trellis states. Bits/symbol vs. $\log_2(\#\text{states})$; curves: AUB, RSUB, ALB.

Figure 7: Order-6 IIR filter: upper bounds. Bits/symbol vs. SNR [dB]; curves: BPSK, RSUB with 2 states, RSUB with 128 states.

6 Conclusions

It has recently become feasible to compute information rates of finite-state source/channel models with not too many states. By new extensions of such methods, we can compute upper and lower bounds on the information rate of very general non-finite-state channels. Bounds from channel approximations and bounds from reduced-state trellis computations can be combined in several ways.

References

[1] D. Arnold and H.-A. Loeliger, "On the information rate of binary-input channels with memory," Proc. 2001 IEEE Int. Conf. on Communications, Helsinki, Finland, June 11–14, 2001, pp. 2692–2695.

[2] D. Arnold, H.-A. Loeliger, and P. Vontobel, "Computation of information rates from finite-state source/channel models," Proc. 40th Annual Allerton Conference on Communication, Control, and Computing, Allerton House, Monticello, Illinois, October 2–4, 2002, to appear. Available from http://www.isi.ee.ethz.ch/~loeliger/.

[3] L. R. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Information Theory, vol. 20, pp. 284–287, March 1974.

[4] A. Barron, "The strong ergodic theorem for densities: generalized Shannon-McMillan-Breiman theorem," Annals of Probability, vol. 13, no. 4, pp. 1292–1303, 1985.

[5] A. J. Goldsmith and P. P. Varaiya, "Capacity, mutual information, and coding for finite-state Markov channels," IEEE Trans. Information Theory, vol. 42, pp. 868–886, May 1996.

[6] W. Hirt, Capacity and Information Rates of Discrete-Time Channels with Memory. ETH-Diss no. 8671, ETH Zurich, 1988.

[7] A. Kavčić, "On the capacity of Markov sources over noisy channels," Proc. 2001 IEEE Globecom, San Antonio, TX, Nov. 25–29, 2001, pp. 2997–3001.

[8] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Information Theory, vol. 47, pp. 498–519, Feb. 2001.

[9] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, 2nd ed. Prentice Hall, 1999.

[10] H. D. Pfister, J. B. Soriaga, and P. H. Siegel, "On the achievable information rates of finite-state ISI channels," Proc. 2001 IEEE Globecom, San Antonio, TX, Nov. 25–29, 2001, pp. 2992–2996.

[11] V. Sharma and S. K. Singh, "Entropy and channel capacity in the regenerative setup with applications to Markov channels," Proc. 2001 IEEE Int. Symp. Information Theory, Washington, DC, USA, June 24–29, 2001, p. 283.

[12] Sh. Shamai, L. H. Ozarow, and A. D. Wyner, "Information rates for a discrete-time Gaussian channel with intersymbol interference and stationary inputs," IEEE Trans. Information Theory, vol. 37, pp. 1527–1539, Nov. 1991.

[13] Sh. Shamai and R. Laroia, "The intersymbol interference channel: lower bounds on capacity and channel precoding loss," IEEE Trans. Information Theory, vol. 42, pp. 1388–1404, Sept. 1996.

[14] P. O. Vontobel and D. Arnold, "An upper bound on the capacity of channels with memory and constraint input," Proc. 2001 IEEE Information Theory Workshop, Cairns, Australia, Sept. 2–7, 2001, pp. 147–149.

