
IEEE EMBS CONFERENCE, MONASH UNIVERSITY, FEBRUARY 2001


Modelling the Neural Response to Speech: Stochastic Resonance and Coding Vowel-like Stimuli

N. Hohn and A. N. Burkitt
The Bionic Ear Institute, 384-388 Albert Street, East Melbourne, Vic 3002, Australia.

Abstract: We study the effect of noise upon the transmission of information about an input signal containing two frequencies in a leaky integrate-and-fire neural model. This work extends the results of earlier studies on stochastic resonance in neural models, and particularly in models of auditory processing. It is found that the temporal coding of two sub-threshold formants can be enhanced by the presence of noise. This study provides an approximation to the response to vowel speech stimuli and the results have a bearing upon the possible effectiveness of incorporating noise in cochlear implant speech coding strategies.

Keywords: Neural coding, stochastic resonance, integrate-and-fire neurons, speech.

I. Introduction

There has been considerable recent interest in the possible use of noise to aid the speech perception of cochlear implant users [20], [14], [21], [22], [8], [11], [26], [15], [30], [23]. Although the presence of noise in a signal is usually detrimental to its detectability, the theory of stochastic resonance (SR) demonstrates that in certain situations it can be beneficial [10]. Stochastic resonance is defined as the enhancement of the output signal-to-noise ratio (SNR) of a non-linear system by the addition of noise [1], and it has been demonstrated in various bistable systems [9], [19] and biological systems [7], [18].

There has been growing interest in SR in neural systems, since neurons contain significant amounts of noise, both internally, due to the random opening and closing of ion channels [29], and externally, due to spontaneous activity (i.e. spikes that are uncorrelated with the signal) [28]. These two types of noise are taken to be white and Gaussian, with widths $\sigma_N$ and $\sigma_D$ respectively [13]. Neurons also exhibit a number of non-linearities in response to their input, the most important being the thresholding mechanism for generating action potentials (APs or spikes).

SR has been analyzed in the leaky integrate-and-fire (LIF) neural model, in which the randomly arriving synaptic inputs are summed and an action potential is generated when the postsynaptic potential reaches threshold, using both the diffusion approximation and the Gaussian approximation [13], [12]. The Gaussian approximation allows a finite number $N$ of synapses to be included [2], [3], and the membrane potential variance is de facto modulated by the input signal. This property has been shown to give the neuron model better signal processing capabilities than the traditional diffusion approximation [13].

In this paper we extend the results of our earlier work [13] to examine the situation in which the stimulus contains two frequencies, making it much closer to the situation observed with acoustical stimulation using vowel stimuli. The aim is to show that the addition of 'noisy' spikes can help to transmit temporal information along a neural pathway. Such temporal information, which has been extensively investigated in the auditory pathway using acoustical stimulation (see [27], [25] for reviews), may also play a crucial role in electrical stimulation strategies for cochlear implants. The aim of the cochlear implant is to use electrical stimuli on electrodes located at different sites along the cochlea in order to generate patterns of neural excitation that encode the frequency and intensity of the acoustic stimulus (see Clark [4] for a recent review). The results presented here provide some support for the hypothesis that speech coding strategies designed to elicit stochastic neural patterns may benefit the users.

II. Methods

The stochastic leaky integrate-and-fire neuron is used as a model of a nerve cell. The membrane potential receives time-modulated random synaptic inputs and is subject to leakage with time constant $\tau$ (see [28] for a review). When the membrane potential reaches a threshold $V_{th}$, an output spike is fired and the membrane potential is deterministically reset. The notation used here is identical to that in [2], [3], [13], [12], where the methods used for evaluating the distribution of output spikes are described more fully. The spike trains are modeled by inhomogeneous Poisson processes (IHPPs), which provide a good model of experimentally observed spike trains in the auditory system [16].

We consider the input stimulus to the neuron to be a sum of two periodic functions with frequencies $f_1$ and $f_2 < 2.5$ kHz corresponding to the first two formants of the speech stimulus (formants are the peak frequencies in the power spectrum of speech [6]). The input spiking rate is defined by the sum:

$$
\lambda_{in}(t) = \lambda_{in}^{(1)}(t) + \lambda_{in}^{(2)}(t)
= r_{in}^{(1)} T_1 \sqrt{\frac{1}{2\pi \left(\eta_{in}^{(1)}\right)^2}} \sum_{m=-\infty}^{+\infty} \exp\left[-\frac{(t - mT_1)^2}{2\left(\eta_{in}^{(1)}\right)^2}\right]
+ r_{in}^{(2)} T_2 \sqrt{\frac{1}{2\pi \left(\eta_{in}^{(2)}\right)^2}} \sum_{m=-\infty}^{+\infty} \exp\left[-\frac{(t - mT_2)^2}{2\left(\eta_{in}^{(2)}\right)^2}\right], \tag{1}
$$

where $r_{in}^{(1)}$, $r_{in}^{(2)}$, $\eta_{in}^{(1)}$ and $\eta_{in}^{(2)}$ characterize the number of spikes per unit of time and their degree of phase locking to each of the driving frequencies, and $T_1 = 1/f_1$ and $T_2 = 1/f_2$ are the corresponding periods. The widths $\eta_{in}^{(1)}$ and $\eta_{in}^{(2)}$ are related to the synchronization indexes (SIs) $s_{in}^{(1)}$ and $s_{in}^{(2)}$ by [17]

$$
s = \exp\left[-2\left(\frac{\pi\eta}{T}\right)^2\right]. \tag{2}
$$

Since neurons' tuning curves have a long low-frequency tail, it is possible for a neuron to phase lock to a low frequency (of the order of $f_1$ here) even if its best frequency is close to $f_2$. The exact situation depends very much on the details of the input signal and on where the formants are situated on the neuron's tuning curve. It is assumed in the following that the two input spike trains each contribute half of the membrane potential, i.e. the following relation holds:

$$
r_{in}^{(1)} T_1 = r_{in}^{(2)} T_2 = \frac{\langle V_\infty \rangle}{2V_{th}} = \frac{\tau}{2TV_{th}} \int_0^T I(t)\,dt, \tag{3}
$$

where $\langle V_\infty \rangle$ is the time-averaged offset expected value of the membrane potential. Hence in a given time interval the number of spikes synchronized to $\lambda_{in}^{(1)}(t)$ is the same as the number of spikes synchronized to $\lambda_{in}^{(2)}(t)$. This is again a simplification, since the two formants of a vowel do not usually have equal energy. However, this simplification could easily be avoided by introducing the ratio of the two formants' energies in the remainder of the analysis (this has not been done here in order to simplify the mathematical expressions and focus upon the general principles of the derivation).

III. Results

The poly-periodic input rate $\lambda_{in}(t)$ for the vowel 'e' in the phoneme 'bet' is shown by the black line in figure 1(a). The corresponding input power spectrum is shown in figure 1(b). The results from numerical simulations of the power spectrum are presented in figure 2, which shows peaks at the two driving frequencies and at linear combinations of $f_1$ and $f_2$ (see [13] for an explanation of the dip at low frequencies).

Although an exact solution to the case of a poly-periodic input rate composed of two periodic functions could not be found, an approximation for a specific condition on $f_1$ and $f_2$ can be constructed as follows. Let $v_1(t)$ and $v_2(t)$ be the respective contributions to the membrane potential of the two inhomogeneous Poisson spike trains with rates $\lambda_{in}^{(1)}(t)$ and $\lambda_{in}^{(2)}(t)$. If $f_2 \gg f_1$, the overall membrane potential will have more or less the same fluctuations as $v_1(t)$. Indeed, the neuron acts as a low-pass filter, and the modulations of $\lambda_{in}^{(2)}(t)$ will therefore be strongly attenuated when filtered by the neuron membrane. More precisely, the input rate $\lambda_{in}^{(2)}(t)$ can be written as

$$
\lambda_{in}^{(2)}(t) = \alpha_0^{(2)} + \zeta(t), \tag{4}
$$

where $\zeta(t)$ is a 'high' frequency function with frequency $f_2$ and zero mean. Being a high frequency function, $\zeta(t)$ will be largely attenuated when put through the low-pass filter. The resulting output $v_2(t)$ can therefore be approximated by

$$
v_2(t) = V_{th}\left[\alpha_0^{(2)}\left(1 - e^{-t/\tau}\right) + \xi(t)\right], \tag{5}
$$

where $\xi(t)$ is a periodic function with period $T_2$ and amplitude much smaller than $\alpha_0^{(2)}$.

The input rate is treated here as a $T_1$-periodic signal with the same mean value and SI relative to $T_1$. The mean firing rate $r_{out}$ and output SI $s_{out}^{(1)}$ (relative to $f_1$) can be obtained using the methods used for just one frequency component [13]. Since the input signal is not actually periodic, the analytical result is only an approximation of the true first-passage time density, but it is in fact a good one [12].

Since the output statistics are described by the mean firing rate $r_{out}$ and the synchronization indexes $s_{out}^{(1)}$ and $s_{out}^{(2)}$ relative to the two input frequencies $f_1$ and $f_2$, the only remaining unknown quantity is $s_{out}^{(2)}$. Now the SNR at frequency $f_1$ is given by $r_{out}\left(s_{out}^{(1)}\right)^2 T_0$ [13]. As the power spectrum background is approximated by a flat spectrum with intensity $r_{out}$, the output power spectrum at the frequency $f_1$ reads $P_{T_0}(f_1) = r_{out}^2 \left(s_{out}^{(1)}\right)^2 T_0$.

The amplitude attenuation of the input signal $\lambda_{in}(t)$ by the low-pass filter is given by the modulus of its complex transfer function $|H_1(f)|$ as [12]

$$
|H_1(f)| = \frac{V_{th}\,\tau}{\sqrt{1 + (2\pi f \tau)^2}}. \tag{6}
$$

Fig. 1. (a) Polyperiodic input current $I(t)$ (black line) and 'low' frequency approximation (thick gray line). (b) Power spectrum of the simulated input spike train (black) corresponding to the spectrum of the vowel 'e' obtained by Linear Prediction Coding analysis (thick gray line). Input parameters are $f_1 = 0.488$ kHz, $f_2 = 1.9$ kHz, $r_{in}^{(1)} T_1 = r_{in}^{(2)} T_2 = 0.45$ and $\tau = 1$ ms.
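As a rough numerical illustration of equations (1) and (2), the sketch below evaluates the two-formant input rate with the frequencies and the value of $r_{in} T$ quoted in the Fig. 1 caption. The pulse widths `ETA1` and `ETA2`, the truncation parameter `n_extra` and all function names are assumptions introduced here for illustration; the paper does not list the widths it used.

```python
import math

def periodic_gaussian_rate(t, r_in, period, eta, n_extra=5):
    """One term of equation (1): a periodic train of Gaussian pulses.

    The infinite sum over m is truncated to the 2*n_extra + 1 pulses
    nearest to t, which is accurate when eta is well below the period.
    """
    m_centre = round(t / period)
    pulses = sum(
        math.exp(-((t - m * period) ** 2) / (2.0 * eta ** 2))
        for m in range(m_centre - n_extra, m_centre + n_extra + 1)
    )
    return r_in * period * pulses / math.sqrt(2.0 * math.pi * eta ** 2)

def sync_index(eta, period):
    """Equation (2): synchronization index for a pulse of width eta."""
    return math.exp(-2.0 * (math.pi * eta / period) ** 2)

def mean_over_period(r_in, period, eta, n=1000):
    """Average one component over a single period (rectangle rule)."""
    dt = period / n
    total = sum(periodic_gaussian_rate(i * dt, r_in, period, eta) for i in range(n))
    return total / n

# Values from the Fig. 1 caption, in kHz and ms: f1, f2 and r_in * T = 0.45.
F1, F2 = 0.488, 1.9
T1, T2 = 1.0 / F1, 1.0 / F2
R1, R2 = 0.45 / T1, 0.45 / T2
# ASSUMPTION: illustrative pulse widths in ms, not taken from the paper.
ETA1, ETA2 = 0.2, 0.05

def input_rate(t):
    """Equation (1): sum of the two formant components."""
    return (periodic_gaussian_rate(t, R1, T1, ETA1)
            + periodic_gaussian_rate(t, R2, T2, ETA2))

print("SI at f1:", sync_index(ETA1, T1), "SI at f2:", sync_index(ETA2, T2))
```

Because each Gaussian pulse is normalised, averaging one component over its period returns $r_{in}$, which is a convenient sanity check on the prefactor in equation (1).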

Fig. 2. Output power spectrum $P_{T_0}(f)$ (dB) obtained from the computer simulation of 100000 output spikes. Same parameters as figure 1.
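The kind of simulation behind such a spectrum can be sketched as follows: input spikes are drawn from an inhomogeneous Poisson process with the rate of equation (1), then passed through a leaky integrator with threshold and reset. This is only a minimal sketch under stated assumptions, not the authors' simulation code. The thinning method, the single pooled input stream, the synaptic amplitude `AMP`, the reset to zero, the pulse widths and the seed are all hypothetical choices made for illustration.

```python
import math
import random

def rate(t, comps, n_extra=5):
    """Equation (1) for components given as (r_in, period, eta) tuples."""
    total = 0.0
    for r_in, period, eta in comps:
        m0 = round(t / period)
        pulses = sum(
            math.exp(-((t - m * period) ** 2) / (2.0 * eta ** 2))
            for m in range(m0 - n_extra, m0 + n_extra + 1)
        )
        total += r_in * period * pulses / math.sqrt(2.0 * math.pi * eta ** 2)
    return total

def poisson_spikes(comps, t_end, rng):
    """Inhomogeneous Poisson spike times on [0, t_end) by thinning."""
    # Each component peaks at t = 0 (eta assumed well below the period),
    # so the sum of the individual peaks is an upper bound on the rate.
    rate_max = sum(rate(0.0, [c]) for c in comps)
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate_max)
        if t >= t_end:
            return spikes
        if rng.random() * rate_max < rate(t, comps):
            spikes.append(t)

def lif_output(spikes_in, tau, v_th, amp):
    """Leaky integration of input spikes with threshold and reset to zero."""
    v, t_prev, spikes_out = 0.0, 0.0, []
    for t in spikes_in:
        v = v * math.exp(-(t - t_prev) / tau) + amp  # decay, then add one input
        t_prev = t
        if v >= v_th:
            spikes_out.append(t)
            v = 0.0  # deterministic reset (ASSUMPTION: reset value is zero)
    return spikes_out

# Frequencies, r_in * T = 0.45 and tau = 1 ms follow the Fig. 1 caption.
# ASSUMPTION: eta values (0.2, 0.05 ms), AMP and V_TH are illustrative only.
COMPS = [(0.45 * 0.488, 1.0 / 0.488, 0.2), (0.45 * 1.9, 1.0 / 1.9, 0.05)]
TAU, V_TH, AMP = 1.0, 1.0, 0.4

rng = random.Random(0)
spikes_in = poisson_spikes(COMPS, 2000.0, rng)
spikes_out = lif_output(spikes_in, TAU, V_TH, AMP)
print(len(spikes_in), "input spikes,", len(spikes_out), "output spikes")
```

A power spectrum such as the one in Fig. 2 would then be estimated from a long record of `spikes_out`; that step is omitted here.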

Since the thresholding mechanism generating the output spikes has no internal time constant, it can be assumed that the effect of the threshold detector on the signal amplitude is the same for any frequency. Since $f_1$ and $f_2$ are assumed to be equally represented in the input signal, as detailed in equation (3), the value of the output power spectrum at frequency $f_2$ can be obtained from its value at frequency $f_1$ as

$$
P_{T_0}(f_2) = P_{T_0}(f_1)\,\frac{|H_1(f_2)|}{|H_1(f_1)|}. \tag{7}
$$

Once the value $P_{T_0}(f_2)$ is found, the synchronization index relative to the frequency $f_2$ is given by

$$
s_{out}^{(2)} = \frac{1}{r_{out}} \sqrt{\frac{P_{T_0}(f_2)}{T_0}} = s_{out}^{(1)} \sqrt{\frac{|H_1(f_2)|}{|H_1(f_1)|}}. \tag{8}
$$

Fig. 3. Output power spectrum $P_{T_0}(f)$ (dB) from simulations (grey), analytical average background (black dotted) and values at the driving frequencies (black circles) obtained from the simulation of a single neuron (lower curve) and the pooled output of 1000 neurons (upper curve). Amplitude attenuation of the low-pass filter $P_{T_0}(f_1)|H_1(f)|/|H_1(f_1)|$ (black dashed). Other parameters same as figure 1.
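The chain from equation (6) to equation (8) can be checked numerically with the sketch below, which uses the values of $f_1$, $f_2$ and $\tau$ from the Fig. 1 caption. The output rate `R_OUT`, the synchronization index `S1_OUT` and the duration `T0` are hypothetical placeholders, and $V_{th}$ is set to 1 because it cancels in the ratio.

```python
import math

def lowpass_gain(f, tau, v_th=1.0):
    """Equation (6): |H1(f)| = V_th * tau / sqrt(1 + (2 pi f tau)^2)."""
    return v_th * tau / math.sqrt(1.0 + (2.0 * math.pi * f * tau) ** 2)

def power_at_f1(r_out, s1_out, t0):
    """Output power at f1: P_T0(f1) = r_out^2 * (s_out^(1))^2 * T0."""
    return r_out ** 2 * s1_out ** 2 * t0

def power_at_f2(p_f1, f1, f2, tau):
    """Equation (7): scale the power at f1 by the ratio of filter gains."""
    return p_f1 * lowpass_gain(f2, tau) / lowpass_gain(f1, tau)

def sync_index_f2(p_f2, r_out, t0):
    """Equation (8), first form: s_out^(2) = sqrt(P_T0(f2) / T0) / r_out."""
    return math.sqrt(p_f2 / t0) / r_out

# From the Fig. 1 caption (kHz and ms).
F1, F2, TAU = 0.488, 1.9, 1.0
# ASSUMPTION: hypothetical output statistics, chosen only to run the sketch.
R_OUT, S1_OUT, T0 = 0.1, 0.8, 1000.0

gain_ratio = lowpass_gain(F2, TAU) / lowpass_gain(F1, TAU)
p_f2 = power_at_f2(power_at_f1(R_OUT, S1_OUT, T0), F1, F2, TAU)
s2_out = sync_index_f2(p_f2, R_OUT, T0)
print("gain ratio:", gain_ratio, "s_out(2):", s2_out)
```

Both forms of equation (8) agree in this sketch: the value returned by `sync_index_f2` equals `S1_OUT * math.sqrt(gain_ratio)` up to rounding.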

Figure 3 shows the power spectrum of the output spike train of a single neuron and the power spectrum of the pooled output of 1000 neurons, with a background level 30 dB higher. In both cases, the match between simulations and analytical results is excellent.

In the case of an aperiodic input stimulus, SR can be quantified by a measure of the cross-correlation between the input and the output [5]. In what follows, SR will be measured by using the quantity

$$
C = \frac{f\left(\lambda_{in}(t), \lambda_{out}(t)\right)}{f\left(\lambda_{in}(t), \lambda_{in}(t)\right)}, \tag{9}
$$

where

$$
f\left(x(t), y(t+\theta)\right) = \max_{(t,\theta)} \left\{ R\left(x(t) - \langle x(t) \rangle,\; y(t+\theta) - \langle y(t+\theta) \rangle\right) \right\}, \tag{10}
$$

and $R(x, y)$ is the cross-correlation function of $x$ and $y$. $C$ takes values between 0 (no correlation) and 1 (perfect correlation).

Figure 4(a) shows the traditional result of SR, namely a non-monotonic variation of the SNR as a function of the input noise $\sigma = \sqrt{\sigma_N^2 + \sigma_D^2}$. Figure 4(b) shows the variation of $C$ as a function of $\sigma$. In the case of the diffusion approximation $\sigma_N = 0$ (thick gray curve), $C$ first increases as a function of $\sigma$, and then reaches a plateau over a large range of $\sigma$.

Fig. 4. (a) SNRs $R_T$ (dB) relative to $f_1$ (upper traces) and $f_2$ (lower traces) as a function of $\sigma$. (b) Cross-correlation $C$ between input and output firing rates as a function of $\sigma$. Results for the diffusion approximation (solid grey line) and for the values of $N$ corresponding to $\sigma_N = 0.7$ (dashdot), $\sigma_N = 0.9$ (dotted), $\sigma_N = 1.1$ (solid), $\sigma_N = 1.3$ (circles), $\sigma_N = 1.5$ (diamonds). Other parameters same as figure 1.
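For sampled rates, the measure of equations (9) and (10) can be sketched as below. This is one possible discrete reading, stated as an assumption: $R$ is taken to be the time-averaged product of the two centred signals at lag $\theta$, lags are circular, and the maximum is taken over $\theta$. The function names are introduced here for illustration only.

```python
import math

def centred(x):
    """Subtract the mean, as in the arguments of R in equation (10)."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def max_cross_corr(x, y):
    """Equation (10) under the stated assumptions: maximum over circular
    lags of the time-averaged product of the centred signals."""
    xc, yc = centred(x), centred(y)
    n = len(xc)
    best = -math.inf
    for lag in range(n):
        r = sum(xc[i] * yc[(i + lag) % n] for i in range(n)) / n
        best = max(best, r)
    return best

def correlation_measure(rate_in, rate_out):
    """Equation (9): C = f(in, out) / f(in, in)."""
    return max_cross_corr(rate_in, rate_out) / max_cross_corr(rate_in, rate_in)

# Toy signals: a sinusoidal input rate and three candidate outputs.
N = 200
x = [1.0 + math.sin(2.0 * math.pi * i / 50.0) for i in range(N)]
delayed = [x[(i - 7) % N] for i in range(N)]                   # pure delay
attenuated = [0.5 * v + 2.0 for v in delayed]                  # delayed, halved, offset
flat = [3.0] * N                                               # no modulation

print(correlation_measure(x, delayed),
      correlation_measure(x, attenuated),
      correlation_measure(x, flat))
```

Under this reading, a pure delay leaves $C$ at 1, a constant output gives 0, and adding an offset to the output has no effect because both signals are centred before correlating. This is the property that the Discussion identifies as a limitation of the measure.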


IV. Discussion

It might seem odd at first that the quantity $C$ does not decrease at high noise levels $\sigma$, which would mean that adding an increasing amount of noise does not significantly degrade the quality of the output: the 'no-tuning' effect [5], [24]. When using this cross-correlation measure to quantify SR, one is left with the impression that the optimum noise level can be chosen within a wide range, making it almost independent of the input stimulus. However, the cross-correlation $C$ is misleading in the sense that it only measures the correlation between the fluctuations of $\lambda_{in}(t)$ and $\lambda_{out}(t)$ around their respective means, and does not take into account the actual means. For instance, at very high noise levels the neuron fires almost randomly, i.e. irrespective of the input stimulus, since the noise itself is large enough to bring the membrane potential to its threshold value. However, for the noise values used for the study of SR with the SNR, the small fluctuations of $\lambda_{out}(t)$ around its mean are close enough to those of $\lambda_{in}(t)$ that the cross-correlation measure remains almost constant. Because the firing rates are centred, the focus is on the oscillations of the firing rates around their respective means rather than on their average values. Even though the input-output cross-correlation measure is not satisfactory for the reasons given above, it is widely used to study SR with aperiodic stimuli. A measure of the mutual information between the input spike train and the corresponding output spike train would give more satisfactory measurements of SR.

V. Conclusions

We have shown that it is possible to enhance the time coding of the two subthreshold formants $f_1$ and $f_2$ by using the theory of stochastic resonance. The model presented here can be applied to a neuron of the cochlear nucleus. The present study brings some theoretical understanding to very recent results showing that noise is useful for the speech perception of cochlear implant users [30]. Indeed, showing that the addition of 'noisy' spikes can help to transmit temporal information along a neural pathway gives some support to the hypothesis that speech coding strategies designed to elicit stochastic neural patterns might benefit the users. However, a lot remains to be done before such results can be successfully applied. The optimal amount of 'noise' to be added is highly dependent on the stimulus and may require some complex control mechanisms. A simplified method could be to find an "average" noise level that would improve the detectability of all the stimuli but would not optimize all of them [5].

Acknowledgments

This work was funded by The Bionic Ear Institute.

References

[1] R. Benzi, A. Sutera, and A. Vulpiani. The mechanism of stochastic resonance. J. Phys. A, 14:L453–L457, 1981.
[2] A. N. Burkitt and G. M. Clark. Analysis of integrate-and-fire neurons: Synchronization of synaptic input and spike output in neural systems. Neural Comput., 11:871–901, 1999.
[3] A. N. Burkitt and G. M. Clark. Calculation of interspike intervals for integrate and fire neurons with Poisson distribution of synaptic inputs. Neural Comput., 12:1789–1820, 2000.
[4] G. M. Clark. Electrical stimulation of the auditory nerve: The coding of frequency, the perception of pitch and the development of cochlear implant speech processing strategies for profoundly deaf people. Clin. Exp. Pharmacol. Physiol., 23:766–776, 1996.
[5] J. J. Collins, C. C. Chow, and T. T. Imhoff. Stochastic resonance without tuning. Nature, 376:236–237, 1995.
[6] J. R. Deller, J. H. L. Hansen, and J. G. Proakis. Discrete-Time Processing of Speech Signals. IEEE Press, 2000.
[7] J. K. Douglass, L. Wilkens, E. Pantazelou, and F. Moss. Noise enhancement of information transfer in crayfish mechanoreceptors by stochastic resonance. Nature, 365:337–340, 1993.
[8] K. Ehrenberger, D. Felix, and K. Svozil. Stochastic resonance in cochlear signal transduction. Acta Otolaryngol. (Stockh), 119:166–170, 1999.
[9] S. Fauve and F. Heslot. Stochastic resonance in a bistable system. Phys. Lett., 97A:5–7, 1983.
[10] L. Gammaitoni, P. Hänggi, P. Jung, and F. Marchesoni. Stochastic resonance. Rev. Mod. Phys., 70:223–287, 1998.
[11] K. R. Henry. Noise improves transfer of near-threshold, phase-locked activity of the cochlear nerve: Evidence for stochastic resonance? J. Comp. Physiol. A, 184:577–584, 1999.
[12] N. Hohn. Stochastic resonance in a neuron model with application to the auditory pathway. Master's thesis, Department of Otolaryngology, The University of Melbourne, 2000.
[13] N. Hohn and A. N. Burkitt. Shot noise in the leaky integrate-and-fire neuron, 2000. To appear in Phys. Rev. E.
[14] F. Jaramillo and K. Wiesenfeld. Mechanoelectrical transduction assisted by Brownian motion: A role for noise in the auditory system. Nature Neurosci., 1:384–388, 1998.
[15] F. Jaramillo and K. Wiesenfeld. Physiological noise level enhances mechanoelectrical transduction in hair cells. Chaos, Solitons and Fractals, 11:1869–1874, 2000.
[16] D. H. Johnson. Point process model of single-neuron discharges. J. Comput. Neurosci., 3:275–299, 1996.
[17] R. Kempter, W. Gerstner, J. L. van Hemmen, and H. Wagner. Extracting oscillations: Neuronal coincidence detection with noisy periodic spike input. Neural Comput., 10:1987–2017, 1998.
[18] J. E. Levin and J. P. Miller. Broadband neural encoding in the cricket cercal sensory system enhanced by stochastic resonance. Nature, 380:165–168, 1996.
[19] B. McNamara and K. Wiesenfeld. Theory of stochastic resonance. Phys. Rev., A39:4854, 1989.
[20] R. P. Morse and E. F. Evans. Enhancement of vowel coding for cochlear implants by addition of noise. Nature Medicine, 2:928–932, 1996.
[21] R. P. Morse and E. F. Evans. Additive noise can enhance temporal coding in a computational model of analogue cochlear implant stimulation. Hear. Res., 133:107–119, 1999.
[22] R. P. Morse and E. F. Evans. Preferential and non-preferential transmission of formant information by an analogue cochlear implant using noise: the role of the nerve threshold. Hear. Res., 133:120–132, 1999.
[23] R. P. Morse and P. Roper. Enhanced coding in a cochlear-implant model using additive noise: Aperiodic stochastic resonance with tuning. Phys. Rev. E, 61:5683–5692, 2000.
[24] F. Moss and X. Pei. Neurons in parallel. Nature, 376:211–212, 1995.
[25] D. Oertel. The role of timing in the brain stem nuclei of vertebrates. Annu. Rev. Physiol., 61:497–519, 1999.
[26] J. T. Rubinstein, B. S. Wilson, C. C. Finley, and P. J. Abbas. Pseudospontaneous activity: Stochastic independence of auditory nerve fibers with electrical stimulation. Hear. Res., 127(1-2):108–118, 1999.
[27] L. O. Trussell. Cellular mechanisms for preservation of timing in central auditory pathways. Curr. Opin. Neurobiol., 7:487–492, 1997.
[28] H. C. Tuckwell. Introduction to Theoretical Neurobiology: Volume 2, Nonlinear and Stochastic Theories. Cambridge University Press, Cambridge, 1988.
[29] J. A. White, J. T. Rubinstein, and A. R. Kay. Channel noise in neurons. Trends Neurosci., 23:131–137, 2000.
[30] F.-G. Zeng, Q.-J. Fu, and R. Morse. Human hearing enhanced by noise. Brain Research, 869:251–255, 2000.