Contents

1 Introduction
  1.1 Some jargon
  1.2 Loading STAR

2 Analysis of data from neuron 1 of e060824spont data set
  2.1 Loading data
  2.2 Summarizing data
  2.3 Automatic analysis
    2.3.1 reportHTML
    2.3.2 Spike train plot
    2.3.3 renewalTestPlot
    2.3.4 Cross correlation histograms and Cross-intensity plots
    2.3.5 Conclusions of the automatic analysis
  2.4 Partial autocorrelation function
  2.5 Data frame for gss
  2.6 Variables transformation
  2.7 Variables evolution
  2.8 Fitting and testing models
    2.8.1 Model fit: the straightforward approach
    2.8.2 Time transformation and goodness of fit
    2.8.3 Exchanging fitting and testing parts
    2.8.4 Doing two fits at once with a multi-core CPU
    2.8.5 Trying a simpler model
  2.9 Model selection
  2.10 Plotting results
    2.10.1 Quick visualization of the model terms
    2.10.2 Use of quickPredict and its associated methods
    2.10.3 Looking at the intensity process of the two models
  2.11 Checking the necessity of variable transformations

3 Analysis of data from neuron 1 of e060817spont data set
  3.1 Loading and summarizing the data
  3.2 Automatic analysis of neuron 1 from e060817spont
  3.3 Partial autocorrelation function
  3.4 Conclusions of the preliminary analysis
  3.5 Data frame for gss
  3.6 Models fits, tests and selection

4 Analysis of data from neuron 2 of e060817spont data set
  4.1 Loading and summarizing the data
  4.2 Automatic analysis of neuron 2 from e060817spont
  4.3 Partial autocorrelation function
  4.4 Conclusions of the preliminary analysis
  4.5 Data frame for gss
  4.6 Models fits, tests and selection

5 Software versions used for this vignette

1 Introduction

What follows is a detailed description of how I presently estimate the intensity function (IF)¹ of spike trains recorded in the spontaneous regime using STAR functionalities.

1.1 Some jargon

Before entering into the details of the analysis, some technical terms that are going to be used constantly in the sequel will be introduced. The notation mainly follows Brillinger [1988a, pp 190–191].

Definition 1 (counting process) For points {t_j} randomly scattered along a line, the counting process N(t) gives the number of points observed in the interval (0, t]:

  N(t) = ♯{t_j : 0 < t_j ≤ t}    (1)

where ♯ stands for the cardinality (number of elements) of a set. Brillinger [1988a] uses τ_j for our t_j.
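Definition 1 is easy to turn into code; here is a minimal base-R sketch (the spike times are made up for illustration, this is not part of STAR):

```r
## Counting process of Eq. (1): N(t) = #{t_j : 0 < t_j <= t}.
spikes <- c(0.5, 1.2, 2.7, 3.1)             # hypothetical spike times (s)
N <- function(t) sum(spikes > 0 & spikes <= t)
c(N(0.4), N(3), N(10))                      # 0, 3 and 4 events counted
```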


Definition 2 (history) The history, H_t, consists of the variates determined up to and including time t that are necessary to describe the evolution of the counting process.

The history is often called the filtration in the counting process literature. See Touboul and Faugeras [2007, p 93] for a rigorous definition of the concept; see also Andersen et al. [1993, pp 49–51].

Definition 3 (intensity function) For the process N and history H_t, the intensity function at time t is defined as:

  λ(t | H_t) = lim_{h↓0} Prob{event ∈ (t, t + h] | H_t} / h    (2)

For small h one has the interpretation:

  Prob{event ∈ (t, t + h] | H_t} ≈ λ(t | H_t) h    (3)

Notice that we are using the symbol λ for the intensity function, following the now most usual convention [Andersen et al., 1993, p 51], while Brillinger [1988a] and Johnson [1996] use µ.

1.2 Loading STAR

We will start with the analysis of the discharge of neuron 1 in data set e060824spont. I assume that you have installed the latest version of STAR from your favorite CRAN server. The first thing to do once R has been started is then to load the package:

> library(STAR)

Here some of the output printed upon loading the package has been removed.

2 Analysis of data from neuron 1 of e060824spont data set

2.1 Loading data

We now have to make the data set e060824spont available in our workspace, which is done with function data:

> data(e060824spont)

2.2 Summarizing data

We start by getting a quick summary of neuron 1's spike train by applying the summary method to the spikeTrain object e060824spont[["neuron 1"]]²:

¹ "Our" intensity function is also often called the conditional intensity function, e.g., Brillinger [1988a], Ogata [1988], or the hazard function, e.g., Johnson [1996]. The intensity function should more properly be called the intensity process, since it is in general a function of random variables [Andersen et al., 1993, p 51].

² As can be seen by looking at the documentation of the data set (?e060824spont), e060824spont is a list of two spikeTrain objects.


> summary(e060824spont[["neuron 1"]])

A spike train with 505 events, starting at: 0.594 and ending at: 58.585 (s).
The mean ISI is: 0.115 and its SD is: 0.36 (s).
The mean log(ISI) is: -3.148 and its SD is: 1.044
The shortest interval is: 0.008
and the longest is: 3.811 (s).

2.3 Automatic analysis

2.3.1 reportHTML

We next carry out an "automatic analysis" using method reportHTML:

> reportHTML(e060824spont[["neuron 1"]], filename = "e060824spont_1",
+            directory = "report", otherST = e060824spont[c(2)], maxiter = 100)

The result of this automatic analysis is a set of figures in png format and an html file named e060824spont_1.html, located in subdirectory report. The best way to visualize the html file is clearly to use your favorite web browser.

2.3.2 Spike train plot

The first figure appearing on the web page (e060824spont_1.html) is a spike train plot [Pouzat and Chaffiol, 2009], reproduced in Fig. 1. A striking staircase pattern can be seen on the realization of the counting process defined by Eq. (1). This pattern, which translates into the non-uniform distribution of the ticks on the raster plot shown at the bottom of the graph, rules out a model based on a homogeneous Poisson process for this spike train.

Intensity function of a homogeneous Poisson process  The IF of a homogeneous Poisson process is extremely simple. One has:

  λ(t | H_t) = λ_0    (4)

That is, the IF is a constant.

Tip  When dealing with a spike train with a lot of events, say 1000 or more, the extra "visibility" provided by the spike train plot compared to the classical raster plot can become deficient, in the sense that important details of the discharge can end up being not discernible. It is then easy to use the subsetting method for spikeTrain objects, which would give in the present case:

> e060824spont[[1]][10 <= e060824spont[[1]] & e060824spont[[1]] < 40]

The resulting spike train plot is not shown in this document, but a "zoom" of Fig. 1 between seconds 10 and 40 would pop up.

Figure 1: The spike train plot of neuron 1 of data set e060824spont.


Figure 2: Renewal test plot of neuron 1 of data set e060824spont.

2.3.3 renewalTestPlot

As explained in Pouzat and Chaffiol [2009, Sec. 2.4.3], the model "following" the homogeneous Poisson process is the homogeneous renewal process. A graphical assessment of the suitability of such a model for empirical data is the second graph appearing on the web page, e060824spont_1.html; it is generated by function renewalTestPlot. We show it here on Fig. 2, from which it is clear that a homogeneous renewal process model does not apply either.

Intensity function of a homogeneous renewal process  The IF of a homogeneous renewal process is still reasonably simple. One has:

  λ(t | H_t) = r(t − t_l)    (5)

where t_l is the occurrence time of the last spike before t; formally, t_l = max{t_j : t_j < t}. In other words, t − t_l is the elapsed time since the last spike. It is as if the clock was reset to 0 every time an event occurs. For a homogeneous renewal process the history is simply made of all the spikes observed up to, but not including, t: {t_j : t_j < t}.

Figure 3: Cross-intensity plot and cross-correlation histogram between neurons 1 and 2 of data set e060824spont.

2.3.4 Cross correlation histograms and Cross-intensity plots

The web page next shows two plots which are relevant only when the homogeneous renewal process applies. They are not reproduced here, since a more sophisticated model is required as shown by Fig. 2. The last plots, showing the cross-correlation histogram [Brillinger et al., 1976, Eq. (13), p 218] and its smooth version, the cross-intensity plot, are reproduced here on Fig. 3. Since this data set contains only two neurons, only one such plot appears on the web page. With more neurons in the data set, more plots can be generated by properly setting argument otherST of method reportHTML. Since the black horizontal line on Fig. 3 is entirely contained in the "confidence region", there is no ground to include an interaction term between the two recorded neurons in our discharge model for neuron 1.

2.3.5 Conclusions of the automatic analysis

This automatic analysis leads us to conclude that our model needs to be more complex than a homogeneous renewal process model (Fig. 2). The absence of significant cross-correlation (Fig. 3) suggests that an interaction term between neurons 1 and 2 of the data set is not required. Moreover, neither the spike train plot (Fig. 1) nor the autocorrelation function plot (bottom left of Fig. 2) shows clear signs of non-stationarity of the train. At the present stage we do not have any method leading to unambiguously interpretable models with non-stationary data in the spontaneous regime.

A model more complex than a homogeneous renewal process model will necessarily lead us to a multivariate IF. Biophysics teaches us that every neuron exhibits a refractory period following a spike (ruling out the homogeneous Poisson process as a "true" model), and that will lead us to always include the elapsed time since the last spike in our models, just as we did for the homogeneous renewal process model of Eq. (5). Of course the bothering question at this stage is: what should the extra variables in our IF model be? A "natural" way to include interactions between neurons would be to add the elapsed time since the last spike of a "functionally" coupled neuron to our variables list. But as we just saw, for the present data set such an additional variable does not seem necessary. We are therefore left with the occurrence times of the other previous spikes or, equivalently, with the durations of the previous inter-spike intervals (isis). The question then becomes: how many previous isis should we include in our variables list? The next section presents a tool providing us with a first guess.

Figure 4: The partial autocorrelation function of neuron 1 of data set e060824spont.

2.4 Partial autocorrelation function

Practical guidance on how many past isis should be included is provided by the partial autocorrelation function of the isis [Kuhnert and Venables, 2005, pp 77–79]. A graph of this function for the present data set is shown on Fig. 4. It is obtained with the command:

> acf(diff(e060824spont[["neuron 1"]]), type = "partial")

What we should look at here are the lags at which the function is outside the 95% "confidence intervals", like lag 1 for this data set.

This initial analysis would lead us to a model like:

  λ(t | H_t) = f(t − t_l, i_1)    (6)

where i_1 is the duration of the last isi.

Tip  In practice, when the homogeneous renewal process model does not apply, I always include the last isi in the model's variables list, even if the partial autocorrelation function is not outside the confidence intervals at lag 1. In addition I include all the other isis for which it is.

2.5 Data frame for gss

We will follow the approach of Brillinger [1988a, p 191], where for computational convenience a discretization of the spike train is performed. That is, we go from the "actual train", {t_1, …, t_n} with 0 < t_1, …, t_n ≤ T, to a binary vector, event, whose jth element is the value of the sum over {t_1, …, t_n} of the indicator function I_j defined by:

  I_j(t) = 1 if (j − 1)δ < t ≤ jδ, and 0 otherwise    (7)

where the "bin width", δ, is chosen small enough to have at most one spike per bin³. More explicitly we have:

  event_j = Σ_{k=1}^{n} I_j(t_k)    (8)

When we work with this binary vector event we no longer estimate f(t − t_l, i_1) directly, but:

  f_δ(t − t_l, i_1) ≡ f((j − j_l)δ, (j_l − j_{l−1})δ) δ    (9)

where j is the index of the bin containing time t, j_l is the index of the bin of the previous spike, and j_{l−1} is the index of the second previous spike. f_δ should be a probability (if δ has indeed been set small enough), that is, a number between 0 and 1. This is what Brillinger [1988a, Eq. (2.5), p 191] writes p_t (his t being our j).

Since biophysics does not help us much beyond the presence of a refractory period, it is hard to guess what f(t − t_l, i_1) of Eq. (6) or f_δ(t − t_l, i_1) of Eq. (9) should look like. We will therefore use a nonparametric approach where f_δ(t − t_l, i_1) will be estimated with a penalized likelihood method. The general features of this approach are described briefly in Pouzat and Chaffiol [2009, Sec. 2.5.2] and in depth in Gu [2002].

³ This type of discretization is referred to by Berman and Turner [1992, pp 33–34] as a probabilistic approximation; they propose an alternative numerical approximation where the bin width, δ, is allowed to change along the time axis. The quantity being approximated is the likelihood of the intensity [Pouzat and Chaffiol, Sec. 2]. Chornoboy et al. [1988] present an approach similar to the one of Brillinger [1988a], albeit with a different motivation. Brillinger moreover used this probabilistic approximation from the early eighties on, as witnessed by his 1983 "Wald Memorial Lecture" [Brillinger, 1988b, p 34].
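The discretization of Eqs. (7)–(8) amounts to simple binning. Here is a base-R sketch on made-up spike times and a made-up observation window (STAR's mkGLMdf, used below, does this and more for you):

```r
delta  <- 0.004                       # bin width delta: 4 ms
spikes <- c(0.0101, 0.0153, 0.0307)   # hypothetical spike times (s)
nbins  <- round(0.04 / delta)         # observation window (0, 0.04] -> 10 bins
## the bin of a spike t satisfies (j - 1)*delta < t <= j*delta, i.e. j = ceiling(t/delta)
event  <- tabulate(ceiling(spikes / delta), nbins = nbins)
event                                 # 0 0 1 1 0 0 0 1 0 0
```

With this δ every bin holds at most one spike, so event is indeed binary.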

The model estimation will moreover be performed by function gssanova of Chong Gu's package gss. The data "fed" to this function have to be in a data frame format. Function mkGLMdf of STAR allows us to build such a data frame from a spikeTrain object. Since our preliminary analysis led us to rule out an interaction between the two neurons of our data set, we do not need to include a variable containing the elapsed time since the last spike of neuron 2 in our data frame. Our initial summary taught us that the shortest isi was 8 ms long and that events were observed between 0 and 59 s. We will therefore use a bin width of 4 ms and create our data frame with:

> DFA <- mkGLMdf(e060824spont[["neuron 1"]], 0.004, 0, 59)

We can get a quick view of the first elements of our data frame DFA with:

> head(DFA)
    event  time neuron  lN.1
150     0 0.596      1 0.004
151     0 0.600      1 0.008
152     0 0.604      1 0.012
153     0 0.608      1 0.016
154     0 0.612      1 0.020
155     0 0.616      1 0.024

Here the variables are:

event corresponds to our previous event vector; it contains the binary version of the spike train.

time contains the time at the bin center (in s).

neuron contains the number of the considered neuron in the data set (the one to which the spikes in the event variable belong). It won't be used here, but it becomes useful when several neurons are present and interactions between them have to be considered, as we will see later.

lN.1 contains the elapsed time since the last spike of neuron 1, that is, (j − j_l)δ in Eq. (9).

We can also get a quick view of the end of DFA with:

> tail(DFA)
      event   time neuron  lN.1
14746     0 58.980      1 0.396
14747     0 58.984      1 0.400
14748     0 58.988      1 0.404
14749     0 58.992      1 0.408
14750     0 58.996      1 0.412
14751     0 59.000      1 0.416

There is still one variable missing from our data frame in order to work with our candidate model: the last isi, which can be obtained with function isi of STAR:

> DFA <- within(DFA, i1 <- isi(DFA, lag = 1))

Here we have just added variable i1 to our data frame, as we can see by calling head on our modified DFA:

> head(DFA)
    event  time neuron  lN.1 i1
150     0 0.596      1 0.004 NA
151     0 0.600      1 0.008 NA
152     0 0.604      1 0.012 NA
153     0 0.608      1 0.016 NA
154     0 0.612      1 0.020 NA
155     0 0.616      1 0.024 NA

The values of i1 are not available for the first elements of the data frame. This makes sense, since we do not know when the last spike before the beginning of the acquisition occurred. We are therefore going to remove the elements of DFA for which one of the variables is not available. We can do that efficiently with R's function complete.cases:

> DFA <- DFA[complete.cases(DFA), ]

Calling head again, we can see that complete.cases did its job:

> head(DFA)
    event  time neuron  lN.1    i1
436     0 1.740      1 0.004 0.216
437     0 1.744      1 0.008 0.216
438     0 1.748      1 0.012 0.216
439     0 1.752      1 0.016 0.216
440     0 1.756      1 0.020 0.216
441     0 1.760      1 0.024 0.216

We can moreover check that isi did its job correctly by looking at a well-chosen part of DFA, like the one starting at index 14604:

      event   time neuron  lN.1    i1
14604     0 58.412      1 0.012 0.016
14605     1 58.416      1 0.016 0.016
14606     0 58.420      1 0.004 0.016
14607     1 58.424      1 0.008 0.016
14608     0 58.428      1 0.004 0.008
14609     0 58.432      1 0.008 0.008
14610     1 58.436      1 0.012 0.008
14611     0 58.440      1 0.004 0.012


Figure 5: Empirical cumulative distribution function of lN.1 and i1.

2.6 Variables transformation

A crucial ingredient for efficient smoothing-spline estimation is a "reasonably" uniform distribution of the independent variables (or predictors). But as Fig. 5 shows, our independent variables lN.1 and i1 are not uniformly distributed.

> with(DFA, plot(ecdf(lN.1), pch = "."))
> with(DFA, lines(range(lN.1), c(0, 1), col = 2, lty = 2))
> with(DFA, plot(ecdf(i1), pch = "."))
> with(DFA, lines(range(i1), c(0, 1), col = 2, lty = 2))

In the sequel we will adopt the slightly extreme "mapping to uniform" approach; that is, we are going to estimate a smooth version of the cumulative distribution function (cdf) and use it to transform our independent variables. The STAR function doing this job is mkM2U. It can be called on part of the data set in order to perform a mapping of the other part that is independent of the (mapped) data. For our two variables lN.1 and i1 we call:

> m2u1 <- mkM2U(DFA, "lN.1", 0, 28.5)
> m2ui <- mkM2U(DFA, "i1", 0, 28.5, maxiter = 200)

The results of these two commands are two mapping functions that we can now use to generate the "mapped to uniform" variables, e1t and i1t:


> DFA <- within(DFA, e1t <- m2u1(lN.1))
> DFA <- within(DFA, i1t <- m2ui(i1))

Fig. 6 shows us that our mapping worked properly.

Figure 6: Empirical cumulative distribution function of e1t and i1t.
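The idea behind this mapping is the probability-integral transform: a variable passed through its own cdf becomes uniform. A rough base-R sketch using the raw empirical cdf (mkM2U estimates a smooth cdf instead, so this is only an illustration of the principle):

```r
set.seed(1)
x <- rexp(1000)   # a skewed, clearly non-uniform variable
F <- ecdf(x)      # empirical cdf of x
u <- F(x)         # u = F(x) is (approximately) uniform on (0, 1]
range(u)
```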

2.7 Variables evolution

Before going further it can be a good idea to look at the (time) evolution of the variables. In order to do that quickly we are going to use the default time series objects provided by R, created with function ts. Looking at e1t should not be too interesting, since this variable starts at zero following an event and increases linearly thereafter. We will therefore look at our event and i1t variables. In order to have a clearer picture we are going to box-filter the variables using function filter, with a window length of 0.5 s (that is, 125 indexes, since we used a bin width of 4 ms). We will also map i1t onto a normal random variable, and we get Fig. 7:

> DFAts <- ts(with(DFA, cbind(event, i1t)), start = DFA$time[1],
+             delta = diff(DFA$time[1:2]))
> DFAtsF <- filter(DFAts, rep(1/125, 125))
> colnames(DFAtsF) <- c("event", "i1t")
> plot(DFAtsF)
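To see what the box filter does, here is a toy example in base R (made-up vector, window of 3 bins instead of 125):

```r
x  <- c(0, 0, 1, 0, 1, 0, 0)
xf <- filter(x, rep(1/3, 3))   # centered moving average over 3 bins
xf                              # NA at both ends, where the window does not fit
```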


Figure 7: Time evolution of event and i1t. A box filter of 0.5 s has been applied.

Fig. 7 does not exhibit any clear trend in the graphed variables, thereby confirming our former conclusion of a stationary discharge. Depending on the data at hand, it can clearly be a good idea to try out several filter window lengths.

2.8 Fitting and testing models

Since we are going to use nonparametric model estimation procedures, and since we want to have meaningful goodness-of-fit tests, we will systematically fit a given model to one half of the data and test it on the other half, before exchanging the fitting and testing parts and repeating the procedure.

2.8.1 Model fit: the straightforward approach

We are going to fit our models using function gssanova of package gss. The most straightforward way to fit the model of Eq. (6) to the first half of our data set is:

> GF1e <- gssanova(event ~ e1t * i1t, data = subset(DFA, time <= 29.5),
+                  family = "binomial", seed = 20061001)


The time needed to carry out this fit on the present machine is 107.39 s⁴.

2.8.2 Time transformation and goodness of fit

We will assess the quality of our model by evaluating the intensity process on the part of the data that we did not use for model estimation. This intensity process will then be used to perform a time transformation, as proposed by Ogata [1988], after which a new counting process is obtained. If our model is good, this process should be the realization of a homogeneous Poisson process with rate 1. The latter process is then the null hypothesis against which we are going to test [Pouzat and Chaffiol]. The time transformation is simply performed with function %tt% of STAR. It is called as follows:

> tt.GF1e <- GF1e %tt% subset(DFA, time > 29.5)

Object tt.GF1e is an object of class CountingProcessSamplePath, for which a summary method exists, providing a quick numeric summary of how appropriate the model is:

> tt.GF1e.summary <- summary(tt.GF1e)
> tt.GF1e.summary

*** Test of uniformity on the time axis
Prob. of the Kolmogorov statistic under H0: 0.25221
*** Wiener process test
Inside 95% domain: TRUE , inside 99% domain: TRUE
*** Berman test
Prob. of the Kolmogorov statistic under H0: 0.11325
*** Renewal test
Inside 95% domain: TRUE , inside 99% domain: TRUE
Maximum lag: 24
*** Variance vs "time" with 5 time windows:
1 window out at 95% level
0 window out at 99% level
*** The object contains 262 events.

Notice that the last two commands could be combined into a single one by typing:

> (tt.GF1e.summary <- summary(tt.GF1e))

The quality of the model can also be assessed by calling the plot method for CountingProcessSamplePath objects, as shown on Fig. 8:

> plot(tt.GF1e.summary, which = c(1, 2, 4, 6))

Looking at Fig. 8 we would conclude that the model is satisfying.

⁴ For reference, it takes 88.23 s on an Intel Core2 Duo P9500 at 2.53 GHz with 4 GB of RAM, running Ubuntu 9.04, with R-2.9.1 linked to the ATLAS version of BLAS (version 3.8.3), everything being compiled with gcc 4.3.3.
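The logic of the time transformation can be illustrated in base R with a process whose intensity is known exactly, say λ(t) = 2t so that Λ(t) = t²; the transformed inter-event intervals should then be iid Exp(1). This is only an illustration of the principle, not STAR code:

```r
set.seed(2)
s  <- cumsum(rexp(500))   # unit-rate Poisson arrival times on the transformed scale
tk <- sqrt(s)             # spike times of a process with lambda(t) = 2*t,
                          # obtained by inverting Lambda(t) = t^2
u  <- diff(c(0, tk^2))    # time-transformed ISIs: Lambda(t_k) - Lambda(t_{k-1})
mean(u)                   # close to 1 when the model's intensity is the true one
```

Ogata's tests battery checks this Exp(1) / unit-rate Poisson prediction from several complementary angles.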



Figure 8: Ogata’s tests battery applied to the time transformed second half of neuron 1 spike train (from data set: e060824spont) using model GF1e fitted on the first half. See Pouzat and Chaffiol for a description of the plots.


2.8.3 Exchanging fitting and testing parts

In order to fully validate our model we are going to exchange the fitting and testing parts, that is, fit the same model as before to the second half of the data set before testing it on the first half:

> GF1l <- gssanova(event ~ e1t * i1t, data = subset(DFA, time > 29.5),
+                  family = "binomial", seed = 20061001)

The total time taken by our two fits is 203.89 s. We now perform the same series of tests as before, but this time on the early part of the data set:

> tt.GF1l <- GF1l %tt% subset(DFA, time <= 29.5)
> (tt.GF1l.summary <- summary(tt.GF1l))

*** Test of uniformity on the time axis
Prob. of the Kolmogorov statistic under H0: 0.44084
*** Wiener process test
Inside 95% domain: TRUE , inside 99% domain: TRUE
*** Berman test
Prob. of the Kolmogorov statistic under H0: 0.14131
*** Renewal test
Inside 95% domain: FALSE , inside 99% domain: TRUE
Maximum lag: 24
*** Variance vs "time" with 4 time windows:
0 window out at 95% level
0 window out at 99% level
*** The object contains 240 events.

> plot(tt.GF1l.summary, which = c(1, 2, 4, 6))

Looking at Fig. 9 we would again conclude that the model is satisfying.

2.8.4 Doing two fits at once with a multi-core CPU

Nowadays many computers are equipped with a multi-core CPU, which means that on such machines we can in principle perform our last two fits, GF1e and GF1l, simultaneously. There are several ways to do just that in R; we are going to illustrate here how to do it with the snow package [Rossini et al., 2003]. Of course, you need to have the package installed on your computer to perform what follows. We then load the package the usual way:

> library(snow)

We are going to create a SOCKET cluster with 2 "slave" processes:

17


Figure 9: Ogata’s tests battery applied to the time transformed first half of neuron 1 spike train (from data set: e060824spont) using model GF1l fitted on the second half.


> nbSlaves <- 2
> cl <- makeCluster(rep("localhost", nbSlaves), type = "SOCK")

Do not forget to stop your cluster at the end of your R session with:

> stopCluster(cl)

Now each of the 2 processes needs to have the required libraries in its own workspace. This is done as follows:

> clusterEvalQ(cl, library(STAR))

Here the output printed after command completion has been removed. We are going to use snow's clusterApply function to perform our two fits in parallel. But before calling that function we need to create the function we are going to pass as an argument. This function should fit the model and perform the time transformation on the portion of the data passed to it. We do this in two stages:

> mkPFct <- function(df = DFA) {
+     df  # force evaluation of the argument
+     PFct <- function(gtime, fmla = event ~ e1t * i1t, seed = 20061001) {
+         GF <- gssanova(fmla, data = subset(df, time %in% gtime),
+                        family = "binomial", seed = seed)
+         tt <- GF %tt% subset(df, !(time %in% gtime))
+         list(GF = GF, tt = tt)
+     }
+     PFct
+ }
> PFct1 <- mkPFct()

We now have to create the list containing the proper argument gtime for PFct1:

> gtList <- list(early = with(DFA, time[time <= 29.5]),
+                late = with(DFA, time[time > 29.5]))

The job is then simply done at once:

> GF1 <- clusterApply(cl, gtList, PFct1)

This time it takes only 134.13 s (this time also includes the time required by the time transformation). We can check that the two procedures produce identical results⁵ by plotting the point process of the first component of GF1 against the one of GF1e, and the point process of the second component of GF1 against the one of GF1l, as shown on Fig. 10.

⁵ Using function identical would give FALSE here, since the objects have functions as components and these functions have their own environments. Although the results of evaluating the same function from two different objects are the same, the references of their environments differ, and that leads to a FALSE result when applying identical.



Figure 10: Comparison of fits performed sequentially and in parallel on the spike train of neuron 1 of data set e060824spont. Left: fit first half and test second half. Right: fit second half and test first half.

2.8.5 Trying a simpler model

We have just explored a model containing an "interaction" term between variables e1t and i1t. Since this model gives good fits, it is interesting to try simplifying it, to see if a model without interaction would not give results that are as good. Sequentially we would proceed as follows:

> GF2e <- gssanova(event ~ e1t + i1t, data = subset(DFA, time <= 29.5),
+                  family = "binomial", seed = 20061001)
> tt.GF2e <- GF2e %tt% subset(DFA, time > 29.5)
> (tt.GF2e.summary <- summary(tt.GF2e))
> GF2l <- gssanova(event ~ e1t + i1t, data = subset(DFA, time > 29.5),
+                  family = "binomial", seed = 20061001)
> tt.GF2l <- GF2l %tt% subset(DFA, time <= 29.5)
> (tt.GF2l.summary <- summary(tt.GF2l))

While in parallel we would do:

> GF2 <- clusterApply(cl, gtList, PFct1, fmla = event ~ e1t + i1t)

The latter requires 65.57 s to complete. The fit diagnostic plots are shown on Fig. 11. Since the simpler model also looks good, the question becomes: which one should we choose?


Figure 11: Ogata's tests battery applied to the time transformed neuron 1 spike train (from data set: e060824spont) using model GF2. Left column: fit first half, test second half (compare with Fig. 8). Right column: fit second half, test first half (compare with Fig. 9). The first tests (upper right) of Fig. 8 and 9 are not shown here but are "passed" (i.e., within the confidence bands).


2.9 Model selection

One way to compare two alternative models is to look at the probability they give to data that were not used to fit them. This can be done with function predictLogProb of STAR, which returns the log probability of some data (passed as the second argument to the function) under some model (passed as the first argument). Here the log probability of our data under the simpler model is:

> (GF2.logProb <- predictLogProb(GF2[[1]][[1]], subset(DFA, time > 29.5)) +
+      predictLogProb(GF2[[2]][[1]], subset(DFA, time <= 29.5)))
[1] -1775.823

while under the model with the interaction term it is:

> (GF1.logProb <- predictLogProb(GF1[[1]][[1]], subset(DFA, time > 29.5)) +
+      predictLogProb(GF1[[2]][[1]], subset(DFA, time <= 29.5)))
[1] -1762.995

Since the more complex model (with interaction) gives a higher probability than the less complex one (without interaction), I would go ahead and keep the former.
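The quantity being compared can be understood from the binomial model structure: each bin is a Bernoulli trial with a model probability p_j, so the log probability of held-out data is the sum of the per-bin Bernoulli log probabilities. A base-R sketch with made-up numbers (in practice the p_j would come from the fitted model; this is not predictLogProb's actual code):

```r
event <- c(0, 0, 1, 0, 1, 0)               # hypothetical held-out binary train
p     <- c(0.1, 0.1, 0.3, 0.2, 0.4, 0.1)   # hypothetical per-bin model probabilities
logProb <- sum(event * log(p) + (1 - event) * log(1 - p))
logProb                                     # the larger (less negative), the better
```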

2.10 Plotting results

2.10.1 Quick visualization of the model terms

Before looking at the effects of the model terms we can refit our selected model to the full data set in order to get better estimates:

> GF1f <- gssanova(event ~ e1t * i1t, data = DFA, family = "binomial",
+     seed = 20061001)

A plot of the terms is then quickly generated with the plot method for gssanova objects:

> plot(GF1f, nr = 3, nc = 1)

2.10.2 Use of quickPredict and its associated methods

A finer control of the plots can be obtained with the quickPredict function of STAR and its associated plot, contour, image and persp methods. The easiest way to fine-tune a term-effect plot with STAR is to first generate a quickPredict object containing the term effect. For the first two terms of our model, e1t and i1t, this is done simply with:

> term.e1t <- quickPredict(GF1f, "e1t")

or, using the binary operator version, %qp%:


Figure 12: Effects of the 3 terms of the selected model for neuron 1 of data set e060824spont. The abscissa scale corresponds to the percentiles of the variables. The ordinate logit scales are directly comparable. On the third plot the estimated interaction term is displayed as red contours while the estimated standard error is displayed as dotted black contours.

> term.i1t <- GF1f %qp% "i1t"

We can then call the plot method for quickPredict objects and get basic plots. We can also pass additional arguments to these methods in order to fine-tune the output. Another thing we can do is plot the term effects on the "native" scale instead of the "probability" scale. To do that we can use the qFct attribute of our "map to uniform" functions (Sec. 2.6). What we have to transform is the xx element of our two quickPredict objects, term.e1t and term.i1t:

> term.e1 <- term.e1t
> term.e1$xx <- attr(m2u1, "qFct")(term.e1$xx)
> term.i1 <- term.i1t
> term.i1$xx <- attr(m2ui, "qFct")(term.i1$xx)

We can then use the plot method to get Fig. 13:

> plot(term.e1t, xlab = "Probability scale", ylab = expression(eta[1]),
+     main = "Elapsed time since last spike")
> plot(term.e1, xlab = "Time (s)", ylab = expression(eta[1]),
+     panel.first = grid(col = 1), main = "Elapsed time since last spike")
> plot(term.i1t, xlab = "Probability scale", ylab = expression(eta[i1]),
+     main = "Last ISI")
> plot(term.i1, xlab = "Time (s)", ylab = expression(eta[i1]),
+     main = "Last ISI", panel.first = grid(col = 1))

The quickPredict object corresponding to the interaction term of the model, e1t:i1t, is also easily obtained:

> term.e1ti1t <- GF1f %qp% "e1t:i1t"

We can call the image, contour and persp methods on the resulting object. If we want to go to the native scale for the plot, the best way is to use the changeScale function of STAR:

> term.e1i1 <- changeScale(term.e1ti1t, attr(m2u1, "qFct"), attr(m2ui,
+     "qFct"))

The following commands:

> image(term.e1ti1t)
> contour(term.e1ti1t, add = TRUE)
> contour(term.e1ti1t, levels = seq(-2, 2, 0.5), labcex = 1.5, col = 2)
> contour(term.e1ti1t, what = "sd", levels = seq(-0.4, 0.4, 0.1), col = 1,
+     lty = 2, add = TRUE)
> persp(term.e1ti1t, theta = -10, phi = 30)
> persp(term.e1i1, theta = -25, phi = 30, xlab = "time since last (s)",
+     ylab = "last isi (s)", main = "")

lead to the plots shown on Fig. 14.


Figure 13: Terms e1t (upper row) and i1t (lower row) with a probability scale (left column) and with a native scale (right column).


Figure 14: Examples of image (upper right), contour (upper row) and persp (bottom row) methods for quickPredict objects.


2.10.3 Looking at the intensity process of the two models

We conclude the analysis of the spike train of neuron 1 from the e060824spont data set by looking at the intensity processes obtained with our two models (with and without interaction) on a small part of the spike train. We first get the predicted value of fδ of Eq. (9) on the logit scale for the second half of the data set, using the fits obtained from the first half:

> eta1.e <- predict(GF1[[1]][[1]], newdata = subset(DFA, time >
+     29.5))
> eta2.e <- predict(GF2[[1]][[1]], newdata = subset(DFA, time >
+     29.5))

We then convert eta1.e and eta2.e into proper frequencies:

> tigol <- function(x) exp(x)/(1 + exp(x))
> lambda1.e <- tigol(eta1.e)/0.004
> lambda2.e <- tigol(eta2.e)/0.004

A plot showing the intensity processes of the two models is then obtained with the following commands:

> with(subset(DFA, time > 29.5), plot(time, lambda1.e, xlim = c(30.5,
+     32), type = "l", col = 2, xlab = "Time (s)",
+     ylab = expression(lambda ~ "(Hz)"), ylim = c(0, 50), lwd = 2))
> with(subset(DFA, time > 29.5), lines(time, lambda2.e, xlim = c(30.5,
+     32), col = 4, lty = 2, lwd = 2))
> with(subset(DFA, time > 29.5), rug(time[event == 1], lwd = 2))
> legend(30.5, 45, c("with interaction", "without interaction"),
+     col = c(2, 4), lty = c(1, 2), lwd = c(2, 2), bty = "n")

The results appear on Fig. 15. Notice that the intensity process of the "best model" (with interaction) is almost always larger than that of the other model just before each spike, while it tends to be smaller between spikes. In other words, the best model predicts a lower event probability when there is actually no event and a larger probability when there are events.
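The conversion from the linear-predictor scale to Hz is worth spelling out: tigol is the inverse logit, and dividing the resulting per-bin event probability by the bin width δ = 4 ms turns it into a rate. A small Python sketch of the same arithmetic (illustrative only, not STAR code):

```python
import math

def tigol(eta):
    """Inverse logit: linear-predictor scale -> event probability."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def intensity(eta, delta=0.004):
    """Per-bin event probability divided by the bin width (s) -> Hz."""
    return tigol(eta) / delta

# eta = 0 means p = 0.5 per 4 ms bin, i.e. an intensity of 125 Hz.
```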

2.11 Checking the necessity of variable transformations

In order to substantiate the variable transformation procedure introduced in Sec. 2.6, we can refit one of our two models using the untransformed independent variables. To save time we will refit only the model without interaction, but you are invited to do it for the other model and check for yourself that the results remain very bad.

> GFb <- clusterApply(cl, gtList, PFct1, fmla = event ~ lN.1 +
+     i1)

The tests shown on Fig. 16 clearly show that this new model is not satisfying.

Figure 15: The intensity process of the two considered models, with (red, continuous) and without (blue, dashed) interactions between the elapsed time since the last spike and the last isi . The first half (≤ 29.5 s) of the data set (e060824spont) was fitted with both models.


Figure 16: Ogata's tests battery applied to the time transformed neuron 1 spike train (from data set: e060824spont) using model GFb, that is, no interaction between the two independent variables, lN.1 and i1, which are expressed on their native scales. Left column, fit first half, test second half. Right column, fit second half, test first half. Compare with Fig. 11. The first tests (upper right) of Fig. 8 and 9 are not shown here but are "passed" (i.e., within the confidence bands).

3 Analysis of data from neuron 1 of e060817spont data set

This is an example of a spike train for which including a functional interaction between two neurons helps.

3.1 Loading and summarizing the data

As always we start by loading and summarizing the data:

> data(e060817spont)
> summary(e060817spont[["neuron 1"]])


A spike train with 529 events, starting at: 0.074 and ending at: 58.245 (s).
The mean ISI is: 0.11 and its SD is: 0.078 (s).
The mean log(ISI) is: -2.523 and its SD is: 0.989
The shortest interval is: 0.001 and the longest is: 0.782 (s).

We see that the spike train of neuron 1 of e060817spont has roughly the same number of events as the one we analyzed in Sec. 2.2. The mean isi is therefore very close to the previous one, but the SD is much smaller, 0.078, than the previous one, 0.36.
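The numbers in such a summary follow directly from the inter-spike intervals. A sketch of the mean/SD part in Python (illustrative, not the summary method of STAR; the toy spike times are made up):

```python
import math

def isi_stats(spike_times):
    """Mean and sample SD of the inter-spike intervals of a
    sorted spike train."""
    isis = [t1 - t0 for t0, t1 in zip(spike_times[:-1], spike_times[1:])]
    n = len(isis)
    mean = sum(isis) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in isis) / (n - 1))
    return mean, sd

mean, sd = isi_stats([0.0, 0.1, 0.25, 0.3, 0.5])
```

The log(ISI) statistics of the summary are obtained the same way after taking the logarithm of each interval.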

3.2 Automatic analysis of neuron 1 from e060817spont

We perform the automatic analysis by calling reportHTML as we did in Sec. 2.3:


> reportHTML(e060817spont[["neuron 1"]], filename = "e060817spont_1",
+     directory = "report", otherST = e060817spont[c(2, 3)])

This time the spike train plot, the first plot in the generated html file, e060817spont_1.html, looks much more "regular" than the previous one, even when viewed on a finer time scale as shown on Fig. 17. The renewalTestPlot suggests that the discharge is compatible with a homogeneous renewal process (not shown). But although the Weibull model is the best among the six tested, it still seems to miss something. Of the two cross-intensity plots, the one built with neurons 1 and 2 and shown on Fig. 18 suggests a significant functional coupling between the two neurons. Notice that the cross-intensity plot peaks at zero and that it is roughly symmetrical, suggesting a common input rather than one neuron directly driving the other.

3.3 Partial autocorrelation function

The partial autocorrelation function won't be reproduced in this document, but you can run the command:

> acf(diff(e060817spont[["neuron 1"]]), type = "partial")

and see that the estimated partial autocorrelation function is within the confidence intervals everywhere, suggesting that we do not need to include the previous isis in our model.
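The "within the confidence intervals" check works as for any stationary series: under a white-noise null, the sample autocorrelations at positive lags fall within roughly ±1.96/√n. A sketch of the band logic with the ordinary (not partial) autocorrelation on a toy series (Python, for illustration only; R's acf with type = "partial" computes the partial version):

```python
import math

def acf_lag(x, lag):
    """Sample autocorrelation of the series x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

x = [1.0, -1.0] * 50                 # strictly alternating toy series
band = 1.96 / math.sqrt(len(x))      # approximate 95% band under white noise
outside = abs(acf_lag(x, 1)) > band  # lag-1 correlation clearly significant
```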

Figure 17: Spike train plots of neuron 1 from data set e060817spont.

Figure 18: Cross-intensity plot and Cross correlation histogram between neuron 1 and 2 of data set e060817spont.


3.4 Conclusions of the preliminary analysis

Our preliminary analysis rules out the necessity of previous isis in order to predict the discharge of neuron 1. The cross-intensity plot suggests that including a functional coupling with neuron 2 might be useful, even though this coupling, if really there, is more likely to be due to common inputs than to direct synapses between neurons 1 and 2. We are therefore led to the following formal model:

λ(t | Ht) = f(t − tl1, t − tl2)     (10)

where tl1 and tl2 stand for the times of the last spike from neurons 1 and 2 respectively. For such a model the history is the set of all spike times of neurons 1 and 2 observed up to (but not including) t.

3.5 Data frame for gss

Since we now want to include functional interactions between neurons 1 and 2 in our model, we have to modify slightly our mkGLMdf call of Sec. 2.5. Instead of using a single spikeTrain object as the first argument of the function, as we did before, we are going to use a list of spikeTrain objects containing the two neurons we want to include in our model: neurons 1 and 2. Since the shortest isi is smaller, we also change the second argument. Last "innovation": since the data frame returned by mkGLMdf, when its first argument is a list of spikeTrain objects, contains an event variable relative to each individual spike train, we "isolate" the part of the result corresponding to the spikes of neuron 1:

> DFB <- subset(mkGLMdf(e060817spont[c(1, 2)], 0.001, 0, 58.5),
+     neuron == 1)

A quick look at the first rows of the result shows us that an extra variable, lN.2, is present compared to DFA:

> head(DFB)

    event  time neuron  lN.1  lN.2
135     0 0.134      1 0.061 0.000
136     0 0.135      1 0.062 0.001
137     0 0.136      1 0.063 0.002
138     0 0.137      1 0.064 0.003
139     0 0.138      1 0.065 0.004
140     0 0.139      1 0.066 0.005

This new variable contains the elapsed time since the last spike of neuron 2. We now stick to our previous approach and find a "map to uniform" function for each of the two variables we want to include in our model, lN.1 and lN.2, and we use these functions to get our transformed variables, e1t and e2t:

Figure 19: Empirical cumulative distribution functions of e1t and e2t of data frame DFB built with data set: e060817spont.

> m2u1b <- mkM2U(DFB, "lN.1", 0, 29.25)
> m2u2b <- mkM2U(DFB, "lN.2", 0, 29.25)
> DFB <- within(DFB, e1t <- m2u1b(lN.1))
> DFB <- within(DFB, e2t <- m2u2b(lN.2))

The beginning of our modified data frame now looks like:

    event  time neuron  lN.1  lN.2       e1t        e2t
135     0 0.134      1 0.061 0.000 0.4890967 0.00000000
136     0 0.135      1 0.062 0.001 0.4960475 0.03203862
137     0 0.136      1 0.063 0.002 0.5029427 0.05943394
138     0 0.137      1 0.064 0.003 0.5097796 0.08292854
139     0 0.138      1 0.065 0.004 0.5165556 0.10317857
140     0 0.139      1 0.066 0.005 0.5232687 0.12073896

Fig. 19 shows that the transformations did their jobs. Before fitting our candidate model we look at the evolution of the variables event and e2t, as shown on Fig. 20 (even if e2t exhibits by construction a fairly regular pattern: a regular increase followed by a sharp decrease at each spike of neuron 2):
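The mkM2U mechanism can be pictured as an empirical CDF paired with its inverse, the quantile function exposed as the qFct attribute used in Sec. 2.10.2 to go back to the native scale. A rough Python sketch of the pair (a crude step-function version for illustration; STAR's actual implementation is smoother):

```python
import math
from bisect import bisect_right

def mk_m2u(sample):
    """Return an ECDF-based 'map to uniform' function together with a
    matching quantile function (the analogue of the qFct attribute)."""
    s = sorted(sample)
    n = len(s)

    def m2u(x):              # native scale -> (0, 1]
        return bisect_right(s, x) / n

    def q_fct(p):            # (0, 1] -> native scale
        idx = min(max(int(math.ceil(p * n)) - 1, 0), n - 1)
        return s[idx]

    return m2u, q_fct

m2u, q_fct = mk_m2u([0.2, 0.4, 0.6, 0.8])
# Round trip on an observed value: q_fct(m2u(0.6)) gives 0.6 back.
```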


Figure 20: Time evolution of event and e2t of data frame DFB built with data set: e060817spont. A box filter of 0.5 s has been applied.

> DFBts <- ts(with(DFB, cbind(event, e2t)), start = DFB$time[1],
+     delta = diff(DFB$time[1:2]))

Judging by eye, the two series indeed seem slightly correlated.
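The box filter used for Fig. 20 (and later Fig. 27) is just a moving average over a fixed-width window applied to the raw 0/1 event series and to the covariates. A sketch (hedged: here edge bins use the shorter window actually available, which may differ from the exact edge handling used for the figures):

```python
def box_filter(x, width):
    """Centered moving average over `width` samples; windows are
    truncated at the edges of the series."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

# Smoothing a 0/1 event train turns it into a local event rate.
smoothed = box_filter([0, 0, 1, 0, 0, 1, 0], 3)
```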

3.6 Model fits, tests and selection

We will apply our fit-test followed by test-fit approach (Sec. 2.8 and 2.9) to three models:

event ∼ e1t
event ∼ e1t + e2t
event ∼ e1t ∗ e2t

We will moreover use the parallel implementation of Sec. 2.8.4. For that we have to create the function called by clusterApply as well as the list forming the first argument of the latter function:


> PFctB <- mkPFct(DFB)
> gtListB <- list(early = with(DFB, time[time <= 29.25]), late = with(DFB,
+     time[time > 29.25]))

We can then go ahead and fit our three models:

> GF1.e060817spont.1 <- clusterApply(cl, gtListB, PFctB, fmla = event ~
+     e1t)
> GF2.e060817spont.1 <- clusterApply(cl, gtListB, PFctB, fmla = event ~
+     e1t + e2t)
> GF3.e060817spont.1 <- clusterApply(cl, gtListB, PFctB, fmla = event ~
+     e1t * e2t)

The three models pass the test series, as can be seen by typing, e.g.:

> summary(GF1.e060817spont.1[[1]][[2]])
> summary(GF1.e060817spont.1[[2]][[2]])

The graphical tests are shown on Fig. 21 for each of the three models, with a fit performed on the first half of the data set and a test performed on the second half. We select the "best" model using the predictive probability method of Sec. 2.9, giving here:

> predictLogProb(GF1.e060817spont.1[[1]][[1]], subset(DFB, time >
+     29.25)) + predictLogProb(GF1.e060817spont.1[[2]][[1]], subset(DFB,
+     time <= 29.25))
[1] -2951.739
> predictLogProb(GF2.e060817spont.1[[1]][[1]], subset(DFB, time >
+     29.25)) + predictLogProb(GF2.e060817spont.1[[2]][[1]], subset(DFB,
+     time <= 29.25))
[1] -2936.165
> predictLogProb(GF3.e060817spont.1[[1]][[1]], subset(DFB, time >
+     29.25)) + predictLogProb(GF3.e060817spont.1[[2]][[1]], subset(DFB,
+     time <= 29.25))
[1] -2935.313

Here the third model outperforms the second one by a rather tiny margin. The model terms shown on Fig. 22 were obtained after fitting the best model to the full data set:

> GF3f.e060817spont.1 <- gssanova(event ~ e1t * e2t, data = DFB,
+     family = "binomial", seed = 19731004)

Like Fig. 15 of Sec. 2.10.3, Fig. 23 shows the estimated intensity processes of our three models on the same part of the data.

Figure 21: Ogata's test series applied to the three considered models: event ∼ e1t (left column), event ∼ e1t + e2t (central column), event ∼ e1t ∗ e2t (right column). Data set: e060817spont, neuron 1. Fit done on first half and tests done on second.

Figure 22: The three terms of model: event ∼ e1t ∗ e2t on the usual logit scale. The SE contours of the third term are drawn in black. The shape of the first term is very atypical for the data sets of the STAR package.


Figure 23: The intensity process of the three considered models for the data of neuron 1 of the e060817spont data set. The first half (≤ 29.25 s) of the data set was fitted with each model. The spike train of neuron 1 appears as a raster plot at the bottom of the plot while the train of neuron 2 appears as a raster at the top of the plot.


4 Analysis of data from neuron 2 of e060817spont data set

We are analysing here a neuron belonging to the same data set, e060817spont, as the previous one (Sec. 3) and functionally coupled to it. The model we will build will also include several previous isis.

4.1 Loading and summarizing the data

We start by loading and summarizing the data:

> data(e060817spont)
> summary(e060817spont[["neuron 2"]])


A spike train with 1229 events, starting at: 0.135 and ending at: 58.014 (s).
The mean ISI is: 0.047 and its SD is: 0.102 (s).
The mean log(ISI) is: -4.253 and its SD is: 1.297
The shortest interval is: 0.002 and the longest is: 0.835 (s).

We have here a spike train with many more spikes, 1229, than in our two former cases (Sec. 2 and 3). The mean isi is therefore shorter, but the SD is close to that of our first case (Sec. 2.2).

4.2 Automatic analysis of neuron 2 from e060817spont

The automatic analysis is performed as in Sec. 2.3 and 3.2 by calling reportHTML:


> reportHTML(e060817spont[["neuron 2"]], filename = "e060817spont_2",
+     directory = "report", otherST = e060817spont[c(1, 3)])

The spike train plot, the first plot in the generated html file, e060817spont_2.html, again looks "regular" like the previous one (Fig. 17), but this aspect disappears when the train is viewed on a finer time scale, as shown on Fig. 24. The renewalTestPlot suggests that the discharge is not compatible with a homogeneous renewal process (not shown). The two cross-intensity plots suggest a significant functional coupling between neurons 2 and 1 as well as between neurons 2 and 3. For the pair (2,1) (Fig. 25, left) the former interpretation holds, that is, the symmetry of the cross-intensity plot suggests a common input. The shape of the cross-intensity plot of the pair (2,3) (Fig. 25, right) rather suggests an inhibition of neuron 3 by neuron 2.

4.3 Partial autocorrelation function

The partial autocorrelation function has several lags (2, 7, 9) at which its value is out of the confidence intervals, as shown on Fig. 26:

> acf(diff(e060817spont[["neuron 2"]]), type = "partial")

as already seen in the previous case, where neuron 1 was the subject of the analysis (Sec. 3.3).


Figure 24: Spike train plots of neuron 2 from data set e060817spont.

Figure 25: Cross-intensity plots between neuron 2 and 1 (left) and between neuron 2 and 3 (right) of data set e060817spont.


Figure 26: The partial autocorrelation function of neuron 2 of data set e060817spont.


4.4 Conclusions of the preliminary analysis

Our preliminary analysis suggests the necessity of the lag 2, 7 and 9 isis in order to predict the discharge of neuron 2. Moreover, our experience tells us to also include the lag 1 isi. The cross-intensity plots suggest that including a functional coupling with neuron 1 might be useful and that a coupling with neuron 3 could be tried. We are therefore led to the following formal model:

λ(t | Ht) = f(t − tl2, i1, i2, i7, i9, t − tl1, t − tl3)     (11)

For such a model the history is the set of all spike times of neurons 1, 2 and 3 observed up to (but not including) t.

4.5 Data frame for gss

Following the lines of Sec. 3.5 for including interactions with neurons 1 and 3 and of Sec. 2.5 for including isi s at different lags we build our data frame as follows: mkGLMdf > + > > > > >

DFC <2) DFC

subset(mkGLMdf(e060817spont, 0.001, 0, 58.5), neuron == within(DFC, i1 <- isi(DFC, within(DFC, i2 <- isi(DFC, within(DFC, i7 <- isi(DFC, within(DFC, i9 <- isi(DFC, DFC[complete.cases(DFC), ]

lag lag lag lag

= = = =

1)) 2)) 7)) 9))

We can now look at the first lines of DFC:


> head(DFC)

      event  time neuron  lN.1  lN.2  lN.3   i1   i2    i7    i9
59031     0 0.529      2 0.031 0.001 0.026 0.01 0.02 0.005 0.249
59032     0 0.530      2 0.032 0.002 0.027 0.01 0.02 0.005 0.249
59033     0 0.531      2 0.033 0.003 0.028 0.01 0.02 0.005 0.249
59034     0 0.532      2 0.034 0.004 0.029 0.01 0.02 0.005 0.249
59035     0 0.533      2 0.035 0.005 0.030 0.01 0.02 0.005 0.249
59036     0 0.534      2 0.036 0.006 0.031 0.01 0.02 0.005 0.249
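The isi(DFC, lag = k) calls fill in, at each time bin, the k-th inter-spike interval counting backwards from the last spike of the neuron, which is why the earliest rows are incomplete and get dropped by complete.cases. A hypothetical Python sketch of that indexing for a single train (the function name and return convention are ours, not STAR's):

```python
def lagged_isi(spike_times, t, lag):
    """ISI at the given lag for time t: lag = 1 is the interval ending
    at the last spike before t, lag = 2 the one before it, and so on.
    Returns None when too few spikes have occurred yet."""
    past = [s for s in spike_times if s < t]
    if len(past) < lag + 1:
        return None
    return past[-lag] - past[-lag - 1]

spikes = [0.1, 0.3, 0.6, 1.0]
```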

We keep our previous approach and find a "map to uniform" function for each of the seven variables we want to include in our model, lN.1, lN.2, lN.3, i1, i2, i7 and i9, and we use these functions to get our transformed variables:

> m2u1c <- mkM2U(DFC, "lN.1", 0, 29.25)
> m2u2c <- mkM2U(DFC, "lN.2", 0, 29.25)
> m2u3c <- mkM2U(DFC, "lN.3", 0, 29.25)
> m2ui1c <- mkM2U(DFC, "i1", 0, 29.25)
> m2ui2c <- mkM2U(DFC, "i2", 0, 29.25)
> m2ui7c <- mkM2U(DFC, "i7", 0, 29.25)
> m2ui9c <- mkM2U(DFC, "i9", 0, 29.25)
> DFC <- within(DFC, e1t <- m2u1c(lN.1))
> DFC <- within(DFC, e2t <- m2u2c(lN.2))
> DFC <- within(DFC, e3t <- m2u3c(lN.3))
> DFC <- within(DFC, i1t <- m2ui1c(i1))
> DFC <- within(DFC, i2t <- m2ui2c(i2))
> DFC <- within(DFC, i7t <- m2ui7c(i7))
> DFC <- within(DFC, i9t <- m2ui9c(i9))

We do not show the empirical cumulative distribution functions here, but they look good, as the reader can easily check. The evolution of the variables event, e1t, e3t, i1t, i2t, i7t and i9t is shown on Fig. 27. Notice the sharp spike of the event time series around second 17. This corresponds to the unusual pause-burst sequence clearly visible on Fig. 24 (right). Since this is the only occurrence of this "feature" in the train, we should not be too surprised if a model fitted on the second half of the data set turned out to fail the tests when applied to the complete first half.

4.6 Model fits, tests and selection

We start here with a rather complex model compared to the ones we considered previously (Sec. 2.8.1, 2.8.5, 3.6):

event ∼ e1t + e2t ∗ i1t + i2t + i7t + i9t + e3t

We stick to our parallel implementation of Sec. 2.8.4 and 3.6, meaning that we have to create a function and a list for clusterApply (see, e.g., Sec. 3.6):

> PFctC <- mkPFct(DFC)
> gtListC <- list(early = with(DFC, time[time <= 29.25]), late = with(DFC,
+     time[time > 29.25]))

We can now go ahead and fit our model to each of the two halves of the data set:

> GF1.e060817spont.2 <- clusterApply(cl, gtListC, PFctC, fmla = event ~
+     e1t + e2t * i1t + i2t + i7t + i9t + e3t)
> GF2.e060817spont.2 <- clusterApply(cl, gtListC, PFctC, fmla = event ~
+     e1t + e2t * i1t + i2t + i7t + i9t)

> DFCl <- subset(DFC, time >= 20)
> PFctCl <- mkPFct(DFCl)
> gtListCl <- list(early = with(DFCl, time[time <= 39.25]), late = with(DFCl,
+     time[time > 39.25]))
> GF1l.e060817spont.2.time <- system.time(GF1l.e060817spont.2 <- clusterApply(cl,
+     gtListCl, PFctCl, fmla = event ~ e1t + e2t * i1t + i2t +

Figure 27: Time evolution of event, e1t, e3t, i1t, i2t, i7t and i9t of data frame DFC built with data set: e060817spont. A box filter of 1.0 s has been applied.


+     i7t + i9t + e3t))
> GF2l.e060817spont.2.time <- system.time(GF2l.e060817spont.2 <- clusterApply(cl,
+     gtListCl, PFctCl, fmla = event ~ e1t + e2t * i1t + i2t +
+     i7t + i9t))
> GF3l.e060817spont.2.time <- system.time(GF3l.e060817spont.2 <- clusterApply(cl,
+     gtListCl, PFctCl, fmla = event ~ e1t + e2t * i1t + i2t +
+     i7t))

5 Software versions used for this vignette

The versions of R and of the other packages used in this tutorial are obtained with function sessionInfo:

R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu

locale:

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] snow_0.3-3        STAR_0.3-2        gss_1.0-5         R2HTML_1.59-1
[5] mgcv_1.5-5        survival_2.35-6   cacheSweave_0.4-3 stashR_0.3-3
[9] filehash_2.0-1

loaded via a namespace (and not attached):
[1] digest_0.3.1    grid_2.9.2      lattice_0.17-25 nlme_3.1-94
[5] tcltk_2.9.2     tools_2.9.2

References

Per Kragh Andersen, Ørnulf Borgan, Richard D. Gill, and Niels Keiding. Statistical Models Based on Counting Processes. Springer Series in Statistics. Springer-Verlag, 1993.

Mark Berman and T. Rolf Turner. Approximating Point Process Likelihoods with GLIM. Applied Statistics, 41(1):31–38, 1992.

D. R. Brillinger. Maximum likelihood analysis of spike trains of interacting nerve cells. Biol Cybern, 59(3):189–200, 1988a.

D. R. Brillinger, H. L. Bryant, and J. P. Segundo. Identification of synaptic interactions. Biol Cybern, 22(4):213–228, May 1976.

David R. Brillinger. Some Statistical Methods for Random Process Data from Seismology and Neurophysiology. The Annals of Statistics, 16(1):1–54, March 1988b.

E. S. Chornoboy, L. P. Schramm, and A. F. Karr. Maximum likelihood identification of neural point process systems. Biol Cybern, 59(4-5):265–275, 1988.

Chong Gu. Smoothing Spline Anova Models. Springer, 2002.

D. H. Johnson. Point process models of single-neuron discharges. J. Computational Neuroscience, 3(4):275–299, 1996.

Petra Kuhnert and Bill Venables. An Introduction to R: Software for Statistical Modelling & Computing. Technical report, CSIRO Mathematical and Information Sciences, 2005. Available from: http://www.csiro.au/resources/Rcoursenotes.html.

Yosihiko Ogata. Statistical Models for Earthquake Occurrences and Residual Analysis for Point Processes. Journal of the American Statistical Association, 83(401):9–27, 1988.

Christophe Pouzat and Antoine Chaffiol. Automatic Spike Train Analysis and Report Generation. An Implementation with R, R2HTML and STAR. J Neurosci Methods, in press, 2009. Pre-print distributed with the STAR package: http://cran.at.r-project.org/web/packages/STAR/index.html.

Christophe Pouzat and Antoine Chaffiol. On Goodness of Fit Tests For Models of Neuronal Spike Trains Considered as Counting Processes. Available at: http://www.biomedicale.univ-paris5.fr/physcerv/C_Pouzat/OnCPtests.html.

Anthony Rossini, Luke Tierney, and Na Li. Simple Parallel Statistical Computing in R. UW Biostatistics Working Paper Series 193, University of Washington, 2003. Available from: http://www.bepress.com/uwbiostat/paper193/.

Jonathan Touboul and Olivier Faugeras. The spikes trains probability distributions: A stochastic calculus approach. Journal of Physiology, Paris, 101(1–3):78–98, June 2007. Preprint available at: https://hal.inria.fr/inria-00156557.

List of Figures

1  The spike train plot of neuron 1 of data set e060824spont.
2  Renewal test plot of neuron 1 of data set e060824spont.
3  Cross-intensity plot and Cross correlation histogram between neuron 1 and 2 of data set e060824spont.
4  The partial autocorrelation function of neuron 1 of data set e060824spont.
5  Empirical cumulative distribution function of lN.1 and i1.
6  Empirical cumulative distribution function of e1t and i1t.
7  Time evolution of event and i1t. A box filter of 0.5 s has been applied.
8  Ogata's tests battery applied to the time transformed second half of neuron 1 spike train (from data set: e060824spont) using model GF1e fitted on the first half. See Pouzat and Chaffiol for a description of the plots.
9  Ogata's tests battery applied to the time transformed first half of neuron 1 spike train (from data set: e060824spont) using model GF1l fitted on the second half.
10 Comparison of fits performed sequentially and in parallel on the spike train of neuron 1 of data set e060824spont.
11 Ogata's tests battery applied to the time transformed neuron 1 spike train (from data set: e060824spont) using model GF2. Left column, fit first half, test second half (compare with Fig. 8). Right column, fit second half, test first (compare with Fig. 9). The first tests (upper right) of Fig. 8 and 9 are not shown here but are "passed" (i.e., within the confidence bands).
12 Effects of the 3 terms of the selected model for neuron 1 of data set e060824spont. The abscissa scale corresponds to the percentiles of the variables. The ordinate logit scales are directly comparable. On the third plot the estimated interaction term is displayed as red contours while the estimated standard error is displayed as dotted black contours.
13 Terms e1t (upper row) and i1t (lower row) with a probability scale (left column) and with a native scale (right column).
14 Examples of image (upper right), contour (upper row) and persp (bottom row) methods for quickPredict objects.
15 The intensity process of the two considered models, with (red, continuous) and without (blue, dashed) interactions between the elapsed time since the last spike and the last isi. The first half (≤ 29.5 s) of the data set (e060824spont) was fitted with both models.
16 Ogata's tests battery applied to the time transformed neuron 1 spike train (from data set: e060824spont) using model GFb, that is no interaction between the two independent variables, lN.1 and i1, which are expressed on their native scales. Left column, fit first half, test second half. Right column, fit second half, test first. Compare with Fig. 11. The first tests (upper right) of Fig. 8 and 9 are not shown here but are "passed" (i.e., within the confidence bands).
17 Spike train plots of neuron 1 from data set e060817spont.
18 Cross-intensity plot and Cross correlation histogram between neuron 1 and 2 of data set e060817spont.
19 Empirical cumulative distribution functions of e1t and e2t of data frame DFB built with data set: e060817spont.
20 Time evolution of event and e2t of data frame DFB built with data set: e060817spont. A box filter of 0.5 s has been applied.
21 Ogata's test series applied to the three considered models: event ∼ e1t (left column), event ∼ e1t + e2t (central column), event ∼ e1t ∗ e2t (right column). Data set: e060817spont, neuron 1. Fit done on first half and tests done on second.
22 The three terms of model: event ∼ e1t ∗ e2t on the usual logit scale. The SE contours of the third term are drawn in black. The shape of the first term is very atypical for the data sets of the STAR package.
23 The intensity process of the three considered models for the data of neuron 1 of the e060817spont data set. The first half (≤ 29.25 s) of the data set was fitted with each model. The spike train of neuron 1 appears as a raster plot at the bottom of the plot while the train of neuron 2 appears as a raster at the top of the plot.
24 Spike train plots of neuron 2 from data set e060817spont.
25 Cross-intensity plots between neuron 2 and 1 (left) and between neuron 2 and 3 (right) of data set e060817spont.
26 The partial autocorrelation function of neuron 2 of data set e060817spont.
27 Time evolution of event, e1t, e3t, i1t, i2t, i7t and i9t of data frame DFC built with data set: e060817spont. A box filter of 1.0 s has been applied.

49

31 31 33 34

36

37

38 40 40 41

44

Index

%qp%, 22
%tt%, 15
acf, 7, 39
changeScale, 24
clusterApply, 19, 34, 43
clusterEvalQ, 19
complete.cases, 11
conditional intensity function, 1
contour, 22, 24, 26
counting process, 1–3, 15
cross-correlation histogram, 6
cross-intensity plot, 6, 30, 32, 39, 42
cumulative distribution function, 13
data, 2, 30, 39
data frame, 9
filter, 14
filtration, 2
gss, 9, 15, 32, 42
gssanova, 9, 15, 22
hazard function, 1
head, 9–11, 42
history, 1, 2, 6, 32, 42
image, 22, 24, 26
intensity function, 1–3, 6, 7
intensity process, 1, 15, 27, 28, 35, 38
inter spike interval, 7
isi, 7, 9–11, 28, 30, 32, 39, 42
library, 2
list, 32
makeCluster, 17
method, 15, 16, 22, 24, 26
mkGLMdf, 9, 32, 42
mkM2U, 13, 32, 42
object, 2, 9, 15, 22, 24, 26, 32
partial autocorrelation function, 7, 8, 30, 39, 41
penalized likelihood, 9
persp, 22, 24, 26
plot, 16, 22, 24
point process, 19
Poisson process
    homogenous, 3, 7, 15
predictLogProb, 20, 22
probabilistic approximation, 9
quickPredict, 22, 24, 26
R, 2, 11, 14, 17, 45
    acf, 7, 39
    complete.cases, 11
    contour, 22, 24, 26
    data, 2, 30, 39
    filter, 14
    head, 9–11, 42
    image, 22, 24, 26
    library, 2
    list, 32
    method, 15, 16, 22, 24, 26
    object, 2, 9, 15, 22, 24, 26, 32
    persp, 22, 24, 26
    plot, 16, 22, 24
    sessionInfo, 45
    summary, 2, 9, 15, 30, 39
    tail, 10
    ts, 14
raster plot, 3
renewal process
    homogenous, 3, 6, 7, 30, 39
renewalTestPlot, 3, 30, 39
reportHTML, 3, 6, 30, 39
sessionInfo, 45
snow, 17, 19
    clusterApply, 19, 34, 43
    clusterEvalQ, 19
    makeCluster, 17
    stopCluster, 17
spike train plot, 3, 7, 30, 31, 39, 40
spikeTrain, 2, 3, 9, 32
STAR, 1, 2, 9, 10, 13, 15, 20, 22, 24, 37
    %qp%, 22
    %tt%, 15
    changeScale, 24
    isi, 10, 11
    mkGLMdf, 9, 32, 42
    mkM2U, 13, 32, 42
    predictLogProb, 20, 22
    quickPredict, 22, 24, 26
    renewalTestPlot, 3, 30, 39
    reportHTML, 3, 6, 30, 39
    spikeTrain, 2, 3, 9, 32
    stopCluster, 17
summary, 2, 9, 15, 30, 39
tail, 10
time transformation, 15
ts, 14