Species Identification using MALDIquant Sebastian Gibb∗ and Korbinian Strimmer † June 8, 2015

Abstract This vignette describes how to use MALDIquant for species identification.

Contents 1

Foreword

3

2

Other vignettes

3

3

Setup

3

4

Dataset

4

5

Analysis 5.1 Import Raw Data . . . . . . . . 5.2 Quality Control . . . . . . . . . 5.3 Transformation and Smoothing 5.4 Baseline Correction . . . . . . . 5.5 Intensity Calibration . . . . . . 5.6 Alignment . . . . . . . . . . . . 5.7 Peak Detection . . . . . . . . . . 5.8 Post Processing . . . . . . . . . ∗ [email protected][email protected]

1

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

. . . . . . . .

4 4 5 7 8 10 10 11 13

5.9 5.10 5.11 5.12 5.13 6

Clustering . . . . . . . . . . . . . . . . . . Diagonal Discriminant Analysis . . . . . . Linear Discriminant Analysis . . . . . . . Variable Selection using Cross-Validation Summary . . . . . . . . . . . . . . . . . . .

Session Information

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. 14 . 15 . 17 . 18 . 21 21

2

1

Foreword

MALDIquant is free and open source software for the R (R Core Team, 2014) environment and under active development. If you use it, please support the project by citing it in publications: Gibb, S. and Strimmer, K. (2012). MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics, 28(17):2270–2271 If you have any questions, bugs, or suggestions do not hesitate to contact me ([email protected]). Please visit http://strimmerlab.org/software/maldiquant/.

2

Other vignettes

Please have a look at our other vignettes on https://github.com/sgibb/ MALDIquantExamples: • MALDIquant Introduction — a general introduction how to analyze mass spectrometry data using MALDIquant. • MALDIquantForeign Introduction — a general introduction how to import/export data using MALDIquantForeign. • Analysis of Fiedler et al. 2009 — a guidance to analyse the serum profile MALDI-TOF data described in Fiedler et al. (2009). • Bacterial Species Determination — a guidance to determine different species based on their MALDI-TOF spectra. • Mass Spectrometry Imaging — a guidance how to analyse mass spectrometry imaging data using MALDIquant.

3

Setup

Before any analysis we need to install the necessary packages (you can skip this part if you have already done this). You can install MALDIquant (Gibb 3

and Strimmer, 2012), MALDIquantForeign (Gibb, 2014), sda (Ahdesm¨aki and Strimmer, 2010) and crossval (Strimmer, 2014) directly from CRAN. To install this data package from http://github.com/sgibb/MALDIquantExamples you need the devtools (Wickham and Chang, 2014) package. install.packages(c("MALDIquant", "MALDIquantForeign", "sda", "crossval", "devtools")) library("devtools") install_github("sgibb/MALDIquantExamples")

4

Dataset

The dataset we use in this vignette was kindly provided by Dr. Bryan R. Thoma ([email protected]). It contains spectra of four different bacteria species. Each species is represented by eight individual samples and each sample has three technical replicates.

5

Analysis

First we have to load the packages. ## the main MALDIquant package library("MALDIquant") ## the import/export routines for MALDIquant library("MALDIquantForeign") ## example data library("MALDIquantExamples")

5.1

Import Raw Data

We use the getPathSpecies function to get the correct local file path to the spectra.

4

spectra <- import(getPathSpecies(), verbose=FALSE) We do a basic quality control and test whether all spectra contain the same number of data points and are not empty.

5.2

Quality Control

table(sapply(spectra, length))

20882 96 any(sapply(spectra, isEmpty)) [1] FALSE all(sapply(spectra, isRegular)) [1] TRUE Subsequently we ensure that all spectra have the same mass range. spectra <- trim(spectra) Finally we draw some plots and inspect the spectra visually. idx <- sample(length(spectra), size=2) plot(spectra[[idx[1]]])

5

15000 0

5000

intensity

species2.E12

5000

10000

15000

20000

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species2/0_E12/1/1SLin/fid

plot(spectra[[idx[2]]])

10000 5000 0

intensity

15000

species4.G10

5000

10000

15000

20000

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species4/0_G10/3/1SLin/fid

6

5.3

Transformation and Smoothing

We apply the square root transformation to simplify graphical visualization and to overcome the potential dependency of the variance from the mean. spectra <- transformIntensity(spectra, method="sqrt") In the next step we want to smooth our spectra with the Savitzky-GolayFilter (Savitzky and Golay, 1964). According to Bromba and Ziegler (1981) the best halfWindowSize should be smaller than the FWHM (full width at half maximum) of the peaks. We add the argument type="b" to the plot command to show both lines and data points in our plots. We count the data points in a few different regions of some spectra to estimate the average FWHM (of course this is not the most sophisticated method). In the figure below we consider all points above the dashed blue line and get a FWHM around 10-12 data points. We choose halfWindowSize=10. plot(spectra[[1]], type="b", xlim=c(2235.3, 2252.0), ylim=c(45, 100)) abline(h=72, col=4, lty=2)

100

species1.F10

90

● ● ●





80





70

● ● ● ●



60

intensity

● ●

2235

● ● ●

50

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●

2240

2245

● ● ●











● ● ● ●

2250

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species1/0_F10/1/1SLin/fid

7



plot(spectra[[1]], type="b", xlim=c(11220, 11250), ylim=c(24, 40)) abline(h=32, col=4, lty=2)

40

species1.F10



● ●

● ●

● ●

● ●

● ●

● ●



30

intensity

35



● ● ● ● ●















● ●



● ● ● ● ● ●





25

● ●

11220

11225

11230

11235

11240

11245

11250

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species1/0_F10/1/1SLin/fid

Afterwards we apply a 21 (2*halfWindowSize+1) point Savitzky-GolayFilter (Savitzky and Golay, 1964) to smooth the spectra. spectra <- smoothIntensity(spectra, method="SavitzkyGolay", halfWindowSize=10)

5.4

Baseline Correction

Matrix effects and chemical noise results in some background noise. That’s why we have to apply a baseline correction. In this example we use the SNIP algorithm (Ryan et al., 1988) to correct the baseline. Similar to the problem of the halfWindowSize in section 5.3 we need to choose a halfWindowSize respectively number of iterations for the baseline correction algorithm as well. The baseline should be flexible enough to follow trends but must not reduce the high of the peaks. We simply try a few different numbers of iterations. 8

## define iteration steps: 25, 50, ..., 100 iterations <- seq(from=25, to=100, by=25) ## define different colors for each step col <- rainbow(length(iterations)) plot(spectra[[1]], xlim=c(2000, 12000)) ## draw different baseline estimates for (i in seq(along=iterations)) { baseline <- estimateBaseline(spectra[[1]], method="SNIP", iterations=iterations[i]) lines(baseline, col=col[i], lwd=2) } legend("topright", legend=iterations, col=col, lwd=1)

50

100

25 50 75 100

0

intensity

150

species1.F10

2000

4000

6000

8000

10000

12000

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species1/0_F10/1/1SLin/fid

25 iterations are already very flexible but 50 is not flexible enough and the height of the peaks is not reduced very much. So we choose iterations=25 for the baseline removal.

9

spectra <- removeBaseline(spectra, method="SNIP", iterations=25) plot(spectra[[1]])

60 40 0

20

intensity

80

species1.F10

5000

10000

15000

20000

mass /tmp/RtmpL5Hq1L/MALDIquantForeign_uncompress/spectra_1b8b305a7690/species1/0_F10/1/1SLin/fid

5.5

Intensity Calibration

We perform the Total-Ion-Current-calibration (TIC; often called normalization) to equalize the intensities across spectra. spectra <- calibrateIntensity(spectra, method="TIC")

5.6

Alignment

Next we need to (re)calibrate the mass values. Our alignment procedure is a peak based warping algorithm. MALDIquant offers alignSpectra as a wrapper around more complicated functions. If you need a finer control or want to investigate the impact of different parameters please use determineWarpingFunctions instead (see ?determineWarpingFunctions for details). 10

spectra <- alignSpectra(spectra) We want to average the technical replicates before we are looking for peaks. Our spectra are recorded thrice for each spot. That’s why we average each spot. We get the spot information using the metaData method. metaData(spectra[[1]])$spot [1] "F10" We collect all spots with a sapply call (to loop over all spectra) and use this information to create our average spectra. Because some species are measured in different runs on the same spot location we also add the species name to average only corresponding spectra. spots <- sapply(spectra, function(x)metaData(x)$spot) species <- sapply(spectra, function(x)metaData(x)$sampleName) head(spots) [1] "F10" "F10" "F10" "F11" "F11" "F11" head(species) [1] "species1" "species1" "species1" "species1" "species1" "species1"

avgSpectra
5.7

Peak Detection

The peak detection is the crucial feature reduction step. Before performing the peak detection we need to estimate the noise of some spectra to get a feeling for the signal-to-noise ratio (SNR). We use a similar approach as in section 5.4.

11

## define snrs steps: 1, 1.5, ... 2.5 snrs <- seq(from=1, to=2.5, by=0.5) ## define different colors for each step col <- rainbow(length(snrs)) ## estimate noise noise <- estimateNoise(avgSpectra[[1]], method="SuperSmoother") plot(avgSpectra[[1]], xlim=c(6000, 16000), ylim=c(0, 0.0016)) for (i in seq(along=snrs)) { lines(noise[, "mass"], noise[, "intensity"]*snrs[i], col=col[i], lwd=2) } legend("topright", legend=snrs, col=col, lwd=1)

0.0005

0.0010

1 1.5 2 2.5

0.0000

intensity

0.0015

species1.F10

6000

8000

10000

12000

14000

16000

mass averaged spectrum composed of 3 MassSpectrum objects

2 or 2.5 look like a good compromise between sensitivity and specificity. We prefer a higher sensitivity and choose a SNR of 2 (blue line) for the peak 12

detection. For the halfWindowSize we use a similar value as determined in section 5.3. peaks <- detectPeaks(avgSpectra, SNR=2, halfWindowSize=10)

plot(avgSpectra[[1]], xlim=c(6000, 16000), ylim=c(0, 0.0016)) points(peaks[[1]], col="red", pch=4)

0.0010 0.0005 0.0000

intensity

0.0015

species1.F10

6000

8000

10000

12000

14000

16000

mass averaged spectrum composed of 3 MassSpectrum objects

5.8

Post Processing

After the alignment the peak positions (mass) are very similar but not identical. The binning is needed to make similar peak mass values identical. peaks <- binPeaks(peaks) We chose a very low signal-to-noise ratio to keep as much features as possible. To remove some false positive peaks we remove peaks that appear in less than 25 % (because we have four groups) of all spectra.

13

peaks <- filterPeaks(peaks, minFrequency=0.25) Finally we create the feature matrix and label the rows with the corresponding species and spot name. We need to recollect both information because we reduce the number of spectra in the average step (see section 5.6). spots <- sapply(avgSpectra, function(x)metaData(x)$spot) species <- sapply(avgSpectra, function(x)metaData(x)$sampleName) species <- factor(species) # convert to factor # (needed later in crossval) featureMatrix <- intensityMatrix(peaks, avgSpectra) rownames(featureMatrix) <- paste(species, spots, sep=".")

5.9

Clustering

Now we use the pvclust package (Suzuki and Shimodaira, 2011) to apply a hierarchical clustering analysis with bootstrapping. library("pvclust") pv <- pvclust(t(featureMatrix), method.hclust="ward.D2", method.dist="euclidean") Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap Bootstrap

(r (r (r (r (r (r (r (r (r (r

= = = = = = = = = =

0.5)... 0.6)... 0.7)... 0.8)... 0.9)... 1.0)... 1.1)... 1.2)... 1.3)... 1.4)...

Done. Done. Done. Done. Done. Done. Done. Done. Done. Done.

plot(pv, print.num=FALSE) 14

Cluster dendrogram with AU/BP values (%) au bp

99 98 100 100 100 100 100 100 74 41 59 91 85 10085100 96 95 59 5489 65 7969 82 32 95 95 94 76 59 98 92 76 41 96 78 100 100 33 100 100 100 70 55 58 34 72 43 99 99 82727457 85 75

species1.F9 species1.G2 species1.F8 species1.G1 species1.F10 species1.F11 species1.F12 species1.F7 species3.F11 species3.F6 species3.F10 species3.F5 species3.F12 species3.F7 species3.F8 species3.F9 species2.E11 species2.E12 species2.F1 species2.F3 species2.F4 species2.F2 species2.F5 species2.F6 species4.H1 species4.G10 species4.G12 species4.H2 species4.H3 species4.H4 species4.G11 species4.G9

0.000

Height

0.010

85 52

Distance: euclidean Cluster method: ward.D2

5.10

Diagonal Discriminant Analysis

We finish our analysis using the diagonal discriminant analysis (DDA) function of sda (Ahdesm¨aki and Strimmer, 2010) to find the peaks that are typical for a specific species. library("sda") ddar <- sda.ranking(Xtrain=featureMatrix, L=species, fdr=FALSE, diagonal=TRUE) Computing t-scores (centroid vs. pooled mean) for feature ranking Number of variables: 448 Number of observations: 32 Number of classes: 4 Estimating optimal shrinkage intensity lambda.freq (frequencies): 1 15

Estimating variances (pooled across classes) Estimating optimal shrinkage intensity lambda.var (variance vector): 0.1016 plot(ddar) The 40 Top Ranking Features

Features

species1 9509.9082908597 6697.37315531073 6343.12994511205 3894.59552039585 6838.04355706256 6280.55512913356 6820.96931090482 3350.96367268277 6794.36299325359 2164.39516885158 2184.88038176653 7789.26583679527 5261.3491392045 7728.03664629315 2697.87256599023 3863.94267595189 4380.49396811573 5819.36116742754 6323.9030979498 6301.46074215648 4759.62173473773 9527.97987354071 2174.26998999705 3395.1223805425 6264.5008519463 4940.55836351892 5496.86099816506 7083.42884583907 5415.12514002414 2112.05086630122 4995.55770862589 2419.49099057275 5798.27735260874 2202.18490199803 7808.83428234077 6716.64492934104 2596.53680591393 2942.59673116952 5513.54703108012 2737.08275302511

● ● ● ● ● ●

species2 0 50

species3

● ● ● ● ● ●



● ●

● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ●

● ●

● ● ● ● ●



● ● ● ●

● ● ●

● ●

● ●

● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ●





● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●



● ● ●

0





● ● ●

● ● ● ● ● ● ●

species4 0 50

● ● ● ● ●

50



● ● ●

● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ●

0

● ● ● ● ● ● ● ● ● ● ●

50

t−Scores (Centroid vs. Pooled Mean)

In the plot above we could see that the peak m/z 9509 seems to be typical for species2, m/z 6343 for species4 and so on.

16

5.11

Linear Discriminant Analysis

We try the linear discriminant analysis (LDA), too (it is part of sda (Ahdesm¨aki and Strimmer, 2010) as well). ldar <- sda.ranking(Xtrain=featureMatrix, L=species, fdr=FALSE, diagonal=FALSE) Computing cat scores (centroid vs. pooled mean) for feature ranking Number of variables: 448 Number of observations: 32 Number of classes: 4 Estimating optimal shrinkage intensity lambda.freq (frequencies): 1 Estimating variances (pooled across classes) Estimating optimal shrinkage intensity lambda.var (variance vector): 0.1016 Computing the square root of the inverse pooled correlation matrix Estimating optimal shrinkage intensity lambda (correlation matrix): 0.4737 plot(ldar)

17

The 40 Top Ranking Features

Features

species1 9509.9082908597 6697.37315531073 6343.12994511205 3894.59552039585 6838.04355706256 3350.96367268277 6280.55512913356 2164.39516885158 6820.96931090482 6794.36299325359 5261.3491392045 7789.26583679527 2184.88038176653 3863.94267595189 4759.62173473773 2174.26998999705 5819.36116742754 2697.87256599023 4380.49396811573 3395.1223805425 7083.42884583907 6264.5008519463 7728.03664629315 5496.86099816506 6323.9030979498 5591.8604864584 6301.46074215648 4940.55836351892 6716.64492934104 9527.97987354071 3541.5940251498 5798.27735260874 2112.05086630122 5415.12514002414 2737.08275302511 2419.49099057275 7808.83428234077 5068.41107088429 5887.82754404096 2202.18490199803

species2 species3 −50 50 150

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ●



● ● ● ● ●

● ● ● ● ● ●



● ● ●

● ● ● ●

● ●

● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ●

−50

50

150

species4 −50 50 150

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

−50

50

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

150

Correlation−Adjusted t−Scores (Centroid vs. Pooled Mean)

5.12

Variable Selection using Cross-Validation

In this section we want to apply cross-validation to find out, how many peaks and which ones we need to discriminate between the species. We use the package crossval (Strimmer, 2014). This package provides the crossval function which needs a specific prediction function. The prediction function combines the model creation, the prediction and the comparison between the true and the predicted results. 18

library("crossval") predfun <- function(Xtrain, Ytrain, Xtest, Ytest, numVars, diagonal=FALSE) { # estimate ranking and determine the best numVars variables ra <- sda.ranking(Xtrain, Ytrain, verbose=FALSE, diagonal=diagonal, fdr=FALSE) selVars <- ra[,"idx"][1:numVars] # fit and predict sda.out <- sda(Xtrain[, selVars, drop=FALSE], Ytrain, diagonal=diagonal, verbose=FALSE) ynew <- predict(sda.out, Xtest[, selVars, drop=FALSE], verbose=FALSE)$class # compute accuracy acc <- mean(Ytest == ynew) return(acc) } We want to repeat the cross-validation 20 times and use 5 folds. K <- 5 # number of folds B <- 20 # number of repetitions To test our cross-validation setup we want to determine the performance of DDA using the top 10 features (peaks) ranked by t scores. set.seed(12345) cv.dda10 <- crossval(predfun, X=featureMatrix, Y=species, K=K, B=B, numVars=10, diagonal=FALSE, verbose=FALSE) cv.dda10$stat [1] 1 19

In the next step we look for the optimal number of peaks (which is more interesting than calculating the performance for the top 10 features). We calculate the performance of the top 1-15 (and all features) in a similar way as the top 10 features in the example above. npeaks <- c(1:15, ncol(featureMatrix))

# number of peaks

First we use DDA. # estimate accuracy for DDA set.seed(12345) cvsim.dda <- sapply(npeaks, function(i) { cv <- crossval(predfun, X=featureMatrix, Y=species, K=K, B=B, numVars=i, diagonal=TRUE, verbose=FALSE) return(cv$stat) }) The same using LDA (the only difference is diagonal=FALSE). # estimate accuracy for LDA set.seed(12345) cvsim.lda <- sapply(npeaks, function(i) { cv <- crossval(predfun, X=featureMatrix, Y=species, K=K, B=B, numVars=i, diagonal=FALSE, verbose=FALSE) return(cv$stat) }) We combine the results and put them into a table. result.sim <- cbind(nPeaks=npeaks, "DDA-ACC"=cvsim.dda, "LDA-ACC"=cvsim.lda) We find out that LDA and DDA perform very similar and we need only 9 respectively 10 features (peaks) for a perfect discrimination of the species. 20

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

5.13

nPeaks DDA-ACC LDA-ACC 1 0.772 0.772 2 0.877 0.889 3 0.901 0.897 4 0.911 0.922 5 0.962 0.970 6 0.967 0.992 7 0.975 0.988 8 0.984 0.991 9 0.997 1.000 10 1.000 1.000 11 1.000 1.000 12 1.000 1.000 13 1.000 1.000 14 1.000 1.000 15 1.000 1.000 448 1.000 1.000

Summary

We have shown how to identify species based on MALDI spectra using MALDIquant and pvclust. Additionaly we performed a variable selection using sda and crossval to find the minimal number of peaks for a perfect discriminant.

6

Session Information • R version 3.2.0 (2015-04-16), x86_64-pc-linux-gnu • Base packages: base, datasets, grDevices, graphics, methods, stats, utils • Other packages: MALDIquant 1.11.15, MALDIquantExamples 0.4, MALDIquantForeign 0.9.11, corpcor 1.6.7, crossval 1.0.2, entropy 1.2.1, fdrtool 1.2.14, knitr 1.10.5, pvclust 1.3-2, sda 1.3.6, xtable 1.7-4

21

• Loaded via a namespace (and not attached): XML 3.98-1.2, base64enc 0.1-2, digest 0.6.8, downloader 0.3, evaluate 0.7, formatR 1.2, highr 0.5, magrittr 1.5, parallel 3.2.0, readBrukerFlexData 1.8.2, readMzXmlData 2.8, stringi 0.4-1, stringr 1.0.0, tools 3.2.0

References Ahdesm¨aki, M. and Strimmer, K. (2010). Feature selection in omics prediction problems using cat scores and false nondiscovery rate control. The Annals of Applied Statistics, 4(1):503–519. Bromba, M. U. A. and Ziegler, H. (1981). Application hints for savitzkygolay digital smoothing filters. Analytical Chemistry, 53(11):1583–1586. Fiedler, G. M., Leichtle, A. B., Kase, J., Baumann, S., Ceglarek, U., Felix, K., ¨ Conrad, T., Witzigmann, H., Weimann, A., Schtte, C., Hauss, J., Buchler, M., and Thiery, J. (2009). Serum peptidome profiling revealed platelet factor 4 as a potential discriminating peptide associated with pancreatic cancer. Clinical Cancer Research, 15:3812–3819. Gibb, S. (2014). MALDIquantForeign: Import/Export routines for MALDIquant. R package version 0.7. Gibb, S. and Strimmer, K. (2012). MALDIquant: a versatile R package for the analysis of mass spectrometry data. Bioinformatics, 28(17):2270–2271. R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Ryan, C. G., Clayton, E., Griffin, W. L., Sie, S. H., and Cousens, D. R. (1988). SNIP, a statistics-sensitive background treatment for the quantitative analysis of PIXE spectra in geoscience applications. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 34:396–402. Savitzky, A. and Golay, M. J. E. (1964). Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Analytical Chemistry, 36:1627–1639. 22

Strimmer, K. (2014). crossval: Generic Functions for Cross Validation. R package version 1.0.0. Suzuki, R. and Shimodaira, H. (2011). pvclust: Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling. R package version 1.2-2. Wickham, H. and Chang, W. (2014). devtools: Tools to make developing R code easier. R package version 1.5.

23

Species Identification using MALDIquant - GitHub

Jun 8, 2015 - Contents. 1 Foreword. 3. 2 Other vignettes. 3. 3 Setup. 3. 4 Dataset. 4. 5 Analysis. 4 .... [1] "F10". We collect all spots with a sapply call (to loop over all spectra) and ..... similar way as the top 10 features in the example above.

991KB Sizes 8 Downloads 399 Views

Recommend Documents

Automatic Bird Species Identification for Large Number of Species
is important to obtain reliable information about the popu- lation of wild animals. .... In our digital era, the analog signal is sampled, several times per second, and ...

DNA barcoding for species Identification in prepared ...
associations are focusing on food safety and quality. Species identification is a major concern due to the increased awareness among consumers regarding the composition of foods and the need to verify labeling statements. The increasing demand for fi

Using py-aspio - GitHub
Jan 17, 2017 - Load ASP program and input/output specifications from file ..... Programs can be loaded from files or strings: .... AI Magazine, 37(3):53–68.

Using FeatureExtraction - GitHub
Oct 27, 2017 - Venous thrombosis. 3. 20-24. 2. Medical history: Neoplasms. 25-29. 4. Hematologic neoplasm. 1. 30-34. 6. Malignant lymphoma. 0. 35-39. 7. Malignant neoplasm of anorectum. 0. 40-44. 9. Malignant neoplastic disease. 6. 45-49. 11. Maligna

Using SqlRender - GitHub
6. 3.5 Case sensitivity in string comparisons . .... would have had no clue the two fields were strings, and would incorrectly leave the plus sign. Another clue that.

speaker identification and verification using eigenvoices
approach, in which client and test speaker models are confined to a low-dimensional linear ... 100 client speakers for a high-security application, 60 seconds or more of ..... the development of more robust eigenspace training techniques. 5.

Sparse-parametric writer identification using ...
grated in operational systems: 1) automatic feature extrac- tion from a ... 1This database has been collected with the help of a grant from the. Dutch Forensic ...

Sparse-parametric writer identification using heterogeneous feature ...
Retrieval yielding a hit list, in this case of suspect documents, given a query in the form .... tributed to our data set by each of the two subjects. f6:ЮаЯвбЗbзбйb£ ...

Identification Using Stability Restrictions
algebra and higher computational intensity, due to the increase in the dimen- sion of the parameter space. Alternatively, this assumption can be motivated.

LANGUAGE IDENTIFICATION USING A COMBINED ...
over the baseline system. Finally, the proposed articulatory language. ID system is combined with a PPRLM (parallel phone recognition language model) system ...

Multipath Medium Identification Using Efficient ...
proposed method leads to perfect recovery of the multipath delays from samples of the channel output at the .... We discuss this connection in more detail in the ...

Sparse-parametric writer identification using heterogeneous feature ...
The application domain precludes the use ... Forensic writer search is similar to Information ... simple nearest-neighbour search is a viable so- .... more, given that a vector of ranks will be denoted by ╔, assume the availability of a rank operat

Sparse-parametric writer identification using ...
f3:HrunW, PDF of horizontal run lengths in background pixels Run lengths are determined on the bi- narized image taking into consideration either the black pixels cor- responding to the ink trace width distribution or the white pixels corresponding t

SPEAKER IDENTIFICATION IMPROVEMENT USING ...
Air Force Research Laboratory/IFEC,. 32 Brooks Rd. Rome NY 13441-4514 .... Fifth, the standard error for the percent correct is zero as compared with for all frames condition. Therefore, it can be concluded that using only usable speech improves the

Electromagnetic field identification using artificial neural ... - CiteSeerX
resistive load was used, as the IEC defines. This resistive load (Pellegrini target MD 101) was designed to measure discharge currents by ESD events on the ...

Introduction to phylogenetics using - GitHub
Oct 6, 2016 - 2.2 Building trees . ... Limitations: no model comparison (can't test for the 'best' tree, or the 'best' model of evolution); may be .... more efficient data reduction can be achieved using the bit-level coding of polymorphic sites ....

Instructions for using FALCON - GitHub
Jul 11, 2014 - College of Life and Environmental Sciences, University of Exeter, ... used in FALCON is also available (see FALCON_Manuscript.pdf. ). ... couraged to read the accompanying technical document to learn ... GitHub is an online repository

Gene Identification via Phenotype Sequencing Version 1.5 ... - GitHub
1. Gene Identification via Phenotype Sequencing. Version 1.5. User manual. Zhu Z, Wang WT, Zhu JH, and Chen X. 2015-08-01 ...

Using bird species community occurrence to ... - Wiley Online Library
point counts in a 1560 km2 study area, remote-sensed data and models incorporating imperfect ... restore old-growth conditions and communities (Jönsson et al.

Electromagnetic field identification using artificial neural ...
National Technical University of Athens, 9 Iroon Politechniou Str., 157 80 Athens. 4. National ..... Trigg, Clinical decision support systems for intensive care units: ...

Character Identification in Movie Using Movie Script - IJRIT
M.Tech Student, Department of Computer Science and Engineering ... Names for the clusters are then manually selected from the cast list. ..... Video and Image processing in multimedia system, Cloud Computing and Biometric systems.

Character Identification in Movie Using Movie Script - IJRIT
IJRIT International Journal of Research in Information Technology, Volume 1, Issue ..... In this paper, the preserved statistic properties are utilized and a proposal to .... processing in multimedia system, Cloud Computing and Biometric systems.