2014 IEEE International Symposium on Multimedia

Automatic Segmentation of Audio Signals for Bird Species Identification

Thiago L. F. Evangelista (1), Thales M. Priolli (1), Carlos N. Silla Jr. (1), Bruno A. Angelico (1), Celso A. A. Kaestner (2)
Federal University of Technology – Paraná
(1) Cornélio Procópio Campus: Av. Alberto Carazzai, 1640 – 86.300-000 – Cornélio Procópio, Paraná, Brazil
(2) Curitiba Campus: Av. 7 de Setembro, 3165 – 80.230-901 – Curitiba, Paraná, Brazil
thi [email protected], thales [email protected], {carlosjunior, bangelico, celsokaestner}@utfpr.edu.br

Abstract: The identification of bird species from their recorded songs is nowadays used in several important applications, such as monitoring the quality of the environment and preventing bird-plane collisions near airports. The complete identification cycle involves: (a) recording devices to acquire the songs, (b) audio processing techniques to remove noise and to select the most representative elements of the signal, (c) feature extraction procedures to obtain relevant characteristics, and (d) decision procedures to make the identification. The decision procedures can be obtained by Machine Learning (ML) algorithms, by treating the problem as a standard classification task. One key element in this cycle is the selection of the most relevant segments of the audio for identification purposes. In this paper we show that the use of short audio segments with high amplitude, called pulses in our work, outperforms the use of the complete audio records in the species identification task. We also show how these pulses can be obtained automatically, based on measurements performed directly on the audio signal. The employed classifiers are trained using a previously labeled database of bird songs. We use a database that contains bird song recordings from 75 species which appear in the Southern Atlantic Coast of South America. The results show that the combination of automatically obtained pulses and an SVM classifier produces the best results; all the necessary procedures can be installed in dedicated hardware, allowing the construction of a specific bird identification device.

Keywords-Signal Processing; Pattern Recognition; Machine Learning; Bird Species Identification.

I. INTRODUCTION

The interaction between humans and birds occurs in several scenarios. Birds are numerous and easier to detect than other animal species: one can record their songs, which is an indirect and clever way to identify them [5].

One important reason to monitor birds is the control and monitoring of the environment. Some bird species are particularly sensitive to water and air pollution [1], [3], so counting the birds of these species can be used to detect and prevent environmental problems.

Birds are also responsible for several accidents involving planes. The Brazilian Center for Aeronautical Accident Investigation and Prevention (CENIPA) [8] reported that, in two years, a total of 1,321 aerial accidents involving collisions between birds and airplanes occurred in the Brazilian airspace. These events are potentially dangerous: for instance, the collision of a vulture with a commercial plane is equivalent to an impact of seven tons [25]. There is also a financial impact: the losses of the airline companies due to this type of accident are estimated at more than US$ 3 million per year [8]. With more information, authorities can act to eliminate the problem.

The above reasons justify the construction of specific equipment for bird species identification and counting. Using recordings of the songs produced by birds, the identification task can be done using Signal Processing (SP) techniques [2], [20] and Machine Learning (ML) algorithms [12], [26], [36].

An automatic bird species identification system requires:
1) a recording device to obtain the audio signal of the bird song in the field;
2) audio processing techniques to improve the signal quality, since the recordings are made in noisy conditions (groves and forests for environmental monitoring, or urban areas in the case of bird control near airports);
3) the selection of the most representative audio segments for the identification task;
4) the extraction of a set of discriminative features from the selected audio segments;
5) the use of the obtained feature set as input to a decision procedure that predicts the species of the bird that produced the song.

In the supervised learning framework, the decision procedure employed in the last step is obtained as the output of an ML classification algorithm. In general, ML algorithms require a database of recorded songs, previously labeled with the bird species by an expert, in order to train the classifier [26]. Adequate preprocessing, the selection of the most adequate segments of the audio, and the choice of an adequate feature set and classification algorithm are the key points for obtaining good classification performance [36]. We focus this work mainly on the segmentation step, that is, on the selection of the most representative parts of the audio signal.

In the following, Section II summarizes similar research works and briefly formalizes the automatic bird species identification problem; Section III describes the initial signal processing, the pulse detection and the feature extraction procedures, and indicates the classification algorithm; Section IV outlines the database used in the experiments and presents the results obtained in three identification experiments; finally, Section V presents our conclusions and indicates future research directions.

978-1-4799-4311-1/14 $31.00 © 2014 IEEE 978-0-7695-5437-2/14 DOI 10.1109/ISM.2014.46

II. BIRD SPECIES IDENTIFICATION USING AUDIO SIGNALS

A. Similar Research Works

As mentioned earlier, it is possible to identify a bird species from its recorded song. In fact, this problem has been studied by several researchers.

Kwan et al. [19] propose a bird identification device to detect dangerous birds near airports. The authors employ a Hidden Markov Model and a Gaussian Mixture Model to perform the classification, achieving poor results due to the low signal-to-noise ratio in the airport environment. Somervuo, Härmä and Fagerlund [31] apply SP techniques to the audio input signal. Using sinusoidal modeling and Mel-Frequency Cepstral Coefficients (MFCC), in their best result they obtain an accuracy of 71.3 % when detecting the species of 14 common North-European birds. Vilches et al. [35] apply data mining techniques to the problem: they use the well-known ID3, J4.8, and Naïve Bayes classification algorithms. Their best result, an accuracy of 98.39 % in a database with 154 songs of 3 bird species, was obtained using the J4.8 algorithm and a feature set composed of all the audio characteristics generated by the Sound Ruler audio processing tool [32]. The authors also conclude that the identification of distinctive features is the crucial part of this application.

Chou, Lee and Ni [9] propose the segmentation of the bird songs into syllables, obtained from the frequency spectrum of the signal. Syllables were obtained by the fuzzy C-means clustering method; each bird song is then modeled as a sequence of syllables using a Hidden Markov Model. In a database of 420 bird species they obtain a recognition rate of 78 %. Fagerlund [13] employs a series of Support Vector Machine (SVM) classifiers to perform bird identification. The classifiers were organized in a tree-like structure, where each node is used to separate two classes. Two feature sets were tested: the MFCC coefficients and a set of low-level signal parameters. The best accuracy obtained was 98 % in a database with 8 bird species. Cai et al. [6] use several feature sets and a neural network framework to investigate the problem. The proposed neural network architecture considers the dynamic nature of the audio signal; noise reduction algorithms were also applied to the signals. Their best accuracy was 85.6 % in a database which contains songs from 14 bird species, using a feature set composed of the MFCC coefficients.

Chou, Liu and Cai [10] propose an enhanced syllable segmentation method, based on the Rabiner and Sambur endpoint detection method. This method is combined with an MFCC-based feature vector to deal simultaneously with syllable detection and birdsong recognition. They use songs from a commercial CD with bird songs recorded in nature from 420 bird species. Their best accuracy was 73.19 %, using a backpropagation neural network. Lee, Han and Chuang [21] split the original audio signals into syllable segments, considered as the basic recognition unit. Then MFCC coefficients are computed, and a Gaussian Mixture Model and Vector Quantization are employed to find the most appropriate number of components and clusters for each species. The best result was an accuracy of 84 % for a database composed of 28 bird species.

Briggs et al. [4] employ a probabilistic framework to compute the audio features and a Bayes classifier that minimizes risk; the authors also show that this approach obtains results similar to a nearest-neighbor classifier in which the Kullback-Leibler divergence is used to compare histograms of features; the proposed classifiers achieve over 90 % accuracy on a dataset that contains 6 species of birds. Chou and Liu [11] use wavelet transformations to identify sections in the bird songs; for each section the first five MFCC coefficients are computed, and a neural network classifier is used in the species identification; they achieve an accuracy of 73.41 % in a database composed of 420 bird species. Graciarena et al. [16] use an unsupervised approach to obtain approximate note models from acoustic features; these models are used to create a recognition system that employs an n-gram statistical model developed for speaker recognition applications. In a database of 9 bird species they obtain an accuracy of 83.5 %. Neal et al. [27] employ a time-frequency audio segmentation performed by a Random Forest classifier to extract syllables of the bird songs; their method isolates 93.6 % of the acoustic energy of bird song, with a false positive rate of 8.6 %, using a test dataset of 625 field audio records.

Lopes et al. [22] use three feature sets and several ML algorithms in the bird species identification problem; their goal was to evaluate the various combinations of feature sets and classifiers; the best result was obtained using short intervals of the audio records, the MARSYAS feature set [24] and a Multi-Layer Perceptron. In a database composed of 101 audio records of 3 species, similar to the one used in [35], they obtain an F-measure (harmonic mean of precision and recall) of 99.7 %. The paper of Lopes et al. [23] uses a larger database that contains 1,619 bird song recordings of 73 bird species recorded in a specific geographic region; the main objective of this work was to evaluate the performance that can be achieved when dealing with a great number of bird species. The paper shows that the F-measure drops from 95.1 % to 78.2 % when passing from 3 to 20 species, in the best results obtained by an SVM classifier. Kiapuchinski et al. [18] propose a hardware device to perform the automatic identification of birds; the proposed equipment includes audio filtering using a spectral noise gate algorithm, which is shown to improve the quality of the extracted features.

We can identify some common points in the previous research works: (1) the need of strong audio preprocessing

techniques to improve signal quality; (2) the use of signal segmentation to find the most representative parts (syllables, segments or pulses) of the bird songs; (3) the application of several algorithms that treat the identification problem as a classical ML classification task. In this paper we focus on the automatic selection of the most relevant audio parts for classification.

from a recorded signal S. This is a particularly important point in this problem, since the audio records of bird songs usually contain long silent periods and are made in noisy environments.

A. Segmentation

The most representative segments for the identification task, called pulses in our work, are obtained by an automatic segmentation procedure that includes several steps. Initially a Fast Fourier Transform (FFT) is applied to the input signal S in order to obtain its representation in the frequency domain. Then the energy of the signal and its spectral centroid are computed. Finally, median filters and a windowing scheme that smooths the end parts of the signals are employed. The complete decision procedure that detects the bird song pulses can be outlined as follows:
1) two data sequences are computed from the signal spectrum, one for the signal energy and the other for the spectral centroid;
2) a median filter is applied to the corresponding histograms of these sequences;
3) the local maxima of the histograms are computed;
4) from the first two maxima found (X1 and X2) a segment limit is computed by

B. Bird Species Identification as a Classification Problem

Automatic bird species identification using audio signals can be defined as the problem of finding the species of a specific bird from its recorded sounds. Bird sounds can be divided into songs and calls; songs are more melodious and are related to mating, whereas calls are short, transient sounds used in alert situations [7]. Bird songs are considered by experts to be ideal for species identification [7], [35] and are the sounds considered here.

In a digital recording device the sound is stored as a sequence $S = \langle s_1, s_2, \ldots, s_N \rangle$ of numeric values in a convenient scale, obtained by a sampling procedure; this sequence is employed in sound reproduction and for analytical processes [17]. The complete signal $S$ can be segmented and preprocessed in order to select its most representative parts $R_S$. Let us assume that this selection is accomplished by a function $\sigma$, that is, $R_S = \sigma(S)$. Several features can then be extracted from $R_S$: if $\chi_j$ is an extraction function and $X_j$ is the corresponding feature domain, it is possible to obtain the feature vector $\bar{X} = \langle x_1, x_2, \ldots, x_m \rangle$ from $R_S$, where each feature $x_j = \chi_j(R_S)$ takes values in $X_j$.

Identification relies on previously known audio records. We assume that a database of bird songs of a finite set $B$ of species, manually labeled by experts, is available. Under this assumption the problem can be stated as a standard ML classification task [12]: given a set of evidences $\bar{X}$ obtained from some parts $R_S$ of the input signal $S$, it is necessary to find the class $\hat{b} \in B$ that most likely indicates the species of the bird that produced $S$.

Using probability theory, the problem can be outlined as follows: (1) from the training database it is possible to estimate how the features occur in each class, that is, the conditional probabilities $P(\bar{X} \mid b)$ (estimated by frequencies) for feature vectors $\bar{X}$ and $b \in B$; (2) the a priori probabilities of the bird species $P(b)$ for $b \in B$ can be estimated by the class frequencies; (3) to classify a new audio record $S$, the segmentation and extraction functions $\sigma$ and $\chi_j$ are employed to compute the vector of evidences $\bar{X}$; (4) using Bayes' theorem, we can obtain the a posteriori probabilities $P(b \mid \bar{X})$ and assign the most probable class to the input by

$$\hat{b} = \arg\max_{b \in B} \frac{P(\bar{X} \mid b)\, P(b)}{P(\bar{X})} \tag{1}$$
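The decision rule above can be illustrated with a minimal Python sketch. It assumes discretized (hashable) feature vectors and estimates both probabilities by simple frequency counts; the function names, the species labels "A" and "B", and the toy feature values are hypothetical and do not come from the paper. Since $P(\bar{X})$ is the same for every class, it is dropped from the comparison.

```python
from collections import Counter

def fit(training):
    """training: list of (feature_vector, species) pairs; vectors must be hashable."""
    class_counts = Counter(species for _, species in training)
    joint_counts = Counter((x, species) for x, species in training)
    return class_counts, joint_counts

def classify(x, class_counts, joint_counts):
    """Return the species b maximizing P(x | b) * P(b), both estimated by frequencies."""
    total = sum(class_counts.values())

    def score(b):
        prior = class_counts[b] / total                      # P(b)
        likelihood = joint_counts[(x, b)] / class_counts[b]  # P(x | b)
        return likelihood * prior                            # P(x) omitted: constant in b

    return max(class_counts, key=score)

# Toy, made-up training data: (discretized features, species label).
training = [
    (("high", "short"), "A"), (("high", "short"), "A"), (("low", "short"), "A"),
    (("low", "long"), "B"), (("low", "long"), "B"), (("high", "short"), "B"),
]
class_counts, joint_counts = fit(training)
print(classify(("high", "short"), class_counts, joint_counts))
```

With continuous features such as those used later in the paper, the frequency estimate of the likelihood would have to be replaced by a density model or by a discriminative classifier; the sketch only mirrors the formulation of the rule.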

$$L = \frac{J \cdot X_1 + X_2}{J + 1} \tag{2}$$

where J is a predefined windowing parameter. The procedure is executed for each window in the two sequences, in order to produce the thresholds L1 and L2 for each signal pulse. In Figure 1 we illustrate the process of obtaining pulses from a song of the Dusky Antbird (Cercomacra tyrannina). We emphasize that the segmentation of bird songs into pulses by the proposed SP procedure is an original contribution of this paper.

B. Feature Set

In order to obtain the feature vectors from the selected pulses we employ the MARSYAS framework [24], [34], a feature extractor that has been used successfully in several audio applications, especially in music applications [14], [15], [22], [23], [28], [29], [34]. As shown by Lopes et al. [22], the MARSYAS feature set has the best performance in the bird identification task when compared to other feature sets. The feature vector produced by MARSYAS is 64-dimensional and includes the first twelve Mel-Frequency Cepstral Coefficients (MFCC) and the means and variances of timbral features calculated in the pulses. The Mel scale is an acoustically defined scale created from a study by Stevens et al. [33] that relates physical frequencies to the frequencies perceived by the human ear; this scale has been used in several applications involving speech and music. The employed timbral features include the centroid of the frequency spectrum; the spectral flux, which estimates the variability of the magnitudes in this spectrum; and the rolloff, which is related

III. SEGMENTATION, FEATURE EXTRACTION AND CLASSIFICATION PROCEDURES

As previously explained, to improve the classification performance it is important to select the most representative parts

Brazilian Coast. The database includes songs of bird species that are common to two main ecosystems: the Atlantic forest, which occurs in the coastal region of Brazil, and the Araucária forest, composed of huge trees that are characteristic of the high-lying plateau of South America. This database was first employed by Lopes et al. [23] and includes 1,619 bird songs from 76 bird species. All bird songs were downloaded from the Xeno-Canto website [37], which contains audio records obtained directly in real environments and, therefore, with no previous preprocessing. The complete methodology employed to create this database is also described in the above-referenced paper.

B. Experiments

In order to evaluate the proposed segmentation, several experiments were conducted, according to three scenarios:
1) the first one employs the bird songs exactly as recorded in the field;
2) the second one uses a manual segmentation step; in this case we manually split each audio signal in the database, which corresponds to a recorded bird song of a specific species, into the short sound intervals where the signal presents higher amplitudes (called pulses, as previously explained);
3) the third scenario also employs pulses, but in this case they are detected automatically using decision procedures that employ signal processing techniques.

The Southern Atlantic Brazilian Coast Database was employed in all the experiments. As explained, it contains 1,619 bird songs from 76 bird species; in the second scenario 10,206 pulses were obtained by the manual segmentation of the database; using the automatic procedure, in the third scenario, 20,203 pulses were obtained.

As indicated, in the experiments we use the Support Vector Machines paradigm, using Platt's Sequential Minimal Optimization (SMO) implementation and the polynomial kernel. Experiments were executed using the WEKA platform and a 10-fold cross-validation procedure.
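The 10-fold cross-validation protocol can be sketched in plain Python as follows. This is not the WEKA implementation used in the experiments: the fold construction (shuffle, then deal indices into k folds) is one simple choice among several, and a nearest-class-mean rule on a single scalar feature stands in for the SVM so that the example stays self-contained. All function names and the toy data are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Overall accuracy: every sample is predicted once by a model that never saw it."""
    correct = 0
    for test_idx in k_fold_indices(len(samples), k, seed):
        held_out = set(test_idx)
        train_x = [samples[i] for i in range(len(samples)) if i not in held_out]
        train_y = [labels[i] for i in range(len(samples)) if i not in held_out]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
    return correct / len(samples)

# Stand-in classifier (NOT an SVM): nearest class mean on one scalar feature.
def train_nearest_mean(xs, ys):
    sums = {}
    for x, y in zip(xs, ys):
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}

def predict_nearest_mean(model, x):
    return min(model, key=lambda y: abs(model[y] - x))

# Toy data: two well-separated hypothetical classes, 20 samples each.
samples = [0.01 * i for i in range(20)] + [5.0 + 0.01 * i for i in range(20)]
labels = ["a"] * 20 + ["b"] * 20
accuracy = cross_validate(samples, labels, train_nearest_mean, predict_nearest_mean)
print(accuracy)
```

Each sample appears in exactly one test fold, so the reported accuracy is computed over the whole database while no model is evaluated on data it was trained on.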
The obtained results are as follows:
• Using the complete audio signals as recorded, the obtained overall accuracy is 29.77 %. This result corresponds to the mean of the accuracies for all classes; recall that the employed database contains bird songs from 76 species. Under the simplifying hypothesis that the classes contain the same number of audio records, the probability of selecting the correct class at random is 1/76, or only 1.32 %. The obtained results also confirm the conclusions presented in [23]: the segmentation of the audio signals into their most representative parts is necessary to improve classification. We recall that in that paper the authors evaluate the effect of the number of classes on the classification performance, and the F-measure drops from 95.1 % to 78.2 % when passing from 3 to 20 bird species.
• Using the manual segmentation of the original bird songs into pulses, which are selected as signal regions with

Fig. 1. A bird song audio of the Dusky Antbird and its segmentation into pulses
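The kind of pulse detection illustrated in Fig. 1 and outlined in Section III-A can be sketched in Python as below. This is only one possible reading of the procedure, under several assumptions that are not in the paper: fixed non-overlapping frames, a naive DFT in place of an FFT, a 10-bin histogram, a width-3 median filter, a single global threshold per sequence (the paper computes thresholds per window), and a mean-value fallback when fewer than two histogram maxima are found. The threshold itself follows Equation (2), L = (J·X1 + X2)/(J + 1).

```python
import cmath
import math
from statistics import median

def frame_features(signal, frame_len):
    """Per-frame energy and spectral centroid (in DFT-bin units)."""
    energies, centroids = [], []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
        # Naive DFT magnitudes for bins 0..frame_len/2; a real system would use an FFT.
        mags = [abs(sum(s * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, s in enumerate(frame)))
                for k in range(frame_len // 2 + 1)]
        total = sum(mags)
        centroids.append(sum(k * m for k, m in enumerate(mags)) / total if total else 0.0)
    return energies, centroids

def median_filter(values, width=3):
    """Median filter with windows truncated at the sequence ends."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1]) for i in range(len(values))]

def threshold(values, J=5, bins=10):
    """Segment limit L = (J*X1 + X2) / (J + 1) from the first two histogram maxima."""
    if not values:
        return 0.0
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    padded = [0] + median_filter(counts) + [0]
    peaks = [lo + (i + 0.5) * width for i in range(bins)
             if padded[i + 1] > padded[i] and padded[i + 1] >= padded[i + 2]]
    if len(peaks) < 2:
        return sum(values) / len(values)  # fallback chosen for this sketch only
    return (J * peaks[0] + peaks[1]) / (J + 1)

def detect_pulses(signal, frame_len=8, J=5):
    """Return (start_sample, end_sample) spans where both features exceed their limits."""
    energies, centroids = frame_features(signal, frame_len)
    l1, l2 = threshold(energies, J), threshold(centroids, J)
    active = [e > l1 and c > l2 for e, c in zip(energies, centroids)]
    pulses, start = [], None
    for i, flag in enumerate(active + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            pulses.append((start * frame_len, i * frame_len))
            start = None
    return pulses

# Synthetic signal: 6 quiet frames, 4 loud tone frames, 6 quiet frames (8 samples each).
quiet = [0.01] * 8
loud = [0.0, 1.0, 0.0, -1.0] * 2
signal = quiet * 6 + loud * 4 + quiet * 6
print(detect_pulses(signal))  # [(48, 80)]
```

On the synthetic signal only the four loud frames exceed both limits, so a single pulse spanning samples 48 to 80 is reported. Real field recordings would need longer, overlapping frames and tuning of J, the histogram resolution and the filter width.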

to the concentration of the magnitudes in frequency ranges; finally, the signal zero crossings are computed in the time domain.

C. Classification Algorithm

In the papers of Lopes et al. [22], [23] several classifiers based on different paradigms were evaluated in order to select the most adequate one for the bird species identification problem. The tested classifiers were the classical probabilistic Naïve Bayes, the instance-based k-nearest neighbors (kNN) with k = 3, the decision tree classifier J4.8, a multi-layer perceptron neural network trained with the back-propagation algorithm with momentum, and the support vector machine (SVM) classifier, with polynomial and Pearson kernels. The SVM classification algorithm achieved the best results, obtaining an F-measure of 99.4 % in a database with 3 species [22]. Lopes et al. [23] also obtained their best result with SVM in the same database that is employed in this work: an F-measure of 89.7 % for 8 bird species. Therefore, the experiments conducted in this work employ the SVM algorithm (SMO implementation, using the Pearson kernel).

IV. BIRD SPECIES IDENTIFICATION EXPERIMENTS

A. Database

In the experiments of this paper we employ a database composed of songs from bird species that appear in the Southern Atlantic


TABLE I
ACCURACY OF THE DIFFERENT SEGMENTATION METHODS FOR THE INDIVIDUAL CLASSES (BIRD SPECIES)

Class (Bird Species)             Manual Seg   Automatic Seg
Aegolius harrisii                88.80%       80.70%
Amazilia versicolor              74.80%       24.80%
Amazona pretrei                  66.70%       58.70%
Anthus lutescens                 39.40%       23.20%
Attila rufus                     76.90%       45.60%
Automolus leucophthalmus         67.00%       63.60%
Basileuterus leucoblepharus      61.80%       71.10%
Batara cinerea                   68.60%       47.00%
Brotogeris tirica                55.30%       46.90%
Campephilus robustus             27.90%       03.10%
Camptostoma obsoletum            45.10%       39.40%
Campylorhamphus falcularius      41.60%       31.40%
Certhiaxis cinnamomeus           48.10%       37.40%
Chiroxiphia caudata              66.80%       69.20%
Chlorophanes spiza               50.30%       48.30%
Cichlocolaptes leucophrus        37.50%       38.30%
Clibanornis dendrocolaptoides    51.40%       43.90%
Cnemotriccus fuscatus            28.40%       17.30%
Colaptes campestris              54.80%       43.10%
Colonia colonus                  33.60%       47.10%
Cranioleuca obsoleta             45.90%       32.80%
Crypturellus noctivagus          30.80%       34.60%
Culicivora caudacuta             06.70%       19.70%
Cyanocorax caeruleus             51.80%       57.80%
Drymophila malura                54.20%       52.00%
Dysithamnus mentalis             48.20%       50.50%
Emberizoides ypiranganus         24.00%       28.40%
Gnorimopsar chopi                42.20%       39.60%
Hemitriccus kaempferi            0%           0%
Hemitriccus orbitatus            38.70%       33.10%
Hypoedaleus guttatus             62.80%       54.80%
Lathrotriccus euleri             71.60%       63.20%
Leucochloris albicollis          54.70%       31.60%
Leucopternis polionotus          35.30%       73.90%
Mackenziaena leachii             38.10%       50.00%
Malacoptila striata              05.60%       0%
Mimus saturninus                 62.10%       50.50%
Muscipipra vetula                41.90%       66.10%
Myiobius barbatus                36.60%       08.10%
Myiodynastes maculatus           55.50%       37.60%
Myiophobus fasciatus             100%         43.20%
Myrmeciza squamosa               0%           63.70%
Orthogonys chloricterus          0%           25.50%
Philydor atricapillus            0%           43.60%
Phleocryptes melanops            0%           52.40%
Phyllomyias griseocapilla        0%           29.10%
Phylloscartes kronei             0%           19.80%
Picumnus temminckii              0%           11.80%
Piprites chloris                 0%           52.00%
Piprites pileata                 0%           32.80%
Polioptila dumicola              0%           33.60%
Poospiza nigrorufa               0%           75.00%
Procnias nudicollis              0%           84.00%
Pseudoleistes guirahuro          0%           51.70%
Pyriglena leucoptera             0%           44.70%
Ramphastos dicolorus             0%           78.40%
Ramphocelus bresilius            0%           45.80%
Ramphodon naevius                0%           40.00%
Saltator similis                 0%           41.40%
Schiffornis virescens            65.90%       74.10%
Scytalopus iraiensis             0%           93.00%
Sittasomus griseicapillus        37.30%       63.50%
Streptoprocne biscutata          0%           30.20%
Stymphalornis acutirostris       78.60%       82.50%
Synallaxis spixi                 75.00%       89.90%
Tangara desmaresti               60.40%       79.80%
Tangara peruviana                22.20%       80.60%
Thamnophilus ruficapillus        71.80%       74.80%
Theristicus caudatus             75.50%       89.10%
Thraupis palmarum                78.30%       58.80%
Thryothorus longirostris         62.60%       73.80%
Trichothraupis melanops          22.60%       64.60%
Trogon surrucura                 75.80%       73.10%
Vanellus chilensis               70.20%       82.30%
Xenops minutus                   57.00%       71.30%
Xiphorhynchus fuscus             56.70%       0%
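The per-species comparison made with Table I, together with the random-guess baseline quoted for 76 equally likely classes, can be reproduced with a few lines of Python. The sketch below uses only four rows of Table I for illustration; the function names are hypothetical, and the row subset was chosen arbitrarily, so the counts it prints say nothing about the full table.

```python
def chance_level(n_classes):
    """Probability (in %) of guessing the right class when all classes are equally likely."""
    return 100.0 / n_classes

def compare(rows):
    """rows: (species, manual_accuracy, automatic_accuracy). Split species by the better method."""
    auto_better = [name for name, manual, auto in rows if auto > manual]
    manual_better = [name for name, manual, auto in rows if manual > auto]
    return auto_better, manual_better

# Four illustrative rows copied from Table I (accuracies in %).
rows = [
    ("Aegolius harrisii", 88.80, 80.70),
    ("Basileuterus leucoblepharus", 61.80, 71.10),
    ("Chiroxiphia caudata", 66.80, 69.20),
    ("Xiphorhynchus fuscus", 56.70, 0.0),
]
auto_better, manual_better = compare(rows)
print(round(chance_level(76), 2))  # 1.32
print(auto_better)
print(manual_better)
```

Species with equal accuracies under both methods fall into neither list, which matters for a row such as Hemitriccus kaempferi, where both methods score 0 %.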

high signal amplitudes, the obtained overall accuracy was 52.78 %;
• Finally, when using the automatic segmentation described in Section III, we obtain an overall accuracy of 59.75 %.

The comparison of the last two experiments is discussed in the following. As shown, the use of automatically detected pulses produces the best overall results. We argue that pulses are exactly the signal regions that best characterize the bird vocalization [22]. The automatic computation of the pulses, based on a decision procedure that acts on the frequency domain and relies on the histograms of the signal energy and spectral centroid, outperforms the manual segmentation procedure.

Table I presents the accuracies obtained in the last two experiments, that is, using manual segmentation and using automatic segmentation, for each individual bird species in the database. The analysis of Table I shows that although the automatic segmentation approach is the best overall, there are some bird species for which the manual segmentation approach provides better results. However, the automatic segmentation approach has improved the classification accuracy of 43 out of 76 bird species, including 20 bird species that the manual segmentation approach failed to classify (accuracy = 0 % in these cases). In these cases, the visual process employed to select the pulses in the manual segmentation was possibly unable to detect the correct audio segments, whereas the automatic segmentation defined in Section III-A performs this task adequately.

V. CONCLUSIONS AND FUTURE WORK

This paper deals with the segmentation step of the audio signal in the automatic bird species identification problem. We present experiments conducted on a database composed of bird songs from 76 species which are present in the Southern Atlantic Coast of South America. Three scenarios were considered: the first one is composed of bird songs as recorded in the field; the second one uses the manual segmentation of the audio signal in order to select only short regions where the audio signal has higher amplitudes, called pulses; the third scenario considers the automatic segmentation of the audio into pulses by a decision procedure which is based on classical signal processing measures computed on the bird song.

The experimental results show that the automatic segmentation of the bird songs is very important to improve the classification performance. Our explanation for this fact is that the acoustic information which appears in pulses contains less environmental noise and incorporates the most important features of the corresponding bird song. The decision procedure employed to obtain pulses from an audio signal uses simple signal processing techniques and can be easily incorporated in small computational devices. The next step is to incorporate the described pulse selection technique and the bird species classifier in a dedicated hardware device designed specifically for bird species identification [18].

As future work, we plan to apply the automatic segmentation procedure in the hierarchical classification of bird species, following the framework already defined in [30].

REFERENCES

[1] R. Bardeli, D. Wolff, F. Kurth, M. Koch, K-H. Tauchert and K-H. Frommolt, “Detecting Bird Songs in a Complex Acoustic Environment and Application to Bioacoustic Monitoring”, Pattern Recognition Letters, Vol. 31, No. 12, pp. 1524–1534, 2010.
[2] J. Benesty, M.M. Sondhi, Y. Huang (eds.): Springer Handbook of Speech Processing, Springer, Berlin, 2008.
[3] T.S. Brandes, “Automated Sound Recording and Analysis Techniques for Bird Surveys and Conservation”, Bird Conservation International, Vol. 18, pp. 163–173, 2008.
[4] F. Briggs, R. Raich and X.Z. Fern, “Audio Classification of Bird Species: a Statistical Manifold Approach”, Proceedings of the 9th International Conference on Data Mining (ICDM’2009), Miami, USA, pp. 51–60, December 2009.
[5] Canadian Council on Animal Care Web Site, available in , accessed in June 24th., 2010.
[6] J. Cai, D. Ee, B. Pham, P. Roe and J. Zhang, “Sensor Network for the Monitoring of Ecosystem: Bird Species Recognition”, Proceedings of the 3rd IEEE International Conference on Intelligent Sensors, Sensor Networks and Information (ISSNIP’07), Melbourne, Australia, pp. 293–298, December 2007.
[7] C.K. Catchpole and P.J.B. Slater, Bird Songs: Biological Themes and Variations, Cambridge University Press, 1995.
[8] CENIPA (Brazilian Center for Aeronautical Accident Investigation and Prevention), “The danger of the fauna for Brazilian aviation” (in Portuguese), CENIPA Report 118. , accessed in October 2010.
[9] C-H. Chou, C-H. Lee and H-W Ni, “Bird Species Recognition by Comparing the HMMs of Syllables”, Proceedings of The 2nd International Conference on Innovative Computing, Information and Control (ICICIC’07), Kumamoto City, Japan, pp. 143–147, September 2007.
[10] C-H. Chou, P-H. Liu and B. Cai, “On the Studies of Syllable Segmentation and Improving MFCCs for Automatic Birdsong Recognition”, Proceedings of the Asian Pacific Services Computing Conference (APSCC’08), Yilan, Taiwan, pp. 745–750, December 2008.
[11] C-H. Chou and P-H. Liu, “Bird Species Recognition by Wavelet Transformation of a Section of Birdsong”, Proceedings of the Symposia and Workshop on Ubiquitous, Autonomic and Trusted Computation (UICATC’09), Brisbane, Australia, pp. 189–193, July 2009.
[12] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, John Wiley and Sons, 2nd. Ed., 2001.
[13] S. Fagerlund, “Bird Species Recognition Using Support Vector Machines”, EURASIP Journal on Advances in Signal Processing, Vol. 2007, Article ID 38637, pp. 1–8, 2007.
[14] F. Gouyon, P. Herrera and P. Cano, “Pulse-dependent analysis of percussive music”, Proceedings of the 22th International AES Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, 2002.
[15] F. Gouyon, S. Dixon, E. Pampalk and G. Widmer, “Evaluating rhythmic descriptions for music genre classification”, Proceedings of the 25th International AES Conference on Virtual, Synthetic and Entertainment Audio, London, UK, 2004.
[16] M. Graciarena, M. Delplanche, E. Shriberg and A. Stolcke, “Bird Species Recognition Combining Acoustic and Sequence Modeling”, Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp. 341–344, May 2011.
[17] S. Hacker, MP3: The Definitive Guide, O’Reilly Publishers, 2000.
[18] D.M. Kiapuchinski, C.R.E. Lima, and C.A.A. Kaestner, “Spectral Noise Gate Technique Applied to Birdsong Preprocessing on Embedded Unit”, Proceedings of the 2011 IEEE International Symposium on Multimedia (ISM2012), Anaheim, California, pp. 24–27, 2012.
[19] C. Kwan, X. Zhao, Z. Ren, R. Xu, V. Stanford, C. Rochet, J. Aube and K.C. Ho, “Bird Classification Algorithms: Theory and Experimental Results”, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), Montreal, Canada, Vol. 5, pp. 289–292, May 2004.
[20] B.P. Lathi, Signal Processing and Linear Systems, 2nd. ed. Oxford University Press, 2004.
[21] C-H. Lee, C-C Han and C-C. Chuang, “Automatic Classification of Bird Species from their Sounds using Two-Dimensional Cepstral Coefficients”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 8, pp. 1541–1550, 2008.
[22] M.T. Lopes, C.N. Silla Jr., A.L. Koerich and C.A.A. Kaestner, “Feature Set Comparison for Automatic Bird Species Identification”, in Proceedings of the 2011 IEEE International Conference on Systems, Man and Cybernetics (SMC2011), Anchorage, Alaska, pp. 965–970, 2011.
[23] M.T. Lopes, L.L. Gioppo, C.N. Silla Jr., A.L. Koerich and C.A.A. Kaestner, “Automatic Bird Species Identification for Large Number of Species”, in Proceedings of the 2011 IEEE International Symposium on Multimedia (ISM2011), Dana Point, California, pp. 117–122, 2011.
[24] Marsyas Web Site, available in , accessed in June 24th., 2010.
[25] C.A.F. Mendonca, “The management of the aerial danger in Brazilian airports” (in Portuguese). SIPAER Report, Vol. 1, No. 1, November 2009.
[26] T. M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[27] L. Neal, F. Briggs, R. Raich and X.Z. Fern, “Time-Frequency Segmentation of Bird Song in Noisy Acoustic Environments”, Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp. 2012–2015, May 2011.
[28] C. N. Silla Jr., C.A.A. Kaestner and A. L. Koerich, “Automatic Music Genre Classification using Ensemble of Classifiers”, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC’07), Montreal, Canada, pp. 1687–1692, 2007.
[29] C. N. Silla Jr., A. L. Koerich and C. A. A. Kaestner, “A Machine Learning Approach to Automatic Music Genre Classification”, Journal of the Brazilian Computer Society, Vol. 14(3), pp. 7–18, 2008.
[30] C. N. Silla Jr. and C. A. A. Kaestner, “Hierarchical Classification of Bird Species using their Audio Recorded Songs”, in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC’13), Manchester, UK, pp. 1895–1900, 2013.
[31] P. Somervuo, A. Härmä and S. Fagerlund, “Parametric Representations of Bird Sounds for Automatic Species Recognition”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 6, pp. 2252–2263, November 2006.
[32] Sound Ruler Web Site, available in accessed in June 24th, 2010.
[33] S.S. Stevens, J. Volkmann and E.B. Newman, “A scale for the measurement of the psychological magnitude pitch”. Journal of the Acoustical Society of America, Vol. 8, No. 3, pp. 185–190, 1937.
[34] G. Tzanetakis and P. Cook, “Musical Genre Classification of Audio Signals”, IEEE Transactions on Speech and Audio Processing, Vol. 10, pp. 293–302, 2002.
[35] E. Vilches, I.A. Escobar, E.E. Vallejo and C.E. Taylor, “Data Mining Applied to Acoustic Bird Species Recognition”, Proceedings of the 18th IEEE International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, pp. 400–403, 2006.
[36] I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005.
[37] Xeno-Canto Web Site, available in accessed in June 26th, 2010.
