
2014 IEEE International Symposium on Multimedia

Automatic Segmentation of Audio Signals for Bird Species Identification

Thiago L. F. Evangelista (1), Thales M. Priolli (1), Carlos N. Silla Jr. (1), Bruno A. Angelico (1), Celso A. A. Kaestner (2)

Federal University of Technology – Paraná
(1) Cornélio Procópio Campus: Av. Alberto Carazzai, 1640 – 86.300-000 – Cornélio Procópio, Paraná, Brazil
(2) Curitiba Campus: Av. 7 de Setembro, 3165 – 80.230-901 – Curitiba, Paraná, Brazil
thi [email protected], thales [email protected], {carlosjunior, bangelico, celsokaestner}@utfpr.edu.br

Abstract: The identification of bird species from their recorded songs is nowadays used in several important applications, such as monitoring the quality of the environment and preventing bird-plane collisions near airports. The complete identification cycle involves: (a) recording devices to acquire the songs, (b) audio processing techniques to remove noise and to select the most representative elements of the signal, (c) feature extraction procedures to obtain relevant characteristics, and (d) decision procedures to make the identification. The decision procedures can be obtained by Machine Learning (ML) algorithms, treating the problem as a standard classification scenario. One key element in this cycle is the selection of the most relevant segments of the audio for identification purposes. In this paper we show that the use of short audio segments with high amplitude, called pulses in our work, outperforms the use of the complete audio records in the species identification task. We also show how these pulses can be obtained automatically, based on measurements performed directly on the audio signal. The employed classifiers are trained using a previously labeled database of bird songs. We use a database that contains bird song recordings from 76 species which appear in the Southern Atlantic Coast of South America. The results show that the combination of automatically obtained pulses and an SVM classifier produces the best results; all the necessary procedures can be installed in dedicated hardware, allowing the construction of a specific bird identification device.

Keywords: Signal Processing; Pattern Recognition; Machine Learning; Bird Species Identification.

978-1-4799-4311-1/14 $31.00 © 2014 IEEE 978-0-7695-5437-2/14 DOI 10.1109/ISM.2014.46

I. INTRODUCTION

The interaction between humans and birds occurs in several scenarios. Birds are numerous and easier to detect than other animal species: one can record their songs, which is an indirect and clever way to identify them [5]. One important reason to monitor birds is the control and monitoring of the environment. Some bird species are particularly sensitive to water and air pollution [1], [3], so counting the birds of these species can be used to detect and prevent environmental problems.

Birds are also responsible for several accidents involving planes. The Brazilian Center for Aeronautical Accident Investigation and Prevention (CENIPA) [8] reported that in two years a total of 1,321 accidents involving collisions between birds and airplanes occurred in the Brazilian airspace. These events are potentially dangerous: for instance, the collision of a vulture with a commercial plane is equivalent to an impact of seven tons [25]. There is also a financial impact: the losses of the airline companies due to this type of accident are estimated at more than US$ 3 million per year [8]. With more information, authorities can act to eliminate the problem. These reasons justify the construction of specific equipment for bird species identification and counting.

Using the recordings of songs produced by birds, the identification task can be done using Signal Processing (SP) techniques [2], [20] and Machine Learning (ML) algorithms [12], [26], [36]. An automatic bird species identification system requires: 1) a recording device to obtain the audio signal of the bird song in the field; 2) the use of audio processing techniques to improve the signal quality, since the recordings are made in noisy situations (groves and forests for environmental monitoring, or urban areas in the case of bird control near airports); 3) the selection of the most representative audio segments for the identification task; 4) the extraction of a set of discriminative features from the selected audio segments; 5) the use of the obtained feature set as input to a decision procedure that predicts the species of the bird that originated the song.

In the supervised learning framework, the decision procedure employed in the last step is obtained as the output of some ML classification algorithm. In general, ML algorithms require a database of recorded songs of the bird species, previously labeled by an expert, in order to train the classifier [26]. Adequate preprocessing, the selection of the most adequate segments of the audio, and the choice of an adequate feature set and classification algorithm are the key points to obtain good classification performance [36]. We focus this work mainly on the segmentation step, that is, on the selection of the most representative parts of the audio signal.

In the following, Section II summarizes similar research works and briefly formalizes the automatic bird species identification problem; Section III describes the initial signal processing, the pulse detection and the feature extraction procedures, and indicates the classification algorithm; Section IV outlines the database used in the experiments and presents the experimental results obtained in three identification experiments; finally, Section V presents our conclusions and indicates future research directions.

II. BIRD SPECIES IDENTIFICATION USING AUDIO SIGNALS

A. Similar Research Works

As mentioned earlier, it is possible to identify a bird species from its recorded song, and this problem has been studied by several researchers.

Kwan et al. [19] propose a bird identification device to detect dangerous birds near airports. The authors employ a Hidden Markov Model and a Gaussian Mixture Model to perform the classification, achieving poor results due to the low signal-to-noise ratio in the airport environment.

Somervuo, Härmä and Fagerlund [31] apply SP techniques to the audio input signal. Using sinusoidal modeling and Mel-Frequency Cepstral Coefficients (MFCC), their best result is an accuracy of 71.3 % when detecting the species of 14 common North-European birds.

Vilches et al. [35] apply data mining techniques to the problem, using the well-known ID3, J4.8, and Naïve Bayes classification algorithms. Their best result, an accuracy of 98.39 % in a database with 154 songs of 3 bird species, was obtained using the J4.8 algorithm and a feature set composed of all the audio characteristics generated by the Sound Ruler audio processing tool [32]. The authors also conclude that the identification of distinctive features is the crucial part of this application.

Chou, Lee and Ni [9] propose the segmentation of the bird songs into syllables, obtained from the frequency spectrum of the signal. Syllables were obtained by the fuzzy C-means clustering method; each bird song is then modeled as a sequence of syllables using a Hidden Markov Model. In a database of 420 bird species they obtain a recognition rate of 78 %.

Fagerlund [13] employs a series of Support Vector Machine (SVM) classifiers to perform bird identification. The classifiers were organized in a tree-like structure, where each node is used to separate two classes. Two feature sets were tested: the MFCC coefficients and a set of low-level signal parameters. The best accuracy obtained was 98 % in a database with 8 bird species.

Cai et al. [6] use several feature sets and a neural network framework to investigate the problem. The proposed neural network architecture considers the dynamic nature of the audio signal; noise reduction algorithms were also applied to the signals. Their best accuracy was 85.6 % in a database which contains songs from 14 bird species, using a feature set composed of the MFCC coefficients.

Chou, Liu and Cai [10] propose an enhanced syllable segmentation method, based on the Rabiner and Sambur endpoint detection method. This method is combined with an MFCC-based feature vector to deal simultaneously with syllable detection and birdsong recognition. They use songs from a commercial CD with bird songs recorded in nature from 420 bird species. Their best accuracy was 73.19 %, using a backpropagation neural network.

Lee, Han and Chuang [21] split the original audio signals into syllable segments, considered as the basic recognition unit. MFCC coefficients are then computed, and the Gaussian Mixture Model and Vector Quantization are employed to find the most appropriate number of components and clusters for each species. The best result was an accuracy of 84 % for a database composed of 28 bird species.

Briggs et al. [4] employ a probabilistic framework to compute the audio features and a Bayes classifier that minimizes risk; the authors also show that this approach obtains results similar to a nearest-neighbor classifier where the Kullback-Leibler divergence is used to compare histograms of features. The proposed classifiers achieve over 90 % accuracy on a dataset that contains 6 species of birds.

Chou and Liu [11] use wavelet transformations to identify sections in the bird songs; for each section the first five MFCC coefficients are computed, and a neural network classifier is used in the species identification. They achieve an accuracy of 73.41 % in a database composed of 420 bird species.

Graciarena et al. [16] use an unsupervised approach to obtain approximate note models from acoustic features; these models are used to create a recognition system that employs an n-gram statistical model developed for speaker recognition applications. In a database of 9 bird species they obtain an accuracy of 83.5 %.

Neal et al. [27] employ a time-frequency audio segmentation performed by a Random Forest classifier to extract syllables of the bird songs; their method isolates 93.6 % of the acoustic energy of bird song, with a false positive rate of 8.6 %, using a test dataset of 625 field audio records.

Lopes et al. [22] use three feature sets and several ML algorithms in the bird species identification problem; their goal was to evaluate the various combinations of feature sets and classifiers. The best result was obtained using short intervals of the audio records, the MARSYAS feature set [24] and a Multi-Layer Perceptron. In a database composed of 101 audio records of 3 species, similar to the one used in [35], they obtain an F-measure (harmonic mean of precision and recall) of 99.7 %.

The paper of Lopes et al. [23] uses a larger database that contains 1,619 bird song recordings of 73 bird species recorded in a specific geographic region; the main objective of this work was to evaluate the performance that can be achieved when dealing with a great number of bird species. The paper shows that the F-measure drops from 95.1 % to 78.2 % when passing from 3 to 20 species, in the best results obtained by an SVM classifier.

Kiapuchinski et al. [18] propose a hardware device to perform the automatic identification of birds; the proposed equipment includes audio filtering using a spectral noise gate algorithm, which is shown to improve the quality of the extracted features.

We can identify some common points in the previous research works: (1) the need of strong audio preprocessing
techniques to improve signal quality; (2) the use of signal segmentation to find the most representative parts (syllables, segments or pulses) of the bird songs; (3) the application of several algorithms to treat the identification problem in the classical ML classification scenario. In this paper we focus on the automatic selection of the most relevant audio parts for classification.

B. Bird Species Identification as a Classification Problem

The automatic bird species identification using audio signals can be defined as the problem of finding the species of a specific bird from its recorded sounds. Bird sounds can be divided into songs and calls; songs are more melodious and are related to mating, whereas calls are short and transient sounds used in alert situations [7]. Bird songs are considered by experts as ideal for species identification [7], [35] and are the sounds considered here.

In a digital recording device the sound is stored as a sequence $S = \langle s_1, s_2, \ldots, s_n \rangle$ of numeric values in a convenient scale, obtained by a sampling procedure; this sequence is employed in sound reproduction and for analytical processes [17].

The complete signal $S$ can be segmented and preprocessed in order to select its most representative parts $R_S$. Let us assume that this selection is accomplished by a function $\sigma$, that is, $R_S = \sigma(S)$. Several features can then be extracted from $R_S$: if $\chi_j$ is an extraction function and $X_j$ is the feature domain, it is possible to obtain the feature vector $\bar{X} = \langle x_1, x_2, \ldots, x_m \rangle$ from $R_S$, where each feature $x_j = \chi_j(R_S)$.

Identification relies on previously known audio records. We assume that a database of bird songs of a finite set $B$ of species, manually labeled by experts, is available. In this scenario the problem can be posed as a standard ML classification task [12]: given a set of evidences $\bar{X}$ obtained from some parts $R_S$ of the input signal $S$, it is necessary to find the class $\hat{b} \in B$ that most likely indicates the species of the bird that originated $S$.

Using Probability Theory the problem can be outlined as follows: (1) from the training database it is possible to estimate how the features occur in each class, that is, the conditional probabilities (estimated by frequencies) $P(\bar{X} \mid b)$ for feature vectors $\bar{X}$ and $b \in B$; (2) the a priori probabilities of the bird species $P(b)$ for $b \in B$ can be estimated by the class frequencies; (3) to classify a new audio record $S$, the segmentation and extraction functions $\sigma$ and $\chi_j$ are employed to compute the vector of evidences $\bar{X}$; (4) using Bayes' Theorem, we can obtain the a posteriori probabilities $P(b \mid \bar{X})$ and assign the most probable class to the input by

$$\hat{b} = \arg\max_{b \in B} \frac{P(\bar{X} \mid b)\, P(b)}{P(\bar{X})} \tag{1}$$

III. SEGMENTATION, FEATURE EXTRACTION AND CLASSIFICATION PROCEDURES

As previously explained, to improve the classification performance it is important to select the most representative parts from a recorded signal $S$. This is a particularly important point in this problem, since audio records of bird songs usually contain long silent periods and are made in noisy environments.

A. Segmentation

The most representative segments for the identification task, called pulses in our work, are obtained from an automatic segmentation procedure that includes several steps. Initially a Fast Fourier Transform (FFT) is applied to the input signal $S$ in order to obtain its representation in the frequency domain. Then the energy of the signal and its spectral centroid are computed. Finally, median filters and a windowing scheme to smooth the end parts of the signals are employed. The complete decision procedure that detects the bird song pulses can be outlined as follows:

1) two data sequences are computed from the signal spectrum, one for the signal energy and the other for the spectral centroid;
2) a median filter is applied to the corresponding histograms of these sequences;
3) the local maxima of the histograms are computed;
4) from the first two maxima found ($X_1$ and $X_2$) a segment limit is computed by

$$L = \frac{J \cdot X_1 + X_2}{J + 1} \tag{2}$$

where $J$ is a predefined windowing parameter. The procedure is executed for each window in the two sequences, in order to produce the thresholds $L_1$ and $L_2$ for each signal pulse. In Figure 1 we illustrate the process of obtaining pulses from a song of the Dusky Antbird (Cercomacra tyrannina). We emphasize that the segmentation of the bird songs into pulses by the proposed SP procedure is an original contribution of this paper.

B. Feature Set

In order to obtain the feature vectors from the selected pulses we employ the MARSYAS framework [24], [34], a feature extractor that has been used successfully in several audio applications, especially in music applications [14], [15], [22], [23], [28], [29], [34]. As shown by Lopes et al. [22], the MARSYAS feature set has the best performance in the bird identification task when compared to other feature sets.

The feature vector produced by MARSYAS is 64-dimensional and includes the first twelve Mel-Frequency Cepstral Coefficients (MFCC), and the means and variances of timbral features calculated in the pulses. The Mel scale is an acoustically defined scale created from a study by Stevens et al. [33] that relates physical frequencies to the frequencies that are perceived by the human ear; this scale has been used in several applications involving speech and music. The employed timbral features include the centroid of the frequency spectrum, the spectral flux, which estimates the variability of magnitudes in this spectrum, and the rolloff, which is related
to the concentration of the magnitudes in frequency ranges; finally, the signal zero crossings are computed in the time domain.

Fig. 1. A bird song audio of the Dusky Antbird and its segmentation into pulses.

C. Classification algorithm

In the papers of Lopes et al. [22], [23] several classifiers based on different paradigms were employed to select the most adequate classifier for the bird species identification problem. The tested classifiers were the classical probabilistic Naïve Bayes, the instance-based k nearest neighbors (kNN) with k = 3, the decision tree classifier J4.8, a multi-layer perceptron neural network trained with the back-propagation momentum algorithm and the support vector machine (SVM) classifier, with polynomial and Pearson kernels. The SVM classification algorithm achieves the best results, obtaining an F-measure of 99.4 % in a database with 3 species [22]. Lopes et al. [23] also obtain their best result using SVM in the same database that is employed in this work: an F-measure of 89.7 % for 8 bird species. So, the experiments conducted in this work employ the SVM algorithm (SMO implementation, using the Pearson kernel).

IV. BIRD SPECIES IDENTIFICATION EXPERIMENTS

A. Database

In the experiments of this paper we employ a database composed of songs from bird species that appear in the Southern Atlantic Brazilian Coast. The database includes songs of bird species that are common to two main ecosystems: the Atlantic forest, which occurs in the coastal region of Brazil, and the Araucária forest, composed of huge trees that are characteristic of the high-lying plateau of South America. This database was first employed by Lopes et al. [23] and includes 1,619 bird songs from 76 bird species. All bird songs were downloaded from the Xeno-Canto website [37], which contains audio records obtained directly in real environments and, therefore, with no previous preprocessing. The complete methodology employed to create this database is also described in the above referenced paper.

B. Experiments

In order to evaluate the proposed segmentation, several experiments were conducted, according to three scenarios: 1) the first one employs the bird songs exactly as recorded in the field; 2) the second one uses a manual segmentation step; in this case we manually split each audio signal in the database, which corresponds to a recorded bird song of a specific species, into short sound intervals where the signal presents higher amplitudes, called pulses as previously explained; and finally 3) the third scenario also employs pulses, but in this case they are detected automatically using decision procedures that employ signal processing techniques.

The Southern Atlantic Brazilian Coast Database was employed in all the experiments. As explained, it contains 1,619 bird songs from 76 bird species; in the second scenario 10,206 pulses were obtained in the manual segmentation of the database; using the automatic procedure, in the third scenario, 20,203 pulses were obtained.

As indicated, in the experiments we use the Support Vector Machines paradigm, using Platt's Sequential Minimal Optimization (SMO) implementation and the Polynomial kernel. Experiments were executed using the WEKA platform and a 10-fold cross-validation procedure.

The obtained results are as follows:
• Using the complete audio signals as recorded, the obtained overall accuracy is 29.77 %. This result corresponds to the mean of the accuracies for all classes; recall that the employed database contains bird songs from 76 species. Under the simplifying hypothesis that the classes contain the same number of audio records, the probability of selecting a specific class at random is 1/76, or only 1.32 %. The obtained results also confirm the conclusions presented in [23]: the segmentation of the audio signals into their most representative parts is necessary to improve classification. We recall that in that paper the authors evaluate the effect of the number of classes on the classification performance, and the F-measure drops from 95.1 % to 78.2 % when passing from 3 to 20 bird species.
• Using the manual segmentation of the original bird songs into pulses, which are selected as signal regions with high signal amplitudes, the obtained overall accuracy was 52.78 %.
• Finally, when using the automatic segmentation described in Section III, we obtain an overall accuracy of 59.75 %.

The comparison of the last two experiments is discussed in the following. As shown, the use of automatically detected pulses produces the best overall results. We argue that pulses are exactly the signal regions that better characterize the bird vocalization [22]. The automatic computation of the pulses, based on a decision procedure that acts on the frequency domain and is based on the energy and the centroid of the signal histogram, outperforms the manual segmentation procedure.

Table I presents the accuracies obtained in the last two experiments, that is, using manual segmentation and using automatic segmentation, for each individual bird species in the database. The analysis of Table I shows that although the automatic segmentation approach is the best overall approach, there are some bird species for which the manual segmentation approach provides better results. However, the automatic segmentation approach improved the classification accuracy of 43 out of 76 bird species, including 20 bird species that the manual segmentation approach failed to classify (accuracy = 0 % in these cases). In these cases, possibly the visual process employed to select the pulses in manual segmentation was unable to detect the correct audio segments, whereas the automatic segmentation defined in Section III-A performs this task adequately.

TABLE I. ACCURACY OF THE DIFFERENT SEGMENTATION METHODS FOR THE INDIVIDUAL CLASSES (BIRD SPECIES).

Class (Bird Species)             Manual Seg   Automatic Seg
Aegolius harrisii                  88.80%        80.70%
Amazilia versicolor                74.80%        24.80%
Amazona pretrei                    66.70%        58.70%
Anthus lutescens                   39.40%        23.20%
Attila rufus                       76.90%        45.60%
Automolus leucophthalmus           67.00%        63.60%
Basileuterus leucoblepharus        61.80%        71.10%
Batara cinerea                     68.60%        47.00%
Brotogeris tirica                  55.30%        46.90%
Campephilus robustus               27.90%         3.10%
Camptostoma obsoletum              45.10%        39.40%
Campylorhamphus falcularius        41.60%        31.40%
Certhiaxis cinnamomeus             48.10%        37.40%
Chiroxiphia caudata                66.80%        69.20%
Chlorophanes spiza                 50.30%        48.30%
Cichlocolaptes leucophrus          37.50%        38.30%
Clibanornis dendrocolaptoides      51.40%        43.90%
Cnemotriccus fuscatus              28.40%        17.30%
Colaptes campestris                54.80%        43.10%
Colonia colonus                    33.60%        47.10%
Cranioleuca obsoleta               45.90%        32.80%
Crypturellus noctivagus            30.80%        34.60%
Culicivora caudacuta                6.70%        19.70%
Cyanocorax caeruleus               51.80%        57.80%
Drymophila malura                  54.20%        52.00%
Dysithamnus mentalis               48.20%        50.50%
Emberizoides ypiranganus           24.00%        28.40%
Gnorimopsar chopi                  42.20%        39.60%
Hemitriccus kaempferi               0.00%         0.00%
Hemitriccus orbitatus              38.70%        33.10%
Hypoedaleus guttatus               62.80%        54.80%
Lathrotriccus euleri               71.60%        63.20%
Leucochloris albicollis            54.70%        31.60%
Leucopternis polionotus            35.30%        73.90%
Mackenziaena leachii               38.10%        50.00%
Malacoptila striata                 5.60%         0.00%
Mimus saturninus                   62.10%        50.50%
Muscipipra vetula                  41.90%        66.10%
Myiobius barbatus                  36.60%         8.10%
Myiodynastes maculatus             55.50%        37.60%
Myiophobus fasciatus              100.00%        43.20%
Myrmeciza squamosa                  0.00%        63.70%
Orthogonys chloricterus             0.00%        25.50%
Philydor atricapillus               0.00%        43.60%
Phleocryptes melanops               0.00%        52.40%
Phyllomyias griseocapilla           0.00%        29.10%
Phylloscartes kronei                0.00%        19.80%
Picumnus temminckii                 0.00%        11.80%
Piprites chloris                    0.00%        52.00%
Piprites pileata                    0.00%        32.80%
Polioptila dumicola                 0.00%        33.60%
Poospiza nigrorufa                  0.00%        75.00%
Procnias nudicollis                 0.00%        84.00%
Pseudoleistes guirahuro             0.00%        51.70%
Pyriglena leucoptera                0.00%        44.70%
Ramphastos dicolorus                0.00%        78.40%
Ramphocelus bresilius               0.00%        45.80%
Ramphodon naevius                   0.00%        40.00%
Saltator similis                    0.00%        41.40%
Schiffornis virescens              65.90%        74.10%
Scytalopus iraiensis                0.00%        93.00%
Sittasomus griseicapillus          37.30%        63.50%
Streptoprocne biscutata             0.00%        30.20%
Stymphalornis acutirostris         78.60%        82.50%
Synallaxis spixi                   75.00%        89.90%
Tangara desmaresti                 60.40%        79.80%
Tangara peruviana                  22.20%        80.60%
Thamnophilus ruficapillus          71.80%        74.80%
Theristicus caudatus               75.50%        89.10%
Thraupis palmarum                  78.30%        58.80%
Thryothorus longirostris           62.60%        73.80%
Trichothraupis melanops            22.60%        64.60%
Trogon surrucura                   75.80%        73.10%
Vanellus chilensis                 70.20%        82.30%
Xenops minutus                     57.00%        71.30%
Xiphorhynchus fuscus               56.70%         0.00%
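The two comparisons used in this section, the chance-level accuracy and the per-species comparison drawn from Table I, can be illustrated with a minimal Python sketch. The function names and the five-row sample are illustrative choices and are not part of the original experiments; the accuracy pairs are copied from Table I, and the full 76 rows would be needed to check the counts reported in the text.

```python
# Accuracies (%) copied from five rows of Table I:
# species -> (manual segmentation, automatic segmentation)
SAMPLE = {
    "Aegolius harrisii": (88.80, 80.70),
    "Basileuterus leucoblepharus": (61.80, 71.10),
    "Hemitriccus kaempferi": (0.0, 0.0),
    "Myrmeciza squamosa": (0.0, 63.70),
    "Xiphorhynchus fuscus": (56.70, 0.0),
}


def chance_accuracy(n_classes):
    """Accuracy (%) of random guessing when all classes are equally likely."""
    return 100.0 / n_classes


def compare(table):
    """Return (species improved by automatic segmentation,
    species that manual segmentation missed completely but automatic did not)."""
    improved = [name for name, (manual, auto) in table.items() if auto > manual]
    recovered = [name for name, (manual, auto) in table.items()
                 if manual == 0.0 and auto > 0.0]
    return improved, recovered


if __name__ == "__main__":
    print(f"chance level for 76 classes: {chance_accuracy(76):.2f} %")
    improved, recovered = compare(SAMPLE)
    print("improved by automatic segmentation:", improved)
    print("recovered from 0 % manual accuracy:", recovered)
```

Applied to the whole table, the same two list comprehensions give the number of species that benefit from automatic segmentation and the number of species recovered from 0 % manual accuracy.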

V. CONCLUSIONS AND FUTURE WORK

This paper deals with the segmentation step of the audio signal in the automatic bird species identification problem. We present experiments conducted on a database composed of bird songs from 76 species which are present in the Southern Atlantic Coast of South America. Three scenarios were considered: the first one is composed of bird songs as recorded in the field; the second one uses the manual segmentation of the audio signal in order to select only short regions where the audio signal has higher amplitudes, called pulses; the third scenario considers the automatic segmentation of the audio into pulses by a decision procedure which is based on classical signal processing measures computed on the bird song.

The experimental results show that the automatic segmentation of the bird songs is very important to improve the classification performance. Our explanation for this fact is that the acoustic information which appears in pulses contains less environmental noise and incorporates the most important features of the corresponding bird song.

The decision procedure employed to obtain pulses from an audio signal uses simple signal processing techniques and can be easily incorporated in small computational devices. We must now incorporate the described pulse selection technique and the bird species classifier in a dedicated hardware device designed specifically for bird species identification [18].

As future work, we plan to apply the automatic segmentation procedure in the hierarchical classification of bird species, following the framework already defined in [30].
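To make concrete how simple the pulse decision procedure of Section III-A is, the following Python sketch applies the threshold rule of Equation (2) using only the standard library. It is a sketch under stated assumptions, not the authors' implementation: the per-frame energy and spectral centroid sequences are assumed to be already computed, the number of histogram bins (10), the median filter width (3) and J = 5 are illustrative values, and marking a frame as part of a pulse only when both sequences exceed their thresholds is one plausible reading of the procedure.

```python
from statistics import median


def median_filter(values, width=3):
    """Sliding median; windows are shortened at the two ends."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def histogram(values, bins=10):
    """Return (counts, bin_centers) for equal-width bins over the value range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins or 1.0  # avoid a zero-width bin for constant input
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / step), bins - 1)] += 1
    centers = [lo + (i + 0.5) * step for i in range(bins)]
    return counts, centers


def first_two_maxima(counts, centers):
    """Bin centers of the first two local maxima of the histogram."""
    padded = [0] + list(counts) + [0]
    peaks = [centers[i - 1] for i in range(1, len(padded) - 1)
             if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]]
    if len(peaks) < 2:  # assumption: fall back to the ends of the range
        return centers[0], centers[-1]
    return peaks[0], peaks[1]


def pulse_threshold(sequence, J=5.0, bins=10):
    """Threshold L = (J * X1 + X2) / (J + 1), as in Equation (2)."""
    counts, centers = histogram(sequence, bins)
    x1, x2 = first_two_maxima(median_filter(counts), centers)
    return (J * x1 + x2) / (J + 1)


def detect_pulses(energy, centroid, J=5.0):
    """Return (first_frame, last_frame) pairs where both features exceed
    their thresholds (assumed combination rule)."""
    l1, l2 = pulse_threshold(energy, J), pulse_threshold(centroid, J)
    active = [e > l1 and c > l2 for e, c in zip(energy, centroid)]
    pulses, start = [], None
    for i, flag in enumerate(active + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            pulses.append((start, i - 1))
            start = None
    return pulses


if __name__ == "__main__":
    # Synthetic example: quiet background with two loud bursts.
    energy = [0.1] * 10 + [1.0] * 5 + [0.1] * 10 + [1.0] * 5 + [0.1] * 10
    centroid = [0.2] * 10 + [0.8] * 5 + [0.2] * 10 + [0.8] * 5 + [0.2] * 10
    print(detect_pulses(energy, centroid))
```

A larger J pulls the threshold toward the first histogram peak (typically the background level), so more frames are kept as pulses; a smaller J makes the detector more selective.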

REFERENCES

[1] R. Bardeli, D. Wolff, F. Kurth, M. Koch, K-H. Tauchert and K-H. Frommolt, “Detecting Bird Songs in a Complex Acoustic Environment and Application to Bioacoustic Monitoring”, Pattern Recognition Letters, Vol. 31, No. 12, pp. 1524–1534, 2010.
[2] J. Benesty, M.M. Sondhi and Y. Huang (eds.), Springer Handbook of Speech Processing, Springer, Berlin, 2008.
[3] T.S. Brandes, “Automated Sound Recording and Analysis Techniques for Bird Surveys and Conservation”, Bird Conservation International, Vol. 18, pp. 163–173, 2008.
[4] F. Briggs, R. Raich and X.Z. Fern, “Audio Classification of Bird Species: a Statistical Manifold Approach”, Proceedings of the 9th International Conference on Data Mining (ICDM’2009), Miami, USA, pp. 51–60, December 2009.
[5] Canadian Council on Animal Care Web Site, available in , accessed in June 24th, 2010.
[6] J. Cai, D. Ee, B. Pham, P. Roe and J. Zhang, “Sensor Network for the Monitoring of Ecosystem: Bird Species Recognition”, Proceedings of the 3rd IEEE International Conference on Intelligent Sensors, Sensor Networks and Information (ISSNIP’07), Melbourne, Australia, pp. 293–298, December 2007.
[7] C.K. Catchpole and P.J.B. Slater, Bird Songs: Biological Themes and Variations, Cambridge University Press, 1995.
[8] CENIPA (Brazilian Center for Aeronautical Accident Investigation and Prevention), “The danger of the fauna for Brazilian aviation” (in Portuguese), CENIPA Report 118. , accessed in October 2010.
[9] C-H. Chou, C-H. Lee and H-W. Ni, “Bird Species Recognition by Comparing the HMMs of Syllables”, Proceedings of the 2nd International Conference on Innovative Computing, Information and Control (ICICIC’07), Kumamoto City, Japan, pp. 143–147, September 2007.
[10] C-H. Chou, P-H. Liu and B. Cai, “On the Studies of Syllable Segmentation and Improving MFCCs for Automatic Birdsong Recognition”, Proceedings of the Asian Pacific Services Computing Conference (APSCC’08), Yilan, Taiwan, pp. 745–750, December 2008.
[11] C-H. Chou and P-H. Liu, “Bird Species Recognition by Wavelet Transformation of a Section of Birdsong”, Proceedings of the Symposia and Workshop on Ubiquitous, Autonomic and Trusted Computation (UICATC’09), Brisbane, Australia, pp. 189–193, July 2009.
[12] R.O. Duda, P.E. Hart and D.G. Stork, Pattern Classification, 2nd ed., John Wiley and Sons, 2001.
[13] S. Fagerlund, “Bird Species Recognition Using Support Vector Machines”, EURASIP Journal on Advances in Signal Processing, Vol. 2007, Article ID 38637, pp. 1–8, 2007.
[14] F. Gouyon, P. Herrera and P. Cano, “Pulse-dependent analysis of percussive music”, Proceedings of the 22nd International AES Conference on Virtual, Synthetic and Entertainment Audio, Espoo, Finland, 2002.
[15] F. Gouyon, S. Dixon, E. Pampalk and G. Widmer, “Evaluating rhythmic descriptions for music genre classification”, Proceedings of the 25th International AES Conference on Virtual, Synthetic and Entertainment Audio, London, UK, 2004.
[16] M. Graciarena, M. Delplanche, E. Shriberg and A. Stolcke, “Bird Species Recognition Combining Acoustic and Sequence Modeling”, Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp. 341–344, May 2011.
[17] S. Hacker, MP3: The Definitive Guide, O’Reilly Publishers, 2000.
[18] D.M. Kiapuchinski, C.R.E. Lima and C.A.A. Kaestner, “Spectral Noise Gate Technique Applied to Birdsong Preprocessing on Embedded Unit”, Proceedings of the 2011 IEEE International Symposium on Multimedia (ISM2012), Anaheim, California, pp. 24–27, 2012.
[19] C. Kwan, X. Zhao, Z. Ren, R. Xu, V. Stanford, C. Rochet, J. Aube and K.C. Ho, “Bird Classification Algorithms: Theory and Experimental Results”, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04), Montreal, Canada, Vol. 5, pp. 289–292, May 2004.
[20] B.P. Lathi, Signal Processing and Linear Systems, 2nd ed., Oxford University Press, 2004.
[21] C-H. Lee, C-C. Han and C-C. Chuang, “Automatic Classification of Bird Species from their Sounds using Two-Dimensional Cepstral Coefficients”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 8, pp. 1541–1550, 2008.
[22] M.T. Lopes, C.N. Silla Jr., A.L. Koerich and C.A.A. Kaestner, “Feature Set Comparison for Automatic Bird Species Identification”, Proceedings of the 2011 IEEE International Conference on Systems, Man and Cybernetics (SMC2011), Anchorage, Alaska, pp. 965–970, 2011.
[23] M.T. Lopes, L.L. Gioppo, C.N. Silla Jr., A.L. Koerich and C.A.A. Kaestner, “Automatic Bird Species Identification for Large Number of Species”, Proceedings of the 2011 IEEE International Symposium on Multimedia (ISM2011), Dana Point, California, pp. 117–122, 2011.
[24] Marsyas Web Site, available in , accessed in June 24th, 2010.
[25] C.A.F. Mendonca, “The management of the aerial danger in Brazilian airports” (in Portuguese), SIPAER Report, Vol. 1, No. 1, November 2009.
[26] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[27] L. Neal, F. Briggs, R. Raich and X.Z. Fern, “Time-Frequency Segmentation of Bird Song in Noisy Acoustic Environments”, Proceedings of the 36th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, pp. 2012–2015, May 2011.
[28] C.N. Silla Jr., C.A.A. Kaestner and A.L. Koerich, “Automatic Music Genre Classification using Ensemble of Classifiers”, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC’07), Montreal, Canada, pp. 1687–1692, 2007.
[29] C.N. Silla Jr., A.L. Koerich and C.A.A. Kaestner, “A Machine Learning Approach to Automatic Music Genre Classification”, Journal of the Brazilian Computer Society, Vol. 14(3), pp. 7–18, 2008.
[30] C.N. Silla Jr. and C.A.A. Kaestner, “Hierarchical Classification of Bird Species using their Audio Recorded Songs”, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC’13), Manchester, UK, pp. 1895–1900, 2013.
[31] P. Somervuo, A. Härmä and S. Fagerlund, “Parametric Representations of Bird Sounds for Automatic Species Recognition”, IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 6, pp. 2252–2263, November 2006.
[32] Sound Ruler Web Site, available in , accessed in June 24th, 2010.
[33] S.S. Stevens, J. Volkmann and E.B. Newman, “A scale for the measurement of the psychological magnitude pitch”, Journal of the Acoustical Society of America, Vol. 8, No. 3, pp. 185–190, 1937.
[34] G. Tzanetakis and P. Cook, “Musical Genre Classification of Audio Signals”, IEEE Transactions on Speech and Audio Processing, Vol. 10, pp. 293–302, 2002.
[35] E. Vilches, I.A. Escobar, E.E. Vallejo and C.E. Taylor, “Data Mining Applied to Acoustic Bird Species Recognition”, Proceedings of the 18th IEEE International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, pp. 400–403, 2006.
[36] I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, 2005.
[37] Xeno-Canto Web Site, available in , accessed in June 26th, 2010.
