
Statistical Signal Processing Methods for the Identification of Clave Structure in Latin American Rhythms

Mehmet Vurkaç, Student Member, IEEE

Abstract—This paper describes a preliminary investigation toward a statistical pattern recognition method for identifying the rhythmic structure known as clave. Autocorrelation is discussed as a way of obtaining the musical signature of the clave form. PSD estimation, cross-correlation and beat-tracking are discussed as potential methods of processing and comparing signals in light of the deterministic and statistical estimates performed. Preliminary results are discussed based on plots of the biased autocorrelation estimate and the PSD estimate.

Index Terms—Autocorrelation, Beat-Tracking, Clave, Foot-Tapping, Rhythm, Signal Processing.

I. INTRODUCTION

Clave is the Spanish word for key, as in the legend of a map or graph. In the context of Latin American music, it is a term of Cuban origin for the key to the rhythmic map of a piece of music, reflecting the binary characteristic of West-African-based Latin American music (subsequently referred to as ‘LM’). Other researchers have expressed this concept of a binary framework in such terms as “asymmetric clock structure [or] construction” [1]. This is the rhythmic framework that virtually all instruments follow in many types of LM, such as salsa, rumba, bembe, samba, maracatú and afoxé, as well as in the traditional music of parts of West Africa, and even in some funk music. In the West African musical roots, “all parts are designed to function in relation to the bell pattern,” described as a “single asymmetric unifying line,” which became the clave in the Americas. In the clave form, the rhythms follow a subtly alternating pattern between relatively syncopated and relatively grounded rhythms (tension and release). Since this is a pattern that can be distinguished only by the culturally or musically trained ear, the reader is encouraged to read the tutorial on the concept and analysis of clave [2], available from the author.

The goal of this study is to characterize the rhythmic structure of LM in mathematical terms, in order to establish the framework for an algorithm for machine recognition of clave direction (see Definitions) in any relevant musical context. Such an algorithm would have commercial, musicological and pedagogical appeal.

Commercial uses may include musician-friendly editing (when used in conjunction with other music recognition algorithms) and intelligent arranging features for digital studio workstations, content-searchable web databases, and algorithmic composition. Musician-friendly editing would enable locating passages in a digital recording by musical properties rather than by unintuitive time indices. Intelligent arranging would involve clave-conscious phrasing in automatic accompaniment systems. Content-searchable databases would enable consumers to seek music by description rather than by title. Algorithmic composition is the quest for tolerable machine-composed music, or artificial creativity.

Musicological inquiry involves the question of whether the clave form is an arbitrary cultural phenomenon or a mathematical characteristic. The choice among analytical, intuitive and memory-based methods in the teaching of LM rhythm concepts may in turn be influenced by such musicological findings.

II. DEFINITIONS

A. Latin Music Terms

The particular form of the clave examined in this study is the “Rumba Clave,” shown in Fig. 1 in waveform and percussion tablature (onset notation).

Fig. 1. “3-2” Rumba Clave waveform and percussion tablature notation showing the onset times of note events over one instance of clave. Each box represents a sixteenth note. The notation is read left to right, at an even pace. The time correspondence between the two representations is evident.

The “direction” of clave refers to the order in which tension and release (resolution) appear in the rhythm. The convention used here is “3 side” for the more syncopated (tension) half of the clave pattern and “2 side” for the straighter (resolution) half. Other widely-recognized clave forms are the “Son” and “Bossa” claves [1]. Though not called by the same name, numerous other rhythms in LM share the clave characteristic of alternating between tension and release, or syncopation and lack thereof.
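The onset notation and the idea of clave direction can be sketched in a few lines. This is a minimal illustration, not a transcription of Fig. 1: the grid string `RUMBA_3_2` is the commonly cited 3-2 rumba clave pattern and is an assumption here, as are the helper names. The tempo of 120 BPM is the value used later in the paper.

```python
# Assumed onset grid for the 3-2 rumba clave: 16 sixteenth-note boxes,
# 'x' = onset, '.' = rest. This is the commonly cited pattern, supplied
# for illustration; it is not copied from Fig. 1.
RUMBA_3_2 = "x..x...x..x.x..."


def onset_indices(tab):
    """Return the box indices that carry an onset."""
    return [i for i, c in enumerate(tab) if c == "x"]


def onset_times(tab, bpm=120.0, boxes_per_beat=4):
    """Convert box indices to seconds, with four sixteenth-note boxes per beat."""
    step = 60.0 / bpm / boxes_per_beat  # 0.125 s per box at 120 BPM
    return [i * step for i in onset_indices(tab)]


def reverse_direction(tab):
    """Swap the two halves: a 3-2 pattern becomes 2-3 and vice versa."""
    half = len(tab) // 2
    return tab[half:] + tab[:half]


# The "3 side" (tension) and "2 side" (resolution) are the two halves.
three_side, two_side = RUMBA_3_2[:8], RUMBA_3_2[8:]
```

Under these assumptions the 3 side carries three onsets, the 2 side carries two, and `reverse_direction(RUMBA_3_2)` gives the same pattern started from the 2 side.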

B. Music Technology Terms

In digital music technology, “sequencing” refers to the recording of music events in MIDI format. MIDI (Musical Instrument Digital Interface) is the standard communications protocol for digital musical instruments and music editing software. MIDI events include note start times (onsets), dynamics (loudness), effects parameters such as reverberation time and amount, and global expression controls such as accelerando and ritardando. Quantization, in contrast to its meaning in DSP, refers here to the automatic adjustment of MIDI events to coincide with a user-selected time grid.
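Quantization in this sequencing sense can be sketched as snapping each onset time to the nearest grid point. The function name, the sixteenth-note default grid and the sample onset times below are illustrative assumptions, not the behavior of any particular sequencer.

```python
import math


def quantize(onsets_s, bpm=120.0, grid_per_beat=4):
    """Snap onset times (seconds) to the nearest point of a regular grid.

    grid_per_beat=4 is a sixteenth-note grid; 2 would be eighth notes.
    """
    step = 60.0 / bpm / grid_per_beat
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    return [math.floor(t / step + 0.5) * step for t in onsets_s]


# Hypothetical, slightly uneven performance of five onsets.
played = [0.01, 0.36, 0.90, 1.27, 1.49]
snapped = quantize(played)
```

After quantization every onset lies exactly on the grid, which is what makes a cut-and-paste repetition of the pattern strictly periodic.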

III. STATISTICAL BASIS

A. Random and Deterministic Elements

Quantized sequenced music is deterministic. It features no mutual interaction between musicians and no timing variation. Quantized rhythm, repeated through a cut-and-paste operation, is periodic. It consists of perfect repetitions on a time grid. Live performance of clave-based music is at best quasi-periodic, and often a sum of harmonic stochastic processes and quasi-periodic deterministic processes. This is because live performance involves interactions between musicians based on psychological, artistic and environmental factors that cannot be modeled in a deterministic fashion. The random variables in the harmonic stochastic processes can be frequency, phase and damping. Over the ensemble, all keys (sequences of pitches, i.e., the frequency random variable) and timbres (the phase and damping) are equally likely. Not all rhythms are equally likely, because the music under investigation has a fundamental identifying characteristic in its rhythm.

B. Dependence and Correlation

The stochastic processes are not independent but can be assumed uncorrelated because improvisation and melody cannot be expressed as a linear combination of the quasi-periodic rhythmic element.

C. Stationarity and Ergodicity

An underlying population of all realizations of clave-based music at a given tempo is assumed. The tempo is arbitrary and was set at the musical instrument industry default of 120 BPM. Since musical pieces all have a start and an end, true stationarity (over an infinite time extent) does not hold. However, taking time zero to be the start of each piece, local stationarity in the wide sense can be assumed because all frequencies, phases and damping modes (except unstable ones) are equally likely. Furthermore, the ensemble mean of any process random variable will equal the time average of the related quantity as the time extent approaches infinity, so the process is asymptotically ergodic in the mean.

IV. THEORY

“In some practical applications, correlation is used to identify periodicities in an observed physical signal which may be corrupted by random interference.” [3] In this case, “random interference” is melody and improvisation. Since the deterministic autocorrelation sequence and the biased estimate of autocorrelation for stochastic processes are found to be equivalent to within a constant [3][4], with careful use, they can be applied to stochastic, deterministic and mixed sequences.

Autocorrelation sequences of inter-onset intervals (IOI) have been shown to aid the determination of meter in European Classical Music recorded in MIDI format [5]. Similarly, the autocorrelation of the clave can be used to model the expected rhythmic structure of a clave-based piece. As in the uses of autocorrelation in communications, radar detection [6] and meter detection, any realization of clave-based music can be compared to the known model of the clave to determine the level of similarity in stylistic attributes as a function of correlation lag. The implementation of this comparison is left for further study.

In the frequency domain, the deterministic and stochastic components of the music signal under study may need to be isolated for successful clave detection. It has also been demonstrated that beat-tracking of digital audio is possible through the use of filterbanks and noise shaping, where the sum of narrow bands of random noise, shaped by the envelopes of corresponding bands of a music signal, has the same rhythmic property as the music signal [7]. In consideration of the filterbank interpretation of the periodogram method of power spectral density estimation, and due to improved variance and mean-squared error performance, the 50%-overlap Welch-Bartlett PSD estimate was chosen to characterize the clave waveform in the frequency domain.
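The statement that the deterministic autocorrelation sequence and the biased estimate agree to within a constant can be written out as follows. The notation is an assumption made for illustration: x(n) is a real sequence of length N, l is the lag, and the symbols follow common textbook usage rather than any equation in this paper.

```latex
% Deterministic autocorrelation of a finite real sequence x(n), n = 0, ..., N-1
r_{xx}(l) = \sum_{n=l}^{N-1} x(n)\, x(n-l), \qquad 0 \le l \le N-1

% Biased estimate of the autocorrelation of a stochastic process
\hat{r}_{x}(l) = \frac{1}{N} \sum_{n=l}^{N-1} x(n)\, x(n-l)
               = \frac{1}{N}\, r_{xx}(l)
```

The two differ only by the factor 1/N, so the positions and relative heights of the peaks are the same in both.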

V. PROCEDURE

A. Overview

Digital sequencer-generated audio data, prepared for analysis with audio software tools, was used as the periodic first step toward the analysis of quasi-periodic music signals. The Cuban Rumba Clave was chosen as the traditional clave form in which clave direction is most obvious to trained musicians and listeners. The assumption was made that this characteristic of the rumba clave would make it easier to characterize statistically than other clave forms.

B. Data Generation

Using a Roland PMA-5 portable music sequencer, a digital recording of clave was generated. The input quantization was set to eighth notes for absolute placement of the onsets on the time grid, since swing and human timing error are not concerns for this initial study. CoolEdit 2000 was used to normalize all recordings to the amplitude range of –1 to 1. The recordings were then edited in Blaze Audio RipEditBurn Version 2 to start at exactly the onset of the first beat of the measure, and clipped to exactly 24 seconds' duration.

Subsequently, an additional recording was generated for the purpose of gaining some insight into the performance of the methods used in the presence of added melodic (stochastic) signals. The music sample selected was an arbitrary 24-second section of the Miles Davis piece White [8].

The original WAV files were recorded at the standard CD sampling rate of 44.1 kHz. After the first set of MATLAB analyses, a much lower sampling rate was found to be sufficient. Subsequent recordings of the same data were made at 12 kHz to reduce the computational load.

Additional WAV files containing only two notes of the clave were also generated for the purpose of obtaining an “autocorrelation signature” of the 3 side and 2 side of the clave. The downbeat (first note for the 3-2 clave) was included in each file to serve as a time reference (Figs. 2 and 3). All data trimming and normalizing was carried out as before.
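The normalization and trimming steps were done in audio editing software, but the same operations can be sketched on a plain list of samples. The function names are hypothetical, and peak normalization is an assumption about what "normalize to the range –1 to 1" meant in practice.

```python
def normalize_peak(samples):
    """Scale samples so that the largest magnitude becomes 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # an all-zero signal is left unchanged
    return [s / peak for s in samples]


def trim(samples, fs_hz, duration_s=24.0):
    """Keep only the first duration_s seconds, counted from the first sample."""
    return samples[: int(round(fs_hz * duration_s))]


# At the reduced rate of 12 kHz, a 24-second clip holds 288000 samples.
clip = trim([0.0] * 300000, fs_hz=12000)
```

Trimming assumes the recording already starts at the onset of the first beat, as described above.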

C. Statistical Estimation

The files were read and saved as MATLAB vectors. Each vector was rendered zero-mean and biased autocorrelation estimates were generated. The biased autocorrelation estimate was chosen over the unbiased estimate for two reasons. First, the unbiased estimate is not nonnegative-definite and can lead to PSD estimates that feature negative power density. Second, the biased estimate has lower mean-squared error than the unbiased estimate. The Welch-Bartlett PSD estimation was carried out with a window length corresponding to two clave durations (4 seconds), 50% overlap and the Hamming window.
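The difference between the two estimates can be seen on a very short sequence. The sketch below uses plain Python rather than MATLAB, and the eight-sample input is an arbitrary illustration, not data from the study.

```python
def remove_mean(x):
    """Return a zero-mean copy of the sequence."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def autocorr(x, biased=True):
    """Autocorrelation estimate for lags 0 .. N-1.

    biased=True divides each lag sum by N; biased=False divides by N - lag.
    """
    n = len(x)
    out = []
    for lag in range(n):
        s = sum(x[i] * x[i + lag] for i in range(n - lag))
        out.append(s / n if biased else s / (n - lag))
    return out


# Arbitrary eight-sample onset pattern, made zero-mean first.
x = remove_mean([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0])
r_biased = autocorr(x, biased=True)
r_unbiased = autocorr(x, biased=False)
```

For this input the unbiased value at lag 7 is larger than the value at lag 0, which a valid autocorrelation sequence cannot do. The biased estimate stays bounded by its lag-0 value, at the cost of a taper of (N - lag)/N toward large lags.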

Fig. 2. The first two notes of the syncopated side of Rumba Clave: the downbeat and the “Bombo”, the most accented beat in folkloric rumba.

Fig. 3. The downbeat followed by the first note of the straight side of Rumba Clave.

VI. RESULTS

A. Autocorrelation

The autocorrelation estimate of the Rumba Clave was viewed in light of the autocorrelation method for detection of musical meter [5]. Similar to the use of autocorrelation for identifying measure starts and measure lengths in monophonic music, it is observed that the second highest peak in the autocorrelation estimate of the clave indicates the overall periodicity, or the duration, of the clave, which is two seconds (Fig. 4).

Fig. 4. Autocorrelation sequence of Rumba Clave showing the 2-second period and the complex configuration of lesser peaks.

In order to analyze the behavior of the lesser peaks, autocorrelation plots for two-note combinations were generated. One such plot for the 2 side and one for the 3 side of clave are shown in Figs. 5 and 6, respectively. It is interesting to note that hits 1 and 4 divide the clave duration into ratios of 1/3 and 2/3 (Fig. 3). The lesser peaks in Fig. 5 are spaced so as to break the 2-second clave period into three equal (or near-equal) parts. In contrast, the autocorrelation sequence of two notes from the 3 side of clave shows a decidedly irregular pattern (Fig. 6).

Fig. 5. Autocorrelation sequence of the downbeat and first note of the 2 side of Rumba Clave, with evenly-spaced tertiary peaks showing the regular rhythmic relationship between the two notes.

Fig. 6. Autocorrelation sequence of the first two notes of the 3 side of Rumba Clave, showing the expectedly uneven spacing of tertiary peaks, reflecting the syncopated rhythm between the downbeat and the Bombo.

Note that the autocorrelation of the perfectly periodic deterministic signals, and thus the peaks at every two seconds, approach zero with increasing lag because of the use of finite segments of the signal. Regardless, the autocorrelation behavior of the isolated notes in close relationship to their auditory perception is highly encouraging for analytical understanding and machine perception of clave structure.
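Both observations, the peak at the clave period and the decay of the periodic peaks with lag, can be reproduced on a symbolic onset grid rather than on audio. In the sketch below the 16-box pattern is the commonly cited 3-2 rumba clave, assumed for illustration, and the 0.125-second box and 12 repetitions follow from 120 BPM and a 24-second clip.

```python
# Assumed 3-2 rumba clave on a sixteenth-note grid (1 = onset).
PATTERN = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
STEP_S = 0.125   # one sixteenth note at 120 BPM
REPEATS = 12     # 24 s of signal / 2 s per clave


def biased_autocorr(x):
    """Biased autocorrelation estimate of the zero-mean version of x."""
    n = len(x)
    m = sum(x) / n
    z = [v - m for v in x]
    return [sum(z[i] * z[i + lag] for i in range(n - lag)) / n
            for lag in range(n)]


signal = PATTERN * REPEATS
r = biased_autocorr(signal)

# Lag 0 is always the largest value; the next-largest marks the period.
peak_lag = max(range(1, len(r)), key=lambda lag: r[lag])
period_s = peak_lag * STEP_S
```

Here `peak_lag` comes out as 16 boxes, i.e. a period of 2 seconds, and the values at lags 16, 32, 48 shrink steadily because fewer samples overlap as the lag grows.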

B. PSD

The Welch PSD estimate of the Rumba Clave (Fig. 7) shows the spectral content of the signal as a narrow range of frequencies, with energy concentrated around approximately 2.2 kHz. This corresponds to the spectrum of the clave “patch” (instrument sound sample) used to generate the audio signal.
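The Welch estimate with 50% overlap and a Hamming window can be sketched as below. This is an illustrative plain-Python version with a direct DFT, not the MATLAB routine used in the study. The test tone, sampling rate and segment length are arbitrary small values chosen so that the example runs quickly.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def welch_psd(x, fs_hz, seg_len):
    """Average windowed periodograms over segments with 50% overlap.

    Returns (frequencies, psd, number_of_segments). The PSD is a two-sided
    density evaluated only at the non-negative frequency bins.
    """
    hop = seg_len // 2
    win = hamming(seg_len)
    win_power = sum(w * w for w in win)
    n_bins = seg_len // 2 + 1
    psd = [0.0] * n_bins
    n_segs = 0
    for start in range(0, len(x) - seg_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(seg_len)]
        for k in range(n_bins):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / seg_len)
                      for n in range(seg_len))
            psd[k] += abs(acc) ** 2 / (fs_hz * win_power)
        n_segs += 1
    freqs = [k * fs_hz / seg_len for k in range(n_bins)]
    return freqs, [p / n_segs for p in psd], n_segs


# Arbitrary test signal: an 8 Hz tone sampled at 64 Hz for 8 seconds.
fs = 64.0
tone = [math.sin(2 * math.pi * 8.0 * n / fs) for n in range(512)]
freqs, psd, n_segs = welch_psd(tone, fs, seg_len=128)
peak_hz = freqs[max(range(len(psd)), key=lambda k: psd[k])]
```

The bin spacing equals the reciprocal of the segment duration. If the 4-second window is applied to a 24-second clip at 12 kHz, that gives 48000-sample segments, 11 overlapping segments and a spacing of 0.25 Hz. A direct DFT is far too slow at that size, so an FFT-based routine would be used in practice.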

Fig. 7. Welch PSD estimate of Rumba Clave, PMA-5 “Claves” patch, # 75 [9].

Upon closer examination (Figs. 8 and 9), a family of spectral peaks is observed, separated by what appears to be about 5/8 Hz, which is close to the 1/2 Hz repetition frequency of the clave form.

Fig. 10. Welch PSD estimate of Rumba Clave with added melody.

The final figure (Fig. 10) shows the PSD estimate for the superposition of the arbitrary melodic signal [8] and the clave. The narrow band occupied by the periodic rhythmic signal, compared with the wide spread of energy in the arbitrarily selected “random” melody content, gives some support to the filterbank approach for isolating rhythmic content.

VII. CONCLUSION

Within the scope of the study, encouraging results have been obtained regarding the possibility of modeling clave behavior in the time domain and aiding its detection in the frequency domain. Autocorrelation can give specific information about periodic content in a music signal that can be treated as a combination of random and deterministic elements. Interpretation of the autocorrelation sequences and estimates, and the design of narrowband filters for isolating energy in the rhythmic signature, require genre-specific knowledge. The extraction of deterministic components may become challenging as signals under analysis become more complex.

The following improvements and additional areas of investigation are identified to increase the applicability and robustness of the methods undertaken: the measurement of exact distances between autocorrelation peaks; the cross-correlation of the model and a signal under study, and the use of a uniform click track for separate cross-correlation estimates with each side of clave; investigation of other claves and clave-based rhythms; incorporation of beat-tracking methods for use with a range of tempos; and the generalization to the quasi-periodic case for applicability to live music.

Fig. 8. Welch PSD estimate of Rumba Clave.

Fig. 9. Welch PSD estimate of Rumba Clave, showing almost evenly-spaced peaks at close to the frequency of the clave form.

REFERENCES

[1] J. M. Magill and J. L. Pressing, “Asymmetric Cognitive Clock Structures in West African Rhythms,” Music Perception, vol. 15, no. 2, pp. 189–222, Winter 1997.
[2] M. Vurkaç, “The Concept of Clave and Its Analysis in African-Based Latin American Music,” unpublished.
[3] J. G. Proakis and D. G. Manolakis, Digital Signal Processing – Principles, Algorithms, and Applications. Upper Saddle River, NJ: Prentice Hall, 1996, p. 126.
[4] D. G. Manolakis, V. K. Ingle and S. M. Kogon, Statistical and Adaptive Signal Processing. McGraw-Hill, 2000, p. 210.
[5] J. C. Brown. (1993, October). “Determination of the meter of musical scores by the method of autocorrelation,” J. Acoust. Soc. Am. 94 (4) pp. 1953–1957.
[6] J. G. Proakis and D. G. Manolakis, Digital Signal Processing – Principles, Algorithms, and Applications. Upper Saddle River, NJ: Prentice Hall, 1996, p. 121.
[7] E. D. Scheirer. (1998, January). “Tempo and beat analysis of acoustic musical signals,” J. Acoust. Soc. Am. 103 (1) pp. 588–601.
[8] M. Davis. Aura, Miles Davis. P. Mikkelborg, Composer; Davis, Trumpet. CD. CBS 45332, 1989.
[9] Roland PMA-5 Pocket Guide, Roland Corp., Hamamatsu, Japan, 1998.
