IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 17, NO. 6, AUGUST 2009, p. 1061

Equalization of Multichannel Acoustic Systems in Oversampled Subbands

Nikolay D. Gaubitch, Member, IEEE, and Patrick A. Naylor, Senior Member, IEEE

Manuscript received May 05, 2007; revised January 06, 2009. Current version published June 26, 2009. This work was supported by the Engineering and Physical Sciences Research Council, U.K., under Grant GR/S66954. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Hiroshi Sawada. The authors are with the Communications and Signal Processing Group, Imperial College London, London SW7 2AZ, U.K. Digital Object Identifier 10.1109/TASL.2009.2015692

Abstract: Equalization of room transfer functions (RTFs) is an important topic with several applications in acoustic signal processing. RTFs are often modeled as finite-impulse response filters, characterized by orders of thousands of taps and non-minimum phase. In practice, only approximate estimates of the actual RTFs are available due to measurement noise, limited estimation accuracy, and temporal variation of source–receiver position. These issues make equalization a difficult problem. In this paper, we discuss multichannel equalization with focus on inexact RTF estimates. We present a multichannel method for the equalization filter design utilizing decimated and oversampled subbands, where the full-band acoustic impulse response is decomposed into equivalent subband filters prior to equalization. This technique is not only more computationally efficient but also more robust to impulse response inaccuracies compared with the full-band counterpart.

Index Terms: Dereverberation, multirate audio processing, multichannel equalization.

I. INTRODUCTION

EQUALIZATION of room transfer functions (RTFs) is an important research topic with several applications in acoustic signal processing, including speech dereverberation [1] and sound reproduction [2]. Although, in theory, exact equalization is possible when multiple observations are available [3], there are many obstacles for practical application of RTF equalization algorithms.

Consider the $L$-tap room impulse response of the acoustic path between a source and the $m$th microphone in an $M$-element microphone array, $\mathbf{h}_m = [h_{m,0}\ h_{m,1}\ \ldots\ h_{m,L-1}]^T$, with a $z$-transform $H_m(z)$ constituting the RTF. The objective of equalization is to apply an inverse system with transfer function $G_m(z)$ such that

$$H_m(z)G_m(z) = \kappa z^{-\tau} \tag{1}$$

where $\tau$ and $\kappa$ are arbitrary delay and scale factors, respectively. Equivalently, considering the $L_i$-tap impulse response of $G_m(z)$, $\mathbf{g}_m = [g_{m,0}\ g_{m,1}\ \ldots\ g_{m,L_i-1}]^T$, (1) can be written in the time domain as

$$\mathbf{H}_m\mathbf{g}_m = \mathbf{d} \tag{2}$$

where

$$\mathbf{H}_m = \begin{bmatrix} h_{m,0} & 0 & \cdots & 0 \\ h_{m,1} & h_{m,0} & \ddots & \vdots \\ \vdots & h_{m,1} & \ddots & 0 \\ h_{m,L-1} & \vdots & \ddots & h_{m,0} \\ 0 & h_{m,L-1} & & h_{m,1} \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{m,L-1} \end{bmatrix}$$

is a $(L+L_i-1)\times L_i$ convolution matrix, and $\mathbf{d} = [\underbrace{0\ \ldots\ 0}_{\tau}\ \kappa\ 0\ \ldots\ 0]^T$ is the $(L+L_i-1)\times 1$ vector with the impulse response of the equalized RTF. The problem of equalization is to find $\mathbf{g}_m$. When $H_m(z)$ is a minimum phase system, a stable inverse filter can be found by replacing the zeros of $H_m(z)$ with poles [4]

$$G_m(z) = \frac{1}{H_m(z)} \tag{3}$$

However, RTF equalization is not that straightforward in practice because:
1) RTFs are non-minimum phase in general [5] and hence (3) does not give a stable causal solution for $G_m(z)$;
2) the average difference between maxima and minima in RTFs is in excess of 10 dB [6]–[8] and therefore RTFs typically contain spectral nulls that, after equalization, give strong peaks in the spectrum causing narrowband noise amplification;
3) equalization filters designed from inaccurate estimates of $H_m(z)$ will cause distortion in the equalized signal [8];
4) the length of $\mathbf{h}_m$ at a sampling frequency $f_s$ is related to the reverberation time, $T_{60}$, in a room by $L = f_s T_{60}$ and can be several thousand taps [6].

Several alternative approaches, both for single and for multiple microphones, have been proposed to address these issues. There are two common methods for single-channel equalization: single-channel least squares (SCLS) and homomorphic equalization [9].
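Obstacles 1) and 4) in the list above can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper names `is_minimum_phase` and `rir_length`, the two-tap filters, and the values 8 kHz and 0.5 s are hypothetical illustrations. It tests whether all zeros of an FIR response lie inside the unit circle (the condition under which the direct inverse in (3) is stable and causal) and evaluates the length relation between sampling frequency and reverberation time.

```python
import numpy as np


def is_minimum_phase(h, tol=1e-9):
    """Return True if every zero of the FIR filter h lies inside the unit circle."""
    # H(z) = h[0] + h[1] z^-1 + ... has the same zeros as the polynomial
    # h[0] z^(L-1) + ... + h[L-1], whose roots np.roots computes.
    zeros = np.roots(h)
    return bool(np.all(np.abs(zeros) < 1.0 - tol))


def rir_length(fs_hz, t60_s):
    """Approximate impulse response length in taps, L = fs * T60."""
    return int(round(fs_hz * t60_s))


# Hypothetical two-tap examples (not taken from the paper).
h_min = np.array([1.0, -0.5])   # single zero at z = 0.5 -> minimum phase
h_non = np.array([1.0, -2.0])   # single zero at z = 2.0 -> non-minimum phase

print(is_minimum_phase(h_min))  # True
print(is_minimum_phase(h_non))  # False
print(rir_length(8000, 0.5))    # 4000
```

For the second filter the pole of $1/H(z)$ sits at $z = 2$, so a causal inverse is unstable; this is the situation that motivates the least-squares and multichannel designs discussed next.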
SCLS equalization filters are designed by minimizing an error formed from (2) as [9], [10]

$$\hat{\mathbf{g}}_m = \arg\min_{\mathbf{g}_m} \|\mathbf{H}_m\mathbf{g}_m - \mathbf{d}\|_2^2 \tag{4}$$

where $\|\cdot\|_2$ denotes Euclidean distance. In homomorphic inverse filtering [9], [11]–[13], the RTF is decomposed into minimum phase and all-pass components. An exact inverse can be found for the minimum phase component with (3), while the all-pass component can be equalized, for example, using a matched filter [12]. Equalizing only the magnitude was considered in [5] and [12], but was found to result in audible residual echoes. In a comparative study between these two techniques, Mourjopoulos [9] concluded that SCLS, although sometimes less accurate than homomorphic inversion, is more efficient in practice.

Single-channel methods typically result in a large processing delay, which is problematic for many communications applications, and in extremely long and non-causal inverse filters, and they provide only approximate equalization [3]. Due to the approximate nature of these inverse filters, they are less sensitive to noise and inexact RTF estimates [1]. Inherently, SCLS inverse filters only partially equalize deep spectral nulls, which can be advantageous in avoiding problems due to points 2) and 3) above.

In the multichannel case, the non-minimum phase problem is eliminated and exact inversion can be achieved using Bezout's theorem [3], [14]: given a set of $M$ RTFs, $H_m(z)$, $m = 1, \ldots, M$, and assuming that these do not have any common zeros, a set of filters, $G_m(z)$, can be found such that [3], [14]

$$\sum_{m=1}^{M} H_m(z)G_m(z) = 1 \tag{5}$$

MINT [3] was the first multichannel equalization method based on (5). Adaptive versions have also been considered [2]. Unlike single-channel equalization filters, the length of the multichannel equalization filters is of similar order as the length of the room impulse responses and there is no processing delay [3], [14]. However, it has been observed that exact equalization is of limited value in practice, when the RTF estimates contain even moderate errors [1], [8].

Various alternatives have been proposed for improving robustness to RTF inaccuracies. Bharitkar et al. [15] use spatially averaged RTFs for the design of the equalization filter. In [16], the authors modify the desired signal in the multichannel inverse filter design, such that the late reverberation is equalized while the early reflections are preserved. Haneda et al. [17], [18] form an infinite-impulse response (IIR) filter by decomposing the RTFs into common acoustical poles and non-common zeros. Mourjopoulos [10] uses an AR model of the RTFs rather than the all-zero model in order to reduce the filter order. The AR model of RTFs is also exploited by Hopgood and Rayner in a single-channel subband equalization approach [19]. Hikichi et al. [20], [21] introduce regularized multichannel equalization, which adds robustness to noise and RTF fluctuations.

In this paper, we propose a new method for equalization filter design. Given a set of multichannel RTFs, we decompose the RTFs into their subband equivalent filters. These are then used to design the subband equalization filters, and the equalization is performed in each subband before a full-band equalized signal is reconstructed. It is shown that this approach not only reduces the computational load, but also reduces the sensitivity to estimation errors and the effect of measurement noise in the RTFs. An important result is that this method accommodates multichannel equalization of large order systems, taking advantage of the shorter length of multichannel equalization filters and low sensitivity to RTF inaccuracies.

The remainder of the paper is organized as follows. Multichannel equalization is described in Section II. The effects on equalization filter design from inexact RTFs are demonstrated in Section III. The subband equalization method is developed in Section IV. Section V presents a computational complexity analysis of the subband method. Simulation results demonstrating the operation of the proposed algorithm are given in Section VI and, finally, conclusions are drawn in Section VII.

II. MULTICHANNEL EQUALIZATION

The relation in (5) can be written in the time domain as

$$\mathbf{H}\mathbf{g} = \mathbf{d} \tag{6}$$

where $\mathbf{H} = [\mathbf{H}_1\ \mathbf{H}_2\ \ldots\ \mathbf{H}_M]$, and $\mathbf{g} = [\mathbf{g}_1^T\ \mathbf{g}_2^T\ \ldots\ \mathbf{g}_M^T]^T$. An optimization problem can then be formulated as

$$\hat{\mathbf{g}} = \arg\min_{\mathbf{g}} \|\mathbf{H}\mathbf{g} - \mathbf{d}\|_2^2 \tag{7}$$

and the multichannel equalization (MCEQ) filters can be calculated according to [14]

$$\hat{\mathbf{g}} = \mathbf{H}^{+}\mathbf{d} \tag{8}$$

where $\mathbf{H}^{+}$ is the matrix pseudo-inverse [22]. The choice of equalization filter length, $L_i$, and, consequently, the dimensions of $\mathbf{H}$, $(L+L_i-1)\times ML_i$, define the solution obtained with (8). If $ML_i \geq L+L_i-1$ then

$$L_i \geq \frac{L-1}{M-1} \tag{9}$$

and the system is underdetermined such that several exact solutions exist [23]. Then the pseudo-inverse in (8) is defined as $\mathbf{H}^{+} = \mathbf{H}^T(\mathbf{H}\mathbf{H}^T)^{-1}$ and gives the minimum norm solution to (7). In the special case when the length in (9) results in an equivalence, the matrix $\mathbf{H}$ becomes square and the pseudo-inverse in (8) reduces to a standard matrix inverse. The exact solution is then unique and equivalent to that of MINT [3]. However, as pointed out in [14], it is not always possible to choose such a length for $L_i$, since the relation in (9) may not give an integer result. Instead, a greater length is often chosen [14], [24]. A third case arises when $L_i$ is chosen such that $L_i < (L-1)/(M-1)$, which results in an overdetermined system of equations, and only a least squares solution can be obtained [23]. For this work, we consider the former, minimum norm exact solutions, and set the equalization filter length to

$$L_i = \left\lceil \frac{L-1}{M-1} \right\rceil \tag{10}$$

where $\lceil x \rceil$ denotes the ceiling operator giving the smallest integer greater than or equal to $x$. The relation between an input signal $s(n)$, $M$ RTFs $H_m(z)$, equalizers $G_m(z)$, and an output signal $\hat{s}(n)$ is depicted in Fig. 1, where $\hat{s}(n) = \kappa s(n-\tau)$ for ideal equalization.

Fig. 1. Full-band multichannel equalization system.

III.
EQUALIZATION WITH INEXACT IMPULSE RESPONSES

In this section we demonstrate the effects of equalization filter design when using inexact $\mathbf{h}_m$, considering both single-channel (approximate) equalization with SCLS and multichannel (exact) equalization with MCEQ. We define an inexact system impulse response, $\hat{\mathbf{h}}_m$, as an impulse response with system mismatch $\mathcal{M} > -\infty$ dB, with

$$\mathcal{M} = 20\log_{10}\left(\frac{\|\mathbf{h}_m - \hat{\mathbf{h}}_m\|_2}{\|\mathbf{h}_m\|_2}\right)\ \text{dB} \tag{11}$$

where $\|\cdot\|_2$ denotes Euclidean distance. In the remainder of this work we model system mismatch, as in [25], according to

$$\hat{\mathbf{h}}_m = (\mathbf{I} + \mathbf{E})\mathbf{h}_m \tag{12}$$

where $\mathbf{E} = \mathrm{diag}\{\epsilon_0, \epsilon_1, \ldots, \epsilon_{L-1}\}$, $\mathbf{I}$ is the $L \times L$ identity matrix, and $\epsilon_i$ is a zero mean Gaussian variable with the variance set to the desired system mismatch, $\mathcal{M}$ dB.

We now study the design of an equalization filter for $H_m(z)$ using $\hat{H}_m(z)$ when $\mathcal{M} > -\infty$ dB. Furthermore, we define the equalized system $\hat{\mathbf{d}} = \mathbf{H}\hat{\mathbf{g}}$ with $I$-point discrete Fourier transform $\hat{D}(k)$, $k = 0, \ldots, I-1$, where $I$ is set to the nearest integer power of two larger than the length of $\hat{\mathbf{d}}$. For evaluation purposes we consider the magnitude and the phase separately as follows.

1) Magnitude deviation is defined here as the standard deviation of the equalized magnitude response [8]

$$\sigma = \sqrt{\frac{1}{I}\sum_{k=0}^{I-1}\left(10\log_{10}|\hat{D}(k)| - \bar{D}\right)^2} \tag{13}$$

with

$$\bar{D} = \frac{1}{I}\sum_{k=0}^{I-1} 10\log_{10}|\hat{D}(k)|.$$

This measure is scaling independent and equal to zero for exact equalization.

2) Linear phase deviation is the deviation of the unwrapped phase from a linear fit to its values and is defined here as

$$\Delta = \sqrt{\frac{1}{I}\sum_{k=0}^{I-1}\left(\angle\hat{D}(k) - \angle\bar{D}(k)\right)^2} \tag{14}$$

where $\angle\bar{D}(k)$ is the least squares linear approximation to the phase at frequency bin $k$.

Two key effects regarding equalization filter design from inexact impulse responses are to be demonstrated: A. the performance degradation caused by increased system mismatch and B. the performance degradation caused by increased system length for a fixed system mismatch.

Fig. 2. Magnitude and phase distortion versus system mismatch for (a) exact equalization with MCEQ from (8) and (b) approximate equalization with SCLS from (4).

A. Effects of System Mismatch

An illustrative comparison experiment was performed using an arbitrary system with two random channels $\mathbf{h}_m$, $m = 1, 2$, of length $L$. System mismatch ranging from 0 to $-80$ dB was modeled using (12).
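The experiment just described can be pictured with a short sketch. It is an illustration under stated assumptions rather than the authors' code: the function names, the channel length of 8 taps, the random seed, and the $-20$ dB operating point are hypothetical; the error vector is rescaled so that the mismatch in (11) is hit exactly (the model in (12) only fixes it on average); and the magnitude deviation uses a $10\log_{10}$ magnitude scale, which is an assumption about (13). The equalizers are designed with a pseudo-inverse as in (8), with the filter length from (10).

```python
import numpy as np


def conv_matrix(h, n_cols):
    """(len(h) + n_cols - 1) x n_cols convolution (Toeplitz) matrix of h."""
    L = len(h)
    H = np.zeros((L + n_cols - 1, n_cols))
    for j in range(n_cols):
        H[j:j + L, j] = h
    return H


def mceq_design(channels, Li):
    """Multichannel equalizers g = pinv(H) d with d a unit impulse (tau = 0)."""
    H = np.hstack([conv_matrix(h, Li) for h in channels])
    d = np.zeros(H.shape[0])
    d[0] = 1.0
    g = np.linalg.pinv(H) @ d
    return np.split(g, len(channels))


def perturb(h, mismatch_db, rng):
    """h_hat = (I + E) h with per-tap Gaussian errors.

    Assumption: the error is rescaled so the mismatch equals mismatch_db exactly.
    """
    err = rng.standard_normal(len(h)) * h
    target = 10.0 ** (mismatch_db / 20.0) * np.linalg.norm(h)
    return h + err * target / np.linalg.norm(err)


def system_mismatch_db(h, h_hat):
    """Normalized estimation error in dB."""
    return 20.0 * np.log10(np.linalg.norm(h - h_hat) / np.linalg.norm(h))


def magnitude_deviation(d_hat):
    """Standard deviation of the log-magnitude spectrum of the equalized response."""
    nfft = 1 << int(np.ceil(np.log2(len(d_hat))))
    mag_db = 10.0 * np.log10(np.abs(np.fft.fft(d_hat, nfft)) + 1e-300)
    return float(np.std(mag_db))


def equalize(channels, equalizers):
    """Sum over channels of h_m convolved with g_m."""
    return sum(np.convolve(h, g) for h, g in zip(channels, equalizers))


rng = np.random.default_rng(0)
L, M = 8, 2                                   # hypothetical sizes
Li = int(np.ceil((L - 1) / (M - 1)))          # filter length rule
true_h = [rng.standard_normal(L) for _ in range(M)]
est_h = [perturb(h, -20.0, rng) for h in true_h]

sigma_exact = magnitude_deviation(equalize(true_h, mceq_design(true_h, Li)))
sigma_inexact = magnitude_deviation(equalize(true_h, mceq_design(est_h, Li)))
print(sigma_exact, sigma_inexact)
```

Designing from the true channels should give a magnitude deviation at the level of numerical precision, whereas designing from the perturbed channels and applying the result to the true ones should give a clearly non-zero deviation.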
For each case, the impulse response was equalized using the MCEQ method with inverse filter length $L_i$, and using the SCLS method with its own, much longer, inverse filter. The results, averaged over 100 different random channels, are displayed in Fig. 2. It is seen that equalization using the MCEQ method introduces large spectral distortion once the system mismatch $\mathcal{M}$ rises above a level that lies within the operating range of many current (blind or non-blind) RTF estimation techniques. In contrast, the single-channel SCLS equalizer degrades much more gracefully, although equalization filters of very high orders are required. The better performance of the SCLS is a result of the least squares approximation not being able to equalize deep spectral nulls. Furthermore, it is observed that for sufficiently low $\mathcal{M}$ the multichannel method results in exact equalization while the single-channel counterpart reaches a performance bound. These observations are also in accordance with the results reported in [8] and [26], where the authors studied equalization of RTFs measured at a different location to that at the point of processing.

Fig. 3. Magnitude and phase distortion versus impulse response length for (a) exact equalization with MCEQ from (8) and (b) approximate equalization with SCLS from (4), both with system mismatch $\mathcal{M} = -30$ dB.

B. Effects of System Length

We examine next the interrelation between system mismatch, impulse response length and equalization accuracy. We consider an arbitrary system with two random channels $\mathbf{h}_m$, $m = 1, 2$, with length $L$ varied in the range 10 to 190 taps and system mismatch $\mathcal{M} = -30$ dB. The lengths of the inverse filters, $L_i$, were set separately for the MCEQ and SCLS equalizers. Fig. 3 shows the resulting magnitude and phase distortion for the different channel lengths as an average of 100 different random channel realizations. It can be seen that the performance of exact equalization with MCEQ degrades considerably as $L$ increases, whereas that of the single-channel SCLS remains more or less constant. One reason for the performance degradation with increased system length is that, although the misalignment is kept constant, the total energy of the error in the estimates increases with system length. Moreover, increasing the order of the system results in a larger number of spectral zeros to be equalized, which affects the multichannel equalization in particular since it is more sensitive to errors in the channel estimates than the SCLS, as seen in Section III-A.

In summary, we have seen that exact multichannel equalization with inverse filters obtained from inexactly estimated systems gives worse results than approximate single-channel equalization. However, the very long inverse filters required by SCLS are not suitable for realistic applications involving acoustic impulse responses, and the achieved equalization is limited even when the system mismatch is low. In addition, the deteriorating effects of exact multichannel equalization, for a fixed system mismatch, were seen to increase with increased channel length. These observations lead us to the conclusion that when equalization filters are designed from inexact system estimates, approximate solutions and short system lengths are preferable. The system length due to RTFs is a function of the room and its reverberation time and, therefore, not a controllable system parameter. This motivates the development of a multichannel subband equalizer, where shortened channel length is an inherent feature.

IV. MULTICHANNEL SUBBAND EQUALIZATION

We now derive the subband multichannel (SB-MCEQ) equalizer. Fig. 4 shows a conceptual system diagram of the SB-MCEQ where the full-band system depicted in Fig. 1 is applied to each subband. This emphasizes three key issues to consider in such a design: 1) the choice of the filter-bank, 2) the mapping of full-band to subband RTFs, and 3) the equalizer design using the subband equivalent filters.
Each of these is discussed in the remainder of this section.

Multirate processing [27] has been applied successfully in acoustic signal processing problems such as acoustic echo cancellation, where significant improvements have been demonstrated in the convergence of the subband adaptive filters [28]–[31]. A subband version of MINT was first investigated in [32]. This approach uses a critically decimated filter-bank. The subband transfer functions to be equalized are estimated in a least squares sense using the observation of a known reference signal. A different multichannel subband method was proposed by Wang and Itakura [33] for a critically decimated filter-bank. A single-channel least squares equalizer is applied to each subband and each microphone, and the full-band signal is reconstructed using the best microphone in each subband. The best microphone is selected for each subband using a normalized estimation error criterion from the estimation of the SCLS filters. In [19], a rigorous approach was taken and the relation between full-band and subband filters was studied for an AR model of the room impulse response. An adaptive method for multichannel equalization in oversampled subbands was proposed in [30] and was shown to provide significant improvement over the full-band counterpart.

The relation between full-band and subband filtering was studied, for example, by Lanciani et al. [34] for filtering of MPEG audio signals and by Reilly et al. [31] with applications to acoustic echo cancellation. The former authors derive the relations between the full-band and subband filters for critically decimated cosine modulated filter banks [27], which are shown to require cross-band filtering. On the other hand, Reilly et al. [31] show that good approximations can be obtained with a diagonal filtering matrix, involving only one filter per subband, for complex oversampled filter-banks, since these sufficiently suppress aliasing in adjacent subbands [30].
We now extend this approach to the multichannel case with application to RTF equalization. This method differs from the previously proposed methods in that it uses oversampled subbands in conjunction with the explicit relation between the full-band and the subband RTFs.

Fig. 4. Subband multichannel equalization system.

A. Oversampled Filter-Banks

The generalized discrete Fourier transform (GDFT) filter-bank [29] is employed in the subsequent development work. The advantages of this filter-bank include straightforward implementation of fractional oversampling and computationally efficient implementations [29]. Within the framework of the GDFT filter-bank, the analysis filters, $u_k(n)$, are calculated from a single prototype filter, $p(n)$, with bandwidth $2\pi/K$ according to the relation [29]

$$u_k(n) = p(n)\,e^{j\frac{2\pi}{K}(k+k_0)(n+n_0)}, \quad k = 0, 1, \ldots, K-1 \tag{15}$$

where the properties of the frequency and time offset terms, $k_0$ and $n_0$, are discussed in, for example, [29]. We set these as in [31]. It has been shown [29] that a corresponding set of synthesis filters satisfying near perfect reconstruction can be obtained from the time-reversed, conjugated version of the analysis filters

$$v_k(n) = u_k^{*}(L_{pr} - n - 1) \tag{16}$$

where $L_{pr}$ is the length of the prototype filter and, consequently, the length of all analysis and synthesis filters of the filter-bank. Although this filter design results in complex subband signals, for $K$ even, only $K/2$ subbands need to be processed since the remaining subbands are straightforward complex conjugates of these.

The choices of decimation factor $N$ and number of subbands $K$ have several consequences on the algorithm. A large number of subbands requires a long prototype filter to suppress aliasing effectively. On the other hand, if too few subbands are used, the benefit of shorter subband equalization filters is reduced. The choice of oversampling ratio $K/N$ affects the performance of the equivalent subband filters. A good tradeoff between these parameters was found in the filter-bank used for the illustrative experiments in this paper, with $K$ subbands and decimation factor $N$. An $L_{pr}$-tap prototype filter was designed using the iterative least squares method [29], giving an estimated aliasing suppression of 82 dB.

From the properties of the GDFT outlined here, the following two properties can be assumed to be valid:

P1: Aliasing is sufficiently suppressed in the subbands

$$U_k(zW_N^{i})V_k(z) \approx 0, \quad i = 1, 2, \ldots, N-1 \tag{17}$$

P2: Magnitude distortion of the filter-bank is negligible

$$\frac{1}{N}\sum_{k=0}^{K-1} U_k(z)V_k(z) \approx \kappa z^{-\tau} \tag{18}$$

where $U_k(z)$ and $V_k(z)$ are the $z$-transforms of the subband analysis and synthesis filters, respectively, and $W_N = e^{-j2\pi/N}$.

B. Subband Decomposition

Consider the $K$ subband, $M$ microphone system in Fig. 4. It is clear that, in order to design the subband equalizers $G'_{k,m}(z)$, the subband RTFs $H'_{k,m}(z)$ must be found using, for example, complex subband decomposition [31] of their full-band counterparts $H_m(z)$. The objective of the subband decomposition is to find a set of subband filters $H'_{k,m}(z)$, $k = 0, 1, \ldots, K-1$, given the full-band filter $H_m(z)$, such that the total transfer function of the filter bank, $T_m(z)$, is equivalent to that of the full-band filter up to an arbitrary scale factor $\kappa$ and an arbitrary delay $\tau$. This can be written

$$T_m(z) = \kappa z^{-\tau}H_m(z) \tag{19}$$

The total transfer function of the filter-bank for the $m$th channel is given by

$$T_m(z) = \frac{1}{N}\sum_{k=0}^{K-1} V_k(z)H'_{k,m}(z^N)\sum_{i=0}^{N-1} U_k(zW_N^{i}) \tag{20}$$

Invoking property P1 in (17), the filter-bank transfer function reduces to

$$T_m(z) = \frac{1}{N}\sum_{k=0}^{K-1} V_k(z)H'_{k,m}(z^N)U_k(z) \tag{21}$$

which allows for a single filter per subband. Next, following the approach in [31], we choose the filters in each subband such that they satisfy the relation

$$U_k(z)H'_{k,m}(z^N) = U_k(z)H_m(z) \tag{22}$$

where $k = 0, 1, \ldots, K-1$. Substituting (22) into (21) we obtain

$$T_m(z) = \frac{1}{N}\sum_{k=0}^{K-1} V_k(z)U_k(z)H_m(z) \tag{23}$$

Finally, due to property P2 in (18), we find that the overall filter-bank transfer function is

$$T_m(z) \approx \kappa z^{-\tau}H_m(z) \tag{24}$$

which is the desired result. Thus, the remaining problem is to solve for $H'_{k,m}(z)$ in (22).
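To make the filter-bank construction and the decomposition problem more concrete, here is a minimal NumPy sketch. Several things in it are assumptions and not the paper's choices: the prototype is a Hamming-windowed sinc rather than the iterative least-squares design of [29]; the offsets `k0 = 0.5` and `n0 = 0` and the sizes `K = 8`, `N = 6`, `L_pr = 64` are illustrative only; and the function names are hypothetical. The decomposition is posed as a least-squares fit on sequences decimated by the factor N, which is one way to solve the single-filter-per-subband relation above.

```python
import numpy as np


def prototype_filter(K, L_pr):
    """Stand-in lowpass prototype with cutoff pi/K (windowed sinc, assumption)."""
    n = np.arange(L_pr) - (L_pr - 1) / 2.0
    return np.sinc(n / K) / K * np.hamming(L_pr)


def gdft_analysis(p, K, k0=0.5, n0=0.0):
    """Analysis filters: row k is p(n) * exp(j 2 pi / K (k + k0)(n + n0))."""
    n = np.arange(len(p))[None, :]
    k = np.arange(K)[:, None]
    return p[None, :] * np.exp(2j * np.pi / K * (k + k0) * (n + n0))


def gdft_synthesis(U):
    """Synthesis filters: time-reversed, conjugated analysis filters."""
    return np.conj(U[:, ::-1])


def conv_matrix(u, n_cols):
    """Complex convolution matrix with n_cols shifted copies of u."""
    A = np.zeros((len(u) + n_cols - 1, n_cols), dtype=complex)
    for j in range(n_cols):
        A[j:j + len(u), j] = u
    return A


def subband_filter(u_k, h, N):
    """Least-squares subband equivalent of the full-band filter h in band k."""
    r = np.convolve(u_k, h)[::N]          # decimated (u_k * h)
    u_dec = u_k[::N]                      # decimated analysis filter
    L_sub = len(r) - len(u_dec) + 1       # subband filter length
    h_sub, *_ = np.linalg.lstsq(conv_matrix(u_dec, L_sub), r, rcond=None)
    return h_sub


K, N, L_pr = 8, 6, 64                     # hypothetical sizes
U = gdft_analysis(prototype_filter(K, L_pr), K)
V = gdft_synthesis(U)

# A full-band delay of N samples should map to a one-sample subband delay.
delay_N = np.zeros(N + 1)
delay_N[N] = 1.0
h_sub = subband_filter(U[0], delay_N, N)
print(np.round(h_sub, 6))
```

With these offsets, band `K-1-k` is the complex conjugate of band `k`, which is why only half of the subbands need processing for real input signals. The delay example is a convenient sanity check because its subband equivalent is known exactly.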
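Decimating (22) by a factor of $N$, the following approximation can be formed:

$$\left[U_k(z)H_m(z)\right]_{\downarrow N} \approx \left[U_k(z)\right]_{\downarrow N}H'_{k,m}(z) \tag{25}$$

which in the time domain is equivalently written

$$\mathbf{r}_{k,m} = \tilde{\mathbf{U}}_k\mathbf{h}'_{k,m} \tag{26}$$

where $\mathbf{h}'_{k,m} = [h'_{k,m}(0)\ h'_{k,m}(1)\ \ldots\ h'_{k,m}(L'-1)]^T$ are the subband impulse responses, $\tilde{\mathbf{U}}_k$ is a $\lceil (L+L_{pr}-1)/N \rceil \times L'$ convolution matrix formed from the decimated analysis filter $\tilde{u}_k(n) = u_k(nN)$, and $\mathbf{r}_{k,m}$ is the $\lceil (L+L_{pr}-1)/N \rceil$-tap vector with elements

$$r_{k,m}(n) = \sum_{i=0}^{L_{pr}-1} u_k(i)\,h_m(nN-i)$$

where $L_{pr}$ is the length of the analysis filters. The convolution on the left-hand side of (26) is of length $\lceil (L+L_{pr}-1)/N \rceil$, and consequently, the length of the subband filters is

$$L' = \left\lceil \frac{L+L_{pr}-1}{N} \right\rceil - \left\lceil \frac{L_{pr}}{N} \right\rceil + 1 \tag{27}$$

The estimates of the subband filters are then found by solving the following optimization problem [31]:

$$\hat{\mathbf{h}}'_{k,m} = \arg\min_{\mathbf{h}'_{k,m}} \|\tilde{\mathbf{U}}_k\mathbf{h}'_{k,m} - \mathbf{r}_{k,m}\|_2^2 \tag{28}$$

The $k$th subband, $m$th channel filters are obtained in the least squares optimal sense according to

$$\hat{\mathbf{h}}'_{k,m} = \tilde{\mathbf{U}}_k^{+}\mathbf{r}_{k,m} \tag{29}$$

In summary, given a full-band RTF $H_m(z)$ and a $K$-band filter-bank satisfying perfect reconstruction and aliasing suppression in the subbands, a set of subband filters of length $L'$ can be found such that the overall subband transfer function is equivalent to the full-band filter response. We now aim to exploit the significant order reduction in the subbands of the very long full-band room impulse responses.

C. Subband Multichannel Equalization

The multichannel equalization filters $\mathbf{g}'_{k,m}$ can be calculated for each subband using the filters obtained from (29). Here, this is done utilizing the multichannel equalization filter design from (8), which now becomes

$$\hat{\mathbf{g}}'_k = \mathbf{H}_k'^{+}\mathbf{d} \tag{30}$$

with $\mathbf{H}'_k$ and $\mathbf{g}'_k$ formed from the subband filters in the same way as $\mathbf{H}$ and $\mathbf{g}$ in (6), such that for each subband

$$\sum_{m=1}^{M} H'_{k,m}(z)G'_{k,m}(z) = 1, \quad k = 0, 1, \ldots, K-1 \tag{31}$$

Thus, equalization is achieved by applying the inverse filters $G'_{k,m}(z)$ to the subband signals of the reverberant observations in each subband, $k = 0, 1, \ldots, K-1$, and each channel, $m = 1, 2, \ldots, M$, after which an equalized full-band signal is reconstructed. Assuming that exact equalization is achieved in each subband, the accuracy of the final result will depend on the reconstruction properties of the filter-bank, the level of aliasing suppression and, consequently, on the design of the prototype filter. Therefore, the overall equalization of the subband method will not be exact in practice, which can be beneficial as discussed in Section III. These dependencies will be explained through illustrative simulations in Section VI.

V.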
COMPUTATIONAL COMPLEXITY

In this section, we present a comparative analysis of the number of computations required for the solution of the full-band MCEQ equalizer design and the SB-MCEQ equalizer design (including the computational cost of the subband decomposition). The comparison is made in terms of floating point operations (flops), where one flop is defined as either one real multiplication or one real addition [22].

We consider the general optimization problem $\min_{\mathbf{x}}\|\mathbf{A}\mathbf{x} - \mathbf{b}\|_2^2$, which has a minimum norm solution $\hat{\mathbf{x}} = \mathbf{A}^{+}\mathbf{b}$, where $\mathbf{A}$ is an arbitrary real valued matrix and $\mathbf{b}$ is a real valued vector. The number of flops required to solve this problem using the normal equations is given by (32) [22]. From the dimensions of the full-band equalization filter calculation in (8), the number of flops required for the MCEQ design is given by (33).

The subband equalization filter design takes into consideration two separate calculations for each of the subbands: the cost of the subband inverse filter computation in (30) and the cost of the subband decomposition in (29). The data for these calculations is complex where, generally, one complex multiply requires four real multiplies and two real additions and one complex addition requires two real additions. Under the assumption that an equal number of complex multiplications and complex additions are required to solve the system of equations considered here, we multiply the expression in (32) by a factor of four (a complex multiply-add pair costs eight real flops, i.e., four per complex operation on average). The total flops required for the subband inverse filter design can be expressed as (34).

Fig. 5. Floating point operation count versus system length for the full-band and subband equalizers.

Fig. 6. Typical example of (a) a simulated room impulse response and (b) the corresponding magnitude response.
The key factor in the computational complexity is the system length; thus, the improvement achieved by the subband method will depend on the number of subbands and on the decimation ratio. An example is given in Fig. 5, where the computational complexity of the full-band and subband designs is calculated with (33) and (34), respectively. The subband implementation for this example is that presented in Section IV-B with $K$ subbands decimated by $N$. On average over all lengths, the subband approach reduces the computational complexity by a factor of 120.

VI. SIMULATIONS AND RESULTS

The following simulation results are presented to demonstrate the performance of the proposed SB-MCEQ equalization method. Three experiments were performed to show 1) a comparative performance evaluation with the full-band MCEQ using simulated RTFs, 2) the application of SB-MCEQ to speech dereverberation, and 3) an illustrative example of equalization of real measured RTFs.

A. Experiment 1: Simulated RTFs

The experiment demonstrates the performance of the SB-MCEQ equalizer, compared with the full-band MCEQ, using simulated RTFs. A linear array of $M$ uniformly distributed microphones with 0.1 m separation between adjacent sensors was simulated using the source-image method [11] in a simulated room. The impulse response at one of the microphones and the corresponding magnitude response are depicted in Fig. 6(a) and (b), respectively. The sampling frequency $f_s$ and the room reverberation time $T_{60}$ together determined the channel length $L$ in taps. Moreover, keeping the source–microphone configuration fixed, RTFs were simulated at 100 different locations in the room. System misalignment varying between 0 and $-80$ dB was simulated with (12). The full-band equalization filters in (8) were computed with the SLICOT toolbox [35] according to the method discussed in [23].

Fig. 7 shows the results in terms of magnitude and phase distortion, as an average of the 100 measurement locations, for (a) the full-band MCEQ and (b) the proposed subband implementation. Notably, the SB-MCEQ exhibits much more graceful performance degradation with increased misalignment than the full-band MCEQ, with a behavior similar to the single-channel SCLS equalizer results shown in Fig. 2. Thus, the SB-MCEQ method is shown in these results to be less sensitive to inexact impulse responses, while benefiting from the shorter filters of multichannel inversion. This improvement is a consequence of the reduced filter length in the subbands, which in Section III-B was demonstrated to improve the MCEQ equalizer performance. In addition, nearly perfect equalization is achieved with the SB-MCEQ method at sufficiently low system mismatch.

Finally, we provide two characteristic examples of the subband equalizer output for the simulated RTFs. Fig. 8(a) shows a typical outcome of the equalized room impulse response in the time domain and Fig. 8(b) shows the corresponding magnitude response for $\mathcal{M} = -80$ dB. It can be seen that near perfect equalization is achieved with only small spectral distortion, $\sigma = 0.03$; this distortion results from the approximations in the subband filter decomposition and in the filter-bank reconstruction. Thus, the accuracy depends on the ability of the prototype filter to suppress aliasing and on the oversampling ratio. The delay in the equalized impulse in Fig. 8(a) is due to the filter-bank and is governed by the order of the prototype filter, $L_{pr}$. As a further illustration for a less accurate RTF estimation, a characteristic outcome for $\mathcal{M} = -10$ dB is shown in Fig. 9, where a more significant spectral distortion is observed, which is due to the room impulse response inaccuracies.

Fig. 7. Magnitude and phase distortions versus system mismatch for SB-MCEQ equalization of simulated room impulse responses.

Fig. 8. Equalized (a) time domain impulse response and (b) magnitude response, using the SB-MCEQ method for $\mathcal{M} = -80$ dB. The magnitude distortion is $\sigma = 0.03$. (Note that the magnitude scaling of the equalized impulse response is of no significance.)

Fig. 9. Equalized (a) time-domain impulse response and (b) magnitude response, using the SB-MCEQ method for $\mathcal{M} = -10$ dB. The magnitude distortion is $\sigma = 2.63$. (Note that the magnitude scaling of the equalized impulse response is of no significance, but the relative scaling between Figs. 8(a) and 9(a) is significant.)

Fig. 10. Segmental SRR for (a) speech equalized with SB-MCEQ, (b) speech equalized with full-band MCEQ, and (c) unprocessed reverberant speech at one channel.

Fig. 11. Measured (a) room impulse response and (b) the corresponding magnitude response.

Fig. 12. Equalized (a) time domain impulse response and (b) magnitude response, using the full-band MCEQ method for $\mathcal{M} = -50$ dB.

B. Experiment 2: Speech Dereverberation

In Experiment 2, we used the impulse responses and the equalizing filters from Experiment 1 and applied these to speech dereverberation. The sentence "Hoist the load to your left shoulder," uttered by a male talker and drawn from the IEEE corpus [36], was used as an example. The segmental signal-to-reverberation ratio (SRR) [1] was used as an objective evaluation metric. The results, averaged over 100 different source-microphone configurations, are shown in Fig. 10 for (a) speech equalized with the proposed subband approach, (b) speech equalized with the full-band MCEQ, and (c) unprocessed speech at the microphone closest to the talker. It can be seen that equalizing with channel estimates whose misalignment exceeds a certain level (see Fig. 10) results in a lower segmental SRR than that of the unprocessed reverberant signal.
The reduced sensitivity of the SB-MCEQ method to errors in the channel estimates is manifested here in that equalization can still be beneficial at larger misalignments; for this example, there is, on average over all misalignments, a 9-dB improvement in segmental SRR using the subband method compared to the full-band method.

C. Experiment 3: Measured RTFs

Finally, we provide an example of equalization using measured RTFs obtained from the MARDY database [37]. An example (a) impulse response and (b) the corresponding magnitude response are shown in Fig. 11. System misalignment corresponding to $\mathcal{M} = -50$ dB was simulated with (12), and the resulting RTFs were employed in the design of the equalization filters using both the full-band and the subband methods. The equalized RTFs using the full-band MCEQ and the SB-MCEQ are shown in Figs. 12 and 13, respectively. The smaller spectral distortion caused by the subband method is conspicuous.

Fig. 13. Equalized (a) time-domain impulse response and (b) magnitude response, using the SB-MCEQ method for $\mathcal{M} = -50$ dB.

VII. CONCLUSION

Equalization of acoustic impulse responses has been discussed both for single and multiple microphones. Single-microphone approaches can provide only approximate equalization, require very long inverse filters, and result in long processing delay due to the non-minimum phase property of the RTFs. On the other hand, exact equalization with no delay and with inverse filters of similar order to the room impulse responses is possible in the multimicrophone case. However, multichannel methods are very sensitive to inaccuracies in the estimated systems to be equalized, causing significant distortions to the equalized signal. Consequently, a new algorithm was derived operating on decimated oversampled subband signals, where the full-band impulse response is decomposed into equivalent filters in the subbands and multichannel least squares equalization is applied to each subband.
It was shown that this method results in substantial computational savings at the cost of very small spectral distortion due to the filter bank. Simulation results were presented to evaluate the performance of this method, and equalization of channels of several thousand taps was demonstrated. Most importantly, experimental results indicated that the new method is more robust to errors in the impulse responses of the system to be equalized, which is due to a combination of shorter filters and approximation of the filtering in the subbands. Thus, the proposed subband multichannel equalization benefits from reduced sensitivity to channel estimation errors, shorter equalization filters, and no delay due to the equalization itself (the delay due to the filter bank is less than 32 ms in our examples), giving significant advantages over existing single-channel and multichannel techniques.

REFERENCES

[1] P. A. Naylor and N. D. Gaubitch, “Speech dereverberation,” in Proc. Int. Workshop Acoust. Echo Noise Control, Eindhoven, The Netherlands, Sep. 2005, paper ID pt03.
[2] P. A. Nelson, F. Orduña-Bustamante, and H. Hamada, “Inverse filter design and equalization zones in multichannel sound reproduction,” IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 185–192, Nov. 1995.
[3] M. Miyoshi and Y. Kaneda, “Inverse filtering of room acoustics,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 2, pp. 145–152, Feb. 1988.
[4] J. G. Proakis and D. G. Manolakis, Digital Signal Processing, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[5] S. T. Neely and J. B. Allen, “Invertibility of a room impulse response,” J. Acoust. Soc. Amer., vol. 66, no. 1, pp. 165–169, Jul. 1979.
[6] H. Kuttruff, Room Acoustics, 4th ed. New York: Taylor & Francis, Oct. 2000.
[7] M. R. Schroeder, “Statistical parameters of the frequency response curves of large rooms,” J. Audio Eng. Soc., vol. 35, no. 5, pp. 299–305, May 1987.
[8] B. D. Radlović, R. C. Williamson, and R. A. Kennedy, “Equalization in an acoustic reverberant environment: Robustness results,” IEEE Trans. Speech Audio Process., vol. 8, no. 3, pp. 311–319, May 2000.
[9] J. Mourjopoulos, P. Clarkson, and J. Hammond, “A comparative study of least-squares and homomorphic techniques for the inversion of mixed phase signals,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 1982, vol. 7, pp. 1858–1861.
[10] J. N. Mourjopoulos, “Digital equalization of room acoustics,” J. Audio Eng. Soc., vol. 42, no. 11, pp. 884–900, Nov. 1994.
[11] J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943–950, Apr. 1979.
[12] B. D. Radlović and R. A. Kennedy, “Nonminimum-phase equalization and its subjective importance in room acoustics,” IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 728–737, Nov. 2000.
[13] M. Tohyama, R. H. Lyon, and T. Koike, “Source waveform recovery in a reverberant space by cepstrum dereverberation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1993, vol. 1, pp. 157–160.
[14] Y. Huang, J. Benesty, and J. Chen, “A blind channel identification-based two-stage approach to separation and dereverberation of speech signals in a reverberant environment,” IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 882–895, Sep. 2005.
[15] S. Bharitkar, P. Hilmes, and C. Kyriakakis, “Robustness of spatial average equalization: A statistical reverberation model approach,” J. Acoust. Soc. Amer., vol. 116, no. 6, pp. 3491–3497, Dec. 2004.
[16] M. Hofbauer and H. Loeliger, “Limitations for FIR multi-microphone speech dereverberation in the low-delay case,” in Proc. Int. Workshop Acoust. Echo Noise Control, Sep. 2003, pp. 103–106.
[17] Y. Haneda, S. Makino, and Y. Kaneda, “Common acoustical pole and zero modeling of room transfer functions,” IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 320–328, Apr. 1994.
[18] Y. Haneda, S. Makino, and Y. Kaneda, “Multiple-point equalization of room transfer functions by using common acoustical poles,” IEEE Trans. Speech Audio Process., vol. 5, no. 4, pp. 325–333, Jul. 1997.
[19] J. R. Hopgood and P. J. W. Rayner, “A probabilistic framework for subband autoregressive models applied to room acoustics,” in Proc. IEEE Workshop Statistical Signal Process., Aug. 2001, pp. 492–495.
[20] T. Hikichi, M. Delcroix, and M. Miyoshi, “On robust inverse filter design for room transfer function fluctuations,” in Proc. Eur. Signal Process. Conf., Sep. 2006, CD-ROM.
[21] T. Hikichi, M. Delcroix, and M. Miyoshi, “Inverse filtering for speech dereverberation less sensitive to noise,” in Proc. Int. Workshop Acoust. Echo Noise Control, Sep. 2006, pp. 1–4.
[22] G. H. Golub and C. F. van Loan, Matrix Computations, ser. Johns Hopkins series in the mathematical sciences, 3rd ed. London, U.K.: Johns Hopkins Univ. Press, 1996.
[23] M. Hofbauer, “Optimal Linear Separation and Deconvolution of Acoustical Convolutive Mixtures,” Ph.D. dissertation, Swiss Federal Inst. of Technol. (ETH), Zürich, Switzerland, 2005.
[24] T. Hikichi, M. Delcroix, and M. Miyoshi, “Inverse filtering for speech dereverberation less sensitive to noise and room transfer function fluctuations,” EURASIP J. Adv. Signal Process., vol. 2007, pp. 1–12, 2007.
[25] J. H. Cho, D. R. Morgan, and J. Benesty, “An objective technique for evaluating doubletalk detectors in acoustic echo cancelers,” IEEE Trans. Speech Audio Process., vol. 7, no. 7, pp. 718–724, Nov. 1999.
[26] F. Talantzis and D. B. Ward, “Robustness of multi-channel equalization in an acoustic reverberant environment,” J. Acoust. Soc. Amer., vol. 114, no. 2, pp. 833–841, Aug. 2003.
[27] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[28] P. A. Naylor, O. Tanrikulu, and A. G. Constantinides, “Subband adaptive filtering for acoustic echo control using allpass polyphase IIR filterbanks,” IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 143–155, Mar. 1998.
[29] S. Weiss and R. W. Stewart, On Adaptive Filtering in Oversampled Subbands. Aachen, Germany: Shaker Verlag, 1998.
[30] S. Weiss, G. W. Rice, and R. W. Stewart, “Multichannel equalization in subbands,” in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., Oct. 1999, pp. 203–206.
[31] J. P. Reilly, M. Wilbur, M. Seibert, and N. Ahmadvand, “The complex subband decomposition and its application to the decimation of large adaptive filtering problems,” IEEE Trans. Signal Process., vol. 50, no. 11, pp. 2730–2743, Nov. 2002.
[32] H. Yamada, H. Wang, and F. Itakura, “Recovering of broad band reverberant speech signal by sub-band MINT method,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1991, pp. 969–972.
[33] H. Wang and F. Itakura, “Realization of acoustic inverse filtering through multi-microphone sub-band processing,” IEICE Trans. Fundamentals, vol. E75-A, no. 11, pp. 1474–1483, Nov. 1992.
[34] C. A. Lanciani and R. W. Schafer, “Subband-domain filtering of MPEG audio signals,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Mar. 1999, vol. 2, pp. 917–920.
[35] A. Varga and P. Benner, “SLICOT: A subroutine library in systems and control theory,” in Applied and Computational Control, Signal and Circuits, B. N. Datta, Ed. New York: Birkhäuser, 1999, vol. 1, pp. 499–539.
[36] IEEE Subcommittee, “IEEE recommended practice for speech quality measurements,” IEEE Trans. Audio Electroacoust., vol. AU-17, no. 3, pp. 225–246, Sep. 1969.
[37] J. Y. C. Wen, N. D. Gaubitch, E. A. P. Habets, T. Myatt, and P. A. Naylor, “Evaluation of speech dereverberation algorithms using the MARDY database,” in Proc. Int. Workshop Acoust. Echo Noise Control, Paris, France, Sep. 2006, paper ID 33.

Nikolay D. Gaubitch (M’07) received the M.Eng. degree in computer engineering from Queen Mary, University of London, London, U.K., in 2002 and the Ph.D. degree from Imperial College London in 2006. Since 2005, he has been a Member of Research Staff in the Communications and Signal Processing Group, Imperial College London. His research interests span various aspects of speech and audio processing, including speech dereverberation, adaptive blind system identification, multichannel acoustic system equalization, and speech enhancement.

Patrick A. Naylor (M’89–SM’07) received the B.Eng. degree in electronics and electrical engineering from the University of Sheffield, Sheffield, U.K., in 1986 and the Ph.D. degree from Imperial College, London, London, U.K., in 1990. Since 1989, he has been a Member of Academic Staff in the Communications and Signal Processing Group, Imperial College London, where he is also Director of Postgraduate Studies. His research interests are in the areas of speech and audio signal processing, and he has worked in particular on adaptive signal processing for acoustic echo control, speaker identification, multichannel speech enhancement, and speech production modeling. In addition to his academic research, he enjoys several fruitful links with industry in the U.K., U.S., and in mainland Europe. Dr. Naylor is an Associate Editor of the IEEE SIGNAL PROCESSING LETTERS and a member of the IEEE Signal Processing Society Technical Committee on Audio and Electroacoustics.
