127x Filetype PDF File size 2.45 MB Source: engineering.purdue.edu
IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018 35 Accuracy Directly Controlled Fast Direct Solution of General H2-Matrices and Its Application to Solving Electrodynamic Volume Integral Equations Miaomiao Ma, Graduate Student Member, IEEE, and Dan Jiao, Fellow, IEEE Abstract—The dense matrix resulting from an integral equa- fast solvers for electromagnetic analysis. Many existing fast tion (IE)-based solution of Maxwell’s equations can be compactly solvers can be interpreted in this framework, although their represented by an H2-matrix. Given a general dense H2- developments may predate the H2-framework. For exam- matrix, prevailing fast direct solutions involve approximations ple, the matrix structure resulting from the well-known whose accuracy can only be indirectly controlled. In this paper, FMM-basedmethods[5], [6] is an H2-matrix. In other words, we propose new direct solution algorithms whose accuracy is 2 directly controlled, including both factorization and inversion, the H -matrix can be viewed as an algebraic generalization for solving general H2-matrices. Different from the recursive of the FMM method. Multiplying an H2-matrix by a vector inverse performed in existing H2-based direct solutions, this new has a complexity of O(NlogN) for solving electrically large direct solution is a one-way traversal of the cluster tree from surface integral equations (SIEs). In the mathematical litera- the leaf level all the way up to the root level. The underlying ture, such as [7], it is shown that the storage, matrix-vector multiplications and additions are carried out as they are without using formatted multiplications and additions whose accuracy multiplication (MVM), and matrix–matrix multiplication of an cannot be directly controlled. The cluster bases and their rank H2-matrix can be performed in optimal O(N) complexity of the original matrix are also updated level by level based on pre- for constant-rank cases. However, no such complexity is scribed accuracy, without increasing computational complexity, shown for either matrix factorization or inversion, regardless to take into account the contributions of fill-ins generated during the direct solution procedure. For constant-rank H2-matrices, of whether the rank is a bounded constant or an increasing the proposed direct solution has a strict O(N) complexity in both variable. Later in [8] and [9], fast matrix inversion and time and memory. For rank that linearly grows with the electrical LU factorization algorithms are developed for H2-matrices. size, the complexity of the proposed direct solution is O(NlogN) They have shown a complexity of O(N) for solving constant- in factorization and inversion time, and O(N) in solution time rank cases, such as electrically small and moderate problems andmemoryforsolvingvolumeIEs(VIEs).Rapiddirectsolutions for both SIE and volume IE (VIE) [10]–[13];and a complexity of electrodynamic VIEs involving millions of unknownshave been obtained on a single CPU core with directly controlled accuracy. of O(NlogN) for solving electrically large VIEs [14]–[16]. Comparisons with state-of-the-art H2-based direct VIE solvers Thereexist other significant contributions in fast direct solvers, have also demonstrated the advantages of the proposed direct such as [17]–[25], [32], and [33]. The focus of this paper, solution in accuracy control, as well as achieving better accuracy however, is a fast direct solution to an H2-matrix, as this with much less CPU time. matrix structure is general suitable for solving both dense and Index Terms—Dense matrices, electrodynamic, electromag- sparse matrices. netic analysis, fast direct solvers, frequency domain, H2-matrix, Despite a significantly reduced complexity, in the existing integral equations (IEs), linear complexity solvers, volume direct solutions of H2-matrices, such as [8] and [9], formatted IEs (VIEs). multiplications and additions are performed instead of actual ones, where the cluster bases used to represent a matrix I. INTRODUCTION are also used to represent the matrix’s inverse as well as HE H2-matrix is a general mathematical frame- LU factors. During the inversion and factorization procedure, Twork [1]–[4] for compact representation and efficient only the coupling matrix of each admissible block is com- computation of dense matrices. It can be utilized to develop puted, while the cluster bases are kept to be the same as those in the original matrix. Physically speaking, such a choice of Manuscript received March 22, 2017; revised June 5, 2017 and the cluster bases for the inverse matrix often constitutes an July 16, 2017; accepted July 20, 2017. Date of publication August 15, 2017; accurate choice. However, mathematically speaking, the accu- date of current version January 4, 2018. This work was supported in part by the NSF under Award 1619062, in part by the SRC (Task 1292.073), and racy of the inversion and factorization cannot be directly in part by DARPA under Award HR0011-14-1-0057. (Corresponding author: controlled. The lack of an accuracy control is also observed Dan Jiao.) in the H2-based formatted matrix–matrix multiplication in the The authors are with the School of Electrical and Computer Engi- neering, Purdue University, West Lafayette, IN 47907 USA (e-mail: mathematical literature, such as [7]. If the accuracy is to be djiao@purdue.edu). controlled, the inverse as well as the matrix–matrix multipli- Color versions of one or more of the figures in this paper are available cation algorithm must be completely changed, as the original online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TMTT.2017.2734090 formatted framework does not offer a mechanism to control 0018-9480 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. 36 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018 the accuracy without increasing complexity. Algorithmwise, the collect operation involves approximations, whose accuracy is not directly controlled. This operation is performed when multiplying a nonleaf block with a nonleaf block, or multiply- ing a nonleaf block with an admissible block, with the target block being admissible. In this operation, the four blocks, either admissible or inadmissible, are collected to a single admissible block based on the cluster bases of the original matrix. If the target block cannot be accurately represented by the original cluster bases, then an error would occur. When the accuracy of the direct solution is not satisfactory, one can only change the original H2-representation, i.e., change the cluster bases and/or the rank for representing the original matrix (based on a prescribed accuracy), with the hope that they can better represent the inverse and LU factors. Therefore, the accuracy of the resulting direct solution is not directly controlled. A direct solution with an explicit accuracy control should perform every multiplication and addition as it is without assuming a format of the matrix involved in the computation.Meanwhile,everyoperationin the direct solution should be either exact or performed based on a prescribed accuracy. This is what is pursued and achieved in this paper. Recently, an HSS-matrix structure [26] has also been explored for electromagnetic analysis [27]–[30]. Exact- 2 arithmetic fast factorization and inversion algorithms have Fig. 1. Illustration of an H -matrix. (a) Block cluster tree. (b) Matrix partition where admissible blocks are stored in the format of (1). been developed in [28] and [29]. In other words, the direct solution of the HSS matrix does not involve any approx- both backward and forward substitutions. The accuracy and imation. However, the HSS matrix, as a special class of the computational complexity of the proposed direct solutions H2-matrix, requires only one admissible block be formed for are analyzed in Section V. Section VI demonstrates the each node in the H2-tree. All the off-diagonal matrices in the application of this paper in solving electrodynamic VIEs. HSS matrix are represented as low-rank matrices. As a result, In Section VII, we summarize this paper. the resultant rank can be too high to be computed efficiently, especially for electrically large analysis. This is because as II. BACKGROUND OFTHISPAPER long as the sources and observers are separated, even though AnH2-matrix [1]–[4] is generally stored in a tree structure, their distance is infinitely small, their interaction must be with each node in the tree called a cluster. The number of represented by a low-rank matrix to suite the structure of an unknowns in each cluster at the leaf level is no greater than HSS matrix. From this perspective, an H2-matrix is a more- leafsize, a predefined constant. An H2-matrix is partitioned efficient structure for general applications, since the distance into multilevel admissible and inadmissible blocks based on between the sources and observers can be used to reduce the an admissibility condition. As an example, a four-level block rank required to represent a dense matrix for a prescribed cluster H2-tree is shown in Fig. 1(a), where a green link accuracy. However, the accuracy-controlled direct solution of denotes an admissible block, formed between a row cluster general H2-matrices is still lacking in the open literature. and a column cluster that satisfy the admissibility condition, The contribution of this paper is such an accuracy-controlled and a red link denotes an inadmissible block. The resultant direct solution of general H2-matrices. This solution can be H2-matrix is shown in Fig. 1(b). An admissible block in an applied to solve a general H2-matrix, be it real- or complex- H2-matrix is represented as valued, symmetric or unsymmetrical. Its complexity is O(N) Z =(V) (S ) (V )T (1) for constant-rank H2-matrices in both time and memory. For t,s t #t×k t,s k×k s #s×k rank that linearly grows with electrical size, the complexity of where V (V ) is called cluster basis of cluster t (s), S is the proposed direct solution is O(NlogN) in factorization and t s t,s called coupling matrix, # denotes the cardinality of a set, and inversion time, O(N) in solution time, and memory usage for k is rank. The cluster basis in an H2-matrix has a nested solving VIEs. As a demonstration, we show how it is used to property. This means the cluster basis for a nonleaf cluster t, solve large VIEs involving millions of unknowns on a single Vt, can be expressed by its two children’s cluster bases, core CPU, and with controlled accuracy. Vt and Vt , as the following: The rest of this paper is organized as follows. 1 2 (V ) 0 (T ) t #t ×k t k ×k In Section II, we review the mathematical background of this (Vt)#t×k = 1 1 1 1 1 (2) 0 (V ) (T ) t #t ×k t k ×k paper. In Section III, we present the proposed factorization and 2 2 2 2 2 inversion algorithms for general H2-matrices. In Section IV, where Tt and Tt are called transfer matrices. In an 2 1 2 wedescribe the proposed matrix solution algorithm, including H -matrix, the number of blocks formed by a single cluster at 2 MAANDJIAO:ACCURACYDIRECTLYCONTROLLEDFAST DIRECT SOLUTIONOF GENERAL H -MATRICES 37 each tree level is bounded by a constant, Csp. The size of an A. Computation at the Leaf Level inadmissible block is leafsize, and inadmissible blocks appear 1) For the First Leaf Cluster: We start from the leaf only at the leaf level. level (l = L). For the first cluster, whose index is denoted In the mathematical literature [7], for constant-rank cases, by i = 1, we perform three steps as follows. it is shown that the computational complexity of an Step 1: We compute the complementary cluster basis V⊥, H2-matrix is optimal (linear) in storage, MVM, and i which is orthogonal to Vi (cluster basis of cluster i). This can matrix–matrix multiplication. In [8]–[13], it is shown that be easily done by carrying out an SVD or QL factorization O(N) direct solutions can also be developed for H2-matrices on V . The cost is not high; instead, it is constant at the leaf i for constant-rank cases as well as electrically moderate cases ⊥ level. We then combine V with V to form Q matrix as the i i i whose rank can be controlled by a rank function. For following: electrically large analysis, the rank of an H2-representation ⊥ Q =[(V )#i×(#ik ) (Vi)#i×k ] (3) of IE kernels is not constant any more for achieving a i i l l prescribed accuracy. Different H2-representations also result in which k is the rank at level l (now l = L), and # denotes in different ranks and their growth rates with electrical l the number of unknowns contained in a set. size, and hence different complexities. For example, if an Step 2: Let Zcur be the current matrix that remains to be FMM-based H2-representation is used to model an IE oper- factorized. For leaf cluster i = 1, Z = Z. We compute a cur ator, the asymptotic rank would scale quadratically with the transformed matrix QHZ Q, and use it to overwrite Z ; electrical size. If an interpolation-based H2-representation is i cur i cur thus used to model an IE operator, the asymptotic rank would scale H Z =Q Z Q (4) quadratically with the electrical size in SIEs, and cubically cur i cur i with the electrical size in VIEs. Despite its full-rank model where QH denotes Q ’s complex conjugate transpose, and in SIEs, the FMM complexity is as low as O(NlogN) for i i oneMVM,becauseitscouplingmatrixisdiagonalandtransfer Q =diag{I ,I ,...,Q ,...,I } (5) matrix is sparse. On the other hand, if one uses an SVD-based i 1 2 i #{leaf clusters} minimal-rank representation, the resultant coupling matrix is is a block diagonal matrix. The ith diagonal block in Qi not diagonal. However, the asymptotic rank of such a minimal- is Qi, and other blocks are identity matrices I, the size of rank representation only scales linearly with electrical size each of which is the corresponding leaf-cluster size. The for general 3-D problems as shown by the analysis given subscripts in the right-hand side of (5) denote the indexes of in [31]. In this analysis, there are additional findings that are diagonal blocks, which are also the indexes of leaf clusters, not utilized in the FMM-based rank analysis or a source- and #{leaf clusters} stands for the number of leaf clusters. If leaf cluster i = 1 is being computed, Q appears in observer separated rank analysis, such as SVD turns to a i Fourier analysis in a linear-shift invariant system, and the the first block of (5). In addition, Qi in (4) denotes the Fourier transform of Green’s function reveals that not all the complex conjugate of Qi, as the cluster bases we construct are complex-valuedto account for general H2-matrices. In (5), plane waves are important, and only those whose wavenumber Q only consists of one Q ,wherei is the current clus- is closest to the given wavenumber k0 make the most important i i contribution. These plane waves are counted for prescribed ter being computed. This is because Qi values for other accuracy for 1-, 2-, and 3-D problems in [31]. It is found that clusters are not ready for use yet, since in the proposed the growth rate with electrical size is no greater than linear. algorithm, we will update cluster bases to take into account Regardless of which H2-matrix is generated to represent the the contribution from fill-ins. This will become clear in the integral or partial differential equation operators, the proposed sequel. direct solution is equally applicable. In this paper, when The purpose of doing this step, i.e., obtaining (4), is to we apply the proposed direct solution to solve large VIEs, introduce zeros in the matrix being factorized, since we employ a minimal-rank H2-representation of the H 0 (#ik )×k ×V = l l (6) Q i VIE operator using the technique we developed in [14]–[16]. i I k ×k l l T V ×Q =[0 I ]. (7) i k ×(#ik ) k ×k III. PROPOSED FACTORIZATIONAND i l l l l INVERSIONALGORITHMS The operation of (4) thus zeros out the first (#i kl) rows Given a general H2-matrix Z, the proposed direct solution of the admissible blocks formed by row cluster i,aswellas is a one-way tree traversal from the leaf level all the way up the first (#i k ) columns in Z , as shown by the white l cur to the root level of the H2-tree. At each level, the factorization blocks in Fig. 2. The real computation of (4) is, hence, in the proceeds from the left to the right across all the clusters. This inadmissible blocks of cluster i, as shown by the four red is very different from the algorithm of [9], [11], [14], and [15], blocks associated with cluster i = 1inFig.2. where a recursive procedure is performed. To help better Step 3: After Step 2, the first (#i kl)rowsandcolumns understand the proposed algorithm, while we present a generic of Zcur in the admissible blocks of cluster i become zero. algorithm, we will use the H2-matrix shown in Fig. 1(b), Hence, if we eliminate the first (#i kl) unknowns of cluster i as an example to illustrate the overall procedure. In Fig. 1(b), via a partial LU factorization, the admissible blocks of cluster i green blocks are admissible blocks, and red ones represent at these rows and columns would stay as zero, and thus not inadmissible blocks. affected. As a result, the resulting L and U factors, as well 38 IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, VOL. 66, NO. 1, JANUARY 2018 Fig. 2. Illustration of QHZQ . 1 1 as the fill-in blocks generated, would only involve a constant number of blocks, which is bounded by O(C2 ). sp In view of the above-mentioned property, in Step 3, we per- form a partial LU factorization to eliminate the first (#i kl) unknowns of cluster i in Zcur. Let this set of unknowns be denoted by i′. Z can be cast into the following form cur accordingly: F′ ′ Z′ Zcur = i i i c (8) Z ′ Z ci cc Fig. 3. (a) Z after partial LU of cluster 1 with fill-in blocks marked in in which F ′ ′ denotes the diagonal block of the (#i kl) cur i i blue. (b) Fill-in blocks of cluster 2 turned green after cluster basis update. unknowns, and Z ′ is the remaining rectangular block in the i c row of set i′. Here, c contains the rest of the unknowns, which do not belong to i′.TheZ ′ is the transpose block of Z ′ ,and ci i c be factorized. Thus, we let Z is the diagonal block formed by unknowns in set c.Itis cc I0 evident that Z ′ (Z ′) only consists of O(Csp) inadmissible Z = . (13) i c ci cur 0 Z blocks, as the rest of the blocks are zero. Take leaf cluster cc i = 1 as an example. It forms inadmissible blocks with clusters This matrix is shown in Fig. 3(a), where the fill-in blocks i = 2,4,6, as shown by the red blocks in Fig. 2. Hence, are added upon previous Zcur. If the fill-in block is added Z′ =[F12,F14,F16] in addition to part of F11. upon an inadmissible block, we add the fill-in directly to the i c After performing a partial LU factorization to eliminate the inadmissible block. For the example shown in Fig. 1(b), these unknowns contained in i′, we obtain blocks are the blocks marked in dark blue in Fig. 3(a) after the L′ ′ 0 1 i′ set of unknowns in cluster i = 1 is eliminated. If the fill- i i I0U′′ L′′Z′ Z = i i i i i c (9) in appears in an admissible block, we store the fill-in block cur ′ 1 Z U I 0 Z 0I ci i′i′ cc temporarily in this admissible block for future computation. where These blocks are the blocks marked in light blue in Fig. 3(a). To explain in more detail, from (11), it can be seen that the F′ ′ = L ′ ′U ′ ′ (10) i i i i i i fill-in block formed between cluster j and cluster k has the and the Schur complement is following form: ′ 1 1 ′ F =Z ′U1L1Z′ (14) Zcc = Zcc Z U′ ′L ′ ′Z (11) j,k ji ′ ′ ′ ′ i k ci i i i i i c i i i i ′ 1 1 ′ whichis also low rank in nature. The row cluster j and column in which Z U′ ′L ′ ′Z represents the fill-in blocks intro- ci i i i i i c duced due to the elimination of i′-unknowns. Because of the cluster k belong to the set of clusters that form inadmissible zeros introduced in the admissible blocks, Zi′c and Zci′ only blocks with cluster i, whose number is bounded by Csp.Ifthe consist of inadmissible blocks formed by cluster i, the number original block formed between j and k, Z , is admissible, cur,jk of which is bounded by constant Csp. Hence, the number of no computation is needed for Fj,k, i.e., we do not add it fill-in blocks introduced by this step is also bounded by a directly to the admissible block. Instead, we store it with this constant, which is O(C2 ). admissible block temporarily until cluster j is reached during sp the factorization. At that time, F will be used to compute Equation (9) can further be written in short as j,k an updated cluster basis for j. If the elimination of different l I0l clusters results in more than one fill-in block in admissible Z =L U (12) cur i i 0 Zcc Z block, the later ones are added upon existing F . cur,jk j,k in which the first matrix of (9), which is the L factor In other words, the fill-ins at the admissible Z block are cur,jk generated after a partial LU of cluster i at level l, is denoted summed up and stored in a single Fj,k. l by L . Similarly, the rightmost matrix of (9), which is the In summary, Step 3 is to perform a partial LU factorization, i U factor generated after a partial LU of cluster i at level l, as shown in (9), to eliminate the first #i kl unknowns in l is denoted by Ui. The center matrix is the one that remains to cluster i. The storage and computation cost of this step can
no reviews yet
Please Login to review.