# Talks - Edoardo Di Napoli

**Optimizing the ChASE eigensolver for Bethe-Salpeter computations**7th Workshop of the Joint Laboratory for Extreme Scale Computing.

17 July 2017.abstractwebPDFThe Chebyshev Accelerated Subspace iteration Eigensolver (ChASE) is an iterative eigensolver developed at the JSC by the SimLab Quantum Materials. The solver mainly targets sequences of dense eigenvalue problems as they arise in Density Functional Theory, but can also work on the single eigenproblem. ChASE leverages on the predominant use of BLAS 3 subroutines to achieve close-to-peak performance and potentially achieve scalability over hundreds if not thousands of computing nodes. We have recently succeeded to integrate a version of the ChASE library within the Jena BSE code. Preliminary comparison between ChASE and the conjugate gradient eigensolver (KSCG), previously used by the Jena BSE code, shows that ChASE can outperform KSCG with speedups up to 5X. In this talk we illustrate our latest results and give an outlook of the scientific problems that can be tackled once the integration is successfully completed.**Linear algebra tasks in Materials Science: optimization and portability**Accelerated Data and Computing Workshop.

July 2017.**When 1+1 > 2: The Power of Interdisciplinary Research**Opening Workshop of the SimLab Quantum Materials.

Juelich, March 2017.**Towards Automated Load Balancing via Spectrum Slicing for FEAST-like Solvers**6th Workshop of the Joint Laboratory for Extreme Scale Computing.

30 November 2016.**Optimizing Least-Squares Rational Filters for Solving Interior Eigenvalue Problems**International Workshop on Parallel Matrix Algorithms and Applications.

Bordeaux, France, 6 July 2016.**Enabling Large Scale LAPW DFT Calculations by a Scalable Iterative Eigensolver**The 2015 SIAM Conference on Computational Science & Engineering.

February 2015.abstractwebPDFIn LAPW-based methods a sequence of dense generalized eigenvalue problems appears. Traditionally these problems were solved using direct eigensolvers from standard libraries like ScaLAPACK. We developed a subspace iteration method pre-conditioned with Chebyshev polynomials of optimal degree (ChASE). This algorithm is consistently competitive with direct eigensolvers and greatly enhance performance and scalability. ChASE is included in the FLEUR software and improves its scaling behaviour for calculations of large physical systems on modern supercomputers.**A knowledge-based approach to high-performance computing in ab initio simulations**AICES Advisory Board.

14 July 2014.**An optimized subspace iteration eigensolver applied to sequences of dense eigenproblems in ab initio simulations**8th International Workshop on Parallel Matrix Algorithms and Applications (PMAA14).

Lugano, Switzerland, 4 July 2014.abstractPDFSequences of eigenvalue problems consistently appear in a large class of ap- plications based on the iterative solution of a non-linear eigenvalue problem. A typical example is given by the chemistry and materials science ab initio sim- ulations relying on computational methods developed within the framework of Density Functional Theory (DFT). DFT provides the means to solve a high- dimensional quantum mechanical problem by representing it as a non-linear gen- eralized eigenvalue problem which is solved self-consistently through a series of successive outer-iteration cycles. As a consequence each self-consistent simulation is made of several sequences of generalized eigenproblems P : Ax = λBx. Each sequence, P (1) , . . . P (l) . . . P (N ) , groups together eigenproblems with increasing outer-iteration index l. In more general terms a set of eigenvalue problems {P(1),...P(N)} is said to be a “sequence” if the solution of the l-th eigenproblem determines, in an application-specific manner, the initialization of the (l + 1)-th eigenproblem. For instance at the beginning of each DFT cycle an initial function ρ(l)(r) determines the initialization of the l-th eigenproblem. A large fraction of P (l) eigenpairs are then use to compute a new ρ(l+1)(r) which, in turn, leads to the initialization of a new eigenvalue problem P(l+1). In addition to be part of a sequence, successive eigenproblems might possess a certain degree of correlation connecting their re- spective eigenpairs. In DFT sequences, correlation becomes manifest in the way eigenvectors of successive eigenproblems become progressively more collinear to each other as the l-index increases. We developed a subspace iteration method (ChFSI) specifically tailored for sequences of eigenproblems whose correlation appears in the form of increasingly collinear eigenvectors. Our strategy is to take the maximal advantage possible from the information that the solution of the P(l) eigenproblem is providing when solving for the successive P(l+1) problem. As a consequence the subspace iteration was augmented with a Chebyshev polynomial filter whose degree gets dynamically optimized so as to minimize the number of matrix-vector multipli- cations. The effectiveness of the Chebyshev filter is substantially increased when (l) (l) inputed the approximate eigenvectors {x1 , . . . xnev }, as well as very reliable esti- (l) (l) mates, namely [λ1 , λnev], for the limits of the eigenspectrum interval [a, b] to be filtered in. In addition the degree of the polynomial filter is adjusted so as to be minimal with respect to the required tolerance for the eigenpairs residual. This result is achieved by exploiting the dependence each eigenpair residual have with respect to its convergence ratio as determined by the rescaled Chebyshev poly- nomial and its degree. The solver is complemented with an efficient mechanism which locks and deflates the converged eigenpairs. The resulting eigensolver was implemented in C language and parallelized for both shared and distributed architectures. Numerical tests show that, when the eigensolver is inputed approximate solutions instead of random vectors, it achieves up to a 5X speedup. Moreover ChFSI takes great advantage of compu- tational resources by scaling over a large range of cores commensurate with the size of the eigenproblems. Specifically, by making better use of massively parallel architectures, the distributed memory version will allow DFT users to simulate physical systems quite larger than are currently accessible.**An Optimized and Scalable Iterative Eigensolver for Sequences of Dense Eigenvalue Problems**13th Copper Mountain Conference on Iterative Methods.

Copper Mountain, Colorado, USA., 7 April 2014.abstractPDFSequences of eigenvalue problems consistently appear in a large class of applications based on the iterative solution of a non-linear eigenvalue problem. A typical example is given by the chemistry and materials science ab initio simulations relying on computational methods developed within the framework of Density Functional Theory (DFT). DFT provides the means to solve a high- dimensional quantum mechanical problem by representing it as a non-linear generalized eigenvalue problem which is solved self-consistently through a series of successive outer-iteration cycles. As a consequence each self-consistent simulation is made of several sequences of generalized eigenproblems P : A x = \lambda B x .....**Numerical methods for the eigenvalue problem in electronic structure computations.**45th IFF Spring School. Computing Solids: Models, ab initio methods and supercomputing.

Forschungszentrum Juelich, 14 March 2014.**A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW**10th INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2013).

September 2013.abstractPDFn one of the most important methods in Density Functional Theory - the Full-Potential Linearized Augmented Plane Wave (FLAPW) method – dense generalized eigenproblems are organized in long sequences. Moreover each eigenproblem is strongly correlated to the next one in the sequence. We propose a novel approach which exploits such correlation through the use of an eigensolver based on subspace iteration and accelerated with Chebyshev polynomials. The resulting solver, parallelized using the Elemental library framework, achieves excellent scalability and is competitive with current dense parallel eigensolvers.**Preconditioning Chebyshev subspace iteration applied to sequences of dense eigenproblems in ab initio simulations**Numerical Analysis and Scientific Computation with Applications (NASCA13) conference in Calais, France.

June 2013.abstractwebPDFIn many material science applications simulations are made of dozens of se- quences, where each sequence groups together eigenproblems with increasing self- consistent cycle outer-iteration index. Successive eigenproblems in a sequence possess a high degree of correlation. In particular it has been demonstrated that eigenvectors of adjacent eigenproblems become progressively more collinear to each other as the outer-iteration index increases. This result suggests one could use eigenvectors, computed at a certain outer-iteration, as approximate solutions to improve the performance of the eigensolver at the next one. In order to opti- mally exploit the approximate solution, we developed a block iterative eigensolver augmented with a Chebyshev polynomial accelerator (BChFSI). Numerical tests show that, when the sequential version of the solver is fed approximate solutions instead of random vectors, it achieves up to a 5X speedup. Moreover the parallel shared memory implementation of the algorithm obtains a high level of efficiency up to 80 % of the theoretical peak performance. Despite the eigenproblems in the sequence being relatively large and dense, the parallel BChFSI fed with ap- proximate solutions performs substantially better than the corresponding direct eigensolver, even for a significant portion of the sought-after spectrum.**Improving the performance of applied science numerical simulations: an application to Density Functional Theory.**March 2013.

Invited Talk at Columbia University, New York, USA..abstractPDFIn the early days of numerical simulations, advances were based on the ingenuity of pioneer scientists writing codes for relatively simple machines. Nowadays the investigation of large physical systems requires scaling simulations up to massively parallel computers whose optimal usage can often be challenging. On the one hand the algorithmic structure of many legacy codes can be a limiting factor to their portability on large supercomputers. More importantly in many cases algorithmic libraries are used as black boxes and no information coming from the physics of the specific application is exploited to improve the overall performance of the simulation. What is needed is a more interdisciplinary approach where the tools of scientific computing and knowledge extracted from the specific application are merged together in a new computational paradigm. One of the most promising new paradigms borrows from the "inverse problem" concept and, by reversing the logical arrow going from mathematical modeling to numerical simulations, extracts from the latter specific information that can be used to modify the algorithm. The resulting methodology, named "reverse simulation", produces an algorithm variant specifically tailored to the scientific application. Additionally such a variant can be optimally implemented for multiple parallel computing architectures. To demonstrate its applicability I will exemplify the workings of reverse simulation on a computational method widely used in the framework of Density Functional Theory (DFT): the Full-potential Linearized Augmented Plane Wave (FLAPW) method. FLAPW provides the means to solve a high-dimensional quantum mechanical problem by representing it as a non-linear generalized eigenvalue problem which is solved self-consistently through a series of successive outer-iteration cycles. By applying the principles of reverse simulation it can be shown that eigenvectors of successive eigenproblems become progressively more collinear to each other as the outer-iteration index increases. This result suggests that one could use eigenvectors, computed at a certain outer-iteration, as approximate solutions to improve the performance of the eigensolver at the next iteration. In order to maximally exploit the approximate solution, we developed a subspace iteration method augmented with an optimized Chebyshev polynomial accelerator together with an efficient locking mechanism (ChFSI). The resulting eigensolver was implemented in C language and can be parallelized for both shared and distributed architectures. Numerical tests show that, when the eigensolver is preconditioned with approximate solutions instead of random vectors, it achieves up to a 5X speedup. Moreover ChFSI takes great advantage of computational resources by obtaining levels of efficiency up to 80% of the theoretical peak performance. In particular, by making better use of massively parallel architectures, the distributed memory version will allow users of the FLAPW method to simulate larger physical systems than are currently accessible.**Parallel block Chebyshev subspace iteration algorithm optimized for sequences of correlated dense eigenproblems**5th International Conference of the ERCIM Working Group.

December 2012.abstractPDF**Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems**Parallel Matrix Algorithms and Applications 2012.

June 2012.abstractPDFSimulations in Density Functional Theory are made of dozens of sequences, where each sequence groups together eigenproblems with increasing self-consistent cycle iteration index. In a recent study, it has been shown a high degree of cor- relation between successive eigenproblems of each sequence. In particular, by tracking the evolution over iterations of the angles between eigenvectors of suc- cessive eigenproblems, it was shown that eigenvectors are almost collinear after the first few iterations. This result suggests we could use eigenvectors, com- puted at a certain iteration, as approximate solutions for the problem at the successive one. The key element is to exploit the collinearity between vectors of adjacent problems in order to improve the performance of the eigensolver. In this study we provide numerical examples where opportunely selected block iterative eigensolvers benefit from the re-use of eigenvectors when applied to sequences of eigenproblems extracted from simulations of real-world materials. In our inves- tigation we vary several parameters in order to address how the solvers behave under different conditions. In most cases our study shows that, when the solvers are fed approximated eigenvectors as opposed to random vectors, they obtain a substantial speed-up and could become a valid alternative to direct methods.**Eigenproblems and Eigensolvers in Density Functional Theory**May 2012.

Invited talk at the Argonne Leadership Computing Facility.abstractPDFIn DFT based simulations each SCF cycle comprises dozens of large generalized eigenproblems. In a recent study, it has been proposed to consider simulations as made of dozens of sequences of eigenvalue problems, where each sequence groups together eigenproblems with equal k-vector and increasing iteration index i. It was then demonstrated that successive eigenproblems in a sequence are strongly correlated to one another. In particular, by tracking the evolution over iterations of the angle between eigenvectors of adjacent iterations, it was shown the angles decrease noticeably after the first few iterations and become close to collinear. This last result suggests we could use the eigenvectors solution of a problem in a sequence as an educated guess for the eigenvectors of the successive problem. In this talk we present preliminary results that would eventually open the way to a widespread use of iterative solvers in ab initio electronic structure codes. We provide numerical examples where opportunely selected iterative solvers benefit from the reuse of eigenvectors when applied to sequences of eigenproblems extracted from simulations of real-world materials.**Block Iterative Solvers and Sequences of Eigenproblems arising in Electronic Structure Calculations**Twelve Copper Mountain Conference on Iterative Methods, March 25-30, Copper Mountain, Colorado, USA.

April 2012.

Twelve Copper Mountain Conference on Iterative Methods, March 25-30, Copper Mountain, Colorado, USA.abstractwebPDFResearch in several branches of chemistry and materials science relies on large ab initio numerical simulations. The majority of these simulations are based on computational methods developed within the framework of Density Functional Theory (DFT). DFT provides the means to solve a high-dimensional quantum mechanical problem by transforming it into a large set of coupled one-dimensional equations, which is ultimately represented as a non-linear generalized eigenvalue problem. The latter is solved self-consistently through a series of successive iteration cycles: the solution computed at the end of one cycle is used to generate the input in the next until the distance between two successive solutions is negligible. Typically a simulations requires tens of cycles before reaching convergence. After the discretization -- intended in the general sense of reducing a continuous problem to one with a finite number of unknowns -- each cycle comprises dozens of large generalized eigenproblems $P^{(i)}_{\bf k}: A^{(i)}_{\bf k} x - \lambda B^{(i)}_{\bf k} x$ where $A$ is hermitian and $B$ hermitian positive definite. Within every cycle the eigenproblems are parametrized by the reciprocal lattice vector ${\bf k}$ while the iteration index $i$ denotes the iteration cycle. The size of each problem ranges from 10 to 40 thousand and the interest lays in the eigenpairs corresponding to the lower 10-20\% part of the spectrum. Due to the dense nature of the eigenproblems and the large portion of the spectrum requested, iterative solvers are generally not competitive; as a consequence, current simulation codes uniquely use direct methods. Generally, the eigenproblem $P^{(i+1)}_{\bf k}$ is solved in complete isolation from all $P^{(i)}$s. This is surprising in light of the fact that each $P^{(i+1)}_{\bf k}$ is constructed by manipulating the solutions of all the problems at iteration $i$. In a recent study, the author harnessed this last observation by considering every simulation as made of dozens of sequences $\{P_{\bf k}\}$, where each sequence groups together eigenproblems with equal {\bf k}-vector and increasing iteration index $i$. It was then demonstrated that successive eigenproblems in a sequence are strongly correlated to one another. In particular, by tracking the evolution over iterations of the angle between eigenvectors $x^{(i)}$ and $x^{(i+1)}$, it was shown the angles decrease noticeably after the first few iterations. Even though the overall simulation has not yet converged, the eigenvectors of $P^{(i)}_{\bf k}$ and $P^{(i+1)}_{\bf k}$ are close to collinear. This result suggests we could use the eigenvectors of $P^{(i)}$ as an educated guess for the eigenvectors of the successive problem $P^{(i+1)}$. The key element is to exploit the collinearity between vectors of adjacent problems in order to improve the performance of the solver. While implementing this strategy naturally leads to the use of iterative solvers, not all the methods are well suited to the task at hand. In light of these considerations, exploiting the evolution of eigenvectors depends on two distinct key properties of the iterative method of choice: 1) the ability to receive as input a sizable set of approximate eigenvectors and exploit them to efficiently compute the required eigenpairs, and 2) the capacity to solve simultaneously for a substantial portion of eigenpairs. The first property implies the solver achieves a moderate-to-substantial reduction in the number of matrix-vector operations as well as an improved convergence. The second characteristic results in a solver capable of filtering away as efficiently as possible the unwanted components of the approximate eigenvectors. In accord to these requirements, the class of block iterative methods constitutes the natural choice in order to satisfy property 1); by accepting a variable set of multiple starting vectors, these methods have a faster convergence rate and avoid stalling when facing small clusters of eigenvalues. When block methods are augmented with polynomial accelerators they also satisfy property 2) and their performance is further improved. In this study we present preliminary results that would eventually open the way to a widespread use of iterative solvers in ab initio electronic structure codes. We provide numerical examples where opportunely selected iterative solvers benefit from the re-use of eigenvectors when applied to sequences of eigenproblems extracted from simulations of real-world materials. At first we experimented with a block method (block Krylov-Schur) satisfying only property 1). At a later stage we selected a solver satisfying both properties (block Chebyshev-Davidson) and performed similar experiments. In our investigation we vary several parameters in order to address how the solvers behave under different conditions. We selected different size for the sequence of eigenproblems as well as an increasing number of eigenpairs to be computed. We also vary the block size in relation with the fraction of eigenpairs sought after and analyze its influence on the performance of the solver. In most cases our study shows that, when the solvers are fed approximated eigenvectors as opposed to random vectors, they obtain a substantial speed-up. The increase in performance changes with the variation of the parameters but strongly indicates that iterative solvers could become a valid alternative to direct methods. The pivotal element allowing the achievement of such a result resides in the transposition of the ``matrix'' of eigenproblems emerging from electronic structure calculations; while the computational physicist solves many rows of {\bf k}-eigenproblems (one for each cycle) we look at the entire computation as made of {\bf k}-sequences. This shift in focus enables one to harness the inherent essence of the iterative cycles and translate it to an efficient use of opportunely tailored iterative solvers. In the long term such a result will positively impact a large community of computational scientists working on DFT-based methods.**Quantum Theory of Materials - Introduction to Density Functional Theory and its Computational Challenges**December 2011.

Class series given during the 11th EU Regional school, AICES, Aachen, Germany.abstractPDFDensity Functional Theory (DFT) is one of the most used ab initio theoretical frameworks in materials science. DFT-based methods are growing as the standard tools for simulating new materials. Simulations aim at recovering and predicting physical properties (electronic structure, total energy differences, magnetic properties, etc.) of large molecules as well as systems made of many hundreds of atoms. DFT reaches this result by solving self-consistently a rather complex set of quantum mechanical equations leading to the computation of the one-particle density n(r), from which physical properties are derived. In order to preserve self-consistency, numerical implementations of DFT methods consist of a series of iterative cycles; at the end of each cycle a new density is computed and compared to the one calculated in the previous cycle. The end result is a series of successive densities converging to a n(r) approximating the exact density within the desired level of accuracy. The course is divided in two parts. The first part is concerned with theoretical and conceptual foundations of DFT: we will introduce basic concepts of many-body quantum mechanics, proceed to illustrate the fundamental building blocks of DFT, and finally present a broad overview of the three most used ab initio methods. In the second part we will focus on one specific method, FLAPW, and analyze its computational aspects in details; the material will be presented paying special attention on the interrelation between the physics and the numerics of the problem. In order to facilitate the exposition, numerous examples will be presented and discussed in class. A basic knowledge of quantum mechanics concepts is assumed.**Estimating the number of eigenvalues in a interval using the eigenproblem resolvent**July 2011.

25th Umbrella Symposium, Aachen, Germany.abstractwebSymmetric generalized eigenvalue problems arise in many applications in chemistry physics and engineering. In several cases only a small fraction of the eigenpairs, usually located in a interval at one of the extremities of the spectrum, is required. Obtaining previous knowledge of the number of eigenvalues within the interval boudaries is often beneficial. For instance, for those iterative methods where a projector is used in conjunction with a Rayleigh-Ritz quotient this is an essential ingredient for reducing the number of iterations and increasing the accuracy. We present a potentially inexpensive technique to estimate the number of eigenvalues in a generic interval based on numerical manipulations of the eigenproblem resolvent.**Sequences of Generalized Eigenproblems in DFT**June 2011.

10th IMACS International Symposium on Iterative Methods in Scientific Computing, Marrakech, Morocco.abstractPDFResearch in several branches of chemistry and material science relies on large numerical simulations. Many of these simulations are based on Density Functional Theory (DFT) models that lead to sequences of generalized eigenproblems \seq. Every simulation normally requires the solution of hundreds of sequences, each comprising dozens of large and dense eigenproblems; in addition, the problems at iteration $i+1$ of \seq are constructed manipulating the solution of the problems at iteration $i$. The size of each problem ranges from 10 to 40 thousand and the interest lays in the eigenpairs corresponding to the lower 10-30\% part of the spectrum. Due to the dense nature of the eigenproblems and the large portion of the spectrum requested, iterative solvers are not competitive; as a consequence, current simulation codes uniquely use direct methods. In this talk we present a study that highlights how eigenproblems in successive iterations are strongly correlated to one another. In order to understand this result, we need to stress the importance of the basis wave functions, which constitute the building blocks of any DFT scheme. Indeed, the matrix entries of each problem in \seq are calculated through superposition integrals of a set of basis wave functions. Moreover, the state wave functions---describing the quantum states of the material---are linear combinations of basis wave functions with coefficients given by the eigenvectors of the problem. Since a new set of basis wave functions is determined at each iteration of the simulation, the eigenvectors between adjacent iterations are only loosely linked with one another. In light of these considerations it is surprising to find such a deep correlation between the eigenvectors of successive problems. We set up a mechanism to track the evolution over iterations $i=1,\dots,n$ of the angle between eigenvectors $x^{(i)}$ and $x^{(i+1)}$ corresponding to the $j^{th}$ eigenvalue. In all cases the angles decrease noticeably after the first few iterations and become almost negligible, even though the overall simulation is not close to convergence. Even the state of the art direct eigensolvers cannot exploit this behavior in the solutions. In contrast, we propose a 2-step approach in which the use of direct methods is limited to the first few iterations, while iterative methods are employed for the rest of the sequence. The choice of the iterative solver is dictated by the large number of eigenpairs required in the simulation. For this reason we envision the Subspace Iterations Method---despite its slow convergence rate---to be the method of choice. Nested at the core of the method lays an inner loop $V \leftarrow A V$; due to the observed correlation between eigenvectors, the convergence is reached in a limited number of steps. In summary, we propose evidence in favor of a mixed solver in which direct and iterative methods are combined together.**Matrix Structure Exploitation in Generalized Eigenproblems Arising in Density Functional Theory**October 2010.

8th International Conference on Numerical Analysis and Applied Mathematics (ICNAAM 2010), Rhodos Island, Greece.abstractIn this short paper, the authors report a new computational approach in the context of Density Functional Theory (DFT). It is shown how it is possible to speed up the self-consistent cycle (iteration) characterizing one of the most well-known DFT implementations: FLAPW. Generating the Hamiltonian and overlap matrices and solving the associated generalized eigenproblems $Ax = \lambda Bx$ constitute the two most time-consuming fractions of each iteration. Two promising directions, implementing the new methodology, are presented that will ultimately improve the performance of the generalized eigensolver and save computational time.**Improving the performance of Density Functional Theory self-consistent cycle**August 2010.

Invited colloquium given at the RWTH Physics Department, Aachen, Germany.abstractDensity Functional Theory (DFT) is one of the most effective frameworks for studying large quantum mechanical systems. DFT-based methods are growing as the standard tools for designing and testing new materials. The core of DFT relies on a self-consistent scheme formed by a sequence of outer iterations (cycles), each one containing multiple large dense generalized eigenpencils. At each step, the configuration of the system is determined by the solution of all the eigenproblems from the previous iteration. In this talk we will show a preliminary study aimed at improving the performance of the overall DFT scheme. Contrary to the current trend we plan to decrease the accuracy of each outer-iteration in order to speed up its execution. The direct consequence will be to have a higher number of faster iterations with an overall gain in execution time towards convergence. Success is guaranteed by acting on the most expensive operations of the self-consistent cycle: namely matrix generation and eigenproblem solution.