We present a new fast algorithm for computing the Boys function using a nonlinear approximation of the integrand via exponentials. The resulting algorithms evaluate the Boys function with real and complex valued arguments and are competitive with previously developed algorithms for the same purpose.
I. INTRODUCTION
The Boys function1
appears in problems of computing Gaussian integrals, and over the years, there were many algorithms proposed for its evaluation; see, e.g., Refs. 2–10. The Boys function is related to a number of special functions, for example, the error function and the incomplete Gamma function, and (for a pure imaginary argument) to the Fresnel integrals.
It is common (see, e.g., Refs. 5 and 7) to use recursion to compute the Boys function for different n. The recursion is obtained via integration by parts,
and can be run starting with n = 1 so that we need to have the value or starting from a large n = nmax and going to n = 1,
so that we need to have the value . In order to avoid a loss of accuracy, the choice of which recursion to use depends on the size z and nmax. Iterating recursion (2), the dominant term expressing via is . We set
and choose (2) when and (3) if . For example, if nmax = 18, then z* ≈ 6.75. We note that other choices of the parameter z* are possible.
At each step, recursions (2) and (3) require only three multiplications and one addition (since the coefficients can be computed in advance and stored), so it is hard to obtain a more efficient alternative if one needs to compute these functions for a range of n, 1 ≤ n ≤ nmax. In order to initialize these recursions, we need fast algorithms for computing and . Computing for real z is straightforward since
For a real argument, an optimized implementation of the error function Erf is available within programming languages. For a complex argument, we present an algorithm for computing using a nonlinear approximation of the integrand following the approach in Ref. 11. We obtain a rational approximation of with an additional exponential factor.
We note that, as a function of complex argument, the Boys function can be highly oscillatory. In particular, if z is purely imaginary, then the Boys function is related to the Fresnel integrals,
so that
For computing , instead of tabulating this function as it is performed for a real argument in, e.g., Refs. 2, 5, 7, and 9, we use a nonlinear approximation of the integrand in (1) (see Ref. 11), leading to an approximation of the Boys function valid for the complex argument with tight error estimates. For , we compute instead of . Based on these approximations, we develop two algorithms, for real and complex valued arguments. We refer to Refs. 3, 4, and 8 for previously developed algorithms for the Boys function with the complex argument. The complex argument appears in a number of problems, for example, in calculations with mixed Gaussian/plane wave bases in molecules and scattering problems,12–16 in the context of complex scaling calculations of excited states,17 and in using gauge invariant basis functions for calculating magnetic properties.18
II. APPROXIMATION OF FOR COMPLEX VALUED ARGUMENT
We have
and use the integral
to obtain
where
is an analytic function. An algorithm for computing is essentially a quadrature for the integral in (8). Note that if, instead, we were to use a quadrature to compute via integrals in (6), then, for each z, we would need to evaluate as many exponentials as the number of quadrature terms. Importantly, when using (8), we need to evaluate e−z only once and then use the result as a factor.
A. The case .
We split integral (8) into three terms
and observe that the last term in (9) (without the factor e−z) is estimated as
Select tmax = e7/4 to obtain . For the first term in (9), we have
where M = 22 and nodes and weights are given in Table I. We note that it is possible to use the standard Gauss–Legendre quadrature on the interval , but the number of terms, M, will be larger. As a result, we obtain approximation
m . | ηm . | . | m . | ηm . | . |
---|---|---|---|---|---|
1 | 0.147 787 826 379 695 65 × 10−02 | 0.866 431 027 201 416 54 × 10−01 | 12 | 0.125 395 022 879 192 93 × 10+01 | 0.574 448 042 214 302 23 × 10−01 |
2 | 0.133 172 764 137 258 17 × 10−01 | 0.857 720 608 434 394 68 × 10−01 | 13 | 0.172 446 342 335 733 95 × 10+01 | 0.420 819 943 469 454 42 × 10−01 |
3 | 0.370 635 914 520 525 41 × 10−01 | 0.839 350 436 829 178 76 × 10−01 | 14 | 0.237 152 482 627 818 63 × 10+01 | 0.258 385 394 482 232 72 × 10−01 |
4 | 0.727 525 124 228 827 62 × 10−01 | 0.809 661 970 413 229 21 × 10−01 | 15 | 0.326 137 969 960 783 55 × 10+01 | 0.124 450 241 572 555 60 × 10−01 |
5 | 0.120 236 941 228 785 68 × 10+00 | 0.769 089 548 492 978 56 × 10−01 | 16 | 0.448 513 016 905 959 11 × 10+01 | 0.429 254 159 259 983 68 × 10−02 |
6 | 0.179 574 293 958 937 73 × 10+00 | 0.731 552 078 711 821 68 × 10−01 | 17 | 0.616 806 213 512 248 38 × 10+01 | 0.935 434 298 773 596 86 × 10−03 |
7 | 0.253 534 046 984 087 27 × 10+00 | 0.726 950 035 163 157 20 × 10−01 | 18 | 0.848 247 187 231 786 981 × 10+01 | 0.108 408 854 665 025 05 × 10−03 |
8 | 0.350 388 652 780 721 95 × 10+00 | 0.752 842 556 089 304 00 × 10−01 | 19 | 0.116 653 054 862 967 931 × 10+02 | 0.527 186 796 676 167 36 × 10−05 |
9 | 0.482 109 575 931 276 68 × 10+00 | 0.770 943 953 645 196 33 × 10−01 | 20 | 0.160 424 171 322 883 281 × 10+02 | 0.776 597 403 975 041 90 × 10−07 |
10 | 0.663 028 993 158 374 16 × 10+00 | 0.754 250 625 677 530 40 × 10−01 | 21 | 0.220 619 295 181 470 89 × 10+02 | 0.221 381 724 226 800 93 × 10−09 |
11 | 0.911 814 736 856 590 87 × 10+00 | 0.689 686 192 650 315 33 × 10−01 | 22 | 0.303 401 120 947 083 07 × 10+02 | 0.659 416 176 003 770 69 × 10−13 |
m . | ηm . | . | m . | ηm . | . |
---|---|---|---|---|---|
1 | 0.147 787 826 379 695 65 × 10−02 | 0.866 431 027 201 416 54 × 10−01 | 12 | 0.125 395 022 879 192 93 × 10+01 | 0.574 448 042 214 302 23 × 10−01 |
2 | 0.133 172 764 137 258 17 × 10−01 | 0.857 720 608 434 394 68 × 10−01 | 13 | 0.172 446 342 335 733 95 × 10+01 | 0.420 819 943 469 454 42 × 10−01 |
3 | 0.370 635 914 520 525 41 × 10−01 | 0.839 350 436 829 178 76 × 10−01 | 14 | 0.237 152 482 627 818 63 × 10+01 | 0.258 385 394 482 232 72 × 10−01 |
4 | 0.727 525 124 228 827 62 × 10−01 | 0.809 661 970 413 229 21 × 10−01 | 15 | 0.326 137 969 960 783 55 × 10+01 | 0.124 450 241 572 555 60 × 10−01 |
5 | 0.120 236 941 228 785 68 × 10+00 | 0.769 089 548 492 978 56 × 10−01 | 16 | 0.448 513 016 905 959 11 × 10+01 | 0.429 254 159 259 983 68 × 10−02 |
6 | 0.179 574 293 958 937 73 × 10+00 | 0.731 552 078 711 821 68 × 10−01 | 17 | 0.616 806 213 512 248 38 × 10+01 | 0.935 434 298 773 596 86 × 10−03 |
7 | 0.253 534 046 984 087 27 × 10+00 | 0.726 950 035 163 157 20 × 10−01 | 18 | 0.848 247 187 231 786 981 × 10+01 | 0.108 408 854 665 025 05 × 10−03 |
8 | 0.350 388 652 780 721 95 × 10+00 | 0.752 842 556 089 304 00 × 10−01 | 19 | 0.116 653 054 862 967 931 × 10+02 | 0.527 186 796 676 167 36 × 10−05 |
9 | 0.482 109 575 931 276 68 × 10+00 | 0.770 943 953 645 196 33 × 10−01 | 20 | 0.160 424 171 322 883 281 × 10+02 | 0.776 597 403 975 041 90 × 10−07 |
10 | 0.663 028 993 158 374 16 × 10+00 | 0.754 250 625 677 530 40 × 10−01 | 21 | 0.220 619 295 181 470 89 × 10+02 | 0.221 381 724 226 800 93 × 10−09 |
11 | 0.911 814 736 856 590 87 × 10+00 | 0.689 686 192 650 315 33 × 10−01 | 22 | 0.303 401 120 947 083 07 × 10+02 | 0.659 416 176 003 770 69 × 10−13 |
B. The case
In this case, we compute rather than . Since the denominator in (8) can be zero, we cannot separate terms in as in (9). Instead, we split integral (8) into two terms and obtain
The first term in (13) is approximated by using the Gauss–Legendre quadrature on the interval . The function q is analytic, and therefore, there is no singularity at t2 = −z. Since we can compute derivatives of q, the error introduced by this quadrature can be estimated using results in Sec. 5.2 of Ref. 19. For example, we obtain
with Mg = 16 and ɛg ≈ 10−14, where are the standard Gauss–Legendre nodes and weights on the interval . In the second term in (13), we drop (since its contribution is less than ) and obtain
While we obtain an explicit expression, computing arctangent of a complex argument is relatively expensive. For a complex argument, we evaluate arctangent using
As a result of dropping in the second term of (13), our approximation in (14) has a singularity at . In order to avoid using (14) in the vicinity of singularity, we use two different parameters tmax and tmax,1 and switch to the version with tmax,1 if , where .
We note that it is possible to increase the number of terms in the quadrature in order to avoid evaluating arctangent. This might be of interest on a parallel [Graphics Processing Unit (GPU) or multi-core] computer since computation of quadrature terms is trivially parallel. As a result, we obtain approximation
where . For , we have a converging series for the second integral in (9) as follows:
so that we can use
instead of (12) and
instead of (15). Since the parameter tmax is fixed, the coefficients of the series are computed offline.
m . | ηm . | wm . |
---|---|---|
1 | 0.707 194 313 205 700 10 · 101 + 0.164 872 912 507 521 15 · 102i | 0.364 436 324 028 985 01 · 10−10 + 0.264 117 510 721 075 04 · 10−10i |
2 | 0.707 194 313 205 700 10 · 101 − 0.164 872 912 507 521 15 · 102i | 0.364 436 324 028 985 01 · 10−10 − 0.264 117 510 721 075 04 · 10−10i |
3 | −0.571 432 717 151 916 35 + 0.132 785 794 532 336 33 · 102i | 0.181 852 503 467 536 33 · 10−6 − 0.218 604 589 713 993 52 · 10−5i |
4 | −0.571 432 717 151 916 35 − 0.132 785 794 532 336 33 · 102i | 0.181 852 503 467 536 33 · 10−6 + 0.218 604 589 713 993 52 · 10−5i |
5 | −0.471 930 213 303 925 06 · 101 + 0.998 352 571 123 710 32 · 101i | −0.994 891 692 720 557 48 · 10−3 − 0.230 490 791 052 030 73 · 10−3i |
6 | −0.471 930 213 303 925 06 · 101 − 0.998 352 571 123 710 32 · 101i | −0.994 891 692 720 557 48 · 10−3 + 0.230 490 791 052 030 73 · 10−3i |
7 | −0.717 046 627 728 950 89 · 101 + 0.667 123 608 398 207 68 · 101i | −0.256 252 169 858 790 06 · 10−1 + 0.358 183 352 748 769 82 · 10−1i |
8 | −0.717 046 627 728 950 89 · 101 − 0.667 123 608 398 207 68 · 101i | −0.256 252 169 858 790 06 · 10−1 − 0.358 183 352 748 769 82 · 10−1i |
9 | −0.848 997 470 547 246 99 · 101 + 0.334 348 041 684 674 91 · 101i | 0.165 068 015 448 807 23 + 0.322 739 644 717 760 45i |
10 | −0.848 997 470 547 246 99 · 101 − 0.334 348 041 684 674 91 · 101i | 0.165 068 015 448 807 23 − 0.32 273 964 471 776 045i |
11 | 0.365 644 143 631 509 73 · 102 | −0.201 046 416 615 651 64 · 10−25 |
12 | −0.324 242 392 559 219 54 · 101 | −0.395 635 369 550 420 78 · 10−3 |
13 | −0.890 660 477 331 007 53 · 101 | 0.723 499 458 050 852 92 |
m . | ηm . | wm . |
---|---|---|
1 | 0.707 194 313 205 700 10 · 101 + 0.164 872 912 507 521 15 · 102i | 0.364 436 324 028 985 01 · 10−10 + 0.264 117 510 721 075 04 · 10−10i |
2 | 0.707 194 313 205 700 10 · 101 − 0.164 872 912 507 521 15 · 102i | 0.364 436 324 028 985 01 · 10−10 − 0.264 117 510 721 075 04 · 10−10i |
3 | −0.571 432 717 151 916 35 + 0.132 785 794 532 336 33 · 102i | 0.181 852 503 467 536 33 · 10−6 − 0.218 604 589 713 993 52 · 10−5i |
4 | −0.571 432 717 151 916 35 − 0.132 785 794 532 336 33 · 102i | 0.181 852 503 467 536 33 · 10−6 + 0.218 604 589 713 993 52 · 10−5i |
5 | −0.471 930 213 303 925 06 · 101 + 0.998 352 571 123 710 32 · 101i | −0.994 891 692 720 557 48 · 10−3 − 0.230 490 791 052 030 73 · 10−3i |
6 | −0.471 930 213 303 925 06 · 101 − 0.998 352 571 123 710 32 · 101i | −0.994 891 692 720 557 48 · 10−3 + 0.230 490 791 052 030 73 · 10−3i |
7 | −0.717 046 627 728 950 89 · 101 + 0.667 123 608 398 207 68 · 101i | −0.256 252 169 858 790 06 · 10−1 + 0.358 183 352 748 769 82 · 10−1i |
8 | −0.717 046 627 728 950 89 · 101 − 0.667 123 608 398 207 68 · 101i | −0.256 252 169 858 790 06 · 10−1 − 0.358 183 352 748 769 82 · 10−1i |
9 | −0.848 997 470 547 246 99 · 101 + 0.334 348 041 684 674 91 · 101i | 0.165 068 015 448 807 23 + 0.322 739 644 717 760 45i |
10 | −0.848 997 470 547 246 99 · 101 − 0.334 348 041 684 674 91 · 101i | 0.165 068 015 448 807 23 − 0.32 273 964 471 776 045i |
11 | 0.365 644 143 631 509 73 · 102 | −0.201 046 416 615 651 64 · 10−25 |
12 | −0.324 242 392 559 219 54 · 101 | −0.395 635 369 550 420 78 · 10−3 |
13 | −0.890 660 477 331 007 53 · 101 | 0.723 499 458 050 852 92 |
Note that the series in (16) and (17) is related to the asymptotic expansion of (see, e.g., Ref. 4),
We use (17) and (18) for so that it is sufficient to keep only seven terms, yielding an error of less than 10−13.
For , we use the Taylor expansion of (6),
and we need 10 terms to maintain an accuracy of about 13 digits. While selecting parameters as above leads to algorithms with a reasonable speed, we did not optimize these choices as they may depend on several factors, e.g., computer architecture.
Since the Boys function is related to the error function (and can be used to compute it), we compared the speed of our algorithm with that of the well-known algorithm by Gautschi20 for computing the error function with a complex argument using a rational approximation of the closely related Faddeeva function. The speed of that algorithm was measured in comparison with the speed of computing . In Ref. 20, it is stated that with an accuracy of ∼10 digits, the code is 7–15 times slower than the speed of computing . Using the same comparison for our algorithm, this ratio is ∼12 for an accuracy of about 13 digits. Our algorithm is implemented using Fortran 90 compiled by Intel’s ifort with compiler flags -O3 -ipo -static and running on a laptop with GHz chipset. We timed our code by performing 106 evaluations, yielding s/evaluation in comparison with s/evaluation for with a complex argument.
While algorithms for computing the Fresnel integrals appear to be somewhat faster than using the Boys function in (5) (see, e.g., Ref. 21), we note that the generalized Fresnel integrals, e.g., , n ≥ 2, can be evaluated using our approach and plan to consider algorithms for these oscillatory special functions elsewhere.
III. APPROXIMATION OF FOR REAL AND COMPLEX ARGUMENTS
The function
decays monotonically on , and we use Algorithm 1 in Ref. 22 to construct its near optimal approximation via exponentials. We refer to Ref. 22 and references therein for the details of the algorithm that we use to obtain the necessary parameters (for an example, see Table II).
We obtain approximation
where . We note that n should be sufficiently large (e.g., n ≥ 7) to avoid the impact on the approximation of the singularity of the nth derivative of gn. Its numerical effect makes the accuracy of the current double precision implementation of Algorithm 1 in Ref. 22 insufficient to reliably produce approximation (22) for 1 ≤ n ≤ 6.
Substituting the approximation of into the integral (1), we arrive at
and estimate
Since
we have
Indeed, denoting the factor on the right-hand side of (23), , we have
and therefore, for ,
This implies that reaches it maximum at z = 0, where .
If , we compute instead of ,
Using (22), we obtain
and the estimate
For computing values of for 0 ≤ n ≤ nmax for , we use recursions
instead of (2) and
instead of (3).
IV. IMPLEMENTATION DETAILS
The speed of computation of values of for nmax ≥ 7 depends on the number of terms M in approximation (22). We demonstrate the results of approximating and display function in Fig. 1. Using only 13 terms (see Table II), we achieve accuracy for [e.g., accuracy of evaluation of is 2.08 · 10−14].
In implementing this approximation, we need to isolate cases where z is close to −ηm by using the Taylor expansion for . Since most of ηm have an imaginary part, it is a minimal effort if z is real since ηm is real in only three terms in our example in Table II. In addition, for the real argument z, we need to use only five terms with complex valued parameters as they come in complex conjugate pairs.
We implemented these algorithms using Fortran 90 on a laptop described in Sec. II. Computing the Boys functions for n = 0, …, 12 for the real argument takes s. The subroutine for the complex valued argument is slower and takes s.
V. CONCLUSION
Since their introduction in Ref. 1, the Boys functions with a real argument have widely been used for computing Gaussian integrals. When using mixed Gaussian/exponential bases, one needs to evaluate the Boys functions with complex argument. Such mixed bases are appropriate for scattering problems and for bound state problems where using only plane waves becomes too expensive near singularities. Consequently, mixed Gaussian/exponential bases provide a greater flexibility in formulation and solving problems of quantum chemistry, and we present our results, in part, to facilitate their use.
While for a real argument the Boys functions can be easily tabulated in regions where their asymptotic is not accurate, it is more difficult to apply such a straightforward implementation for a complex argument. A careful reading of Refs. 3, 4, and 8 reveals shortcomings of the existing approaches (relying mostly on expansions) to computing the Boys functions of a complex argument (see, e.g., conclusion in Ref. 8). For our approach, a better comparison is offered by Gautschi’s algorithm20 for the error function of complex argument since it is related to , as in (4); see Sec. II. Our approach of approximating a part of the integrand so that the resulting integral can be evaluated explicitly is simpler and yields tight accuracy estimates. Note that the part of the integrand we are approximating is real, while the Boys functions we are computing are complex valued. As a side remark, we note that the Boys function remains bounded for a complex argument with [and for ] and, for this reason, provides a good alternative approach for computing the error function of a complex argument.
We avoid the direct timing comparisons with existing algorithms since such comparisons are generally misleading. Given different hardware (single core, multi-core, GPU, etc.), different compilers and compiler flags, and different implementations, it is hard to compare algorithms by simply running them. Instead, one can look at algorithmic possibilities an approach offers. Our code is compact, and it is easy to simply count the total number of operations. We note that computation of each term in sums (12) and (23) is trivially parallel and only recursions in (2) and (3) require a sequential implementation (with just three multiplications and one addition per function). Thus, timing of our algorithms implemented on a multi-core or GPU computer will be much faster than the quoted timings of our implementation on a single central processing unit (CPU).
SUPPLEMENTARY MATERIAL
See the supplementary material for five Fortran 90 subroutines implementing, as an example, algorithms for computing the Boys function with indices n = 0, …, 12. The subroutine dboysfun12.f90 evaluates the Boys functions for real non-negative argument z. The subroutines zboysfun12.f90 and zboysfun00.f90 evaluate the Boys functions for complex argument z with a non-negative real part. Finally, the subroutines zboysfun00nrp.f90 and zboysfun12nrp.f90 evaluate the functions for complex argument z with a negative real part.
ACKNOWLEDGMENTS
S.S. was supported by the NSF under Grant No. CHE-1800584. S.S. was also partly supported through the Sloan research fellowship.
AUTHOR DECLARATIONS
Conflict of Interest
The authors have no conflicts to disclose.
DATA AVAILABILITY
The data that support the findings of this study are available within the article and its supplementary material.
APPENDIX: CONSTRUCTION OF QUADRATURE IN (11)
Changing variables in (7) t = eτ/2, we rewrite it as
and discretize it (following Ref. 23), yielding
where ηm, wm > 0 are arranged in an ascending order, and we estimate that
Using δ = ɛ = 10−13 in (A2) results in approximation with M = 210. We also need this approximation to satisfy
in order to obtain