Introduction to Molecular Modelling: Part 10 (Absorption spectra)

12 min readJul 30, 2022

In this post, I will cover the prediction of electronic spectra with computational chemistry. Electronic spectra usually indicate spectroscopy in the ultraviolet-visible region. Most dyes that absorb or emit light in the visible region, do so due to electronic transitions (π → π* or n→π*). Many aromatic compounds also have strong absorptions in the UV region and often show fluorescence (this is used for visualisation of TLC plates in laboratory for example). The whole branch of photochemistry is dedicated to understanding the processes that occur when light (might not be visible light) is absorbed, and electrons are excited from one level to the other. Computational chemistry is one of the tools that help in understanding photochemical processes. Reasonably accurate predictions of UV/vis absorption spectra can be obtained easily from calculations. This has twofold advantages: firstly, the calculation helps in assigning peaks or otherwise interpreting experimental spectra, and secondly, calculations are often faster and cheaper than a full experimental study and therefore can act as a preliminary guide for what experiments to run or which compounds to synthesize.

In my previous blog posts, I have used only GAMESS as my QM program. However, in this instance, I will additionally use another free QM program ORCA (to download the software, you have to register in the forum). This is because some particular features has not been implemented yet in GAMESS, such as excitation energies with double hybrid density functionals. This is a part of the calculations that I will shown in this post. However, you do not have to install Orca if you do not want to do those specific calculations.

Absorption spectra

Absorption spectrum indicates the wavelengths or energies at which your compound absorbs light. Usually electronic absorption spectra are taken in the UV-visible region. In fact, UV-visible absorption spectroscopy is one of the most common tools available in chemistry laboratories for characterisation of a range of materials.

UV-visible absorption spectroscopy looks for absorption in the range of 200 to 800 nm. In this range, you will generally find absorptions for molecules with double bond or triple bond (i.e. conjugation). Molecules that are aliphatic i.e. do not have any conjugation, usually absorb at lower wavelengths(<200 nm). The problem is that UV light below 200 nm is quite dangerous to us, and it is also absorbed rather strongly by air, so setting up such an equipment is difficult. This is why, I will focus on conjugated molecules that absorb in the usual UV-visible range in this post.

Let us take the indigo molecule as an example:

As you probably know, indigo is a deep blue natural dye, extracted from indigo plants. It was used to impart colour to textiles and was a very valuable material in the ancient and medieval ages. It also has a rather gruesome history of being associated with slavery in plantations of North America, and colonialism in India.

Anyway, the deep blue colour of indigo arises due to its strong absorption around 600 nm (i.e. orange), which can be seen from the UV-vis absorption spectrum.

UV-vis absorption spectrum of indigo (solution in DMF)

Let’s try to simulate this spectrum with QM calculations.

Method and basis set selection

For any QM calculation, the first things to choose are the method and the basis set. Please note that in this case, the method has to be able to model electron excitation, as we are not simply modelling the ground state of the molecule.

There are three commonly used methods in computational chemistry employed for calculation of excited states: TD-HF (or CIS), TD-DFT and EOM-CC. They are the excited state versions of Hartree-Fock, DFT and Coupled-Cluster (CC) respectively. They also provide more reliable and accurate results going from TD-HF to EOM-CC, in general.

TD-HF is too inaccurate, so I will not use it here. You can try it if you wish to, it is quite fast. I will use TD-DFT with a conventional functional for the first calculation. TD-DFT calculation involves first calculating a ground state DFT wavefunction, and then performing a linear reponse calculation (i.e. electron excitation) on that ground state, to obtain the excited states and the excitation energies.

In DFT, an approximate exchange-correlation functional needs to be chosen. Now, there has been various benchmarks done on this topic. Jacquemin et al. for example, recommends the functionals X3LYP, PBE0 and LC-wPBE(20) for TD-DFT based on an extensive benchmark [1].

I will use X3LYP in this calculation on indigo. As indigo is a medium sized molecule, we will use double-zeta basis sets to get results faster. I will use 6–31+G(d) basis set which is an old double zeta basis, but it works well. Now, we can proceed to the calculation.

Geometry optimization

The first step is to optimize the ground state geometry of the indigo molecule. This is because electronic absorption is much faster than nuclear movement (Frank-Condon principle), which means that when light hits the ground state molecule, electronic excitation happens before the nuclei have any chance to move. This means that the electronic excitations are “vertical” i.e. there is no change in geometry during light absorption. So, we can use the ground state geometry for calculating excitation energies. (Please note that this is an approximation that works quite well for most cases. In room temperature, real molecules are always vibrating and a molecule’s vibrational modes and electronic excitations can couple to each other so that the approximation does not provide rigorous results.)

I drew the structure with Avogadro, minimized it with MMFF94, and then used GAMESS to optimise the structure. I used PBE0-D3/6–31+G(d) in DMF solvent (SMD) for the optimisation. You can run the optimisation if you want to, but it takes a really long time (~4 hours) because indigo is not a small molecule. The final optimised structure is given here, use it for the TD-DFT calculation:

 C           6.0  -4.9694049822   0.9682374839  -0.0315637146
 C           6.0  -5.4324585697  -0.3578921421   0.0031315820
 C           6.0  -3.6048304898   1.2302445639  -0.0768950445
 C           6.0  -4.5644718339  -1.4477271825  -0.0011463425
 C           6.0  -2.7198650597   0.1515977911  -0.0855691220
 C           6.0  -3.1987349633  -1.1759183154  -0.0436300876
 H           1.0  -5.6830373663   1.7875547919  -0.0235618688
 H           1.0  -6.5035681487  -0.5441916118   0.0362083006
 H           1.0  -4.9415823102  -2.4663540736   0.0300181236
 H           1.0  -3.2285746950   2.2500406072  -0.1069224333
 C           6.0  -1.2616442978   0.0866478853  -0.1309272392
 N           7.0  -2.1360260572  -2.0590464092  -0.0467660888
 C           6.0  -0.9520534887  -1.3679828889  -0.1102769704
 C           6.0   0.2939435421  -1.9131719690  -0.1462014186
 N           7.0   1.4779193982  -1.2219830286  -0.2082525010
 C           6.0   0.6038152779  -3.3676649624  -0.1203440211
 C           6.0   2.5409333950  -2.1047962888  -0.2016884994
 C           6.0   2.0623044343  -3.4322972245  -0.1576188605
 C           6.0   3.9068024976  -1.8325262472  -0.2358803316
 C           6.0   4.7752705236  -2.9219582337  -0.2283576003
 C           6.0   4.3125451364  -4.2481093695  -0.1908056365
 C           6.0   2.9477583774  -4.5105614684  -0.1551979543
 H           1.0   2.5717590390  -5.5303938010  -0.1233109097
 H           1.0   4.2836482362  -0.8138592857  -0.2687487773
 H           1.0   5.8465253212  -2.7352962068  -0.2541624082
 H           1.0   5.0266364436  -5.0670777102  -0.1893953357
 O           8.0  -0.2387521914  -4.2659694964  -0.0746493712
 O           8.0  -0.4190160024   0.9848657828  -0.1774934208
 H           1.0   1.5271227447  -0.2120741428  -0.1684869065
 H           1.0  -2.1855037840  -3.0689510844  -0.0859293972

I have used the PBE0-D3 functional for geometry optimization, because it is a well-known functional with dispersion correction. You can use another functional if you wish (e.g. B3LYP).

TD-DFT with X3LYP

Now open Notepad and create a GAMESS input file. Put the coordinates of the optimised structure of indigo in $DATA section. In GAMESS, TD-DFT calculations can be requested by specifying TDDFT=EXCITE in $CONTRL section.

The TD-DFT calculations can take solvent effects into account if you specify the solvent in $PCM section. The experimental spectrum of indigo shown above has been recorded in DMF (dimethylformamide) solvent, so we will use that. The $TDDFT section specifies how the TD-DFT calculation has to take place. The NSTATE=15 indicates that GAMESS should look for the first 15 excited states. The keyword TAMMD=.t. turns on the Tamm-Dancoff approximation. This is a mathematical approximation that can be used in TD-DFT to cut the time by nearly half without losing much accuracy. The approximation is made by neglecting what’s called the B matrix in the response equations.

Run the calculation with GAMESS. As indigo is a medium-sized molecule, this calculation will take a lot of time. In my laptop it took 40 mins to finish.

If the calculation finished succesfully, the GAMESS output file should contain the calculated excitation energies. Since we set NSTAT=15, GAMESS will only calculate 15 excitation energies. Higher energy excitations would not be found in the list. Please note that the calculation only provides excitation energies from ground state to different excited states, it does not provide a spectrum. To simulate a spectrum that looks like the one recorded by the actual UV-vis spectroscope, we have to apply a line broadening on each excitation (i.e. convolute the spectrum). Then we will be able to compare them. (There are some caveats here… I will discuss them at the end of this post)

You will find the different excitation energies in the output file in a section that starts with “SUMMARY OF TDDFT RESULTS” :

The table shows the energies requires for excitation from ground state (0) to the 15 excited states. The oscillator strengths indicate the probability of the transition which is related to the intensity of the peak in experimental spectrum. Higher oscillator strength usually means that the peak will be stronger (although there is no straightforward mathematical relation between them and there are caveats).

To see a plot of the convoluted spectrum from the calculation results, I will use the software Gaussum (which I had used in a previous post about IR spectra). In Gaussum, open the GAMESS output file, select “Electronic transitions” on the left. Then select “UV-visible”, and set “Start:” to 250 nm and “End:” to 800 nm. The FWHM parameter determines how much each line is broadened to get the plot of the spectrum. You can change it to different values to see how the spectrum looks. I have left it at default for the following plot:

Calculated UV-vis spectrum of indigo in DMF with X3LYP/6–31+G(d)

I have annotated the plot with wavelengths corresponding to the peaks.

Ok, now compare it to the experimental spectrum of indigo.

Experimental UV-vis absorption spectrum of indigo (in DMF)

The calculated spectra is able to represent the three peaks and their relative positions accurately. However, the peak positions are not reproduced accurately in TD-DFT. For example, the peak at 612 nm appears at 527 nm, and the peak at 287 nm appears at 326 nm in the TD-DFT spectrum. It is also clear that the relative peak heights are not reproduced faithfully by TD-DFT.

So, is there a way to increase the accuracy? Well, instead of using conventional functionals (e.g. B3LYP, PBE0 etc.) we can use double-hybrid functionals. In these functionals, apart from the approximate exchange correlation functional, there is also an MP2-like perturbative correction. The MP2 part adds explicit electron correlation, which improves their accuracy by a lot. We can also run TD-DFT with double hybrid functionals, in which case there would be a MP2-like correction to the electron excitation calculation. This type of correction to excitation energies is often represented as CIS(D) and was first implemented by Head-Gordon et al. [2]

Unfortunately, GAMESS does not have this feature implemented in it. It cannot run TD-DFT with double-hybrid functionals. Therefore, I will the software Orca to run this calculation instead.

TD-DFT with B2PLYP

B2PLYP is one of the most well-known double-hybrid functionals. It uses the BLYP exchange-correlation functional along with exact exchange and MP2 correction.

To run the TD-DFT with B2PLYP in Orca, the optimised geometry of indigo has to be used. The best way to do this is to open the output of geometry optimisation with Avogadro or MacMolPlt and then write an Orca output file (from Avogadro) or export it as *.xyz file and subsquently copy the coordinates to an Orca file.

The Orca input file is given below in case you wish to run it.

! B2PLYP 6-311++G** AutoAux RIJCOSX NOSOSCF 
%tddft
 nroots 15
 maxdim 7
end
%cpcm
 smd true
 smdsolvent "n,n-dimethylformamide"
end
%pal
 nprocs 10  #this is the number of processor cores
end
%maxcore 1500 #amount of memory in MB per core* xyz 0 1
   C       -4.96940        0.96824       -0.03156
   C       -5.43246       -0.35789        0.00313
   C       -3.60483        1.23024       -0.07690
   C       -4.56447       -1.44773       -0.00115
   C       -2.71987        0.15160       -0.08557
   C       -3.19873       -1.17592       -0.04363
   H       -5.68304        1.78755       -0.02356
   H       -6.50357       -0.54419        0.03621
   H       -4.94158       -2.46635        0.03002
   H       -3.22857        2.25004       -0.10692
   C       -1.26164        0.08665       -0.13093
   N       -2.13603       -2.05905       -0.04677
   C       -0.95205       -1.36798       -0.11028
   C        0.29394       -1.91317       -0.14620
   N        1.47792       -1.22198       -0.20825
   C        0.60382       -3.36766       -0.12034
   C        2.54093       -2.10480       -0.20169
   C        2.06230       -3.43230       -0.15762
   C        3.90680       -1.83253       -0.23588
   C        4.77527       -2.92196       -0.22836
   C        4.31255       -4.24811       -0.19081
   C        2.94776       -4.51056       -0.15520
   H        2.57176       -5.53039       -0.12331
   H        4.28365       -0.81386       -0.26875
   H        5.84653       -2.73530       -0.25416
   H        5.02664       -5.06708       -0.18940
   O       -0.23875       -4.26597       -0.07465
   O       -0.41902        0.98487       -0.17749
   H        1.52712       -0.21207       -0.16849
   H       -2.18550       -3.06895       -0.08593
*

The keywords of input file are different for Orca. I will not go into a detailed explanation on what each keyword means because only the results of the TD-DFT calculation are important right now. The calculation takes 1 hr 10 min on my laptop.

Plotting the TD-DFT results is difficult because none of the currently available visualisation programs support the output of Orca v5.0. (The current version of Orca, i.e. 5.0 has been released very recently and has a different output format than Orca v4.x. Most visualisation programs have limited support for Orca v5.0.) I had to trick GaussSum into plotting the results of TD-DFT with B2PLYP by using a TD-DFT output from an older version of Orca and copying and pasting the TD-DFT output section by hand into that log file.

Here is the spectrum with B2PLYP/6–311++G**:

Calculated UV-vis spectrum of indigo in DMF with B2PLYP/6–311++G**

Again, compare it to the experimental spectrum. One thing you will notice immediately is that the peak positions are actually closer to the experimental ones on average. (268 -> 273, 287 -> 306, 612 -> 573)

The position of the first excitation (612 nm in exp.) improves significantly, going from 527 nm with X3LYP to 573 nm with B2PLYP. The first excitation is usually the most important one in photochemistry (because it is generally used for fluorescence etc.).

Double hybrid functionals in general tend to be more precise than conventional functionals. However, in general it is wiser to perform a benchmark, instead of relying blindly on a functional. These are not black-box methods i.e. the choice of the functional has to be made carefully, even if it is a double hybrid one.

There are even more reliable and generally accurate methods such as CC2, CC3 (which can be considered approximated versions of EOM-CCSD or EOM-CCSD(T)), or ADC(2). These methods are often used as the standard to benchmark against. However, they are all computationally expensive, and scale poorly with system size.

Experimental spectra vs calculation

So far we have calculated vertical excitation energies, computed a spectrum from that, and compared it to experimental spectrum. However, this method is not strictly correct. For experimental peaks to correspond to vertical excitation energies, the three following assumptions have to be met:

The geometry of the molecule remains fixed during electron excitation (Franck-Condon principle)
Electron transition probability is the highest for excitations starting from the ground state minimum geometry
The position of absorption peak is not affected by coupling to vibrational and rotational levels

These three assumptions do not hold, even for the smallest molecules.[3] However, in practice, the error from these assumptions are low enough that we can use the “vertical excitation” approximation to predict spectra in most cases. The errors from the assumptions may range from 10–100 nm.

There is a rigorous way of correcting for these errors. It requires a geometry optimisation of the excited state, and calculation of the vibrational modes (Hessian) of the excited state, along with the vibrational modes of the ground state. Then the vibrational effects can be treated rigorously, and the results would be much closer to the experiment. However, this type of calculation is time consuming, so it is generally not done.

Conclusion

So we ran TD-DFT calculations with a conventional functional X3LYP, and then with a double hybrid functional B2PLYP. In both cases, the spectrum by assuming vertical excitation approximation is reasonably close to the experimental spectrum. The number and approximate position of the peaks were reproduced correctly, but the relative intensities were not.

There are other methods, which can be potentially more accurate, like CC2, EOM-CCSD, ADC(2) which we did not test due to high computational cost.

References:

D. Jacquemin, V. Wathelet et al., J. Chem. Theory Comput., 2009, 5, 2420–2435
M. Head-Gordon, R. J. Rico et al., Chem. Phys. Lett., 1994, 219(1–2), 21–29
R. Send, M. Kuhn, F. Furche, J. Chem. Theory Comput., 2011, 7, 2376–2386