Volume 126, Issue 5 e2020JA029063
Research Article
Full Access

Analysis of Relationship Between Ionospheric and Solar Parameters Using Graphical Models

Kateřina Podolská,

Corresponding Author

Kateřina Podolská

Department of Ionosphere and Aeronomy, Institute of Atmospheric Physics of the Czech Academy of Sciences, Prague, Czechia

Correspondence to:

K. Podolská,

kapo@ufa.cas.cz

Search for more papers by this author
Petra Koucká Knížová,

Petra Koucká Knížová

Department of Ionosphere and Aeronomy, Institute of Atmospheric Physics of the Czech Academy of Sciences, Prague, Czechia

Search for more papers by this author
Jaroslav Chum,

Jaroslav Chum

Department of Ionosphere and Aeronomy, Institute of Atmospheric Physics of the Czech Academy of Sciences, Prague, Czechia

Search for more papers by this author
Michal Kozubek,

Michal Kozubek

Department of Ionosphere and Aeronomy, Institute of Atmospheric Physics of the Czech Academy of Sciences, Prague, Czechia

Search for more papers by this author
Dalia Burešová,

Dalia Burešová

Department of Ionosphere and Aeronomy, Institute of Atmospheric Physics of the Czech Academy of Sciences, Prague, Czechia

Search for more papers by this author
First published: 14 April 2021

Abstract

For the investigation of time variations of critical frequency (foF2), solar radiation flux at 10.7 cm wavelength (F10.7 index) and geomagnetic activity index Kp, we use conditional independence graphical models which describe and transparently represent the structure of relationships in the time series. We employ multivariate statistic methods applied to daily observational data obtained from midlatitude ionosondes within the period 1994–2009 (23rd Solar Cycle). It is demonstrated that conditional independence graphical models represent a robust method of multivariate statistical analysis, useful for finding a relation between one of the main ionospheric parameters and space weather conditions. This method appears to be more appropriate than correlation analysis between foF2 and the main geomagnetic and solar indices, especially for long-term data for which the model characteristics may change or time series can be interrupted. We compare the results obtained by the graphical models method with cross-correlation analysis. We found that the method is suitable for the analysis of the dependence between foF2 values and time-shifted F10.7 time series. In particular, we clearly identified +0 day shift in all cases, and +1-, +2–3-, and +4–5-day shifts in European, American, and East Asia sector, respectively. The conditional independence graphs method can be applied even in the case when classical parametric methods are not convenient.

1 Introduction

Graphical models of conditional independences are an important instrument of multivariate statistics. They describe and transparently represent the structure of dependence relationships in the set of time series of measured quantities represented by random vectors. The conditional independence graphical (CIG) models (Whittaker, 1990) can also be used when classical parametric methods are not convenient, for example, data do not have normal distribution, data are not equidistant in time or model characteristics of time series change over time. Therefore, we used data from the whole period of the 23rd Solar Cycle (SC) and both surrounding minima to test this method.

Recently, CIG have also been used in the Atmospheric and Oceanic Sciences. Method of complex networks using graphical models in climatology was used, for example, by Tsonis and Roebber (2004). Probabilistic graphical models for climate data analysis have been developed by Ebert-Uphoff and Deng (20102012) and Bannerjee (2011). Multivariate and multiscale dependence in the global climate system revealed through complex networks was elaborated by Steinhaeuser et al. (2012).

There is known that ionosphere represents the ionized upper part of the Earth's atmosphere formed at heights of mesosphere and thermosphere. The entering solar radiation ionizes the neutral particles and forms pairs of positive and negative ions and/or electrons. Ionosphere is not homogeneous. It is stratified into layers abbreviated by letters D–F. Under certain conditions the F layer splits into F1 and F2. The F layer (or F2 in case of splitting) is the highest formed layer that persists even during nighttime. Other layers quickly disappear after sunset due to recombination processes. Maximum of electron concentration is found in the F2 layer and can be considered as a representative parameter of the whole ionosphere since it reflects solar energy input and transport processes within ionosphere (e.g., Davies, 1990; Hargreaves, 1992; Prölls, 2004, etc.). In this paper, we implement method of conditional independence graph into variability of maximum ionospheric ionization.

The principal aim of this study is to employ graphical models of data with continuous distribution on ionospheric data series. The data are in the form of a realization of normally distributed random vectors. A specific problem is to determine, which variables interact and how strongly, and to decide if the data can be condensed without any loss of information. The theoretical basis of the technique is the concept of conditional independence and the prime tool is the conditional independence graph.

In particular, we will consider a structure of dependence relationships of components of the random vectors, namely critical frequency of the ionospheric F2 layer foF2 time series, intensity of the Sun radio flux at a 10.7 cm wavelength F10.7 (Tapping, 2013), and planetary Kp index that represents fluctuation of horizontal components of the geomagnetic field. Time shifts between the time series are also considered. The differences in time shifts between foF2 and F10.7 for various longitudes are discussed.

2 Observational Data

State of the ionosphere is monitored regularly by international network of ionosondes and digisondes. In principle, the ionosonde/digisonde transmits vertically electromagnetic waves (typically in the frequency range 1–22 MHz) and registers their reflections from ionosphere. Maximum frequency on which the ionosonde/digisonde detects echo is called critical frequency foF2 of the ionospheric F2 layer. The foF2 value corresponds to the maximum of ionospheric ionization. From the time-of-flight of the sounding wave, the virtual height of the reflection is computed. The output of the measurement is height–frequency characteristics of the ionosphere. The foF2 has longest observational data series, as it is used to monitor and forecast radio wave propagation conditions for telecommunications and navigation systems (Baker et al., 2019).

We collected data from six midlatitude ionosondes to compare the longitudinal dependence with pairwise dependence of time series foF2, F10.7 (including time shift +1 to +5 days for F10.7). The time-shifted series of F10.7 are denoted F10.7+n, where n is time shift in days, for example, F10.7+1 denotes time series F10.7 with 1 day shift. We used averages of five daily foF2 values measured around the local noon (10:00–14:00) by midlatitude ionosondes located in three longitudinal areas (100°W–110°W/0°E–15°E/150°E–160°E) in the period 1994–2009 in order to use representative values for noon time. For midlatitude stations, we observe either one-peak maximum or two-peak maxima diurnal foF2 course. This noon average helps to reduce effect of the noon local minimum falling in between two peaks prenoon and after-noon peaks. The examined period (1994–2009) falls into 23rd SC and surrounding minima. For seeking the F10.7 and foF2 data series shift, we decided to choose the observations from stations that are located both in the western and in the eastern hemispheres. Geographical locations and geomagnetic coordinates of the stations involved in the analysis are visualized in Figure 1 and listed in Table 1.

Details are in the caption following the image

US/UK World Magnetic chart, Epoch 2010, Geomagnetic coordinates by Geomagnetic chart by UKDSSC. Geographical locations of the ionosondes stations are in the map visualized using black markers, Boulder (BC840) ●, Dyess (DS932) ★, Chilton, Slough (RL052/SL051) ◆, Juliusruh/Rugen (JR055) ✚, Magadan (MG560) ■, and Petropavlovsk (PK553) ▲.

Table 1. List of Midlatitude Ionosondes, Listed in Order of Increasing Geomagnetic Latitude
Station Geographic Geomagnetic (year 2000)
Latitude Longitude Latitude Longitude
Dyess AFB (DS932), Texas, USA 32.5°N 99.7°W 41.4°N 32.1°W
Petropavlovsk (PK553), Russia 53.0°N 158.7°E 45.5°N 138.4°W
Boulder (BC840), Colorado, USA 40.0°N 105.3°W 48.2°N 39.8°W
Chilton, Slough (RL052/SL051), UK 51.6°N 1.3°W 53.8°N 83.6°E
Magadan (MG560), Russia 60.0°N 151.0°E 51.5°N 146.9°W
Juliusruh/Rugen (JR055), Germany 54.6°N 13.4°E 54.0°N 99.6°E

The F10.7 index represents proxy of the solar activity, whereas the planetary Kp index is a logarithmic measure of the global geomagnetic activity. The daily values of these two indices for the examined period (1994–2009) are available. The global daily value of solar radio flux F10.7 time series measured at local noon at the Dominion Radio Astrophysical Observatory (DRAO, Penticton, Canada) was used. The Kp indicates the fluctuation rate of horizontal components of the geomagnetic field and significantly increases during geomagnetic storms. Kp was used in form of the daily arithmetic mean of the 3-h standardized K-indices for the 13 Kp-observatories (Bartels & Veldkamp, 1954). Data sources are listed in the Data Availability Statement section at the end of this article. The time series of Kp, F10.7, and foF2 for individual ionospheric stations are drawn in Figure 2. As can been seen from Figure 2, there are gaps in the foF2 time series and their lengths are different.

Details are in the caption following the image

Analyzed time series of Kp, F10.7, and foF2 from selected midlatitude ionosondes stations.

Figure 3 shows the values of cumulative distribution functions of F10.7 and foF2 (multiplied by 10 to display in one plot) for the individual stations (Normal probability plot). The data are plotted against theoretical normal distributions shown by straight lines in the plots (note the logarithmic scales on vertical axes). Departures from this straight line indicate departures from normality. Only the local condition of normality is assumption for the CIG model. The time series will be transformed by differences of logarithms and tested for normality of logarithmical series to fulfill this assumption in the following calculations. This technique also tests whether the time series have sufficient continuity for the CIG model.

Details are in the caption following the image

Normal probability plots of foF2 LT noon 5-h averages versus and F10.7 daily data, 1994–2009, sorted by their geographic longitude.

3 Cross Correlation and Linear Regression Model

For graphical model of conditional independencies, comparison between cross correlations and linear regression (LR) models was used. Here, the cross correlations refer to the relations between the entries of two random vectors foF2 and F10.7 (respectively time-shifted F10.7), while the correlations of a random vector foF2 are considered to be the correlations between the entries of foF2 itself, those forming the correlation matrix (matrix of correlations) of foF2. If each of foF2 and F10.7 (respectively time-shifted F10.7 and F10.7+n) is a random variable which is realized in a time series, then the correlations of the time series of foF2 are known as autocorrelations of foF2. The cross correlations of foF2 with F10.7 (respectively time-shifted F10.7 with F10.7+n) across time are temporal cross correlations. The cross correlation satisfies the identity:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0001(1)
where f and g are analyzed time series and τ is the time shift/lag. The LR model performed in SAS 9.4™ (SAS Institute Inc. 2012. SAS OnlineDoc® 9.4. Cary, NC: SAS Institute Inc., https://support.sas.com/documentation/94/) was used to verify foF2 dependence on F10.7 solar parameter, time-shifted F10.7 solar parameter, and Kp geomagnetic parameter. The LR model is analytically described by Equation 2:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0002(2)
where Y is dependent variable (LR model) of time series foF2, Kp, F10.7+n are explanatory variables namely, a0, b0, …, b6 are the regression coefficients, ei is the error term. The variables were included into the model at a significance level of 0.05. The best model is selected by the value of the adjusted coefficient of determination (RMSE), which indicates the percentage of variability of foF2 explained by the LR model for the given data. The RMSE is given by 3:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0003(3)
where urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0004 and urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0005 are the approximation of the observation yi in the model with estimate parameters b0, b1, …, bk. The residuals urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0006 are the estimates of the error terms ei. Only models with statistical significance of at least 0.05 are considered further.

4 Conditional Independence Graphs Method

In this paper, we employ so-called graphical models of conditional independence (Lauritzen, 1996; Whittaker, 1990). By these, we employed the models, which combine time series of variables with different discontinuity. The aim of such calculation is to describe the structure of conditional independencies between the components of the random vector X (here the time series of ionospheric parameter) using a suitable graphical model.

The analyzed data have the form of a realized normally distributed random vector. The local condition of normality is stronger than global. Both of these conditions are equivalent under additional assumptions, for example, for a multidimensional normal distribution (Lauritzen, 1996). In the absence of an assumption of global normality, the model does not have a conditional probability distribution of future states dependent only upon the present state (Markov property) can be implemented in both the univariate and the multivariate version (Buhl, 1993; Cramer, 1998).

Individual time series analyzed in this paper are represented in the graphical model as vertices of an undirected graph, while their mutual conditional dependency is represented by an edge between graph vertices.

The aim is to obtain the most convincing estimation of relations between variables in the model and to create a graphical model based on this estimation (Schafer, 1997). We are investigating the structure of mutual dependency of its individual elements. Our starting point is a set of all graphs of conditional independencies, from which those showing the closest match with the data are chosen while the others are excluded.

Specifically, the Graphical model with graph G is a system of dependency distribution of a random vector X, fulfilling conditional independencies given by graph G. An edge {i,j} is not in the edge set E (it is excluded) if and only if XiXj / XK \ {i,j}. A special case of graphical model is the so-called saturated model with a complete graph, which has all pairs of vertices connected by an edge and hence is determined by a complete graph. Another term widely used in graphical models analysis is a clique, defined as a complete subgraph. By adding other vertices, a subgraph is created, which is no longer complete (Lauritzen, 1996; Whittaker, 1990).

A random vector X has multivariate normal distribution with zero expected value and variance matrix V = var(X), which is symmetric and positive semidefinite. This matrix has diagonal elements:urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0007 and elements off diagonal: urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0008.

The variance var X is defined as
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0009(4)
where urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0010 is expected value. It measures deviation of random variable values from the expected value; hence, it is a measure of variability. Covariance of random vectors X, Y is defined by
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0011(5)
which indisputably implies the relation cov(X, X) = var(X).
The estimation of a variance matrix V is a selective variance matrix with diagonal elements:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0012(6)
and with diagonal elements:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0013(7)

If variance matrix V is positive definite, then it has an inverse variance matrix D = var(X)−1 = V−1 and dij, i, j = 1, 2, …, k are its elements.

In order to build a graphical model of conditional independencies, first of all there is necessary to determine pair correlations between time series represented by graph vertices. The pair correlations are determined from an inverse matrix of the covariant matrix of the estimated model residues (Buhl, 1993). During the computation, we seek the maximum likelihood estimation of variance matrix V, on conditions given by graphical model. The selection is done by direct calculation, using a procedure stated in Section 4.2.

The selective variance matrix S is the maximum likelihood estimation of variance matrix V in a saturated complete graph. The deviance has an asymptotic χ2 distribution and is a test statistic, whether the model with graph G matches the alternatively saturated model. The number of degrees of freedom f for χ2 distribution of deviance of model with graph G equals the number of missing edges in graph G and depends on probability distribution of data (Jordan, 2004).

4.1 Deviance Analysis

As we have already stated, in order to construct the graphical model of conditional independencies, it is necessary to know the pair correlations between variables in the model. The statistical significance of these correlations is tested by a test criterion. For the purpose of testing graphical models of conditional independencies, a statistic is set, which is based on the estimation of the parameters of the model given by the method of maximum likelihood (determining the logarithm of the function and searching for its maximum).

To choose the model that fits data the best, various selection algorithms were developed in the probabilistic and Bayesian approach. In this paper, we only consider the likelihood approach, in which the main tool is the test statistics. The test statistics deviance is defined as double the difference between the logarithm of likelihood of a graphical model of a complete graph and the logarithm of the likelihood of the graphical model of the examined graph:
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0014(8)
where l(S) denoted the logarithm of likelihood of hypothetical model (model parameters and input data are identical, its likelihood is maximal), and urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0015 denoted the logarithm of likelihood of examined model (with a limited number of parameters estimated with the method of maximum likelihood). That is, the deviance of a graphical model and graph G is calculated by the equation dev(G) = n{(urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0016)T − ln[det(urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0017)] − k}, where urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0018 and urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0019 is the maximum likelihood estimation of variance matrix V in graphical model with graph G based on random selection from a multivariate normal distribution. k is number of model parameters in examined graph. The estimations urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0020 are with probability 1 defined unambiguously.

We attempt to find the final model using predictors (here time series of solar and geomagnetic indices). A model with n parameters is a model with a complete graph, where all variability goes to the systematic component. A model with k parameters is a model with examined graph, in the case when one or more predictors are excluded (m < k parameters, where m is the number of estimated parameters). A model with a subgraph of an examined model with m − 1 parameters.

For a model with an empty graph, all variability goes to the random component. Analogous to the variance analysis, we are testing, if both models differ statistically significantly in their prediction abilities. The statistic difference of deviances is in this computation the test criterion for the difference between the model with examined graph and a model with its subgraph (Whittaker, 1990):
urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0021(9)

If the difference of deviation is urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0022where urn:x-wiley:21699380:media:jgra56430:jgra56430-math-0023 is the quantile χ2 distribution, of a model with subgraph, and k is the number of parameters of the examined model, then a submodel is not suitable, because it is too simplifying.

Thus, we are testing whether for the examined model is missing a significant predictor. We can test if the examined model and the model with complete graph differ from each other. As a test criterion, residual deviance is used and the number of degrees of freedom for quantile χ2 distribution is given by the difference between the number of variables (the number of parameters of a model with complete graph) and the number of parameters of the examined model (Edwards, 2000).

While testing whether the examined model explains any variability at all (compared to a constant) we are examining whether the examined and zero model differ from each other. As a test criterion, we use the difference of residual and zero deviance. The number of degrees of freedom for quantile χ2 distribution is given by the number of parameters of the examined model minus 1. The principle of the testing of our graphical model of conditional independencies with best fit to data is schematically depicted in Figure 4, adjusted according to Edwards (2000).

Details are in the caption following the image

The principle of testing the graphical model using deviance test statistics.

As was mentioned above, for a selection of data-best-fit graphical model, complete graph is first chosen, a likelihood function constructed, and unknown parameters are determined by its maximization through the set of these parameters. A test is run to determine if the selected graphical model and the data match. The computation of deviance of the excluded edges is done by the Iterative Proportional Fitting (IPF) algorithm. Selection algorithms for graphical models of conditional independencies are described in detail in the book (Whittaker, 1990, pp. 182–185).

This algorithm tests a model with graph G2 against an alternative model with graph G1 containing one or more additional edges than G2, (see Equation 9).

4.2 Model Selection and the Computation Method

The deviance of graphical model with graph G is calculated by iteration using the so-called IPF algorithm. Graphical models were computed using the Mathematica 8™ SW and its procedure Backward1, which applies the selection algorithm. From a saturated model with complete graph, edges are gradually drawn off (algorithm backward).

It concerns two so-called Backward algorithms, which start with complete graph. A complete and an empty graph is depicted on the left and right side of Figure 5a. We begin with the complete graph its deviance is set to 0 (devf = 0). The computation of deviance of excluded edges is executed using the IPF algorithm. The edge with the smallest insignificant deviance is selected and excluded from the graph. If all deviances in graph are significant, the computation is terminated. If no two elements of the examined random vector are conditionally dependent, all edges are gradually excluded from the model and the outcome is an empty graph (Whittaker, 1990, pp. 182–185).

Details are in the caption following the image

Example of CIG computation, Boulder (BC840). From the entire graph (panel a) in the left, all edges with computed deviance less than 3.84 (the 5% point of χ2(1)) were excluded. Resulting minimized model in the panel (a) in the right sight is shown. In the panel (b), a graphical representation of model calculation test statistics is shown. The model test statistics value for time series F10.7 and shifted F10.7+n time series are indicated by a horizontal column. Departure up to significance level up to 3.84 (the 5% point of χ2(1), red vertical line) denotes statistically significant edge in minimized graphical model. CIG, conditional independence graphical.

Afterward, the tests are run on the input data as follows:

  • (1)

    The test of independency of logarithmic data (log(K/Kt−1) = log Kt − log Kt−1) is run, based on a differentiated sign test, where Kt is the value of realization of random vector in time t and Kt−1 is the value of realization of random vector in time (t − 1). Data are transformed by differences of logarithms to fulfill the assumption of normality and independence of observed realizations within this step.

  • (2)

    The test of the normality of logarithmical series for computing the maximum likelihood estimator of the constrained covariance matrix, under the assumption of multivariate normality. In the absence of an assumption of local normality, calculation is terminated and the model cannot be created (Whittaker, 1990, pp. 182–185).

The final graphical model of conditional independencies is then selected according to the following procedure (Whittaker, 1990: pp. 241–246):

  1. The system of k-variate normal distributions for X is determined and the set of parameters described.

  2. For a multivariate normal distribution, the independency is characterized by a variance matrix V = var(X) or by its inversion D.

  3. Selection of a graphical model, which will be tested, whether it matches the data.

  4. Construction of the likelihood function.

  5. Unknown parameters are determined by the maximization of likelihood function through a set of these parameters. The requirements for the likelihood function are determined by the chosen graphical model.

  6. A test, if the selected graphical model and the data match. The test statistic (deviance) equals double difference of the maximized logarithmic likelihood functions. The first one is maximized without restriction, the second one with restriction given by the selected graphical model because deviance has χ2-distribution. Therefore, we can assess whether the graphical model fits the data.

In Figure 5, there is shown an example of the results of CIG computation for the Boulder station. In Figure 5a, each quantity in the model, data time series or missing observation, is represented by a vertex. Some of the vertices are connected by undirected edges. The absence of such edges represents an assumption of conditional independence. The thin edges denote the complete graph of foF2, Kp, F10.7, and F10.7 time-shifted time series (F10.7+n), the bold black ones are edges of the corresponding minimized graph. The vertices are labeled by the number of days of the time shift. Edges between the vertices which correspond to time-shifted series F10.7+n are trivial, and thus were excluded during our calculation. The problem of large sensitivity on the initial conditions was solved by Koller and Friedman (2009).

In Figure 5b, graphical representation of model calculation test statistics is shown. The model test statistics value for time series F10.7 and shifted F10.7+n time series are indicated by a horizontal column. Departure up to significance level up to 3.84 (the 5% point of χ2(1), red vertical line) denotes statistically significant edge in minimized graphical model. The edges between vertices which correspond to conditional dependencies between time-shifted series F10.7+n are trivial and thus were excluded before computing.

5 Comparison of Used Methods

5.1 Results of Linear Regression Model

Computed LR model is described by Equation 2. If the explanatory variable (Kp, F10.7) is excluded from the model, the appropriate parameters do not appear in Table 2 that presents the estimated coefficients of the LR models with solar and geomagnetic activity for parameters Kp, F10.7, and shifted F10.7. If all the explanatory variables were gradually excluded from the model, the model is not listed in the table either. The stations are sorted by their geomagnetic latitude. RMSE denotes adjusted coefficient of determination.

Table 2. Estimated Coefficients of the Linear Regression Models With Solar and Geomagnetic Activity for Parameters Kp, F10.7 and Shifted F10.7
Station a0 F10.7 F10.7+1 F10.7+2 F10.7+3 F10.7+4 F10.7+5 Kp R2 RMSE
Dyess AFB (DS932) −11.9747 0.3411 −0.1442 −0.1459 0.1696 0.3078 4.7577 0.3597 32.4393
Petropavlovsk (PK553) −3.1350 0.2433 −0.1095 −0.1141 0.2551 3.9093 0.1446 39.3992
Boulder (BC840) 4.3286 0.3996 −0.1249 −0.1280 0.3525 1.9065 0.5156 27.0022
Magadan (MG560) 16.4584 0.1819 −0.1136 0.1931 1.0661 0.1433 34.1668
Chilton, Slough (RL052) 26.2886 0.3063 −0.0457 −0.0439 −0.0779 −0.0389 0.2973 −0.9968 0.5478 19.0867
Juliusruh (JR055) 9.8420 0.3476 −0.0690 −0.1608 0.3458 0.3268 0.5344 23.0645

Cross-correlation matrix of computed LR models is depicted in Figure 6. The picture shows that due to the variability in the coherency of time series during SCs, the series are coherent in parts only. Residual analysis was performed. We calculated 95% confidence intervals for estimates of the mean value of the dependent variable foF2, 95% confidence intervals for individual estimates and 95% confidence intervals for estimates of model parameters. Variables were included in the model if they were significant at least at 0.05 level.

Details are in the caption following the image

Cross-correlation matrix of computed linear regression models with solar and geomagnetic activity parameters Kp, F10.7, and shifted F10.7.

5.2 Results of CIG Model

The data were transformed by differences of logarithms to fulfill the assumption of normality and independence of analyzed time series. Test of the local normality of logarithmical series for computing the maximum likelihood estimator of the constrained covariance matrix (see Section 4.2) was achieved for all analyzed data series F10.7+n, foF2, and Kp. Therefore, minimized graphical models could be created for the time series obtained for all six midlatitude ionosondes.

The initial graphical model for data best fitting is initially selected as an entire graph. The graph vertices represent the components of a random vector (F10.7+n, foF2, Kp), the edges between two vertices represent the dependences of these variables (in Table 3 marked with green checks). In our selection of minimized model, all edges with computed deviance, see Equation 9 less than 3.84 (the 5% point of χ2(1)), were excluded from the entire graph (in Table 3 marked with red crosses). The results of the CIG model computation are shown in Figure 7 and graphs of minimized models are given in Figure 8. The results are divided according to the station locations.

Table 3. Summary of Conditional Independence Graphs (CIG) Model Computation, Sorted by Their Geomagnetic Latitude
Station Deviance of minimalized graphical model F10.7 F10.7+1 F10.7+2 F10.7+3 F10.7+4 F10.7+5 Kp Edge number Test p-value
Dyess AFB (DS932) 4.86 image image image image image image image 4 0.047
Petropavlovsk (PK553) 3.91 image image image image image image image 3 0.039
Boulder (BC840) 7.56 image image image image image image image 4 0.037
Magadan (MG560) 6.47 image image image image image image image 4 0.049
Chilton, Slough (RL052) 9.14 image image image image image image image 4 0.033
Juliusruh (JR055) 7.38 image image image image image image image 4 0.039
Details are in the caption following the image

Results of the conditional independence graphs (CIG) model test statistics computation, sorted by their geographic longitude. The graphical representation of model calculation test statistics for all analyzed ionosondes stations is shown. The model test statistics value for time series F10.7 and shifted F10.7+n time series are indicated by a horizontal column. Departure up to significance level up to 3.84 (the 5% point of χ2(1), vertical red line) denotes statistically significant edge in minimized graphical model.

Details are in the caption following the image

Minimized conditional independence graphs (CIG) models computation results. The minimized model for all analyzed ionosondes stations. All edges with computed deviance less than 3.84 (5% point of χ2(1)) were excluded from the entire graph (gray lines). The resulting minimized conditional independence graphs is drawn by black lines. The graph vertices represent the components of a random vector (F10.7+n, foF2, Kp), the edges between two vertices represent the conditional dependences of these variables.

Final minimized CIG models selected by IPF algorithm for all stations during the years 1994–2009 are visualized in Table 3. The F10.7 shifted time series graph vertices are labeled by the number of days of the shift. The dependence time shifts were clearly identified. The 0-day shift F10.7 time series is naturally identified in all cases as a trivial dependence.

In our selection of minimized model, all edges with computed deviance, see Equation 9 less than 3.84 (the 5% point of χ2(1)), were excluded from the entire graph. Computed total models deviances, Equation 8, and test p-values for all eight surveyed time series in time period 1994–2009. The stopping threshold for this calculation was 0.05.

6 Discussion

The conditional independence graphs method can be applied even in the case when classical parametric methods are not convenient, for example, when the data are not normally distributed along the entire length of time series (Anderson & Olkin, 1985). The minimized graphical models were calculated for using data from all six selected midlatitude ionosondes (including interrupted and shorter time series) because they fulfill the condition of the local normality of logarithmical series.

The method of conditional independence graphs proves its usefulness for studying the correlations between foF2, the solar radio flux at 10.7 cm, and geomagnetic activity index Kp. From our analyses, it is obvious that the time shift is conditionally dependent on the geographical longitude. It is possible to divide the resulting minimized graphs into three groups: “European,” “American,” and “Far East” stations. For the European stations (Juliusruh/Rugen, Chilton), there is typically a conditional dependence between the foF2 parameter and the F10.7 time series shifted by +1 or +2 days.

The American stations (Boulder and Dyess) typically indicate conditional dependence +2/+3 days shifted F10.7 series. The Far East stations series (Petropavlovsk, Magadan) seems to indicate a higher conditional dependence foF2 with +4/+5 days shifted series F10.7.

It is necessary to point out, that we have received three categories of dependencies. Moreover, these three classes correspond to the three locations of ionospheric stations involved in the analysis. First class corresponds to East Asia stations, second class Europe, and the third to North America. It might be natural to expect slight geomagnetic dependence reflecting change of the geomagnetic dip going over our latitudinal span. However, since no dependence on geomagnetic coordinates has been identified, we have looked for another phenomenon that might come into the process. As it has been widely reported, the state of ionosphere is significantly affected by the dynamics of lower-lying atmosphere. Works of Kazimirovsky et al. (2003), Laštovička (2006), Yigit and Medvedev (2015), Yigit et al. (2016), and many others demonstrated internal wave coupling processes in the Earth's atmosphere by mean of propagating atmospheric waves (Gravity, Tidal, and Planetary) originating in the whole atmosphere and down as low as height of the troposphere. Atmospheric waves are generated by broad range of sources and propagate through the atmosphere where they alter its state. Dissipation of atmospheric waves plays an important role in the energy transport and secondary wave generation. Hence, their impact could be detected far from their source location. Secondary gravity waves and their effects were in detail analyzed for instance by Vadas et al. (2018). The atmospheric waves are of high importance for understanding and circulation in the mesosphere and lower thermosphere (MLT) region. On the wind measurements from midlatitudes, Jacobi (2014) demonstrated long-term analyses of mean circulation within MLT region with respect to atmospheric wave activity. Fourteen years of SABER satellite observation brought information about Gravity Waves climatology in the stratosphere and mesosphere region (see for instance Liu et al. [2017]). Excitation mechanism of nonmigrating tides that have significant impact at mesospheric and thermospheric heights is in detail modeled and discussed by Myioshi et al. (2017). Further, El Nino–Southern Oscillation (ENSO)-related variation of the E region was reported by Chang et al. (2018) suggesting that ENSO signatures can be transmitted to Es formation mechanisms, potentially through modulation of vertically propagating atmospheric tides that alter lower thermospheric wind shears.

Within the troposphere, generation of internal waves is often related to meteorological phenomena. Internal atmospheric waves interact with themselves and/or with the undisturbed atmospheric flow in a complicated dynamical system with long-range dependencies. Koucká Knížová et al. (2015) demonstrated extremely high similarities in the behavior of the foF2 time series up to ground distance of about 1,000 km and attributed them to mesoscale system forcing. Also, Yu et al. (2016) discussed that different meteorological conditions are likely responsible for the longitudinal differences in observation of medium-scale traveling ionospheric disturbances and midlatitude spread F. Effects of the strong mountain wave on the mesosphere, thermosphere, and consequently on ionosphere wintertime Southern Andes have been demonstrated by Vadas and Becker (2019) and Xiao et al. (2016).

Taking into account theoretical, modeling, and experimental studies showing impact of atmospheric waves of lower atmosphere origin up to ionospheric heights, we suspect that regional character of the atmosphere at tropospheric heights may be crucial for explanation of the three different dependencies of foF2 and F10.7.

Similar mechanism, climatology (involving atmospheric entire wave activity) of the region, might influence variability in different activities of internal atmospheric waves in various climatological regions and hence different ionospheric response to solar forcing. We suspect that climatology of the troposphere must be taken into account for modeling of the ionospheric variability. It means that troposphere and its climatology may affect time-lags in response to the forcing and or/may account for long memory of the system. A future analysis of differences in atmospheric wave activity in the ionosphere at various longitudes could help for better understanding of the considered coupling between troposphere and ionosphere and for supporting of the suggested hypothesis.

Climate is considered as the average weather in a given area over a longer period of time. The classical period used for climate description is 30 years, as defined by the World Meteorological Organization. Climate is determined by a region's climate system with its major components: the atmosphere, the hydrosphere, the cryosphere, the land surface, and the biosphere. The most significant parameters are temperature, atmospheric pressure, wind, solar irradiance, humidity, precipitation, and topography. The Earth climate strongly depends on large-scale global circulation organized into three major cells (Hadley, Ferrel, and Polar cells) and forming major circulation pattern. In addition, variability of the local or regional atmospheric circulation related to terrain, topography, or geographical position must be considered. The distance from water bodies plays an important role as well. The coupled atmosphere–ocean system has an influence not only on circulation pattern but also on the temperature variability and precipitation regime. An overview of Earth climatology can be found for instance in Hartmann (1994) and climate variability in Barnet et al. (2005), or climate report Perlwitz et al. (2017).

The world climate classification was first introduced by German scientist Köppen in 1900. His work has been later in 1950s enlarged by Geiger. These publications are not easily accessible. Translation of the historical work (Köppen, 2011) shows the division of the Earth into thermal zones according to the duration of hot, moderate, and cold periods and to the impact of heat on the organic world.

Actual world maps of climates can be found for instance in Peel et al. (2007), Kottek et al. (2006), Rubel and Kottek (2011), and others. Over more than 100 years, the Köppen–Geiger climate classification is the most widely used classification describing climate regime of the particular region.

Climatology of the particular region is divided into 5 main classes and 30 subtypes. It is based on threshold values and seasonality of monthly air temperature and precipitation. Different regions in a similar class share common vegetation characteristics as climate has been recognized as a major driver of global vegetation distribution. Climatological classification for time span 1980–2016 is provided by Beck et al. (2018). This time range is convenient for our study. In the paper, there can be found table of region classification together with the plot (Figure 1a—present-day map for time span 1980–2016) of their distribution around the globe. On the map, we clearly identify that East Asia stations are classified as D class: cold, Europe stations as C class: moderate, and North American stations as B class: arid (Beck et al., 2018, p. 1, Figure 1a). According to the morphology, the ionosphere is divided into three major regions—polar, midlatitudes, and equatorial. The classification is in principle based on geometry of the geomagnetic field and general behavior of charged particles within the particular region is consensual.

In our study, we have analyzed stations that all fall into the midlatitudes and therefore the response to the solar and geomagnetic forcing should be in agreement. Slight geomagnetic dependence would confirm our expectations. However, we have not identified significant geomagnetic dependence. On the contrary, we found out that our six time series can be classified into three groups. These three groups differ in longitude and in climatic regime of the region where the observatories are located.

Here, we hypothesize, climatological regime of the lower-lying atmosphere may be responsible for the difference in the ionospheric response to the solar and geomagnetic forcing.

Here, we would like to point out that we have analyzed data from ionospheric stations in a rather short span of latitudes, all stations belong to midlatitudes (41.4°N–54°N) and one would have expected either practically same response/dependence of foF2 to F10.7 cm and/or geomagnetic dependence as solar and geomagnetic forcing is considered as the most important. However, we have not identified any geomagnetic dependence.

7 Conclusion

We used the directed conditional independence graphs to emphasize the qualitative assumptions underlying the complex models. The final results of our analysis by conditional independence graphs show that the dependence time shifts between foF2 and F10.7 were clearly identified, namely +0 day shift in all cases, and +1-, +2–3-, and +4–5-day shifts in European, American, and East Asia sector, respectively. The CIG method has been found to be more eligible for this purpose than correlation analysis between foF2 and geomagnetic and solar indices because of its relative insensitivity to data gaps or long-term changes of data series. Especially for long-term data for which the characteristics of the model change over time or time series are interrupted. CIG method seems more effective than wavelet-based cross-correlation analysis (Roux et al., 2012) or scale analysis when solving this problem. In the minimized graphical model, the interconnections between all series of indices were not identified, so the effect was not complex but varies according to the station region climate type. Taking into account that ionosphere is strongly coupled with lower-lying atmosphere, we hypothesized that troposphere could contribute to the difference in time dependencies and time-lags in ionospheric response to the external solar forcing. The climatology of the particular region is a very complicated system that is highly influenced by atmospheric circulation on wide range of scales. We suggest that similar phenomena predominantly circulation pattern forming the particular type of the climate and related wave fields may be responsible for the observed time shifts in the ionospheric response to the external forcing (solar and geomagnetic).

Acknowledgments

The support under the grants 18-01969S and 18-01625S by the Czech Science Foundation is acknowledged. This work was supported in part by ESA through contract 4000126709/19/NL/IS “VERA.” World Data Center for Geomagnetism, Deutsches GeoForschungsZentrum (GFZ), NWRA/CoRA, NorthWest Research Associates, and UK Solar System Data Centre (UKSSDC) Rutherford Appleton Laboratory is acknowledged for providing ionospheric and geomagnetic measurement. Vladislav Chýna are acknowledged for compiling IPF selection algorithm. Ludmila Třísková and Vladimír Truhlík from the Institute of Atmospheric Physics CAS are acknowledged for the consultations of ionospheric and solar data.

    Data Availability Statement

    The data used for time series assembling were provided in case of Kp by World Data Center for Geomagnetism, Kyoto University, Japan [online] [cit. 2016-04-01]. <https://wdc.kugi.kyoto-u.ac.jp/kp/index.html>, in case of F10.7 by NWRA/CoRA, NorthWest Research Associates, Boulder, USA [online] [cit. 2016-04-01]. <http://www.nwra.com/spawx/spawx.html>, and in case of foF2 and ionosondes information by UK Solar System Data Centre (UKSSDC), Rutherford Appleton Laboratory, Oxfordshire, GB [online] [cit. 2016-05-02]. <http:/www.ukssdc.ac.uk/wdcc1/ionosondes/secure/iono-data.shtml>. Access is enabled by clicking “cancel” button on the login box.