Volume 125, Issue 20 e2020JD033254
Research Article

Gridded Versus Station Temperatures: Time Evolution of Relationships With Atmospheric Circulation

Martin Hynčica,

Corresponding Author

Department of Physical Geography and Geoecology, Faculty of Science, Charles University, Prague, Czechia

Czech Hydrometeorological Institute, Ústí nad Labem, Czechia

Correspondence to:

M. Hynčica,

martin.hyncica@natur.cuni.cz

Radan Huth,

Department of Physical Geography and Geoecology, Faculty of Science, Charles University, Prague, Czechia

Institute of Atmospheric Physics, Czech Academy of Sciences, Prague, Czechia

First published: 18 October 2020

Abstract

Interpolated data sets are often considered to be a reliable source of information on a variety of meteorological variables, such as temperature and precipitation. Users expect the interpolated data to be rather similar to those directly observed at stations, which is not always true: the influence of interpolation on, e.g., extremes is well documented. Here another kind of discrepancy between gridpoints and station observations is presented: the time evolution of relationships between temperature and atmospheric circulation. One of the most widely utilized gridded temperature data sets, CRU TS (Climatic Research Unit gridded Time Series), is compared with 634 station time series from GHCN (Global Historical Climatology Network) in the Northern Extratropics. We analyze running correlations (calculated for 15-year windows) of monthly values between modes of atmospheric circulation variability (identified in the ERA-40 reanalysis) and temperature anomalies in winter from 1957 to 2002. The smallest differences in the running correlations are found in Europe and North America due to a dense station network. On the other hand, the sites with considerable differences are located mainly in mountainous regions or in isolated locations. In order to uncover causes of these differences, we analyze two sites in more detail. Mike (the North Sea) is an isolated site where the gridpoint temperature is affected by rather distant Scandinavian stations. At Songpan (central China; 2,852 m a.s.l.), the terrain configuration in a mountainous region influences the gridpoint value, in which the effect of stations with much lower altitude and different climate conditions is dominant.

1 Introduction

Atmospheric circulation may be analyzed with various approaches, one of which is the detection of modes of low-frequency variability (referred to as “circulation modes” hereafter). Circulation modes mostly consist of several distant centers, between which pressure or geopotential heights are highly correlated. Circulation modes can be identified by two major approaches: one-point correlation maps (e.g., Li & Ruan, 2018; Raible et al., 2014; Wallace & Gutzler, 1981) or principal component analysis (e.g., Barnston & Livezey, 1987; Huth et al., 2006; Yu et al., 2018). Circulation modes affect surface climate elements, such as temperature and precipitation; many studies have examined these effects, e.g., Bueh and Nakamura (2007), Iles and Hegerl (2017), Lim (2015), Linkin and Nigam (2008), Liu et al. (2014), Moore and Renfrew (2012), Piper et al. (2019), Pokorná and Huth (2015), and Wang and Zhang (2015). However, the relationships of circulation modes with surface climate elements change in space and time (Beranová & Huth, 2008; Jacobeit et al., 2001; Polyakova et al., 2006; Slonosky et al., 2001; Zuo et al., 2016). One of the first studies describing such nonstationarity (Chen & Hellström, 1999) shows a high temporal variability of the relationship of the North Atlantic Oscillation (NAO) with temperature in Sweden: correlations between NAO and temperature gradually decreased from the beginning of the 20th century to the early 1920s and strengthened afterward. Increasing correlations of NAO with temperature are detected for the last quarter of the 20th century in western, central, and southeastern Europe, while the opposing tendency is observed in Iceland and Norway (Beranová & Huth, 2008, where the nonstationarity of other circulation modes over Europe is also examined). The nonstationarity has been suggested to be caused by the shift of the action centers of circulation modes (Jung et al., 2003): for instance, the eastward shift of the NAO centers during the second half of the 20th century may cause the strengthening of the relationships with temperature over large parts of Europe (Beranová & Huth, 2008).

All the aforementioned studies use station data only. Yet gridded data sets provide regularly spaced surface data which may be utilized for the description of the changing relationships with circulation modes over large areas and in a longer temporal perspective. Gridded data sets are mostly produced by interpolating station data to a regular grid by various interpolation methods, such as natural neighbor, kriging, triangulation, cubic spline, and angular distance weighting; their description is given in, e.g., Avila et al. (2015), Hofstra and New (2009), New et al. (2000), and Willmott and Robeson (1995). The gridded data sets, however, have limitations given by the interpolation process and by changes in station data, in particular spatiotemporal changes in the density of stations entering the interpolation (e.g., van der Schrier et al., 2013). The station time series are corrected (homogenized) before entering the interpolation process; therefore, the influence of other factors, such as a change of observational site or the increasing effect of urbanization, is suppressed.

In general, gridded data sets maintain well the values around the mean state of input data, while the temporal variations are typically oversmoothed, which leads to an underestimation of variance and extreme values in gridded data (Beguería et al., 2016; Gervais et al., 2014; Herold et al., 2016; Hofstra et al., 2009, 2010). The density of the station network is crucial for the appearance of oversmoothing: it is more pronounced at gridpoints computed over a sparse station network (e.g., less populated areas, higher elevations) where distant stations outside a grid-box, which share less similarity with near-to-gridpoint stations, contribute to the interpolation (Herrera et al., 2019; Hofstra et al., 2010). Thus, the gridded data have to be treated with caution when analyzing changes in extremes, mainly over areas with a low-density station network. Gridding also affects trends (Beier et al., 2012; Donat et al., 2014; Krauskopf & Huth, 2020) and the shape of the underlying statistical distribution, including higher-order statistical moments (Cavanaugh & Shen, 2015; Director & Bornn, 2015; Gross et al., 2018; Rhines et al., 2017).

Nevertheless, the number of stations entering the interpolation affects the output even in an area with a dense station network, as is shown by comparing the Climatic Research Unit (CRU) data set (where approximately 400 stations are used) with the high-resolution national data set (incorporating about 2,400 stations) over China (Xu et al., 2020). Differences in seasonal temperature trends reach up to 0.13°C/decade there. The CRU data set also produces lower precipitation sums in higher elevations (above 1,500 m a.s.l.), indicating that the accuracy of the data set decreases with increasing elevation (Shi et al., 2017; Zhu et al., 2015). Generally, precipitation sums provided by interpolated data sets are lower than observed while the bias increases with altitude (Fallah et al., 2020).

As outlined above, gridded data sets have been evaluated and compared with station data for a variety of characteristics. However, the sensitivity of relationships with atmospheric circulation to the type of surface climate data has not been studied yet. Here we compare the temporal evolution of relationships between temperature and circulation modes detected in the ERA-40 reanalysis at stations and at their nearest gridpoints in the CRU data set (Harris et al., 2020). Particular attention is paid to the pairs of station and gridpoint data for which the time series exhibit substantial discrepancies. The goal of the paper is solely the comparison of gridded and observed data and a subsequent search for the causes of differences between both data types. It is not our intention to improve or modify the interpolation procedure used for the creation of gridded data.

2 Materials and Methods

We use monthly temperature at 634 stations (obtained from the Global Historical Climatology Network-version 3, GHCN; Lawrimore et al., 2011) and at the gridpoints nearest to the stations in the Climatic Research Unit gridded Time Series version 4.01 (Harris et al., 2020; hereafter referred to as the CRU data set). Two types of station series are available in GHCN: unadjusted (i.e., data are not homogenized by the authors of GHCN but they may have been modified before by, e.g., national meteorological services) and adjusted (data are homogenized by the authors of GHCN). We employ unadjusted data; nevertheless, both types of series provide fairly similar results.

The studied area covers continents in the Northern Extratropics, approximately north of 20°N. At all gridpoints and stations, monthly long-term temperature anomalies from 1957 to 2002 as a reference period (or shorter if a station time series ends earlier) in winter months (December, January, and February) are calculated. It should be stressed that the station data are retrieved independently of the CRU data set, thus only some of the GHCN stations were incorporated in the gridding process of the CRU data set.
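The anomaly calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `winter_anomalies` and the array-based input layout are hypothetical, and the sketch assumes that anomalies are taken separately for each calendar month relative to that month's mean over the reference period, which the text does not spell out.

```python
import numpy as np

def winter_anomalies(years, months, temps, ref=(1957, 2002)):
    """Return DJF anomalies relative to each calendar month's reference mean.

    Assumption (not confirmed by the text): the long-term mean is computed
    separately for December, January, and February over the reference period.
    """
    years, months, temps = (np.asarray(a) for a in (years, months, temps))
    # Keep winter months inside the reference period only.
    keep = np.isin(months, (12, 1, 2)) & (years >= ref[0]) & (years <= ref[1])
    y, m, t = years[keep], months[keep], temps[keep].astype(float)
    anom = np.empty_like(t)
    for month in (12, 1, 2):
        sel = m == month
        if sel.any():
            # Missing values (NaN) are ignored when forming the mean.
            anom[sel] = t[sel] - np.nanmean(t[sel])
    return y, m, anom
```

If a station series ends before 2002, the same function simply averages over the shorter available record, which mirrors the "or shorter if a station time series ends earlier" condition.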

Atmospheric circulation is described by circulation modes, which are detected with the use of rotated principal component analysis (PCA), based on the correlation matrix of monthly 500 hPa height anomalies north of 20°N inclusive (excluding the North Pole) in the ERA-40 reanalysis (DJF, 1957–2002; Uppala et al., 2005). Circulation modes in ERA-40 are the most similar (i.e., having the lowest differences) to other reanalyses (Hynčica & Huth, 2020), which is the reason why we opt for it. Correlations of time series between “traditional” reanalyses (ERA-40, JRA-55, and NCEP-1) exceed 0.95 for the majority of circulation modes in winter (Hynčica & Huth, 2020). The utilization of another reanalysis would therefore not change our results.

The data are available on a 2.5° × 2.5° grid. We use the double spacing (5° × 5°), which is sufficient given the large spatial autocorrelation in the data. The decreasing area of gridboxes toward the pole is eliminated by using a quasi-equal-area grid, in which individual gridpoints are omitted at individual latitudes so that an average gridbox area is roughly the same in all latitudes and approximates that on the Equator (Barnston & Livezey, 1987; Huth, 2006). Ten principal components are rotated, resulting in 10 circulation modes, of which nine that correspond to those described already by Barnston and Livezey (1987) are considered for further analysis and interpretation (Figure 1).
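One possible way to build such a quasi-equal-area grid is sketched below. The exact thinning rule used by the authors is not given here, so the rule in this sketch (keep roughly cos(latitude) times the number of equatorial gridpoints, evenly spaced in longitude) is an assumption, and the function name `thinned_longitudes` is hypothetical.

```python
import numpy as np

def thinned_longitudes(lat_deg, spacing_deg=5.0):
    """Longitudes retained at one latitude circle of a quasi-equal-area grid.

    Assumed rule: the number of retained points scales with cos(latitude),
    so that the average gridbox area stays close to that at the Equator.
    """
    n_equator = int(round(360.0 / spacing_deg))
    n_keep = max(1, int(round(n_equator * np.cos(np.radians(lat_deg)))))
    # Pick n_keep roughly evenly spaced indices out of the full longitude set.
    idx = np.round(np.linspace(0, n_equator, n_keep, endpoint=False)).astype(int)
    return np.unique(idx) * spacing_deg
```

With a 5° spacing this keeps all 72 longitudes at the Equator and every second longitude at 60°N, where a regular latitude-longitude gridbox covers half the equatorial area.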

Figure 1. Circulation modes in winter in the ERA-40 reanalysis and their explained variance (in %). Names and abbreviations are adopted from Barnston and Livezey (1987).

For the description of the time evolution of relationships of temperature with atmospheric circulation, running Pearson correlations are computed between the scores (i.e., the time series of intensity) of all the nine circulation modes and monthly temperature anomalies at all 634 stations and gridpoints. Running correlations are calculated for the moving 45-month (15 winters) windows with a one-month shift; the running correlation of each window is assigned to its central month. Hence, time series of running correlations with nine circulation modes are produced for all 634 stations and their nearest gridpoints. Finally, the maximum difference between the time series of running correlations at every pair is determined for all circulation modes.
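The running-correlation procedure can be illustrated with a short sketch. The function names `running_correlation` and `max_difference` are hypothetical, and the sketch assumes that the winter months have already been concatenated into one continuous series (three values per winter) with no missing data.

```python
import numpy as np

def running_correlation(mode_scores, temp_anom, window=45):
    """Pearson correlation in moving windows, assigned to the central month."""
    x = np.asarray(mode_scores, dtype=float)
    y = np.asarray(temp_anom, dtype=float)
    half = window // 2
    out = np.full(x.size, np.nan)  # edges without a full window stay NaN
    for start in range(x.size - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        out[start + half] = np.corrcoef(xs, ys)[0, 1]
    return out

def max_difference(rc_station, rc_gridpoint):
    """Maximum absolute difference between two running-correlation series."""
    return float(np.nanmax(np.abs(rc_station - rc_gridpoint)))
```

Calling `running_correlation` once with the station anomalies and once with the gridpoint anomalies for the same mode, and passing both results to `max_difference`, gives the single number per mode and site that is compared against a threshold later in the text.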

For the examination of differences between the two data sets, we apply the same procedure that is used in the creation of the CRU data set, namely, the Angular Distance Weighting interpolation method (ADW; Hofstra & New, 2009; New et al., 2000; Shepard, 1968). ADW is a combination of two components. The first one weights each station by its distance from the gridpoint within a radius of 1,200 km (for temperature); it controls the contribution of stations to the gridpoint by an exponential function. The second component evaluates the directional isolation of each station and ensures that isolated stations gain more weight (Caesar et al., 2006; Harris et al., 2014; Hofstra & New, 2009; New et al., 2000; Shepard, 1968). ADW thus assigns the largest weight to the stations nearest to the gridpoint and to those being most isolated, the latter preventing the gridpoint from being heavily affected by a region with a high density of observations. We apply ADW to a selected group of stations used for the creation of the CRU data set to explore whether the station network used for gridding may explain differences between a station and the nearest gridpoint. The stations contributing to the calculation of a given gridpoint in the CRU data set are identified using a freely available utility in Google Earth (which is maintained by the team around CRU; Harris et al., 2020). Then, temperature anomalies at eight or fewer selected stations (this condition is also used in the CRU data set) are interpolated in each time step.
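The two ADW components can be sketched in code as follows. This is a hedged illustration, not the CRU implementation: the weight formulas follow the form commonly given for ADW (an exponential distance decay raised to a power `m`, multiplied by one plus an angular isolation term), and the value `m=4`, the function names, and the treatment of the 1,200 km radius as a hard cutoff are assumptions not confirmed by the text above.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def distance_and_bearing(lat0, lon0, lats, lons):
    """Great-circle distance (km) and bearing (rad) from a gridpoint to stations."""
    p0, p = np.radians(lat0), np.radians(lats)
    dl = np.radians(lons) - np.radians(lon0)
    a = np.sin((p - p0) / 2) ** 2 + np.cos(p0) * np.cos(p) * np.sin(dl / 2) ** 2
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    bearing = np.arctan2(np.sin(dl) * np.cos(p),
                         np.cos(p0) * np.sin(p) - np.sin(p0) * np.cos(p) * np.cos(dl))
    return dist, bearing

def adw_weights(lat0, lon0, lats, lons, radius_km=1200.0, m=4, max_stations=8):
    """Normalized ADW weights; stations outside the radius get zero weight."""
    lats, lons = np.asarray(lats, float), np.asarray(lons, float)
    dist, bearing = distance_and_bearing(lat0, lon0, lats, lons)
    within = np.flatnonzero(dist <= radius_km)
    chosen = within[np.argsort(dist[within])][:max_stations]  # nearest eight or fewer
    weights = np.zeros(lats.size)
    if chosen.size == 0:
        return weights
    w = np.exp(-m * dist[chosen] / radius_km)          # distance component (assumed form)
    theta = bearing[chosen][:, None] - bearing[chosen][None, :]
    isolation = 1.0 - np.cos(theta)                    # zero for stations in the same direction
    denom = w.sum() - w                                # sum of the other stations' weights
    a = np.divide(isolation @ w, denom, out=np.zeros_like(w), where=denom > 0)
    combined = w * (1.0 + a)                           # angular component boosts isolated stations
    weights[chosen] = combined / combined.sum()
    return weights

def adw_interpolate(weights, station_anomalies):
    """Weighted mean of station anomalies for one time step."""
    return float(np.dot(weights, station_anomalies))
```

In this sketch, one station west of the gridpoint and two co-located stations east of it at the same distance receive weights of 3/7, 2/7, and 2/7, which shows how the angular term keeps a cluster of stations from dominating the gridpoint.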

3 Results

3.1 Overall Evaluation

Figure 2 displays running correlations for four station-gridpoint pairs. In Figure 2a, the situation one would normally expect is shown: running correlations at the station and the gridpoint almost coincide. The gridpoint almost perfectly represents the relationship with the atmospheric circulation at the nearby station. As we show later, such a high congruity of both time series prevails at most locations. However, there are sites with rather large differences in the time course of circulation-to-temperature relationships between the station and the closest gridpoint, of which we show and discuss several examples. The correlations may run almost in parallel with approximately the same difference through the analysis period as in Songpan for the NA mode (Figure 2d) or the difference of correlations may vary in time as for Mike and NAO (Figure 2b), and Daggett and PNA (Figure 2c). Although the interpolation method contributes to this disagreement as it brings a little smoothing to gridded time series, its effect cannot explain such large differences; there must be other factors causing the discrepancies between the gridded and station data. Differences between running correlations are mostly not statistically significant at the 5% level (after applying the Fisher transformation, see, e.g., Hynčica & Huth, 2020, for the formula and testing details). Nevertheless, their magnitudes often oscillate around 0.3, which is fairly close to being significant (the significance is detected for differences around 0.4). Our intention is to determine the causes responsible for the differences at gridpoint-station pairs; for that purpose, we consider the differences in running correlations large enough to be inspected in more detail regardless of their (in)significance.
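As a rough illustration of such a test, the sketch below uses the textbook Fisher z-test for the difference between two correlation coefficients. It is an assumption that this matches the procedure referred to above: the cited paper may use a different variant, the sample size of 45 (one window) is our choice, and the sketch treats the two samples as independent and ignores autocorrelation. The function name `fisher_z_difference` is hypothetical.

```python
import math

def fisher_z_difference(r1, r2, n1, n2):
    """Two-sided Fisher z-test for the difference between two correlations.

    Returns the test statistic and the p-value under a standard normal
    reference distribution (independent samples assumed).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal p-value
    return z, p
```

Under these assumptions and with two 45-month windows, correlations of 0.6 and 0.2 differ significantly at the 5% level, whereas correlations of 0.5 and 0.2 do not, which is in line with the magnitudes quoted in the text.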

Figure 2. Running correlations (1957–2002) of temperature at four (a, b, c, d) selected stations (red) and their nearest CRU gridpoints (black) with the indicated circulation modes. The locations of stations are depicted in Figure 3. Time series in bold refer to periods with significant differences between running correlations at the 5% level.

The number of circulation modes for which the maximum difference in running correlations between the station and gridpoint exceeds 0.15 is shown in Figure 3 for all sites. Running correlations are in a high agreement at the majority of station-gridpoint pairs, where large differences are detected for two or fewer circulation modes only. On the other hand, running correlations differ more for the sites located over southeastern and southern Asia, around the Black Sea, in the Alps, on the Iberian Peninsula, locally over Siberia, and in western North America (Figure 3). There are locations where correlations at the station and at the nearest gridpoint differ by more than 0.15 for all nine modes. In order to identify plausible causes of this disagreement, a deeper analysis of two locations is carried out in the following sections.

Figure 3. The number of circulation modes with the maximum difference between running correlations at the station and at the closest gridpoint exceeding 0.15. Locations of the sites from Figures 2 and 12 are highlighted by a square and letter (A = Aktobe; B = Mike; C = Daggett; D = Songpan; E = León).

3.2 Example 1: Mike

Station Mike (in operation from 1949 to 1999) was an isolated ship station located in the North Sea at 66°N, 2°E. Four gridpoints are located around it at the same distance; we analyze that at 65.75°N, 2.25°E (this choice does not have any effect on our results). Station Mike was utilized in the gridding of the CRU data set, and the station series in the CRU station database almost perfectly matches that in GHCN. Figure 4 depicts monthly temperature anomalies at the station and the gridpoint, and their running correlations with all circulation modes. Somewhat counterintuitively, anomalies at the gridpoint exhibit larger variability than at the station, although one expects the opposite as the interpolation suppresses local variance (Beguería et al., 2016; Gervais et al., 2014). Considerable differences between running correlations at the station and at the gridpoint occur for all circulation modes, the largest being detected for EU1 (0.46), NA (0.33), and NAO (0.27). Three circulation modes significantly affect temperature at Mike, i.e., their running correlations stay outside the significance bounds for most of the time (Figure 4): NAO, EU1, and EU2. Since NAO and EU1 exhibit substantial differences in running correlations during most of the studied period and the differences for the latter mode are significant for a considerable time, both are inspected in more detail. Correlations at the gridpoint are higher than at the station for NAO, while the opposite holds for EU1.

Figure 4. Monthly temperature anomalies (top left) and their running correlations with the circulation modes at station Mike (red) and its closest CRU gridpoint (black). Gray dashed lines show statistical significance at the 5% level; time series in bold refer to periods when differences between running correlations are significant at the 5% level as well.

Correlations with NAO and EU1 at all stations that have contributed to the calculation of temperature at the gridpoint closest to Mike at any time during 1901–2016 are shown in Figure 5a. A stronger relationship with NAO and a weaker relationship with EU1 are evident for stations in Scandinavia; this is in accord with how the gridpoint behaves (Figure 4). Figure 5b identifies stations incorporated in the calculation of the gridpoint temperature from 1957 to 2002 (left) and their availability (right). Seven of eight stations that were used for interpolation to the gridpoint until December 1999, when most of the time series (including Mike) terminate, are situated in Scandinavia where the relationships with atmospheric circulation are different from Mike; the eighth station is Mike itself. Hence, the gridpoint in the CRU data set appears to be heavily affected by Scandinavian stations and the contribution of the nearest station Mike is thus suppressed. This may explain the discrepancy of correlations between the Mike station and the gridpoint.

Figure 5. (a) Correlations of stations contributing to the calculation of temperature at the CRU gridpoint closest to Mike (triangle) with NAO (left) and EU1 (right). For station names see Figure 7. Stations highlighted with square are used in reinterpolation. (b) Stations contributing to the CRU gridpoint between 1957 and 2002 (left, in green) and their altitude and availability; with green color highlighting the period when the station contributed to the CRU gridpoint (right). Stations are ordered by the distance, with the nearest station on the top.

To verify this hypothesis, we use a selection of stations for reinterpolation of temperature to the CRU gridpoint. We refer to temperature at the gridpoint constructed this way as “reinterpolated temperature” (RIT), while the original temperature at the CRU gridpoint is called “CRU temperature” (CRUT). Only Mike and stations west of it (on Iceland, Jan Mayen, the Faroe Islands, and Shetlands; marked by squares in Figure 5a) are used in reinterpolation. Figure 6 (top) shows that running correlations of RIT with NAO and EU1 are similar to those at Mike. Dissimilar running correlations of CRUT confirm the high impact of Scandinavian stations, which outweigh the influence of the nearest station, Mike, even though ADW assigns it more than double the weight of other stations. Moreover, a rather high temperature variability of Scandinavian stations is transferred to the gridpoint (Figure 4, top left), where temperature variance is substantially larger than at the station, although one would expect a lower temporal variability due to the influence of the sea on local climate at Mike.

Figure 6. (top) Running correlations of NAO and EU1 with temperature at Mike, its nearest CRU gridpoint (CRUT), and reinterpolated temperature at the gridpoint (RIT); (bottom) running correlations with NAO and EU1 at the stations (numbers refer to the locations shown in Figure 5) used in reinterpolation. Gray dashed lines show statistical significance at the 5% level.

Running correlations of RIT and at Mike do not match perfectly, however (Figure 6, top): running correlations with NAO start deviating from each other after 1982, which is caused by decreasing correlations with NAO at the majority (five out of seven) of contributing stations in approximately the last 20 years (Figure 6, bottom). That trend projects onto RIT, while it is observed only marginally at the station. Running correlations of RIT with EU1 are lower than for Mike until 1987. Afterward, the stagnation or a slight decrease in running correlations at Mike is accompanied by their increase especially at southwestern stations (Dalatangi, Teigarhorn, and Torshavn, Figure 6), which ultimately causes the running correlations of RIT to approach those of station temperature at Mike.

Aggregated information on relationships with NAO and EU1 for all stations contributing to the CRU gridpoint is given in Figure 7. The correlation of RIT (blue dashed vertical line) is much closer to Mike in comparison with CRUT (black dashed vertical line) for both circulation modes. Mean absolute differences in running correlations between individual stations and CRUT/RIT (black/blue bar) indicate that CRUT shares large similarity with Scandinavian stations, mainly those located on the central and southern Norwegian coast. On the other hand, most of the stations west of Mike, used for reinterpolation (shown in red in Figure 7), exhibit a considerable dissimilarity from CRUT. CRUT is thus driven mainly by Scandinavian stations located to the east, the influence of which outweighs the contribution of the nearest station and results in a substantial discrepancy between the station and its nearest gridpoint in the running correlations with the circulation modes.

Figure 7. For stations contributing to the interpolation to the gridpoint nearest to Mike: correlations of the circulation modes with station temperature (1957–2002, ordered by distance from the gridpoint closest to Mike; gray bars in the left graph of each panel), and mean absolute differences in running correlations between stations and CRUT (black) and between stations and RIT (blue; right graph of each panel). Stations in red are used for reinterpolation. Correlations with temperature at the CRU gridpoint (CRUT) and with reinterpolated temperature (RIT) are shown by the black and blue dashed vertical lines, respectively. The left panel is for NAO, the right panel for EU1.

3.3 Example 2: Songpan

The other station analyzed in more detail is Songpan, central China (32.65°N, 103.57°E, 2,852 m a.s.l.); the closest gridpoint is located at 32.75°N, 103.75°E. It is worth noting that, unlike Mike, Songpan is not included in the gridding of the CRU data set. The site exhibits considerable differences in running correlations for almost all circulation modes (Figure 8), the largest maximum differences being detected for NA (0.50), NAO (0.35), and EU1 (0.33). We present the analysis for the NA and EU1 modes because their running correlations with temperature are significantly different from zero for a considerable time. Running correlations with NA are systematically overestimated at the gridpoint; the opposite behavior is detected for EU1. The same approach as for Mike is applied to uncover causes behind the differences between the station and the closest gridpoint.

Figure 8. As in Figure 4, but for Songpan.

Figure 9a shows correlations with NA and EU1 at all stations contributing to the gridpoint closest to Songpan at any time during 1901–2016. Stations form two distinct groups: those located west, northwest, and south of Songpan correlate positively with EU1, while their correlations with NA are near zero or negative. Stations north, east, southeast, and far southwest of Songpan behave in an opposite way: they correlate positively with NA, but have near zero correlations with EU1. The difference between the two groups appears to be related to elevation (Figure 9b): stations at high elevations form the first group, while stations in the second group are low lying. The Songpan station is situated on the border between the two groups, but at a high elevation. The four stations with the largest influence on CRUT (2, 3, 4, and 5) belong to the second group; that is why the behavior of running correlations of CRUT follows them, although running correlations at the station are similar to the first group, likely because of a similar elevation.

Figure 9. As in Figure 5, except for Songpan. The Songpan site is highlighted by gray square in (a). For station names see Figure 11. For stations contributing to interpolation to the CRU gridpoint in 1957–2002, the elevation is shown in color in (b) left.

The influence of stations in the second group is evaluated by reinterpolation, to which only selected stations from the first group enter (highlighted by squares in Figure 9a). The agreement of RIT with the time series at Songpan is substantially improved relative to CRUT for both modes (Figure 10, top). The match of the time series for NA is very close until 1993 when the decreasing correlation at Songpan is not captured by most stations incorporated in RIT (Figure 10, bottom). A stronger correlation of RIT than of station temperature with EU1 prior to 1982 is caused by the simple fact that most of the contributing stations have a stronger correlation with the mode which, however, decreases in the 1980s.

Figure 10. As in Figure 6, but for NA and EU1 at Songpan. UDEL stands for the University of Delaware data set. Numbers refer to locations shown in Figure 9.

RIT has a larger similarity with the Songpan station than CRUT in both the overall correlation and the average difference of running correlations (Figure 11). In contrast, CRUT is clearly more similar to stations in the second group (mainly to stations 2, 3, and 4). Thus, the reinterpolation shows that CRUT is substantially affected by lowland stations east of Songpan, the absence of which in reinterpolation results in a better congruence with the Songpan station. The influence of elevation is further examined with the use of the University of Delaware Air Temperature & Precipitation data set (UDEL; Willmott & Matsuura, 2001), which incorporates a combination of interpolation methods, including a digital elevation model, so the data set is expected to better account for the effect of elevation. Running correlations at the gridpoint in UDEL (in the identical position as in the CRU data set) are fairly similar to those at the station (Figure 10), which may be caused by either different stations used for interpolation or the utilization of more complex interpolation methods including an elevation model. In any case, it indicates that both elevation and an inadequate number of stations entering interpolation in the CRU data set may cause a different behavior of relationships with circulation modes between the station and gridpoint.

Figure 11. As in Figure 7, but for Songpan. Numbers refer to locations shown in Figure 9.

3.4 Other Examples

Some stations seem to have unique and very local relationships with atmospheric circulation, which are hard to capture even by interpolation of the close and most similar stations. For instance, the reinterpolation is conducted for various groups of selected stations for the gridpoints nearest to Daggett (western North America) and León (northern Spain). At both gridpoints, only a partial improvement of relationships to circulation is achieved, as the bias between relationships with circulation modes is still present even after the reinterpolation (Figure 12). Thus, a strong signal of a specific local climate cannot be fully reproduced in a gridded data set on some occasions.

Figure 12. Running correlations of temperature at León (site E in Figure 3) and Daggett (site C in Figure 3), and of CRUT and RIT at the closest gridpoint, with EA and PNA, respectively. The time series are shorter due to the insufficient data needed for reinterpolation.

4 Discussion and Conclusions

We investigate relationships of temperature at 634 stations and at their closest gridpoints in the CRU data set with nine circulation modes in the Northern Hemisphere Extratropics using running correlations. At the majority of the station-gridpoint pairs, running correlations are in a high agreement; however, some notable exceptions occur mainly over southeastern Asia, the Black Sea, the Iberian Peninsula, foothills of the Alps, and western North America. Some station-gridpoint pairs in these regions exhibit a considerable discrepancy between running correlations for five or more circulation modes.

We identify the cause of the discrepancy in local geographical and climate settings and illustrate it with two examples. One example is Mike, an isolated ship station in the North Sea. Temperature at the gridpoint nearest to Mike is strongly influenced by rather distant stations on the Scandinavian coast with different relationships with circulation, while the influence of the station itself is suppressed although ADW assigns it more than double the weight of other stations. The stations located to the west of Mike (on Iceland, the Faroe Islands, and Shetlands) do not enter the interpolation because of their larger distance, despite their similar relationship with circulation. An isolated island position also results in rather large differences between the station and gridpoint elsewhere, e.g., in Honolulu (21.32°N, 157.93°W). However, the isolated position itself does not automatically imply a disagreement between station and gridded data: some island sites exhibit negligible differences in running correlations (e.g., Jan Mayen, 70.93°N, 8.67°W).

The bias at the other site, Songpan, is explained by the specific terrain setting. Although the station is located at a high elevation near the edge of a mountain plateau, close lowland stations with rather different climate conditions are used in the interpolation, while stations at high elevation and with climate conditions similar to the site are mainly excluded from the interpolation because of their larger distance from the gridpoint. Similar effects can be induced by other geographical barriers, such as mountain ridges (e.g., around the Alps, the Rocky Mountains, and the Scandinavian Mountains), large lakes or inland seas (such as near the Black Sea), or generally vast areas without station data (such as in less populated areas in northern and central Asia).

Gridpoints do not perfectly correspond to the nearest station because they represent a spatial average created by interpolating stations from within, or even outside, the given radius. Interpolation introduces some smoothing into gridpoint values, which causes discrepancies between gridpoints and station observations at the majority of sites. Nonetheless, the influence of smoothing on relationships with atmospheric circulation is fairly small because the running correlations with circulation modes are in high agreement at the majority of the sites. For the examples discussed above, the spatial configuration of the station network is more important, leading to overrepresentation of some stations due to either the altitude or the isolation of the site. In some cases, a relationship with atmospheric circulation similar to that at the nearest station cannot be obtained at all, even though various groups of stations enter the interpolation. Hypothetically, one way to improve gridpoint values (and consequently their relationships with atmospheric circulation) is to increase the number of observations entering the interpolation, which is, however, problematic given the strict quality control rules imposed on station data. Another way might be to modify the ADW procedure, in which distance is currently the only criterion for selecting the stations used for interpolation. Other criteria might be included in the procedure, such as elevation, a variable search radius, and intercorrelations of station anomalies, all of which should prevent the overrepresentation of a particular region in a gridpoint. A different interpolation method yields different results: e.g., the interpolation method incorporating elevation in the UDEL data set results in closer agreement for the Songpan gridpoint/station pair; on the other hand, other issues may emerge in that data set over other regions.

Although our study uses a specific criterion for evaluating the gridded data set, the results are also relevant to other gridded data sets, particularly those created by ADW. Moreover, different issues may arise when other climate variables (precipitation, humidity, etc.) are considered.
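The contrast between distance-only station selection and a selection that also considers elevation can be illustrated with a toy example. This sketch is not the ADW procedure used in CRU TS and does not compute interpolation weights; the station names, coordinates, elevations, the number of stations selected, and the elevation threshold are all invented for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def select_stations(grid, stations, n_nearest=2, max_elev_diff=None):
    """Return names of the nearest stations, closest first.

    If `max_elev_diff` (meters) is given, stations whose elevation differs
    from the target elevation by more than that amount are dropped first.
    This elevation filter is a hypothetical criterion, not part of ADW.
    """
    candidates = []
    for s in stations:
        if max_elev_diff is not None and abs(s["elev"] - grid["elev"]) > max_elev_diff:
            continue
        d = haversine_km(grid["lat"], grid["lon"], s["lat"], s["lon"])
        candidates.append((d, s["name"]))
    candidates.sort()
    return [name for _, name in candidates[:n_nearest]]


# Invented high-elevation target near a plateau edge (values are hypothetical).
grid = {"lat": 32.75, "lon": 103.75, "elev": 2850.0}
stations = [
    {"name": "lowland_A", "lat": 32.0, "lon": 104.5, "elev": 500.0},
    {"name": "lowland_B", "lat": 31.8, "lon": 104.2, "elev": 700.0},
    {"name": "plateau_C", "lat": 33.5, "lon": 101.5, "elev": 3200.0},
    {"name": "plateau_D", "lat": 31.0, "lon": 100.5, "elev": 3400.0},
]

by_distance = select_stations(grid, stations)
by_elevation = select_stations(grid, stations, max_elev_diff=1000.0)
print(by_distance)   # the two closest stations, both lowland
print(by_elevation)  # the two closest stations within 1000 m of the target elevation
```

In this constructed case, distance alone selects only lowland stations, whereas the elevation filter selects the more distant plateau stations. A hard threshold is the simplest possible choice; an operational scheme would more plausibly fold elevation into the weights.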

A high station density supports agreement in correlations with atmospheric circulation between stations and nearby gridpoints but does not always guarantee it, as examples from the Iberian Peninsula, northern Scandinavia, and the western United States demonstrate. Our study suggests that, although gridded data sets are generally a good proxy for station data when relationships with atmospheric circulation are analyzed, gridded data must be treated with caution and their suitability (i.e., their correspondence with station data in the analyzed region) examined before they are used.

Acknowledgments

We would like to thank Phil Jones and Ian Harris, Climatic Research Unit at the University of East Anglia, for their helpful comments on the manuscript. This research was supported by the Czech Science Foundation, project 17-07043S. Martin Hynčica was also supported by the Grant Agency of the Charles University, student project 426216.

Data Availability Statement

We acknowledge ECMWF for providing the ERA-40 reanalysis data, which were downloaded from https://apps.ecmwf.int/datasets/data/era40-moda/levtype=pl/ [downloaded 2016-10-18]. We further acknowledge CRU for providing the CRU TS gridded data set and the station time series used to create it (both acquired from https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.01/ [downloaded 2017-09-08]). The Google Earth interface of CRU TS can be accessed at https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.04/ge/ [downloaded 2019-05-18] (the original interface for version 4.01 is no longer available). UDEL data are provided by the NOAA/OAR/ESRL PSL, Boulder, Colorado, United States, at https://psl.noaa.gov/data/gridded/data.UDel_AirT_Precip.html [downloaded 2020-08-03]. The GHCN station data (provided by NOAA) were obtained from https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/global-historical-climatology-network-monthly-version-3 and https://gis.ncdc.noaa.gov/maps/ncei/summaries/monthly (in a map format) [downloaded between 2018-02-24 and 2018-03-02].