Volume 5, Issue 2
Agronomy
JOURNAL OF POLISH AGRICULTURAL UNIVERSITIES
Available Online: http://www.ejpau.media.pl/volume5/issue2/agronomy/art06.html
GEOSTATISTICAL APPROACH TO DATA FROM FIELD EXPERIMENTS WITH CHECK PLOTS
Janusz Gołaszewski
The paper suggests some indicators for the application of spatial methods in field experimentation. The indicators were based on data from two field breeding experiments with pea and field bean laid out in partially balanced square lattice designs. Smith’s index of soil variability b, chemical properties of the soil (pH and Mg, P and K contents) and data obtained from check plots sown with a single variety were used to evaluate spatial variation across the experiments. Smith’s index of soil variability b showed potential as a convenient tool to assess whether an analysis of background variation by spatial methods is worthwhile. When b < 0.6, a significantly increased efficiency of the experiment can be expected. In such cases, nearest neighbour analysis, or kriging applied to data from a net of check plots, can produce a concomitant variable that effectively reduces the experimental error.
Key words: heterogeneity of the soil, homogeneity of the soil, kriging, nearest neighbour analysis, NNA, semivariogram, Smith’s index of soil variability, soil variability, spatial variation.
In agricultural research, the key question is generally expressed as a hypothesis to be verified through field experimentation. The actual environmental conditions of an experimental site result from an almost infinite number of biotic and abiotic factors, such as soil origin, soil fertility, moisture, and plant damage by insects and diseases. Together with the inherent variability of the experimental material to which treatments are applied, these factors cause yields to differ from plot to plot, even when all plots are sown with a single cultivar. Differences among plots treated alike constitute the experimental error and may form a particular spatial structure. Field experimenters know that, to measure this error validly, treatments must be randomised within a number of replications. A researcher conducting an experiment under natural conditions also knows that, to some extent, the error can be controlled by blocking and by a proper arrangement of plots in the field.
The literature on agricultural experimentation shows that, of the many environmental factors affecting the results of field experiments, soil variability appears to play the pivotal role in adequate estimation of treatment effects. The term “soil variability” is hard to define unequivocally. Soil is described as homogeneous, heterogeneous or very heterogeneous. If the soil is homogeneous, the information from each plot is independent and easy to interpret. Excluding possible technical mistakes, any spatial interference can then be attributed to inherent properties of the experimental material, such as competition between morphologically different breeding forms.
Field experimenters are believed to face a problem when the soil is heterogeneous. The technical aspects of establishing the experiment must then be considered, and blocking is most frequently regarded as sufficient to control soil variability.
In practice, field experimenters focus mainly on the significance of treatment differences, even when the soil is heterogeneous. Other valuable information about the experimental conditions is neglected, and little attention, if any, is given to the possible advantages of a more detailed analysis of the results.
When can the soil be judged homogeneous or heterogeneous? How should one decide whether additional analyses should be undertaken, or whether they should be done routinely?
The term soil variability comprises all sources of spatial variation and should be considered as the entirety of environmental factors and human activities, together with their interactions. Thus, for a given experiment and its location in the field, soil variability can in general be defined as the current state of soil conditions together with the external factors affecting the soil. The plot layout and treatments, as defined by the experimental design, are planned for that current state of the soil. The analysis should follow the same sequence: first the analysis of the background of the experiment, and then the formal verification of the working hypothesis.
Let us consider and verify two seemingly improbable hypotheses, based on the results of two plant breeding experiments with check plots: (1) in field experiments, blocking is not essential for valid control of soil variability; (2) the experimental design applied is not a prerequisite for the final statistical analysis of a field experiment.
Characteristics of the field experiments
The considerations are based on data from two field breeding experiments with pea and field bean carried out in 1998 at the Tomaszkowo Experiment Station of the University of Warmia and Mazury in Olsztyn. The highly variable soil of the experimental field is typical of Warmia and Mazury, Poland’s northeastern region.
Partially balanced square lattice designs (P_IB for pea and F_IB for faba bean) with four replications laid out in four experimental strips were applied in both experiments. Twenty-five pea breeding forms were tested in each replication, and check plots were placed at every 2nd, 3rd, 4th and 5th test plot in successive replications. The pea forms were morphologically very different, so spring wheat was sown as an intercrop between the plots to minimise inter-plot interference. In the field bean experiment, 49 forms were tested and check plots were spaced regularly at every 7th plot in each replication.
Before the experiments were set up, soil samples were taken for chemical analyses: acidity (pH) and the content of available nutrients (P_{2}O_{5}, K_{2}O, Mg). A 4 × 6 m sampling grid with a total of 98 sampling points was used in the pea experiment, and a 4 × 5 m grid with a total of 100 sampling points in the faba bean experiment. Plant height was recorded before harvest, after which the plants were threshed and the seeds weighed.
The statistical analysis included analysis of variance for the completely randomised design (CRD), the randomised block design (RBD) and the incomplete block design (IB), together with analysis of covariance with concomitant variables determined by Papadakis’s nearest neighbour method (NNA) [12] with Bartlett’s iterative approach [2], and by kriging [19]. The relative efficiency of the methods was determined according to [15].
Smith’s index of soil variability
In 1938 Smith [14] proposed a single measure of soil heterogeneity, known as the index of soil variability or Smith’s index of soil variability. On the basis of 44 uniformity trials with a broad spectrum of species, Smith formulated the law referred to in the literature as Smith’s variance law. The law states that the variance per basic unit of plots consisting of x basic units (V_{x}) is directly proportional to the variance of basic units (V_{1}) and inversely proportional to the number of basic units in the plot raised to the power b:
$$V_{x} = \frac{V_{1}}{x^{b}}$$
The parameter b in the equation is the index of soil variability, which can take values from 0 to 1. The larger the value, the more homogeneous the soil, and vice versa. Note that b reflects all sources of environmental variation, not only soil variability. The parameter b is estimated as the regression coefficient in the logarithmic form of the law:
$$\log V_{x} = \log V_{1} - b\,\log x$$
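As a minimal sketch, assuming that per-unit variances V_x have already been computed for several plot sizes x (for example, by combining adjacent basic units of a uniformity trial), b can be obtained from the log-log regression above. The function name `smith_b` and the variances in the example are hypothetical, not data from this paper.

```python
import numpy as np


def smith_b(sizes, variances):
    """Fit Smith's variance law V_x = V_1 / x**b on the log scale.

    sizes     : plot sizes x (in basic units, or in m^2)
    variances : variance per basic unit V_x for plots of each size
    Returns (b, V_1) from the regression log V_x = log V_1 - b log x,
    where V_1 is the fitted variance at x = 1.
    """
    log_x = np.log(np.asarray(sizes, dtype=float))
    log_v = np.log(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(log_x, log_v, 1)  # least-squares line
    return -slope, np.exp(intercept)


# Illustrative (made-up) per-unit variances for plots of 1, 2, 4 and 8 units
b, v1 = smith_b([1, 2, 4, 8], [100.0, 75.0, 60.0, 50.0])
```

With real data the plot sizes are limited by the field layout, so in practice only a few points (such as single plot, block and replication) enter the regression.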
Uniformity trials used in methodological work, although highly valuable for determining the character of soil variability, are rarely used in practice, mainly because of their high cost. A later suggestion by Koch and Rigney [7] and Lin and Binns [9] was to calculate b from the results of regular experiments with blocking. The idea of Smith’s variance law is retained, but the variance of plots consisting of x basic units is now the variance of plots the size of a block, while the variance of basic units is the variance of single plots. For a two-factor experiment in a split-plot design there are three categories of plots: plots the size of blocks, of sub-blocks and of single plots.
Koch and Rigney [7] suggested the following procedure to estimate b from incomplete block designs:
1. Estimation of the expected mean squares from the appropriate ANOVA (Table 1).
2. Calculation of comparable variances.
3. Logarithmic regression of the comparable variances on the size of plots equal to a replication, a block and a single plot (plot size can be expressed as the actual plot area in m^{2} or as the number of single plots making up the replication or block).
Table 1. Variance components for incomplete block design
| Variation | d.f. | MS | EMS* |
|---|---|---|---|
| Replications | r − 1 | V_{1} | |
| Blocks (adj.) | r(b − 1) | V_{2} | |
| Treatments | bc − 1 | | |
| Error | cr(b − 1) − (bc − 1) | V_{3} | |
* some effects of blocks are present; parameter λ takes different values according to the type of IB design
Lin and Binns [9] suggested a three-stage procedure for estimating b from the randomised block design (RBD):
1. Estimation of the expected mean squares from the ANOVA of the RBD.
2. Estimation of the intrablock correlation ρ.
3. Estimation of b from the formula
$$b = 1 - \frac{\log\left[1 + (k-1)\rho\right]}{\log k},$$
where k is the number of plots per block.
For incomplete blocks, the intrablock correlation can be calculated in a similar way, as a measure of:
ρ_{1} – the correlation of plot means within blocks;
ρ_{2} – the correlation of block means within the replication.
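A minimal sketch of the last stage, assuming the relation b = 1 − ln[1 + (k − 1)ρ] / ln k. This relation follows from Smith’s law: the per-plot variance of the mean of a block of k plots with pairwise correlation ρ is V_1[1 + (k − 1)ρ]/k, and setting it equal to V_1/k^b gives the expression. With k taken as the number of test plots per replication (25 for pea, 49 for faba bean), it reproduces the right-hand pair of columns (ρ, b) of Table 2 to within about 0.01, the tabulated ρ being rounded. The function name `b_from_rho` is hypothetical.

```python
import math


def b_from_rho(rho, k):
    """Smith's b from the intrablock correlation rho of blocks with k plots.

    Derived by equating V_1 * (1 + (k - 1) * rho) / k, the per-plot variance
    of a block mean, with Smith's V_1 / k**b.
    """
    if k < 2 or 1 + (k - 1) * rho <= 0:
        raise ValueError("need k >= 2 and 1 + (k - 1) * rho > 0")
    return 1.0 - math.log(1.0 + (k - 1) * rho) / math.log(k)


# Values of rho from Table 2; k = test plots per replication
for label, rho, k in [("P_IB plant height", 0.02, 25),
                      ("F_IB plant height", 0.54, 49)]:
    print(label, round(b_from_rho(rho, k), 2))
```

The formula also makes the relation in Fig. 1 explicit: for fixed k, b decreases monotonically as ρ increases.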
Table 2 presents the variance components, intrablock correlation coefficients and indices of soil variability calculated for plant height and seed yield. To allow some generalisation, the results were supplemented with those from two earlier experiments with the same species in the same field, laid out in balanced lattice designs (P_BIB for the pea experiment and F_BIB for the faba bean experiment). The results in Table 2 confirm a high similarity of the indices of soil variability regardless of the calculation method used. Except for experiment P_BIB, the indices were also similar for both traits studied.
Table 2. Variance components, intrablock correlations and indices of soil variability in breeding experiments with pea and faba bean
| Symbol | Variance components^{1} | | | According to | | | According to | |
|---|---|---|---|---|---|---|---|---|
| | | | | ρ_{1} | ρ_{2} | b | ρ | b |
| Plant height | | | | | | | | |
| P_BIB | 49.6 | 0.00 | 135.9 | 0.27 | 0.65 | 0.36 | 0.27 | 0.38 |
| P_IB | 2.40 | 37.3 | 458.4 | 0.08 | 0.02 | 0.89 | 0.02 | 0.87 |
| F_BIB | 48.7 | 17.2 | 64.7 | 0.50 | 0.65 | 0.22 | 0.44 | 0.20 |
| F_IB | 47.9 | 241.2 | 69.9 | 0.81 | 0.16 | 0.36 | 0.54 | 0.15 |
| Seed yield | | | | | | | | |
| P_BIB | 5756 | 3242 | 57854 | 0.13 | 0.28 | 0.63 | 0.10 | 0.62 |
| P_IB | 14655 | 0 | 124281 | 0.11 | 0.38 | 0.60 | 0.11 | 0.61 |
| F_BIB | 22548 | 7224 | 29847 | 0.50 | 0.66 | 0.22 | 0.44 | 0.20 |
| F_IB | 7745 | 20717 | 19096 | 0.60 | 0.25 | 0.36 | 0.36 | 0.25 |
^{1} 0 – estimated value lower than 0
The coefficient of intrablock correlation was closely related to the index of soil variability. Figure 1 shows the empirical relationship between ρ and b: in general, the higher ρ, the lower b, which means that the higher the correlation between adjacent plots, the higher the soil variability.
For the four experiments, the conclusion about soil variability seems clear: the soil in the pea experiments was much more homogeneous than in the faba bean experiments.
Fig. 1. Interrelationship between the coefficient of intraclass correlation and the index of soil variability in pea and faba bean breeding experiments 
What is the practical value of this statement, and how can the value of b be used? So far, the index of soil variability has been used for planning future experimental work in the same field, such as:
Optimisation of plot size (Smith’s cost law [14]).
Determination of a convenient plot size [6].
Determination of block capacity and the required number of replications [9].
Determination of the shape of plots in a block [21].
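For the first of these uses, Smith’s cost law [14] is usually stated as an optimum plot size x_opt = b·K_1 / [(1 − b)·K_2], where K_1 is the part of the cost proportional to the number of plots and K_2 the part proportional to plot area (per basic unit). The sketch below assumes that standard form; the function name and the cost values are hypothetical.

```python
def smith_optimum_plot_size(b, k1, k2):
    """Optimum plot size (in basic units) under Smith's cost law (sketch).

    b  : index of soil variability, 0 < b < 1
    k1 : cost component proportional to the number of plots
    k2 : cost component proportional to the area, per basic unit
    """
    if not 0 < b < 1:
        raise ValueError("b must lie strictly between 0 and 1")
    return b * k1 / ((1 - b) * k2)


# Hypothetical equal cost components, b values of the order found in Table 2
for b in (0.87, 0.15):
    print(b, round(smith_optimum_plot_size(b, k1=1.0, k2=1.0), 2))
```

With equal cost components, b = 0.87 gives about 6.7 basic units and b = 0.15 about 0.18, i.e. on the more variable (low-b) field the law favours smaller plots and more replications.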
The magnitude of the index of soil variability also indicates whether an analysis of spatial variation in the experiment is worthwhile and what gain in efficiency an alternative analysis might bring [4].
Modelling of background variation
Nearest neighbour analysis and kriging were used to model the background variation. Nearest neighbour analysis followed Papadakis’s nearest-neighbour first-difference method [12]. Essentially, this technique involves subtracting the mean yield of the treatment from the yield of each plot and then using the average of the residual yields of adjacent plots as the concomitant variable in the analysis of covariance. The iterative approach suggested by Bartlett [2] was applied, and the iteration was continued until the nearest-neighbour local trends for each treatment averaged to zero.
Wilkinson et al. [20] discussed some limitations of the iterated nearest-neighbour analysis. These include a loss of efficiency due to correcting yields with treatment means, and an upward bias in the treatment F-ratio. However, these limitations are usually not important unless there are substantial non-linear trend effects in the experiment.
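A minimal sketch of the Papadakis covariate and the covariance analysis described above, assuming plots are listed in field order within each strip and that only the left and right neighbours in the same strip count as adjacent. The iteration is a simplified reading of Bartlett’s approach: residuals are recomputed from the covariance-adjusted treatment means for a fixed number of passes, rather than by the convergence criterion used in the paper. All names (`neighbour_mean`, `nna_ancova`) and the data are hypothetical.

```python
import numpy as np


def neighbour_mean(resid, strip):
    """Mean residual of the left and right neighbours of each plot.

    Plots are assumed to be listed in field order; `strip` gives the strip of
    each plot, so plots in another strip are not treated as neighbours.
    """
    n = len(resid)
    cov = np.empty(n)
    for i in range(n):
        nb = [resid[j] for j in (i - 1, i + 1)
              if 0 <= j < n and strip[j] == strip[i]]
        cov[i] = np.mean(nb)
    return cov


def nna_ancova(y, trt, strip, n_iter=2):
    """Papadakis-type NNA covariance analysis with a simple iteration.

    Returns treatment means adjusted to a zero covariate, the covariate
    regression coefficient and the residual mean square.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")
    y = np.asarray(y, dtype=float)
    levels, idx = np.unique(np.asarray(trt), return_inverse=True)
    D = np.eye(len(levels))[idx]                 # treatment indicators, no intercept
    tau = np.array([y[idx == k].mean() for k in range(len(levels))])
    for _ in range(n_iter):
        x = neighbour_mean(y - tau[idx], strip)  # concomitant variable
        X = np.column_stack([D, x])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        tau, slope = coef[:-1], coef[-1]         # updated (adjusted) treatment means
    sse = float(np.sum((y - X @ coef) ** 2))
    mse = sse / (len(y) - X.shape[1])            # error d.f. = n - treatments - 1
    return dict(zip(levels.tolist(), tau)), slope, mse


# Hypothetical data: 3 forms x 4 replications in 2 strips of 6 plots
y = [10, 12, 11, 13, 12, 14, 9, 11, 10, 12, 11, 13]
trt = ["A", "B", "C"] * 4
strip = [0] * 6 + [1] * 6
means, slope, mse = nna_ancova(y, trt, strip)
```

Fitting treatment indicators plus one covariate is the usual one-way ANCOVA; blocks could be added as further indicator columns if the design requires it.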
Kriging is a method of spatial prediction that can be used for soil and agricultural properties. It is a form of weighted local averaging that is optimal in the sense that it provides estimates at unrecorded places without bias and with minimum and known variance. There are several other interpolation methods, such as linear interpolation, inverse distance weighting and least-squares polynomials, but they are often theoretically unsatisfactory: they may give biased interpolations, they provide no estimate of the interpolation error, and they do not attempt to minimise that error.
Kriging is based on the theory of regionalized variables developed by Matheron [10,11] and Krige [8]. For an in-depth study of geostatistical methods, the books by Journel and Huijbregts [5] (mining) and Webster and Oliver [19] (pedology) can serve as references.
The first stage in kriging is to measure the spatial variation in the property of interest. The measure used is the semivariance.
Consider a transect along which observations have been made at regular intervals, giving values z(i), i = 1, 2, ..., N. The relation between pairs of points h intervals apart can be expressed as the variance of the differences between all such pairs. The per-observation variance is half this value:
$$\gamma(h) = \frac{1}{2}\operatorname{var}\left[z(i) - z(i+h)\right] \tag{1}$$
For example, the estimate of the semivariance for a single transect with no missing observations at h = 1 is:
$$\hat{\gamma}(1) = \frac{1}{2(N-1)}\sum_{i=1}^{N-1}\left[z(i) - z(i+1)\right]^{2} \tag{2}$$
A general form of this equation is:
$$\hat{\gamma}(h) = \frac{1}{2N(h)}\sum_{i=1}^{N(h)}\left[z(i) - z(i+h)\right]^{2} \tag{3}$$
where N(h) is the number of pairs of observations h intervals apart.
The quantity γ(h) is known as the semivariance. It measures, on average, how different points a given distance h apart are: the more alike the points, the smaller γ(h), and vice versa.
These equations refer to a single transect, but their generalisation to a two-dimensional area is straightforward. Semivariances can also be computed in different directions: along rows or columns, along diagonals, or over all directions together. The limitation is the number of pairs, N(h), available for a given distance h; for a valid estimate of the semivariance it should be at least about 20–30 [18].
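A minimal sketch of the estimator for a regularly spaced transect, or for one direction of a regular grid, assuming that missing observations are coded as NaN and that the lag h is given in sampling intervals. The function name `semivariance` is hypothetical; it also returns N(h) so that lags with too few pairs can be flagged.

```python
import numpy as np


def semivariance(z, h, axis=-1):
    """Experimental semivariance at lag h (in sampling intervals).

    gamma(h) = sum((z(i) - z(i + h))**2) / (2 * N(h)), computed along one axis
    of a regularly spaced transect or grid. Pairs containing NaN (missing
    observations) are dropped, so N(h) counts only complete pairs.
    Returns (gamma, N(h)).
    """
    z = np.asarray(z, dtype=float)
    n = z.shape[axis]
    if not 0 < h < n:
        raise ValueError("lag h must satisfy 0 < h < number of points")
    head = np.take(z, np.arange(0, n - h), axis=axis)
    tail = np.take(z, np.arange(h, n), axis=axis)
    d = (head - tail).ravel()
    d = d[~np.isnan(d)]
    return float(np.sum(d ** 2) / (2 * d.size)), int(d.size)


# Single transect at h = 1, then along the rows of a small grid
print(semivariance([1, 2, 3, 4], 1))                    # (0.5, 3)
print(semivariance([[1, 2, 3], [2, 4, 6]], 1, axis=1))  # (1.25, 4)
```

Computing γ(h) for h = 1, 2, ... in each direction and plotting it against distance gives the experimental semivariogram to which a model is then fitted.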
As above, γ depends on h, and the function relating the two is known as the semivariogram. The results of Trangmar et al. [17], Perrier and Wilding [13] and Stroup et al. [16] suggest that the semivariogram models typical of agricultural studies are mainly linear or spherical. The semivariogram has two important roles: (i) it shows the nature of the spatial variation in the property of interest, and (ii) it is needed to provide kriged estimates at previously unrecorded points.
In most instances γ(h) increases with increasing h up to a maximum, approximately equal to the variance of the data. The distance a at which this maximum is reached is known as the range; points closer together than the range are assumed to be spatially dependent, whereas points further apart bear no relation to one another. The intercept C_{0} at h = 0 is known as the nugget variance, and the phenomenon as the nugget effect. In practice, the nugget effect embraces variation of the property over distances shorter than the sampling interval, as well as measurement error. The component C_{1} is the part of the variance due to spatial dependence in the data. The sum C_{0} + C_{1}, the level at which the semivariance stabilises, is known as the sill.
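The two model types used here can be written down directly. This sketch assumes the standard spherical form γ(h) = C_0 + C_1[1.5(h/a) − 0.5(h/a)^3] for 0 < h ≤ a and C_0 + C_1 beyond the range, and the unbounded linear form γ(h) = C_0 + slope·h, both with γ(0) = 0. The parameters in the example are taken from Table 3 (pH in F_IB, Mg in P_IB); the function names are hypothetical.

```python
import numpy as np


def spherical(h, a, c0, c1):
    """Spherical semivariogram with range a, nugget c0 and spatial component c1."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / a, 1.0)                  # flat at the sill beyond the range
    g = c0 + c1 * (1.5 * r - 0.5 * r ** 3)
    return np.where(h > 0, g, 0.0)              # gamma(0) = 0 by definition


def linear(h, c0, slope):
    """Unbounded linear semivariogram with nugget c0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, c0 + slope * h, 0.0)


# Table 3: pH in F_IB (spherical) and Mg in P_IB (linear); distances in m
ph_fib = spherical([0, 9, 18, 30], a=18.0, c0=0.005, c1=0.070)
mg_pib = linear([0, 10], c0=0.105, slope=0.008)
```

At h = a the spherical model reaches the sill C_0 + C_1 (0.075 for pH in F_IB), in agreement with the last column of Table 3.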
The chemical properties of the soil, as indicators of soil fertility, and plant height and seed yield from check plots sown with a standard variety were used to describe the structure of spatial variation across the experiments. The semivariances of these properties were estimated, and kriging was then used to predict their values for each plot.
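A minimal sketch of ordinary kriging at a single location, such as the centre of a test plot, assuming a fitted semivariogram model is available as a function of distance. The weights solve the ordinary kriging system with a Lagrange multiplier so that they sum to one, which gives the unbiasedness mentioned above; the kriging variance is the weighted sum of semivariances plus the multiplier. The function names, coordinates and measurements below are hypothetical; only the semivariogram parameters come from Table 3 (Mg in P_IB).

```python
import numpy as np


def ordinary_kriging(xy, z, x0, gamma):
    """Ordinary kriging estimate and kriging variance at location x0.

    xy    : (n, 2) coordinates of the sampled points, e.g. in metres
    z     : (n,) observed values
    x0    : (2,) location to predict
    gamma : fitted semivariogram model, a function of distance with gamma(0) = 0
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dist)                     # semivariances between samples
    A[n, n] = 0.0                               # Lagrange multiplier entry
    g0 = gamma(np.linalg.norm(xy - np.asarray(x0, dtype=float), axis=1))
    sol = np.linalg.solve(A, np.append(g0, 1.0))
    weights, mu = sol[:n], sol[n]               # weights sum to 1
    return float(weights @ z), float(weights @ g0 + mu)


def linear_model(h):
    """Linear semivariogram with nugget 0.105 and slope 0.008 per m."""
    return np.where(h > 0, 0.105 + 0.008 * h, 0.0)


pts = [(0, 0), (4, 0), (8, 0), (0, 6), (4, 6), (8, 6)]  # illustrative 4 x 6 m grid
vals = [5.1, 5.4, 5.0, 5.6, 5.9, 5.3]                    # made-up measurements
est, var = ordinary_kriging(pts, vals, (2, 3), linear_model)
```

Repeating the prediction for every plot centre yields one kriged value per plot, which can then enter the analysis of covariance as a concomitant variable.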
The maps (Fig. 2) show the spatial distribution of pH and available macronutrients across the experimental site. The first 16 m of the field width correspond to the width of the four experimental strips. Spatial dependence could be noticed visually on the maps only for the Mg content in the pea experiment and for all the properties in the faba bean experiment. The experimental semivariograms (Fig. 3) confirmed these observations. A linear model was fitted for the Mg content in the pea experiment, and spherical models for all the soil properties in the faba bean experiment (Table 3).
Fig. 2. Contour map of soil properties of pea (a) and faba bean (b) experimental sites 
Fig. 3. Semivariograms of soil properties for pea (a) and faba bean (b) experimental sites 
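Table 3. Semivariogram parameters of soil properties in the grain legume experiments
| Trait | Model (effect) | a (m) | C_{0} | C_{1}* | C_{0} + C_{1} |
|---|---|---|---|---|---|
| Chemical properties | | | | | |
| P_IB | | | | | |
| pH | pure nugget effect | – | – | – | – |
| Mg | linear | – | 0.105 | 0.008 | – |
| P_{2}O_{5} | pure nugget effect | – | – | – | – |
| K_{2}O | pure nugget effect | – | – | – | – |
| F_IB | | | | | |
| pH | spherical | 18.0 | 0.005 | 0.070 | 0.075 |
| Mg | spherical | 22.5 | 0.450 | 2.185 | 2.635 |
| P_{2}O_{5} | spherical | 19.5 | 2.200 | 16.494 | 18.694 |
| K_{2}O | spherical | 17.4 | 0.900 | 7.228 | 8.128 |
| Traits noted from check plots | | | | | |
| P_IB | | | | | |
| plant height | pure nugget effect | – | – | – | – |
| seed yield | pure nugget effect | – | – | – | – |
| F_IB | | | | | |
| plant height | linear | – | 154 | 6.451 | – |
| seed yield | pure nugget effect | – | – | – | – |
* slope in the case of the linear model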
Semivariances estimated from the plant height and seed yield of the check plots showed spatial dependence only for plant height in the faba bean experiment (Fig. 4a). The semivariances of plant height and seed yield for pea, and of seed yield for faba bean, showed purely random variation. The distributions of the values predicted by kriging for these traits are presented as contour maps (Fig. 4b).
Fig. 4. Semivariograms (a) and contour maps after kriging (b) for plant height and seed yield of standard variety in pea and faba bean check plots 
Table 4 contains the mean square errors (MSE) from the corresponding ANOVAs and ANCOVAs. Smith’s index of soil variability was b > 0.6 in the pea experiment and b < 0.3 in the faba bean experiment. It can therefore be assumed that the soil in the pea experiment (P_IB) was homogeneous and that information from adjacent plots was independent. All the methods produced similar MSE values. This means that, when b is known before the experiment is laid out, a more advanced design and a more sophisticated data analysis after the experiment will not be very effective. Overall, however, the alternative approaches do offer some advantages: they reveal the background variation and reduce the MSE relative to the completely randomised design (by up to about 10%, depending on the method and the trait analysed). By contrast, in the faba bean experiment (F_IB), in which b was low (b < 0.3), the MSE values from the different methods differed greatly. Compared with the MSE from the completely randomised design (CRD), all the methods markedly reduced the experimental error.
Table 4. Mean square error from ANOVA and ANCOVA of alternative approaches by nearest neighbour analysis (NNA) and kriging (KR)
| Variance analysis | P_IB plant height | P_IB seed yield | F_IB plant height | F_IB seed yield |
|---|---|---|---|---|
| | Smith’s b > 0.6 | Smith’s b > 0.6 | Smith’s b < 0.3 | Smith’s b < 0.3 |
| ANOVA | | | | |
| CRD | 498 | 136335 | 359 | 47559 |
| RBD | 485 | 121572 | 281 | 37225 |
| IB | 458 | 124281 | 70 | 19098 |
| CRD with standard | 527 | 145776 | 110 | 30302 |
| ANCOVA | | | | |
| NNA (II iteration) | 467 | 120751 | 69 | 16382 |
| KR (pH, Mg, P_{2}O_{5}, K_{2}O) | 500 | 121519 | 194 | 33733 |
| KR (plant height of standard) | 500 | 129178 | 153 | 32500 |
| KR (seed yield of standard) | – | 125718 | – | 24404 |
The greatest reduction in MSE was obtained by ANCOVA with the NNA residuals of adjacent plots as the concomitant variable. Notably, for faba bean seed yield, additional information from the check plots followed by an appropriate analysis of covariance led to an MSE comparable to that from the classical analysis of the incomplete block design.
Table 5 presents the relative efficiency (RE) of the methods applied, relative to the completely randomised design, for which RE equals one. In the pea experiment, NNA had the highest efficiency for plant height (although only 6% higher than CRD). For seed yield, only the methods with kriging produced RE values at the level of RBD efficiency (about 10%). However, similar MSE estimates from different analyses do not guarantee that randomisation and blocking adequately compensate for spatial effects [22].
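As a rough check, relative efficiency can be sketched as the ratio of error mean squares, MSE_CRD / MSE_method. The plain ratio reproduces several entries of Table 5 from Table 4 (for example 359/194 ≈ 1.85 for F_IB plant height with kriging on soil properties), but not all of them, so the exact formula taken from [15] may differ, for instance by a degrees-of-freedom adjustment. The optional adjustment below is the one usually attributed to Fisher; its inclusion and the function name are assumptions, not the author’s stated method.

```python
def relative_efficiency(mse_ref, mse_alt, df_ref=None, df_alt=None):
    """Efficiency of an alternative analysis relative to a reference (e.g. CRD).

    Without degrees of freedom this is the plain ratio mse_ref / mse_alt.
    If both error d.f. are given, the ratio is multiplied by
    (df_alt + 1)(df_ref + 3) / ((df_alt + 3)(df_ref + 1)), which accounts for
    the precision of the two error estimates.
    """
    re = mse_ref / mse_alt
    if df_ref is not None and df_alt is not None:
        re *= ((df_alt + 1) * (df_ref + 3)) / ((df_alt + 3) * (df_ref + 1))
    return re


# F_IB plant height: CRD vs kriging on soil properties (MSE values from Table 4)
print(round(relative_efficiency(359, 194), 2))  # 1.85
```

The d.f. adjustment matters mainly when the alternative analysis uses up noticeably more error degrees of freedom, as covariance adjustments and incomplete blocks do in small experiments.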
Table 5. Relative efficiency (RE) of different approaches in the analysis of the pea and field bean results (relative to the completely randomised design, CRD)
| Variance analysis | P_IB plant height | P_IB seed yield | F_IB plant height | F_IB seed yield |
|---|---|---|---|---|
| | Smith’s b > 0.6 | Smith’s b > 0.6 | Smith’s b < 0.3 | Smith’s b < 0.3 |
| ANOVA | | | | |
| RBD | 1.01 | 1.09 | 1.21 | 1.21 |
| IB^{*} | 1.02 | 1.00 | 3.47 | 1.71 |
| CRD with standard | 0.94 | 0.94 | 3.25 | 1.57 |
| ANCOVA | | | | |
| NNA (II iteration) | 1.06 | 1.00 | 5.17 | 2.24 |
| KR (pH, Mg, P_{2}O_{5}, K_{2}O) | 1.00 | 1.12 | 1.85 | 1.41 |
| KR (plant height of standard) | 1.00 | 1.06 | 2.35 | 1.60 |
| KR (seed yield of standard) | – | 1.08 | – | 1.63 |
^{*} relative to RBD
In the faba bean experiment, the analyses of variance for the IB design and for the CRD with standard were highly effective and, for the alternative approaches to data analysis, set a high efficiency threshold. All the methods substantially improved efficiency. The most effective method was NNA, especially for plant height, for which it was over 5 times as efficient as the completely randomised design. Kriging showed an efficiency similar to that of the classical ANOVA for the experiment with check plots sown with the standard variety, and slightly lower than that of the classical analysis of the incomplete block design. This indicates that, when the experimental site is highly variable, NNA or kriging can be a good alternative to classical approaches to the analysis of data from plant breeding field experiments. This agrees with Ball et al. [1], who, on the basis of breeding trials with spring wheat, concluded that the analysis of NNA-adjusted data resulted in larger estimates of variance components than the unadjusted RCB analysis. When spatial effects occur, Ball et al. [1] suggested that plant breeders should consider spatial methods as a supplementary tool for effective data analysis.
With a high soil variability of the experimental site, a precise analysis of the background variation should precede treatment comparisons.
Smith’s index of soil variability, calculated before the present experiment or for it, can be used as a convenient tool to assess whether an analysis of the background variation is worthwhile, i.e. whether alternative methods of data analysis can improve the efficiency of the experiment. Depending on the magnitude of b, one can expect:
b > 0.6 – little gain, if any;
0.3 < b < 0.6 – the alternative approaches should be considered;
b < 0.3 – a very significant improvement in efficiency.
Blocking, as an experimental tool for allocating treatments to experimental units and for removing inter-block soil variability from the errors involved in comparing treatments, can be supported by alternative methods, such as NNA or kriging, for local control of spatial variation.
Kriging based on a net of check plots placed on experimental units inside the experimental strips can produce one of the concomitant variables that facilitate efficient treatment comparison.
Quick methods for evaluating whether correcting data for spatial variability is worthwhile should be incorporated into the analysis of field experiments.
REFERENCES
1. Ball S.T., Mulla D.J., Konzak C.F., 1993. Spatial heterogeneity affects variety trial interpretation. Crop Sci. 33, 931–935.
2. Bartlett M.S., 1978. Nearest neighbour models in the analysis of field experiments (with discussion). J. Royal Stat. Soc. 40 B, 147–174.
3. Gołaszewski J., 1999. Application of geostatistical methods to analysis of the data from a pea breeding trial. Biometrical Letters 36 (2), 145–157.
4. Gołaszewski J., 2000. Statistical treatment of spatial variability in field trials. Natural Science 5, 159–176.
5. Journel A.G., Huijbregts C.J., 1978. Mining geostatistics. Academic Press, London.
6. Hatheway W.H., 1961. Convenient plot size. Agronomy J. 53(4), 279–280.
7. Koch E.J., Rigney J.A., 1951. A method of estimating optimum plot size from experimental data. Agronomy J. 43, 17–21.
8. Krige D.G., 1966. Two-dimensional weighted moving average trend surfaces for ore-evaluation. J. South African Inst. Mining & Metallurgy 66, 13–38.
9. Lin C.S., Binns M.R., 1984. Working rules for determining the plot size and number of plots per block in field experiments. J. Agr. Sci. 103, 11–15.
10. Matheron G., 1963. Principles of geostatistics. Economic Geology 58, 1246–1266.
11. Matheron G., 1971. The theory of regionalized variables and its applications. Cahiers du Centre de Morphologie Mathématique, Fontainebleau 5.
12. Papadakis J.S., 1937. Méthode statistique pour les expériences du champ. Institut d'Amélioration des Plantes à Thessaloniki, Bull. Scientifique 23.
13. Perrier E.R., Wilding L.P., 1986. An evaluation of computational methods for field uniformity studies. Adv. in Agronomy 39, 265–312.
14. Smith H.F., 1938. An empirical law describing heterogeneity in the yields of agricultural crops. J. Agr. Sci. 28, 1–23.
15. Steel R.G.D., Torrie J.H., 1980. Principles and procedures of statistics. A biometrical approach. McGraw-Hill Book Company, New York.
16. Stroup W.W., Baenzinger R.S., Mulitze D.K., 1994. Removing spatial variation from wheat yield trials: a comparison of methods. Crop Sci. 86, 62–66.
17. Trangmar B.B., Yost R.S., Uehara G., 1985. Application of geostatistics to spatial studies of soil properties. Adv. in Agronomy 38, 45–94.
18. Webster R., Burgess T.M., 1984. Sampling and bulking strategies for estimating soil properties in small regions. J. Soil Sci. 35, 127–140.
19. Webster R., Oliver M.A., 1990. Statistical methods in soil and land resource surveys. Oxford University Press.
20. Wilkinson G.N., Eckert S.R., Hancock T.W., Mayo O., 1983. Nearest neighbour (NN) analysis of field experiments (with discussion). J. Royal Stat. Soc. 45 B, 151–211.
21. Zhang R., Warrick A.W., Myers D.E., 1994. Heterogeneity, plot shape effect and optimum plot size. Geoderma 62, 183–197.
22. Zimmerman D.L., Harville D.A., 1991. A random field approach to the analysis of field-plot experiments and other spatial experiments. Biometrics 47, 223–239.
Janusz Gołaszewski
Department of Plant Breeding and Seed Production
University of Warmia and Mazury in Olsztyn
Pl. Łódzki 3, 10-724 Olsztyn-Kortowo, Poland
email: januszg@uwm.edu.pl