The following points highlight the top seven statistical procedures of biostatistics. The procedures are: 1. Calculation of Average or Central Tendency 2. Variability 3. Standard Deviation 4. Standard Error of Mean 5. Significance of Difference in Means of Small Samples or Student’s ‘t’ Test 6. Chi-Square Test (χ²-Test) 7. Analysis of Variance (ANOVA).
Procedure # 1. Calculation of Averages or Central Tendency:
The average, in general terms, describes the centre of a series of measurements such as weight, height or any other numerical character. The average or central value allows a group to be compared with similar observations on other groups, and also with a dissimilar series. There are three measures of central tendency: mean, median and mode.
A. Mean:
The means are of three types, Arithmetic, Geometric and Harmonic.
i. Arithmetic mean of the observed scores is obtained by summing up all the observations and dividing the total by the number of observations. A series of observations is indicated by the letter X, the individual observations by X1, X2, X3, …, Xn, the mean by X̅, the total number of observations by n and summation by Σ.
Mean = Sum of observations / Total no. of observations
or X̅ = (X1 + X2 + X3 + … + Xn)/n = ΣXi/n
This formula is suitable for a small, ungrouped series. For example, if the numbers of oranges per plant are 5, 6, 7, 8, 9, 5, 4, 5, 6, 7, 5, 5, then
ΣX = (5 + 6 + 7 + … + 5) = 72 and n = 12
Arithmetic mean of the ungrouped series,
X̅ = 72/12 = 6
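The orange example can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the variable names are illustrative and not part of the original text.

```python
# Number of oranges per plant, taken from the example above.
oranges = [5, 6, 7, 8, 9, 5, 4, 5, 6, 7, 5, 5]

total = sum(oranges)   # sum of observations
n = len(oranges)       # number of observations
mean = total / n       # arithmetic mean = sum / n

print(total, n, mean)  # 72 12 6.0
```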
If the number of observations is large, they have to be grouped and a frequency distribution table prepared first. For all grouped series the weighted mean is to be computed and not the ordinary mean.
To find the value or weight contributed by each group separately, multiply the group value (or mid-value of the group) by its frequency. Total these products and then divide by the total number of observations in the sample. This mean is known as the “weighted mean” or “grand mean”.
Example:
If average income of 12 lady doctors is Rs. 1400 per month and that of 22 male doctors is Rs. 1600 per month, what is the mean income or average income of all doctors?
The average income might be calculated as Rs. 1500 per month for all doctors, by (1400 + 1600)/2 = 1500, but this is not correct; to calculate the real average, the income contributed by each group of doctors should be considered.
Total income of all lady doctors = (x̅1 × n1 = 1400 × 12) = Rs. 16800.00
(mean income of group I = x̅1, multiplied by frequency n1).
Total income of all male doctors = (x̅2 × n2 = 1600 × 22) = Rs. 35200.00
(mean income of group II = x̅2, multiplied by frequency n2).
Total income of all the doctors = 16800 + 35200 = Rs. 52000.00 (sum of incomes, n1x̅1 + n2x̅2)
The average income of the doctors =
x̅ = (n1x̅1 + n2x̅2)/(n1 + n2) = 52000.00/34 = Rs. 1529.41
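The weighted mean of the doctors' incomes can be sketched as below. The function name `weighted_mean` is an illustrative choice, not something defined in the text; the figures are those of the example.

```python
def weighted_mean(means, sizes):
    """Combine group means, weighting each mean by its group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

group_means = [1400, 1600]  # Rs. per month: lady doctors, male doctors
group_sizes = [12, 22]      # number of doctors in each group

combined = weighted_mean(group_means, group_sizes)
print(round(combined, 2))   # 1529.41
```

The simple average of the two group means (1500) would be correct only if both groups were of the same size.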
ii. Geometric mean:
The geometric mean is the nth root of the product of the n items of a series. The geometric mean of the items 4 and 25 will be √(4 × 25) = √100 = 10. The equation for the geometric mean is,
GM = ⁿ√(x1 × x2 × … × xn) = (x1 × x2 × … × xn)^(1/n)
iii. Harmonic mean:
It is useful for measurements on a reciprocal scale. Harmonic mean is the reciprocal of the average of the reciprocals of the observed scores (x) of a sample, when every score x is greater than 0 in value. When n is the sample size,
HM = n / Σ(1/xi)
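A minimal sketch of the geometric and harmonic means follows, assuming all scores are positive. The function names are illustrative; the items 4 and 25 are those of the geometric mean example, reused here for the harmonic mean only to show the call.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))

def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals of positive values."""
    return len(values) / sum(1 / v for v in values)

items = [4, 25]
print(geometric_mean(items))           # 10.0
print(round(harmonic_mean(items), 2))  # 6.9
```

Python's standard `statistics` module also provides `geometric_mean` and `harmonic_mean`, which could be used instead of these hand-written versions.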
B. Median:
When all the observations of an experiment are arranged in order of magnitude, either ascending or descending, the middle observation is known as the median. The median is a better indicator of an attribute than the mean when the lowest and highest observations are wide apart or not evenly distributed.
Example:
In the absentee list of children in a school, in the series 4, 6, 8, 10, 12, 14, 32, the median value (10) is a better indicator of the average than the mean (86/7 = 12.3). To have a better idea of the average, one should ignore an unduly high observation like 32 in this series.
The mean of the remaining observations will be 54/6 = 9.00, which is much nearer to the median (10) than 12.3. Median = ((n + 1)/2)th item, when the total number of observations is n. Here (n + 1)/2 is the observation (item) number, and the numerical value of this item denotes the median. When n is even, the median is taken as the average of the two middle items.
C. Mode:
The most frequently repeated observation in the series is the modal number; on that basis the population may be unimodal, bimodal or multimodal. In the series 5, 4, 3, 2, 2, 8, 7, 2, 9, 10, 11, the value 2 is repeated 3 times, whereas the others occur once only. So the modal value would be 2 and the distribution of the population is unimodal.
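The median and mode examples can be reproduced with Python's standard `statistics` module. This is a sketch only; the variable names are illustrative.

```python
import statistics

# Absentee series from the median example.
absences = [4, 6, 8, 10, 12, 14, 32]
median_value = statistics.median(absences)
mean_value = statistics.mean(absences)
print(median_value, round(mean_value, 1))  # 10 12.3

# Series from the mode example.
scores = [5, 4, 3, 2, 2, 8, 7, 2, 9, 10, 11]
modes = statistics.multimode(scores)       # list of all most-frequent values
print(modes)                               # [2]
```

`multimode` returns every value that ties for the highest frequency, so a bimodal or multimodal series would give a list with more than one entry.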
Procedure # 2. Variability:
Biological information, whether qualitative or quantitative, is highly variable when expressed numerically. This variability is an essential feature of many types of material. Such variability is of three types: biological, real and experimental.
Biological variability is the variation among individuals within the same group or category in respect of a given character. Real variability is a difference between observations on different classes that exceeds the limits defined for natural variation. Experimental variability is the difference that arises from defects in sampling, instruments, measurement or the observer.
Any deviation from the arithmetic mean is termed variability. The sum of the squared deviations from the arithmetic mean gives the total variance. If the deviation from the arithmetic mean is denoted by “di”, which is equal to xi – X̅, then
Total variance = Σdi² = Σ(xi – X̅)²
The mean of the squared deviations gives the variance of the sample. That is
Variance (S²) = Σdi²/(n – 1)
The divisor “n – 1” is termed the degrees of freedom (df). The degrees of freedom correspond to the number of independent deviations available from the data, and can be calculated by deducting the number of constants calculated from the data from the number of values available. Here the mean is such a constant, and hence the number of degrees of freedom is one less than n, the number of observations.
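When the data are grouped in the form of a frequency distribution, the variance is
S² = Σfi(xi – X̅)²/(N – 1)
where xi is the mid-value of the ith class, fi is its frequency and N = Σfi is the total number of observations.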
Procedure # 3. Standard Deviation (SD):
The standard deviation (S) is the square root of the variance. Thus
S = √S² = √(Σ(xi – X̅)²/(n – 1))
SD is computed by the following formula from the original, ungrouped scores (Xi) when the total sample is large (N ≥ 30):
Example:
Computation of the variance and SD of the respiratory rates per minute of 12 persons, the rates being 22, 22, 23, 20, 24, 20, 16, 17, 18, 19, 18, 21.
Arrange the data:
a. The mean, X̅ = Σxi/n = 240/12 = 20.
b. The deviation of each score from the mean, di = xi – X̅, is worked out.
c. Each deviation is squared, (xi – X̅)², the values are entered in the table and then totalled: Σ(xi – X̅)² = 68, so S² = 68/11 = 6.18 and S = √6.18 = 2.49.
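Steps a to c can be followed in code. The sketch below assumes the divisor n – 1 given under Procedure 2 for the variance; the variable names are illustrative.

```python
import math

# Respiratory rates per minute of the 12 persons in the example.
rates = [22, 22, 23, 20, 24, 20, 16, 17, 18, 19, 18, 21]

n = len(rates)
mean = sum(rates) / n                    # step a: mean
deviations = [x - mean for x in rates]   # step b: deviation of each score
sum_sq = sum(d * d for d in deviations)  # step c: total of squared deviations

variance = sum_sq / (n - 1)              # divisor n - 1 (degrees of freedom)
sd = math.sqrt(variance)

print(mean, sum_sq, round(variance, 2), round(sd, 2))  # 20.0 68.0 6.18 2.49
```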
Coefficient of variation (CV):
The above measures of dispersion, i.e., the variance and SD, are expressed in the units of measurement. A measure of variation independent of the unit of measurement becomes essential for comparison among different varieties of the same population, or among different populations in respect of the same variable. For such comparisons the coefficient of variation is a very important statistical instrument.
CV is the standard deviation expressed as a percentage of the mean.
CV = (SD/mean) × 100 or (S/X̅) × 100
Thus, CV is more suitable than S² and S for comparing the variabilities of two variables measured in different units.
Example:
In two series of boys and girls of the same age of 20 yrs following values were obtained for the height. Which sex shows greater variations?
Thus the height of boys shows slightly greater variation than that of girls, in the ratio of about 1.1:1.0.
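The CV formula can be sketched in code. Since the height figures are not available in the text, the example reuses the respiratory-rate series from Procedure 3; the function name is illustrative and the SD is assumed to be the sample SD (divisor n – 1).

```python
import statistics

def coefficient_of_variation(values):
    """Sample SD (divisor n - 1) expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Respiratory rates per minute from the SD example.
rates = [22, 22, 23, 20, 24, 20, 16, 17, 18, 19, 18, 21]
cv = coefficient_of_variation(rates)
print(round(cv, 2))  # 12.43
```

Because the result is a percentage, it can be compared directly with the CV of another variable measured in different units.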
Procedure # 4. Standard Error of Mean:
The standard deviation of the means is commonly termed the standard error and is equal to SE = SD/√n. It is a quantity which can be calculated directly from the standard deviation of the sample and the sample size.
Suppose the mean systolic blood pressure of 566 males is 128.8 mm and SD 13.05 mm.
Then SE = SD/√n = 13.05/√566 = 0.55
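The same calculation as a short sketch; `standard_error` is an illustrative name, and the figures are those of the blood pressure example.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

se = standard_error(13.05, 566)
print(round(se, 2))  # 0.55
```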
Standard error of difference:
A problem more commonly met with in agricultural and biological research is the comparison of two sample means. The standard error of the difference between the means of two samples of sizes n1 and n2, with standard deviations SD1 and SD2, is
SE of difference = √(SD1²/n1 + SD2²/n2)
Example:
In a nutritional study 100 children were given a usual diet and vitamin A and D tablets. After 6 months their average weight was 30 kg with an SD of 2 kg, while the average weight of a control group of 100 children on the usual diet was 29 kg with an SD of 1.8 kg. Can we say that vitamins A and D are responsible for this difference?
As per formula:
SE of difference = √(2²/100 + 1.8²/100) = √0.0724 = 0.27
The ratio of the observed difference between means to the standard error of the difference is Z.
Z = (30 – 29)/0.27 = 1/0.27 = 3.7
As the observed difference is 3.7 times its SE, the difference is highly significant. Thus the vitamins have an effect on weight gain.
The normal probability table indicates that when this ratio is more than 1.96, the observed difference is significant at the 5% level (P < 0.05). Similarly, a ratio of more than 2.58 is significant at the 1% level (P < 0.01).
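The nutritional example can be worked through in code. This sketch assumes the SE of difference is √(SD1²/n1 + SD2²/n2), which reproduces the 0.27 used above; the two-sided probability is taken from the normal distribution through `math.erfc`, an addition that is not part of the original calculation.

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two sample means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

se_diff = se_of_difference(2.0, 100, 1.8, 100)
z = (30 - 29) / se_diff

# Two-sided probability of a ratio at least this large under the normal curve.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(se_diff, 2), round(z, 1))  # 0.27 3.7
print(p_two_sided < 0.01)              # True
```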
Procedure # 5. Test of Significance:
Workers are often interested in comparing a characteristic (like the mean or variance) of a group with a specified value, or in comparing two or more groups with regard to that characteristic.
However, the variation between samples from the same population can at best be reduced, but can never be eliminated. So an inference about the groups cannot be drawn directly in the presence of sampling fluctuation. The procedure that helps to find out real differences among groups in the presence of sampling fluctuation is called a test of significance.
Student’s ‘t’ Test (Table A):
To compute the level of significance of the difference between two sample means, or between a sample mean and the population mean, the SE of the difference and the normal sampling distribution around the population mean are needed. In the case of small samples (N < 30) the ratio follows a different distribution, called the ‘t’ distribution.
The ‘t’ corresponds to ‘Z’ in large samples but probability of occurrence (p) of this value is determined by reference to ‘t’-table against appropriate degrees of freedom. The probabilities (p) are given in decimal fractions as 0.01, 0.05, 0.10 and so on and the same can be converted into percentages as 1%, 5%, 10% and so on.
Probability of 0.05 is regarded as critical level of significance and it corresponds to 95% confidence limit of large samples. The ‘t’ test is an accurate method of deciding whether the difference between two means of small samples (N < 30) is significant or not in (a) unpaired data and (b) paired data.
a. Unpaired ‘t’-Test:
For the unpaired ‘t’-test, independent observations are made on individuals of two different or separate groups or samples, drawn from two populations or from the same population.
Following steps are followed:
i. Find the observed difference between means of two samples (X̅1 – X̅2).
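ii. Calculate the SE of difference between two means,
SE = S × √(1/n1 + 1/n2), where S = √(((n1 – 1)S1² + (n2 – 1)S2²)/(n1 + n2 – 2))
S = pooled standard deviation,
S1 = SD of sample one,
S2 = SD of sample two,
n1 = size of sample one,
n2 = size of sample two,
iii. Calculate the ‘t’ value, t = (X̅1 – X̅2)/SE.
If n1 = n2, i.e., both samples are of equal size, say n, the test reduces to
t = (X̅1 – X̅2)/√((S1² + S2²)/n)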
iv. Determine the pooled degrees of freedom df = n1 + n2 – 2,
v. Now refer to Fisher’s ‘t’-table to know the probability of occurrence and the level of significance of calculated ‘t’ value corresponding to the degrees of freedom (n1 + n2 – 2) for the samples under study.
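The steps above can be sketched as follows, assuming the usual pooled-variance form of the SE of difference. The function name `unpaired_t` and the two small samples are hypothetical and serve only to show the calculation; they are not data from the text.

```python
import math
import statistics

def unpaired_t(sample1, sample2):
    """Return (t, df) for two independent samples using a pooled SD."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled variance: the two sample variances weighted by their df.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, n1 + n2 - 2

# Hypothetical weight gains, for illustration only.
treated = [5.0, 6.0, 7.0, 8.0, 9.0]
control = [3.0, 4.0, 5.0, 6.0, 7.0]

t, df = unpaired_t(treated, control)
print(round(t, 2), df)  # 2.0 8
```

The calculated ‘t’ is then compared with the tabulated value for the same degrees of freedom; the probability itself still has to be read from the ‘t’-table, which this sketch does not reproduce.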
Example:
In a nutritional study 13 children were given a usual diet plus vitamin A and D tablets, while the second comparable group of 12 children was taking the usual diet. After 12 months the gain in weight in pounds was noted as in table. Can you say that the vitamin A and D are responsible for this difference?
For 23 df at the 5% level of significance, the tabulated value of ‘t’ is 2.069. The ‘t’ value in this observation is 2.74, which is much higher than 2.069, and that indicates that the samples differ significantly.
The probability of occurrence (p) of the value 2.74 is much less than 0.05; it is less than 0.02. Such a value would occur by chance less than two times in 100. The result of this test is written as (t = 2.74, P < 0.02, significant at 2% level).
So vitamins A and D are responsible for the difference in increase of weight between the two groups.
b. Paired (two-tailed) ‘t’ Test:
It is applied to paired data of independent observations from a single sample, in which each individual gives a pair of observations.
Different situations where this test is used are:
i. Study of the role of a factor or cause when observations are made before and after it acts, e.g., of diet on leucocyte count; of a drug on blood pressure, etc.
ii. To compare the effect of two drugs given to the same individuals in the sample on two different occasions, e.g., adrenaline and noradrenaline on pulse rate, etc.
iii. To compare results of two different laboratory techniques, e.g., estimation of haemoglobin by Tallquist and Sahli’s methods; microfilaria infection rate by thick smear and concentration technique, etc.
iv. To study the comparative efficacy of two different instruments, e.g., two types of sphygmomanometers.
v. To compare observations made at two different sites in the same body, e.g., compare temperature between toes and between fingers, or in mouth and rectum of the same individual.
Testing by this method eliminates many sampling errors and one starts it with the null hypothesis (H0).
For testing the significance of difference following steps are to be followed:
i. Find out the difference in each set of paired observations, before and after,
(x1 – x2) = x.
ii. Calculate the mean of the differences (X̅).
iii. Compute the SD of the differences, and from it the SE of the mean, SE = SD/√n.
iv. Determine the ‘t’ value by substituting the above values in the formula t = X̅/SE.
v. Find the df; it should be n – 1.
vi. Find the probability of the calculated ‘t’ corresponding to the degrees of freedom from ‘t’-table.
vii. If ‘p’ is more than 0.05 the difference observed has no significance; if ‘p’ is less than 0.05 the difference observed is significant.
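A minimal sketch of steps i to v is given below. The function name `paired_t` and the before/after readings are hypothetical, chosen only to illustrate the arithmetic; they are not the blood pressure data of the example that follows.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t, df) for paired observations on the same individuals."""
    diffs = [b - a for b, a in zip(before, after)]  # step i: differences
    n = len(diffs)
    mean_diff = statistics.mean(diffs)              # step ii: mean difference
    sd_diff = statistics.stdev(diffs)               # step iii: SD of differences
    se = sd_diff / math.sqrt(n)                     #           and SE = SD/sqrt(n)
    return mean_diff / se, n - 1                    # steps iv and v: t and df

# Hypothetical readings before and after a treatment, for illustration only.
before = [120, 122, 118, 130, 125]
after = [115, 120, 116, 122, 121]

t, df = paired_t(before, after)
print(round(t, 2), df)
```

As in steps vi and vii, the calculated ‘t’ is then looked up in the ‘t’-table against n – 1 degrees of freedom.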
Example:
Systolic blood pressure of 9 normal individuals, who had been recumbent for 5 minutes, is taken. Then 2 ml of a 0.5% solution of a hypotensive drug is given and the blood pressure recorded again, as in the table. Does the injection of the drug lower the blood pressure?
In the ‘t’-table for 8 degrees of freedom, the 5% significance limit of ‘t’ is 2.31. The observed ‘t’ value is 5.17.
Hence, the drug injected produced hypotensive effect (t = 5.17, P<0.001, highly significant at 0.1% level).
*Taken from Table III of Fisher and Yates, Statistical tables for Biological, Agricultural and Medical Research, Longman Group Ltd., London.
Procedure # 6. Chi-Square Test:
The symbol of chi-square is χ². The test is used when the data fit into yes or no, or change or no change, categories. Whether something did or did not have an effect can be known from this test. The chi-square test might be used, for example, in the evaluation of a drug against fever. In an investigation on the efficacy of a drug, one group of patients with fever is given the drug, while the control group receives an inert substitute.
The data at the end of observations are:
At the end of the treatment period, taking the totals of the two groups, 39.6% (38/96 × 100) of all the patients had no fever. If there is no difference between the two groups (null hypothesis; H0), then in the control group 21 patients (53 × 39.6%) would be normal and 32 patients (53 – 21) would still have fever.
The same procedure is carried for the drug group and the data may be arranged as:
Only one degree of freedom is available, as after the calculation of one value the other three can be obtained by subtraction. The table of chi-square values tells us that for one degree of freedom and a chi-square value of 34.73, p is less than 0.001. That means the probability of such a difference arising by chance is less than 1 in 1000.
Such a large difference between two groups would only rarely occur by chance. From the highly significant difference between the two groups it may be said that the drug is effective in lowering fever. Chi-square values for degrees of freedom from 1 to 30 are given in Table B.
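The expected values and the chi-square sum can be sketched as below. The group totals (53 controls, 96 patients in all, 38 without fever) are taken from the text, but the split of observed counts within each group is an assumption made only to show the calculation, so the statistic printed here need not equal the 34.73 quoted above. The function names are illustrative.

```python
def expected_counts(observed):
    """Expected counts under the null hypothesis, from row and column totals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))

# Rows: control, drug. Columns: no fever, fever.
# ASSUMED observed counts; only the totals 53, 43, 38 and 58 follow from the text.
observed = [[7, 46],
            [31, 12]]

expected = expected_counts(observed)
print([[round(e) for e in row] for row in expected])  # [[21, 32], [17, 26]]

chi2 = chi_square(observed)
print(round(chi2, 2))
```

The expected counts depend only on the row and column totals, which is why they agree with the 21 and 32 worked out above whatever observed split is assumed.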
Procedure # 7. Analysis of Variance (ANOVA):
Paired and unpaired ‘t’ tests deal with only two populations, and only a significant difference between those two can be judged by them. The ‘t’-test is not suitable when there are more than two populations. In that case the ‘F’-test is recommended. More than two populations may arise when the same category of individuals is treated with different stresses, when a number of groups or categories are grown under the same stress, or when both operate simultaneously.
Through the ‘F’-test the effect due to any stress, to different categories of population, or to repetition of the experiment can be well judged. For example, the following experiment may be considered.
Varieties = 2, Replications = 10
In ‘t’-test:
Difference (A – B) = –29.0
t = –29.0/67.6 = –0.429, a non-significant value.
Alternatively, we can analyse the total variation among the 20 plots into two components, between A and B (1 df) and within A and B (18 df), and test the significance of the difference A – B by the F-test.
The analysis of variance:
The t value is equivalent to √F, i.e., F = t².
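The relation between ‘t’ and ‘F’ for two groups can be checked numerically. The sketch below uses hypothetical plot values (not the data of the experiment above) and illustrative function names; it assumes a pooled-variance ‘t’ and a one-way partition into between-group and within-group variation.

```python
import math
import statistics

def pooled_t(a, b):
    """Unpaired t with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se_diff, na + nb - 2

def one_way_f(groups):
    """F ratio = between-group mean square / within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical yields of two varieties, for illustration only.
variety_a = [5.0, 6.0, 7.0, 8.0, 9.0]
variety_b = [3.0, 4.0, 5.0, 6.0, 7.0]

t, df = pooled_t(variety_a, variety_b)
f = one_way_f([variety_a, variety_b])
print(round(t ** 2, 6), round(f, 6))  # both about 4.0
```

With only two groups the between-group component has 1 df, which is why the two tests give the same verdict.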
This alternative procedure has the advantage that it is also suited to making an overall test of several differences simultaneously, that is, to the comparison of more than two treatments. Thus any number of treatments might be replicated an equal number of times.
Besides that, the ‘F’-test technique allows the experimenter to partition the total variance into different components and so to judge the causes of variation. An experiment is provided below to illustrate the above comments and the procedure of calculation of the ‘F’-test.
No. of strains = 8
Replication of each strain = 4
Design = Randomised Block design
Critical difference:
The square root of the error mean square, S, measures the standard error developed due to uncontrolled factors. The S.E. of the means would be equal to S/√4, four being the number of replications of each strain.
S.E. of strain means = √(Ems/r)
Ems = Error mean square
r = Replication
In the previous experiment
S.E. = √(378.12/4) = √94.53 = 9.7 units
S.E. of difference = 9.7 × √2 = 13.7 units
The minimum difference that must be present between the means of two strains for them to differ significantly at a particular level of significance is called the critical difference (C.D. value at a particular level of significance).
Let the value of t at the 0.05 level against df 21 = 2.08
2.08/1 = C.D./S.E. of difference
Then C.D. = 2.08 × S.E. of difference
= 2.08 × 13.7 units (or 137 g)
= 28.5 units (or 285 g)
i.e., C.D. at 0.05 level = t0.05 at df 21 × √(Ems/r) × √2 = t0.05 at 21 df × √(2Ems/r)
where Ems = Error mean sum of square
r = replication
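The critical difference formula can be sketched as below, using the error mean square (378.12), the replication (4) and the tabulated t (2.08 for 21 df) quoted above. The function name is illustrative.

```python
import math

def critical_difference(t_value, ems, r):
    """C.D. = t x sqrt(2 x Ems / r)."""
    return t_value * math.sqrt(2 * ems / r)

se_mean = math.sqrt(378.12 / 4)  # S.E. of strain means
se_diff = se_mean * math.sqrt(2) # S.E. of difference
cd = critical_difference(2.08, 378.12, 4)

print(round(se_mean, 1), round(se_diff, 1), round(cd, 1))  # 9.7 13.7 28.6
```

The value comes to about 28.6 units here because no intermediate rounding is done; rounding the S.E. of difference to 13.7 first gives the 28.5 units of the worked calculation.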