What statistical analysis to use to relate multispectral seed data to other conventional tests?

#1
I'm a PhD student at the University of São Paulo, Brazil, and I'm conducting experiments with multispectral analysis of soybean seeds.
I have reflectance data for 8 different soybean seed samples, measured at 19 spectral wavelengths. I want to relate these data to the results of the following tests: (i) germination (percentage of dead seeds, normal and abnormal seedlings), (ii) seedling image analysis (vigor index, developmental uniformity and seedling length), (iii) tetrazolium (viability percentage, vigor and dead seeds). Note: For each test, eight soybean seed lots were formed with different levels (0 to 14%) of immature seeds.
The tests were conducted in a completely randomized design with eight seed lots, each lot divided into 4 replications of 50 seeds.
So which statistical method is most appropriate for relating the multispectral data to each of the tests conducted afterwards?
 
#2
Out of curiosity, how had you planned to do the evaluation before the experiment was done?
(Maybe your ideas were just fine.)

What kind of response variables are there:
(i) germination: percentage of dead seeds, normal and abnormal seedlings. This seems to be a 0 or 1 variable, or three categories (thus multinomial).
(ii) seedling image analysis (vigor index, developmental uniformity and seedling length). Is that three variables? Could they be continuous, possibly normally distributed?
(iii) tetrazolium (viability percentage, vigor and dead seeds). How many variables is that? Continuous?

"For each test, eight soybean seed lots were formed and used with different levels (0 to 14%) of immature seeds." What does that mean? Why use immature seeds (whatever those are)?

"completely randomized design with eight seed lots, each lot divided into 4 replications of 50 seeds." So one "lot" had 200 seeds? Did you shine light on the whole group of seeds rather than on each individual seed, and then measure the reflectance at 19 spectral wavelengths?
 
#3
Initially, mature soybean seeds were mixed with different proportions (0, 2, 4, 6, 8, 10, 12 and 14%) of green soybean seeds (a problem of chlorophyll retention caused by biotic and abiotic stress).

Each lot consists of 200 seeds, following rules for seed analysis.

First, for each lot, four replications of 50 seeds (identified one by one) were run through the multispectral analysis equipment to excite the chlorophyll retained in the seeds and record the reflectance of the fluorescence signals.

After capturing the chlorophyll fluorescence signals, the samples from the 8 lots were submitted to the germination test, where at 7 days the percentages of dead seeds, normal seedlings and abnormal seedlings were recorded.

For the seedling image analysis experiment, 8 new lots were prepared, and the seeds were run through the multispectral equipment and then germinated. At 3 days the seedlings were placed in a scanner and the averages for vigor index, uniformity, seed length (centimeters), hypocotyl length (centimeters) and root length (centimeters) were obtained for each replication.

The same lot composition and multispectral analysis procedure were used for the tetrazolium test, where the percentages of viability and vigor were obtained for each replication.

Other tests were conducted, such as electrical conductivity (EC) and accelerated aging (EASS), the latter giving the percentage of dead seeds, normal and abnormal seedlings, as well as measurement of the moisture content (given in percentage). Therefore, for each test, eight lots of mature soybean seeds were prepared with different proportions of greenish seeds.


I need to relate the multispectral data (% reflectance) to the data from the tests carried out after the chlorophyll fluorescence analysis.

Below is part of the data already collected. We still need to extract the reflectance data (%), which will be done soon.


[Attached image: 1569206941319.png]
 
#4
Germination: if you code dead as 0 and not dead as 1, that is a binomial response and you can run a logit model.

The same goes for normal seedlings = 0 versus abnormal = 1 (ignoring all dead seeds). Use logit.
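
To make the logit idea concrete, here is a minimal sketch in Python, assuming each replication contributes a count out of 50 seeds. The data are simulated placeholders (not your measurements), the only predictor is the percentage of green seeds, and the helper `fit_binomial_logit` is a hypothetical name written just for this illustration. It models the probability of a dead seed; coding dead as 0 instead only flips the signs of the coefficients. In practice a statistics package would also give standard errors and tests.

```python
import numpy as np

def fit_binomial_logit(x, successes, trials, n_iter=50, tol=1e-10):
    """Fit logit(p) = b0 + b1*x to grouped binomial counts by Newton-Raphson."""
    X = np.column_stack([np.ones(len(x)), x])      # intercept + predictor(s)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))      # fitted probabilities
        score = X.T @ (successes - trials * p)     # gradient of the log-likelihood
        weights = trials * p * (1.0 - p)           # binomial variances
        info = X.T @ (X * weights[:, None])        # Fisher information matrix
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated placeholder data, NOT the real experiment:
# 8 lots (0, 2, ..., 14% green seeds) x 4 replications of 50 seeds each.
rng = np.random.default_rng(0)
green_pct = np.repeat(np.arange(0, 16, 2), 4).astype(float)
trials = np.full(green_pct.size, 50)
true_p = 1.0 / (1.0 + np.exp(-(-3.0 + 0.15 * green_pct)))  # assumed for the simulation
dead = rng.binomial(trials, true_p)

beta = fit_binomial_logit(green_pct, dead, trials)
print("intercept and slope on the logit scale:", beta)
```

A positive slope here would mean the odds of a dead seed rise with the percentage of green seeds.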

Explanatory variables: the 19 wavelengths. You could use all of them directly in a regression, but the wavelengths may well be heavily correlated. In that case you can summarize the 19 variables with principal components. That would create a number of uncorrelated variables. Maybe 8 variables would be enough. Then use these 8 variables as explanatory variables in a logit regression model. I guess that you could/should include the proportion of green soybeans.
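
The principal component step could look roughly like this. It is a sketch with NumPy only, on a simulated 32 x 19 reflectance matrix (32 = 8 lots x 4 replications; the numbers are invented, and so is the 95% variance cut-off used to pick the number of components). The resulting score columns are uncorrelated and can replace the 19 wavelengths as explanatory variables.

```python
import numpy as np

# Simulated placeholder reflectance: 32 replications x 19 wavelengths,
# made strongly correlated on purpose through 2 hidden factors.
rng = np.random.default_rng(1)
n_obs, n_bands = 32, 19
latent = rng.normal(size=(n_obs, 2))
loadings = rng.normal(size=(2, n_bands))
reflectance = latent @ loadings + 0.1 * rng.normal(size=(n_obs, n_bands))

# Standardize each wavelength, then get principal components from the SVD.
Z = (reflectance - reflectance.mean(axis=0)) / reflectance.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

explained = s**2 / np.sum(s**2)          # share of variance per component
# Smallest number of components reaching 95% (an arbitrary threshold).
k = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)

scores = Z @ Vt[:k].T                    # uncorrelated predictors, shape (32, k)
print("components kept:", k)
print("variance explained by them:", explained[:k].sum())
```

With only 32 replications per test, it is probably worth checking how many components the data can really support before putting 8 of them in a model.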

The other variables could maybe be used in a linear regression model (i.e. based on the normal distribution).
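
For a continuous response such as seedling length, a linear model could be sketched like this. Again the data are simulated placeholders: `pc1` stands in for one principal component score of the reflectance data, and the coefficients used to generate `seedling_length` are made up for the illustration.

```python
import numpy as np

# Simulated placeholder data, NOT the real experiment.
rng = np.random.default_rng(2)
n_obs = 32                                                    # 8 lots x 4 replications
green_pct = np.repeat(np.arange(0, 16, 2), 4).astype(float)
pc1 = rng.normal(size=n_obs)                                  # stand-in for a PC score
seedling_length = (12.0 - 0.2 * green_pct + 0.5 * pc1
                   + rng.normal(scale=0.8, size=n_obs))       # centimeters, invented

# Ordinary least squares: intercept, green seed percentage, PC score.
X = np.column_stack([np.ones(n_obs), green_pct, pc1])
coef, _, rank, _ = np.linalg.lstsq(X, seedling_length, rcond=None)

resid = seedling_length - X @ coef
r2 = 1.0 - resid @ resid / np.sum((seedling_length - seedling_length.mean()) ** 2)
print("coefficients:", coef)
print("R squared:", r2)
```

It is worth plotting the residuals to see whether the normal assumption looks reasonable for each response.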