Cotton genotypes selection through artificial neural networks
Received: August 11, 2017
Accepted: August 25, 2017
Published: September 27, 2017
Genet.Mol.Res. 16(3): gmr16039798
Breeding programs currently use statistical analyses to help identify superior genotypes at various stages of cultivar development. In contrast to these analyses, computational intelligence approaches have been little explored in cotton genetic improvement. This study therefore presents the use of artificial neural networks as auxiliary tools for improving cotton fiber quality. To demonstrate the applicability of this approach, evaluation data from 40 genotypes were used. To classify the genotypes for fiber quality, the artificial neural networks were trained with replicate data from 20 cotton genotypes evaluated in the 2013/14 and 2014/15 harvests for fiber length, length uniformity, fiber strength, micronaire index, elongation, short fiber index, maturity index, reflectance degree, and fiber quality index. The quality index was estimated as a weighted average of the scores (1 to 5) assigned to each HVI trait evaluated, according to industry standards. The artificial neural networks showed a high capacity for correct classification of the 20 selected genotypes based on the fiber quality index; networks using fiber length together with short fiber index, fiber maturity, and micronaire index gave better results than networks using fiber length alone or the intermediate trait combinations. Submitting the means of new genotypes to networks trained with replicate data also gave better classification results. These results indicate that artificial neural networks have great potential for use in the different stages of a cotton breeding program aimed at improving the fiber quality of future cultivars.
Gossypium hirsutum is the most cultivated cotton species in the world. Cotton has been domesticated for thousands of years in South Arabia and, as a consequence of this continuous domestication process, it is the most important vegetable fiber in the worldwide textile industry (Cotton Incorporated, 2017).
However, high fiber quality is essential for the textile industry and directly influences cotton commercialization, because quality determines market value and acceptance (Bonifácio et al., 2015). Thus, cotton breeding programs aim especially at fiber quality, as well as cotton lint development (Morello and Freire, 2005; De Araújo et al., 2013).
Cotton fiber quality is determined mainly by genetic factors, but the environment (climatic, agronomic, nutritional, and phytosanitary conditions, as well as crop management) has a strong influence on quality. Harvest and processing also play an essential role in maintaining fiber quality. Furthermore, the impurity content resulting from mechanized harvesting has to be highlighted (Salgado et al., 2015).
The HVI (high volume instrument) analyzes fiber quality and is widely used in cotton breeding programs, although certain questions arise about this practice, such as the impact on genotype selection of using more than one intrinsic fiber criterion and the reduction of genetic diversity for some of the traits analyzed. Moreover, the visual criteria used and the variation in scores assigned by different appraisers call into question the effectiveness of morphological evaluation. Studies indicate that evaluating morphological traits through visual scores has low efficiency (Gabriel and Blanco, 2009).
In this context, mathematical modeling appears as a tool to assist cotton breeding and genotype classification. Artificial neural networks (ANNs) are one alternative; they are based on a computational concept that aims to process data in a way similar to the human brain, acquiring knowledge through experience, predicting and recognizing patterns, or establishing groups (Haykin, 2008; Braga et al., 2011). In plant breeding, ANNs have been applied in genetic diversity studies (Barbosa et al., 2011), genetic value prediction (Silva et al., 2014; Carneiro, 2015), as well as adaptability and stability analysis (Barroso et al., 2013; Nascimento et al., 2013). Cotton prediction studies have used traits related to yarn spinning in the textile industry (Jackowska-Strumillo et al., 2004; Ghosh et al., 2005; Ureyen and Kadoglu, 2007; Gharehaghaji et al., 2007).
The main attribute of ANNs is their nonlinear structure, together with the ability to work without detailed information about the physical processes of the system (Sudheer et al., 2003). As a classification method, ANNs have the advantages of being nonparametric and tolerant to data loss (Kavzoglu and Mather, 2003). Therefore, this study evaluates the potential of ANNs for selecting cotton genotypes with high fiber quality.
Material and Methods
This study used data from 40 cotton genotypes evaluated during the 2013/14 and 2014/15 seasons in the Programa de Melhoramento Genético do Algodoeiro (PROMALG) of the Universidade Federal de Uberlândia (UFU), comprising 8 intrinsic fiber traits measured by HVI (high volume instrument) in addition to the fiber quality analysis. The data came from experiments carried out at Fazenda Capim Branco, a research station of UFU located at 18°52’S, 48°20’W, and 805 m altitude, in Uberlândia, Minas Gerais.
The 2013/14 season experiment used an augmented block design with 4 replications, with 5-m long plots of four rows of cotton plants spaced 0.9 m apart. The 2014/15 season experiment was conducted in the same way as the previous one, except that each plot had one extra row. Sowing used 8 holes per meter and two seeds per hole, with subsequent thinning to leave one plant per hole.
The fiber traits evaluated were length, length uniformity, strength, elongation, micronaire, and short fiber index (SFI) (Table 1), as well as reflectance degree and fiber quality.
Table 1. Reference values for fiber intrinsic traits.
Reflectance degree is based on the gray content of the cotton sample: the whiter the sample, the higher the reflectance degree (Costa et al., 2006). Fiber quality was scored per trait on a scale from 1 to 5, ranging from least desirable (score 1) to most desirable (score 5).
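The fiber quality index combines the per-trait scores (1 to 5) into a weighted average. The sketch below only illustrates that arithmetic; the function name, the trait scores, and the weights are hypothetical, since the weights actually used are not listed here.

```python
def fiber_quality_index(scores, weights):
    """Weighted average of per-trait scores, each score on the 1-5 scale."""
    for trait, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"score for {trait} must be between 1 and 5")
    total_weight = sum(weights[trait] for trait in scores)
    weighted_sum = sum(scores[trait] * weights[trait] for trait in scores)
    return weighted_sum / total_weight


# Hypothetical scores and weights, for illustration only.
example_scores = {"length": 4, "uniformity": 3, "strength": 5, "micronaire": 2}
example_weights = {"length": 2.0, "uniformity": 1.0, "strength": 2.0, "micronaire": 1.0}
index = fiber_quality_index(example_scores, example_weights)
```

With equal weights the index reduces to the plain mean of the scores.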
Stepwise multiple regression analysis was used to identify the major determinant traits of cotton fiber quality, selecting the variables for model fitting from the traits measured on 12 genotypes in two evaluations (2013/14 and 2014/15 seasons). The analysis was carried out with the GENES software (Cruz, 2016).
The data for the relevant variables were submitted to analysis of variance according to the augmented block design model. A joint analysis of variance across seasons was then carried out. In all analyses, all effects were considered fixed, except the error. The ANN analyses used the genotype data from each replication to obtain a larger sample size. The genotypes were allocated to two fiber quality groups: the first group consisted of genotypes with scores up to 2.5 and the second group of genotypes with scores higher than 2.5. Genotypes that fell into different groups across replications and/or seasons were not analyzed.
Thus, 20 of the 40 genotypes evaluated in the 2013/14 and 2014/15 seasons were used in the ANN analyses (Table 2), giving a total of 60 observations per evaluation year, since the data from each replication were used for ANN training and validation.
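The grouping rule described above (scores up to 2.5 versus scores above 2.5, dropping genotypes that change group between replications or seasons) can be sketched as follows. The function names and the example scores are hypothetical; only the 2.5 threshold comes from the text.

```python
THRESHOLD = 2.5  # from the text: first group up to 2.5, second group above 2.5


def assign_group(score, threshold=THRESHOLD):
    """Return 1 for scores up to the threshold, 2 for scores above it."""
    return 1 if score <= threshold else 2


def consistent_genotypes(scores_by_genotype):
    """Keep only genotypes whose observations all fall in the same group.

    scores_by_genotype maps a genotype name to its list of quality scores
    across replications and seasons. Returns a dict name -> group.
    """
    kept = {}
    for name, scores in scores_by_genotype.items():
        groups = {assign_group(score) for score in scores}
        if len(groups) == 1:
            kept[name] = groups.pop()
    return kept


# Hypothetical scores: G3 switches group between observations and is dropped.
example = {
    "G1": [2.0, 2.3, 2.5],
    "G2": [3.1, 2.9, 3.4],
    "G3": [2.4, 2.8, 2.2],
}
selected = consistent_genotypes(example)
```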
|Genotype||Group||Genotype||Group|
|PA UFU - S||1||PA UFU - T||2|
|PA UFU - M||1||PA UFU - N||2|
|PA UFU - C||1||PA UFU - E||2|
|FM 966||1||PA UFU - R||2|
|PA UFU - Z||1||DP 555||2|
|PA UFU - D||1||PA UFU - F||2|
|PA UFU - H||1||PA UFU - OB||2|
|PA UFU - L||1||PA UFU - A||2|
|PA UFU - P||1||PA UFU - 18||2|
|PA UFU - G||1||PA UFU - 7||2|
Table 2. Genotypes selected for ANN analyses and their respective classification groups.
ANN analysis was used to predict the fiber quality group of the genotypes in the 2014/15 season from fiber length, alone or together with short fiber index, fiber maturity, and micronaire index, using ANNs trained on the 2013/14 season. For training, the replication data of the 2013/14 experiment were amplified to 300 simulated observations per group with the same properties (mean vector and variance-covariance matrix) as the original genotypes. Validation used the replication data (60 observations) that fed the amplification process, and prediction used the individual replication data (60 observations) and the replication averages (20 observations) of the 2014/15 season, as follows: 1 - 2013/14 season: training and validation; 2014/15 season: prediction (60 observations, replication data); 2 - 2013/14 season: training and validation; 2014/15 season: prediction (19 observations, replication averages).
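The two prediction scenarios differ only in whether the network receives each replication separately or one average per genotype. A minimal sketch of how the replication averages could be derived from replicate-level records is given below; the function name, genotype labels, and trait values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_replications(records):
    """Collapse replicate-level records into one mean trait vector per genotype.

    records is a list of (genotype, trait_vector) pairs, one per replication.
    """
    grouped = defaultdict(list)
    for genotype, traits in records:
        grouped[genotype].append(traits)
    return {
        genotype: [mean(column) for column in zip(*rows)]
        for genotype, rows in grouped.items()
    }


# Hypothetical replicate data: [fiber length, short fiber index] per plot.
replicate_data = [
    ("A", [29.0, 8.0]), ("A", [30.0, 9.0]), ("A", [31.0, 10.0]),
    ("B", [27.0, 11.0]), ("B", [28.0, 12.0]), ("B", [29.0, 13.0]),
]
mean_data = average_replications(replicate_data)
```

Scenario 1 would pass `replicate_data` to the trained network, scenario 2 would pass `mean_data`.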
In this context, global apparent error rates (TEA) were evaluated for the training, validation, and prediction of the ANNs. TEA is the percentage of incorrect classifications, according to the groups to which the genotypes were allocated. TEA per group was also computed for validation and prediction.
Data amplification by simulation produced 300 new observations per group. The new data sets had the same properties (mean, variance, and covariance) as the original data sets. The amplification was carried out with the GENES software (Cruz, 2016).
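As an illustration of the apparent error rate, the sketch below computes the global TEA and the TEA per group from true and predicted group labels. The function names and the example labels are hypothetical.

```python
def apparent_error_rate(true_groups, predicted_groups):
    """Global TEA: percentage of observations allocated to the wrong group."""
    if len(true_groups) != len(predicted_groups):
        raise ValueError("label lists must have the same length")
    if not true_groups:
        raise ValueError("at least one observation is required")
    errors = sum(t != p for t, p in zip(true_groups, predicted_groups))
    return 100.0 * errors / len(true_groups)


def error_rate_per_group(true_groups, predicted_groups):
    """TEA computed separately within each true group."""
    rates = {}
    for group in sorted(set(true_groups)):
        pairs = [(t, p) for t, p in zip(true_groups, predicted_groups) if t == group]
        rates[group] = 100.0 * sum(t != p for t, p in pairs) / len(pairs)
    return rates


# Hypothetical labels: one Group 2 observation is misclassified as Group 1.
true_labels = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
predicted_labels = [1, 1, 1, 1, 2, 2, 2, 1, 2, 2]
global_tea = apparent_error_rate(true_labels, predicted_labels)
tea_by_group = error_rate_per_group(true_labels, predicted_labels)
```

The percentage of correct classification per group is simply 100 minus the corresponding TEA.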
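The amplification step can be pictured as simulating new observations that keep the mean vector and variance-covariance matrix of the original group. The sketch below is one way to do this with NumPy and is an assumption, not the GENES procedure, whose algorithm is not described here: random draws are centred and whitened, then rescaled with the Cholesky factor of the target covariance so that the sample moments match. The function name and the example data are hypothetical.

```python
import numpy as np


def amplify(original, n_new=300, seed=0):
    """Simulate n_new rows matching the sample mean and covariance of `original`.

    original: array with observations in rows and traits in columns.
    Requires more observations than traits and a non-singular covariance.
    """
    original = np.asarray(original, dtype=float)
    target_mean = original.mean(axis=0)
    target_cov = np.atleast_2d(np.cov(original, rowvar=False))

    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_new, original.shape[1]))
    z -= z.mean(axis=0)  # centre the draws exactly

    # Whiten: make the sample covariance of z the identity matrix.
    l_z = np.linalg.cholesky(np.atleast_2d(np.cov(z, rowvar=False)))
    z = z @ np.linalg.inv(l_z).T

    # Recolour with the target covariance and shift to the target mean.
    l_target = np.linalg.cholesky(target_cov)
    return target_mean + z @ l_target.T


# Hypothetical group data: [fiber length, short fiber index] per observation.
group_data = np.array([
    [29.1, 8.2], [30.4, 7.9], [28.7, 9.1],
    [31.0, 7.5], [29.8, 8.4], [30.2, 8.0],
])
simulated = amplify(group_data, n_new=300)
```

Plain sampling from a multivariate normal would match the moments only approximately; the whitening step is what makes the match exact for the simulated sample.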
Data from the 2013/14 and 2014/15 experiments were analyzed with ANNs in MATLAB (Beale et al., 2015). ANN training used 600 simulated observations from the amplification (300 per group) with a multilayer perceptron architecture and the following topology settings:
a) number of hidden layers: 3 hidden layers were considered;
b) number of neurons: combinations of 3 to 12 neurons were considered for each hidden layer;
c) activation function: the linear activation function was used in the output layer, and all possible combinations of the linear, logistic, and hyperbolic tangent functions were evaluated in the hidden layers;
d) number of training cycles: a value of 5000 was adopted. The number of iterations was limited so that it did not become excessive, which could lead to loss of generalization power;
e) training function: trainbr, a backpropagation training function that updates weight and bias values according to Levenberg-Marquardt optimization. It minimizes a combination of squared errors and weights and then determines the correct combination to produce a network with good generalization capability, a process called Bayesian regularization.
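To make the topology search concrete, the sketch below enumerates the candidate topologies and runs a forward pass through one of them in NumPy. It is a sketch under stated assumptions, not the MATLAB implementation: it assumes every combination of 3 to 12 neurons and of the three activation functions is tried for each of the 3 hidden layers, it uses random untrained weights, and it shows the regularized cost with fixed `alpha` and `beta` and without bias terms, whereas Bayesian regularization re-estimates those two parameters during training. All function names are hypothetical.

```python
import itertools
import numpy as np

# MATLAB-style activation names, written with NumPy.
ACTIVATIONS = {
    "purelin": lambda x: x,                        # linear
    "logsig": lambda x: 1.0 / (1.0 + np.exp(-x)),  # logistic
    "tansig": np.tanh,                             # hyperbolic tangent
}


def candidate_topologies(min_neurons=3, max_neurons=12, n_hidden=3):
    """Yield (neurons per layer, activation per layer) for every candidate."""
    sizes = range(min_neurons, max_neurons + 1)
    for neurons in itertools.product(sizes, repeat=n_hidden):
        for funcs in itertools.product(ACTIVATIONS, repeat=n_hidden):
            yield neurons, funcs


def random_network(n_inputs, hidden_sizes, n_outputs=1, seed=0):
    """Random (untrained) weights and zero biases for a multilayer perceptron."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    weights = [rng.normal(0.0, 0.5, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases


def forward(x, weights, biases, hidden_funcs):
    """Forward pass: chosen activations in hidden layers, linear output layer."""
    a = x
    for w, b, name in zip(weights[:-1], biases[:-1], hidden_funcs):
        a = ACTIVATIONS[name](a @ w + b)
    return a @ weights[-1] + biases[-1]


def regularized_objective(errors, weights, alpha, beta):
    """Cost of the form beta * sum(errors^2) + alpha * sum(weights^2)."""
    sse = float(np.sum(np.square(errors)))
    ssw = float(sum(np.sum(np.square(w)) for w in weights))
    return beta * sse + alpha * ssw


n_candidates = sum(1 for _ in candidate_topologies())
weights, biases = random_network(n_inputs=4, hidden_sizes=(3, 5, 4))
outputs = forward(np.ones((5, 4)), weights, biases, ("tansig", "logsig", "purelin"))
```

Under these assumptions the search space has 10 x 10 x 10 neuron combinations times 3 x 3 x 3 activation combinations, which is why the candidate with the lowest validation TEA has to be picked automatically.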
Fiber length (FL), SFI, fiber maturity (MAT), and micronaire index (MIC) were the determinant traits of cotton fiber quality (FQ) according to the stepwise multiple regression analysis. The ANN analyses used FL alone or together with SFI, MAT, and MIC to predict the FQ of the cotton genotypes.
The individual analyses of variance for FQ, FL, SFI, MAT, and MIC of the 20 cotton genotypes in the 2013/14 and 2014/15 seasons are summarized in Table 3. The experimental coefficients of variation (CVe) in the 2013/14 and 2014/15 experiments were below 10% for the traits evaluated, indicating good experimental precision. These CVe values agree with those reported in similar cotton experiments (Bonifácio et al., 2015).
Table 3. Analysis of variance report of experiments carried out with 20 cotton genotypes.
Genotype effects were significant (P < 0.01) in both experiments (Table 3), indicating genetic variability among genotypes for the 5 traits evaluated in both seasons. The genotypic determination coefficients (h2) of FL, SFI, and MAT were of high magnitude in both experiments.
The joint analyses of FQ, FL, SFI, MAT, and MIC evaluated in the 2013/14 and 2014/15 seasons are summarized in Table 4. Significant genotype effects (P < 0.01) were observed for FQ, FL, SFI, MAT, and MIC. The environmental source of variation (seasons) had a significant effect (P < 0.01) on FL, SFI, MAT, and MIC. The genotype x environment interaction was significant (P < 0.05) for FQ, FL, and SFI.
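The experimental coefficient of variation is usually computed as 100 times the square root of the error mean square divided by the overall trait mean. Assuming that definition, a minimal sketch is shown below; the function name and the numbers are hypothetical and not taken from Table 3.

```python
from math import sqrt


def experimental_cv(mean_square_error, overall_mean):
    """Experimental coefficient of variation, in percent."""
    if overall_mean <= 0:
        raise ValueError("overall mean must be positive")
    return 100.0 * sqrt(mean_square_error) / overall_mean


# Hypothetical values: error mean square 1.44 for a trait with mean 30.0.
cv = experimental_cv(1.44, 30.0)
```

A value below 10%, as in this hypothetical case, would fall in the range the text treats as good experimental precision.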
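The formula used for the genotypic determination coefficient is not stated here. A common definition for fixed genotype effects, on a genotype-mean basis, is the share of the genotype mean square that is not error, and the sketch below assumes that definition. The function name and the mean squares are hypothetical.

```python
def genotypic_determination(ms_genotype, ms_error):
    """h2 = (MS_genotype - MS_error) / MS_genotype, truncated at zero."""
    if ms_genotype <= 0:
        raise ValueError("genotype mean square must be positive")
    return max(0.0, (ms_genotype - ms_error) / ms_genotype)


# Hypothetical mean squares, for illustration only.
h2 = genotypic_determination(ms_genotype=8.0, ms_error=2.0)
```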
Table 4. Joint analysis of variance report of evaluated traits in 20 cotton genotypes of 2013/14 and 2014/15 seasons.
Table 5 presents the results of the ANNs that used FL alone or with SFI, MAT, and MIC. The highest TEA in training, validation, and prediction was observed for the ANN based on FL alone. TEA was also lower in the prediction with average data (16.78 and 9.27%) than with replication data (26.89 and 14.55%) for the ANNs using FL alone and FL with the other traits, respectively.
|Procedures||Apparent error rate - TEA (%)|
|Training||Validation||Prediction 1||Prediction 2|
Table 5. Apparent error rates calculated in analyses with neural networks.
According to the classification groups (Table 6), ANN validation using FL, SFI, MAT, and MIC had a higher percentage of correct allocation of Group 1 genotypes (91.33%) than validation using FL alone (73.78%). A similar result was observed for Group 2, with 88.96% correct classification using FL, SFI, MAT, and MIC against 74.55% using FL alone. When the replicate data of the genotypes evaluated in the 2014/15 season were submitted to prediction, the ANNs using FL alone or together with the other traits correctly allocated all the genotypes of Group 1, while for Group 2 the ANN based on FL, SFI, MAT, and MIC was more accurate, with 89.43% versus 73.28% for FL alone.
|Validation||Prediction 1||Prediction 2|
Table 6. Percentage of correct classification report using artificial neural networks.
When the genotype averages of the 2014/15 season were submitted to prediction, the ANNs using FL alone or with the other traits again correctly allocated all genotypes of Group 1, while for Group 2 the ANN based on FL, SFI, MAT, and MIC was more accurate, with 92.45% versus 81.37% for FL alone. Comparing predictions from replications and from averages, the ANNs based on FL alone correctly allocated all genotypes of Group 1 in both cases, but for Group 2 the prediction based on average data was more accurate (81.37%) than that based on replication data (73.28%). Likewise, with FL together with SFI, MAT, and MIC, the ANNs correctly allocated the genotypes of Group 1 with both replication and average data, but for Group 2 the prediction based on average data was more accurate (92.45%) than that based on replication data (89.43%).
Table 7 describes the ANN topologies with the lowest TEA in validation using fiber length alone or with SFI, MAT, and MIC, according to the multilayer perceptron architecture, in terms of the number of neurons and the activation function in the hidden layers.
|FL||FL + SFI|
|FL+ SFI + MAT||FL + SFI + MAT + MIC|
Table 7. ANN topology, regarding the number of neurons and activation function in the hidden layers (O1, O2 and O3), based on the combination of traits evaluated in 20 cotton genotypes.
The ANN based on FL alone had more neurons per layer than the ANNs based on FL, SFI, MAT, and MIC. The topologies based on FL alone also had more complex activation functions in the hidden layers, with a predominance of functions such as logsig and tansig, whereas in the ANNs combining FL with other traits linear activation functions such as purelin predominated.
Higher accuracy in selecting cotton genotypes for high fiber quality requires information from other technological fiber traits related to its phenotypic expression. In this study, the determinant traits of fiber quality were FL, SFI, MAT, and MIC.
Because cotton FQ is governed by many genes and is under strong environmental influence, the expression of genes at different stages of fiber development indicates that a large number of alleles are involved in fiber development and in determining its quality. In this case, indirect selection for FQ based on auxiliary traits is a real possibility for cotton breeders. FQ has already been related to fiber length and maturity (Zabot, 2007). Using traits associated with the phenotypic expression of FQ in discriminant analyses will be effective if based on a highly accurate trait selection process. In addition to quality, achieving high productivity is important and closely linked to high technology (Rosolem, 2001).
ANNs based on FL, SFI, MAT, and MIC outperformed ANNs based on FL alone, on FL and SFI, and on FL, SFI, and MAT, since they had lower TEA in the training, validation, and prediction stages. Moreover, ANNs based on FL, SFI, MAT, and MIC had TEA lower than 9% in all stages, which in this study corresponded to the erroneous classification of only two of the 20 evaluated genotypes and highlights the high generalization potential of ANNs (Braga et al., 2011; Carneiro, 2015).
Prediction with genotype average data was also superior, both for ANNs based on FL, SFI, MAT, and MIC and for ANNs based on FL alone or with other traits, since the TEAs were much lower.
Cotton breeding for FQ would select plants with scores higher than 2.5, which would correspond to genotypes allocated to Group 1. Thus, according to the predictions, ANNs based on FL, SFI, MAT, and MIC outperformed those based only on FL and on the other trait combinations, since they gave the same percentage of correct classification in Group 1 and a higher percentage of correct classification in Group 2.
The higher accuracy of prediction using average data of FL, SFI, MAT, and MIC is due to the fact that environmental effects tend to cancel out when averages are used. The evaluation of genotypes through scores showed that 20 of the 40 genotypes had contradictory FQ scores within the same experiment and/or between experiments. Considering these contradictions as evaluation errors, a 50% error rate was associated with the evaluation in the experiments. This error rate was much higher than the prediction error rate of ANNs based on FL, SFI, MAT, and MIC, which shows the potential of ANNs in breeding cotton for FQ. ANNs have been shown to be efficient in solving problems of prediction, pattern recognition, and grouping (Haykin, 2008), which are also difficulties found in the different stages of a breeding program.
ANNs proved effective in solving problems of prediction, pattern recognition, and grouping of cotton genotypes. Using average data for prediction with ANNs gave reliable results for selecting cotton genotypes for FQ. With fewer explanatory variables for training and validation, ANNs require more complex architectures.
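The claim that environmental effects tend to cancel when averages are used can be checked with a small simulation: if plot-level environmental noise has variance sigma squared and is independent between replications, the noise left in the mean of r replications has variance of about sigma squared divided by r. The sketch below assumes independent normal noise; the function name, the number of replications, and all values are hypothetical.

```python
import numpy as np


def noise_variances(n_plots=20000, n_reps=3, sigma=1.0, seed=0):
    """Compare noise variance in a single replication and in the replication mean."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(n_plots, n_reps))
    single = noise[:, 0].var(ddof=1)           # one replication per genotype
    averaged = noise.mean(axis=1).var(ddof=1)  # mean over replications
    return single, averaged


single_var, averaged_var = noise_variances()
```

With three replications the averaged noise variance is close to one third of the single-replication value, so mean-based inputs sit nearer to the genotypic values the network is meant to classify.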
About the Authors
E.G. Silva Júnior
Instituto de Ciências Agrárias, Universidade Federal de Uberlândia, Uberlândia, MG, Brasil
- Barbosa CD, Viana AP, Silva S, Quintal R, et al. (2011). Artificial neural network analysis of genetic diversity in Carica papaya L. Crop Breed. Appl. Biotechnol. 11: 224-231. https://doi.org/10.1590/S1984-70332011000300004
- Barroso LMA, Nascimento M, Nascimento ACC, Silva FF, et al. (2013). Uso do método de eberhart e russell como informação a priori para aplicação de redes neurais artificiais e análise discriminante visando a classificação de genótipos de alfafa quanto à adaptabilidade e estabilidade. Rev. Bras. Biometria 31: 176-188.
- Beale MH, Hagan MT and Demuth HB (2015). Neural Network Toolbox™ User’s Guide. The MathWorks, Inc., Natick.
- Bonifácio DOC, Mundim FM and Sousa LB (2015). Variabilidade genética e coeficiente de determinação em genótipos de algodoeiro quanto a qualidade da fibra. Rev. Verde Agroecol. Desenvolv. Sustent. 10: 66-71. https://doi.org/10.18378/rvads.v10i3.3618
- Braga AP, Carvalho ACPLF and Ludemir TB (2011). Redes Neurais Artificiais - Teoria e Aplicações. 2nd ed. LTC, Rio de Janeiro.
- Carneiro VQ (2015). Rede neural e lógica fuzzy aplicadas no melhoramento do feijoeiro. 2015. 91f. Dissertação (Mestrado em Genética e Melhoramento) - Universidade Federal de Viçosa, Viçosa.
- Costa JN, Santana JCF, Wanderley MJR, Andrade JEO, et al. (2006). Padrões Universais para Classificação do Algodão. Campina Grande, Embrapa Algodão, 22.
- Cotton Incorporated (2017). Did you know? Available at [http://www.cottoninc.com/] Accessed March 10, 2017.
- Cruz CD (2016). Genes Software-extended and integrated with the R, Matlab and Selegen. Acta Sci. Agron. 38: 547-552. https://doi.org/10.4025/actasciagron.v38i3.32629
- De Araújo LF, Bertini CHCM, Bleicher E, Vidal Neto FC, et al. (2013). Características fenológicas, agronômicas e tecnológicas da fibra em diferentes cultivares de algodoeiro herbáceo. Agraria 8: 448-453. https://doi.org/10.5039/agraria.v8i3a2732
- Gabriel D and Blanco FMG (2009). Efeito de genótipos com características morfológicas mutantes sobre o bicudo e a produção do algodoeiro. Arq. Inst. Biol. (Sao Paulo) •••: 211-215.
- Gharehaghaji AA, Shanbeh M and Palhang M (2007). Analysis of two modeling methodologies for predicting the tensile properties of cotton-covered nylon core yarns. Text. Res. J. 77: 565-571. https://doi.org/10.1177/0040517507078061
- Ghosh A, Ishtiaque S, Rengasamy S, Mal P, et al. (2005). Predictive models for strength of spun yarns: An overview. AUTEX Res. J. 5: 20-29.
- Haykin S (2008). Neural Networks and Learning Machines. 3rd ed. Pearson-Prentice Hall, Hamilton.
- Jackowska-Strumillo L, Jackowski T, Cyniak D and Czekalski J (2004). Neural model of the spinning process for predicting selected properties of flax/cotton yarn blends. Fibres Text. East. Eur. 12: 17-21.
- Kavzoglu T and Mather P (2003). The use of backpropagation artificial neural networks in land cover classification. Int. J. Remote Sens. 24: 4907-4938. https://doi.org/10.1080/0143116031000114851
- Morello CL and Freire EC (2005). Estratégias para o melhoramento genético do algodoeiro no Brasil. In: Congresso Brasileiro de Algodão.
- Nascimento M, Peternelli LA, Cruz CD, Campana ACN, et al. (2013). Artificial neural networks for adaptability and stability evaluation in alfalfa genotypes. Crop Breed. Appl. Biotechnol. 13: 152-156. https://doi.org/10.1590/S1984-70332013000200008
- Rosolem CA (2001). Informações Agronômicas no 95: Ecofisiologia e manejo da cultura do algodoeiro. Faculdade de Ciências Agronômicas, UNESP, Botucatu, 1.
- Salgado CC, Castro LHS, Lemes EM and Silva Júnior EG (2015). Melhoramento genético do algodoeiro visando à qualidade da fibra. In: Agronegócio sustentável (Silva JC, Silva AAS, Assis RT and Fravet PRF, eds.). Composer Uberlândia, 147-172.
- Silva GN, Tomaz RS, Castro I, Anna S, et al. (2014). Neural networks for predicting breeding values and genetic gains. Sci. Agric. 71: 494-498. https://doi.org/10.1590/0103-9016-2014-0057
- Sudheer KP, Gosain AK and Ramasastri KS (2003). Estimating actual evapotranspiration from limited climatic data using neural computing technique. J. Irrig. Drain. Eng. 129: 214-218. https://doi.org/10.1061/(ASCE)0733-9437(2003)129:3(214)
- Ureyen ME and Kadoglu H (2007). The prediction of cotton ring yarn properties from AFIS fibre properties by using linear regression models. Fibres Text. East. Eur. 15: 63-67.
- Zabot L (2007). A cultura do algodão: (Gossypium hirsutum L.). Centro de Ciências Rurais, Santa Maria, 15-37.