Electronic ISSN 2145-9371
RESEARCH ARTICLE
Image analysis for automatic feature estimation of the Mangifera indica fruit
Correspondence: Germán Sánchez Torres, Cra. 32 No. 22-08, Universidad del Magdalena, Santa Marta (Colombia). Edificio Docente, Cub. 3D401, Tel: (57 - 5) 4217940, Ext: 1138. firstname.lastname@example.org
Determination of fruit features such as weight, level of maturity and level of spots is an essential step to fulfill export requirements. This paper presents a methodology to estimate these characteristics from a set of images obtained from mango fruits. The methods for determining the level of maturity and level of spots showed an overall accuracy above 90%, while the error margin for the weight determination process did not exceed 6 grams.
Keywords: Degree of maturity, Mangifera indica L, spot detection, weight estimation.
In today's highly competitive industry, companies are compelled to innovate in their processes, mechanisms, tools and technology to raise performance while maintaining strict quality standards. Production processes, however, remain highly dependent on human intervention. The fruit production sector has traditionally relied on subjective judgment to assess compliance with product standards, even though the weaknesses of subjective assessment are widely known. Several examples of the successful incorporation of technology into production processes can be found in the literature (see ,  and ).
There is extensive research on automating similar processes for a variety of products, such as apples , , strawberries , peaches , potatoes , and other products , . The reported results on automating processes previously performed by hand have been satisfactory.
Godoy et al.  present an inspection system designed to classify the Chontaduro fruit (Bactris gasipaes) for use in derivatives. The classification is based on fruit color, size and shape, and on the presence of defects. Conveyor bands transport and rotate the fruit so that a set of images covering its whole surface can be taken. Classification is performed with a minimum-distance classifier based on K-Means, and the system reports an overall precision of 96%. Mosquera et al.  present experimental results of the spectral analysis of optical images of cherry coffee fruits. The spectral processing was performed in the visible range of the electromagnetic spectrum through an acousto-optic filter, with shorter processing times than classical digital processing approaches. Atencio et al.  present a method for mango classification based on visual inspection, according to Colombian Technical Standard NTC 5139 . The classification is based on the automatic estimation of physical fruit properties such as volume, weight, size and maturity level, which are estimated by means of principal component analysis and a three-dimensional ellipsoidal model. Because the method processes a single image of the fruit, it is difficult to obtain an accurate estimate of the degree of maturity or a complete analysis of spots. Kang et al.  examine digital color measurement based on the angular position and radial distance in the CIELab color space, analyzing surface hue to determine the state of maturation of a mango.
Regardless of the model proposed, the reported accuracy levels leave room for improvement toward a more accurate classification process. Atencio et al.  reported a weight estimation error of about ±11.6 g, and Kang et al.  reported an error of 2.5° in maturity estimation. Errors of this size alter the classification of fruits that lie near the limits of each category. The core of these limitations is the use of a single image for fruit feature estimation.
We address the implementation of a system that characterizes and classifies mango fruits (Mangifera indica L) according to the parameters required by international and national market standards . To improve accuracy, the approach estimates fruit features (weight, maturity level and spot quantification) from multiple images of each fruit. The initial step comprises a set of methods that aim to capture the best visual information of the scene and convert it into digital information. The digital information is pre-processed to improve image quality and highlight the characteristics of interest. A segmentation step then isolates the fruit from the background. Afterward, the method estimates fruit characteristics such as weight, color and degree of maturity. Finally, the system evaluates these features in order to classify the fruit according to preset parameters.
The proposed methods begin with a set of acquired images of the fruit. The camera is connected to the computer through video adapters over an IEEE 1394 serial bus. A high resolution allows for better detail analysis, but because increasing the resolution also increases the processing load, we set the image resolution at 480x352 pixels. Lighting was provided by a diffuse front lighting system with four 6-watt white-light lamps (see figure 1).
To determine the main characteristics of the fruit from the images, we followed a series of steps: pre-processing, weight estimation, estimation of the degree of maturation, and spot measurement. Each step was implemented with Matlab 2011a and Statistics Toolbox functions.
We applied traditional methods for image enhancement, de-noising and edge detection. This stage seeks to separate the fruit from the image background. To achieve this separation we performed color segmentation. In order to design a more accurate method to achieve this objective, we studied different color spaces, especially those in which color information is distinguishable from the intensity component, such as HSI (Hue, Saturation and Intensity) or YCbCr (Luminance and Chroma components) color spaces.
We chose the YCbCr color space because it allows for more effective segmentation than other color spaces, resulting in a clear distinction between fruit and background colors. Colors in a mango fruit are usually in the green to red color spectrum. In the YCbCr color space, this range lies in the second and third quadrants (see figure 2), in the negative segment [-1, 0] of the Cb channel. Consequently, restricting the analysis to this region makes the separation between fruit and background straightforward.
Because only the Cb channel information is relevant, a complete transformation of the RGB image is unnecessary. For efficiency, the following function obtains the Cb channel information from an RGB image:
where R(x, y), G(x, y) and B(x, y) are integers in the range [0, 255].
The result of the previous function is normalized to the range 0 to 255, so that the negative range observed in figure 2 maps to values between 0 and 127. The result of applying this transformation to an original image can be observed in figure 3.
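As an illustration of this step, the sketch below computes a normalized Cb value for each pixel in Python (the authors worked in Matlab). The coefficients are the full-range ITU-R BT.601 values used by JFIF, which is an assumption: the exact function used in the paper is not reproduced here. The names `cb_channel` and `cb_image` are hypothetical.

```python
def cb_channel(r, g, b):
    """Return the Cb chroma of one RGB pixel, normalized to 0..255.

    Assumption: full-range BT.601 (JFIF) coefficients. The paper's exact
    coefficients are not known, so treat these values as illustrative.
    """
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b  # roughly -127.5 .. 127.5
    # Shift by 128 so negative chroma lands in 0..127, then clamp to a byte.
    return max(0, min(255, round(cb + 128)))


def cb_image(rgb_rows):
    """Apply cb_channel to an image given as rows of (R, G, B) tuples."""
    return [[cb_channel(r, g, b) for (r, g, b) in row] for row in rgb_rows]


if __name__ == "__main__":
    # Hypothetical 1x3 image: a green pixel, a red pixel, a white pixel.
    print(cb_image([[(0, 255, 0), (255, 0, 0), (255, 255, 255)]]))
```

Under these assumed coefficients, green and red pixels fall in the lower half of the normalized channel, while a neutral (white or gray) background sits at the midpoint.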
After this transformation, a threshold segmentation procedure is applied. The segmented image g(x, y) is:
$$g(x, y) = \begin{cases} 1, & Cb(x, y) < U \\ 0, & \text{otherwise} \end{cases}$$
Each image pixel (x, y) for which Cb(x, y) < U belongs to the fruit; otherwise, the pixel is part of the background, as shown in figure 4. We determined the value of U by histogram analysis of a set of images and set it to U = 120. This value is consistent with the earlier observation that fruit colors fall in the lower half (0 to 127) of the normalized Cb channel.
From the resulting image (figure 4b), features related to fruit geometry, such as weight or volume, can be extracted. However, the estimation of the degree of maturation and the analysis of spots require an enhanced image (see figure 5). This image (IM1) is the pixel-by-pixel product of the segmented image (ImSeg) and the original image, that is:
$$IM1(x, y) = ImSeg(x, y) \cdot I(x, y)$$
where I(x, y) denotes the original image and the product is applied to each color channel.
The result is an image in which points belonging to the background are zero (black), and those corresponding to the fruit have their original color.
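The thresholding and masking steps can be sketched in a few lines of Python. The threshold U = 120 comes from the text; the function names and the list-of-rows image representation are assumptions made for this illustration.

```python
U = 120  # threshold on the normalized Cb channel, as reported in the text


def segment(cb_rows, threshold=U):
    """Binary mask: 1 where Cb < threshold (fruit), 0 elsewhere (background)."""
    return [[1 if cb < threshold else 0 for cb in row] for row in cb_rows]


def apply_mask(rgb_rows, mask_rows):
    """Multiply each RGB pixel by its mask value, blacking out the background."""
    return [
        [(r * m, g * m, b * m) for (r, g, b), m in zip(rgb_row, mask_row)]
        for rgb_row, mask_row in zip(rgb_rows, mask_rows)
    ]


if __name__ == "__main__":
    # Hypothetical 1x2 example: one fruit pixel and one background pixel.
    cb = [[44, 128]]
    rgb = [[(10, 200, 30), (255, 255, 255)]]
    mask = segment(cb)
    print(mask)                   # [[1, 0]]
    print(apply_mask(rgb, mask))  # [[(10, 200, 30), (0, 0, 0)]]
```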
The weight of the fruit can be derived from an estimate of its volume. The estimate relies on spatial geometry: the total volume is the sum of the volumes of all the sections formed by transverse cuts along the length of the fruit (see figure 6). If the height h of each cross section is small enough, the volume of each section can be approximated by the volume of an elliptic cylinder, thus:
$$V \approx \sum_{i} V_i, \qquad V_i = \pi \, a_i \, b_i \, h$$
where a_i and b_i are the semi axes of the i-th section.
For the Mangifera indica L fruit, a ratio analysis of the semi axes a and b at different points along the length of the fruit (see figure 7) allows us to conclude that there is a constant relationship between the values of the semi axes, which we called Depth Factor (Df):
$$Df = \frac{a}{b}$$
In this way, the value of semi axis a can be expressed in terms of b and Df, as a = Df · b.
To determine the value of semi axis b, we only have to count the number of on pixels in each row and divide it by two, which gives half of the width of the fruit, in pixels, at that image section.
Given a binary image f(x, y) of size m x n, for each row y:
$$b(y) = \frac{1}{2} \sum_{x} f(x, y)$$
where the sum runs over all the pixels x of row y.
To complete the volume estimation process, the length of the minor axis in each of the pre-processed images was calculated with the regionprops function of the Matlab Image Processing Toolbox. This function accepts the MinorAxisLength property, which returns the length in pixels of the minor axis of the ellipse that has the same normalized second central moments as the object in the image, thus:
Minoraxis = regionprops(B, 'MinorAxisLength')
where B is the image that contains the object to analyze.
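To make the moment-based definition concrete, the following Python sketch computes the axis lengths of the ellipse with the same normalized second central moments as a binary object. It is an approximation of what MinorAxisLength reports, not the toolbox code itself: the 1/12 term (the second moment of a unit-square pixel) is an assumption about how pixel extent is handled, and `axis_lengths` is a hypothetical name.

```python
import math


def axis_lengths(mask):
    """Return (major, minor) axis lengths, in pixels, of the ellipse with the
    same normalized second central moments as the on pixels of a binary mask."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        raise ValueError("mask contains no object pixels")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Normalized second central moments; 1/12 accounts for the pixel's own area
    # (an assumption, see the lead-in).
    uxx = sum((x - mx) ** 2 for x, _ in pts) / n + 1 / 12
    uyy = sum((y - my) ** 2 for _, y in pts) / n + 1 / 12
    uxy = sum((x - mx) * (y - my) for x, y in pts) / n
    common = math.sqrt((uxx - uyy) ** 2 + 4 * uxy ** 2)
    major = 2 * math.sqrt(2) * math.sqrt(uxx + uyy + common)
    minor = 2 * math.sqrt(2) * math.sqrt(uxx + uyy - common)
    return major, minor


if __name__ == "__main__":
    # Hypothetical object: a horizontal bar, 5 pixels long and 1 pixel high.
    print(axis_lengths([[1, 1, 1, 1, 1]]))
```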
Once we get the minor axis length of each image, determining the other axis is straightforward (see figure 8). This axis combination determines a front view of the fruit. The relation between the major and minor axis of a set of images defined the depth factor Df.
After obtaining the volume V, the weight P of the fruit is:
$$P = d \cdot V$$
where d is the average density obtained from the analysis of a set of images of real fruits with known weight and volume. The weight-based classification process is carried out according to the values in table 1.
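The slice-based estimate can be sketched as follows in Python, treating each image row as one elliptic-cylinder section. Several things here are assumptions made for illustration: that the semi axes are related by a = Df · b, that one slice corresponds to one pixel row, and the parameter names `depth_factor`, `mm_per_pixel` and `density_g_per_mm3`, none of which come from the paper.

```python
import math


def estimate_weight(mask, depth_factor, mm_per_pixel, density_g_per_mm3):
    """Estimate fruit weight (grams) from a binary silhouette.

    mask: rows of 0/1 values, 1 marking fruit pixels.
    Each row is one section of height h = mm_per_pixel.
    """
    h = mm_per_pixel
    volume = 0.0
    for row in mask:
        b = sum(row) / 2 * mm_per_pixel   # semi axis seen in the image
        a = depth_factor * b              # assumed relation a = Df * b
        volume += math.pi * a * b * h     # elliptic cylinder volume
    return density_g_per_mm3 * volume     # P = d * V


if __name__ == "__main__":
    # Hypothetical 2x2 silhouette with unit scale, depth factor and density.
    print(estimate_weight([[1, 1], [1, 1]], 1.0, 1.0, 1.0))  # 2 * pi
```

In practice the scale factor and density would have to be calibrated against fruits of known weight, as the text describes for d.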
Maturity level estimation
The level of maturity is a decisive factor in fruit classification for export and a key factor in determining conservation policies. We propose a non-invasive method to determine the maturity level by analyzing the color of the surface of the fruit. This analysis provides a criterion for classification according to the color specifications described in NTC 5139. For color treatment, we selected the HSI color space because it separates the color information, carried by the hue and saturation channels, from the intensity channel. In its standard form, the transformation from RGB to HSI space is:
$$H = \begin{cases} \theta, & B \le G \\ 360^\circ - \theta, & B > G \end{cases} \qquad S = 1 - \frac{3 \min(R, G, B)}{R + G + B} \qquad I = \frac{R + G + B}{3}$$
where
$$\theta = \cos^{-1}\left( \frac{\tfrac{1}{2}\left[ (R - G) + (R - B) \right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right)$$
After the transformation to HSI, we studied how the channel distributions vary by analyzing the histograms of a set of images. This analysis showed that the mean values of channels H and S are sensitive to changes in the color state of the fruit.
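The feature extraction described here can be illustrated with the Python sketch below, which converts RGB pixels to HSI with the standard textbook formulas and averages hue and saturation over the fruit pixels. Skipping pure black pixels as background and using a plain arithmetic mean of hue are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, intensity 0..1)."""
    r, g, b = r / 255, g / 255, b / 255
    total = r + g + b
    if total == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined
    intensity = total / 3
    saturation = 1 - 3 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        hue = 0.0  # gray: hue is undefined, 0 is used by convention
    else:
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = theta if b <= g else 360 - theta
    return hue, saturation, intensity


def mean_hue_saturation(pixels):
    """Average hue and saturation over non-black pixels (assumed to be fruit)."""
    hs = [rgb_to_hsi(*p)[:2] for p in pixels if p != (0, 0, 0)]
    if not hs:
        raise ValueError("no fruit pixels")
    return sum(h for h, _ in hs) / len(hs), sum(s for _, s in hs) / len(hs)


if __name__ == "__main__":
    # Hypothetical pixels: one red, one green, one masked-out background pixel.
    print(mean_hue_saturation([(255, 0, 0), (0, 255, 0), (0, 0, 0)]))
```

Hue is an angle, so an arithmetic mean can mislead when values straddle 0°/360°; a circular mean would be a safer choice for strongly red fruit.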
For fruits with a lower maturity level, the average hue lies toward the right of the histogram, while values further to the left indicate an increasing degree of maturation (see figure 9). Based on these findings, we designed a Bayesian classifier whose features are the overall averages of the hue and saturation channels over the five images obtained from each fruit. We used the NaiveBayes.fit function in Matlab to train a NaiveBayes classifier from an array of data describing the features of each sample and its category, so that a probability density function is fitted for each class. We used the default Normal (Gaussian) distribution to model the data.
The training set comprised 15 fruits, each described by two features, mean hue and mean saturation, estimated from a set of 5 images per fruit; the classes were assigned by an expert. The prior probabilities of the classes were estimated from the relative frequencies of the classes in the training data. Finally, the predict method returns the category to which a fruit belongs given its hue and saturation values.
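A Gaussian naive Bayes classifier of this kind is small enough to sketch directly. The Python version below is not the Matlab implementation the authors used: it fits one mean and variance per feature and class, takes priors from class frequencies, and predicts the class with the highest log posterior. The training values and class labels are hypothetical and serve only to exercise the code.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Fit per-class priors, feature means and feature variances.

    Each class needs at least two samples so the variance is defined.
    """
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        cols = list(zip(*rows))
        means = [sum(col) / n for col in cols]
        # Sample variance with a small floor to avoid division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / (n - 1), 1e-9)
            for col, m in zip(cols, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest Gaussian naive Bayes log posterior."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda c: log_posterior(*model[c]))


if __name__ == "__main__":
    # Hypothetical (mean hue, mean saturation) features and expert labels.
    X = [(100, 0.60), (95, 0.62), (105, 0.58), (40, 0.80), (35, 0.82), (45, 0.78)]
    y = ["green", "green", "green", "ripe", "ripe", "ripe"]
    model = fit_gaussian_nb(X, y)
    print(predict(model, (98, 0.60)), predict(model, (42, 0.79)))
```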
Spots are another crucial factor in the classification process because they directly reflect conditions that affect fruit quality, such as the presence of pests or general irregularities in handling. We implemented a method intended to simulate the decision criteria of experts in the field. The main idea is to analyze the distribution and proportion of spots on the fruit's surface so that a comprehensive description of its status can be obtained (e.g. no spots, light spots or prominent spots). As in the other methods, the first step was to choose a color space in which the spots can be discriminated from the rest of the fruit. We chose the saturation component S of the HSI color space because relevant spots on fruits are mostly of gray and black shades, which represent a high saturation value, thus allowing for a clear distinction of these spots from the rest of the fruit (see figure 10a).
In order to determine the threshold that would allow for an adequate segmentation of the spots we analyzed a histogram of the image (see figure 11). Then we obtained a binarized image, as shown in figure 10b.
From the binarized image, we proceeded to determine the characteristics from which to evaluate possible spots on the fruit. After a visual analysis of the binary information, we identified the following factors:
• Inspected area: the area, in pixels, that the fruit region would cover if it had no spots, that is, the area of the fruit including all of its internal spots (here, connected pixels equal to 0 inside the fruit represent the spots).
• Spot area: the area of dark pixels inside the fruit. These pixels become visible after combining the negative of the original image and the binarized image, as shown in figure 12.
• Number of spots: the number of holes in the fruit region.
• Percentage of spots: the ratio between the total area of the spots on the fruit and the whole inspected area.
We then defined classification categories according to the amount of spots. With the assistance of a classification expert, we identified the following categories:
• None or very mild: no spots present, or only natural blemishes of the fruit.
• Mild: few natural spots that do not involve a large surface area.
• Medium: numerous spots spread over a large percentage of the surface of the fruit.
• High: excessive spots affecting a large percentage of the fruit's surface, usually an indicator of pest presence.
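The spot measurements that feed these categories can be sketched in Python as follows. The sketch assumes a binarized image in which 1 marks unspotted fruit pixels and 0 marks both background and spots, and it treats every connected region of zeros that does not touch the image border as one spot (a hole). The 4-connectivity choice and the name `spot_metrics` are assumptions, and the mapping from percentage to category is left out because the text does not give the cut-off values.

```python
from collections import deque


def spot_metrics(binary):
    """Return inspected area, spot area, spot count and spot percentage.

    binary: rows of 0/1 values; holes are 4-connected regions of zeros
    that do not touch the image border.
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    spot_area = 0
    spot_count = 0
    for y0 in range(rows):
        for x0 in range(cols):
            if binary[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill one connected region of zeros.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            size = 0
            touches_border = False
            while queue:
                y, x = queue.popleft()
                size += 1
                if y in (0, rows - 1) or x in (0, cols - 1):
                    touches_border = True
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not touches_border:  # enclosed by fruit pixels: a spot
                spot_count += 1
                spot_area += size
    inspected = sum(map(sum, binary)) + spot_area
    percentage = 100 * spot_area / inspected if inspected else 0.0
    return {"inspected_area": inspected, "spot_area": spot_area,
            "spot_count": spot_count, "spot_percentage": percentage}


if __name__ == "__main__":
    # Hypothetical 5x5 image: a 3x3 fruit with a single one-pixel spot.
    grid = [[0, 0, 0, 0, 0],
            [0, 1, 1, 1, 0],
            [0, 1, 0, 1, 0],
            [0, 1, 1, 1, 0],
            [0, 0, 0, 0, 0]]
    print(spot_metrics(grid))
```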
To evaluate the weight estimation method, we carried out a comparative analysis between the real and estimated weights of a set of 120 fruits. The sample was selected at random from the working set of fruits. An expert performed the measurements and the selection process.
Figure 13 shows that the estimated weight behavior is similar to the real weight behavior. The mean square error was 5.34 grams, the standard deviation was 2.71 and the correlation coefficient was 0.997.
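For readers who want to reproduce this kind of comparison on their own measurements, the Python sketch below computes a root mean square error, a mean absolute error and the Pearson correlation coefficient between real and estimated weights. Which exact error definition the reported 5.34 g corresponds to is not stated, so both are returned; the sample weights are hypothetical.

```python
import math


def weight_error_stats(real, estimated):
    """Return (rmse, mean_abs_error, pearson_r) for paired weight lists."""
    n = len(real)
    if n < 2 or n != len(estimated):
        raise ValueError("need two lists of equal length with at least 2 items")
    errors = [e - r for r, e in zip(real, estimated)]
    rmse = math.sqrt(sum(d * d for d in errors) / n)
    mean_abs = sum(abs(d) for d in errors) / n
    mean_r = sum(real) / n
    mean_e = sum(estimated) / n
    cov = sum((r - mean_r) * (e - mean_e) for r, e in zip(real, estimated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return rmse, mean_abs, cov / math.sqrt(var_r * var_e)


if __name__ == "__main__":
    # Hypothetical real and estimated weights, in grams.
    print(weight_error_stats([100, 200, 300], [110, 190, 310]))
```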
In order to determine the main causes of loss of accuracy, we selected and analyzed the fruits with the biggest discrepancies. We determined that the most important loss of accuracy corresponded to irregular turns during the rotation process, which prevented the acquisition of the appropriate profiles of the fruit.
Additionally, since the system is set to work with a set of five images, it was desirable to measure the effect of the number of images. To this end, we selected a random set of fruits, on which we performed 10 measurements with 5, 6, 7 and 10 images. As shown in figure 14, the accuracy of the estimation method increased with the number of acquired images; however, this increase translates into a higher computational load and consequently into more processing time. Therefore, a trade-off between accuracy and speed must be considered when deciding the number of images to use.
Finally, we classified each fruit by estimated weight. The method classified 114 fruits correctly and 6 fruits incorrectly, allowing us to estimate an effectiveness of 95%. Similarly, for evaluating the color-based classification, we used a fruit set with known maturity level to define the Hue-saturation classification guide shown in figure 15. The proposed method had an effectiveness of 96.47% compared to a fruit set classified by an expert.
The results of the classification based on spot occurrence were evaluated in a similar way. We analyzed the performance of the classifier by comparing the system response to the expert evaluation. From a set of 150 fruits, the method correctly classified 136, for an effectiveness of 90.66%. The confusion matrix for the spot-based classification is shown in Table 2.
The results presented above allow us to conclude that the proposed fruit classification methods based on digital image processing techniques adequately classify the fruits. The correlation coefficient of the weight-based classification showed that this is a highly accurate model for weight estimation. The spot-based classification showed lower accuracy, because estimation of spots is highly dependent on fruit position during image acquisition. The distribution of spots on the fruit's surface is not uniform; therefore, accomplishing complete surface representation in the fruit image set is a key aspect if a high level of accuracy is needed.
As future work, we aim to port the code to a native language in order to perform the processing on microcontrollers, thus allowing the device to operate independently of a computer.
Another interesting direction is to propose a different model for acquiring the set of fruit images. The rotation-based procedure introduced errors during the acquisition of the fruit's different profiles, because rotations by exact angles are not guaranteed: the result of each rotation depends on fruit shape and weight. A mechanical implementation of this process could also be time consuming.
*** Professor, Universidad del Magdalena, Santa Marta (Colombia). Research and Development Group in New Information and Communication Technologies (GIDTIC). PhD in Systems Engineering, Universidad Nacional de Colombia. email@example.com
A. Al-mallahi, T. Kataoka, H. Okamoto and Y. Shibata, "An image processing algorithm for detecting in-line potato tubers without singulation," Computers and Electronics in Agriculture, vol. 70, pp. 239-244, 2010.
P. Atencio, G. Sánchez and J. Branch, "Automatic visual model for classification and measurement of quality of fruit: case Mangifera Indica L," Dyna, vol. 76, no. 160, pp. 317-326, 2009.
B. Jarimopas and N. Jaisin, "An experimental machine vision system for sorting sweet tamarind," Journal of Food Engineering, vol. 89, pp. 291-297, 2008.
C. Puchalsk, J. Gorzelany, G. Zagula and G. Brusewitz, "Image analysis for apple defect detection," Biosystems and Agricultural Engineering, vol. 8, pp. 197-205, 2008.
D. Unay and B. Gosselin, "Artificial neural network-based segmentation and apple grading by machine vision," in IEEE International Conference on Image Processing (ICIP 2005), vol. 2, pp. 630-633, 2005.
X. Liming and Z. Yanchao, "Automated strawberry grading system based on image processing," Computers and Electronics in Agriculture, vol. 71, pp. s32-s39, 2010.
S. Godoy, L. Pencue, A. Ruiz and D. Montilla, "Clasificación automática del chontaduro (Bactris Gassipaes) para su aplicación en conserva, mermelada y harinas," Revista biotecnológica en el sector agropecuario y agroindustrial, Facultad de Ciencias Agropecuarias, vol. 5, no. 2, pp. 137-146, 2007. ISSN 1909-9959.
J. Hofstee and G. Molema, "Volume estimation of potatoes partly covered with dirt tare," in American Society of Agricultural and Biological Engineers, pp. 12, 2003.
A. Loureiro, J. Sanchez and D. Fabbro, "Image processing techniques for lemons and tomatoes classification," Journal of food and engineering, vol. 83(4), pp. 433-440, 2008.
J. Mosquera, A. Sepúlveda and C. Isaza, "Procesamiento de imágenes ópticas de frutos café en cereza por medio de filtros acusto-ópticos," Revista Ingeniería y Desarrollo, vol. 21, pp. 94-102, 2007.
NTC 5139, Instituto Colombiano de Normas Técnicas, 2002.
S. Kang, A. East and F. Trujillo, "Colour vision system evaluation of bicolour fruit: a case study with 'B74' mango," Postharvest Biology and Technology, vol. 49, pp. 77-85, 2008.
Ingeniería y Desarrollo