Avocado fruits of the ‘Hass’ cultivar were harvested from orchards in Uruapan, Michoacán, México (19°25′ N, 102°03′ W; 1620 m AMSL), the major commercial avocado-producing region in the country. Samples for the different experiments were collected and studied between 2011 and 2014. After collection, avocados were kept overnight at ambient conditions in perforated bags and shipped in closed containers with activated carbon. Samples for the first studies, described below, were collected in 2013. Upon arrival at the Centro de Biotecnología FEMSA, these avocados were divided into three subsets for the fruit growth (Study I), postharvest ripening (Study II), and idioblast isolation (Study III) studies. As described in the following sections, additional avocado samples were analyzed in Studies IV and V to study the effects of seed germination and harvesting season, respectively, on acetogenin profiles.
Study I. Fruit growth – To characterize changes during fruit growth (Additional file 1: Figure S1, first part), avocado fruits were selected and separated by weight directly at the Uruapan orchard. Samples were grouped by two methods: by fresh weight, to assess growth-related changes, into 10 categories (45, 60, 90, 120, 150, 180, 210, 240, 270, and 300 g) of three replicates each; and by dry matter, to investigate the relation to oil content. Upon arrival at the laboratory, the selected avocados were stored at 4 °C for 48 h, then frozen at −20 °C for another 2 days, and finally stored at −80 °C.
Study II. Postharvest ripening – Fully developed avocados (300 g), harvested at the same time as those for Study I, were placed in a temperature-controlled, ventilated environment until they reached one of three target ripening stages, in order to assess the postharvest evolution of lipids (Additional file 1: Figure S1, second part). Fruits were classified on a hedonic scale as Unripe (one week after detachment; peel still green), Breaker (two weeks after detachment; peel half black, half green; mesocarp still hard but softening), or Ripe (two and a half weeks after detachment; peel black and mesocarp soft, ready to eat), with three replicates per stage. When samples reached each stage, they were frozen and stored as described in Study I.
Study III. Idioblast characterization – To characterize acetogenin contents and distribution, idioblasts were isolated from the same samples used in Study I at every other growth stage and from those of Study II at all postharvest ripening stages. Idioblasts were also isolated from 5 mature green (300 g) avocados to simultaneously assess acetogenin and fatty acid profiles. To avoid compromising cell integrity, idioblast extraction was performed as soon as avocados reached the desired stage. Samples were sliced, digested, and fractionated to isolate idioblasts as described below, and the remaining tissue was frozen at −20 °C until further analysis. Thus, each idioblast replicate has a corresponding mesocarp and seed counterpart from Studies I and II.
Study IV. Germination studies – A set of samples collected in October 2011 was used for seed germination studies. Germination was conducted as previously reported, with minor adjustments. Briefly, seeds from fully ripe black fruit were washed with soap and stored for three days at 4 °C in closed plastic bags filled with sterilized peat moss to prevent dehydration. Prior to germination, seeds were rinsed and decorticated (to remove the seed coats), and a horizontal cut was made at the base of the embryo without damaging the embryonic axis. Cut seeds were placed in a container with water covering one quarter of the seed. Seedlings were transferred to individual Magenta™ vessels as they grew and were kept inside a growth chamber (23 °C, 18:6 h light/dark cycle). Embryonic axis (plumule and radicle) and cotyledon sub-samples were collected in triplicate at the beginning of the experiment (day 0), twice a week after germination over a 3-week period (days 7 to 24), and after 10 weeks of imbibition (day 70). Sub-samples were flash-frozen in liquid nitrogen and stored at −80 °C until further analysis.
Study V. Seasonal effects – To investigate possible changes in acetogenin profiles across harvesting years, sampling was conducted in October 2011, June 2013, and April 2014. Fruits for this study were collected at the mature green stage and stored as described above. Studies I, II, and V included acetogenin determination in both mesocarp and seed tissues.
Determination of moisture content
Dry weight was determined by weighing 5 g of material (seed or mesocarp), cutting it into thin slices (mesocarp) or small cubes (seed), and incubating it at 105 °C until constant weight was achieved (typically 5–6 h).
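Moisture and dry matter follow directly from the fresh and oven-dried weights. A minimal Python sketch of that arithmetic is given below. The 1 mg criterion used to decide that constant weight was reached is an illustrative assumption and is not stated in this section.

```python
def moisture_content(fresh_g: float, dry_g: float) -> float:
    """Moisture as a percentage of fresh weight: 100 * (FW - DW) / FW."""
    if fresh_g <= 0 or not 0 <= dry_g <= fresh_g:
        raise ValueError("expected fresh weight > 0 and 0 <= dry weight <= fresh weight")
    return 100.0 * (fresh_g - dry_g) / fresh_g


def dry_matter(fresh_g: float, dry_g: float) -> float:
    """Dry matter as a percentage of fresh weight (the complement of moisture)."""
    return 100.0 - moisture_content(fresh_g, dry_g)


def reached_constant_weight(weighings: list, tol_g: float = 0.001) -> bool:
    """Hypothetical stopping rule: the last two successive weighings differ by <= tol_g."""
    return len(weighings) >= 2 and abs(weighings[-1] - weighings[-2]) <= tol_g


# Example: a 5.000 g mesocarp sample that dries down to 1.250 g
print(moisture_content(5.0, 1.25), dry_matter(5.0, 1.25))  # 75.0 25.0
print(reached_constant_weight([5.0, 1.40, 1.262, 1.2505]))  # False
print(reached_constant_weight([5.0, 1.40, 1.262, 1.2505, 1.2502]))  # True
```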
Idioblasts were fractionated as described elsewhere [29, 59], with modifications. Briefly, a 5 g slice of avocado mesocarp was cut into small pieces and briefly homogenized at 11,000 rpm in 10 mL of buffer (10 mM MES, 100 mM sorbitol, 1 mM CaCl2, 0.2% BSA, and 0.2% DTT, pH 5.5) containing lytic enzymes (Cellulase Onozuka RS, 165 units/mL, and Macerozyme R10 mix, 15 units/mL, final concentrations; Phytotechnology Laboratories). The headspace was flushed with nitrogen to remove oxygen, and the homogenate was incubated for 2 h in the dark at 150 rpm and room temperature. The mixture was then briefly homogenized again, filtered sequentially through nylon mesh filters with pore sizes of 140 and 61 μm, and washed on the filters with enzyme-free buffer. Fraction F140 contained mainly vascular and undigested tissue, and fraction F60 was the idioblast-enriched fraction. Cell integrity and purity (absence of parenchyma cells) were checked by microscopy, which clearly differentiates intact from burst idioblasts, and lipid-containing idioblasts from parenchyma cells (described below).
Extractions were performed as previously described. Tissues were separated, mesocarp (2 g) was cut into even longitudinal slices, and cotyledons (1 g) were macerated while frozen. Idioblast-enriched fractions (recovered from 5 g of mesocarp) were analyzed directly. Extraction was achieved by adding 15 mL of acetone and homogenizing the samples with a Polytron homogenizer (Ultra-Turrax T25, IKA-Werke, Germany) for 3 min, followed by sonication for 1 min and clarification by centrifugation at 10,000 × g and 25 °C for 10 min. A 1 mL aliquot was then dried under nitrogen and redissolved in 2 mL of water, and 2 mL of dichloromethane was added. The organic phase was recovered, dried, resuspended in 1 mL of isopropanol, and filtered through a 0.2 μm PTFE filter for HPLC injection. Extraction was performed under dim light, and in every step after homogenization the headspace air was displaced with nitrogen gas.
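Because only a 1 mL aliquot of the 15 mL acetone extract reaches the vial, concentrations measured by HPLC must be scaled back to the tissue mass. The sketch below assumes that the total extract volume equals the acetone added (tissue water and volume losses are ignored) and that partitioning into dichloromethane is complete. Both are simplifying assumptions, and the function names are hypothetical.

```python
def tissue_content_ug_per_g(c_injected_ug_per_ml: float,
                            tissue_g: float,
                            v_resuspension_ml: float = 1.0,
                            v_extract_ml: float = 15.0,
                            v_aliquot_ml: float = 1.0) -> float:
    """Back-calculate analyte content (ug per g fresh tissue, i.e. mg/kg).

    Assumes the whole extract volume equals the acetone added and complete
    recovery in the water/dichloromethane partition.
    """
    if tissue_g <= 0 or v_aliquot_ml <= 0:
        raise ValueError("tissue mass and aliquot volume must be positive")
    amount_in_vial_ug = c_injected_ug_per_ml * v_resuspension_ml  # analyte in the isopropanol vial
    scale_to_extract = v_extract_ml / v_aliquot_ml                # the vial holds 1/15 of the extract
    return amount_in_vial_ug * scale_to_extract / tissue_g


def to_dry_weight_basis(content_fw: float, dry_matter_pct: float) -> float:
    """Convert a fresh-weight content to a dry-weight basis using dry matter (%)."""
    return content_fw * 100.0 / dry_matter_pct


# Example with hypothetical numbers: 50 ug/mL measured for a 2 g mesocarp extract
fw = tissue_content_ug_per_g(50.0, 2.0)
print(fw, to_dry_weight_basis(fw, 25.0))  # 375.0 1500.0
```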
Extracts were separated on a C18 column (Zorbax Extend-C18, 3 × 100 mm, 3.5 μm; Agilent, CA, USA) using an HPLC-VWD system (Series 1100; HP, CA, USA) and a gradient elution program with water (A) and methanol (B) as mobile phases, as previously described, with one minor modification: the column temperature was set to 35 °C. Chromatographic profiles were recorded at 220 nm, and identities were assigned by comparing retention times with those of the purified, NMR-confirmed compounds reported by Rodríguez-Sánchez et al. Calibration curves were generated by weight for every purified compound except Persenone A, for which an extinction coefficient is available. Only peak (3), an Unknown Putative Acetogenin (UPA), was quantified in Persenone A equivalents. Because Persin co-eluted with Persenone B, that chromatographic peak was treated as Persin plus Persenone B and quantified with the Persenone B calibration curve, since Persenone B is the moiety that absorbs the most at 220 nm (the absorption maximum of Persin is at 208 nm).
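External calibration by weight amounts to fitting a straight line of peak area against standard concentration, then inverting that line for each sample peak. The ordinary least-squares sketch below uses made-up standard values for illustration. It does not reproduce the extinction-coefficient quantification used for Persenone A.

```python
from typing import Sequence, Tuple


def fit_calibration(conc: Sequence[float], area: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of peak area = slope * concentration + intercept."""
    n = len(conc)
    if n < 2 or n != len(area):
        raise ValueError("need at least two paired standards")
    mean_x = sum(conc) / n
    mean_y = sum(area) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, area))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(peak_area: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to obtain the concentration for a sample peak."""
    return (peak_area - intercept) / slope


# Hypothetical Persenone B standards (ug/mL) and their 220 nm peak areas
slope, intercept = fit_calibration([10, 20, 40, 80], [105, 205, 405, 805])
# The co-eluting Persin + Persenone B peak is reported in Persenone B equivalents
print(quantify(305, slope, intercept))  # 30.0
```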
Fatty acid extraction
For lipid extraction, a modified Folch method was used in which the tissue or an idioblast fraction (0.5 g) was homogenized in 10 mL of dichloromethane:methanol (2:1) for 3 min, sonicated for 5 min, and left at room temperature for at least 10 min before centrifugation (10,000 × g, room temperature, 5 min). The clarified phase was then vigorously mixed with 2 mL of 0.9% NaCl and centrifuged (5000 × g, 2 min), and the recovered organic phase was evaporated. The remaining oil was resuspended in 4 mL of KOH (1 M in 96% ethanol) and left overnight at room temperature under a nitrogen atmosphere for saponification. The solution was then mixed with water (10 mL) and extracted 3 times with hexane-diethyl ether (1:1, 10 mL). The organic extract was washed with water (10 mL), which was then combined with the previous aqueous phase and acidified with HCl to pH 3. Fatty acids were recovered from the acidified phase by three subsequent extractions with hexane-diethyl ether (1:1, 10 mL). The organic extracts were evaporated to dryness, resuspended in isopropanol, and passed through a PTFE filter (0.2 μm) prior to injection.
Separation and detection were performed by HPLC-ELSD (1200 Series; Agilent) on a Luna C8(2) column (2.6 × 75 mm, 3.5 μm; Phenomenex) following vendor application No. 1258 with slight modifications. The solvent gradient was programmed to change from 70% to 90% acetonitrile in water during the first 10 min, and then to 100% acetonitrile by minute 11. This composition was held for 4 min before returning (at minute 15) to the initial conditions for 5 min before the next injection, all at a flow rate of 0.3 mL/min. The detector was set to 40 °C, with a gain of 4, no offset, a sampling rate of 0.1 s, and a gas pressure of 3.3 bar. Quantification was performed by comparing peak areas with calibration curves built from certified standards of each fatty acid (palmitic, palmitoleic, stearic, oleic, linoleic, and linolenic acids), purchased from Sigma-Aldrich (St. Louis, MO, United States).
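The gradient description can be encoded as a list of time and composition breakpoints. In the sketch below, the return to 70% acetonitrile at minute 15 is modeled as an instantaneous step. This is an interpretation of the text rather than a documented instrument setting.

```python
# Breakpoints (time in min, % acetonitrile in water) read from the description above
RAMP = [(0.0, 70.0), (10.0, 90.0), (11.0, 100.0), (15.0, 100.0)]
REEQUILIBRATION = (15.0, 20.0, 70.0)  # start (min), end (min), % acetonitrile


def percent_acetonitrile(t_min: float) -> float:
    """Programmed % acetonitrile at time t, by linear interpolation between breakpoints."""
    start, end, level = REEQUILIBRATION
    if not 0.0 <= t_min <= end:
        raise ValueError("time outside the 20-min program")
    if t_min >= start:  # step back to the initial conditions at minute 15
        return level
    for (t0, b0), (t1, b1) in zip(RAMP, RAMP[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover 0-15 min")


print([percent_acetonitrile(t) for t in (0, 5, 10.5, 13, 17)])
# [70.0, 80.0, 95.0, 100.0, 70.0]
```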
For staining experiments, nucleic acids were stained with 4′,6-diamidino-2-phenylindole (DAPI, ThermoFisher, USA), and oil was stained with the lipid-specific dye Nile Red (Sigma-Aldrich, USA) following the vendor's instructions. Idioblast cell integrity was visualized with an AXIO Imager.A2 microscope (Carl Zeiss, Oberkochen, Germany) with an HXP 120C UV source (OSRAM, Munich, Germany) equipped with a mercury lamp. Cytometric measurements on stained samples of pulp homogenate, idioblast-enriched, and permeated fractions were performed on a BD FACSCanto II flow cytometer (BD, San Jose, Calif., U.S.A.). Data were acquired from a total of 10,000 events per sample at a low flow rate through the PerCP (670 nm long-pass filter) and FITC (530/30 nm band-pass filter) channels, along with forward and side scatter. Group discrimination and purity assessment were performed in R, as stated in the Data Analysis section.
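Purity assessment relies on a discriminant rule learned from cytometric signals. As a rough illustration only, the sketch below trains a two-class Fisher linear discriminant (pooled covariance, equal priors) on two hypothetical reference populations. It then reports the fraction of events in a sample that are assigned to the idioblast class. This is not the DiscriMiner pLDA implementation, and the use of two fluorescence features per event is an assumption.

```python
from typing import Sequence, Tuple

Event = Tuple[float, float]  # hypothetical features of one event, e.g. (FITC, PerCP) signals


def _centroid(events: Sequence[Event]) -> Event:
    n = len(events)
    return (sum(e[0] for e in events) / n, sum(e[1] for e in events) / n)


def fit_fisher_lda(reference_other: Sequence[Event], reference_idioblast: Sequence[Event]):
    """Two-class Fisher LDA with pooled covariance and equal priors.

    Returns (w, c); an event x is assigned to the idioblast class when w . x > c.
    """
    m0, m1 = _centroid(reference_other), _centroid(reference_idioblast)
    sxx = sxy = syy = 0.0
    for events, m in ((reference_other, m0), (reference_idioblast, m1)):
        for x, y in events:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(reference_other) + len(reference_idioblast) - 2
    a, b, d = sxx / dof, sxy / dof, syy / dof  # pooled covariance [[a, b], [b, d]]
    det = a * d - b * b
    if det == 0:
        raise ValueError("pooled covariance is singular")
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix
    w = ((d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)  # equal priors: threshold at the midpoint
    return w, w[0] * mid[0] + w[1] * mid[1]


def purity(events: Sequence[Event], w, c) -> float:
    """Fraction of events in a sample classified as idioblasts."""
    hits = sum(1 for x, y in events if w[0] * x + w[1] * y > c)
    return hits / len(events)


w, c = fit_fisher_lda([(1, 1), (1, 2), (2, 1), (2, 2)], [(5, 5), (5, 6), (6, 5), (6, 6)])
print(purity([(1.5, 1.5), (5.5, 5.5), (6, 6), (5, 5.5)], w, c))  # 0.75
```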
All acetone extracts from Study III (idioblast characterization) were selected for the lipidomics pipeline, along with the corresponding mesocarp and seed extracts from Studies I and II. The only exception was the smallest stage (45 g), for which there was not enough sample; for mesocarp and seed, it was substituted by the next stage (60 g). Acetone was evaporated to dryness in the dark, under vacuum, at 45 °C, and the residue was resuspended in isopropanol. The resuspension volume was calculated so that a constant amount of dry weight was injected for each tissue.
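The resuspension volume follows from the dry-weight equivalent contained in the evaporated extract. The sketch below assumes that a known volume of the 15 mL acetone extract is evaporated and that a target dry-weight concentration is chosen for injection. The function name and the example numbers are hypothetical.

```python
def resuspension_volume_ml(tissue_g: float, dry_matter_pct: float,
                           v_dried_ml: float, target_mg_dw_per_ml: float,
                           v_extract_ml: float = 15.0) -> float:
    """Isopropanol volume that gives a fixed dry-weight-equivalent concentration.

    dry-weight equivalent in the dried portion (mg)
        = tissue mass (g) * dry matter fraction * (dried volume / extract volume) * 1000
    """
    if min(tissue_g, dry_matter_pct, v_dried_ml, target_mg_dw_per_ml, v_extract_ml) <= 0:
        raise ValueError("all inputs must be positive")
    dw_equivalent_mg = tissue_g * (dry_matter_pct / 100.0) * (v_dried_ml / v_extract_ml) * 1000.0
    return dw_equivalent_mg / target_mg_dw_per_ml


# Hypothetical: 2 g mesocarp at 25% dry matter, 3 mL of extract dried, 50 mg DW/mL target
print(round(resuspension_volume_ml(2.0, 25.0, 3.0, 50.0), 3))  # 2.0
```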
Extracts were separated on a Luna C18(2) column (150 × 2 mm, 3 μm; Phenomenex, CA, USA) using an HPLC system (Series 1100; Agilent, CA, USA) coupled via ESI to a TOF MS detector (G1969A; Agilent, CA, USA). The gradient elution program used water:acetonitrile (4:1 v/v; phase A) and isopropanol:acetonitrile (9:1 v/v; phase B) as mobile phases, both modified with 10 mM ammonium acetate and 0.1% formic acid. Samples were separated at 55 °C at a constant flow of 0.2 mL/min. The 65-min gradient consisted of a linear ramp from 40% to 43% B (0–6 min); a jump to 50% B at minute 6, followed by a linear ramp to 54% B until minute 36; and an immediate change to 70% B, followed by a linear increase to 99% B by minute 54. This condition was held until minute 55, when the column returned to the initial conditions (40% B) and was equilibrated for 10 min. The ESI drying gas (nitrogen) was set to 13 L/min at 350 °C, with a nebulizer pressure of 35 psig; the capillary voltage was set to 4.5 kV to favor fragmentation, and the octopole radio frequency voltage (Oct RFV), fragmentor, and skimmer were set to 250, 225, and 60 V, respectively. Mass spectra were acquired in positive mode and saved in profile mode over an m/z range of 150–1500, at 0.94 cycles per second with 10,000 transients per scan. Samples were injected in random order, and the sequence began with 5 ‘dummy’ runs in which the same amount of a pooled mix of all samples was injected.
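A randomized injection order preceded by pooled ‘dummy’ runs can be generated reproducibly with a seeded random number generator. The labels and function below are hypothetical and only illustrate the design.

```python
import random
from typing import List, Optional, Sequence


def injection_sequence(samples: Sequence[str], n_dummy: int = 5,
                       pool_label: str = "POOL", seed: Optional[int] = None) -> List[str]:
    """Randomized run order preceded by n_dummy injections of a pooled mix of all samples."""
    rng = random.Random(seed)  # seeded so the run order can be documented and reproduced
    order = list(samples)
    rng.shuffle(order)         # randomize sample order so drift is not confounded with groups
    dummies = [f"{pool_label}_dummy{i + 1}" for i in range(n_dummy)]
    return dummies + order


print(injection_sequence(["meso_60g_r1", "seed_60g_r1", "idio_60g_r1"], seed=42))
```

Randomizing the order keeps instrument drift from lining up with tissue or stage, and the pooled injections at the start condition the column before any sample is measured.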
Raw files were converted to CDF using Agilent’s Translator Utility (Agilent, CA, USA), and processing was performed on the MZmine 2.15 platform. The GridMass algorithm was used for peak detection, and baseline correction was performed with an in-house implementation of a 2D baseline correction method as an MZmine module, available at http://bioinformatica.mty.itesm.mx/baseline2d. For each window in m/z, the baseline algorithm considers a range of time points and estimates the background as a given quantile (percentage) of the observed intensities within that window, which is then removed. This algorithm was run with an m/z window of 0.01, a retention time (RT) window of 1.5 min, and a 40% quantile. Peak detection via GridMass used a minimum height of 1000 counts, an m/z tolerance of 0.05, an RT window between 0.1 and 2.5 min, a smoothing time of 0.1 min, and an intensity similarity ratio of 0.5. After feature detection, isotopic peaks were grouped using an m/z tolerance of 0.001 or 10 ppm and an RT tolerance of 0.25, assuming a monotonic shape and a maximum charge of 2, with the lowest m/z taken as the most representative isotope. Features were aligned with the RANSAC algorithm using an m/z tolerance of 0.025 or 50 ppm, an RT tolerance of 2 min before and 1.5 min after RT correction, and a minimum of 25% of points matching the non-linear model below a threshold of 1 min. Finally, gap filling was performed with an in-house R algorithm that integrates the intensity of non-detected peaks over an RT window, using an m/z tolerance of 0.025 or 50 ppm; this window is predicted from the detected peaks that best predicted the RT windows of the features detected in that sample. The resulting feature tables were further analyzed in R, as stated in the Data Analysis section.
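The quantile-based background estimate can be illustrated on a single m/z-window trace. The sketch below is a one-dimensional simplification (retention time only) of the two-dimensional module, using the 1.5 min RT window and 40% quantile stated above. It is not the code of the MZmine module.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between order statistics (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def baseline_correct(times: Sequence[float], intensities: Sequence[float],
                     rt_window: float = 1.5, q: float = 0.40) -> List[float]:
    """Subtract a running-quantile baseline from one m/z-window trace.

    For each scan, the background is the q-quantile of all intensities whose
    retention time lies within +/- rt_window / 2; negative residuals are clipped to 0.
    """
    half = rt_window / 2.0
    corrected = []
    for t, y in zip(times, intensities):
        window = [yi for ti, yi in zip(times, intensities) if abs(ti - t) <= half]
        corrected.append(max(0.0, y - quantile(window, q)))
    return corrected


# Synthetic trace: flat background of 100 counts with a 3-scan peak at about 5 min
times = [i * 0.1 for i in range(101)]
trace = [1100.0 if 49 <= i <= 51 else 100.0 for i in range(101)]
out = baseline_correct(times, trace)
print(out[50], max(out[:40]))  # 1000.0 0.0
```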
Given the nature of the data, in which molecules may be confounded with isotopes of members of the same family, a family that is also rich in isomers, many of the detected features corresponded to isotopes and feature-detection artifacts. In addition, different adducts of the same molecule may be present, and, given the high voltage selected, molecules may undergo fragmentation, which can yield structural information. Therefore, after processing the raw data with MZmine, an automated grouping procedure was performed in R. First, for the selected list of features, an in-house algorithm extracted peak information from the raw files, namely the m/z, retention time, and intensity values of each measurement within the full width at quarter maximum (FWQM) of the chromatographic peak. Second, features were compared with each other, and those whose retention times overlapped at any point were considered “candidates” to belong to the same compound. This candidate list was then trimmed based on the correlation of the peak intensities at each time point, and features were considered to belong to the same molecule only if their correlation was above 0.9. Because correlation degrades rapidly as RT shifts, this threshold corresponds to a shift of one scan (<1 s), even for peaks with exactly the same shape. An example of this grouping is shown and explained in Additional file 1: Figure S10. Peaks in the same group were considered likely to belong to the same molecule. They were manually cleaned of isotopes by visually assessing the experimental mass spectra, and the selected and automatically identified features were curated for adducts and fragments.
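The grouping logic (RT overlap, then intensity correlation above 0.9) can be sketched with a union-find structure. Each feature is represented here as a hypothetical mapping from scan index to intensity within its FWQM. The requirement of at least three shared scans is an added assumption that ensures a correlation can be computed. The toy example also shows why a one-scan shift of an identically shaped peak fails the threshold.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Trace = Dict[int, float]  # scan index -> intensity, restricted to the peak's FWQM


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def group_features(traces: Dict[str, Trace], r_min: float = 0.9,
                   min_shared_scans: int = 3) -> List[List[str]]:
    """Group features whose FWQM scans overlap and whose co-eluting intensities correlate."""
    parent = {f: f for f in traces}

    def find(f: str) -> str:
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for a, b in combinations(traces, 2):
        shared = sorted(traces[a].keys() & traces[b].keys())
        if len(shared) < min_shared_scans:  # no (or too little) RT overlap: not candidates
            continue
        r = pearson([traces[a][s] for s in shared], [traces[b][s] for s in shared])
        if r > r_min:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for f in traces:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values())


shape = [10.0, 50.0, 100.0, 50.0, 10.0]
traces = {
    "M": dict(zip(range(10, 15), shape)),
    "M+1": dict(zip(range(10, 15), [0.3 * v for v in shape])),  # isotope: same shape, scaled
    "shifted": dict(zip(range(11, 16), shape)),                 # same shape, one scan later
    "other": dict(zip(range(30, 35), shape)),                   # no RT overlap
}
print(group_features(traces))  # [['M', 'M+1'], ['other'], ['shifted']]
```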
Assignation of identity
Using the same information extraction algorithm as in the grouping method (2.7.3), each peak was assigned a mean m/z and its standard deviation, as well as a charge based on the first isotope. The monoisotopic masses of the most common positive-mode adducts for singly charged ([M + H]+, [M + NH4]+, [M + Na]+, and [M + K]+) and doubly charged molecules (combinations of the previous adducts, e.g. [M + 2H]2+, [M + H + NH4]2+, etc.) were subtracted from each feature’s mean m/z. The resulting exact masses (4 for each singly charged feature, 16 for each doubly charged feature) were automatically searched in the Lipid MAPS® structure database through the Lipid MAPS® Representational State Transfer (REST) service. The m/z search window was set to 3 standard deviations of the m/z measurement plus 25 ppm. Assignments were further filtered by comparing the theoretical isotopic pattern of the molecular formula (including the adduct) with the experimental intensities; the allowed window was the standard deviation of the intensity divided by the square of its correlation with the monoisotopic feature, plus 5% of the intensity value. The retrieved information included the molecular formula, name, and Lipid MAPS® classification.
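Subtracting adduct masses from the measured m/z and applying the search window can be illustrated as follows. The adduct masses are standard positive-mode values. The m/z standard deviation is hypothetical, scaling the tolerance by the charge for doubly charged features is an assumption, and the Lipid MAPS® query itself is not reproduced. The example checks the [C27H42O + K]+ assignment of feature 421.3–12.0, which is used as the worked example in this section.

```python
import re
from itertools import product
from typing import List, Tuple

# Monoisotopic element masses (Da) for a few common elements
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "P": 30.97376163}
# Mass added to M by each positive-mode adduct ion (Da; electron mass already removed)
ADDUCT_MASS = {"H": 1.007276, "NH4": 18.033823, "Na": 22.989218, "K": 38.963158}


def monoisotopic_mass(formula: str) -> float:
    """Neutral monoisotopic mass of a simple formula such as 'C27H42O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ELEMENT_MASS[element] * (int(count) if count else 1)
    return total


def candidate_neutral_masses(mz: float, sd_mz: float, charge: int) -> List[Tuple[str, float, float]]:
    """(adduct label, neutral mass M, search tolerance in Da) for every adduct combination."""
    tol_mz = 3 * sd_mz + 25e-6 * mz  # 3 SD of the m/z measurement plus 25 ppm
    out = []
    for combo in product(ADDUCT_MASS, repeat=charge):  # 4 options if z = 1, 4 x 4 = 16 if z = 2
        label = "[M+" + "+".join(combo) + "]" + ("" if charge == 1 else str(charge)) + "+"
        neutral = charge * mz - sum(ADDUCT_MASS[a] for a in combo)
        out.append((label, neutral, charge * tol_mz))  # assumption: tolerance scales with z
    return out


# Feature 421.3-12.0 (mean m/z 421.2893); the m/z standard deviation here is hypothetical
theoretical = monoisotopic_mass("C27H42O")
for label, neutral, tol in candidate_neutral_masses(421.2893, 0.001, 1):
    if abs(neutral - theoretical) <= tol:
        ppm = 1e6 * (neutral - theoretical) / theoretical
        print(label, round(neutral, 4), round(ppm, 1))  # [M+K]+ 382.3261 6.7
```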
Therefore, a single feature could be assigned more than one putative molecular formula, each of which could in turn represent one or more known compounds. However, compounds that differ in identity may belong to the same category of the Lipid MAPS® Lipid Classification System and thus have a similar biological role. Thus, using the main and secondary lipid classes, we estimated an “average” composition of each sample by computing the proportions of ‘fixed’ and ‘approximate’ composition over all possible annotations for each mass. The ‘fixed’ composition represents annotations that are independent of the choice of assigned compound, either because only one compound was annotated or because all annotated compounds belong to the same family. The ‘approximate’ composition represents a weighted average composition over all possible compound annotations. For instance, the singly charged feature 421.3–12.0 has an average m/z of 421.2893 and was assigned two possible molecular formulas: [C25H40O + H]+ and [C27H42O + K]+. These formulas have 12 possible assignments in total (1 and 11, respectively): one belonging at the second level to “Bile acids and derivatives [ST04]”, three to “Secosteroids [ST03]”, and eight to “Sterols [ST01]”. The feature was therefore annotated at the second level with an ‘approximate’ composition of 8.3% [ST04], 25% [ST03], and 66.7% [ST01]. However, since all those assignments fall within the category “Sterol Lipids [ST]”, it was assigned a ‘fixed’ [ST] annotation at the first level. The analyses presented in the main manuscript correspond to the first level of annotation, whereas those in the supplementary material correspond to curated assignments at the identity level.
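The ‘fixed’ versus ‘approximate’ bookkeeping for a single feature reduces to counting candidate annotations per class. The sketch below reproduces the worked example above. How per-feature compositions are aggregated into a sample-level average is not shown.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def annotation_composition(classes: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Composition of one feature's candidate annotations at a given classification level.

    'fixed' if every candidate falls in the same class, otherwise 'approximate',
    with each class weighted by its share of all candidate annotations.
    """
    if not classes:
        raise ValueError("feature has no annotations")
    counts = Counter(classes)
    shares = {cls: k / len(classes) for cls, k in counts.items()}
    return ("fixed" if len(counts) == 1 else "approximate"), shares


# Feature 421.3-12.0: 12 candidate compounds across both molecular formulas
second_level = ["ST04"] * 1 + ["ST03"] * 3 + ["ST01"] * 8
kind, shares = annotation_composition(second_level)
print(kind, {k: round(100 * v, 1) for k, v in shares.items()})
# approximate {'ST04': 8.3, 'ST03': 25.0, 'ST01': 66.7}
print(annotation_composition(["ST"] * 12))  # ('fixed', {'ST': 1.0})
```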
All algorithms and statistical procedures, such as analysis of variance (ANOVA), t-tests, and principal component analysis (PCA), were implemented on the R platform with the stats library, unless otherwise stated. Grouping by Tukey’s Honestly Significant Difference (HSD) test was done with the agricolae package. A p-value threshold of 0.05 was set for significance in the ANOVAs and t-tests; similarly, an α value of 0.05 was used for Tukey’s HSD. Linear regressions were estimated with the R stats library, whereas non-linear regression parameters were calculated with the “Add Trendline” function of Microsoft Excel® (Microsoft Office Professional Plus 2013). Contents mentioned in the text are shown as mean and standard deviation unless otherwise stated. Flow cytometry data were accessed with the flowCore library, and purity was assessed by predictive Linear Discriminant Analysis (pLDA) with the DiscriMiner library. MS files were accessed directly through the ncdf package, and theoretical exact masses and isotopic patterns were calculated with the Rdisop library. Prior to the lipidomics analyses, the raw intensity matrix was quantile-normalized and log-transformed, and then centered and scaled feature-wise. All images presented here were created by the authors using a combination of R, Microsoft Excel® and PowerPoint® (Microsoft Office Professional Plus 2013), and ACD/ChemSketch (ACD/Labs version 12.01, 2010) for chemical structures.
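The preprocessing chain applied to the intensity matrix can be sketched in plain Python. The base of the logarithm is not stated in the text, so log2 is assumed here. Tied intensities are not averaged during quantile normalization, and all intensities are assumed to be positive after gap filling.

```python
import math
from statistics import mean, stdev
from typing import List

Matrix = List[List[float]]  # rows = features, columns = samples


def quantile_normalize(m: Matrix) -> Matrix:
    """Give every sample (column) the same distribution: the mean of the sorted columns.

    Ties are not averaged here, a simplification of common implementations.
    """
    n_rows, n_cols = len(m), len(m[0])
    order = [sorted(range(n_rows), key=lambda i: m[i][j]) for j in range(n_cols)]
    rank_means = [mean(m[order[j][r]][j] for j in range(n_cols)) for r in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for r, i in enumerate(order[j]):
            out[i][j] = rank_means[r]
    return out


def preprocess(m: Matrix) -> Matrix:
    """Quantile normalization, log2 transform (assumed base), then feature-wise autoscaling."""
    logged = [[math.log2(v) for v in row] for row in quantile_normalize(m)]
    scaled = []
    for row in logged:
        mu, sd = mean(row), stdev(row)  # stdev uses n - 1, as R's scale() does
        scaled.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return scaled


data = [[5, 4, 3], [2, 1, 4], [3, 6, 6], [4, 2, 8]]  # 4 features x 3 samples (toy values)
for row in preprocess(data):
    print([round(v, 3) for v in row])
```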