Gene prediction workflow. (A) RNA-seq samples were aligned to the reference genome. (B) Alignments of biological replicates were merged into 64 datasets. Transcript reconstruction was performed independently on each dataset using three programs: Cufflinks, Scripture and IsoLasso. The Venn diagram shows the percentage of reconstructed transcripts shared among the three programs, while the numbers in brackets indicate the average number of reconstructed transcripts per sample. We retained only transcript models predicted by at least two programs and longer than 150 bases. (C) The selected transcripts were assembled using PASA. (D) PASA assemblies were used to update the v1 gene predictions. (E) A new gene prediction was performed with EvidenceModeler (EVM), which integrated several sources of evidence: PASA transcripts, EST and protein alignments, and Augustus predictions trained on PASA assemblies. The resulting gene set was compared with the v1 gene prediction, and only new gene loci were selected for further analysis. After applying several filtering criteria, we obtained a final dataset of 2,258 new genes. (F) The final v2 gene prediction integrates the genes generated in steps D (v1 update) and E (new gene prediction).
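The selection rule in panel B (support from at least two of the three programs and a length above 150 bases) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used here: it assumes, hypothetically, that two programs "predict the same transcript" when their models share chromosome, strand and exact exon coordinates. Real comparisons are usually more tolerant, for example by matching intron chains. The names `Transcript` and `select_consensus` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Transcript(NamedTuple):
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # 1-based, inclusive (start, end) pairs

    def length(self) -> int:
        # Transcript length = sum of exon lengths (introns excluded)
        return sum(end - start + 1 for start, end in self.exons)

    def key(self) -> tuple:
        # Hypothetical matching criterion: identical chromosome, strand and exons
        return (self.chrom, self.strand, self.exons)


def select_consensus(
    predictions: Dict[str, List[Transcript]],
    min_programs: int = 2,
    min_length: int = 150,
) -> List[Transcript]:
    """Keep models supported by >= min_programs programs and longer than min_length."""
    support = defaultdict(set)  # transcript key -> set of supporting programs
    models = {}                 # transcript key -> representative model
    for program, transcripts in predictions.items():
        for t in transcripts:
            k = t.key()
            support[k].add(program)
            models.setdefault(k, t)
    return [
        models[k]
        for k, programs in support.items()
        if len(programs) >= min_programs and models[k].length() > min_length
    ]
```

For example, a two-exon model of 202 bases reported by both Cufflinks and Scripture would be kept, whereas a 50-base model reported by two programs would be discarded by the length filter. Using a set of program names per key means a program that reports the same model twice still counts only once.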