RNA SEQUENCING AND EUKARYOTIC ORGANISMS, WITH SPECIAL REFERENCE TO HAEMONCHUS CONTORTUS: A MINIREVIEW

Saeed El-Ashram 1,2, Ibrahim Al Nasr 3,4, Fathi Abouhajer 5,6, Rashid Mehmood 7,8, Min Hu 9 and *Xun Suo 1.
1. National Animal Protozoa Laboratory, College of Veterinary Medicine, China Agricultural University, Beijing 100193, China.
2. Faculty of Science, Kafr El-Sheikh University, Kafr El-Sheikh, Egypt.
3. College of Science and Arts in Unaizah, Qassim University, Unaizah, Saudi Arabia.
4. College of Applied Health Sciences in Ar Rass, Qassim University, Ar Rass 51921, Saudi Arabia.
5. Asmarya University for Islamic Sciences, Zliten, Misrata, Libya.
6. College of Animal Sciences and Technology, China Agricultural University (CAU), Beijing 100193, China.
7. College of Information Science and Technology, Beijing Normal University, Beijing, China.
8. Department of Computer Science and Information Technology, University of Management Sciences and Information Technology, Kotli Azad Kashmir, 11100, Pakistan.
9. State Key Laboratory of Agricultural Microbiology, Key Laboratory of Development of Veterinary Products, Ministry of Agriculture, College of Veterinary Medicine, Huazhong Agricultural University, Wuhan 430070, Hubei, China.

specific set of proteins in very peculiar quantities). Gene inactivation (knockout), which was for a long time the only common genetic approach for studying the impact of one gene on others, is extremely slow, expensive and inefficient for large-scale screening of several genes. RNA-seq allows many genes to be screened at the same time. By taking a snapshot of the whole gene expression pattern in a given cell or tissue, many tissues can be compared with each other, or a tumor with the healthy tissue surrounding it. Furthermore, the impact of drugs or stressors on different tissues can be monitored through gene expression levels. Phenomena related to aging or fetal development can be understood through gene expression. Screening tests can be designed for a myriad of conditions distinguished by specific gene expression patterns. Drug development, diagnosis, comparative genomics, functional genomics and many other fields may benefit hugely from RNA-seq technology, which allows accurate and relatively economical collection of gene expression information for many genes at a time. The impact of parasite stages (e.g. larval stages, cercarial stages, sporozoites and tachyzoites) on their different predilection sites in the host can be investigated by RNA-seq technology. Gene expression of parasites at different developmental stages can be profiled by NGS technology for the discovery and comparison of gene expression patterns. A wide variety of RNA-seq applications have been reviewed in detail elsewhere [1].

RNA sequencing: a tool for gene expression analysis:-
The development of molecular approaches, for example candidate-gene methods (e.g. real-time polymerase chain reaction) and exploratory techniques (e.g. microarrays), has facilitated the exploration of gene expression or transcriptional profiles. The current gold standard for protein-coding gene annotation is expressed sequence tag (EST) or full-length cDNA sequencing followed by alignment to a reference genome, but it has been calculated that most EST studies using Sanger sequencing discover only approximately 60% of cell transcripts and fail to disclose low-abundance and long transcripts [2]. This information gap can be addressed by exploiting RNA-seq technologies. RNA-seq is a powerful tool to unravel the complex landscape and dynamics of transcriptomes at an exceptional level of sensitivity and accuracy [3; 4; 5]. This approach offers a number of advantages over other technologies, including microarrays: unbiased detection of novel transcripts, a broader dynamic range, compatibility with any species, easier detection of rare and low-abundance transcripts, better estimates of relative expression levels of any genomic region with higher technical reproducibility, and easier detection of alternative splicing [6; 7]. Along with these advantages, RNA-seq has been employed to reassemble whole-organism transcriptomes [8; 9]. With today's advances in RNA-seq technology, enormous sets of gene expression data can be generated. Such catalogues are known as gene expression or transcriptional profiles, and the data-collecting process is named profiling. RNA-seq technology allows rapid profiling and deep mining of the transcriptome. While the mRNA-seq application requires special laboratory methods (poly-A selection for mRNA purification from total RNA, and reverse transcription into cDNA), the instrumental rationale for mRNA-seq is similar to that of genome sequencing. For reference-based mRNA-seq applications, Illumina single-end or paired-end layouts are favored [10; 11] (Table 1).

Transcriptional profiling:-
RNA-seq is a recently developed method for transcriptome profiling [4]. Investigations using this approach have altered our understanding of the extent and complexity of prokaryotic and eukaryotic transcriptomes [12; 13; 14; 15; 16]. To date, next-generation sequencing technologies have been employed to create transcriptomes for diverse species and tissues [13; 15; 17]. For example, one study employed 454 technology to produce 391,157 EST reads from the brain transcriptome of the wasp Polistes metricus [18]. The reads were then aligned to the genome sequence and EST resources of the honeybee, Apis mellifera, to annotate P. metricus transcripts. Strikingly, the study observed wasp EST matches to 39% of the honeybee mRNAs and detected a robust correlation between the expression levels of the corresponding transcripts from the two species. RNA-seq has been employed to precisely monitor gene expression during yeast vegetative growth [17], yeast meiosis [19] and mouse embryonic stem (ES) cell differentiation [12], to track gene expression changes during development, and to provide a "digital measurement" of gene expression differences between tissues. RNA-seq has disclosed diverse novel transcribed regions and splicing isoforms of known genes, and has mapped the 5′ and 3′ boundaries of many genes. Before the advent of RNA-seq, the starts and ends of most transcripts had not been precisely determined, and the extent of splicing heterogeneity remained poorly known. Using RNA-seq technology, the 5′ and 3′ boundaries of 80% and 85% of all annotated genes, respectively, were mapped in Saccharomyces cerevisiae [17]. Furthermore, in Schizosaccharomyces pombe [19], several boundaries were delineated by RNA-seq data in conjunction with tiling array data. In humans, 31,618 known splicing events were validated (11% of all known splicing events) and 379 novel splicing events were identified [20; 21]. In mice, extensive alternative splicing was characterized for 3462 genes [13]. Moreover, results from RNA-seq suggest the existence of many novel transcribed regions in every genome assessed, including those of Arabidopsis thaliana [22], mouse [12; 13], human [20], S. cerevisiae [17] and S. pombe [19]. High-throughput paired-end Illumina technology was employed to explore the transcriptome of O. vulgaris haemocytes (de novo sequencing), identify genes involved in immune defense, and understand the molecular basis of octopus tolerance/resistance to coccidiosis [23]. Furthermore, dual RNA-seq of parasite and host revealed gene expression dynamics during interactions between the filarial worm Brugia malayi and the mosquito Aedes aegypti [24]. The transcriptional profiles of the parasitic nematode Strongyloides stercoralis disclosed different regulation of canonical dauer pathways [25]. High-throughput RNA sequencing (RNA-seq) has played a crucial role in providing a concise view of the global transcriptome of the Leishmania major promastigote stage [9], in establishing and illuminating current expression datasets and providing a solid foundation for drug discovery and vaccine development [26], and in studying the transcriptome of peripheral-blood mononuclear cells (PBMCs) from Fasciola hepatica-infected sheep [27]. A recent study [28] examined the transcriptome profiles of differentially expressed genes in H. contortus-infected resistant Canaria Hair Breed (CHB) and susceptible Canaria Sheep (CS).

RNA-seq experiment, data generation and analysis:
All RNA-seq experiments follow a similar protocol. The currently used method can be listed as follows:-
RNA extraction:-Total RNA from fundic abomasal samples of sheep was isolated using Trizol (Invitrogen, Carlsbad, CA, USA) followed by DNase digestion, as previously reported [29; 30; 31]. RNA degradation and contamination were monitored on 1% agarose gels. RNA purity was checked with the NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). RNA integrity was assessed with the RNA Nano 6000 Assay Kit on the Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA).
Library preparation:-The RNA-seq library was prepared according to the procedure used by [32], as follows. A total of 1 μg RNA per sample was used as input material. Sequencing libraries were generated with the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer's recommendations. Index codes were added so that sequences could be sorted and attributed to each sample. Briefly, the poly(A)-containing mRNA molecules were purified from total RNA using poly-T oligo-attached magnetic beads. Following purification, the poly(A)-containing mRNA molecules were fragmented using divalent cations in NEBNext First Strand Synthesis Reaction Buffer (5X) under elevated temperature. First-strand cDNA was synthesized from the cleaved RNA fragments using reverse transcriptase and random primers. Subsequently, second-strand cDNA synthesis was carried out using DNA polymerase I and RNase H. T4 DNA polymerase and Klenow DNA polymerase were used to convert overhangs into blunt ends via exonuclease/polymerase activities. Furthermore, NEBNext Adaptor with a hairpin loop structure was ligated to prepare for hybridization. Following adaptor ligation, cDNA fragments of preferentially approximately 250 to 300 bp were selected on a gel, and the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Adaptor-ligated cDNAs were treated with 3 μl Uracil-Specific Excision Reagent enzyme mix (USER; NEB) at 37 °C for 15 min, followed by heat inactivation at 95 °C for 5 min. Amplification of the fragments was conducted with NEBNext Q5 Hot Start HiFi PCR Master Mix, Universal PCR primers and Index (X) Primer, followed by purification of the PCR products (AMPure XP system) and evaluation of library quality on the Agilent Bioanalyzer 2100 system. For information about the impact of RNA extraction methods and library selection schemes on RNA-seq data, we direct the reader to the published articles [33; 34].
Clustering and sequencing:-The index-coded samples were clustered on a cBot Cluster Generation System using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina, San Diego, USA), as stated in the manufacturer's instructions. The library preparations were then sequenced on an Illumina HiSeq 4000 platform, generating 150 bp paired-end reads. We refer the reader to [35] for the current situation.
Data processing and quality control:-Raw data (raw reads) in FASTQ format were processed to obtain clean data (clean reads) by trimming adapter sequences from the reads (Trimmomatic software v0.33) and by filtering out reads containing poly-N (Ns > 10% of a read) and low-quality reads (bases with Q <= 20 making up more than 50% of a read) using in-house C scripts. Subsequently, the Q20, Q30 and GC content of the clean data were calculated. All downstream analyses were carried out on clean, high-quality data. The clean reads were aligned to the reference genome using TopHat2 (v2.1.0) [36].
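The read-filtering criteria above can be illustrated with a minimal Python sketch. It is not the in-house C code referred to in the text; the function names are hypothetical, Phred+33 quality encoding is assumed, and Q20/Q30 are taken here to mean the percentage of bases with a Phred score of at least 20 or 30.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def keep_read(seq, qual, max_n_frac=0.10, low_q=20, max_low_frac=0.50):
    """Return True if a read passes both filters described in the text.

    A read is discarded when more than 10% of its bases are N, or when
    more than 50% of its bases have a quality score of 20 or lower.
    """
    if not seq or len(seq) != len(qual):
        return False
    n_frac = seq.upper().count("N") / len(seq)
    scores = phred_scores(qual)
    low_frac = sum(q <= low_q for q in scores) / len(scores)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def summarize(reads):
    """Compute Q20, Q30 and GC percentages over a list of (seq, qual) pairs."""
    total = sum(len(seq) for seq, _ in reads)
    if total == 0:
        raise ValueError("no bases to summarize")
    scores = [q for _, qual in reads for q in phred_scores(qual)]
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq, _ in reads)
    return {
        "Q20": 100.0 * sum(q >= 20 for q in scores) / total,
        "Q30": 100.0 * sum(q >= 30 for q in scores) / total,
        "GC": 100.0 * gc / total,
    }


# Toy reads (hypothetical data): 'I' encodes Q40 and '#' encodes Q2.
raw_reads = [("ACGT", "IIII"), ("NNNN", "IIII"), ("ACGT", "####")]
clean_reads = [r for r in raw_reads if keep_read(*r)]
stats = summarize(clean_reads)
```

In practice such filters run over millions of reads, so a streaming FASTQ parser would replace the in-memory list used here.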

Differential gene expression (DGE) analysis:-
Differential expression analysis was conducted using the DESeq2 package [27] for comparisons of gene expression among samples from different experimental conditions. To determine statistically significant differential expression, a corrected P-value (q-value) < 0.05 and |log2(fold change)| > 1 were set as the thresholds for significantly differentially expressed genes (DEGs) [37].
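The thresholding step can be sketched in a few lines of Python. This is only an illustration of the two cut-offs, not a substitute for DESeq2: the gene names and values are hypothetical, and the Benjamini-Hochberg procedure is assumed as the P-value correction.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, q_cutoff=0.05, lfc_cutoff=1.0):
    """Select genes with q < q_cutoff and |log2 fold change| > lfc_cutoff.

    genes: list of (name, log2_fold_change, raw_p_value) tuples.
    """
    qvalues = benjamini_hochberg([p for _, _, p in genes])
    return [
        name
        for (name, lfc, _), q in zip(genes, qvalues)
        if q < q_cutoff and abs(lfc) > lfc_cutoff
    ]


# Hypothetical results for four genes: (name, log2 fold change, raw P-value).
results = [
    ("gene1", 2.5, 0.001),
    ("gene2", -1.8, 0.004),
    ("gene3", 0.4, 0.0005),  # significant but small effect
    ("gene4", 3.0, 0.2),     # large effect but not significant
]
degs = call_degs(results)
```

Both conditions must hold, so a gene with a large fold change but a poor q-value, or a small but significant change, is excluded.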
Functional annotation enrichment analyses:-The Gene Ontology (GO) database and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database were selected to perform enrichment analysis of DEGs under different experimental conditions. The GOseq R Bioconductor package was used to perform Gene Ontology analysis of RNA-seq data [38]. GO terms with an adjusted P-value less than 0.05 were considered significantly enriched. KOBAS v2.0 (KEGG Orthology Based Annotation System) was used to identify statistical enrichment of differentially expressed genes in KEGG pathways using the hypergeometric test [39]. KEGG pathways with a corrected P-value less than 0.05 were considered significantly enriched (Fig. 1) (https://www.illumina.com/techniques/sequencing/rna-sequencing.html). Commonly used RNA-seq terms are explained in Table 2.
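The hypergeometric test behind pathway enrichment can be written out directly. The sketch below is not the KOBAS implementation; the function name and the gene counts are hypothetical. It computes the probability of finding at least k pathway genes among n DEGs drawn from a background of N genes, of which K belong to the pathway.

```python
from math import comb


def hypergeom_enrichment_p(background, in_pathway, n_degs, degs_in_pathway):
    """Upper-tail hypergeometric P-value, P(X >= degs_in_pathway).

    background:      N, number of annotated background genes
    in_pathway:      K, background genes annotated to the pathway
    n_degs:          n, number of differentially expressed genes
    degs_in_pathway: k, DEGs annotated to the pathway
    """
    upper = min(in_pathway, n_degs)
    total = comb(background, n_degs)
    favourable = sum(
        comb(in_pathway, i) * comb(background - in_pathway, n_degs - i)
        for i in range(degs_in_pathway, upper + 1)
    )
    return favourable / total


# Hypothetical counts: 10 background genes, 3 in the pathway, 2 DEGs, 1 hit.
p_value = hypergeom_enrichment_p(10, 3, 2, 1)
```

Because one such test is run per pathway, the raw P-values still need multiple-testing correction before the 0.05 cut-off is applied.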

Protein-protein interaction networks (PPI):-
Interactions between proteins can be predicted through an array of computational methods and databases [40; 41].

Novel transcripts prediction:-
The Reference Annotation Based Transcript (RABT) assembly method, built upon Cufflinks v2.1.1, was used to construct and identify both known and novel transcripts from TopHat alignment results in the context of an existing annotation.

Single nucleotide polymorphisms (SNP) Analysis:-
Picard-tools v1.96 (http://sourceforge.net/projects/picard/files/picard-tools/1.96/) and samtools v0.1.18 [43] were employed to classify, remove coupled reads and merge the BAM alignment results of each sample. The Genome Analysis Toolkit (GATK2) software was adopted to conduct SNP calling [42; 44].

Validation of transcriptome results by real-time PCR:-
To validate the transcriptome data, both differentially and non-differentially expressed genes were selected for real-time polymerase chain reaction (PCR) analysis, which showed trends concordant with the Illumina sequencing data, indicating the reliability of the comparative analysis of our transcriptomes. As expected, transcript-specific fold changes in the same RNA samples were highly consistent between the RNA-seq and RT-PCR methods, as corroborated by the correlation analysis. For the 13 selected DEGs (Fig. 2A, B, C, D and E), there was a strong correlation between RNA-seq and RT-PCR results (r² = 0.9998), substantiating the reliability of differential gene expression analysis by RNA-seq.
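A correlation of this kind can be computed as follows. The sketch assumes a Pearson correlation on paired fold-change values and uses hypothetical numbers; it does not reproduce the GraphPad analysis or the data behind the reported r² value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical log2 fold changes for the same genes measured by both methods.
rnaseq_lfc = [2.1, -1.4, 0.1, 3.3, -2.7]
rtpcr_lfc = [2.0, -1.5, 0.2, 3.1, -2.9]

r = pearson_r(rnaseq_lfc, rtpcr_lfc)
r_squared = r ** 2  # coefficient of determination, as reported in the text
```

Working on log2 fold changes rather than raw ratios keeps up- and down-regulated genes on a symmetric scale, which is a design choice of this sketch rather than something stated in the source.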

Read length
It depends on the desired results of the experiment. For gene expression profiling, 50 bp single-end reads would be sufficient for most studies. For detecting currently unknown transcripts, novel splicing isoforms, gene fusions, etc., longer (150 bp) reads offer an advantage. [60]

Read depth or coverage
Coverage = (total number of bases generated) / (size of genome sequenced). In other words, the average number of reads that align to, or "cover", known reference bases. Measuring low-expressed genes or identifying novel features needs more coverage. [61]

Complementary DNA (cDNA)
Any DNA that is obtained from an mRNA template via reverse transcription. [62]

Expressed sequence tags (EST)
cDNA (sub)sequences derived from a single read of a cDNA sequencing experiment. [62]

RNA-seq
Use of high-throughput sequencing technology to describe entire transcriptomes. [4]

Transcriptome
The complete set of transcripts in a cell, and their quantity, for a specific developmental stage or physiological condition.
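The coverage definition given under "Read depth or coverage" can be worked through numerically. The following is a minimal sketch with hypothetical read counts and genome size; the function and parameter names are illustrative only.

```python
def sequencing_coverage(read_count, read_length, genome_size, paired_end=False):
    """Average coverage = total bases generated / size of genome sequenced.

    For paired-end runs, read_count is the number of read pairs, so each
    pair contributes two reads of read_length bases.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    reads_per_fragment = 2 if paired_end else 1
    total_bases = read_count * read_length * reads_per_fragment
    return total_bases / genome_size


# Hypothetical run: 1 million 150 bp read pairs over a 3 Mb reference.
coverage = sequencing_coverage(1_000_000, 150, 3_000_000, paired_end=True)
```

With these illustrative numbers, 300 million bases over 3 million reference bases give an average coverage of 100.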

Conclusion and future directions:-
We have attempted to provide a snapshot of RNA-seq technology as a tool for gene expression analysis. Future research may exploit RNA-seq to provide a dual RNA-seq time-course analysis of H. contortus and its ovine host.

Figure 1:-An RNA-Seq analysis workflow (Beijing Allwegene Technology Co., Ltd, China).

Quantitative reverse transcriptase-PCR (RT-PCR) analysis for validation of RNA-seq results:-
Since the entire transcript is assessed in a more or less unbiased manner, probe bias, poor sensitivity and reduced linear range are not as problematic in RNA-seq experiments. However, discrepancies with real-time PCR may be due to its oligo(dT) primer and to probe bias, depending on which region of the cDNA is amplified [4; 45; 46]. A large and growing body of literature has reported a strong correlation between these methods [47; 48; 49; 50; 51; 52; 53]. These results are consistent with our recent data, as described in the validation section. We conducted transcriptome sequencing of ovine abomasal tissues using the Illumina HiSeq 4000 platform to distinguish early and late H. contortus-infected sheep (7 and 50 days post-infection groups [7 dpi and 50 dpi], respectively) from naive controls. We refer the reader to reviews and articles [54; 55; 56; 57] for detailed information about Haemonchus contortus and its ovine host. Thirteen randomly selected genes with (overexpressed or repressed) and without differential expression were chosen for verification by quantitative RT-PCR, which was performed as follows: the same total RNA employed for RNA-seq was reverse transcribed using EasyScript® Reverse Transcriptase (Beijing TransGen Biotech Co., Ltd), and SYBR green-based RT-PCR was conducted using SYBR® Select Master Mix (Applied Biosystems; Cat: 4472908) according to the manufacturer's instructions. The results were expressed as fold changes [58]. A correlation analysis (GraphPad Software, San Diego, Calif) was performed between the RNA-seq and RT-PCR fold-change results using the same RNA samples before pooling. Additionally, experiments were conducted in triplicate, and data are displayed as mean ± SD.

Table 2:-The commonly employed RNA-seq term explanation.
63; 64] -8 nt) introduced as part of adapters. It provides a unique identifier for each sample, tolerates 1-2 sequencing errors, allows samples to be pooled to mitigate lane effects, and allows deep multiplexing through dual barcodes.
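The error tolerance attributed to sample barcodes can be illustrated with a small demultiplexing sketch. The barcode sequences, sample names and function names below are hypothetical, and assigning a read only when exactly one barcode lies within the allowed Hamming distance is an assumption of this example rather than a description of any particular vendor's software.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode matches the index read, or None.

    barcodes: dict mapping sample name -> barcode sequence.
    A read is assigned only if exactly one barcode is within
    max_mismatches, so ambiguous reads are left unassigned.
    """
    hits = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(index_read, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


# Hypothetical 6 nt barcodes for two pooled samples.
sample_barcodes = {"control": "ACGTAC", "dpi7": "TGCATG"}

exact = assign_sample("ACGTAC", sample_barcodes)      # perfect match
one_error = assign_sample("ACGTAA", sample_barcodes)  # one sequencing error
unknown = assign_sample("GGGGGG", sample_barcodes)    # no barcode close enough
```

Tolerating a mismatch is only safe when the barcodes in a pool differ from each other at more positions than the allowed number of errors.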