3D Perception of Maximum Density Zone on Ramachandran Plots for Zika Virus Protein Structures

The Ramachandran plot, which uses torsion angles to describe polypeptide and protein conformation, is among the most central concepts in structural biology. To help visualize the features of high-fidelity Ramachandran plots, it is helpful to look beyond the common two-dimensional phi-psi plot, which for a large dataset does not convey the true nature of the distribution well. In particular, when a large subset of the observations is narrowly distributed within one small region, the simple plot hides this because the data points overlap one another. Zika virus (ZIKV) structures from the Protein Data Bank were chosen as specimens for analysis, because the structure, tropism, and pathogenesis of ZIKV are largely unknown and are the focus of current investigations aimed at the rapid development of vaccines and therapeutics. After a brief survey of Zika virus, it is shown that when a dense dataset of ZIKV Protein Data Bank structures is passed through a colour-coded, scaled algorithm, the resulting three-dimensional plot gives a much more compelling impression of the proportions of residues in the different parts of the protein than a normal two-dimensional phi-psi plot does.

important emerging pathogen whose global impact is yet to be discovered [11]. Figure 1 illustrates the spread of Zika virus.

Genome Structure
The Zika virus genome is a positive-sense, single-stranded RNA molecule 10794 bases long, flanked by two noncoding regions known as the 5' NCR and the 3' NCR [5]. The open reading frame of the Zika virus reads as follows: 5′-C-prM-E-NS1-NS2A-NS2B-NS3-NS4A-NS4B-NS5-3′. It codes for a polyprotein that is subsequently cleaved into capsid (C), precursor membrane (prM), envelope (E), and non-structural (NS) proteins [8]. The E protein makes up the majority of the virion surface and is involved in aspects of replication such as host cell binding and membrane fusion. NS1, NS3, and NS5 are large, highly conserved proteins, while NS2A, NS2B, NS4A, and NS4B are smaller, hydrophobic proteins. The 3' NCR contains 428 nucleotides that may play a part in translation, RNA packaging, cyclization, genome stabilization, and recognition. The 3' NCR forms a loop structure, and the 5' NCR allows translation via a methylated nucleotide cap or a genome-linked protein [12].

Virion Structure of Zika Virus
The structure of ZIKV follows that of other flaviviruses. It contains a nucleocapsid approximately 25-30 nm in diameter surrounded by a host-membrane-derived lipid bilayer that contains envelope proteins E and M. The virion is approximately 40 nm in diameter with surface projections that measure roughly 5-10 nm [8]. The surface proteins are arranged in an icosahedral-like symmetry [12].
Figure 3. Virion structure of Flavivirus [7]
As part of our analysis, we have chosen the following specimens currently available in the RCSB Protein Data Bank:
a) The cryo-EM structure of Zika Virus (PDB ID 5IRE)
b) ZIKV non-structural protein 1 (NS1) (PDB ID 5IY3)

The cryo-EM structure of Zika Virus (PDB ID 5IRE)
A cryo-electron microscopy (cryo-EM) structure of the mature ZIKV at near-atomic resolution (3.8 Å) has been reported in the protein data bank [3]. The structure of Zika virus is similar to other known flavivirus structures, except for the ~10 amino acids that surround the Asn154 glycosylation site in each of the 180 envelope glycoproteins that make up the icosahedral shell [3].
The carbohydrate moiety associated with this residue in 5IRE, which is recognizable in the cryo-EM electron density, may function as an attachment site of the virus to host cells [3].

ZIKV non-structural protein 1 (NS1) (PDB ID 5IY3)
The molecular mechanisms of NS1 are relatively well established for Dengue Virus (DENV) and West Nile virus (WNV) and the NS1-encoding sequence is suspected to be a major genetic factor underlying the diverse clinical consequences of infections caused by flaviviruses (over 70 members). However, little is known about the NS1 of ZIKV, which displays different pathogenesis from that of typical flaviviruses [2].
In work deposited in the Protein Data Bank, virologists addressed this lack of information by expressing the ZIKV NS1(172-352) fragment (residues 172-352 of NS1) of the BeH819015 strain, isolated from Brazil in 2015, in Escherichia coli as inclusion bodies, obtaining the soluble protein by in vitro refolding, and then solving its crystal structure by molecular replacement to a resolution of 2.2 Å [2].
The ZIKV NS1(172-352) protein crystallized as a rod-like homodimer with a length of ~9 nm. Sedimentation velocity analytical ultracentrifugation analyses confirmed that the ZIKV NS1(172-352) protein exists as a homodimer (~40 kDa) in solution [2].
As shown in the figure below, the ZIKV NS1(172-352) homodimer structure has a continuous β-sheet on one surface, with 20 β-strands arranged like the rungs of a ladder, in which each monomer contributes ten rungs to the antiparallel β-ladder. On the opposite side of the homodimer, an irregular surface is formed by a complex arrangement of loop structures [2]. Most of those inter-strand loops are short, except for a long 'spaghetti loop' between β4 and β5 that lacks secondary structure. A potential N-linked glycosylation site that is highly conserved in the Flaviviridae family is located in the β3-β4 loop [2]. In the figure, glycosylation sites are indicated with green hexagons, and disulphide bonds are indicated with yellow circles.
Figure 6. ZIKV NS1 topology diagram [2]

II. PROPOSED METHOD AND IMPLEMENTATION

Overview of UCSF Chimera
The Chimera was, according to Greek mythology, a monstrous fire-breathing hybrid creature of Lycia in Asia Minor, composed of the parts of more than one animal. It is usually depicted as a lion, with the head of a goat arising from its back, and a tail that might end with a snake's head.
The term chimera has come to describe any mythical or fictional animal with parts taken from various animals, or to describe anything composed of very disparate parts, or perceived as wildly imaginative, implausible, or dazzling.
The UCSF Chimera software is divided into a core and extensions. The core provides basic services and molecular graphics capabilities. All higher-level functionality is provided through extensions. Extensions can be integrated into the Chimera menu system, and can present a separate graphical user interface as needed using the Tkinter, Tix, and/or Pmw toolkits [4].
The Chimera core consists of a C++ layer that handles time-critical operations (e.g., graphics rendering) and a Python layer that handles all other functions. All significant C++ data and functions are made accessible to the Python layer. Core capabilities include molecular file input/output, molecular surface generation using Michel Sanner's Molecular Surface (MSMS) algorithm, and aspects of graphical display such as wire-frame, ball-and-stick, ribbon, and sphere representations, transparency control, near and far clipping planes, and lenses. Another core service is maintenance and display of the current selection. Extensions can query for the contents of the selection [4].
Extensions are written either entirely in Python or in a combination of Python and C/C++ (the latter using a shared library loaded at runtime). Extensions can be placed in the Chimera installation directory (which makes the extension available to all users) or in the user's own file area. Extensions are loaded on demand, typically when the user accesses a menu entry that starts the extension [4].

Need for 3D Ramachandran Plot
To help visualize the features of high-fidelity Ramachandran plots, we have found it helpful to look beyond the common two-dimensional ϕ, ψ-plot, which for a large dataset does not convey the true nature of the distribution well. In particular, when a large subset of the observations is narrowly distributed within one small region (as occurs in the α-helical region), this is not well seen in the simple plot because the data points occlude one another.
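The occlusion problem can be made concrete by counting observations per region instead of drawing them on top of one another. The following is a minimal sketch, assuming 10° buckets (a 36 × 36 grid over -180° to 180°) and NumPy's `histogram2d`; the function name `bin_phi_psi` and the sample angles are hypothetical and are not taken from 5IRE or 5IY3.

```python
import numpy as np


def bin_phi_psi(phi_deg, psi_deg, bins_per_axis=36):
    """Count (phi, psi) pairs on a bins_per_axis x bins_per_axis grid.

    Axis 0 of the returned array indexes phi and axis 1 indexes psi,
    both covering the range [-180, 180] degrees.
    """
    phi = np.asarray(phi_deg, dtype=float)
    psi = np.asarray(psi_deg, dtype=float)
    counts, phi_edges, psi_edges = np.histogram2d(
        phi,
        psi,
        bins=bins_per_axis,
        range=[[-180.0, 180.0], [-180.0, 180.0]],
    )
    return counts.astype(int), phi_edges, psi_edges


# Hand-written illustrative angles (NOT measured data): five tightly
# clustered helix-like points and two beta-like points.
phi = [-63.0, -62.0, -64.5, -61.0, -63.5, -121.0, -135.0]
psi = [-43.0, -41.0, -44.0, -42.5, -40.5, 130.0, 152.0]

counts, phi_edges, psi_edges = bin_phi_psi(phi, psi)

# The five helix-like points overlap in a scatter plot, but the bucket
# count keeps track of how many observations fall in that region.
print(counts.max())
```

In a scatter plot the five clustered points would be nearly indistinguishable from one point, whereas the bucket count preserves their number and can be drawn as a height.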
Three-dimensional versions of the Ramachandran plot, with the third dimension representing the number of observations [1], give, when coloured and scaled, a much more compelling impression of the proportions of residues in the different parts of the Ramachandran plot. They also help in visualizing the maximum density zone among all the local maxima. Figure 9 illustrates the basic working principle of generating a two-dimensional Ramachandran plot by running Python scripts in UCSF Chimera. The flow diagrams illustrate the following steps:

Working Principle
Step 1: Extract the PDB files from the RCSB website. For Zika virus, we have two PDB files, namely 5IRE.pdb and 5IY3.pdb.
Step 2: Copy the .pdb file and the .py files to the base directory of the Chimera tool in Program Files.
Step 3: Open the front panel of the Chimera tool and fetch the .py files from the base directory for automated execution.
Step 4: Visualize the resulting output as a 2D or 3D plot, depending on the file.
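Step 1 above can also be scripted rather than done by hand. The sketch below is one possible way to do this and is not part of the original workflow: it assumes the RCSB file download service at `files.rcsb.org` and that a legacy PDB-format file is available for the entry. The helper names `rcsb_pdb_url` and `download_pdb` are hypothetical.

```python
import os

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2, as bundled with Chimera 1.x

# Assumed base URL of the RCSB file download service.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def rcsb_pdb_url(pdb_id):
    """Build the download URL for a four-character PDB ID such as 5IRE."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % pdb_id)
    return "%s/%s.pdb" % (RCSB_DOWNLOAD_BASE, pdb_id)


def download_pdb(pdb_id, dest_dir="."):
    """Download the PDB file into dest_dir and return the local path.

    This performs a network request, so it is defined here but not called.
    """
    url = rcsb_pdb_url(pdb_id)
    dest = os.path.join(dest_dir, os.path.basename(url))
    urlretrieve(url, dest)
    return dest
```

Calling `download_pdb("5IRE")` and `download_pdb("5IY3")` would then place both files in the current directory, from where they can be copied to the Chimera base directory as in Step 2.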

Algorithm for 3D Ramachandran Plot
Input: 5ire.pdb/5iy3.pdb
Output: 3D Ramachandran plot of 5ire
Steps:
1. Import core header files from the Chimera and Numpy folders.
2. Import extension header files solids and labels from the StructBio folder.
3. Define maxHeight as 64. (maxHeight does not affect the plot unless the largest bucket value is higher than maxHeight, in which case the output is scaled so that the "tallest spire" is only maxHeight units high.)
4. Return a VRML string that draws a set of boxes (36 * 36). Each box serves as a colour-coded bucket.
5. Define baseColor and baseHeight.
6. For ii in range of 36:
   a. For each jj in range of 36, put in the cubes making up the base.
   b. For each jj in range of 36, put in the boxes for the bucket values: generate the histogram by adding a box for each (ii, jj) depending on the frequency, and change the colour code according to the height of the histogram box.
7. Put in the axes in a separate model.
8. Take the .pdb file name as input and fetch it from the base directory using chimera.openModels.open("5ire", type="PDB") (or "5iy3").
9. Look at all residues of the protein rather than a single side chain.
10. Display the plot in 3D using a VRML viewer and rotate manually to get the desired viewing angle.
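Steps 3, 4, and 6 of the algorithm above can be sketched in standalone Python as follows. This is an illustrative sketch under stated assumptions, not the script used for the figures: it does not use the Chimera or StructBio modules, it omits the base cubes and axes, the blue-to-red colour ramp is an assumed choice, and the bucket height is drawn along the z axis. The function names are hypothetical; the emitted text uses standard VRML 2.0 `Transform`, `Shape`, `Material`, and `Box` nodes.

```python
import numpy as np

MAX_HEIGHT = 64.0  # maxHeight from step 3 of the algorithm


def scale_heights(counts, max_height=MAX_HEIGHT):
    """Rescale only if the tallest bucket exceeds max_height (step 3)."""
    counts = np.asarray(counts, dtype=float)
    tallest = counts.max() if counts.size else 0.0
    if tallest > max_height:
        return counts * (max_height / tallest)
    return counts


def height_to_rgb(height, tallest):
    """Assumed colour ramp: blue for low buckets, red for the tallest."""
    t = 0.0 if tallest <= 0 else float(height) / float(tallest)
    return (t, 0.0, 1.0 - t)


def vrml_box(x, y, height, rgb, size=1.0):
    """Return one VRML 2.0 box standing on the z = 0 plane at (x, y)."""
    # A VRML Box is centred on its origin, so lift it by half its height.
    return (
        "Transform {\n"
        "  translation %g %g %g\n"
        "  children [ Shape {\n"
        "    appearance Appearance { material Material { diffuseColor %g %g %g } }\n"
        "    geometry Box { size %g %g %g }\n"
        "  } ]\n"
        "}\n"
        % (x, y, height / 2.0, rgb[0], rgb[1], rgb[2], size, size, height)
    )


def histogram_to_vrml(counts, max_height=MAX_HEIGHT):
    """Emit one coloured box per non-empty bucket (steps 4 and 6)."""
    heights = scale_heights(counts, max_height)
    tallest = heights.max()
    parts = ["#VRML V2.0 utf8\n"]
    n_i, n_j = heights.shape
    for ii in range(n_i):
        for jj in range(n_j):
            h = heights[ii, jj]
            if h <= 0:
                continue  # empty buckets get no box in this sketch
            parts.append(vrml_box(ii, jj, h, height_to_rgb(h, tallest)))
    return "".join(parts)
```

Scaling only when the tallest bucket exceeds `maxHeight` keeps small datasets at their true counts while preventing a single dominant peak from making the rest of the plot unreadably flat.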

Graphical Results
The following 3D and 2D plots were obtained after executing the Python code in Chimera.

III. CONCLUSIONS
Using our high-fidelity dataset of Zika virus, the 3D plot allows several interesting observations. Most obvious is the large, sharp peak resulting from residues in α-helices. This very sharp peak towers over every other portion of the Ramachandran plot, including the other portions of the classically defined alpha-region. This salient observation suggests that the classically defined alpha-region does not behave as a unit.
Moreover, the maximum density zone, marked #1 in the 3D plot, is easy to identify, whereas in the 2D graph it appears only as a large cluster of overlapping scatter points. The graded colour code gives a clear picture of how the data in the Protein Data Bank files are packed into a narrow region.
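The maximum density zone can also be located numerically from the bucket counts. The sketch below assumes a 36 × 36 count array with 10° buckets starting at -180°, with ϕ on axis 0 and ψ on axis 1; the function name `max_density_zone` is hypothetical, and the example array is illustrative rather than derived from the ZIKV structures.

```python
import numpy as np


def max_density_zone(counts, bin_width_deg=10.0, origin_deg=-180.0):
    """Return the fullest bucket as (indices, phi range, psi range, count)."""
    counts = np.asarray(counts)
    # argmax works on the flattened array; unravel_index maps it back to 2D.
    ii, jj = np.unravel_index(np.argmax(counts), counts.shape)
    phi_range = (origin_deg + ii * bin_width_deg,
                 origin_deg + (ii + 1) * bin_width_deg)
    psi_range = (origin_deg + jj * bin_width_deg,
                 origin_deg + (jj + 1) * bin_width_deg)
    return (int(ii), int(jj)), phi_range, psi_range, int(counts[ii, jj])


# Illustrative counts only: one dominant bucket and one minor bucket.
example = np.zeros((36, 36), dtype=int)
example[11, 13] = 5
example[5, 31] = 1

indices, phi_range, psi_range, n_obs = max_density_zone(example)
```

If several buckets share the maximum count, `np.argmax` reports only the first one, so ties would need separate handling.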

IV. FUTURE/PROPOSED WORK
Future work includes providing a generalized algorithm to generate a high-fidelity 3D plot, which should also take box size and maximum height as parameters. The representation could also be offered as an extension to the Protein Data Bank website, invoked from JavaScript. The algorithm should also take the PDB ID as user input rather than hard-coding the value in the Python script.
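One way the proposed parameterization could look is sketched below using Python's standard `argparse` module. This is only an assumption about a possible interface: the option names `--box-size` and `--max-height`, the defaults, and the helper names are hypothetical, and how such options would be passed to a script running inside Chimera is not addressed here.

```python
import argparse


def build_parser():
    """Command-line options for a hypothetical 3D Ramachandran plot script."""
    parser = argparse.ArgumentParser(
        description="Generate a 3D Ramachandran plot for a PDB entry."
    )
    parser.add_argument("pdb_id", help="four-character PDB ID, e.g. 5IRE")
    parser.add_argument("--box-size", type=float, default=10.0,
                        help="bucket width in degrees (default: 10)")
    parser.add_argument("--max-height", type=float, default=64.0,
                        help="height of the tallest spire (default: 64)")
    return parser


def parse_options(argv=None):
    """Parse and validate options; adds the derived bins_per_axis value."""
    args = build_parser().parse_args(argv)
    pdb_id = args.pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a four-character PDB ID, got %r" % args.pdb_id)
    if args.box_size <= 0 or args.max_height <= 0:
        raise ValueError("box size and maximum height must be positive")
    bins = int(round(360.0 / args.box_size))
    # The buckets must tile the full 360-degree range exactly.
    if abs(bins * args.box_size - 360.0) > 1e-9:
        raise ValueError("box size must divide 360 degrees evenly")
    args.pdb_id = pdb_id
    args.bins_per_axis = bins
    return args
```

For example, `parse_options(["5ire"])` yields the 36 × 36 grid and maximum height of 64 used in the present work, while `parse_options(["5iy3", "--box-size", "5"])` would request a finer 72 × 72 grid.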
Regarding the Zika virus 3D plots, future work should compare them with 3D plots of other flaviviruses to determine the binding positions of the virus and to find ways to block it with a suitable antibody.