DESIGN AND IMPLEMENTATION OF A NOVEL NOC ARCHITECTURE ON FPGA

Shilpa K Gowda (1), Dr. Rekha K R (2) and Dr. Nataraj K R (3)
1. Research Scholar, Jain University, Bangalore, India.
2. Professor, ECE Department, SJBIT, Bangalore, India.
3. Head of Department, ECE, SJBIT, Bangalore, India.

Abstract:-

A 2-dimensional (2D) mesh has low design complexity and maps well onto rectangular processor arrays, which makes it the most widely used Network-on-Chip (NoC) topology for on-chip communication in multi-core processor arrays. However, it suffers from basic problems such as local traffic congestion, which can arise from varying levels of traffic exchanged with neighboring cores. This is a major problem because it increases latency and power consumption on the chip. To overcome these problems, we propose a novel 6-neighbor hexagonal mesh topology for implementation on an FPGA. The design is developed in Verilog HDL and verified for functional correctness in ModelSim. The architecture has also been tested against basic networking problems such as deadlock and livelock. For physical implementation, it has been implemented and tested on recent Xilinx FPGAs, namely Virtex-6 and Artix-7. The grid topology with a 6-neighbor hexagonal pattern occupies less chip area than the 4-neighbor 2D mesh and provides a more effective inter-processor interconnect, resulting in an area reduction of 21%, an average power reduction of 17%, and a 19% decrease in average inter-processor communication distance. This makes the design more effective than the traditional 2D architecture.

Introduction:-
Demand for NoC-based architectures has grown enormously, because a NoC can easily connect many processors to form a many-core system that performs tasks faster than a single-core system. NoCs have also become popular because of their compatibility and easy-to-use structure, which is hard to achieve with other techniques such as a global shared bus. The factor that most affects the other NoC parameters, such as energy efficiency, performance, area, and latency, is the way the nodes are defined and placed in the NoC architecture to connect the processors. Because of its design simplicity and traditional use, the 2D mesh is the most widely used topology for developing NoCs. However, with rapidly increasing traffic and more and more cores connected to the NoC, the 2D mesh topology struggles to meet the major challenges. For example, consider an efficiently mapped ping application that needs to communicate between two processors that are not adjacent. In a static interconnection architecture, an intermediate processor must then act as a routing processor, and in a dynamic router-based NoC, intermediate routers must forward the packet. In this situation the power consumption of the overall NoC increases and latency rises, delaying packet delivery to the destination processor, because of the additional routers and routing processors involved.
In many applications of homogeneous multiprocessors, communication is largely localized, which gives rise to a new problem: local mapping congestion. One possible solution is simply to increase the number of connections per node. This not only relieves the congestion but also addresses the issues of high power consumption and high area utilization. This motivated us to develop a new node architecture with more local connections and to build a new mesh network from these nodes.
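To make the link-count argument concrete, the following Python sketch compares minimum hop counts in a 4-neighbor mesh and a 6-neighbor hexagonal grid. It is an illustrative model only: it assumes the hexagonal grid is indexed with axial coordinates, uses the six neighbor offsets listed in the code, and considers a rhombus-shaped n x n patch of nodes; the function names are hypothetical. It is not the authors' evaluation and is not expected to reproduce the 19% figure reported in the abstract.

```python
from itertools import product


def mesh_hops(a, b):
    """Minimum hop count between nodes a and b in a 4-neighbor 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def hex_hops(a, b):
    """Minimum hop count in a 6-neighbor hexagonal grid (axial coordinates).

    Assumed neighbor offsets: (+1, 0), (-1, 0), (0, +1), (0, -1), (+1, -1), (-1, +1).
    The sum below is always even, so integer division is exact.
    """
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def average_hops(n, hops):
    """Average hop count over all ordered pairs of distinct nodes in an n x n patch."""
    nodes = list(product(range(n), repeat=2))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for n in (4, 8):
        print(n, average_hops(n, mesh_hops), average_hops(n, hex_hops))
```

Because each extra diagonal link can replace two mesh hops with one, the hexagonal hop count never exceeds the mesh hop count for the same pair of nodes, and it is strictly smaller whenever the two coordinate offsets have opposite signs.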
The main contributions of this paper are threefold. First, we propose a 6-neighbor network topology based on a newly developed hexagonal node with dynamic reconfiguration capability. Second, we develop a complete NoC architecture on FPGA using the proposed node and a customized XY routing algorithm, and we implement a scheme to prevent deadlock-causing congestion in the network. Third, we present a complete comparative analysis of our architecture against the traditional 4-neighbor 2D mesh topology shown in Fig. 1; that section also presents simulation results and tables for the comparison.

Previous work:-
A great deal of work has been carried out in the field of NoC to achieve high-performance, high-efficiency networks. Work related to our research is discussed below. In [1], a NoC structure is proposed based on a reconfigurable architecture that exploits a property of Field Programmable Gate Arrays (FPGAs) known as partial dynamic reconfiguration. The network topology is configured using an energy-efficient, switch-based technique. The architecture efficiently creates express communication lines between SoC components by dynamically switching circuits for the channel connections, and it performs runtime NoC topology configuration.
It also generates a reconfigurable routing table capable of handling network congestion with minimal overhead to the network.
The architecture was evaluated by mapping an application onto two topology configurations: a static 2D mesh topology and the ReNoC architecture. The architecture consumes less power than the static 2D mesh topology when configured with an application-specific topology; however, the topology switches increase the area of the NoC architecture [2]. This technique enables not only reconfiguration but also the creation and removal of express-line connections, which in turn reduces overhead by 10% compared with traditional static NoC designs. The structure was also observed to have lower latency and a higher operating frequency. Given the high demand for Network-on-Chip (NoC) in Multi-Processor Systems-on-Chip (MPSoC), a NoC must be reconfigurable to deliver good service. Limitations of present architectures, such as high area, make it very difficult to support dynamically reconfigurable designs. dynamically placed on chip for FPGA-based reconfigurable network devices. CuNoC uses a flexible communication unit that is well suited to reconfigurable devices, and with small changes it can be used in any other system that needs a communication medium. CuNoC allows a scalable network structure and simultaneous communication, with a good compromise between logic area and operating frequency [4]. Paper [5] proposes a novel architecture based on a reconfigurable communication system for dynamically reconfigurable architectures. It takes a tile-based NoC approach in which the communication layer is completely separate from the computational one, and it is specifically designed to support fabric-level communication for dynamic reconfiguration. In any runtime application, it is very important to ensure the flexibility and adaptability of the system.
It has been observed that a NoC design based on a dynamically reconfigurable architecture is the best option. Various routing mechanisms used in previous designs have not achieved the efficiency of dynamic routing. In this process, the routing content is loaded into a BRAM block and used when required, so the routing information depends on the current network status. In heterogeneous systems with dynamic reconfiguration, the layered NoC approach has become a promising communication backbone. In [6], future FPGA architectures are considered as a high-level routing resource. It is assumed that they will include a hardwired NoC architecture, which will strengthen this trend. A low-cost solution based on reconfigurable components can then be built on top of this architecture, instead of implementing the NoC interface with valuable hardware modules. The model developed in this architecture implements all the basic features of a NoC, with the additional feature of tile-based dynamic reconfiguration. DyNoC is presented in [7]; its routing methodology is capable of routing around obstacles in the network. Many problems caused by dynamically placing components can be solved by using this architecture as the communication medium.
In [8] [9], an adaptive reconfigurable multiprocessor NoC is configured dynamically with optimal parameters, including new routers, packet sizes, and efficient switching techniques. The authors describe how reconfigurability is handled in the Smart Network Stack (SNS), including network control and its simulation, high-speed data exchange, and efficient circuit formation between two processing elements. A dynamically reconfigurable bus-based interface to the NoC is designed in [10] for low-cost applications. It uses a core interface model with API signals for the communication bus and a router design that takes its input from IP cores. The high-level platform architecture includes an ARM7 processor, a reconfigurable FFT and Viterbi decoder, SDRAM, and a memory controller. There are various ways to reduce time; one, described in [11], is the use of a core interface model. The design acts entirely as a bus interface module to the NoC architecture, and these modules are dynamically reconfigured by the host controller according to the requirements. Dynamic routing algorithms are used to locate tasks in the NoC platform, which allows faults in the NoC to be controlled or tolerated.

Proposed System:-
To overcome the mapping congestion caused by connecting more devices, we propose a new hexagonal node with all the necessary features. Figure 3 shows the top-level architecture of a single hexagonal node with six-port connectivity. In addition to its six connection ports, it has a local input port and a local output port. This allows six peripherals to be connected to a single node, so that it operates more efficiently. It has also been observed that communication between hexagonal nodes is more efficient than between conventional 4-port nodes. The node has a 32-bit local output and six 48-bit outputs (out1-out6). The whole architecture operates on the basis of a priority encoder and the routing algorithm. The priority encoder works on the select line, which is formed from the MSB (bit 47) of the input packets, giving 7 bits. If the select line is 0000_001, the local input is selected as the packet passed to the routing algorithm. The proposed system has been implemented in Verilog using Xilinx ISE. Figure 4 shows the top-level RTL schematic obtained for the single hexagonal node, which was then used in a network to build the NoC.
Routing is the process of finding a path through the network for a packet to reach its destination. An efficient algorithm is essential so that packets reach their destination with minimum delay and no loss. In this system we use a customized XY routing algorithm to achieve this goal. Figure 5 describes the network connections with neighboring nodes in a hexagonal node system. Figure 6 shows the RTL view obtained for the proposed system, a seven-node architecture with a 48-bit input for each node containing 32 bits of data, with the remaining bits serving as information bits for the routing process. Each node also has a 32-bit output bus carrying the data out of the node.
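The input-selection stage described above can be sketched as a small behavioral model in Python. This is not the authors' Verilog: it assumes that the seven inputs are indexed 0 (local input) through 6 (in1-in6), that bit 47 of each 48-bit packet is that input's contribution to the 7-bit select line, that the lowest index has the highest priority (which is consistent with 0000_001 selecting the local input), and that the payload is the lower 32 bits. All names are hypothetical.

```python
# Hypothetical behavioral model of one hexagonal node's input selection.
NUM_INPUTS = 7              # index 0 = local input, 1..6 = in1..in6 (assumed order)
VALID_BIT = 47              # MSB of each 48-bit packet, used to build the select line
DATA_MASK = (1 << 32) - 1   # lower 32 bits carry the data payload (assumed)


def select_line(packets):
    """Collect bit 47 of each of the seven input packets into a 7-bit select value."""
    sel = 0
    for i, pkt in enumerate(packets):
        if (pkt >> VALID_BIT) & 1:
            sel |= 1 << i
    return sel


def priority_encode(sel):
    """Return the index of the lowest set bit (highest priority), or None if idle."""
    for i in range(NUM_INPUTS):
        if (sel >> i) & 1:
            return i
    return None


def data_of(pkt):
    """Extract the 32-bit payload that would be forwarded on the local output."""
    return pkt & DATA_MASK


if __name__ == "__main__":
    packets = [0] * NUM_INPUTS
    packets[0] = (1 << VALID_BIT) | 0xDEADBEEF   # only the local input is valid
    sel = select_line(packets)                    # 0b0000001, i.e. 0000_001
    winner = priority_encode(sel)                 # 0 selects the local input
    print(f"{sel:07b}", winner, hex(data_of(packets[winner])))
```

A fixed lowest-index-first priority is the simplest encoder to describe; a hardware design might instead use round-robin arbitration to avoid starving the higher-numbered ports.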
The XY algorithm is a distributed, deterministic routing algorithm that transmits each packet along either the x-axis or the y-axis, which helps avoid network congestion. Used with this deterministic algorithm, the router design achieves high throughput and low latency [11][12][13]. Because the node has 6 ports, a basic XY routing implementation was not sufficient, so we adapted the algorithm to the newly developed node by adding a third direction, Z, to cover all directions. Figure 7 shows the flow diagram of the modified XYZ routing algorithm. The algorithm first compares the current z address with the destination z address of the incoming packet. Once the z address matches, it completes the usual XY routing: it compares the current x address with the destination x address and, if they do not match, passes the packet to the next node; if they match, it checks the y direction. If all destination and current addresses match, the packet is delivered to the local output. Otherwise, the output port is chosen as follows. If the y addresses differ, the packet is allocated to out1 when the current y address is greater than the destination y address, and to out2 otherwise. If y matches but x differs, the packet is allocated to out3 when the current x address is greater than the destination x address, and to out4 otherwise. If x also matches, the z axis is checked: the packet is allocated to out5 when the current z address is greater than the destination z address, and to out6 otherwise. Figure 8 shows the internal RTL schematic of the proposed hexagonal node, obtained after synthesis of the developed code.
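The port-allocation rules above can be summarized in a short Python sketch. It is one reading of the described flow, not the authors' RTL: it represents addresses as hypothetical (x, y, z) tuples and follows the port-allocation order given in the text (local, then y, then x, then z). The `STEP` table, which says how each output port changes the address, is purely an assumption added so that a packet can be walked hop by hop.

```python
def route_xyz(cur, dst):
    """Choose an output port for a packet at node `cur` heading to node `dst`.

    cur, dst: (x, y, z) address tuples (hypothetical representation).
    """
    cx, cy, cz = cur
    dx, dy, dz = dst
    if (cx, cy, cz) == (dx, dy, dz):
        return "local"
    if cy != dy:
        return "out1" if cy > dy else "out2"
    if cx != dx:
        return "out3" if cx > dx else "out4"
    return "out5" if cz > dz else "out6"


# Assumed effect of each output port on the address (not specified in the text).
STEP = {"out1": (0, -1, 0), "out2": (0, 1, 0), "out3": (-1, 0, 0),
        "out4": (1, 0, 0), "out5": (0, 0, -1), "out6": (0, 0, 1)}


def walk(src, dst, max_hops=64):
    """Follow route_xyz hop by hop and return the list of visited addresses."""
    path, node = [src], src
    while (port := route_xyz(node, dst)) != "local":
        sx, sy, sz = STEP[port]
        node = (node[0] + sx, node[1] + sy, node[2] + sz)
        path.append(node)
        if len(path) > max_hops:
            raise RuntimeError("packet not delivered within hop limit")
    return path


if __name__ == "__main__":
    print(walk((0, 0, 0), (2, 1, 1)))
```

Under these assumptions every hop reduces exactly one address difference by one, so the packet always reaches its destination in a bounded number of hops without revisiting a node; this illustrates livelock freedom for the sketch only and does not by itself establish deadlock freedom for the hardware network.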
Analysis of this schematic shows that the code is functional and that all connections are made as designed. Reconfiguration is one of the most important features of the network and an added advantage: it makes the overall design more robust and increases network efficiency. Reconfiguration allows the number of nodes in a given network to be increased and paths to be reconfigured around faulty nodes. The process is automated and requires no user involvement. The reconfiguration technique relies on the reconfiguration capability of the field programmable gate array (FPGA), which makes the design more dynamic. Table 1 compares different topologies in terms of connection links and number of neighbors. Although the 6-neighbor hexagonal node has more neighbors, the number of connection links is the same, which indicates that the overall structure has more capability than existing mesh networks.

Results and Analysis:-
The complete hexagonal packet-switched NoC is designed and implemented in Xilinx ISE 14.7 and simulated using ModelSim 6.3f. The implementation targets an Artix-7 device (100T, speed grade -3, package CSG324). Figure 9 shows the simulation results of a single hexagonal node (Node6). The node has all seven 48-bit inputs, including the local packet, six 48-bit outputs, and a single 32-bit local packet output. Figure 9 also shows the time taken to process one input and generate its output: the operation takes one and a half clock cycles, with a clock period of 5 ns.
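As a quick check of these figures, the single-node processing latency and the clock frequency implied by the stated 5 ns period work out as follows (simple arithmetic on the values reported above):

```latex
t_{\text{node}} = 1.5 \times T_{\text{clk}} = 1.5 \times 5\,\text{ns} = 7.5\,\text{ns},
\qquad
f_{\text{clk}} = \frac{1}{T_{\text{clk}}} = \frac{1}{5\,\text{ns}} = 200\,\text{MHz}
```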

Conclusion:-
In this paper, we have presented a hexagonal node pattern for a packet-switched NoC, intended mainly for dynamically reconfigurable on-chip networks. The main purpose of the hexagonal node is to improve communication between nodes and to provide more communication ports so that more devices can be connected. This offers considerable scope for improving existing networks without increasing overhead in terms of area and timing. Dynamic reconfiguration plays a vital role in the design, adding a new dimension of improvement and robustness to the architecture. It has also helped the newly implemented algorithm achieve higher throughput and lower latency. The design utilization and timing of the single hexagonal node and of the complete hexagonal PS-NoC on different FPGAs were analyzed and are shown in the tables above. Slice utilization improves by over 40% compared with a conventional NoC design. Overall, the proposed architecture is well suited to on-chip networks. In future work, a fault-elimination network can be designed for the complete hexagonal NoC.