An Analysis on General-purpose Computing on Graphics Processing Units (GPGPU) Processing Applying CORBA Based Distributed Framework


Published on International Journal of Informatics, Technology & Computers
Publication Date: July 25, 2019

Dewan Juel Rahman, Salman Ahmed Sizar, Yasser Khan & Monirul Islam
CSE, Jahangirnagar University, Dhaka, Bangladesh
CSE, Rajshahi Science & Technology University, Natore, Bangladesh
CSE, East West University, Dhaka, Bangladesh


We present a CORBA-based distributed system implemented to execute CUDA programs on a remote GPU-enabled machine. A CPU alone cannot execute such programs; we therefore introduce a CORBA-based distributed system that provides services through which CPU users can access the facilities of a GPU-based system. The CPU users act as clients, and the GPU-based system acts as the server. Clients can request the server to use its GPU-based system, which would otherwise be a costly setup for them.

Keywords: CORBA, CUDA, GPU, Distributed System.

GPU-enabled machines are costlier than CPU-only machines. A simple way to understand the difference between a CPU and a GPU is to compare how they process tasks: a GPU can execute many tasks at the same time, whereas a CPU cannot. We have therefore introduced a CORBA-based distributed system, which is language- and platform-independent, through which CPU users can invoke GPU-enabled machines and make use of GPGPU processing.

1.1. Objective
• To introduce a CORBA architecture with GPGPU
• To provide GPU services to clients
• To save computation time
• To draw up a standard design so that oversized data does not backfire in the long run by adding overhead to the total execution time

We define a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions by passing messages [1]. A distributed architecture is one supporting the development of applications and services that can exploit a physical architecture consisting of multiple autonomous processing elements; these elements do not share primary memory but cooperate by sending messages over the network. A distributed system is a collection of networked computers that appears to its users as a single coherent system.

Figure 1: Distributed system architecture

A distributed system is organized by means of middleware: a middleware layer runs on all machines and offers a uniform interface to the system. Middleware is a general term for software that serves to “glue together” separate, often complex and already existing, programs. Software components frequently connected through middleware include enterprise applications and Web services. Different middleware layers have been introduced over time, among them:
• RPC (Remote Procedure Call)
• RMI (Remote Method Invocation)
• CORBA (Common Object Request Broker Architecture)
• DCOM (Distributed Component Object Model)

CORBA (Common Object Request Broker Architecture) is a software standard defined and maintained by the Object Management Group (OMG). It is an architecture and specification for creating, distributing, and managing distributed program objects in a network. It allows programs at different locations, developed by different vendors, to communicate over a network through an “interface broker”.

Figure 2: IDL and ORB

CORBA automates many common network-programming tasks such as object registration, location, and activation; request demultiplexing; framing and error handling; parameter marshalling and unmarshalling; and operation dispatching. The most essential concept in this architecture is the Object Request Broker (ORB). The ORB supports a network of clients and servers on different computers: a client program can request services from a server program or object without having to know where the server is in the distributed network or what the interface to the server program looks like. To exchange requests and replies, ORBs use the General Inter-ORB Protocol (GIOP) and, on the Internet, its specialization the Internet Inter-ORB Protocol (IIOP). IIOP maps GIOP requests and replies onto the Internet’s Transmission Control Protocol (TCP) layer in each computer.

Figure 3: Distributed system architecture

3.1. Security
• RPC provides a high-level security framework
• It supports authentication of remote users and services, access control for key objects and services, auditing functions, and the ability to establish secure communication channels between clients and object services.
• Encryption functions are not included in the framework.

Using a GPU for general computational work is known as GPGPU (general-purpose computing on GPUs). A GPU has hundreds of cores that can execute many instructions in parallel, far more than a modern CPU with 4 or 8 cores. GPU processing is limited to independent fragments, but those fragments can be processed on the cores in parallel. A programmer should therefore choose the parts of a program that can be split into fragments that are independent of each other. A GPU can thus apply the same operation to many records of a stream in parallel; a stream is a set of records needing similar computation, so streams provide the parallelism.
The GPU’s multiprocessors work as co-processors for the CPU; the GPU is more like an acceleration device for it. When the CPU invokes a kernel on the GPU, that kernel executes in parallel many times across the GPU’s cores. How many tasks a GPU can complete at a time therefore depends on its number of streaming multiprocessors (SMs) and the cores per SM; simply adding more SMs lets a device complete more tasks. An SM is a collection of multiple independent operators known as cores or streaming processors. On receiving an execution command from the CPU, the SMs are awakened and distributed an equalized workload of “responsibilities”, referred to as a kernel. With two SMs, the depicted GPU can execute on both SMs at the same time, which is where the parallelism kicks in: the workloads of the two SMs start at the very same moment and end at the very same moment. Each kernel is primarily composed of blocks and threads, and SMs are designed to execute the blocks. For example, if each of two executable blocks is assigned to an SM and starts execution at time = 0, both operations are bound to terminate at the same later time, say time = 3.
Although at this level it is convenient enough to consider the SMs responsible for parallelism in the GPU, it would not be possible without the contribution of the streaming processors.
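The kernel source is not reproduced in the paper; a minimal CUDA sketch of the block/thread decomposition just described (identifiers and block size are illustrative, not the authors' code) might look like:

```cuda
// Each thread computes one array element; blocks are scheduled onto the SMs,
// and the threads of a block run on that SM's cores (streaming processors).
__global__ void vecAdd(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global element index
    if (i < n)                                      // guard for a partial last block
        c[i] = a[i] + b[i];
}

// Host side: launch enough 256-thread blocks to cover n elements, e.g.
//   vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
```

With two SMs, two such blocks can indeed run simultaneously, matching the time = 0 to time = 3 illustration above.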
The complete execution process can be divided into two parts: the client-side request and the server-side acceptance.

4.1 Client side
“cd C:\thesis” enters the ‘thesis’ folder that contains the client code.

“idlj -fall CalcyInterface.idl” creates Java bindings from the given IDL file.

“javac CalcApp/*.java” — the ‘javac’ tool reads the class and interface definitions, written in the Java programming language, and compiles them into bytecode class files.

If the connection is established successfully, a blank window appears for an executable named ORBD (Object Request Broker Daemon), which is used to enable clients to transparently locate and invoke persistent objects on servers in the CORBA environment. The port number and IP address in the command are used to establish the connection from client to server.
The client is then shown the options described above. The code performs several operations such as ADD, SUB, and Merging.
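The contents of CalcyInterface.idl are not given in the paper; an interface consistent with the ADD, SUB, and Merging operations mentioned above might look like this (operation names and signatures are assumptions, not the authors' file):

```idl
// Hypothetical sketch of CalcyInterface.idl; idlj -fall generates the
// Java stubs and skeletons for both client and server from it.
module CalcApp {
  interface CalcyInterface {
    long   add(in long a, in long b);
    long   sub(in long a, in long b);
    string merge(in string left, in string right);
  };
};
```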

4.2 Server side
Open the directory that contains the server code.

“idlj -fall CalcyInterface.idl” creates Java bindings from the given IDL file.

“javac CalcApp/*.java” — the ‘javac’ tool reads the class and interface definitions, written in the Java programming language, and compiles them into bytecode class files.

Start the ORBD application and accept the connection request from the client to complete the connection.

Both systems are now ready for the operations.
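Collected in one place, the client-side and server-side steps above correspond to a command sequence along these lines (the port number, host address, and main-class names are illustrative assumptions; the paper gives only the idlj and javac commands):

```shell
# Server machine: generate bindings, compile, start the ORB daemon and servant
cd C:\thesis
idlj -fall CalcyInterface.idl
javac CalcApp/*.java
start orbd -ORBInitialPort 1050
java CalcApp.CalcServer -ORBInitialPort 1050

# Client machine: generate the same bindings, compile, connect to the server
cd C:\thesis
idlj -fall CalcyInterface.idl
javac CalcApp/*.java
java CalcApp.CalcClient -ORBInitialPort 1050 -ORBInitialHost 192.168.0.10
```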

The output of a complete CUDA program that adds two arrays of 4096 integers each is produced on the server side; the operation is performed using GPU cores, a screenshot of the output is taken, and it is saved on the server computer.

In this paper we have developed a client-server program through which a client can send requests to the server end for GPGPU processing. Even though the client uses a CPU-only machine, the user can obtain GPU computational results from the server end. CORBA is a platform- and language-independent architecture, and this program is developed on a CORBA-based architecture. Our future plan is to establish a GPU-enabled server machine and provide its service to clients. 3D video rendering needs a highly capable GPU, which is a costly setup for most people, and future games will also need highly capable GPUs. We therefore want to provide GPU services through this GPU-enabled server machine.

[1] Coulouris, G.; Dollimore, J. “Distributed Systems: Concepts and Design”, Fifth Edition.
[2] Hung, C.L.; Lin, C.Y.; Wang, H.H. “An efficient parallel-network packet pattern-matching approach using GPUs”. J. Syst. Archit. 2014, 60, 431–439.
[3] Cheng, K.-T.; Wang, Y.-C. “Using mobile GPU for general-purpose computing—A case study of face recognition on smartphones”. In Proceedings of the 2011 International Symposium on VLSI Design, Automation and Test, Hsinchu, Taiwan, 25–28 April 2011; pp. 1–4.
[4] De Ravé, E.G.; Jiménez-Hornero, F.J.; Ariza-Villaverde, A.B.; Gómez-López, J.M. “Using general-purpose computing on graphics processing units (GPGPU) to accelerate the ordinary kriging algorithm”. Comput. Geosci. 2014, 64, 1–6.
[5] Kirk, D.B.; Hwu, W.W. “Programming Massively Parallel Processors: A Hands-on Approach”. Morgan Kaufmann Publishers, USA, 2010.
[6] Seiller, N.; Singhal, N.; Park, K. “Object Oriented Framework for Real-Time Image Processing on GPU”. In Proceedings of the 2010 IEEE 17th International Conference on Image Processing, September 26–29,