

# UNIVERSITY OF NOVI SAD TECHNICAL FACULTY "MIHAJLO PUPIN" ZRENJANIN


# ZRENJANIN, October 2020



UNIVERSITY OF NOVI SAD TECHNICAL FACULTY "MIHAJLO PUPIN" ZRENJANIN REPUBLIC OF SERBIA



## XI INTERNATIONAL CONFERENCE OF INFORMATION TECHNOLOGY AND DEVELOPMENT OF EDUCATION ITRO 2020

PROCEEDINGS OF PAPERS



## XI MEĐUNARODNA KONFERENCIJA INFORMACIONE TEHNOLOGIJE I RAZVOJ OBRAZOVANJA ITRO 2020 ZBORNIK RADOVA

ZRENJANIN, OCTOBER 2020

Publisher and Organiser of the Conference: University of Novi Sad, Technical faculty "Mihajlo Pupin", Zrenjanin, Republic of Serbia

For publisher: Dragica Radosav, Ph. D, Professor, Dean of the Technical faculty "Mihajlo Pupin", Zrenjanin, Republic of Serbia

Editor in Chief - President of OC ITRO 2020: Dragana Glušac, Ph. D, Assistant Professor

Proceedings editor: Marjana Pardanjac, Ph. D, Professor

Technical design: Dusanka Milanov, MSc, Assistant; Maja Gaborov, MSc, Assistant; Marko Blažić, BSc, Assistant; Nemanja Tasić, BSc, Assistant

Circulation: 50

ISBN: 978-86-7672-341-6

CIP - Cataloguing in Publication, Library of Matica Srpska, Novi Sad

37.01:004(082) 37.02(082)

## **INTERNATIONAL** Conference of Information Technology and Development of Education ITRO (11; 2020; Zrenjanin)

Proceedings of papers [Electronic resource] / XI International Conference of Information Technology and Development of Education ITRO 2020 = Zbornik radova / XI međunarodna konferencija Informacione tehnologije i razvoj obrazovanja ITRO 2020, Zrenjanin, October 2020. - Zrenjanin : Technical Faculty "Mihajlo Pupin", 2020. - 1 electronic optical disc (CD-ROM) : text, graphics ; 12 cm

System requirements: not specified. - Title from the title screen. - Electronic publication in PDF format, 265 pages. - Bibliography with each paper.

ISBN 978-86-7672-341-6

a) Information technologies -- Education -- Proceedings b) Educational technology -- Proceedings

COBISS.SR-ID 26470409

#### PARTNERS INTERNATIONAL CONFERENCE

South-West University "Neofit Rilski" Faculty of Education, Blagoevgrad, Republic of Bulgaria




Technical University of Košice Faculty of Electrical Engineering and Informatics Slovak Republic



University Goce Delcev Stip Republic of Macedonia



#### THE SCIENCE COMMITTEE:

- Marina Čičin Šain, Ph.D, Professor, University of Rijeka, Croatia
- Sashko Plachkov, Ph.D, Professor, South-West University "Neofit Rilski", Department of Education, Blagoevgrad, Republic of Bulgaria
- Sulejman Meta, Ph.D, Professor, Faculty of Applied Sciences, Tetovo, Macedonia
- Márta Takács, Ph.D, Professor, Óbuda University, John von Neumann Faculty of Informatics, Budapest, Hungary
- Nina Bijedić, Ph.D, Professor, Applied mathematics, Bosnia and Herzegovina
- Mirjana Segedinac, Ph.D, Professor, Faculty of Science, Novi Sad, Serbia
- Milka Oljača, Ph.D, Professor, Faculty of Philosophy, Novi Sad, Serbia
- Dušan Starčević, Ph.D, Professor, Faculty of Organizational Sciences, Belgrade, Serbia
- Josip Ivanović, Ph.D, Professor, Hungarian Language Teacher Training Faculty, Subotica, Serbia
- Ivanka Georgieva, Ph.D, South-West University "Neofit Rilski", Faculty of Engineering, Blagoevgrad, Republic of Bulgaria
- Miodrag Ivković, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Momčilo Bjelica, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Dragica Radosav, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Dragana Glušac, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Dijana Karuović, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Ivan Tasić, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Vesna Makitan, Ph.D, Assistant Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Marjana Pardanjac, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Snežana Babić Kekez, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Erika Tobolka, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Stojanov Željko, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Brtka Vladimir, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Kazi Ljubica, Ph.D, Assistant Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Berković Ivana, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Nikolić Milan, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Dalibor Dobrilović, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Anja Žnidaršič, Ph.D, Professor, Faculty of Organizational Sciences, Kranj, Slovenia
- Janja Jerebic, Ph.D, Professor, Faculty of Organizational Sciences, Kranj, Slovenia
- Tatjana Grbić, Ph.D, Professor, Faculty of Technical Sciences, Novi Sad, Serbia
- Slavica Medić, Ph.D, Professor, Faculty of Technical Sciences, Novi Sad, Serbia
- Gordana Jotanović, Ph.D, Professor, Faculty of Transport and Traffic Engineering, Doboj, BIH
- Đurđa Grijak, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Snežana Jokić, Ph.D, Assistant Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia
- Gordana Štasni, Ph.D, Professor, Faculty of Philosophy, Novi Sad, Serbia
- Stojanov Jelena, Ph.D, Assistant Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, Serbia

#### THE ORGANIZING COMMITTEE:

**Dragana Glušac**, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia - Chairman of the Conference ITRO 2020

- Ivan Tasić, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Dragica Radosav, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Dijana Karuović, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Marjana Pardanjac, Ph.D, Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Jelena Stojanov, Ph.D, Ass. Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Vesna Makitan, Ph.D, Ass. Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Snežana Jokić, Ph.D, Ass. Professor, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Dusanka Milanov, MSc, Assistant, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Dragana Drašković, MSc, Assistant, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Maja Gaborov, MSc, Assistant, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia
- Marko Blažić, BSc, Assistant, Technical Faculty "Mihajlo Pupin" Zrenjanin, R. of Serbia

All rights reserved. No part of this Proceeding may be reproduced in any form without written permission from the publisher.

The editor and the publisher are not responsible either for the statements made or for the opinion expressed in this publication.

The authors are solely responsible for the content of the papers and any copyrights, which are related to the content of the papers.

With this publication, the CD with all papers from the International Conference on Information Technology and Development of Education, ITRO 2020 is also published.

### INTRODUCTION

For the first time, the conference "Information Technology and Development of Education – ITRO 2020" was held online, due to the circumstances of the COVID-19 pandemic. The main goal of the conference was scientific discussion and the exchange of information and experience about the implementation of IT solutions in educational technology and the impact of different kinds of crises on children's access to quality education. The thematic fields of the conference are aligned with general trends in education, especially in the technical sciences.

At the conference, within the poster session and at the plenary presentation, problems and conditions were presented in the following areas: Theoretical and methodological issues of modern teaching, Personalization and learning styles, Social networks and their impact on education, Safety and security of children on the Internet, Curriculum of modern teaching, Methodological issues of teaching natural and technical sciences, Lifelong learning and professional development of teachers, E-learning, Management in education, Development and impact of information technology on teaching, Information and communication infrastructure in the teaching process, Improving the competencies of teachers and students. A significant number of papers were related to the implementation of teaching in the context of the COVID-19 pandemic.

At the end of the conference, and based on the papers of our participants, we conclude that the main focus points of this moment in education, which in one of the papers is called the "digital revolution", are the following:

- intensive work on increasing the level of responsibility of all participants in education,
- intensive work on the digitization of teaching content in order to overcome barriers and problems, the dominant one of which is certainly student motivation,
- intensive work on increasing competencies and professional support to teachers in the circumstances of a pandemic, different types of crisis and states of emergency,
- necessity of lifelong learning mechanisms,
- encouraging the research of attributes and relatively simple but sufficiently efficient approaches to assessing the metrics of the usability of educational technologies,
- encouraging the media to play a more active role in presenting the situation in the field of education professionally and objectively.

The ITRO Organizing Committee would like to thank the authors of papers, reviewers and participants in the Conference who have contributed to its tradition and successful realization.

We hope that next year our planet Earth will recover and that we will see each other live at the next conference.

We especially want to pay tribute to our late colleague professor Ivan Tasić PhD, as one of the founders of the ITRO conference. Our team thus suffered an irreparable loss, and his name will forever remain on the pages of the conference proceedings.

Chairman of the Organizing Committee Ph.D Dragana Glušac

## CONTENTS

#### SCIENTIFIC PAPERS

| Authors | Title |
|---|---|
| I. Tasić, M. Merdović, E. Terek, J. Rajković and M. Nikolić | THE INFLUENCE OF TEACHER COMMUNICATION SATISFACTION ON THE TEACHING PROCESS AND STUDENT DEVELOPMENT |
| F. Dedić, N. Bijedić, E. Babović and D. Gašpar | STUDENT'S SUCCESS PREDICTION IN THE SECONDARY LEVEL OF EDUCATION USING A LINEAR REGRESSION MODEL |
| M. Đ. Adamović and D. V. Ivetić | GUIDELINES FOR DEVELOPMENT OF EDUTAINMENT VIDEO GAMES |
| J. V. Buralieva and D. Stojanov | FOURIER ANALYSIS THROUGH EXAMPLES USING WOLFRAM MATHEMATICA |
| R. Timovski, N. Koceska and S. Koceski | REVIEW: THE USE OF AUGMENTED AND VIRTUAL REALITY IN EDUCATION |
| M. Lazić, M. Kovačević, N. Tasić and M. Pardanjac | IMPLEMENTATION OF THE UNIFIED INFORMATION SYSTEM OF EDUCATION IN HIGHER EDUCATION - SIGNIFICANCE AND EFFECTS |
| D. Milosavljev, J. Stojanov, A. Grban and M. Kavalić | EDUCATION AND KNOWLEDGE IMPROVEMENT OF EMPLOYEES IN DRIVING SCHOOLS IN THE REPUBLIC OF SERBIA |
| L. Bajrami and M. Ismaili | INCORPORATING DIGITAL MEDIA TO MOTIVATE STUDENTS IN EFL CLASSES |
| N. Koceska, S. Koceski, B. Pucovski, V. K. Mitkovska and A. Lazovski | INVESTIGATING THE EFFECTS OF ONLINE AND FLIPPED CLASSROOM APPROACH DURING COVID-19 PANDEMIC |

| Authors | Title |
|---|---|
| Lj. Kazi, S. Nadrljanski, G. Gecin, A. Kansara, Z. Kazi, B. Radulović and N. Chotaliya | RECOVERY OF PARTITIONED DATABASES BASED ON TIME STAMP DATA AND THE ROLE OF CRUD OPERATIONS: TWO EDUCATIONAL WEB APPLICATIONS |
| G. Molnár, Z. Námesztovszki and Z. Szűts | SWITCHING TO ONLINE EDUCATION, EXPERIENCES FROM HUNGARY AND SERBIA |
| L. K. Lazarova, M. Miteva and T. Zenku | TEACHING AND LEARNING MATHEMATICS DURING COVID PERIOD |
| G. Skondric, I. Hamulic and E. Mudnic | VERIFICATION OF USER BEHAVIAR MODEL IN P2P STORAGE DISTRIBUTED SYSTEM SIMULATIONS |
| M. Knežević, E. Brtka and I. Vecštejn | COMPARISON OF SOFTWARE APPLICATION DEVELOPMENT PROCEDURES IN C++ AND C# PROGRAMMING LANGUAGES |
| D. Radosav, N. Ljubojev, D. Milanov and M. Ercegovac | TEACHERS' AND STUDENTS' ATTITUDES TOWARDS DOING HOMEWORK ASSIGNMENTS ONLINE |
| A. Tasić, D. Karuović and A. Lunjić | SIGNIFICANCE AND APPLICATION WEB TECHNOLOGIES IN A TIME OF PANDEMIC |
| A. Belegisan, D. Glusac and D. Milanov | CORRELATION BETWEEN SCHOOL SUCCESS AND STUDENTS' DIGITAL COMPETENCIES |
| M. Bakator and D. Radosav | ANALYZING THE DIGITAL EDUCATION REVOLUTION |
| N. Koceska and S. Koceski | MEASURING THE IMPACT OF ONLINE LEARNING ON STUDENTS' SATISFACTION AND STUDENT OUTCOMES USING INTEGRATED MODEL |
| R. Timovski, T. A. Pacemska and B. Aleksov | USING WORLD REFERENCE LEVEL (WRL) IN THE PROCESS OF RECOGNIZING THE LEARNING OUTCOMES – CASE STUDY |
| D. Bikov, M. Pashinska and N. Stojkovikj | PARALLEL PROGRAMMING WITH CUDA AND MPI |

| Authors | Title |
|---|---|
| E. Karamazova, M. Kocaleva and T. Jusufi Zenku | STATISTICAL DATA FOR MODERN COMMUNICATION IN MATHEMATICS SUBJECTS AT FACULTY |
| M. Bakator and D. Radosav | RECAP ON SOCIAL MEDIA IMPACT ON EDUCATION |
| M. Kocaleva, B. Petrovska, N. Stojkovikj, A. Stojanova and B. Zlatanovska | REVIEW OF SENTINEL-2 APPLICATIONS |
| A. Krstev, D. Krstev and R. Polenakovik | MODELLING WITH STRUCTURAL EQUATION MODELLING – APPLICATION AND ISSUES |
| M. Gaborov, D. Radosav, A. Felbab and M. Mazalica | REPRESENTATION OF THE PROGRAMMING LANGUAGES IN IT SECTOR IN ZRENJANIN |
| M. Ismaili, L. Bajrami and S. Hasani | ENHANCING EFL STUDENTS' COMMUNICATIVE SKILLS BY USING LEARNING APPS |
| S. Dimitrijević and V. Devedžić | USABILITY EVALUATION IN SELECTING EDUCATIONAL TECHNOLOGY |
| E. Tosheva | 3D MODELING SOLUTIONS IN THE CLOUD |
| Đ. Milošević, D. Subošić, P. Vasiljević, V. Nikolić and B. Markoski | POSSIBILITIES OF USING BIG DATA ANALYTIC IN POLICE WORK |
| J. Milenković, M. Pavlović, V. Nikolić, A. Jašić and V. Premčevski | EXAMPLE OF CLUSTERING USING K-MEANS METHOD IN PYTHON |
| M. Stojičević, P. Vukašinović, V. Nikolić, B. Markoski and V. Premčevski | EXAMPLE OF FUZZY-BASED SEARCH MECHANISM IN PYTHON |
| N. Pena, D. Karuović and J. Bushati | USER EXPERIENCE IN DEVELOPMENT OF THE WEB APPLICATIONS |
| B. Tomić, N. Milikić, J. Jovanović and V. Devedžić | EXAMINING ATTENDANCE, PERFORMANCE AND INTEREST IN A CS COURSE IN RELATION TO STUDENTS' ACHIEVEMENT GOAL ORIENTATION AND SELF-EVALUATION |

| Authors | Title |
|---|---|
| B. Sobota, Š. Korečko, M. Hudák and M. Sivý | COLLABORATIVE VIRTUAL REALITY USAGE IN EDUCATIONAL AND TRAINING PROCESS |
| M. Lazić, M. Kovačević, N. Tasić and M. Pardanjac | METHODOLOGY FOR EXTERNAL QUALITY CONTROL OF HIGHER EDUCATION INSTITUTIONS |
| S. Danilov, V. Makitan and M. Sisak | AN OVERVIEW OF THE MOST INFLUENTIAL ON-LINE MEDIA |
| T. Zorić, V. Makitan and E. Brtka | IT PROJECTS SUCCESS FACTORS |

## Parallel Programming with CUDA and MPI

D. Bikov<sup>\*</sup>, M. Pashinska <sup>\*\*</sup> and N. Stojkovikj<sup>\*</sup>

\* Faculty of computer science, "Goce Delcev" University, Stip, Republic of Macedonia

\*\* Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Veliko Tarnovo, Bulgaria

dusan.bikov@ugd.edu.mk, mariqpashinska@math.bas.bg, natasa.stojkovik@ugd.edu.mk

Abstract – Parallel programming is making a grand entrance and is gradually overtaking the traditional sequential programming model. The main idea here is to show how two parallel programming models can be combined in order to use all available computing resources effectively. The parallel programming models of interest here are MPI and CUDA. By combining both models we could potentially use all available processing resources.

#### I. INTRODUCTION

Modern computer systems contain more and more parallel processing resources, owing to the latest achievements in computing technologies. To be able to use these parallel processing capabilities, we need to apply a suitable programming model. Today's computing systems combine two main computation resources with significantly pronounced parallel computing capabilities. The main building component of every computer system is the Central Processing Unit (CPU), which is low-latency oriented, while the other significant component is the Graphics Processing Unit (GPU), which is throughput oriented and has massive parallel processing capability.

Trends in computing technologies will inevitably bring parallel computing and programming into higher-education Computer Science subjects. The strength of the CPU is in its efficient low-latency-oriented design: CPUs contain few cores and can handle few threads at a time. Recent GPUs, on the other hand, have a high-throughput-oriented design and are composed of thousands of cores that can handle the execution of thousands of threads simultaneously. Combining the MPI (Message Passing Interface) and CUDA (Compute Unified Device Architecture) programming models allows utilization of all available computation resources. Through this short guide we will show how that can be achieved. Building an application that combines those two programming models could ensure utilization of all the computation resources available in one computer system. Knowledge of this topic can help students step forward more easily, acquire new skills and learn new and highly demanded modern technologies. Modern multicore computer systems have brought parallel computing into wide use: general-purpose PCs, embedded systems, game consoles, smartphones, smart TVs, etc.

Because of the educational purpose of this paper, we focus on giving step-by-step guidelines for combining the MPI and CUDA programming models through introductory programming examples. The recommended prerequisites before starting to write combined MPI and CUDA programs are knowledge of C/C++, computer architectures and operating systems. Knowledge of computer graphics or parallel programming is not necessary. Many factors contribute to writing efficient programs that exploit all computation resources. Here we cover some basics without going deeper or showing optimization techniques for writing efficient parallel code.

Growing demand from computing applications naturally leads to a change from traditional sequential computation to more promising parallel computing. To meet this demand there is a need for units with massive parallel processing capability. The latest GPUs have a highly parallel structure that makes them more effective for algorithms in which large blocks of data are processed in parallel [1] [2]. Over time GPUs have surpassed CPUs in many ways. Improvements in GPUs go hand in hand with a growing range of applications, from traditional computation, signal processing and irregular computations to Machine Learning, Deep Learning, AI, Computer Vision, Supercomputing and more. It is very important for students to become familiar with GPU technologies. However, combining programming models so that all available computation resources are used is a challenge of a completely different magnitude.

Various approaches and techniques exist for combining CPU and GPU resources. Most of the approaches are intended for computer cluster systems, while others are intended for writing programs for heterogeneous platforms. These approaches can be separated into specific programming languages, APIs (Application Programming Interfaces) and frameworks [3]. There are also techniques for reclaiming lost performance and avoiding inefficient resource use. Some of them examine the GPU utilization of individual kernels and design algorithmic techniques for maximizing resource utilization. Our intention is to show a general method for utilizing CPU and GPU resources simultaneously; to achieve this, the MPI and CUDA programming models are used.

<sup>1</sup>The research of the first author was supported by Bulgarian Science Fund under Contract DN-02-2/13.12.2016.

<sup>2</sup>The research of the second author was supported, in part, by the Bulgarian Ministry of Education and Science by Grant No. DO1-221/03.12.2018 for NCHDC, a part of the Bulgarian National Roadmap on RIs.

The paper is organized as follows. The general principles of the MPI and CUDA programming models are given in Section II. Section III is devoted to describing the possibility of combining both programming models in a single program. Initial programming examples and compile instructions are presented in Section IV. Brief conclusions are given at the end.

#### II. CPU AND GPU COMPUTING MODEL

One of the main differences between CPU and GPU computing models is how they execute tasks. The CPU is optimized for sequential execution while the GPU can execute thousands of tasks simultaneously.

#### A. CPU computing with MPI

MPI is a portable message-passing standard API designed by a group of researchers from academia and industry and intended for a wide variety of parallel architectures. It is a standard for data communication via messages between distributed processes and is often used in HPC (High Performance Computing) for building scalable applications on computer clusters. There are several well-tested and efficient implementations of MPI, and MPI is fully compatible with CUDA, CUDA Fortran, and OpenACC, all of which are designed for parallel computing. There are several open-source and commercial CUDA-aware MPI implementations, among them MVAPICH2, OpenMPI, CRAY MPI, IBM Platform MPI and SGI MPI.

There are several reasons for writing combined MPI and CUDA parallel code, and they vary with the hardware and with the problem to be solved. One is solving a problem whose data size is too large to fit in the memory of a single GPU. Another is enabling multi-GPU applications to scale across multiple nodes. Our reason is accelerating an existing sequential application in order to achieve more efficient use of all available computing resources.

The MPI standard defines the syntax and semantics of library routines used for writing a wide range of portable message-passing programs in C, C++, and Fortran. Other languages can also interface with such libraries. A parallel program that uses MPI consists of separate processes, each running in its own address space. Each MPI process has its own rank, and for the duration of the program execution a fixed number of ranks execute the same program. MPI facilitates the SPMD (Single Program, Multiple Data) [4] programming model but does not require it: there are MPI implementations that allow multiple different executables to be started in the same MPI job. Each MPI rank typically runs on a different core, owns private memory and executes instructions at its own rate. The ranks can copy or move data between private memories via a shared interconnection. Communication can be point-to-point (send/receive), that is, between two processes, or collective, among a group of processes. In general, all ranks perform the same activity, computing or communicating, at the same time. It should be noted that the workloads of the ranks are not necessarily well balanced.

It is important to understand message passing. Messages are like email, with a destination and a message body, which can be empty. Communication is two-sided and requires the explicit participation of both sender and receiver. Messages provide two services: memory-to-memory copy across address spaces and two-sided handshake synchronization. If multiple messages are sent to the same destination from the same rank, they are received in the order in which they were sent. If different ranks send messages to the same destination, the order of receipt is not defined across sources.

Message-passing programs are written with a library that implements the MPI standard. The standard has had several releases, the first of which, MPI-1, dates from 1994. The first release contains 125 routines, and there are more than 430 routines in MPI-3. Most MPI programs need at least six routines: to start and end MPI, to query the MPI execution state, and to pass point-to-point messages (commonly `MPI_Init`, `MPI_Finalize`, `MPI_Comm_size`, `MPI_Comm_rank`, `MPI_Send` and `MPI_Recv`). The library comes with additional tools for launching the MPI program (mpirun) and a daemon that moves the data across the network.
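As a hedged illustration of the SPMD idea described above, the following C++ sketch shows how every rank could derive its own slice of a shared index range from nothing but its rank and the total number of ranks. The function name `block_range` is hypothetical, and `rank` and `size` are plain arguments standing in for the values an MPI program would obtain from `MPI_Comm_rank` and `MPI_Comm_size`. No MPI library is called, so the sketch compiles with any C++ compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open index range [first, second) owned by one rank when
// n elements are split as evenly as possible among `size` ranks.
// `rank` and `size` are assumptions standing in for the MPI rank and the
// number of ranks; this function itself makes no MPI call.
std::pair<std::size_t, std::size_t> block_range(std::size_t n, int rank, int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = n / p;   // elements that every rank receives
    const std::size_t extra = n % p;  // the first `extra` ranks receive one more
    const std::size_t begin = r * base + (r < extra ? r : extra);
    const std::size_t count = base + (r < extra ? 1 : 0);
    return {begin, begin + count};
}
```

For example, with `n = 10` and `size = 4`, ranks 0 to 3 would own the ranges [0, 3), [3, 6), [6, 8) and [8, 10), so no rank holds more than one element beyond any other.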

#### B. GPU computing with CUDA

CUDA is a powerful parallel computing platform created by Nvidia that allows software developers to use a CUDA-enabled GPU [5] for general-purpose computing. The platform lets developers interact directly with GPU resources and harness their power for writing efficient parallel programs. A GPU program contains two parts: a control (sequential) part executed by a single CPU thread, and a parallel part executed on the GPU, which runs thousands of threads in parallel on as many cores as possible at each moment. The CUDA platform is designed to work with programming languages such as C, C++ and Fortran. It also supports other computational interfaces such as OpenCL, OpenGL and C++ AMP, and third-party wrappers are available for Python, Julia, MATLAB, etc.

Over the years Nvidia has developed different micro-architectures for its various GPUs. Depending on the micro-architecture, Nvidia GPUs are generally organized into SMs (Streaming Multiprocessors) with a set of registers, a cache for constants, a texture cache, shared memory (L1 cache) and global memory. Each SM consists of a number of SPs (Streaming Processors) and SFUs (Special Function Units) used for transcendental functions. The common name for an SP is CUDA core. An SP contains several ALUs (Arithmetic Logic Units) and FPUs (Floating Point Units). The execution model used by the SM is SIMT (Single-Instruction Multiple-Threads) [6], which is similar to SIMD (Single Instruction Multiple Data) in Flynn's taxonomy [7] of computer architectures. Communication between SMs is performed through global memory.

CUDA C is essentially the C/C++ programming language with extensions that allow parallel functions to be executed on the GPU. CUDA source code consists of a mixture of conventional C/C++ host code and GPU device functions. The CUDA C compiler *nvcc* separates the parallel (device) functions from the host code. Accordingly, at the top level of a CUDA application there is a master process that runs on the CPU. This process is responsible for the data flow between main memory and GPU memory. It performs several tasks: GPU initialization, allocation of main and GPU memory, moving data from main memory to GPU memory, launching kernels (functions) on the GPU, fetching back the processed data, deallocation of the memory and termination.

Figure 1. Example of processing flow and execution strategy model
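The host-side sequence of allocation, data transfer, kernel launch, fetching of results and deallocation can be sketched without a GPU. The following C++ code is a CPU-only emulation, not CUDA code: the names `scale_kernel` and `scale_on_emulated_device` are hypothetical, a `std::vector` stands in for GPU memory, and two nested loops stand in for a kernel launch over a grid of thread blocks. The comments name the CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaFree`) that would take the place of each step in a real program, and the block size of 256 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a __global__ kernel: one call does the work of one GPU thread.
// block_idx, block_dim and thread_idx mimic blockIdx.x, blockDim.x, threadIdx.x.
void scale_kernel(float* data, std::size_t n, float factor,
                  std::size_t block_idx, std::size_t block_dim,
                  std::size_t thread_idx) {
    assert(block_dim > 0 && thread_idx < block_dim);
    const std::size_t i = block_idx * block_dim + thread_idx;  // global index
    if (i < n) {  // the last block may contain threads with nothing to do
        data[i] *= factor;
    }
}

// Emulates the master (host) process: allocate, copy in, launch, copy out, free.
std::vector<float> scale_on_emulated_device(const std::vector<float>& host_in,
                                            float factor) {
    const std::size_t n = host_in.size();

    // 1. Allocate "device" memory (cudaMalloc in a real program).
    std::vector<float> device(n);

    // 2. Move the input from main memory to "device" memory (cudaMemcpy).
    for (std::size_t i = 0; i < n; ++i) {
        device[i] = host_in[i];
    }

    // 3. "Launch" the kernel with enough blocks of block_dim threads to cover n.
    const std::size_t block_dim = 256;  // assumed block size
    const std::size_t grid_dim = (n + block_dim - 1) / block_dim;
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            scale_kernel(device.data(), n, factor, b, block_dim, t);
        }
    }

    // 4. Fetch the processed data back to main memory (cudaMemcpy).
    std::vector<float> host_out(device.begin(), device.end());

    // 5. Deallocation: `device` is released on return (cudaFree in a real program).
    return host_out;
}
```

On a real GPU the two loops in step 3 disappear: the threads of a block run concurrently, and the bounds check inside the kernel is what keeps surplus threads in the last block from writing past the end of the data.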

#### III. BASIC PARALLEL PROGRAMMING STRATEGIES

A common problem in parallel programming is balancing the computational load among a set of parallel processing resources. It is especially important to use an appropriate parallel programming strategy, and the choice of a suitable strategy depends strongly on the problem itself. In this section two widely accepted parallel programming strategies are presented. They are suitable for task-parallel programs with no communication between tasks. Communication between tasks is possible, but it should be infrequent to limit the negative effect on efficiency. The structure of the programs is simple: several MPI processes operate with the same GPU. If there are multiple GPUs, an MPI process can handle all of them. It is widely known that this programming structure introduces context-switch overheads. The MPI process handles the main memory, while CUDA kernels update the GPU memory.

In order to explain, step by step, how a combined MPI and CUDA program is built, it is important to introduce the parallel programming strategies. Fig. 1 shows the general structure of the processing flow together with the execution of the strategies.

#### A. Basic Parallel Strategy Model

The Basic Parallel Strategy Model is the first strategy shown in Fig. 1, where it is marked by 1. As the name implies, this strategy has the typical characteristics of a simply bound MPI and CUDA program, and it is a simple solution for building MPI and CUDA programs. As can be seen from Fig. 1, all MPI processes run simultaneously and each can start a CUDA kernel function on the GPU. Fig. 1 also shows that there is circular execution on the MPI-CUDA threads. It is important to distribute the work evenly among the MPI-CUDA threads, since a balanced work distribution ensures efficient use of the computing resources and better computing performance. As already mentioned, there is a master process that runs on the CPU and is responsible for initialization, allocation and performing the computation on the GPU. In this case there are several MPI process instances that run on the CPU, and each of them performs all of these essential tasks simultaneously and independently. Because the processes use the same hardware resources, latency, resource contention, waiting, overheads and reduced bandwidth are introduced. As the number of processes increases, these negative effects become more pronounced. It is important to balance the number of processes against the scale of the problem. The efficiency of the parallel computation is directly connected to the available hardware resources.
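The balanced work distribution mentioned above can be illustrated with a minimal sketch in plain C. It assumes that the problem consists of `n_items` independent work items and that each MPI process takes one contiguous slice of them. The `WorkSlice` type and the `balanced_slice()` function are hypothetical helpers introduced here for illustration only; they are not part of MPI, CUDA or the examples in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not from the paper): the contiguous
   slice of work items assigned to one MPI rank. */
typedef struct {
    size_t first;  /* index of the first item owned by the rank */
    size_t count;  /* number of items owned by the rank */
} WorkSlice;

/* Split n_items among n_procs ranks so that slice sizes differ by
   at most one item. The first (n_items % n_procs) ranks receive
   one extra item each. */
WorkSlice balanced_slice(size_t n_items, int rank, int n_procs)
{
    assert(n_procs > 0 && rank >= 0 && rank < n_procs);

    size_t base  = n_items / (size_t)n_procs;  /* items every rank gets */
    size_t extra = n_items % (size_t)n_procs;  /* leftover items */
    size_t r     = (size_t)rank;

    WorkSlice s;
    s.count = base + (r < extra ? 1 : 0);
    /* Ranks before this one own r * base items plus one extra item
       for each of the first min(r, extra) ranks. */
    s.first = r * base + (r < extra ? r : extra);
    return s;
}
```

In a program built according to this strategy, each process could call `balanced_slice(n_items, myRank, nProcs)` after `MPI_Comm_rank` and `MPI_Comm_size` and pass only its own slice to the CUDA part, so that no process receives more than one item above the average.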

#### B. Master-Workers Parallel Strategy Model

The second strategy is the Master-Workers Parallel Strategy Model, shown in Fig. 1 and marked by 2. An effective solution with automatic dynamic load balancing is to define a single master process that manages the collection of tasks and collects the results. Each worker process grabs a task, computes it, sends the result back to the master and then grabs the next task. This continues until all tasks are completed. The master process in this case is one MPI process that schedules computational tasks to the other MPI worker processes. Each worker process behaves as a top-level process that maintains a CUDA instance. It is recommended that all worker tasks have equal computing demand, which ensures efficiency and better computing performance. In this case the worker MPI processes run on the CPU and are responsible for performing all essential tasks simultaneously and independently. The same hardware resources are used as in the first strategy, but there is also an extra master process. This has negative consequences such as queuing of requests at the master, task waiting time, and unequal distribution of tasks between the workers. These consequences become more pronounced as the number of processes increases. There needs to be a balance between the hardware resources, the number of processes and the scale of the computing problem.

TABLE I. DESCRIPTION OF THE TEST PLATFORM

| Environment            | Platform 1               | Platform 2                   |
|------------------------|--------------------------|------------------------------|
| CPU                    | Intel i7-8565U, 1.80 GHz | Intel Xeon E5-2640, 2.50 GHz |
| Memory                 | 8 GB DDR4 2400 MHz       | 48 GB DDR3 1333 MHz          |
| GPU                    | GeForce MX150            | Nvidia TITAN X (Pascal)      |
| OS                     | Ubuntu 16.04 LTS 64-bit  | Ubuntu 18.04.3 LTS 64-bit    |
| Compiler               | gcc 5.4.0                | gcc 7.4.0                    |
| CUDA compilation tools | 10.1                     | 9.1                          |
| GPU Driver             | V416.56                  | V430.50                      |
| MPI                    | (Open MPI) 1.10.2        | (Open MPI) 3.3a2             |
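The dynamic load balancing of the Master-Workers model can be illustrated with a small sequential simulation in plain C. This is only a sketch under simplifying assumptions: task costs are known in advance, communication with the master takes no time, and a worker asks for the next task as soon as it becomes idle. The function `simulate_dispatch()` and its parameters are hypothetical and are not taken from the examples in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate master-workers dispatch. Tasks are handed out in order,
   each one to the worker that becomes idle first (ties go to the
   lowest worker index). busy_until is a caller-provided array of
   n_workers entries; on return it holds the finish time of every
   worker. The return value is the finish time of the last worker. */
double simulate_dispatch(const double *cost, size_t n_tasks,
                         double *busy_until, size_t n_workers)
{
    assert(n_workers > 0);

    for (size_t w = 0; w < n_workers; w++)
        busy_until[w] = 0.0;

    for (size_t t = 0; t < n_tasks; t++) {
        /* The worker that is free earliest grabs the next task. */
        size_t next = 0;
        for (size_t w = 1; w < n_workers; w++)
            if (busy_until[w] < busy_until[next])
                next = w;
        busy_until[next] += cost[t];
    }

    double makespan = 0.0;
    for (size_t w = 0; w < n_workers; w++)
        if (busy_until[w] > makespan)
            makespan = busy_until[w];
    return makespan;
}
```

For task costs {4, 1, 1, 1, 1} and two workers, the first worker takes the long task while the second one processes the four short tasks, so both finish at time 4. A fixed alternating assignment of the same tasks would leave one worker busy until time 6, which is why unequal task sizes favour the dynamic scheme.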

#### IV. PARALLEL PROGRAM EXAMPLES

This section presents parallel program examples. The platforms used for testing the examples are described in Table I. The graphics card of Platform 1, an NVIDIA GeForce MX150 [8], has 384 cores running at 1.53 GHz and a memory bandwidth of 48 GB/s. The graphics card of Platform 2, an NVIDIA TITAN X [9], has 3584 cores running at 1.41 GHz and a memory bandwidth of 480 GB/s.

A CUDA-aware Open MPI implementation is used for building a single MPI + CUDA program on Ubuntu OS. Open MPI can handle multiple GPU cards, but in our case there is only one GPU, which is shared between several MPI processes. Tests are performed on two different classes of parallel processing hardware (Table I). It is important that the computing hardware and software are compatible, up to date and properly configured.

#### A. Examples

The examples are simple MPI and CUDA programs built according to the parallel strategies presented in the previous section. They represent the basic program skeleton of the explained strategies. One way to build a single MPI+CUDA program is to put both the MPI and the CUDA code in a single file. This program can be compiled using nvcc, which internally uses gcc/g++ to compile the C/C++ code, and linked to the MPI library. Another way is to keep the MPI and CUDA code separate in two files, *main.c* and *example.cu* respectively.

The code of the first example follows the Basic Parallel Strategy Model and consists of the two files mentioned above, *main.c* and *example.cu*. The *main.c* file, containing the call to the CUDA file, would look like:

```
#include <mpi.h>
#include <stdio.h>

// Function declaration (defined in example.cu)
void call_kernel(...);

int main(int argc, char *argv[]) {
    // Variable declarations
    int myRank, nProcs;
    // Allocate memory
    /* Initialize the MPI execution environment. */
    MPI_Init(&argc, &argv);
    /* Get the number of MPI processes
       and the rank of this process. */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    /* Call function 'call_kernel()'
       from CUDA file example.cu */
    call_kernel(...);
    // Terminate the MPI execution environment
    MPI_Finalize();
    // Free memory
    ...
}
```

In *example.cu*, the *call\_kernel()* function is defined with the 'extern' keyword (here as *extern "C"*) to make it accessible from *main.c*. The code would look like:

```
#include <cuda.h>
#include <cuda_runtime.h>
#include <stdio.h>
//CUDA kernel
__global__ void __kernel__(...)
{
    ... // Do some work
}
extern "C" void call_kernel(...)
{
    // Load CPU data into GPU buffers
    __kernel__<<<BlocksPerGrid, ThreadsPerBlock>>>(...);
    // Transfer data from GPU to CPU
    // Free device memory
}
```

The second example is based on the Master-Workers Parallel Strategy Model. It also consists of two files, *main.c* and *example.cu*. The *main.c*, containing the call to CUDA file, would look like:

```
#include <mpi.h>
#include <stdio.h>

#define ROOT 0
// Message tags exchanged between the processes
#define END_MSG 1
#define GET_TASK 2
#define NEW_TASK 3
#define NO_TASK 4
#define DONE 5

// Function declaration (defined in example.cu)
void call_kernel(...);

int main(int argc, char **argv) {
    // Variable declarations
    int myRank,    // id of the current process
        nProcs;    // number of processes
    int rc;        // result from an MPI operation
    int dummy = 1; // fictive variable
    ...
    // Allocate memory
    ...
    /* Initialize the MPI execution environment */
    MPI_Init(&argc, &argv);
    /* Get the number of MPI processes and
       the rank of this process. */
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
    // Status of a receive operation
    MPI_Status status;

    if (myRank == ROOT) { // Root (master)
        // Id of the first task; ids hasWork..0 are dispatched
        int hasWork = 15;
        while (hasWork >= 0) {
            /* Wait for a message (dummy) from any
               free process (MPI_ANY_SOURCE) */
            rc = MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE,
                          GET_TASK, MPI_COMM_WORLD, &status);
            // Master sends the work to that process
            rc = MPI_Send(&hasWork, 1, MPI_INT, status.MPI_SOURCE,
                          NEW_TASK, MPI_COMM_WORLD);
            hasWork--; // Decrement the work variable
        }
        /* When the loop ends there is no more work, and the
           root sends DONE to every worker that asks for a task */
        int k;
        for (k = 0; k < nProcs - 1; k++) {
            dummy = -100;
            rc = MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE,
                          GET_TASK, MPI_COMM_WORLD, &status);
            rc = MPI_Send(&dummy, 1, MPI_INT, status.MPI_SOURCE,
                          DONE, MPI_COMM_WORLD);
        }
    }

    if (myRank != ROOT) { // All other processes (workers)
        // The loop must have an end case, otherwise it never stops
        while (1) {
            int work;
            dummy = myRank;
            /* Tell the root (0) that this process is
               free and wants some work */
            rc = MPI_Send(&dummy, 1, MPI_INT, ROOT, GET_TASK,
                          MPI_COMM_WORLD);
            // Receive work from the root (int work)
            rc = MPI_Recv(&work, 1, MPI_INT, ROOT, MPI_ANY_TAG,
                          MPI_COMM_WORLD, &status);
            // Leave the loop when the root reports DONE
            if (status.MPI_TAG == DONE) {
                break;
            }
            // Do the work assigned to this process
            if (status.MPI_TAG == NEW_TASK) {
                /* Call function 'call_kernel()'
                   from CUDA file example.cu */
                call_kernel(...);
            }
        }
    }
    // Terminate the MPI execution environment
    MPI_Finalize();
}
```

The *example.cu* file contains the same code as in the Basic Parallel Strategy Model, where *call\_kernel()* is defined with the *'extern'* keyword to make it accessible from *main.c*.

#### B. Compile procedure

The way of building MPI+CUDA programs described below uses two files. These two files can be compiled into object files (.o) using *mpicc* and *nvcc* respectively, and combined into a single executable file using *mpicc*. Linking with *mpicc* means that the CUDA library must be linked explicitly.

The program can be built and executed in Debug and Release mode. Creating the object files, linking, building and executing the programs were performed in the Ubuntu terminal with suitable commands. The commands for building in Debug mode would look like:

mpicc -c main.c -o main.o

nvcc -c example.cu -o example.o

mpicc main.o example.o -lcudart -L/usr/local/cuda-10.1/lib64/ -lstdc++ -o mpicuda

The CUDA library must be linked carefully. The example above shows the linking of the CUDA library on Platform 1 (see Table I). A command of the form:

mpiexec -n <numprocs> <program>

is one way to start <program>, the name of the program, with an initial MPI\_COMM\_WORLD whose group contains <numprocs> processes. For example, a request for two processes to test the program would look like:

mpiexec -np 2 ./mpicuda

The commands for building in Release mode, with additional optimization and customization options [10] [11] [12], would look like:

```
mpicc -O3 -Wall -c -fmessage-length=0 -MMD -MP -MF"main.d" -MT"main.d" \
    -o "main.o" "main.c"
nvcc -O3 --compile --relocatable-device-code=false \
    -gencode arch=compute_50,code=compute_50 \
    -gencode arch=compute_50,code=sm_50 \
    -D_FORCE_INLINES -x cu -o "example.o" "example.cu"
mpicc main.o example.o -lcudart -L/usr/local/cuda-10.1/lib64/ -lstdc++ \
    -o mpicuda
```

The command for running the program with two processes would look like:

mpiexec -np 2 ./mpicuda

On a cluster with multiple CPUs and GPUs, a different execution configuration can be requested. For example, two processes and two GPUs can be requested to test the program by using a PBS (Portable Batch System) script. Another way to execute the program on more than one GPU is the explicit use of the CUDA call *cudaSetDevice(number)* to set the current GPU, where *number* is the GPU card ID (identifier).
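One simple way to choose the *number* passed to *cudaSetDevice()* is a round-robin mapping of MPI ranks to GPU identifiers. The sketch below assumes that all processes run on one node and that the number of visible GPUs is already known. The function `device_for_rank()` is a hypothetical helper introduced for illustration; it is not part of CUDA or MPI.

```c
#include <assert.h>

/* Hypothetical helper: map an MPI rank to a GPU id in round-robin
   order, so that ranks 0, 1, 2, ... use devices 0, 1, ...,
   device_count - 1, 0, 1, ... */
int device_for_rank(int rank, int device_count)
{
    assert(rank >= 0 && device_count > 0);
    return rank % device_count;
}
```

In *example.cu* the number of GPUs could be obtained with the CUDA runtime call `cudaGetDeviceCount()`, and the result of `device_for_rank(myRank, count)` could then be passed to `cudaSetDevice()` before any device memory is allocated. With a single GPU, as on the test platforms in Table I, every rank is mapped to device 0. On a cluster with several nodes the rank within the node would be a more suitable input than the global rank; that refinement is left out of this sketch.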

#### V. CONCLUSIONS

In this paper, a short guideline is given for combining the two parallel programming approaches, MPI and CUDA. Both parallel programming models are briefly explained. Through the two parallel programming strategies, our intention is to bring parallel programming closer to students. In the last section the program skeleton of an example is shown. This example can be a starting point for building a more complex program structure. The end of that section shows how to compile and run the combined MPI and CUDA program.

#### ACKNOWLEDGMENTS

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research.

#### References

- [1] Kirk, D. B., Wen-mei W. Hwu. Programming Massively Parallel Processors: A Hands-on Approach. Elsevier, 2013.

- [2] Kurzak, J., D. A. Bader, J. Dongarra. Scientific Computing with Multicore and Accelerators. CRC Press, 2010.
- [3] Mittal, S. and Vetter, J.S.: A survey of CPU-GPU heterogeneous computing techniques. ACM Computing Surveys (CSUR), 47(4), pp.1–35, 2015.
- [4] Quinn Michael J.: Parallel Programming in C with MPI and OpenMP. 1st ed. McGraw-Hill Inc. 2004.
- [5] Nvidia CUDA Home Page, https://developer.nvidia.com/cudazone. Last accessed 16 August 2020
- [6] Lindholm E., J. Nickolls, S. Oberman, J. Montrym. NVIDIA Tesla: A Unified Graphics and Computing Architecture, IEEE Micro, Vol. 28, Issue 2, 2008.
- [7] Flynn M., Flynn's Taxonomy. In: Padua D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA., pp. 689-697, 2011.

- [8] NVIDIA GeForce MX150 Specification, https://www.geforce.com/hardware/notebook-gpus/geforcemx150. Last accessed 14 April 2020
- [9] NVIDIA GeForce TITAN X Specification, https://www.nvidia.com/en-us/geforce/products/10series/titan-xpascal/. Last accessed 16 August 2020
- [10] NVCC :: CUDA Toolkit Documentation, https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html. Last accessed 12 September 2020
- [11] GCC Command Options, https://gcc.gnu.org/onlinedocs/gcc-3.1.1/gcc/Invoking-GCC.html. Last accessed 12 September 2020
- [12] FAQ: Compiling MPI applications, https://www.openmpi.org/faq/?category=mpi-apps. Last accessed 12 September 2020