Paper Title
An Optimized Parallel Algorithm for Hamiltonian Cycles Searching

Abstract
Modern GPUs (Graphics processing units) are being used for general purpose parallel computation. Parallel algorithms can be developed for GPUs using different computing architectures like CUDA (compute unified device architecture).Finding Hamiltonian cycle’s is an NP-complete problem. Specifically, given a connected, undirected graph, a Hamiltonian Circuit is a path that starts at a given vertex, visits each vertex in the graph exactly once, and ends at the starting vertex. We are using a branch and bound algorithm to find Hamiltonian Circuits in a graph. The primary contribution of this paper is to present a fast parallel implementation of this algorithm on a GPU and to provide data structures for optimal parallel computation of this algorithm. This implementation uses GPU to generate and check nodes used for searching the state space tree containing solution to the given instance. We are using independent queues for searching the given state space tree. Each SMX will use a separate Queue to process nodes, this method helps in parallelizing the Branch and Bound algorithm without the need for synchronization between SMX’s and without the need of a CPU for generating nodes. The parallel algorithm executes up to 25 times faster than the CPU based implementation depending on the instance of the problem. Keywords - GPU, Hamiltonian Cycles, Queues, Branch & Bound