Parallel Computing and Machine Learning for Binary Classification
Abstract
Machine learning is a powerful tool with many applications, but the machine learning pipeline is often constrained by the computing resources available. Parallel computing can relieve some of these constraints, which benefits developers and organizations that use machine learning in their workflows. The pipeline consists of several phases, each of which has different computation and resource requirements, and these requirements change with the application of the model. This paper describes how parallel computing and machine learning can be applied to a binary classification problem, specifically the Kaggle “Titanic - Machine Learning from Disaster” competition.