Companies are increasingly data-driven–sensing market and environment data, and using analytics and machine learning to recognize complex patterns, detect changes, and make predictions that directly impact the bottom line. Data-driven companies use data science to manage and make sense of torrents of data.
Data science is part of every industry. Large companies from retail, financial, healthcare, and logistics leverage data science technologies to improve their competitiveness, responsiveness, and efficiency. Advertising companies use it to target ads more effectively. Mortgage companies use it to accurately forecast default risk for maximum returns. And retailers use it to streamline their supply chains. In fact, it was the availability of open-source, large-scale data analytics and machine learning software in mid-2000s like Hadoop, NumPy, scikitlearn, Pandas, and Spark that ignited this big data revolution.
Today, data science and machine learning have become the world’s largest compute segment. Modest improvements in the accuracy of predictive machine learning models can translate into billions to the bottom line. The training of predictive models is at the core of data science. In fact, the majority of IT budgets for data science are spent on building machine learning models, which includes data transformation, feature engineering, training, evaluating, and visualizing. To build the best models, data scientists need to train, evaluate, and retrain with lots of iterations. Today, these iterations take days, limiting how many can occur before deploying to production and impacting the quality of the final result.
It takes massive infrastructure to run analytics and machine learning across enterprises. Fortune 500 companies scale-out compute and invest in thousands of CPU servers to build massive data science clusters. CPU-scale out is no longer effective. While the world’s data doubles each year, CPU computing has hit a brick wall with the end of Moore’s law. GPUs have a massively parallel architecture consisting of thousands of small efficient cores designed for handling multiple tasks simultaneously. Similar to how scientific computing and deep learning have turned to NVIDIA GPU acceleration, data analytics, and machine learning will also benefit from GPU parallelization and acceleration.