A voyage — and invitation — into NVIDIA’s GPU-accelerated ecosystem
By Juan Medina, Team Lead in Data Science & ML at Loka.com
More and more disruptive companies are turning to data science to leverage large-scale datasets and generate actionable insights. While data science is incredibly valuable to businesses, it can also be an arduous, costly process.
Building and training models can take months, and productionizing those models can multiply that time many times over. This can drastically impact budget, speed to production, and time to market.
No matter the industry, startups and the innovation arms of big brands want to train machine learning models faster. Doing so would help them extract insights more quickly, keep costs lower, and gain an edge in the market.
As data scientists and machine learning engineers at Loka, we want to be ready to help our customers make that happen.
The exciting news: We are seeing that we can accelerate time to market at lower costs with GPUs.
Our RAPIDS Origin Story
Back in January, our team leader at Loka introduced us to RAPIDS, NVIDIA's new suite of open-source libraries that lets you execute end-to-end data science and analytics pipelines entirely on NVIDIA GPUs. It blew our minds.
The more we dug in, the more we were impressed by the improvements in computing time we could achieve using RAPIDS compared to CPU-based solutions. Not even CPU-based Apache Spark came close.
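To give a feel for what that switch looks like in practice, here's a minimal sketch using cuDF, the pandas-like DataFrame library in RAPIDS. The file name and column names are hypothetical, and it assumes a machine with a CUDA-capable NVIDIA GPU and RAPIDS installed.

```python
import pandas as pd
import cudf  # RAPIDS' GPU DataFrame library; requires an NVIDIA GPU and a RAPIDS install

# CPU: the familiar pandas workflow
cpu_df = pd.read_csv("transactions.csv")  # hypothetical dataset
cpu_result = cpu_df.groupby("customer_id")["amount"].mean()

# GPU: the same logic with a nearly identical API, executed on the GPU
gpu_df = cudf.read_csv("transactions.csv")
gpu_result = gpu_df.groupby("customer_id")["amount"].mean()

# Move the result back to pandas when you need CPU-side tooling
print(gpu_result.to_pandas().head())
```

Because the API mirrors pandas so closely, existing pipelines can often be ported with little more than a change of import, which is what makes the speedups so accessible.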
Then we thought: wouldn't it be cool to speed up processing on our own internal projects? So we applied RAPIDS to our own work, specifically to image augmentation.
Sadly, we found that RAPIDS is better suited to structured, tabular data than to images. But this was a turning point for us; this is when we started to dream big.
At Loka, we deeply value courageous innovation and constant curiosity. In that spirit, we pondered how we could augment those images using a GPU. If not RAPIDS, then what?
Our curiosity rabbit hole led us to OpenCV, a standard library for image processing, and we ended up finding a way to run it on the GPU. (Admittedly, it's a bold move to undertake such a task with so little information available about it. Not to worry, we'll share our findings with you! :D)
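As a first taste of those findings, here's a minimal sketch of pushing an image operation through OpenCV's CUDA module. It assumes OpenCV has been built with CUDA support (the stock pip wheels are CPU-only), and the input file name is hypothetical.

```python
import cv2  # OpenCV built with CUDA support; the default pip wheel won't expose cv2.cuda

# Read the image on the CPU, then upload it to GPU memory
img = cv2.imread("sample.jpg")  # hypothetical input image
gpu_img = cv2.cuda_GpuMat()
gpu_img.upload(img)

# Run operations entirely on the GPU: resize, then convert to grayscale
gpu_resized = cv2.cuda.resize(gpu_img, (224, 224))
gpu_gray = cv2.cuda.cvtColor(gpu_resized, cv2.COLOR_BGR2GRAY)

# Download the result back to host memory as a NumPy array
result = gpu_gray.download()
print(result.shape)  # (224, 224)
```

The upload/download steps are the main overhead to watch: the wins come from chaining several operations on the GPU before bringing the result back.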
A few Zoom calls later, we decided to start building an ecosystem focused on NVIDIA GPU-acceleration for data science and machine learning.
The idea was to use RAPIDS for structured data and OpenCV for image processing. The next step was to build a proper environment where we could start using these libraries in our internal projects.
This exploration isn’t just for us – it’s a catalyst for a larger RAPIDS community
And this is the journey we want to share with you. We want to be transparent about our exploration of these tools and about our findings: the speeds, the use cases, the insights, and how RAPIDS can make data science and the extraction of valuable insights more feasible and accessible.
This post is a brief introduction to a series of articles in which we'll showcase our experience building an ecosystem of GPU-accelerated solutions for data science enthusiasts.
On this quest toward more efficient computing, you'll find benchmarks using RAPIDS, insights on applying it, GPU implementations, and every other cool thing we're working on in this space.
We will share our findings (and failures) with you in near real time, showing what our data scientists are capable of doing with NVIDIA GPU acceleration. What's next for AI, ML, and humanity starts today, with you, with us, with our data scientists.
Upcoming posts from our journey through GPU acceleration:
- So how fast is RAPIDS? Benchmarking NVIDIA GPU-accelerated functions for image augmentation
- Using RAPIDS to run an EDA on the Sloan Digital Sky Survey: CPU vs. GPU
- Minimizing costs by building an on-demand CI/CD pipeline for GPU-accelerated libraries
We hope you join us on our journey. Stay tuned.
Originally published at https://loka.com.