Why tech giants should post their trained models online
By Melanie Ehrenkranz
We know that training powerful machine learning models takes a lot of time and massive computing power—and these things inherently cost a lot of money and take a toll on our planet. For computer science students interested in building off these models and advancing innovation in these spaces, the time and capital required are demoralizing roadblocks.
But if companies like Facebook and Google publish their pretrained models, the path for students to develop their own deep learning models becomes much clearer.
Daniel Larremore, an assistant professor in the Department of Computer Science at the University of Colorado Boulder, says his students are eager to train machine learning models like the ones they read about in papers. But doing so requires a lot of time on massive computers or cloud computing platforms.
“What I find is that a lot of students run out of credits before their models actually finish training,” Larremore said. “One thing that we’re coming into contact more and more in my classes are pretrained models.”
Using a pretrained model means someone doesn’t have to train it from scratch—which accounts for the bulk of the computing power and data. Instead, Larremore’s students can build off of these models, saving them time and resources they might not have.
Larremore cites NVIDIA as an example of a tech company that shares its manuscripts, papers, code, and trained models online.
“So you and I can download some image processing neural net and just use it without ever having to train it, which is kind of cool,” Larremore says. “The prepacking of trained models means the cost of training only has to happen once.”
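The workflow Larremore describes—download someone else’s trained network, keep it frozen, and train only a small task-specific piece on top—can be sketched in miniature. The example below is a deliberately tiny numpy caricature (the shapes, data, and variable names are invented for illustration, not any real model):

```python
import numpy as np

# A minimal sketch of the idea: treat some weights as a frozen, "pretrained"
# feature extractor and train only a small task-specific head on top.
rng = np.random.default_rng(0)

pretrained_W = rng.normal(size=(8, 4))  # stands in for downloaded weights
head_w = np.zeros(4)                    # the only part we train ourselves

def features(x):
    # Frozen pretrained layer: pretrained_W is never updated.
    return np.tanh(x @ pretrained_W)

def predict(x):
    return features(x) @ head_w

# A toy regression task: a few gradient steps fit just the head.
X = rng.normal(size=(32, 8))
y = rng.normal(size=32)
for _ in range(200):
    head_w -= 0.1 * features(X).T @ (predict(X) - y) / len(y)
```

The point of the sketch is in the loop: only `head_w` is ever updated, so the expensive part (producing `pretrained_W`) happens once, elsewhere, and everyone downstream pays only for the cheap head-training step.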
Open-sourcing trained models benefits student researchers who don’t have the time and computing resources to begin from square one. These are the next generation of machine learning experts and data scientists. Why not give them the tools they need to arrive at a tech company already intimately familiar with deep learning models?
As Dipam Vasani, a self-taught deep learning practitioner, wrote, it’s not always necessary to train a model on the basics—like learning how to identify a straight or slanted line. It’s the more intricate learnings, the ones specific to someone’s project, that you build on top of that pretrained foundation. Allowing people to develop from existing work expedites innovation.
Katharina Kann, an assistant professor of computer science at the University of Colorado Boulder, described a hypothetical in which someone might be exploring whether certain low-level information can be integrated into a deep learning model.
Kann cites part-of-speech tags, sentence structure, and morphology as examples of this low-level information. Using a smaller model as a proxy might not work, because the results wouldn’t show what would happen if this information were integrated into a larger model.
“So basically, without models being available, students might not be able to investigate the original research question in a meaningful way,” Kann said, referring to larger-scale open-source trained models. And that’s just one example of the advantages of transparency and access to this data.
Loka, Inc. spoke with Kann in more detail about why saving time, money, and electricity are just a few of the reasons companies with abundant resources should post their trained machine learning models online.
Kann described how this access benefits researchers not affiliated with the big tech companies while pleasing the researchers within these companies and potentially leveling up their brand integrity. Below is a condensed version of our conversation.
Loka: Why should tech giants such as Facebook, Google, and Amazon post their trained machine learning models online?
Kann: The publication of models makes it possible to independently verify claims made about the models by companies without having to invest a lot of resources to reproduce the models first.
The fact that not everyone needs to first reproduce models to build on them speeds up progress in the field. It also saves compute, and this is beneficial for the environment.
The publication of trained machine learning models makes research easier—in some cases even just possible!—for researchers working with limited computing resources. For the companies, there are additional benefits:
(1) Free publicity, which makes recruitment easier and in general increases the company’s reputation. (2) Increased happiness of their (researcher) employees, since researchers generally like to publish, even when they decide to work for a company. (3) To some extent, it increases the trustworthiness of companies, since their results are reproducible.
These points are especially important whenever scandals make companies unattractive to potential applicants, e.g., to graduate students—such as the recent discussions around the firing of researcher Timnit Gebru from Google.
Now there could be reasons for companies to not publish their models. For instance:
(1) Privacy concerns. It has been shown that it’s possible to reconstruct the training data of machine learning models to some extent as soon as one has access to the models. This could lead to serious problems for companies. (2) Companies could keep a monopoly on their models if this gives them advantages, e.g., for their products, or in terms of the research they can publish, but nobody else can afford. However, I don’t think these are very good arguments.
Loka: How does this open-source access to information benefit student researchers?
Kann: I would say that there is a general tendency for this to be more helpful for researchers with fewer computing resources. At the very least, this enables researchers to easily compare to state-of-the-art models, such as the models which currently obtain the best results on given datasets.
Publishing of pretrained models makes it possible for researchers to develop and evaluate their proposed methods in realistic settings.
Loka: Have you or your students benefited from an open-source deep learning model?
Kann: I have published multiple papers about research projects that would have been extremely difficult or impossible without access to open-source deep learning models.
For example, what we do a lot in NLP is to take a large, publicly available model which has been pretrained on raw text data (for example, from Wikipedia) and then train it on task-specific data. In one of our projects, the goal was to find out if we can improve performance by additionally training a pretrained model on labeled data not belonging to our task of interest, before doing the last training step on task-specific data.
I am fairly confident that we would not have conducted this study if the initial pretrained model hadn’t been available to us.
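The staged pipeline Kann outlines—start from pretrained weights someone else produced, optionally train on labeled data from a related task, then finish on the small task-specific dataset—can be caricatured in a few lines. This is a toy numpy sketch, not her group’s actual setup; all data, shapes, and names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stage 0 (done once, by someone else): "pretrained" weights we download.
W_pretrained = rng.normal(size=(6, 3))

def finetune(W, X, y, steps=200, lr=0.1):
    # Plain least-squares fine-tuning of a linear model.
    W = W.copy()
    for _ in range(steps):
        W -= lr * X.T @ (X @ W - y) / len(X)
    return W

# Stage 1: intermediate training on labeled data from a related task.
X_rel, y_rel = rng.normal(size=(64, 6)), rng.normal(size=(64, 3))
W_intermediate = finetune(W_pretrained, X_rel, y_rel)

# Stage 2: final training on the small task-specific dataset.
X_task, y_task = rng.normal(size=(16, 6)), rng.normal(size=(16, 3))
W_final = finetune(W_intermediate, X_task, y_task)
```

The structure mirrors her study: the research question lives entirely in stage 1, and the experiment is only feasible because stage 0 didn’t have to be repeated.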
Loka: Are you seeing any trends when it comes to access to deep learning models online?
Kann: At least in NLP, companies generally make their most recent models available. The most famous recent exception was GPT-3; the last time I checked, one needed to pay for access. However, something else that’s becoming relevant is that models are starting to be so big that, even when they are available, many groups are not able to easily run them.
For instance, students in my group were unable to work with a model for machine translation, since it was just too huge to be finetuned on our GPUs. We have solved this problem for now by getting better GPUs, but I expect this to become more and more of a serious problem for many groups.
Loka: Thank you for your time, Katharina.
What’s clear is that access to pretrained models has a number of benefits for students, researchers, and even the massive companies that publish them.
The advantages for the little guys alone should motivate those with goodwill and a vested interest in a faster pace of progress in machine learning. And if that’s not enough, the brand equity earned by practicing such transparency is typically good for a tech company’s bottom line. While one can always find an argument not to do something, the arguments here aren’t a meaningful tradeoff given the gains.
These companies are also being tasked with—and taking on the mantle of—leading environmental initiatives to offset their carbon footprints. If those commitments are more than lip service, then progressive practices like sharing trained models should be on the table for companies big and small.
Melanie Ehrenkranz is a writer with a focus on tech, culture, power, and the environment. She has been featured in Gizmodo, Vice’s Motherboard, Medium’s OneZero, National Geographic, and more. You can follow her work here.
Katharina Kann is an assistant professor of computer science at University of Colorado Boulder. Her research focuses on natural language processing and deep learning, and one of her goals is to make human language technologies available in as many languages as possible. Previously, she was a postdoc at New York University and a PhD student at LMU Munich.