Some months ago, SemiAnalysis published a flashy article premised on the idea that organizations with tens of thousands of GPUs had so many resources that startups and researchers with only a few GPUs were wasting their time on things such as local fine-tuning and over-quantization. According to them, the GPU Poor were not focusing on useful work.
First of all, I am, proudly, GPU Poor (I have a 3080 with 12GB of VRAM and do many things in free Colab), and I couldn’t be prouder of what the ecosystem has done this year. We’re in a world in which TheBloke quantizes models as fast as they are released; a world where the Tekniums, local llamas, aligners, and unaligners fine-tune models before they are even announced; a world in which Tim Dettmers enables us to do 4-bit fine-tuning. These are exciting days!
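For the curious, here is roughly what that 4-bit fine-tuning recipe looks like with bitsandbytes and PEFT. This is a minimal sketch rather than a full training script; the model name and the LoRA hyperparameters are just illustrative choices.

```python
# Minimal QLoRA-style sketch: quantize the base model to 4-bit,
# then train tiny LoRA adapters on top of the frozen weights.
# Model name and hyperparameters are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # any causal LM works here

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # do the math in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Freeze the quantized base and train only small LoRA adapters on top
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a fraction of a percent is trainable
```

That last line is the whole point: the trainable parameters fit comfortably in consumer VRAM, which is exactly what makes this a GPU Poor technique.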
Yes, most of the community uses the nice Llama, but guess what? We also have options. Microsoft dropped Phi, a roughly 3B-parameter model I can run in my browser without sending anything to a server. Mistral unleashed Mixtral, a mixture-of-experts (MoE) model that matches the quality of the largest Llama while running much faster. And we also have Qwen, Yi, Falcon, Deci, Starling, InternLM, MPT, and StableLM, plus all their fine-tunes and weird merges.
This year is the one in which we got tools such as LM Studio and Candle to run models on-device, without sending any data to external servers. While the GPU Rich focused on broadly similar user experiences (chatbots, LLMs, maybe some image or audio input here and there), the community can transcribe 2.5 hours of audio in under 98 seconds, generate images in real time, and even do video understanding, all running on our good ol’ potatoes.
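That transcription number sounds like magic, but the recipe is mostly chunked, batched inference with a half-precision Whisper on a single consumer GPU, in the spirit of projects like insanely-fast-whisper. A hedged sketch; the file path and the exact chunk/batch settings are made up:

```python
# Sketch of fast local transcription with a chunked, batched Whisper pipeline.
# The audio file name and the chunk/batch settings below are illustrative.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16,  # half precision keeps it fast and small
    device="cuda:0",
)

# Long audio is split into 30s chunks and transcribed in parallel batches,
# which is what turns hours of audio into a couple of minutes of work.
result = asr(
    "meeting_recording.mp3",  # hypothetical 2.5-hour file
    chunk_length_s=30,
    batch_size=24,
    return_timestamps=True,
)
print(result["text"])
```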
While the Turbo GPU Rich spent weeks preparing their releases and waiting to get those L8+ approvals, tinkerer communities from all kinds of disciplines, from artists to healthcare specialists, were combining open-source tools to generate music from images, figuring out how to quickly load dozens of LoRAs (see the sketch below), or achieving sub-1-bit quantization.
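To give a taste of that LoRA-loading trick: because adapters are tiny and the base weights stay frozen, you can keep one model in memory and swap adapters on the fly. A rough sketch with PEFT; the adapter repository names are hypothetical, made up for illustration:

```python
# Hot-swapping LoRA adapters over a single frozen base model with PEFT.
# The adapter repo names below are hypothetical.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach a first adapter and give it a name
model = PeftModel.from_pretrained(base, "user/helpful-lora", adapter_name="helpful")

# Extra adapters load on top of the same base weights, so keeping
# dozens around and switching between them is cheap.
model.load_adapter("user/pirate-lora", adapter_name="pirate")
model.set_adapter("pirate")  # route generation through this adapter
```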
Don’t get me wrong. We greatly appreciate and love the amazing efforts of the GPU Rich who are releasing their work in the open and sharing it with the community. We genuinely want them to succeed on their open and collaborative paths. But to imply that the GPU Poor have no moat and are not contributing or doing anything useful is naive.
The efforts of the GPU Poor and Middle Class are closing the access gap, making high-quality models more accessible than ever to people from different backgrounds, pushing open science forward, and taking hardware to its limits.
This was an exciting year for open source, and we have a wide variety of labs and companies doing open work (GPU Poor, Middle Class, and Rich alike), all contributing in their own meaningful ways. Shoutouts to Kyutai, Answer.ai, 01.ai, BigCode, Mistral, Stability, Alibaba, Meta, and Microsoft. This year, we also got Nous Research, Skunkworks AI, Alignment Lab, Open Assistant, WizardLM, and so many other amazing communities.
So here we are, closing the year with an average of 3 new SOTA models daily, tackling all kinds of modalities, running models as powerful as GPT-3.5 on our own computers, exploring AI feedback, building a thriving ecosystem of tools, and more. How could I not be excited for next year?
What’s on the wishlist for next year? More collaboration, transparency, and sharing. A vibrant GPU Poor ecosystem where needs drive novel research in asynchronous Discord servers, pushing the boundaries of libraries and hardware alike. The GPU Rich sharing research that can only be done at huge scale and open-sourcing some of their models under licenses that foster adoption and community. And a bridging GPU Middle Class in direct touch with the Poor, understanding the masses’ needs and training high-quality models under intense constraints.
The GPU Poor strike back! Vive la révolution Open Source!
Image from Harrison Kinsley (Sentdex)