Why Chasing Trillion Parameter Models Is Missing the Point

Written By Doug

Hi, I’m Doug — a tech enthusiast, home lab builder, and AI explorer. I share practical projects, lessons learned, and ways to make technology work smarter, not harder.

We built the world’s most powerful engine. For what?

After 18 months of personal experimentation with AI, I’m seeing a trend that I’m sure many of you have noticed too. Watching AI progress from an overpowered search engine with outdated data to a massive thinking system backed by enormous amounts of computing power feels like watching a supercar on an empty highway. All that horsepower with nothing around to slow it down is exciting in itself, but I have a question.

What are we doing with all the power?

Models like Kimi K2, with over a trillion parameters, are incredible, I guess, but what is anyone actually doing with all that power? When the first AI models came out, they had far fewer parameters, but the training data was more organic, at least in my opinion. Now you have models trained on data that might have been created by other AI models (cough, social media), where the data is poisoned and offers little benefit to the models beyond bragging rights and ego stuffing.

Until recently, most businesses were using frontier models like ChatGPT and Claude for drafting emails and for chats where a search engine would have done the job. These models were pretty capable at writing emails and chatting even in their original releases, but is that all we should expect for millions or billions of dollars in compute and other resources?

With recent advancements like Claude Code, there are now tools that make AI models much more useful. Instead of drafting an email or chatting about who makes the best coffee, these tools can write code, test it, and even ship full, production-ready products in days or hours, work that would have taken a developer weeks or months just a short time ago. Yet the majority of AI use, and all the resources behind it, still goes to the same things as before: drafting emails, chatting, and having AI write full articles that are less than accurate the majority of the time.

Capability overhang

This is what’s called capability overhang: a growing gap between what AI models can do and what we’re actually doing with them. We have trillion parameter models being used like glorified autocomplete. Instead of figuring out how to use existing capability better, the industry keeps building bigger models. Meanwhile, the people who would discover novel applications, homelabbers like me, are being priced out of experimentation and innovation.

Who’s winning and who’s losing?

The introduction of AI came with some incredible opportunities for homelabbers. Never before could you have an AI model sitting on a computer at home. If you are a PC gamer with a newer, decent GPU, you can run AI, albeit much smaller models, in the safety of your own home. Private, secure, and pretty good at chatting and drafting emails too.
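To give a sense of how little that takes, here is a minimal sketch of running a small quantized model on a gaming GPU using the llama-cpp-python bindings. The model file name is a placeholder, and this isn’t my exact setup, just the general shape of it:

```python
# Minimal sketch: run a small quantized model on a consumer gaming GPU.
# Assumes llama-cpp-python is installed with GPU support and that you have
# downloaded a quantized GGUF file (the path below is a placeholder).
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-14b-instruct-q4_k_m.gguf",  # hypothetical local file
    n_gpu_layers=-1,  # offload all layers to the GPU; a 14B model at 4-bit fits in ~16 GB
    n_ctx=8192,       # context window; trade VRAM for longer prompts
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Draft a short email declining a meeting."}]
)
print(reply["choices"][0]["message"]["content"])
```

Nothing fancy, but it runs entirely on hardware you already own, with no data leaving the house.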

Those opportunities are suddenly moving out of reach for anyone wanting local, private AI. Nvidia has shifted its focus to the enterprise, RAM vendors are ignoring consumers, and unless you already have a decent GPU or are willing to pay more than twice the suggested retail price, you are out of luck.

Waste not, want not

Spending hundreds of millions or billions of dollars on training compute makes for interesting headlines and a huge return on investment for CEOs and investors, but it adds very little to AI advancement.

Here is my point and the reason behind this rant. Over the last 20 months or so, I have been building my own local AI. I tested a lot of models and different hardware, and tuned default prompts to make the small models better. What I found is that my local AI can do at least 80% of what I need Claude and ChatGPT for.
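As an illustration of what “tuning default prompts” means here, this is a rough sketch against a local Ollama server’s chat endpoint. The model name and the system prompt wording are examples, not my exact configuration:

```python
# Rough sketch: override a small model's default system prompt via a local
# Ollama server. Model name and prompt wording are illustrative placeholders.
import requests

SYSTEM_PROMPT = (
    "You are a concise homelab assistant. Prefer step-by-step answers, "
    "name the exact command or config file, and say 'I don't know' rather than guess."
)

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen2.5:14b",  # any small local model you have pulled
        "stream": False,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Why is my Docker container restarting in a loop?"},
        ],
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```

A few sentences of system prompt like that, tuned for the kind of questions you actually ask, goes a surprisingly long way with a 14B model.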

I can hear all of you calling bullshit, but hear me out. I often take the questions I ask my local AI, along with its answers, and give them to both Claude and ChatGPT to review. For most of my coding, troubleshooting, and even edge topics, both tell me the results punch well above the size of the models I am using.

Here is something interesting: I am using models between 14 billion and 32 billion parameters on a GPU that draws a maximum of 180 watts, all in the comfort of my little homelab. I already have systems in place, like an n8n workflow that runs agentic tasks, monitoring my Zigbee devices’ battery levels in my smart home. That gives me useful information and saves me from checking those entities one by one or keeping a dedicated dashboard just for batteries.
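To show the shape of that idea (my real version lives in n8n, and the URL, token, model name, and threshold here are placeholders), here is a sketch in Python that pulls entity states from Home Assistant’s REST API and has a small local model write the summary:

```python
# Sketch of the battery-check idea: pull entity states from Home Assistant's
# REST API, keep the ones that report a battery level, and have a small local
# model summarize the low ones. URL, token, and thresholds are placeholders;
# my real version of this runs as an n8n workflow.
import requests

HA_URL = "http://homeassistant.local:8123"   # placeholder address
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"    # placeholder token
LOW_BATTERY = 20                             # percent

states = requests.get(
    f"{HA_URL}/api/states",
    headers={"Authorization": f"Bearer {HA_TOKEN}"},
    timeout=30,
).json()

low = [
    f"{s['attributes'].get('friendly_name', s['entity_id'])}: {s['state']}%"
    for s in states
    if s["attributes"].get("device_class") == "battery"
    and s["state"].replace(".", "", 1).isdigit()   # skip 'unavailable' etc.
    and float(s["state"]) <= LOW_BATTERY
]

if low:
    summary = requests.post(
        "http://localhost:11434/api/generate",   # local Ollama instance
        json={
            "model": "qwen2.5:14b",
            "stream": False,
            "prompt": "Summarize these low-battery devices in one friendly sentence:\n"
            + "\n".join(low),
        },
        timeout=120,
    ).json()["response"]
    print(summary)
```

Hook something like that to a schedule and a notification, and the batteries check themselves.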

I know this is small scale, but if I can build these workflows with small models and little overall power usage, why on earth do we need to spend billions on models with hundreds of billions of parameters to do the same thing? I understand that frontier models have to scale to hundreds of millions of users, but does the world need these massive models if the same results can be had from smaller ones using a fraction of the resources?

So what’s the alternative?

If I can take a bunch of mixed-up computer parts and build a local AI that gives me 80 percent of frontier performance, why can’t AI experts use smaller, far less costly, better-utilized models to give users the same results?

If I had to put a dollar amount on the hardware for my local AI, I would guess about $1,200 to $1,500. That includes the $500 Nvidia RTX 5060 Ti 16GB (bought before all the craziness started), an older Minisforum mini PC, and an OCuLink dock. I am not counting the hours I spent building and tuning, because that was my choice, but I have a couple hundred hours in it since I started in June of 2024.

For me, this situation, or should I say constraint, is forcing me to optimize what I have: smaller models and less hardware, used better. I am trying to get everything I can out of what is available to me, and I feel I am doing pretty darn well.

Wake up, AI industry!

The AI industry needs to take what it has built and optimize it instead of chasing bigger models with more parameters. Everyone knows that smaller models cost less to run in compute and power, are much faster, and are easier to scale. If they know that, why aren’t they doing it?

The industry seems to be hoping that the capability overhang will keep widening and force homelabbers to become API consumers. Little do they know, homelabbers are pretty smart, and they will find a way to squeeze every last bit of performance out of these small open-source models.

Take away homelabbers’ ability to buy hardware, and they will take what they already have and make it perform near, or in places better than, frontier models.

Look at what is happening with Microsoft Windows. Microsoft pushed people into throwing away perfectly good hardware, still usable for years to come, by adding limitations and selling them as innovation and security. Little do they know, they pushed the homelab community to build even better desktop versions of Linux that not only run on the same hardware Microsoft says is inadequate, but run faster and better on hardware that is supposedly obsolete.

Wake up, AI industry, and get back on the original path of creating AI that will change the world, instead of chasing bigger models, puffed-up egos, and fatter investor pocketbooks.
