Zero-Bullshit Take on LLM Optimization

…with zero promotions and AI-generated words

Published on 16 Aug 2025


Lately, there's been a lot of noise around the web regarding "LLM optimization", so I want to start with a disclaimer:

I am not a LinkedIn "growth hacker", "hustler", "influencer", or any other type of internet bullshitter. I'm also not the founder of a startup or some AI SaaS crap that I'm about to sell you.

I am a nobody on the internet who wants to figure out how to do a decent job as a web developer. Naturally, this involves SEO. But now that people also use LLMs to answer their questions, it's only logical that we want to optimize for them as well.

However, unlike most people, I'm not going to claim that I have the golden formula to get you to the top of LLM answers because nobody fucking does. I'm just going to share my experience and conclusions, with actual proof behind them. Feel free to challenge them. I even encourage you to do so.

How do LLMs know stuff?

There are two ways:

Training data: everything that is "shipped" internally in the model. Content that is gathered from the internet in one way or another and embedded inside of it.
Realtime search: everything that is outside of what's embedded in the model (e.g. recent events) that the LLM has no other way of knowing.

So if you want to optimize for LLMs, you have to somehow make an impact on these two things.
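To make the split concrete, here is a toy sketch in Python. It says nothing about how a real model works internally: `TRAINING_SNAPSHOT`, `live_search`, and `answer` are made-up names, and the values are the Bitcoin figures from the experiment described in this post. The only point is that a frozen snapshot goes stale, while a lookup done at question time does not.

```python
from typing import Optional

# Toy illustration only: a real LLM is not a dictionary lookup.
# Every name in this block is hypothetical.

# "Training data": frozen at the moment the model was trained.
TRAINING_SNAPSHOT = {"bitcoin price": "around $64,000"}


def live_search(question: str) -> Optional[str]:
    """Stand-in for a realtime web search performed at question time."""
    fresh_results = {"bitcoin price": "around $114,000"}
    return fresh_results.get(question)


def answer(question: str, search_enabled: bool) -> str:
    if search_enabled:
        found = live_search(question)
        if found is not None:
            return f"{found} (from search)"
    # Fall back to whatever was baked in at training time.
    stale = TRAINING_SNAPSHOT.get(question, "I don't know")
    return f"{stale} (from training data)"


print(answer("bitcoin price", search_enabled=False))
print(answer("bitcoin price", search_enabled=True))
```

With search off, the sketch can only repeat the old snapshot value. With search on, the fresh value wins.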

Impacting training data

That's basically impossible to "optimize", at least in the regular meaning of the word. Sure, you could publish new content and it'll just get scraped and embedded in the model, right? In theory, yes, however:

You can't know when models get trained
You can't know when those models actually launch
You can't have a predictable feedback loop to improve

PROOF?

I asked GPT 5, with search disabled, "Who is the 47th president of the United States?" and it confidently answered with a completely false statement:

As of today, August 15, 2025, the United States has not yet had a 47th president — the current president is Joe Biden, the 46th.

And when asked about the latest Bitcoin price:

The last Bitcoin price I saw was around $64,000 in early August 2025.

…which is also false, since the Bitcoin price in the first week of August 2025 was around $114,000 USD:

Image: ChatGPT 5 with disabled search

Want to test it yourself?

Click on your profile picture at the bottom left
Click "Customize ChatGPT"
Scroll to the bottom
Expand the "Advanced" settings
Uncheck "Web Search"

Like this:

Image: How to disable search in ChatGPT

If you want to be more specific, you could straight up ask GPT 5 about its training data and it'll respond:

My core training data goes up to June 2024, but I can also pull in more recent information from the web if needed.

Image: ChatGPT 5 response to a question about its data cap

For the record, the GPT 4o model responds that it can't answer either, but at least it doesn't lie as much:

Image: ChatGPT 4o with disabled search

So what does this mean?

Well, the facts are:

ChatGPT 5 was trained on data up to June 2024
ChatGPT 5 was released in August 2025
That's about 14 months of delay
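The delay in the list above is plain calendar arithmetic. A minimal sketch, assuming the first day of each month, since only the month and year are known (`months_between` is just an illustrative helper):

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Count calendar months from start's month to end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Day of month is an assumption; the source only gives month and year.
training_cutoff = date(2024, 6, 1)  # "June 2024", as reported by the model itself
release = date(2025, 8, 1)          # "August 2025"

print(months_between(training_cutoff, release))  # 14
```

Swap in the cutoff and release dates of any other model to see how long its own blind spot was at launch.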

So if you want to optimize for training data:

You'll have a feedback loop of over a year
You need to have your optimized content published before the model is trained, and, again, you can't know when that happens
Your content needs to be part of the training set, which pretty much only employees of OpenAI can 100% guarantee

But even if we assume that your content gets into the new model, your "optimized" content will now be a year old and probably outdated, so you won't want it in there anyway…

It seems brainless to attempt to optimize for training data.

Impacting realtime search

Looking at the results of ChatGPT without search enabled, we can clearly see why there's a need to augment its knowledge with realtime context.

The moment you enable search, GPT starts answering correctly when asked about the president and Bitcoin's price:

Image: ChatGPT 5 with enabled search

So to influence the LLM output:

You have to be one of the cited sources
Therefore GPT needs to know about your content
Therefore you need to be at the top of search results
Therefore you need regular SEO

PROOF?

I made a blog post explaining the origins of a made-up word that had zero results in Google Search, and now my page is the only source about it:

Image: SERP for my made-up word

It's important to mention that I explicitly allowed only Google to scrape this article with the following robots.txt:

user-agent: *
disallow: /blog/what-is-my-new-word
user-agent: Googlebot
allow: /blog/what-is-my-new-word

…which has worked, as only Googlebot appears in my server logs. And at the time of writing, the post indeed doesn't appear on Bing, DuckDuckGo, Yandex, Yahoo Search, or Brave Search.
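If you want to sanity-check rules like these before deploying them, Python's standard library ships a robots.txt parser. The sketch below feeds it the exact rules from above and asks which crawlers may fetch the post. Treat it as an approximation: it shows how this one parser reads the file, and the crawler names other than Googlebot are only examples I picked. Also keep in mind that obeying robots.txt is voluntary, which is why the server logs are the real evidence.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt shown above.
ROBOTS_TXT = """\
user-agent: *
disallow: /blog/what-is-my-new-word
user-agent: Googlebot
allow: /blog/what-is-my-new-word
"""

PATH = "/blog/what-is-my-new-word"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str) -> bool:
    """True if the parsed rules let this user agent fetch PATH."""
    return parser.can_fetch(user_agent, PATH)


# "bingbot" and "GPTBot" are example crawlers, chosen for illustration.
for bot in ("Googlebot", "bingbot", "GPTBot"):
    print(bot, allowed(bot))
```

Under this parser, only Googlebot gets a `True`; everything else falls through to the `*` group and is disallowed.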

But what happens when you ask ChatGPT 5 about the word? It responds with a citation of exactly this post:

Image: ChatGPT citing my blog post about a made-up word

Therefore, since no other search engine has indexed the post, ChatGPT has to rely on the Google Search index.

However, it's easy to get mentioned when you're the only source on the entire internet for the query. How do you get cited when there are hundreds of thousands of competing pages? Make sure you're the most relevant and at the top. How do you do that? SEO.

My two cents on content optimization

Let's think about this at a high level, from the perspective of a company like Google or OpenAI:

You have a product that you've monetized
That product answers questions asked by users
Users come back if they find the answers useful
Better answers = more users = more money

So all search engines and LLMs are united by common goals which boil down to one simple thing: provide the best possible answer. And what does that entail?

Useful content: Genuinely satisfy the needs of the user — provide accurate information, explain it well, make it engaging. You don't want to waste your time with nonsense, right? Other people on the internet don't want that either.
Machine readability: Even the best content is useless if machines/bots don't understand it as such, regardless of whether it's Googlebot, GPTBot, or SomethingElseBot. This is where you need a deep understanding of schemas, metadata, semantic HTML, etc.
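As one concrete example of the "schemas" part, here is a minimal sketch that builds schema.org `BlogPosting` markup as JSON-LD for a post like this one. The helper name `blog_posting_jsonld` and the author name are placeholders, and I haven't tested whether any particular bot actually reads this markup, so take it as an illustration of the format rather than a ranking recipe.

```python
import json


def blog_posting_jsonld(headline: str, date_published: str, author_name: str) -> str:
    """Build a schema.org BlogPosting description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date
        "author": {"@type": "Person", "name": author_name},
    }
    return json.dumps(data, indent=2)


markup = blog_posting_jsonld(
    headline="Zero-Bullshit Take on LLM Optimization",
    date_published="2025-08-16",
    author_name="Jane Doe",  # placeholder, not the real author
)

# JSON-LD is embedded in the page inside a script tag of this type.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The same idea extends to other schema.org types; the useful part is that the title, date, and author are stated explicitly instead of being left for a bot to guess from the layout.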

However, those two things are highly subjective, so the "best" answer depends on who you ask. Which is why you get SEO, GEO, AIO, AEO, and all the other bullshit terms and growth hacker crap you see on LinkedIn.

My point is that we get lost in optimization "tips" and "tricks" and start treating all of this like some arcane-voodoo-black-box magic, understood by a few god-sent growth shamans in the corners of the web. So we overcomplicate things and turn into algorithm pleasers, instead of focusing on the bigger question:

How do I make useful fucking content?

Conclusion

I believe that optimizing for LLMs and search engines is pretty much one and the same. But to answer the question:

Get as much content on the web as possible, so it becomes a part of the training data for newer models and they start to base their answers on it.
Make sure that content solves actual user needs and is useful, so that it gets ranked higher in search results and has better chances of getting cited.

For me, this is more of a human problem than a technical one. That's why there are no clear answers: it depends on your audience. Therefore, you hold the answer and you must ask yourself with a lot of empathy:

What's the journey of my users? What problems do they have? What questions are they asking?
If I were in their place, what answers would I like to receive? In what form? How long? How detailed? From whom? Where?

Answer these questions as honestly as you can and you'll get your content optimized not only for LLMs, but for everything. So don't try to "optimize for LLMs" and don't try to optimize for search engines either.

Optimize for users.

P.S. We don't even need a cringey term like UEO (user experience optimization) because such a term already exists. It's called UX.