Discussion about this post

Aaron Scher:

> And, while most data centers need to be located near to their key users, to reduce latency, AI data centers are often located where energy and land are cheap

I think it's often overstated how important it is for data centers to be near users, including in this piece. Just look at the actual latency numbers, e.g., https://wondernetwork.com/pings/San%20Francisco. SF to the East Coast is sub-100ms round-trip; SF to New Orleans is ~51ms. That is just not enough to matter for *most* LLM uses we see today; real-time things like translation or voice are the exception. Maybe uses will change, but I expect the vast majority of LLM use won't be latency-sensitive on the order of 1/20th of a second.

For reference on how much 50ms is: ChatGPT, as measured by Artificial Analysis (https://artificialanalysis.ai/models/gpt-5-chatgpt/providers), has a ~530ms time to first token (latency), and many small models are also in the 0.3–1s range.
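
To put those two numbers side by side, here's a back-of-the-envelope sketch (a minimal illustration only; the ~50ms and ~530ms figures are the ones cited above, not measurements of any particular deployment):

```python
# Back-of-the-envelope: how much does cross-country network latency
# add to a user's perceived time-to-first-token?
# Illustrative numbers from wondernetwork.com and Artificial Analysis (cited above).

rtt_cross_country_s = 0.050  # ~50ms round-trip, SF <-> New Orleans
ttft_s = 0.530               # ~530ms time to first token (ChatGPT)

total_s = rtt_cross_country_s + ttft_s
network_share = rtt_cross_country_s / total_s

print(f"Total perceived latency: {total_s * 1000:.0f}ms")
print(f"Share from network RTT:  {network_share:.0%}")
# -> Total perceived latency: 580ms
# -> Share from network RTT:  9%
```

So under these assumptions, serving a West Coast user from an East Coast data center adds under a tenth to the latency the user already experiences from the model itself.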
