Discussion about this post

Lorena Mayoral

This is a really clear breakdown of why continual learning matters and why it’s hard, without overhyping either side. I appreciate how you separate hype from mechanism, especially the explanation of catastrophic forgetting.

Antimoni

>It would be very convenient if models could learn like we do, though. Imagine how much energy you’d waste if, after learning the basics of driving, you couldn’t learn how to parallel park unless you relearned how to drive and parallel park from day one of driver’s ed. Current LLMs are pretrained on all the data, before being released into the world. To update a model, developers have to retrain it on everything it already learned plus the new stuff.

Isn’t that an exaggeration? Continued pretraining can update a model without starting from scratch. You might mix in older data to reduce catastrophic forgetting, but that’s still incremental training, not necessarily a full retrain on the entire pretraining corpus plus new data.
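The incremental approach described here — continued pretraining with some older data mixed back in — is often called rehearsal or replay. A minimal sketch of batch-level mixing might look like this; all names and parameters are illustrative, not from any specific training library:

```python
import random

def mixed_batches(new_data, old_data, batch_size=8, replay_frac=0.25, seed=0):
    """Yield batches mixing new examples with replayed old ones.

    replay_frac sets what share of each batch is sampled from the old
    corpus -- a simple rehearsal strategy meant to reduce catastrophic
    forgetting during continued pretraining. Hypothetical sketch, not
    any particular framework's API.
    """
    rng = random.Random(seed)
    n_replay = max(1, int(batch_size * replay_frac))
    n_new = batch_size - n_replay
    for i in range(0, len(new_data), n_new):
        # Take a slice of new data, pad it with replayed old examples.
        batch = new_data[i:i + n_new] + rng.sample(old_data, n_replay)
        rng.shuffle(batch)
        yield batch

old = [f"old-{i}" for i in range(100)]   # stand-in for prior pretraining data
new = [f"new-{i}" for i in range(12)]    # stand-in for the new material
batches = list(mixed_batches(new, old, batch_size=8, replay_frac=0.25))
```

The point is that each training step sees mostly new material plus a small replayed fraction of the old, rather than the full original corpus from scratch.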

