Grokking in Machine Learning

Ever heard of “grokking” in machine learning? It’s the phenomenon where a model, after extensive training, suddenly shifts from memorizing its training data to generalizing to new inputs, that is, actually applying and calibrating what it has learned, which is exactly what college teachers are trying to get students to do in their courses.

Researchers have observed this shift in both simple and more complex models, raising the question of whether large language models truly understand text or simply recall it. So far, especially for LLMs (large language models, including the generative ones currently in circulation), the finding has been that memorization comes more easily than generalization.
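A toy way to see the memorization/generalization distinction is modular addition, a task commonly used in grokking experiments. This sketch (my own illustration, not from the article) compares a pure lookup-table “memorizer” against a model that has learned the underlying rule:

```python
# Toy illustration: on modular addition, a lookup table is perfect on
# training pairs but useless on unseen ones; a learned rule generalizes.
P = 7
pairs = [(a, b) for a in range(P) for b in range(P)]
train = pairs[: len(pairs) // 2]   # pairs seen during "training"
test = pairs[len(pairs) // 2 :]   # held-out pairs

# Pure memorization: store the answer for every training pair.
memorized = {(a, b): (a + b) % P for a, b in train}

def memorizer(a, b):
    return memorized.get((a, b))   # None on anything unseen

def generalizer(a, b):
    return (a + b) % P             # the actual underlying rule

def accuracy(model, data):
    return sum(model(a, b) == (a + b) % P for a, b in data) / len(data)

print(accuracy(memorizer, train))    # perfect on seen data
print(accuracy(memorizer, test))     # fails on unseen data
print(accuracy(generalizer, test))   # the rule transfers
```

Grokking is the observation that, with enough training, some models abruptly move from behaving like the first function to behaving like the second.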

Not much of a surprise there.

A super interesting and somewhat technical article, but readable if you’re familiar with the basics of ReLU:
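If ReLU is new to you: it’s the rectified linear unit, the simple activation function max(0, x) used throughout modern neural networks. A minimal sketch:

```python
# ReLU (rectified linear unit): pass positive inputs through, zero out the rest.
def relu(x):
    return max(0.0, x)

print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5]])  # [0.0, 0.0, 0.0, 1.5]
```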
