LLMs are unreliable for lists of inputs
Be careful if you’re asking an LLM to do something to each item in a list: the result for any one item can depend on the other items, including their order. I was asking a model to classify several items in a list against the same rubric when I realized that modifying one item changed the classification of another, unmodified item.
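A quick way to check for this coupling is to classify the whole list, perturb a single item, and re-run. The sketch below does that with the OpenAI Python client; the model name, rubric, and items are placeholders, not the exact setup from the notebook.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

RUBRIC = "Label each item as POSITIVE or NEGATIVE."  # placeholder rubric

def classify_batch(items: list[str]) -> list[str]:
    """Classify every item in a single prompt and return one label per item."""
    numbered = "\n".join(f"{i + 1}. {item}" for i, item in enumerate(items))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Classify each item:\n{numbered}\n"
                                        "Reply with one label per line."},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip().splitlines()

items = ["item A ...", "item B ...", "item C ..."]  # placeholder items
baseline = classify_batch(items)

# Perturb only the first item and re-run: if labels for the *other*,
# unmodified items change, the classifications are not independent.
perturbed = classify_batch(["item A, slightly reworded ..."] + items[1:])
for i, (before, after) in enumerate(zip(baseline[1:], perturbed[1:]), start=2):
    if before != after:
        print(f"Item {i} changed from {before!r} to {after!r} without being edited")
```

Classifying each item in its own request avoids the coupling, at the cost of more API calls and losing whatever context the other items provide.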
See the Quarto notebook here.