I would love to better understand what you mean by "classify however it wants." ...

rorylaitila · 2025-10-21T14:45:37 1761057937

Yeah, the output is structured JSON, but I mean the entity value that is returned. A simple case is classifying the brand of the ad. It might return any of "Ford", "Ford Motor Company", "Ford Trucks", "The Ford Motor Company", "Lincoln Ford" even on very similar ads. Rather than try to enhance the prompt with something like "always use 'Ford Motor Company' for every kind of Ford", I just accept whatever the value is. I have a dictionary on my end that maps all brands back to a canonical brand.
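
To make the dictionary idea concrete, here is a minimal sketch of what such a lookup might look like. The table contents, the choice of "Ford Motor Company" as the canonical name, the case-insensitive matching, and the function name `canonical_brand` are all assumptions for illustration, not my actual code:

```python
# Hypothetical alias table: keys are normalized LLM outputs,
# values are the canonical brand. Entries are illustrative only.
CANONICAL_BRAND = {
    "ford": "Ford Motor Company",
    "ford motor company": "Ford Motor Company",
    "ford trucks": "Ford Motor Company",
    "the ford motor company": "Ford Motor Company",
    "lincoln ford": "Ford Motor Company",
}


def canonical_brand(raw: str) -> str:
    """Map a brand string returned by the LLM to its canonical form.

    Unknown brands are returned unchanged, so nothing is lost when
    the model produces a value that has not been seen before.
    """
    # Collapse whitespace and ignore case (an assumption; exact matching also works).
    key = " ".join(raw.split()).casefold()
    return CANONICAL_BRAND.get(key, raw)
```

The point of the fallback is that the prompt never has to know about the mapping: the model returns whatever it likes, and the cleanup happens afterwards.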

AbstractH24 · 2025-10-21T15:21:35 1761060095

What are you using to build the dictionary? Particularly when it encounters something you've never seen before.

This is really interesting to me.

rorylaitila · 2025-10-23T11:32:26 1761219146

Continuing the brands example, by default I store all of the brands returned as is (in SQL). On occasion, I manually come across different variations of a brand that I decide are better combined into a primary brand. Each of those secondary brands gets marked as relating to the primary brand. Then the next time a new ad gets tagged with a secondary brand, I know I can use the primary brand instead.

So in essence, the process is what I might call 'eventually modelled' (to borrow from the concept of eventual consistency). I use the LLM entities as is, and gradually conform them to my desired ontology as I discover the correct ontology over time.
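
For anyone who wants to try this, the store-then-merge flow could be sketched roughly as follows. This uses SQLite for brevity; the `brands` table, its columns, and the function names are hypothetical and chosen only for the example, not a description of my real schema:

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) brands table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS brands (
               name         TEXT PRIMARY KEY,
               primary_name TEXT REFERENCES brands(name)  -- NULL until merged
           )"""
    )
    return conn


def record_brand(conn: sqlite3.Connection, name: str) -> str:
    """Store an LLM-returned brand as is and return the brand to use for the ad."""
    conn.execute("INSERT OR IGNORE INTO brands (name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT primary_name FROM brands WHERE name = ?", (name,)
    ).fetchone()
    # Fall back to the raw value when no primary brand has been assigned yet.
    return row[0] or name


def merge_brand(conn: sqlite3.Connection, secondary: str, primary: str) -> None:
    """Manual curation step: mark `secondary` as a variation of `primary`."""
    for name in (primary, secondary):
        conn.execute("INSERT OR IGNORE INTO brands (name) VALUES (?)", (name,))
    conn.execute(
        "UPDATE brands SET primary_name = ? WHERE name = ?", (primary, secondary)
    )


conn = open_db()
first = record_brand(conn, "Ford Trucks")   # stored and used as is
merge_brand(conn, "Ford Trucks", "Ford Motor Company")
later = record_brand(conn, "Ford Trucks")   # now resolves to the primary brand
```

Nothing is rewritten at merge time in this sketch; the raw value stays in the table and the primary brand is resolved on lookup, which keeps a merge easy to undo if the ontology changes later.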