You can't unit test for taste (dev.karltryggvason.com)
281 points by kalli 2 days ago | 128 comments
tl;dr: For his virtual running app, In the Long Run, a developer built a pipeline using GeoNames, Wikipedia, DuckDB, and Parquet to surface points of interest along routes, with Claude Haiku providing subjective relevance ratings. Hallucinations forced him to drop LLM-generated summaries in favor of Wikipedia text, which left AI in a scoring role alongside traditional signals such as the number of Wikipedia language editions covering a place. The hardest part was evaluation: there is no ground truth or unit test for "taste," and each route needed custom tuning to balance natural, historical, and populated landmarks.
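To make "AI in a scoring role alongside traditional signals" concrete, here is a minimal Python sketch of one way such a blend could work. It is not the author's pipeline, which used DuckDB and Parquet. The `POI` fields, the 0-10 LLM rating scale, the log scaling of language counts, `alpha`, `max_langs`, the category weights, and the place names are all hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass


# Hypothetical record for one point of interest near a route (illustrative only).
@dataclass
class POI:
    name: str
    category: str      # assumed to be "natural", "historical", or "populated"
    llm_score: float   # assumed 0-10 subjective relevance rating from an LLM
    wiki_langs: int    # number of Wikipedia language editions with an article


def blended_score(poi: POI, weights: dict, alpha: float = 0.5,
                  max_langs: int = 300) -> float:
    """Blend an LLM rating with a language-count signal, then apply a
    per-route category weight. All constants are assumptions."""
    llm_part = poi.llm_score / 10.0
    # Log scaling so the first few editions count more than the hundredth;
    # capped at 1.0 for places covered in more than max_langs editions.
    lang_part = min(math.log1p(poi.wiki_langs) / math.log1p(max_langs), 1.0)
    base = alpha * llm_part + (1 - alpha) * lang_part
    # Unknown categories get a neutral weight of 1.0.
    return weights.get(poi.category, 1.0) * base


def top_pois(pois, weights, k=3, alpha=0.5):
    """Return the k highest-scoring POIs for one route's weights."""
    return sorted(pois, key=lambda p: blended_score(p, weights, alpha),
                  reverse=True)[:k]


# Hypothetical per-route tuning: this route favors nature over towns.
route_weights = {"natural": 1.2, "historical": 1.0, "populated": 0.6}
candidates = [
    POI("Ridge Lake", "natural", 9, 40),
    POI("Old Mill", "historical", 8, 10),
    POI("Market Town", "populated", 7, 120),
    POI("Tiny Hamlet", "populated", 3, 2),
]
print([p.name for p in top_pois(candidates, route_weights, k=2)])
# prints ['Ridge Lake', 'Old Mill']
```

With flat weights of 1.0 for every category, "Market Town" moves into second place. Changing only the per-route weights reorders the list, which is one way to picture the custom tuning the post describes. Note that nothing in the code can say which order is "right."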
HN Discussion:
  • Taste is implicit knowledge that cannot be fully externalized or specified to a machine
  • Taste can be partially codified or trained through accumulated context and tooling
  • Concrete technical suggestions for better ranking signals like QRank or Wikipedia quality classes
  • LLMs are useful in a supporting role but need human oversight for quality work
  • LLMs produce clever but poorly-judged outputs compared to humans with taste