You can't unit test for taste (dev.karltryggvason.com)
287 points by kalli 2 days ago | 128 comments
tl;dr: A developer building a virtual running app (In the Long Run) built a pipeline using GeoNames, Wikipedia, DuckDB, and Parquet to surface points of interest along routes, with Claude Haiku providing subjective relevance ratings. Hallucinations forced him to abandon LLM-generated summaries in favor of Wikipedia text, relegating AI to a scoring role alongside traditional signals like Wikipedia language counts. The hardest part was evaluation: there's no ground truth or unit test for "taste," and each route needed custom tuning to balance natural, historical, and populated landmarks.
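To make the "AI as one scoring signal among several" idea concrete, here is a hypothetical sketch. It is not the author's code, and the post's schema is not shown. The `POI` fields, the 0 to 10 LLM rating scale, the log-scaled Wikipedia language count that saturates at 100 editions, the 50/50 blend, the 5 km corridor, and the nearest-vertex distance shortcut are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class POI:
    # Hypothetical record shape; the real pipeline's columns are unknown.
    name: str
    lat: float
    lon: float
    category: str          # assumed buckets: "natural", "historical", "populated"
    wiki_lang_count: int   # number of Wikipedia language editions with an article
    llm_rating: float      # assumed 0-10 relevance rating returned by the LLM


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_route_km(poi: POI, route: list[tuple[float, float]]) -> float:
    """Crude approximation: distance to the nearest route vertex, not the nearest segment."""
    return min(haversine_km(poi.lat, poi.lon, lat, lon) for lat, lon in route)


def score(poi: POI, category_weights: dict[str, float], llm_weight: float = 0.5) -> float:
    """Blend the subjective LLM rating with a popularity signal, then apply a per-route category weight."""
    llm = poi.llm_rating / 10.0  # normalise the assumed 0-10 scale to 0..1
    # Log-scale the language count so a few famous places don't dominate; cap at 1.0.
    popularity = min(math.log1p(poi.wiki_lang_count) / math.log1p(100), 1.0)
    blended = llm_weight * llm + (1 - llm_weight) * popularity
    return category_weights.get(poi.category, 1.0) * blended


def rank_pois(pois: list[POI], route: list[tuple[float, float]],
              category_weights: dict[str, float],
              max_km: float = 5.0, top_n: int = 10) -> list[POI]:
    """Keep POIs within max_km of the route and return the top_n by blended score."""
    nearby = [p for p in pois if distance_to_route_km(p, route) <= max_km]
    return sorted(nearby, key=lambda p: score(p, category_weights), reverse=True)[:top_n]
```

Passing a different `category_weights` dict per route is one simple way to express the per-route tuning the post describes. Note that it only moves the taste judgment into hand-picked numbers; it does not create the missing ground truth.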
HN Discussion: