BUY.VODKA
Free database of every US spirits label approval
21 followers
Search 9,000+ vodka product groups, 6,000+ brands, and 2,400+ producers, all drawn from TTB public data. See which distillery actually makes your vodka, track label approvals, and map contract distilling relationships across the US spirits industry.

Hey everyone, I built this because I kept hitting dead ends trying to answer a simple question: who actually makes this vodka? The TTB publishes all label approval data, but it's trapped in a clunky government search interface that requires session cookies and returns results you can't link to.
So I wrote a pipeline to pull it all out, cross-reference DSP permits with producers, and generate static pages for every product group, brand, and distillery. The stack is Astro 5 + Tailwind CSS v4 on Cloudflare Pages, with a Python pipeline handling data extraction and enrichment. The whole site is statically generated: no server, no database at runtime.
The contract distilling relationships are the most interesting part. You can see which facility actually distilled a product vs. who just bottled it. Turns out the US vodka market is more consolidated than the shelf suggests.
Would love feedback on what data views would be most useful. We're starting with vodka and expanding from there.
Who makes Trader Joe's vodka? I built a database in three weeks that can tell you.
Gulf Coast Distillers (Houston, TX): a contract producer behind 26 brands, including 3 Trader Joe's vodka labels approved between 2020 and 2023.
I know this because I built BUY.VODKA in three weeks: a reference database indexing every TTB-approved spirits label in the US. The TTB (federal agency that approves alcohol labels) publishes this data. In theory. In practice, it's locked behind a 2003 Java web app where detail pages return 39 bytes of empty HTML unless your browser has an active server-side session.
The technical story:
Playwright scraper with persistent non-headless browser sessions, because the Imperva WAF blocks anything that smells automated. The WAF also rotates TLS certs under sustained load, nuking your browser context with no recovery path. Only fix: kill and relaunch. Rate limit: 2,500 requests per cycle, then a 10-minute pause.
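To make the kill-and-relaunch and pacing logic concrete, here is a minimal sketch of the control loop. It is not the actual scraper: `launch`, `fetch`, and `SessionDead` are hypothetical stand-ins for whatever starts the browser, loads a page, and signals a dead context. Only the 2,500-request cycle and the 10-minute pause come from the post.

```python
import time

BATCH_SIZE = 2500          # requests per cycle (figure from the post)
PAUSE_SECONDS = 10 * 60    # 10-minute pause between cycles (figure from the post)


class SessionDead(Exception):
    """Hypothetical error: raised by `fetch` when the browser context is unusable."""


def crawl(urls, launch, fetch, sleep=time.sleep,
          batch_size=BATCH_SIZE, pause_seconds=PAUSE_SECONDS, max_relaunches=3):
    """Fetch every URL, relaunching the session when it dies and pausing between batches.

    `launch()` must return an object with a `close()` method.
    `fetch(session, url)` returns the page payload or raises SessionDead.
    """
    session = launch()
    results = {}
    sent = 0
    for url in urls:
        relaunches = 0
        while True:
            try:
                results[url] = fetch(session, url)
                break
            except SessionDead:
                relaunches += 1
                if relaunches > max_relaunches:
                    raise
                # No in-place recovery: throw the session away and start a new one.
                session.close()
                session = launch()
        sent += 1
        if sent % batch_size == 0:
            sleep(pause_seconds)
    session.close()
    return results
```

Passing `sleep` in as a parameter keeps the loop testable without actually waiting ten minutes.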
The scraper is agentic: every parsed record gets a quality score, incidents self-classify (parser_failure, value_anomaly, distribution_shift), and it flags when field distributions drift more than 2σ from baseline mid-session. 12,127 records scraped. 284 incidents. Zero parser failures.
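The 2σ drift check can be sketched as a z-score test against a baseline sample. This is an illustration under assumptions, not the scraper's code: the function name `drifted` and the choice of metric (for example, the share of NULL values in a field per batch) are hypothetical.

```python
from statistics import mean, stdev


def drifted(baseline, current, threshold=2.0):
    """Return True when `current` lies more than `threshold` sample standard
    deviations from the mean of `baseline` (needs at least two baseline values)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as drift.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Example: per-batch NULL rate of one field (made-up numbers).
baseline_null_rates = [0.72, 0.74, 0.75, 0.73, 0.76]
if drifted(baseline_null_rates, 0.90):
    print("incident: distribution_shift")
```

A batch that trips this check would be logged as a `distribution_shift` incident, not treated as a parser failure.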
12,127 individual federal label approvals consolidated into 9,038 product groups. Grouping key: (brand_name, fanciful_name, class_type, permit_number, origin). 74% of vodka labels have NULL in the fanciful_name column, which makes the grouping logic non-trivial.
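A minimal sketch of grouping on that five-field key follows. One loud assumption: a NULL `fanciful_name` is normalized to an empty string so that labels missing it still land in the same group. The post does not say how NULLs are actually handled, and the helper names are hypothetical.

```python
from collections import defaultdict

# Field names taken from the grouping key in the post.
GROUP_FIELDS = ("brand_name", "fanciful_name", "class_type", "permit_number", "origin")


def group_key(record):
    """Build the grouping tuple; None and blank strings both become ""
    (an assumption), and text is case-folded so spelling case does not split groups."""
    return tuple((record.get(field) or "").strip().casefold() for field in GROUP_FIELDS)


def group_labels(records):
    """Map each grouping key to the list of label records that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[group_key(record)].append(record)
    return dict(groups)
```

Whether two labels with no fanciful name really belong to one product is a judgment call, which is where the non-trivial part of the logic lives.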
One data quality trap worth sharing: the COLA record has an "Origin" field. An intuitive place to get the producer state. Wrong. A brand bottled in Kentucky might file through a New York DSP permit. The correct state comes from the permit prefix: "DSP-MI-20123" → Michigan.
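Pulling the state out of the permit prefix can be sketched with a small regex. This assumes every permit follows the "DSP-<two-letter state>-<digits>" shape of the example above. Real permit numbers may have other forms, so unmatched input returns `None`.

```python
import re
from typing import Optional

# Assumed format, based on the single example "DSP-MI-20123".
PERMIT_RE = re.compile(r"^DSP-([A-Z]{2})-\d+$")


def state_from_permit(permit_number: str) -> Optional[str]:
    """Return the two-letter state code embedded in a DSP permit number,
    or None when the string does not match the assumed format."""
    match = PERMIT_RE.match(permit_number.strip().upper())
    return match.group(1) if match else None


print(state_from_permit("DSP-MI-20123"))  # MI
```

Mapping the code to a full state name ("MI" to Michigan) is a plain lookup table and is left out here.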
Astro 5 generates 17,590 static HTML pages, including 9,038 product groups, 6,081 brands, and 2,439 producers. Cloudflare Pages serves them. Lighthouse 100. Zero client-side JS.
What's a product where you've always wondered who actually makes it?