
DataCheck
CLI-first data validation. YAML config. 27+ built-in rules.
DataCheck validates data quality with YAML configuration and a simple CLI. Auto-generate rules from your data, validate files and databases, and integrate with CI/CD. No servers, no dashboards - just clean data in your pipelines.







Hey Product Hunt! 👋
I'm Yash Chauhan, and today I'm launching DataCheck 🚀
DataCheck is a CLI-first data validation tool for data engineers.
You define validation rules in YAML, run one command, and get an exit code: 0 for pass, 1 for fail.
No dashboards, no servers - just validation that fits into your pipeline.
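To illustrate how that exit-code contract can gate a pipeline step, here is a minimal Python sketch. It assumes only what the post states: `datacheck validate` exits with 0 on pass and non-zero on fail. The helper names `validation_passed` and `gate` are hypothetical and not part of DataCheck.

```python
import subprocess
import sys
from typing import Sequence


def validation_passed(command: Sequence[str]) -> bool:
    """Run a validation command; True only if it exits with code 0."""
    completed = subprocess.run(list(command))
    return completed.returncode == 0


def gate(command: Sequence[str]) -> int:
    """Return the exit code this pipeline step should finish with."""
    if validation_passed(command):
        print("Validation passed; continuing.")
        return 0
    print("Validation failed; stopping the pipeline.", file=sys.stderr)
    return 1


# Hypothetical usage inside a pipeline script (assumes datacheck is installed
# and a config has already been generated):
#     sys.exit(gate(["datacheck", "validate"]))
```

Most CI systems already treat a non-zero exit code as a failed step, so calling the CLI directly is usually enough. A wrapper like this is only useful when the surrounding pipeline is itself a Python script.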
Why I built this
Most teams validate data in one of two ways:
Ad-hoc scripts (Pandas + assert) → quick but messy and hard to maintain.
Heavy platforms like Great Expectations → powerful but slow to set up.
I wanted a middle ground: simple, version-controlled, and CI-friendly.
What DataCheck does
~3-minute setup
YAML configs readable by non-Python teammates
Auto-generate rules from your dataset
27+ built-in checks (nulls, regex, stats, dates, cross-column)
Works with Snowflake, BigQuery & Redshift (no data stored)
Runs in GitHub Actions, GitLab CI, and Jenkins
Try it
pip install datacheck-cli
datacheck config generate your_data.csv
datacheck validate
I'm actively building this and would really value feedback:
Would you use this in your pipeline?
What's missing for your workflow?
GitHub
Thanks for checking out DataCheck ❤️