Tidbit: Capture anything into structured Markdown notes and training-ready JSONL.
I built a small CLI that lets you capture anything and convert it into personalized Markdown and training-ready JSONL.
You read a paper. You paste it into an LLM. You get a summary. You close the tab. Two weeks later you need the methodology section and the specific numbers. Gone.
tidbit fixes that without becoming yet another note-taking app. You stay in your existing editor (Obsidian, Logseq, vim, VS Code, whatever) and tidbit becomes the layer that turns ephemeral content into structured Markdown that fits the workflow you already have.
It does two things at once, from a single capture:
Builds your knowledge base. Define a schema once in a YAML file: for research papers, for example, extract title, authors, methodology, findings, and limitations. Every paper you capture afterwards has the same shape. Two hundred notes later you can grep across all of them by field because they all match.
Builds a training dataset. Every capture also writes a JSONL row containing the raw input and the extracted fields. Over time this becomes a domain-specific dataset of (content, structured output) pairs. Use it for evals, retrieval, or fine-tuning a small local model on your exact extraction patterns.
You don't have to choose between the two. You get both for free, on every capture.
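To make the "one capture, two outputs" idea concrete, here is a rough Python sketch. It is not Tidbit's actual code or file format. The field list comes from the research-paper example above, but the names `PAPER_FIELDS`, `to_markdown`, and `to_jsonl_row`, the heading layout, and the `input`/`output` keys are all assumptions made for illustration.

```python
import json

# Hypothetical schema: the field names come from the post's research-paper
# example; representing it as a Python list is an assumption.
PAPER_FIELDS = ["title", "authors", "methodology", "findings", "limitations"]


def to_markdown(fields: dict) -> str:
    """Render extracted fields as a note with one heading per schema field."""
    lines = [f"# {fields.get('title', 'Untitled')}", ""]
    for name in PAPER_FIELDS[1:]:
        value = fields.get(name, "")
        if isinstance(value, list):
            value = ", ".join(value)
        lines += [f"## {name.capitalize()}", str(value), ""]
    return "\n".join(lines)


def to_jsonl_row(raw: str, fields: dict) -> str:
    """Serialize one (content, structured output) pair as a single JSONL line."""
    row = {
        "input": raw,  # assumed key name for the raw captured text
        "output": {name: fields.get(name) for name in PAPER_FIELDS},
    }
    # json.dumps escapes newlines inside strings, so the row stays on one line.
    return json.dumps(row, ensure_ascii=False)
```

Because every note gets the same headings in the same order, something like `grep -A1 "## Methodology"` works across the whole folder, and appending the output of `to_jsonl_row` to a file on each capture is what gradually builds the dataset.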
Check it out here and share your honest thoughts and improvement ideas: https://github.com/phanii9/Tidbit
