So You Want Git for Data
Aspects of "Git" that you might want for data
- Version Control
- rollbacks
- diffs
- lineage
- branch-merge
- sharing
- addressability
- multiple remotes
- staging area
- Data Catalog
- thriving open data community
- collaborate remotely and asynchronously
- pull requests
- create issues referring to certain parts of the data
- Types
- transformation can happen externally
- Data Labelling can happen as statement metadata?
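
To make a few of the version-control items above concrete (addressability, diffs, rollbacks, lineage), here is a minimal sketch of a toy snapshot store. It is not how any real product works. `TinyDataRepo` and its methods are hypothetical names invented for illustration, and it assumes each table fits in memory as a dict of rows keyed by a string primary key.

```python
import hashlib
import json


class TinyDataRepo:
    """Hypothetical toy store: every commit is a full snapshot of the rows."""

    def __init__(self):
        self.commits = {}  # commit id -> {"parent": id or None, "rows": {...}}
        self.head = None

    def commit(self, rows):
        # Addressability: the id is a hash of the parent id plus the content.
        payload = json.dumps({"parent": self.head, "rows": rows}, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
        self.commits[commit_id] = {"parent": self.head, "rows": dict(rows)}
        self.head = commit_id
        return commit_id

    def diff(self, old_id, new_id):
        # Row-level diff between two snapshots, by primary key.
        old = self.commits[old_id]["rows"]
        new = self.commits[new_id]["rows"]
        return {
            "added": sorted(new.keys() - old.keys()),
            "removed": sorted(old.keys() - new.keys()),
            "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        }

    def rollback(self, commit_id):
        # Rollback here only moves HEAD back; no history is deleted.
        self.head = commit_id
        return self.commits[commit_id]["rows"]

    def lineage(self):
        # Walk parent pointers from HEAD back to the first commit.
        chain, current = [], self.head
        while current is not None:
            chain.append(current)
            current = self.commits[current]["parent"]
        return chain


repo = TinyDataRepo()
v1 = repo.commit({"1": {"name": "Ada"}, "2": {"name": "Bob"}})
v2 = repo.commit({"1": {"name": "Ada L."}, "3": {"name": "Cy"}})
changes = repo.diff(v1, v2)
history = repo.lineage()
restored = repo.rollback(v1)
```

Storing full snapshots keeps the sketch short. A real system would more likely share unchanged data between versions, which is part of what separates the solution categories below.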
Three types of solutions
The products fell into three general categories:
1. [[Data catalogs|t.cs.data.catalog]]
2. Data pipeline versioning
3. Versioned databases