The webshart format is a community-driven, loosely organised attempt to promote a better standard for dataset metadata.
The datasets here were either converted from a third-party source (such as CC12M) or created by SimpleTuner or CaptionFlow community members.