2024 10 31 Metalsmith Sflo
dave
we're not going prevent IRI collisions. But each top-level folder within an SFDataRepo names a namespace within the site. Each site has a SFRootRepo which contains a docs folder for the generated site, and the generated site has a top-level "/ns" folder below which all the namespaces live and then other folders like "/assets" where site scripts, images, css, etc can live. The top-level folder in a SFRootRepo should have the metalsmith files, including a "/templates" folder and a "/mappings" folder that specifies which templates should apply to which semantic index.html "de-referencing" pages. SFDataRepo repos could also have "/_templates" and "/_mappings" special folder at the top level for metalsmith-sflo to pick up and integrate. Please regenerate the description but not in a canvas, because when I copy from a canvas it doesn't paste as markdown.
dave
I wouldn't say "represents a distinct namespace" because the same namespace can have a top-level folder in multiple SFDataRepos. maybe "corresponds to a namespace". Please remove the "Prefix Control" bullet, SFDataRepos should not get their own unique prefix. Under redirection/aliasing, it will use the site's base IRI (from a configuration file), and the idea with owl:sameAs aliasing is this:
SFDataRepos should be designed so that IRI references are relative to the base and on initial generation of the site, you should be able to specify (for the entire SFDataRepo or on whatever granularity) whether you want pre-existing files with different bases to be re-based onto the current site. This amounts to mirroring external RDF data to the new site. Maybe you can specify re-basing at any point. Then, for new data, datasets will pick up with the next version, catalog series will be re-based into the new, current site. In the case that different SFDataRepos have overlapping namespaces with overlapping individuals and those individuals have overlapping dataset-series, hopefully it's clear that one is an extension of the other, but if not, metalsmith-sflo will have to choose or prompt which is the current one and fix the individual's sflo:has_default_catalog_series properties.
Please add to the description the approach that sflo is dataset-series-oriented (dataset series provide versioning and metadata for individuals), and that individuals that aren't already sflo:DatasetSeries get _default and _catalog series created for them.
chatgpt
Here is the updated description for Metalsmith-SFLO:
Metalsmith-SFLO: Enhancing RDF and Namespace Management for Static Sites
Metalsmith-SFLO is a flexible static site generator that builds on the Metalsmith framework to provide a streamlined approach for managing RDF data and namespaces. Its primary focus is to handle datasets, catalogs, and IRIs effectively, making it ideal for building semantic web content from distributed repositories of RDF data. Here's what Metalsmith-SFLO offers:
Key Features:
-
Flexible Namespace and Dataset Management:
- Supports multiple namespaces per repository, and also allows multiple repositories to contribute to a single namespace.
- Namespace folders correspond to a namespace hierarchy, providing the structure for the data.
- Contributing repositories (
sfDataRepos
) can expand existing namespaces without directly modifying the original data. This allows the inclusion of read-only data from external sources, with contributors being able to add their own data on top of it.
-
IRI Minting and Collision Management:
- When generating a new site from existing data, the generator can automatically handle IRI redirection and aliasing through
owl:sameAs
. SFDataRepos should be designed with IRI references relative to the base, and during initial site generation, users can specify whether they want pre-existing files with different bases to be re-based onto the current site. This effectively mirrors external RDF data to the new site. New datasets will pick up from the next version, and catalog series will be re-based into the new site base. In cases where multiple SFDataRepos have overlapping namespaces and individuals with overlapping dataset series, Metalsmith-SFLO will prompt the user or make a decision to determine which dataset series is current and update the individual'ssflo:has_default_catalog_series
properties accordingly.
- When generating a new site from existing data, the generator can automatically handle IRI redirection and aliasing through
-
Dynamic Dataset Drafting and Versioning:
- Metalsmith-SFLO is dataset-series-oriented, meaning that dataset series provide versioning and metadata for individuals. Any individual that is not already a
sflo:DatasetSeries
will have_default
and_catalog
series automatically created for them. Metalsmith-SFLO supports both current and draft states for datasets. The current dataset acts as the main working version, while users can start individual drafts, but only updates tocurrent
are eventually minted as new versions. Users can manage draft versions using a consistent IRI alias (/current
) that represents the most recent dataset. Draft IRIs may have more specific identifiers (e.g.,/draft-2024-10-30-username
) to keep things organized and distinguishable.
- Metalsmith-SFLO is dataset-series-oriented, meaning that dataset series provide versioning and metadata for individuals. Any individual that is not already a
-
Site-Level Integration:
- Metadata and versioning information can be generated for individual entities, for an entire namespace, or across all namespaces.
- During the build process,
sfDataRepos
are synchronized and scanned for changes. The system then triggers catalog and dataset updates where necessary, minting new versions only upon official release (not for every edit).
-
Publication and Flexibility:
- Users can configure whether the content of their
sfDataRepos
is published, and even obscure sensitive names through aliasing. - Selective inclusion: Some
sfDataRepos
may not be published, but they can still contribute data to catalogs and extend namespaces indirectly. The static site generator handles this dynamically by managing visibility and integration.
- Users can configure whether the content of their
-
User Responsibility and Automation:
- The system provides tools to manage IRI collisions, but the responsibility ultimately lies with contributors to avoid duplicating existing IRIs. Metalsmith-SFLO includes automatic collision checks and prompts during site generation, ensuring contributors are alerted to any conflicts.
- Automated processes ensure that whenever a new version is generated, catalogs and namespace information are properly updated. Contributors can also rely on the generator to suggest new base IRIs when reusing existing datasets.
Workflow Summary:
- Initial Setup: Users select existing
sfDataRepos
or namespaces to create a new site. - IRI Minting and Equivalency: The generator handles new base IRIs and inserts
owl:sameAs
statements to preserve links to original data. - Drafting and Collaboration: All changes occur in the current draft until ready for a new version, minimizing unnecessary versions while allowing individual experimentation.
- Build and Versioning: The build process includes automatic scanning of changes, merging updates, and updating catalogs, with the system controlling when new versions are minted to maintain consistency.
Metalsmith-SFLO is designed to make RDF-based static site generation intuitive, flexible, and scalable, providing powerful tools for namespace management, data integration, and version control. Whether you are building from scratch or extending existing data, Metalsmith-SFLO adapts to the complexity of your semantic data needs, while keeping the workflow streamlined and clear.
This version reflects the latest concepts, including re-basing, dataset-series orientation, and updated responsibilities for managing IRIs. Let me know if you need more tweaks!
named graphs (dave)
At this point, I don't see a role for named graphs, other than perhaps "excerpting" data from other datasets. Am I forgetting something?
named graphs (chatgpt)
Named graphs in Semantic Flow primarily serve as a useful way to isolate specific parts of data, like "excerpting" from other datasets for reuse or reference. Beyond that, if there are no other complex relationships requiring multiple contexts within a single graph, named graphs may indeed be overkill. This simplifies the data model and focuses more directly on dataset-level operations.
If there's anything else named graphs were intended for, it seems like it's not currently needed for the core functionality you're aiming for.
namespace catalog contents (dave)
I'd initially assumed namespace-level catalogs would contain references to all contained entities and/or catalogs. But then merging namespace catalog-series from different SFDataRepos (with the same top-level namespace folders) could be tricky. Maybe not necessary? They'd help in discoverability, so maybe limiting namespace catalogs to "contained" named individuals could be okay... easier to merge. But if namespace dataset catalogs only refer to the namespaces and not contained elements, they wouldn't have to change very often if ever and they might look identical across different SFDataRepos anyhow. What exactly is going into these catalogs anyhow?
namespace catalog contents (chatgpt)
It sounds like you're grappling with the balance between discoverability and merge complexity for namespace catalogs. Here are a few points to consider regarding what's actually included in these catalogs and how they could be structured effectively:
Potential Contents of Namespace-Level Catalogs
-
References to Contained Named Individuals:
- The simplest approach would be for namespace-level catalogs to contain references to all directly contained named individuals.
- This approach helps maintain clarity about what’s in the namespace and also makes merging from different
SFDataRepos
much simpler, as it focuses only on named individuals and doesn’t overreach into the other catalogs. - Discoverability remains strong since the catalog still has a clear inventory of resources in the namespace, but the complexity of merging across different repositories is reduced.
-
Catalogs for Namespace-Level Discoverability:
- If namespace catalogs include all contained catalogs or datasets, merging multiple
SFDataRepos
would indeed become tricky. EachSFDataRepo
could have slightly divergent versions of the catalog, which means more work to reconcile during merges. - Alternatively, if catalogs at the namespace level only point to the namespace itself, rather than drilling down into contained elements, this would reduce merge complexity. The namespace-level catalog could then function like a pointer or an index to other catalogs, without needing frequent changes.
- If namespace catalogs include all contained catalogs or datasets, merging multiple
-
Catalog Stability Across SFDataRepos:
- By limiting the scope of namespace catalogs to namespace-level references—either to contained individuals or just namespace identifiers—they would need less frequent updates and could be consistent across multiple SFDataRepos.
- Essentially, these catalogs would act more like a stable entry point or reference hub, ensuring that anyone looking at the namespace can understand the structure without necessarily needing to see every entity or dataset listed exhaustively.
-
Role in Discoverability:
- To aid in discoverability, even a shallow catalog (only containing namespace-level references) could serve the purpose of navigation and linking. Users would still be able to navigate to the namespace and drill down further by following the contained individuals.
- If deeper discoverability is needed, this can be provided by the contained entity catalogs themselves, which can maintain their own listings of related resources, dataset series, and metadata.
Practical Example for Catalog Content
- Namespace-Level Catalogs:
- Contain references like:
<namespace1>
references its own name, identifier, and general description.- A list of named individuals within the namespace, allowing for easier lookup.
- Exclude details about individual datasets or further catalogs contained within those individuals.
- Contain references like:
- Contained Entity-Level Catalogs:
- Each named individual or contained entity can have its own catalog that is linked within the namespace.
- This would hold details on dataset series, versions, and any additional metadata required to fully understand or utilize that individual.
Summary: Suggested Approach for Namespace Catalogs
- Limit the Catalog Scope: Keep namespace-level catalogs focused on the namespace itself and any directly contained named individuals.
- Maintain Catalog Stability: By not referencing contained catalogs or datasets explicitly, the namespace catalogs remain more stable across different
SFDataRepos
, reducing merge complexity. - Discoverability vs. Merge Complexity: This approach maintains a balance—providing useful discoverability without the overhead of merging complex catalog-series references from multiple sources.
This setup makes merging across repositories easier while still providing enough detail at the namespace level to guide users and maintain useful navigation points throughout the RDF data.
We could also just reserve discoverability to the HTML side of things, it's kinda an editorial choice what you make discoverable and how.
html discoverability (chatgpt)
That's a great point—using the HTML side for discoverability gives you more editorial flexibility and avoids complicating the RDF structure with discoverability concerns. You can curate the user experience by designing how data is presented, what’s highlighted, and how different parts of the dataset are linked.
This way, you could keep the RDF catalogs lean and focused on representing actual data relationships and dependencies, while the HTML layer serves as the more human-friendly navigational aid. It keeps the RDF data structured and clean, while letting the presentation layer do the work of discovery, which is often more adaptable and subjective.
latest vs current (dave)
thinking again about current/latest... maybe we call the editable latest version "latest" and reserve the "current" concept as a pointer within the catalog to the most recent published version. It brings back the PAV-ish concept of "soon" being different from current. That would allow applications to distinguish "in-progress" work from the last published version.
latest vs current (chatgpt)
That makes a lot of sense—using "latest" to represent the editable, in-progress version while reserving "current" as the most recent stable, published version adds clarity. This approach allows you to:
-
Clearly Separate Draft from Published Versions:
- "Latest" becomes the work-in-progress, the draft version that may change frequently as editors make updates.
- "Current" is the stable, published version that consumers or applications can rely on for consistency.
-
Support Application-Level Version Control:
- Applications can distinguish between in-progress work ("latest") and the last stable state ("current").
- This also means users interacting with the data have a better understanding of the status of a dataset—whether it is under review/editing or it’s a final published version.
-
Align with PAV-Like Concepts:
- This effectively reintroduces the PAV concept of "soon" or a pre-release state that isn’t the official current version.
- This separation can help you manage workflows more efficiently, making it clear when something is ready for release and when it's still a draft.
This distinction should make it easier for anyone using the data to understand what stage the information is in, improving the usability of your dataset versions. It also provides a much more straightforward way to handle ongoing edits without the confusion of having multiple "current" versions floating around.
tell me all about next and the implications. Does next need an IRI? There should be a predicate for unstable files.
Using "next" as an in-progress version adds flexibility to your system, but it does introduce a few unique considerations and potential design choices:
1. Implications of Using "Next"
- Clear Separation: "Next" clearly designates the ongoing or draft version, distinct from the stable "current" version, allowing applications and users to understand the dataset's lifecycle.
- Temporary Nature: "Next" isn’t permanent; once it's published or finalized, "next" can become "current" (or a version number), meaning this label should be thought of as mutable.
- Applications and Workflows: Systems using your data can understand when content is "soon" but not yet stable, allowing for workflows where applications may choose to use only stable versions unless explicitly told otherwise.
2. Should "Next" Have an IRI?
- Benefits of an IRI: Assigning an IRI for "next" makes it directly addressable and linkable, simplifying version control and tracking.
- IRI Aliasing: You can use a consistent IRI alias like
<namespace/next>
, making it easy to link to the evolving state of the dataset. - IRI Evolution: This IRI can then be reused as "next" evolves over time, pointing to whatever the latest draft version is, which keeps it distinct from "current."
3. Stability Indicator: Predicate for Unstable Files
- Introducing a Predicate for "Next" Files: Consider using a property like
sflo:isUnstable
orsflo:isDraft
to indicate that a file is in-progress or “next.” - Application Behavior: This predicate would allow applications to identify draft files at a glance. For instance, they could filter out
sflo:isUnstable
files unless specified, ensuring they only process stable versions. - Explicit Versioning Indicators:
sflo:hasNextVersion
or similar could be used at theDatasetSeries
level, indicating that the current version isn’t the endpoint and a next iteration is expected.
4. Updating “Next” and Version Transitions
- Automatic Transition to "Current": Once the "next" version is ready to be published, you could update the IRI and metadata, promoting "next" to "current" while maintaining version history.
- Versioning Consistency: Upon release, consider assigning a version identifier to the new "current," leaving the "next" IRI ready for the subsequent in-progress version.
5. Impact on Catalogs and DatasetSeries
- Catalog Management: Including "next" in catalogs provides clarity for users and applications about what’s upcoming, helping users choose between stable and draft data.
- DatasetSeries Structure: If "next" becomes standard,
DatasetSeries
should have a mechanism to differentiate versions, likesflo:nextVersionIRI
, making it easy to find the most recent draft. - Stability Metadata: Adding
sflo:isUnstable
to the "next" files and related datasets ensures that even if a draft is distributed, consumers know it’s intended as a preliminary version.
In summary, assigning "next" an IRI offers consistency and makes it easy to reference draft versions across applications and workflows. By marking it with a predicate like sflo:isUnstable
, you provide clear, machine-readable indicators of the dataset's lifecycle stage, enhancing flexibility and clarity in your versioning system.
current version (dave)
'm wondering if we should have a current IRI too. what are some use cases for wanting to choose the "current" values for something? Software could be forced to use the catalog to look up the current version and then use that? But humans don't want to have to pick a version. Only datasets have versions. Humans should never have to see versions unless they want to specifically reference potentially-stale data.
current version (chatgpt)
Having a current
IRI could indeed streamline user experience by providing a straightforward link to the latest stable version without requiring a specific version selection. Here’s how it would benefit different use cases:
Use Cases for a current
IRI
-
Simplified Access for Humans:
- Direct Access to the Latest Data: Humans wouldn’t need to select a version explicitly; they can simply access the
current
IRI for the latest stable data. - Avoiding Stale Data: Since the
current
IRI always points to the latest version, users don’t accidentally access old data unless they specifically choose to reference an older version.
- Direct Access to the Latest Data: Humans wouldn’t need to select a version explicitly; they can simply access the
-
Streamlined Software Integration:
- Automatic Lookup: Software can default to the
current
IRI for the latest dataset version without querying the catalog each time, reducing processing steps. - Caching and Syncing: Systems could cache the
current
IRI version temporarily and know that any refreshes will retrieve the latest stable data.
- Automatic Lookup: Software can default to the
-
Version Control and Reference Management:
- Explicit Version References: If applications or users need to reference historical versions (e.g., for auditing or rollback purposes), they can still access explicit versions without disturbing
current
. - Clearer Data Lifecycle: Separating
next
,current
, and versioned IRIs makes it easy to distinguish between draft, stable, and historical data.
- Explicit Version References: If applications or users need to reference historical versions (e.g., for auditing or rollback purposes), they can still access explicit versions without disturbing
Implementation: Using current
as a Stable IRI
- Version Aliasing:
current
could alias the specific version it points to (e.g.,<namespace/v3>
), with the option to update it as new versions are released. - Catalog Entry for Historical Access: The catalog would still list all dataset versions, but users wouldn’t need to interact with this unless they’re seeking older data explicitly.
By providing current
, you avoid the need for users to sift through versions while preserving the option to access historical versions when necessary. This approach keeps things clean, intuitive, and consistent for both human users and automated systems.
generic reference
-
dave: one drawback of going direct to <_default/v1> a sflo:DefaultDatasetSeries (instead of /series/_default/dataset/v1) is you lose a way to reference the non-versioned dataset. _default/v1/current is the best way to refer to the dataset in general, without a specific version.
-
chatgpt: Using
<_default/v1/current>
as a general reference is a smart way to balance flexibility and specificity—it allows you to point to the dataset as a whole while still leaving room to access specific versions when needed. This approach retains a stable identifier for the dataset itself, simplifying access and ensuring users or systems can always retrieve the latest version without explicitly referencing it by version number.
sorry, miscommunication. I meant to say unless we add in the "class name" like "dataset" then there's no easy way to refer to the generic/unversioned/abstract dataset and you have to settle for "_default/v1/current " which actually means something more specific.
Got it—so by adding the class name (like "dataset") into the path, you’d create a clear distinction between the abstract concept of the dataset and its specific versions.
For example:
- Abstract Dataset:
/<namespace>/series/_default/dataset
could represent the overarching dataset without reference to a version. - Specific Version:
/<namespace>/series/_default/dataset/v1/current
would clearly refer to a particular, stable version.
This way, you can cleanly separate the conceptual dataset from its individual versions, allowing references to stay clear and semantically precise. This would also make navigation simpler for applications and users, giving them the option to choose between the general dataset and specific versions without ambiguity.
confusion about adding the class/type into the hierarchy
- dave: Close. the specific version would be <namespace/_series/_default/v1> and the current version would be <namespace/_series/_default/current>. Maybe we could get away without the _series, but the next level down is dist, so to be consistent instead of <namespace/_series/_default/v1/_dist/namespace-default-v1.ttl>. it's more verbose and I know you thought it was less humane, but "<namespace/_series/_default/v1/_dist/> could refer to the collection of v1 distributions the same way <namespace/_series/_default/dataset> refers to a collection of version. It's a bit of a stretch though. Maybe there's a better way to refer to a generic/abstract set of datasets. Dataset Series is close but it's something else. It's a different class, for one thing.
current directory structure (dave)
my current folder hierarchy under the "d" namespace is this: └── d ├── README.md ├── _dss │ ├── _catalog │ │ └── _vset │ │ ├── next │ │ │ └── d-catalog.ttl │ │ └── v1 │ └── _default │ └── _vset │ ├── next │ │ └── d-default.ttl │ └── v1 ├── index.html └── index.trig
It represents information only about the namespace itself. Anything stand out to you?
current directory structure (chatgpt)
Your structure is impressively organized, with a clear separation of catalog and default datasets under _dss
and the use of next
and v1
folders within _vset
for versioning. Here are a few observations and suggestions based on the structure:
-
Logical Use of
_vset
:- The
_vset
naming convention works well here, distinguishing versioned data without adding excessive verbosity. - The use of
next
andv1
under_vset
is clean and keeps the in-progress (next
) and published versions (v1
) separate, which should help in tracking stable vs. draft data.
- The
-
Namespace Clarity:
- Since this structure is solely for namespace-specific data, this hierarchy would scale well if replicated across other namespaces in your setup.
- If additional namespace-level details emerge, they could be slotted into this structure with minimal disruption, given the modular organization.
-
Future Consideration for Discoverability:
- If there’s potential for additional documentation or data specific to sub-elements of the namespace in the future, it might be worth noting where these could fit in without overcrowding
_dss
. - Currently, index files like
index.html
andindex.trig
serve the purpose well for namespace-level information and discoverability.
- If there’s potential for additional documentation or data specific to sub-elements of the namespace in the future, it might be worth noting where these could fit in without overcrowding
-
Potential Redundancy with
_catalog
and_default
:- If
_catalog
and_default
represent distinct classes or functions within the namespace, this structure is well-suited. However, if these overlap in purpose, you may consider simplifying further to reduce potential redundancy. - This arrangement, though, does keep catalog vs. default data clearly separated, which is advantageous if each serves a specific purpose.
- If
-
Overall Setup:
- The structure provides a clear naming convention and path layout, making it intuitive and easy to navigate. This approach will make it easy to track updates, merges, and differences across versions and namespaces.
Final Thought
If next
or other draft versions frequently change, consider if a current
alias could further reduce update work by pointing to the latest draft version automatically. Otherwise, this setup is structured to scale well and should make version control and reference management straightforward.
_catalog vs _default (dave)
_catalog is for sflow-related metadata about the subject resource, in this case the namespace. specifically it lists dss, datasets, the current vset for itself. _default is for information about the subject resource itself, e.g. its class, (which right now is also in the index.trig file).
_catalog vs _default (chatgpt)
Thanks for the clarification! With _catalog
handling metadata and _default
covering subject-specific details, the structure effectively separates management metadata from resource-specific information. Keeping class information both in _default
and index.trig
gives flexibility in referencing and discoverability, allowing each component to serve its specific function without redundancy.
This organization makes it clear where users and applications can access resource details versus versioning and dataset management metadata. It’s a well-considered separation that should make updates and maintenance straightforward while providing flexibility for any future expansions in metadata or subject-specific properties.
forgot about _dist (dave)
OK, forgot about _dist. Check out the updates hierarchy, with another entity represented, me.
└── d ├── README.md ├── _dss │ ├── _catalog │ │ └── _vset │ │ ├── _next │ │ │ └── _dist │ │ │ └── d-catalog.ttl │ │ └── v1 │ └── _default │ └── _vset │ ├── _next │ │ └── _dist │ │ └── d-default.ttl │ └── v1 ├── dave-richardson │ └── _dss │ ├── _catalog │ │ └── _vset │ │ └── _next │ │ └── _dist │ │ └── dave-richardson-catalog.ttl │ └── _default │ └── _vset │ └── _next │ └── _dist │ └── dave-richardson-default.ttl ├── index.html └── index.trig
forgot about _dist (dave)
Your updated structure with the inclusion of another entity, "dave-richardson," is clear and consistent. Here are some observations that might help keep things organized as more entities are added:
-
Hierarchical Consistency:
- The organization is consistent across both the
d
namespace anddave-richardson
entity, with_dss
folders holding_catalog
and_default
subdirectories. - Each
_vset
directory includes thenext
and versioned folders (v1
), along with_dist
subfolders for distributions, maintaining a clear structure for future scalability.
- The organization is consistent across both the
-
_dist
Folder:_dist
under each version ornext
in_vset
holds the specific.ttl
files, creating a logical separation between versions and distribution files. This ensures that, if additional formats or distributions are added, they can fit neatly within_dist
without cluttering the version folders.
-
Flexibility for Additional Metadata:
_catalog
for sflow-related metadata and_default
for core resource information are clear and separated by function. This allows easy maintenance of versioning data and resource-specific properties independently, which should help avoid redundancy.
-
Index Files:
index.html
andindex.trig
at the root provide easy access to summary information about the namespace, supporting discoverability without disrupting the detailed structure below.
Future Considerations
- Scalability with More Entities: As you add more entities, this structure should scale naturally. The clarity around
_dss
for each entity will make it easy to navigate through versions and distributions. - Alternative Metadata Use: If at any point
_catalog
or_default
metadata needs to vary more significantly per entity, this setup will allow you to expand or customize each entity’s metadata without affecting others.
This structure is very well thought out, keeping things organized and ensuring a logical flow of metadata and content across different entities. This should streamline both discoverability and maintenance!
different dss's (dave)
To give a little more insight into _default... that's the subject-related datasetseries/distribution whose contents can be referenced/returned when a no more specific dss is specified. _catalog is the default catalog, same idea: when no specific catalog is mentioned, the default catalog datasetseries's current distribution can be returned. But next to _default and _catalog in the hierarchy, there's room for other dss that could have data, metadata, or both, related to (and sub-namespaced under) the referenced entity.
different dss's (chatgpt)
This adds a nice layer of flexibility to the structure, where _default
and _catalog
act as the fallback references, while other dss
folders can provide specialized content, metadata, or namespaces related to the entity. This modular approach will allow you to expand or refine data and metadata for each entity without interfering with the default or catalog setups—giving you clear defaults while keeping things scalable and customizable as more specific dataset series are needed.
drafts
ultimately published datasets shouldn't refer to next IRIs other than "here's what the next IRI is", i.e., metadata. Next references should get resolved on publish? An argument to separate out metadata out?
next distributions can/should get overwritten on the next sync after publish, and it'll be hard, but every next file has a previous version, and if the version has bumped since the draft was started, a merge gets triggered or user gets prompted.
I think you need separate alias IRIs, that uniquely identify a dataset series of references. Cause sometimes you want names that don't have unique references. They're like collections. Or the things are solitary but have multiple aspects/facets. Class-based punning.
For semantic versioning, if you can assume that catalog changes can be excluded from triggering a version bump,
- if {only literal properties have changed?, or maybe object properties too} you can do a patch version bump
- if nothing pre-existing has changed (and optionally literals have changed), you can do a minor version bump (i.e., only add new data, so things should still be compatible/non-breaking)
- might require catalog to be stored separately, so catalog changes don't count as version changes
- that could be the true best use case for the catalog. Separate out the current pointer data from the evolving dataset, so additions can be differentiated from changes.
- might require catalog to be stored separately, so catalog changes don't count as version changes
- major semver bump is existing entities deleted or entity references changed
- what are some use cases for wanting to choose the "current" values for something? You could be forced to use the catalog to look up the current version and then use that? But humans don't want to have to pick a version. Only datasets have versions. Humans should never have to see versions unless they want to specifically reference potentially-stale data.
one drawback of going direct to <_default/v1> for a sflo:DefaultDatasetSeries (instead of /_series/_default/_vset/v1) is you lose a way to reference the non-versioned dataset. Without a _vset path segment, _default/v1/current is the best way to refer to the dataset in abstract, without a specific version. But _vset reads wrong by itself.
<_catalog> a sflo:DefaultCatalogSeries ; sflo:has_DefaultCatalogSeriesDataset <_catalog/_vset> ; .