ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance

Draft 01 May 2016

Contributors:
Víctor Cuevas-Vicenttín, UC Davis/University of New Mexico
Bertram Ludäscher, UIUC
Paolo Missier, Newcastle University
Khalid Belhajjame, PSL, Paris-Dauphine University, LAMSADE
Fernando Chirigati, New York University
Yaxing Wei, Oak Ridge National Laboratory
Saumen Dey, UC Davis
Parisa Kianmajd, UC Davis
David Koop, New York University
Shawn Bowers, Gonzaga University
Ilkay Altintas, UC San Diego
Christopher Jones, NCEAS, UC Santa Barbara
Matthew B. Jones, NCEAS, UC Santa Barbara
Lauren Walker, NCEAS, UC Santa Barbara
Peter Slaughter, NCEAS, UC Santa Barbara
Ben Leinfelder, NCEAS, UC Santa Barbara
Yang Cao, UIUC

Abstract

Provenance describes the origin and processing history of an artifact. Data provenance is an important form of metadata that explains how a particular data product was generated by detailing the steps in the computational process that produced it. Provenance information brings transparency and helps to audit and interpret data products. State-of-the-art scientific workflow systems (e.g. Kepler, Taverna, VisTrails) provide environments for specifying and enacting complex computational pipelines, commonly referred to as scientific workflows. In such systems, provenance information is automatically captured in the form of execution traces. However, these systems often rely on proprietary formats that make the interchange of provenance information difficult. Furthermore, the workflow itself, which represents very useful information, may be disregarded in provenance traces. The evolution history of the workflow (i.e. its provenance) can likewise be missing. To address these shortcomings we propose ProvONE, a standard for scientific workflow provenance representation. ProvONE is defined as an extension of the W3C recommended standard PROV, aiming to capture the most relevant information concerning scientific workflow computational processes, and providing extension points to accommodate the specificities of particular scientific workflow systems.

This document specifies the ProvONE model and details how its constituting parts are related to the W3C PROV standard. The description provided is complemented by examples including queries on ProvONE data.

Status of This Document

Version 1 Draft: This specification is in review and is publicly released for evaluation and possible adoption. However, it is not associated with and is not supported by any standards organization.

Please Send Comments

This specification was developed by the DataONE Cyberinfrastructure Working Group. If you wish to make comments regarding this document, please send them to developers@dataone.org. All comments are welcome.

The source for this specification and for the OWL document that implements it is maintained in our sem-prov-ontologies GitHub repository. We welcome pull requests via that repository if you wish to propose specific changes to the specification or the OWL model.

1. Introduction

Historically, one of the main uses of provenance has been to support claims of attribution and authenticity, and therefore of value, for material objects (e.g. works of art and manuscripts). In science, provenance is required to provide evidence in support of the experimental results that underpin scientific publications. The importance of provenance still applies in e-Science settings, where the data is obtained through computational methods. In these cases, the provenance of the experimental outcome is typically a graph-structured account of the individual computational steps, which is recorded automatically at the level of detail specified by the system instrumentation. This form of provenance, suitably encoded for machine processing, can then be exploited using a variety of graph query and analysis tools.

This scenario, where each piece of scientific data obtained by a computational method is associated with its provenance, is becoming increasingly prevalent. Regarding scientific workflows, detailed execution traces are routinely collected by a number of broadly used Workflow Management Systems (WfMSs) including Taverna, Kepler, VisTrails, Galaxy, e-Science Central, Pegasus, and others. However, these systems often adopt proprietary models for encoding the provenance traces captured by workflow executions. Moreover, they adopt different models to specify the workflows themselves. Such heterogeneity makes it difficult for a scientist to analyze and compare provenance traces captured using the same or similar workflows that were specified and enacted using different systems. The absence of a standard model for representing workflow provenance also means that opportunities for stitching the traces produced by different workflows, and therefore assisting the scientist in her analysis, are likely to be missed.

This document presents ProvONE, a model for scientific workflow provenance that aims to fulfill the requirements of the desired standard. The name originates from its development in the context of the DataONE Project, which is creating a large scale and federated data infrastructure serving the earth sciences community. Nevertheless, ProvONE is designed to support a large variety of WfMSs that in turn are used by numerous scientific communities.

1.1 Relation to other standards

The provenance community has made significant efforts in developing standard models that can be used for capturing and publishing provenance of artifacts and resources on the Web. These efforts resulted first in the Open Provenance Model (OPM) [MCF+11] and, more recently, in the W3C PROV model [PROV]. While such models are useful and are being adopted in academia and industry alike, as suggested by the number of PROV implementations, they do not suffice for encoding scientific workflow provenance. The reason is that both OPM and PROV were developed as minimal models meant to be used for tracking the provenance of resources on the Web regardless of their types. As such, they do not provide all the concepts that are necessary for specifying workflows and encoding the provenance of data products used and generated as a result of their execution. Consequently, many WfMSs adopt their own provenance models, resulting in the aforementioned loss of interoperability opportunities.

Thus the need arises for a new model that acts as a standard for encoding scientific workflow provenance. Instead of creating such a model from scratch, the W3C PROV model can be used as a starting point. A preliminary proposal following this direction was published in [MDB+13]; an independent extension of PROV for scientific workflows is also presented in [CSdO+13], as well as in [BKG+13] (focusing on workflow preservation). This document aims to incorporate and standardize the ideas of these works, as well as additional contributions, to derive an adequate standard that can be used by the scientific workflow community.

1.2 Aspects covered by ProvONE

ProvONE aims to provide the fundamental information required to understand and analyze scientific workflow-based computational experiments. Therefore, it covers the main aspects that have been identified as relevant in the provenance literature. These correspond to prospective and retrospective provenance [ZWF06] as well as process provenance [FSC+06]; additionally, some essential elements of data structure are also considered. Each of these aspects is described in Section 2.

1.3 Structure of this Document

Section 2 provides an overview of the ProvONE conceptual model, covering the aspects outlined in Section 1.2. The conceptual model of ProvONE is given using the Unified Modeling Language [UML].

Section 3 provides a detailed characterization of the various components of ProvONE, which is serialized as an OWL 2 ontology. It clarifies how the ProvONE concepts are related to the W3C PROV concepts, accompanying the descriptions with examples.

Section 4 gives references to additional resources that form part of the ProvONE standard.

1.4 Namespaces

The following namespaces and prefixes are used throughout this document.

Table 1: Prefixes and namespaces used in this specification

prefix   namespace IRI                                         definition
prov     http://www.w3.org/ns/prov#                            The PROV namespace [PROVO]
provone  http://purl.dataone.org/provone/2015/01/15/ontology#  The ProvONE namespace [ProvONE]
xsd      http://www.w3.org/2001/XMLSchema#                     XML Schema namespace [XMLSCHEMA11-2]
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#           The RDF namespace [RDF-CONCEPTS]
rdfs     http://www.w3.org/2000/01/rdf-schema#                 The RDFS namespace [RDF-SCHEMA]
owl      http://www.w3.org/2002/07/owl#                        OWL 2 specification namespace [OWL2]
dcterms  http://purl.org/dc/terms/                             Dublin Core Metadata Elements namespace [DC-RDF]
bibo     http://purl.org/ontology/bibo/                        The Bibliographic Ontology namespace [BIBO]
wfms     http://www.wfms.org/registry.xsd                      Placeholder example WfMS namespace
:        http://example.com/                                   Artificial namespace for examples

2. ProvONE Conceptual Model Overview

This section introduces ProvONE informally through a UML class diagram representing its conceptual model and brief descriptions of each of the aspects covered by the model.

The ProvONE conceptual model is illustrated by the UML diagram of Figure 1. All classes have a corresponding PROV type denoted by a UML stereotype (e.g. «entity»), whereas only a subset of the associations do (e.g. «used»). Each of the aspects covered by ProvONE is briefly described next.

Figure 1: ProvONE Conceptual Model UML Diagram

Workflow Representation. The various tasks that form part of a workflow are represented by the Program class. Programs can be either atomic or composite, the latter case specified through the hasSubProgram self association. A given program can be distinguished as a Workflow. Each Program may have a series of Ports that function as input or output ports. Ports from the various Programs are connected through Channels. Note that both input and output ports may be associated with multiple Channels, thus allowing workflow models in which a single output is copied and sent to multiple destinations, as well as in which tasks take inputs from different sources through a single input port.

To specify executable instances of a Workflow, default parameters can be defined for some of its constituent Programs. The default parameters are represented by Entities, which will be described shortly. A Controller class can be used to specify that the execution of a given Program is controlled by another Program, which allows for differing models of computation. For instance, in a synchronous dataflow model, a given Program may only start once the execution of a preceding Program terminates.
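As an informal illustration of the workflow constructs described above, the following Turtle sketch declares a Workflow with three sub-Programs, an output Port that fans out to two destinations through separate Channels, a default parameter, and a Controller. All identifiers in the example namespace (wf, reader, stats, plotter, and so on) are hypothetical and chosen only for this sketch; the normative definition of each construct is given in Section 3.1.

```turtle
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.com/> .

# A composite Workflow with three sub-Programs (all names are hypothetical).
:wf  a provone:Workflow ;
     provone:hasSubProgram :reader , :stats , :plotter .

:reader   a provone:Program ; provone:hasOutPort :reader_out .
:stats    a provone:Program ; provone:hasInPort  :stats_in .
:plotter  a provone:Program ; provone:hasInPort  :plotter_in .

:reader_out  a provone:Port .
:stats_in    a provone:Port .
:plotter_in  a provone:Port .

# Fan-out: the single output Port connects to two Channels,
# each of which is shared with one destination input Port.
:ch_stats  a provone:Channel .
:ch_plot   a provone:Channel .
:reader_out  provone:connectsTo :ch_stats , :ch_plot .
:stats_in    provone:connectsTo :ch_stats .
:plotter_in  provone:connectsTo :ch_plot .

# A default parameter pre-configured on an input Port.
:stats_in        provone:hasDefaultParam :default_window .
:default_window  a provone:Data .

# A Controller with reader as its source Program and stats as its destination Program.
:reader            provone:controlledBy :reader_stats_ctl .
:reader_stats_ctl  a provone:Controller ;
                   provone:controls :stats .
```

In this sketch, a Port and a Channel are related only through connectsTo, so two Ports are considered connected when they share a Channel.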

Trace Representation. The execution traces associated with a given Workflow are represented in ProvONE through the Execution class. Each Execution instance represents the execution of a particular Program (its Plan), which itself may be a Workflow, and may also be associated with a User responsible for the execution. For the execution of a Program, a series of input Entity items are read from the input Ports and are used to generate a series of output Entity items sent through the output Ports. These outputs may be Data, Visualization, or Document items, depending on the goals of the Workflow. Through the use of the Usage and Generation classes, whenever an Entity item is sent from an output Port to an input Port, this event is recorded through the hadEntity, hadInPort and hadOutPort properties between the Entity item and the associated Ports. In this manner, the graph structure that represents the provenance of the workflow results is generated.
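The trace constructs just described can be sketched in Turtle as follows. The fragment records one Execution of a Program, the User responsible for it, and the input and output Entity items together with the Ports through which they passed. All identifiers in the example namespace are hypothetical. Attaching the Generation to the Execution through prov:qualifiedGeneration mirrors the Usage side and is an assumption of this sketch; Section 3.2 gives the normative definitions.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.com/> .

# The Program acting as the plan, with one input and one output Port.
:stats  a provone:Program ;
        provone:hasInPort  :stats_in ;
        provone:hasOutPort :stats_out .
:stats_in   a provone:Port .
:stats_out  a provone:Port .

:alice     a provone:User .
:raw_data  a provone:Data .
:summary   a provone:Data .

# One Execution of the Program, attributed to a User.
:stats_run  a provone:Execution ;
            prov:wasAssociatedWith    :alice ;
            prov:qualifiedAssociation :stats_assoc ;
            prov:used                 :raw_data ;
            prov:qualifiedUsage       :stats_usage ;
            # Assumption of this sketch: the Generation is attached to the Execution.
            prov:qualifiedGeneration  :stats_gen .

:summary  prov:wasGeneratedBy :stats_run .

# The Association names the agent and the Program that served as the plan.
:stats_assoc  a prov:Association ;
              prov:agent   :alice ;
              prov:hadPlan :stats .

# The Usage records which Entity was read through which input Port.
:stats_usage  a prov:Usage ;
              provone:hadEntity :raw_data ;
              provone:hadInPort :stats_in .

# The Generation records which Entity was sent through which output Port.
:stats_gen  a prov:Generation ;
            provone:hadEntity  :summary ;
            provone:hadOutPort :stats_out .
```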

Data Structure Representation. The various entities associated with workflow instances and traces are represented by the Data class, the Visualization class, or the Document class. The Data class is defined to be generic and represents data items of various types (e.g. XML, JSON, or CSV files). Visualizations are a differentiated class intended to represent the various visualization items often output from workflows (JPG, PNG, SVG, MP4, etc.). The Document class is a generic representation of a published or unpublished article or report that was created as a result of a given Execution of a Program or Workflow. In the ProvONE model, each Entity subclass instance is uniquely identifiable regardless of whether it shares the same value as another Entity instance. Although specific data types are not covered directly in ProvONE, collections of Entity items are represented through the Collection class. A Collection may in turn represent a set, bag, list or another variant of a group of items.
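A minimal Turtle sketch of these data structure constructs is given below. It groups two Data items and a Visualization into a collection. The identifiers are hypothetical, and the sketch uses the PROV terms prov:Collection and prov:hadMember as an assumption; the terms that ProvONE itself prescribes for collections are specified in Section 3.3.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.com/> .

# Two Data items remain distinct Entities even if their values happen to be equal.
:reading_1  a provone:Data .
:reading_2  a provone:Data .

# A Visualization produced from the readings (hypothetical).
:plot_1  a provone:Visualization .

# A collection grouping the three Entity items.
# Assumption: the PROV terms are used here instead of any ProvONE-specific ones.
:run_outputs  a prov:Collection ;
              prov:hadMember :reading_1 , :reading_2 , :plot_1 .
```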

Workflow Evolution Representation. The specific changes that are performed during the specification of a Workflow are not modeled directly in ProvONE, since these are expected to vary among different WfMSs. However, the different versions of a Workflow form a derivation tree that can be represented using PROV's wasDerivedFrom association, as is explained in the next section.

The ProvONE constructs are summarized in Table 2. The first column lists the aspects covered by ProvONE, serving to indicate the various constructs associated with each aspect. The second and third columns indicate the type of each construct as presented in the UML class diagram (class or association) and the construct name, respectively. The last column contains a link to each construct specification in Section 3.

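Table 2: ProvONE Constructs

ProvONE Aspect    Construct type    Name                    Specification
Workflow          Class             Program                 Section 3.1.1
Workflow          Class             Port                    Section 3.1.2
Workflow          Class             Channel                 Section 3.1.3
Workflow          Class             Controller              Section 3.1.4
Workflow          Class             Workflow                Section 3.1.5
Workflow          Association       hasSubProgram           Section 3.1.6
Workflow          Association       controlledBy            Section 3.1.7
Workflow          Association       controls                Section 3.1.8
Workflow          Association       hasInPort               Section 3.1.9
Workflow          Association       hasOutPort              Section 3.1.10
Workflow          Association       hasDefaultParam         Section 3.1.11
Workflow          Association       connectsTo              Section 3.1.12
Workflow          Association       wasDerivedFrom          Section 3.1.13
Trace             Class             Execution               Section 3.2.1
Trace             Class             Association             Section 3.2.2
Trace             Class             Usage                   Section 3.2.3
Trace             Class             Generation              Section 3.2.4
Trace             Class             User                    Section 3.2.5
Trace             Association       used                    Section 3.2.6
Trace             Association       wasGeneratedBy          Section 3.2.7
Trace             Association       wasAssociatedWith       Section 3.2.8
Trace             Association       wasInformedBy           Section 3.2.9
Trace             Association       wasPartOf               Section 3.2.10
Trace             Association       qualifiedAssociation    Section 3.2.11
Trace             Association       agent                   Section 3.2.12
Trace             Association       hadPlan                 Section 3.2.13
Trace             Association       qualifiedUsage          Section 3.2.14
Trace             Association       hadInPort               Section 3.2.15
Trace             Association       hadEntity               Section 3.2.16
Trace             Association       qualifiedGeneration     Section 3.2.17
Trace             Association       hadOutPort              Section 3.2.18
Data Structure    Class             Entity                  Section 3.3.1
Data Structure    Class             Collection              Section 3.3.2
Data Structure    Class             Data                    Section 3.3.3
Data Structure    Class             Visualization           Section 3.3.4
Data Structure    Class             Document                Section 3.3.5
Data Structure    Association       wasDerivedFrom          Section 3.3.6
Data Structure    Association       hadMember               Section 3.3.7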

3. ProvONE Model Specification

This section presents the specification of the various components of the ProvONE model outlined in the previous section, covering them as presented in Figure 1 and Table 2. The specification takes the form of an OWL 2 [OWL2] ontology that extends the W3C PROV-O ontology [PROVO].

The namespace for all ProvONE terms is http://purl.dataone.org/provone/2015/01/15/ontology#.

The encoding of the ProvONE ontology can be found under this link: provone.owl

3.1 ProvONE Workflow Specification

3.1.1 Program class

A Program represents a computational task that consumes and produces data through its input and output ports, respectively. It can be atomic or composite, the latter case represented by a possibly nested Program.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Program

has super-class

is in domain of

is in range of
Example 1

The following RDF fragment specifies a Program identified within the RDF document by program_1.

          1    @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
          2    @prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
          3    @prefix owl:     <http://www.w3.org/2002/07/owl#> .
          4    @prefix dcterms: <http://purl.org/dc/terms/> .
          5    @prefix prov:    <http://www.w3.org/ns/prov#> .
          6    @prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
          7    @prefix wfms:    <http://www.wfms.org/registry.xsd> .
          8    @prefix :        <http://example.com/> .
          9
          10   :program_1
          11
          12      a provone:Program;
          13      dcterms:identifier "e1"^^xsd:string;
          14      dcterms:title "CDMSBoxfill"^^xsd:string;
          15      wfms:package "gov.llnl.uvcdat.cdms"^^xsd:string;
          16
          17   .
          
Line 12 specifies class membership. To specify additional attributes for the Program, we first employ the Dublin Core Metadata Elements [DC-RDF]. In particular, line 13 assigns the string e1 as an explicit identifier, which is independent of the RDF node identifier. In addition, a descriptive title is given in line 14. Additional attributes associated with the specific WfMS in use, in this case a placeholder example, can be specified in a similar fashion. Here, the software package responsible for the execution of the Program within the WfMS is specified in line 15.

3.1.2 Port class

A Port enables a Program to send or receive Entity items (Data, Visualization, or Document instances). When semantically typed as an input port using the provone:hasInPort object property, the Port may also be further described with default parameters using the provone:hasDefaultParam object property. Ports that are semantically typed as output ports using the provone:hasOutPort object property may be connected to input ports using the Channel class.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Port

has super-class

is in domain of

is in range of
Example 2

The following RDF fragment specifies a Port identified within the RDF document by p1_ip1.

        1    :p1_ip1
        2
        3       a provone:Port;
        4       dcterms:identifier "e1_ip1"^^xsd:string;
        5       dcterms:title "input_vars"^^xsd:string;
        6       wfms:signature "gov.llnl.uvcdat.cdms:CDMSVariable"^^xsd:string;
        7
        8    .
        
Again we make use of the Dublin Core Metadata Elements as well as of elements specific to an example placeholder WfMS. The Port is given an identifier and a descriptive title in lines 4 and 5, respectively. A signature denoting the type of data that the Port consumes is defined in line 6.

3.1.3 Channel class

The provone:Channel class provides a connection between Ports that are defined for Programs. Typically, a Program with an output Port defined using the provone:hasOutPort object property connects to a Program with an input Port defined using the provone:hasInPort object property. The two Ports are connected using a Channel with the provone:connectsTo object property.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Channel

has super-class

is in range of
Example 4

The following RDF fragment specifies a Channel identified within the RDF document by p1_p2Ch.

          1    :p1_p2Ch
          2
          3       a provone:Channel;
          4       dcterms:identifier "e1_e2Ch"^^xsd:string;
          5
          6    .
          
The Channel is declared as such in line 3 and is given the string e1_e2Ch as an identifier in line 4.
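To show how such a Channel links two Programs, the following sketch connects the output Port p1_op1 of program_1 (declared in Example 12) to an input Port of program_2 (defined in Example 10) through the Channel p1_p2Ch. The input Port p2_ip1 is hypothetical and is introduced only for this illustration.

```turtle
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.com/> .

# The output Port of the upstream Program and the input Port of the downstream one.
:program_1  provone:hasOutPort :p1_op1 .
:program_2  provone:hasInPort  :p2_ip1 .
:p1_op1  a provone:Port .
:p2_ip1  a provone:Port .    # hypothetical Port, not defined elsewhere in this document

# Both Ports connect to the same Channel, which links output to input.
:p1_p2Ch  a provone:Channel .
:p1_op1   provone:connectsTo :p1_p2Ch .
:p2_ip1   provone:connectsTo :p1_p2Ch .
```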

3.1.4 Controller class

A Controller specifies a Program that controls other Programs under a particular model of computation.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Controller

has super-class

is in domain of

is in range of
Example 5

The following RDF fragment specifies a Controller identified within the RDF document by p1_p2CL.

            1    :p1_p2CL
            2
            3       a provone:Controller;
            4       dcterms:identifier "e1_e2CL"^^xsd:string;
            5
            6    .
            
The Controller is given the string e1_e2CL as an identifier (line 4).

3.1.5 Workflow class

A Workflow is a distinguished Program, which indicates that it is meant to represent a computational experiment in its entirety. It is also subject to versioning by prov:wasDerivedFrom through its super-class provone:Program.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Workflow

has super-class

is in domain of

is in range of
Example 6

The following RDF fragment specifies a Workflow identified within the RDF document by workflow_1.

          1    :workflow_1
          2
          3       a provone:Workflow;
          4       dcterms:identifier "wf1"^^xsd:string;
          5       dcterms:title "ModelComparison"^^xsd:string;
          6
          7    .
          
The Workflow is given the string wf1 as an identifier in line 4. In addition, it is given the string ModelComparison as a descriptive title in line 5.

3.1.6 hasSubProgram object property

hasSubProgram specifies the recursive composition of Programs: a parent Program includes a child Program as part of its specification.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hasSubProgram

has domain

has range
Example 8

The following RDF fragment illustrates the use of the hasSubProgram object property by extending Example 1 of the Program class.

        1   :top_program
        2
        3      a provone:Program;
        4      dcterms:identifier "main"^^xsd:string;
        5
        6   .
        7
        8   :top_program provone:hasSubProgram :program_1 .
        
A Program identified within the document as top_program is given the identifier value main. Subsequently, line 8 specifies that the same Program top_program has as a sub-program the Program program_1, defined in Example 1.

3.1.7 controlledBy object property

controlledBy relates a Controller to its source Program.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#controlledBy

has domain

has range
Example 9

The following RDF fragment illustrates the use of the controlledBy object property by complementing Example 1 of the Program class and Example 5 of the Controller class.

        1
        2    :program_1 provone:controlledBy :p1_p2CL .
        3
        
Line 2 specifies that the Program program_1, defined in Example 1, is the source Program of the Controller p1_p2CL, defined in Example 5.

3.1.8 controls object property

controls relates a Controller to its destination Program.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#controls

has domain

has range
Example 10

The following RDF fragment illustrates the use of the controls object property by complementing Example 5 of the Controller class.

        1
        2    :program_2
        3
        4       a provone:Program;
        5       dcterms:identifier "e2"^^xsd:string;
        6       dcterms:title "TemporalStatistics"^^xsd:string;
        7    .
        8
        9    :p1_p2CL provone:controls :program_2 .
        10
        
A Program identified within the document as program_2 is given the identifier value e2 and TemporalStatistics as its title. Line 9 states that the Controller p1_p2CL, defined in Example 5, has as its destination the Program program_2.

3.1.9 hasInPort object property

hasInPort specifies the Ports of a particular Program that are used as input ports.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hasInPort

has domain

has range
Example 11

The following RDF fragment illustrates the use of the hasInPort object property by complementing Example 1 of the Program class and Example 2 of the Port class.

        1
        2    :program_1 provone:hasInPort :p1_ip1 .
        3
        
Line 2 specifies that the Program program_1, defined in Example 1, has an input Port p1_ip1, defined in Example 2.

3.1.10 hasOutPort object property

hasOutPort specifies the Ports of a particular Program that are used as output ports.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hasOutPort

has domain

has range
Example 12

The following RDF fragment illustrates the use of the hasOutPort object property by complementing Example 1 of the Program class and the Port class.

        1
        2    :program_1 provone:hasOutPort :p1_op1 .
        3
        4    :p1_op1 a provone:Port .
        
Line 2 specifies that the Program program_1, defined in Example 1, has an output Port p1_op1, which is declared as a Port in line 4.

3.1.11 hasDefaultParam object property

hasDefaultParam specifies that a given input Port has a certain Entity item as a default parameter (usually Data). This enables the pre-configuration of executable Workflow instances with zero or more parameters.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hasDefaultParam

has domain

has range
Example 13

The following RDF fragment illustrates the use of the hasDefaultParam object property by complementing Example 2 of the Port class and the example of the Data class (Section 3.3.3).

        1
        2    :p1_ip1 provone:hasDefaultParam :data1 .
        3
        
Line 2 specifies that the Port p1_ip1, defined in Example 2 and associated in Example 11 with Program program_1 of Example 1, has as a default parameter the Data item data1, defined in the example of the Data class (Section 3.3.3).

3.1.12 connectsTo object property

connectsTo specifies the Channel that the given Port(s) connect to, typically with an output Port connected to an input Port.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#connectsTo

has domain

has range
Example 14

The following RDF fragment illustrates the use of the connectsTo object property.

          1    :pmain_ip1
          2       a provone:Port;
          3       dcterms:identifier "e1_ip1"^^xsd:string;
          4    .
          5
          6    :ch1
          7       a provone:Channel;
          8       dcterms:identifier "pmain_ch1"^^xsd:string;
          9    .
          10
          11   :pmain_ip1 provone:connectsTo :ch1 .

          
First, an input Port pmain_ip1 is defined in lines 1-4. Lines 6-9 specify the Channel ch1, and the connectsTo statement in line 11 then associates the input Port with the Channel.

3.1.13 wasDerivedFrom object property

prov:wasDerivedFrom is adopted in ProvONE, in relation to workflow structure, to describe the evolution of programs and workflows.

IRI:http://www.w3.org/ns/prov#wasDerivedFrom

has domain

has range
Example 15

The following RDF fragment illustrates the use of the wasDerivedFrom object property by extending Example 6 of the Workflow class.

        1
        2    :workflow_1update1
        3
        4       a provone:Workflow;
        5       dcterms:identifier "wf1upd1"^^xsd:string;
        6       dcterms:title "ModelComparison"^^xsd:string;
        7    .
        8    :workflow_1update1 prov:wasDerivedFrom :workflow_1 .
        9
        
First, a Workflow workflow_1update1 is defined beginning at line 2. Line 8 specifies that Workflow workflow_1update1 was derived from Workflow workflow_1 of Example 6, which implies that it is a new version and the result of workflow evolution. Hence it is given the same title.

3.2 ProvONE Trace Specification

3.2.1 Execution class

An Execution represents the execution of a Program. If the Program in question is a Workflow, then the Execution represents a trace of its execution.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Execution

has super-class

is in domain of

is in range of
Example 20

The following RDF fragment specifies an Execution identified within the RDF document by program_1ex1.

            1    :program_1ex1
            2
            3       a provone:Execution;
            4       dcterms:identifier "e1_ex1"^^xsd:string;
            5       prov:startedAtTime "2013-08-21T13:37:53"^^xsd:dateTime;
            6       prov:endedAtTime "2013-08-21T13:37:53"^^xsd:dateTime;
            7       wfms:cached "0"^^xsd:integer;
            8       wfms:completed "1"^^xsd:integer;
            9
            10   .
            
An Execution is created with the string e1_ex1 as an identifier. Timestamps denoting the moments at which the execution begins and completes are specified through the prov:startedAtTime and prov:endedAtTime data properties in lines 5 and 6, respectively. In addition, the value 0 in line 7 indicates that the result was not obtained from a cache, while the value 1 in line 8 indicates that the execution was completed successfully.

3.2.2 Association class

The prov:Association class is adopted directly from the PROV Ontology, and is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. It further allows for a plan to be specified, which is the plan intended by the agent to achieve some goals in the context of this activity.

IRI:http://www.w3.org/ns/prov#Association

has super-class

is in domain of

is in range of
Example 21

The following RDF fragment specifies an Association identified within the RDF document by association_1.

            1    :association_1
            2
            3       a prov:Association;
            4       prov:hadPlan :program_1;
            5    .
            
First, an Association association_1 is defined beginning at line 1. Line 4 specifies that association_1 had the Program program_1, defined in Example 1, as its plan.

3.2.3 Usage class

The prov:Usage class is adopted directly from the PROV Ontology, and is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity.

IRI:http://www.w3.org/ns/prov#Usage

has super-class

is in domain of

is in range of
Example 22

The following RDF fragment specifies a Usage identified within the RDF document by usage_1.

            1    :usage_1
            2
            3       a prov:Usage;
            4       prov:used :dataSetA;
            5       provone:hadEntity :dataSetA;
            6   .
            7   :dataSetA a prov:Entity .
            
The Usage usage_1 specifies in line 4 that the Entity dataSetA is used. Line 5 records the same Entity through provone:hadEntity, and line 7 declares it as an Entity.

3.2.4 Generation class

The prov:Generation class is adopted directly from the PROV Ontology, and is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

IRI:http://www.w3.org/ns/prov#Generation

has super-class

is in domain of

is in range of
Example 23

The following RDF fragment specifies a Generation identified within the RDF document by generation_1.

            1   :generation_1
            2
            3       a prov:Generation;
            4       prov:wasGeneratedBy :program_1ex1;
            5       provone:hadEntity :dataSetB;
            6   .
            7   :dataSetB a prov:Entity .
            
A Generation generation_1 is defined beginning at line 1. Line 4 states that generation_1 was generated by the Execution program_1ex1, defined in Example 20, and line 5 states that it had the Entity dataSetB, which is declared in line 7.

3.2.5 User class

A User is the person(s) responsible for the execution of an Execution. Its specification serves attribution and accountability purposes.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#User

has super-class

is in range of
Example 24

The following RDF fragment specifies a User identified within the RDF document by user_1.

            1    :user_1
            2
            3       a provone:User;
            4       dcterms:identifier "user_eg_1"^^xsd:string;
            5    .
            
A User user_1 is created and given the string user_eg_1 as an identifier.

3.2.6 used object property

prov:used is adopted in ProvONE to state that an Execution made use of a particular Entity item as input for its execution.

IRI:http://www.w3.org/ns/prov#used

has domain

has range
Example 25

The following RDF fragment illustrates the use of the used object property by complementing Example 20 of the Execution class and the example of the Data class (Section 3.3.3).

        1
        2    :program_1ex1 prov:used :data1 .
        3
        
Line 2 specifies that Execution program_1ex1 of Example 20 used as an input the Data item data1, defined in the example of the Data class (Section 3.3.3).

3.2.7 wasGeneratedBy object property

prov:wasGeneratedBy is adopted in ProvONE to state that an Execution produced a particular Entity item as output during its execution.

IRI:http://www.w3.org/ns/prov#wasGeneratedBy

has domain

has range
Example 26

The following RDF fragment illustrates the use of the wasGeneratedBy object property by complementing Example 20 of the Execution class.

        1
        2    :data2
        3
        4       a provone:Data;
        5       dcterms:identifier "cdms1"^^xsd:string;
        6       rdfs:label "cdms_data"^^xsd:string;
        7       wfms:type "gov.llnl.uvcdat.cdms:CDMSVariable"^^xsd:string;
        8    .
        9    :data2 prov:wasGeneratedBy :program_1ex1 .
        10
        
First, a Data item data2 is defined beginning at line 2. Line 9 specifies that the Data item data2 was produced as an output of Execution program_1ex1 of Example 20.

3.2.8 wasAssociatedWith object property

prov:wasAssociatedWith is adopted in ProvONE to state that a User was associated with a particular Execution. This serves as an assignment of attribution and responsibility.

IRI:http://www.w3.org/ns/prov#wasAssociatedWith

has domain

has range
Example 27

The following RDF fragment illustrates the use of the wasAssociatedWith object property by complementing Example 20 of the Execution class.

        1    :program_1ex1 prov:wasAssociatedWith :user_1 .
        
Line 1 specifies that the User user_1 was associated with Execution program_1ex1 of Example 20.

3.2.9 wasInformedBy object property

prov:wasInformedBy is adopted in ProvONE to state that an Execution communicates with another Execution through an output-input relation, and thereby triggers its execution.

IRI:http://www.w3.org/ns/prov#wasInformedBy

has domain

has range
Example 28

The following RDF fragment illustrates the use of the wasInformedBy object property by complementing Example 20 of the Execution class.

        1
        2    :program_2ex1
        3
        4      a provone:Execution;
        5      dcterms:identifier "e2_ex1"^^xsd:string;
        6      prov:startTime "2013-08-21 13:37:54"^^xsd:string;
        7      prov:endTime "2013-08-21 13:37:54"^^xsd:string;
        8      wfms:cached "0"^^xsd:integer;
        9      wfms:completed "1"^^xsd:integer;
        10
        11   .
        12
        13   :program_2ex1 prov:wasInformedBy :program_1ex1 .
        14
        
First, an Execution program_2ex1 is defined beginning at line 2. Line 13 specifies that Execution program_2ex1 received data from Execution program_1ex1 of Example 20.
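
The output-input relation behind wasInformedBy can be spelled out with the used and wasGeneratedBy properties of Sections 3.2.6 and 3.2.7. The following Turtle fragment is only a sketch: the default namespace http://example.org/ and the assumption that program_2ex1 consumed data2 are illustrative and are not stated by the examples above.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.org/> .   # hypothetical namespace for this sketch

:program_1ex1 a provone:Execution .
:program_2ex1 a provone:Execution .
:data2        a provone:Data .

# program_1ex1 produced data2 as an output ...
:data2 prov:wasGeneratedBy :program_1ex1 .

# ... and program_2ex1 consumed it as an input (assumed for illustration) ...
:program_2ex1 prov:used :data2 .

# ... which is the output-input relation that wasInformedBy summarizes.
:program_2ex1 prov:wasInformedBy :program_1ex1 .
```

Stating all three triples is redundant in principle, but it lets a consumer follow either the coarse Execution-to-Execution link or the finer path through the exchanged Data item.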

3.2.10 wasPartOf object property

wasPartOf enables the specification of the structure of Execution instances in that a parent Execution (associated with a Workflow) has child Executions (associated with Programs and subworkflows).

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#wasPartOf

has domain

has range
Example 29

The following RDF fragment illustrates the use of the wasPartOf object property by complementing Example 6 of the Workflow class and Example 20 of the Execution class.

        1
        2    :workflow_1ex1
        3
        4      a provone:Execution;
        5      dcterms:identifier "wf1_ex1"^^xsd:string;
        6      prov:startTime "2013-08-21 13:37:54"^^xsd:string;
        7      prov:endTime "2013-08-21 13:37:59"^^xsd:string;
        8      wfms:completed "1"^^xsd:integer;
        9
        10   .
        11
        12   :workflow_1ex1 prov:wasAssociatedWith :workflow_1  .
        13
        14   :program_1ex1 provone:wasPartOf :workflow_1ex1 .
        15
        
First, an Execution workflow_1ex1 is defined beginning at line 2 and associated with Workflow workflow_1 of Example 6 in line 12. Line 14 specifies that Execution program_1ex1 of Example 20 is part of Execution workflow_1ex1.
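
Because child Executions may themselves correspond to subworkflows, wasPartOf can describe more than one level of nesting. The following Turtle fragment is a minimal sketch of such a hierarchy; the default namespace http://example.org/ and the identifiers subworkflow_1ex1 and program_3ex1 are hypothetical and do not appear elsewhere in this document.

```turtle
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.org/> .   # hypothetical namespace for this sketch

:workflow_1ex1    a provone:Execution .   # run of the top-level Workflow
:program_1ex1     a provone:Execution .   # run of a Program in the top-level Workflow
:subworkflow_1ex1 a provone:Execution .   # hypothetical run of a subworkflow
:program_3ex1     a provone:Execution .   # hypothetical run of a Program inside the subworkflow

# First level: direct children of the workflow execution.
:program_1ex1     provone:wasPartOf :workflow_1ex1 .
:subworkflow_1ex1 provone:wasPartOf :workflow_1ex1 .

# Second level: a child of the subworkflow execution.
:program_3ex1     provone:wasPartOf :subworkflow_1ex1 .
```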

3.2.11 prov:qualifiedAssociation object property

The prov:qualifiedAssociation object property is adopted directly from the PROV Ontology, where an Association is described as an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity. It further allows a plan to be specified, which is the plan intended by the agent to achieve some goals in the context of this activity. In ProvONE, a User (Agent) is responsible for the Execution (Activity) with a specified Program or Workflow (Plan).

IRI:http://www.w3.org/ns/prov#qualifiedAssociation

has domain

has range
Example 30

The following RDF fragment illustrates the use of the qualifiedAssociation object property by complementing Example 6 of the Workflow class and Example 20 of the Execution class.

        1    :program_1ex1
        2        prov:qualifiedAssociation [
        3            a prov:Association;
        4            prov:agent :user_1;
        5            prov:hadPlan :program_1;
        6            rdfs:comment "user_1 created this association.";
        7        ]
        8    .
        
This example complements the Execution program_1ex1 defined in Example 20. Line 2 gives program_1ex1 a qualified association. Line 5 names the Program program_1, defined in Example 1, as the plan of that association, and line 4 names the User user_1 as the associated agent.

3.2.12 prov:agent object property

The prov:agent object property is adopted directly from the PROV Ontology, where it is described as a property that references a prov:Agent which influenced a resource. This property applies to a prov:AgentInfluence, which is given by a subproperty of prov:qualifiedInfluence from the influenced prov:Entity, prov:Activity or prov:Agent. In ProvONE, a User (Agent) influences the Execution (Activity).

IRI:http://www.w3.org/ns/prov#agent

has domain

has range
Example 31

The following RDF fragment illustrates the use of the agent object property by complementing Example 21 of the Association class.

        1    :association_1 prov:agent :foo_University .
        2
        3    :foo_University a prov:Organization, prov:Agent .
        
The Association association_1 is defined to have an agent foo_University, which is an Organization.

3.2.13 prov:hadPlan object property

The prov:hadPlan object property is adopted directly from the PROV Ontology, where it is described as a property that references an optional Plan adopted by an Agent in Association with some Activity. In ProvONE, a User (Agent) adopts a Plan (Program or Workflow) in Association with an Execution (Activity).

IRI:http://www.w3.org/ns/prov#hadPlan

has domain

has range
Example 32

The following RDF fragment illustrates the use of the hadPlan object property by complementing Example 21 of the Association class and Example 6 of the Workflow class.

        1    :association_1 prov:hadPlan :program_1 .
        
Line 1 specifies that the Association association_1 has Program program_1 as a plan.

3.2.14 prov:qualifiedUsage object property

The prov:qualifiedUsage object property is adopted directly from the PROV Ontology, where it is described as a qualification of how an Activity used an Entity. In ProvONE, an Execution (Activity) uses an Entity (Data, Visualization, Document) as input to the process.

IRI:http://www.w3.org/ns/prov#qualifiedUsage

has domain

has range
Example 33

The following RDF fragment illustrates the use of the qualifiedUsage object property by complementing Example 20 of the Execution class.

        1    :program_1ex1
        2        prov:qualifiedUsage [
        3            a prov:Usage;
        4            prov:entity :dataSetA;
        5        ]
        6    .
        
Line 2 defines a qualified usage with an Entity dataSetA for the Execution program_1ex1 in Example 20.

3.2.15 hadInPort object property

hadInPort specifies the Port of a particular Execution that was used as an input port, as described in a given Usage.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hadInPort

has domain

has range
Example 34

The following RDF fragment illustrates the use of the hadInPort object property by complementing Example 22 of the Usage class.

        1    :usage_1 provone:hadInPort :p1_ip1 .
        
Line 1 specifies that the Usage usage_1, defined in Example 22, has an input Port p1_ip1, defined in Example 2.

3.2.16 hadEntity object property

hadEntity specifies the Entity that a particular Execution used or generated, as described in a given Usage or Generation.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hadEntity

has domain

has range
Example 35

The following RDF fragment illustrates the use of the hadEntity object property by complementing Example 23 of the Generation class and Example 22 of the Usage class.

        1    :generation_1 provone:hadEntity :dataSetC .
        2    :usage_1 provone:hadEntity :dataSetD .
        3
        4    :dataSetC a prov:Entity .
        5    :dataSetD a prov:Entity .
        
Line 1 specifies the Generation generation_1 has an Entity dataSetC. Line 2 specifies the Usage usage_1 has an entity dataSetD.

3.2.17 prov:qualifiedGeneration object property

The prov:qualifiedGeneration object property is adopted directly from the PROV Ontology, where it is described as a qualification of how an Activity generated an Entity. In ProvONE, an Execution (Activity) generates an Entity (Data, Visualization, Document) as output of the process.

IRI:http://www.w3.org/ns/prov#qualifiedGeneration

has domain

has range
Example 36

The following RDF fragment illustrates the use of the qualifiedGeneration object property by complementing Example 20 of the Execution class.

        1    :program_1ex1
        2        prov:qualifiedGeneration [
        3            a prov:Generation;
        4            prov:atTime "2013-08-21 13:37:53"^^xsd:string;
        5        ]
        6    .
        
Line 2 defines that the Execution program_1ex1 in Example 20 has a qualified generation with a timestamp "2013-08-21 13:37:53".

3.2.18 hadOutPort object property

hadOutPort specifies the Port of a particular Execution that was used as an output port, described in a given Generation.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#hadOutPort

has domain

has range
Example 37

The following RDF fragment illustrates the use of the hadOutPort object property by complementing Example 23 of the Generation class.

        1    :generation_1 provone:hadOutPort :p1_op1 .
        
Line 1 specifies the Generation generation_1 has a Port p1_op1 defined in Example 12 as an output.
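
Read together, Sections 3.2.14 to 3.2.18 give a port-level view of a single Execution. The following Turtle fragment is a minimal sketch of how the pieces might be combined. The default namespace http://example.org/ is an assumption, the identifiers are reused from the examples above, and the way the Usage and Generation nodes are tied to the Execution (prov:qualifiedUsage on the input side, the PROV-O property prov:activity on the output side) is one possible arrangement rather than something this section prescribes.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix :        <http://example.org/> .   # hypothetical namespace for this sketch

# Input side: the Execution points to a Usage, which names the port and the entity.
:program_1ex1 a provone:Execution ;
    prov:qualifiedUsage :usage_1 .

:usage_1 a prov:Usage ;
    provone:hadInPort :p1_ip1 ;
    provone:hadEntity :dataSetD .

# Output side: a Generation names the producing Execution, the port, and the entity.
:generation_1 a prov:Generation ;
    prov:activity      :program_1ex1 ;
    provone:hadOutPort :p1_op1 ;
    provone:hadEntity  :dataSetC .

:p1_ip1   a provone:Port .
:p1_op1   a provone:Port .
:dataSetC a prov:Entity .
:dataSetD a prov:Entity .
```

With this arrangement a consumer can tell not only that program_1ex1 read dataSetD and wrote dataSetC, but also through which Port each Entity passed.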

3.3 ProvONE Data Structure Specification

3.3.1 Entity class

The prov:Entity class is adopted directly from the PROV Ontology, and is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary. In ProvONE, an Entity may be an instance of the Data, Visualization, or Document classes, or of other subclasses of the prov:Entity class.

IRI:http://www.w3.org/ns/prov#Entity

is in domain of

is in range of

3.3.2 Collection class

Instead of specifying a new class or subclass, we adopt explicitly as part of the ProvONE model PROV's prov:Collection class, whose description we cite below.

A Collection is an entity that provides a structure to some constituents, which are themselves entities. These constituents are said to be member of the collections.

IRI:http://www.w3.org/ns/prov#Collection

Example 39

The following RDF fragment specifies a Collection identified within the RDF document by col1.

       1    :col1
       2
       3       a prov:Collection;
       4       dcterms:identifier "inputset1"^^xsd:string;
       5
       6    .
       7
            
A Collection is created with the string inputset1 as an identifier.

3.3.3 Data class

A Data item represents the basic unit of information consumed or produced by a Program. Multiple Data items may be grouped into a Collection.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Data

has super-class
Example 40

The following RDF fragment specifies a Data item identified within the RDF document by data1.

       1    :data1
       2
       3       a provone:Data;
       4       dcterms:identifier "defparam1"^^xsd:string;
       5       rdfs:label "filename"^^xsd:string;
       6       prov:value "DLEM_NEE_onedeg_v1.0nc"^^xsd:string;
       7       wfms:type "edu.sci.wfms.basic:File"^^xsd:string;
       8
       9    .
       10
            
A Data item is created with the string defparam1 as an identifier. It is also given the descriptive string filename through the rdfs:label data property. The prov:value data property specifies the actual value of the data item, namely DLEM_NEE_onedeg_v1.0nc. Finally, the type of the data item as defined by the WfMS is specified in line 7 to be edu.sci.wfms.basic:File.

3.3.4 Visualization class

A Visualization item represents a basic unit of information consumed or produced by a Program, in the form of a digital visual representation. Multiple Visualization items may be grouped into a Collection.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Visualization

has super-class

3.3.5 Document class

A Document item represents a body of information produced as a result of an Execution, in the form of a communication medium. Multiple Document items may be grouped into a Collection. A Document may, for instance, be a scholarly journal article or government report generated as a result of running a particular Execution of a Program.

IRI:http://purl.dataone.org/provone/2015/01/15/ontology#Document

has super-class
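
By analogy with Example 40 of the Data class, Visualization and Document items might be described as in the following Turtle sketch. All identifiers and literal values (plot1, report1, and their labels) and the default namespace http://example.org/ are hypothetical; only the class IRIs and the properties are taken from this document.

```turtle
@prefix prov:    <http://www.w3.org/ns/prov#> .
@prefix provone: <http://purl.dataone.org/provone/2015/01/15/ontology#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix :        <http://example.org/> .   # hypothetical namespace for this sketch

:program_1ex1 a provone:Execution .

# A hypothetical plot produced by the Execution.
:plot1 a provone:Visualization ;
    dcterms:identifier "plot_1"^^xsd:string ;
    rdfs:label "output plot"^^xsd:string ;
    prov:wasGeneratedBy :program_1ex1 .

# A hypothetical report produced as a result of the same Execution.
:report1 a provone:Document ;
    dcterms:identifier "report_1"^^xsd:string ;
    rdfs:label "summary report"^^xsd:string ;
    prov:wasGeneratedBy :program_1ex1 .
```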

3.3.6 wasDerivedFrom object property

prov:wasDerivedFrom is adopted in ProvONE, in relation to data structure, to describe dependencies between the Data items produced during workflow execution.

IRI:http://www.w3.org/ns/prov#wasDerivedFrom

has domain

has range
Example 43

The following RDF fragment illustrates the use of the wasDerivedFrom object property by extending Example 40 of the Data class.

      1
      2    :data2
      3
      4       a provone:Data;
      5       dcterms:identifier "defparam1"^^xsd:string;
      6       rdfs:label "filename"^^xsd:string;
      7       prov:value "DLEM_NEE_onedeg_v1.0nc"^^xsd:string;
      8       wfms:type "edu.sci.wfms.basic:File"^^xsd:string;
      9
      10   .
      11
      12   :data2 prov:wasDerivedFrom :data1 .
      13
            
First, a Data item data2 is defined beginning at line 2. Line 12 specifies that the Data item data2 was produced from Data item data1 of Example 40.

3.3.7 hadMember object property

prov:hadMember is adopted in ProvONE, in relation to data structure, to specify the Data items that form part of a Collection.

IRI:http://www.w3.org/ns/prov#hadMember

has domain

has range
Example 44

The following RDF fragment illustrates the use of the hadMember object property by extending Example 39 of the Collection class.

      1
      2    :infile1
      3
      4       a provone:Data;
      5       dcterms:identifier "data_file1"^^xsd:string;
      6       rdfs:label "file1"^^xsd:string;
      7       prov:value "file1.dat"^^xsd:string;
      8       wfms:type "edu.sci.wfms.basic:File"^^xsd:string;
      9
      10   .
      11
      12   :infile2
      13
      14       a provone:Data;
      15       dcterms:identifier "data_file2"^^xsd:string;
      16       rdfs:label "file2"^^xsd:string;
      17       prov:value "file2.dat"^^xsd:string;
      18       wfms:type "edu.sci.wfms.basic:File"^^xsd:string;
      19
      20   .
      21
      22   :col1 prov:hadMember :infile1 .
      23
      24   :col1 prov:hadMember :infile2 .
      25
            
Two Data items infile1 and infile2 are defined in lines 2-20. Line 22 specifies that Data item infile1 was a member of Collection item col1 of Example 39. Analogously, line 24 specifies Data item infile2 also as part of Collection item col1.

A. References

[BKG+13]
Khalid Belhajjame, Graham Klyne, Daniel Garijo, Oscar Corcho, Esteban García-Cuesta, and Raul Palma. Wf4ever Research Object Model. 20 August 2013. URL: http://wf4ever.github.io/ro/
[CSdO+13]
Flavio Costa, Vítor Silva, Daniel de Oliveira, Kary Ocaña, Eduardo Ogasawara, Jonas Dias, and Marta Mattoso. Capturing and Querying Workflow Runtime Provenance with PROV: a Practical Approach. In Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT'13, pages 282-289, New York, NY, USA, 2013. ACM.
[DC-RDF]
Dublin Core Metadata Initiative. DCMI term declarations represented in RDF schema language. 2012. URL: http://dublincore.org/schemas/rdfs/
[BIBO]
The Bibliographic Ontology. BIBO term declarations represented in RDF schema language. 2014. URL: http://bibliontology.com/
[FSC+06]
Juliana Freire, Cláudio T. Silva, Steven P. Callahan, Emanuele Santos, Carlos E. Scheidegger, and Huy T. Vo. Managing Rapidly-Evolving Scientific Workflows. In Proceedings of the 2006 international conference on Provenance and Annotation of Data, IPAW'06, pages 10-18, Berlin, Heidelberg, 2006. Springer-Verlag.
[MCF+11]
Luc Moreau, Ben Clifford, Juliana Freire, Joe Futrelle, Yolanda Gil, Paul Groth, Natalia Kwasnikowska, Simon Miles, Paolo Missier, Jim Myers, Beth Plale, Yogesh Simmhan, Eric Stephan, and Jan Van den Bussche. The Open Provenance Model Core Specification (v1.1). Future Gener. Comput. Syst., 27(6):743-756, June 2011.
[MDB+13]
Paolo Missier, Saumen Dey, Khalid Belhajjame, Víctor Cuevas-Vicenttín, and Bertram Ludäscher. D-PROV: Extending the PROV Provenance Model with Workflow Structure. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance, TaPP '13, pages 9:1-9:7, Berkeley, CA, USA, 2013. USENIX Association.
[OWL2]
World Wide Web Consortium (W3C). OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation 11 December 2012, URL: http://www.w3.org/TR/owl2-overview/
[PROV]
World Wide Web Consortium (W3C). PROV-Overview: An Overview of the PROV Family of Documents. W3C Working Group Note 30 April 2013, URL: http://www.w3.org/TR/prov-overview/
[PROVO]
World Wide Web Consortium (W3C). PROV-O: The PROV Ontology. W3C Recommendation 30 April 2013, URL: http://www.w3.org/TR/prov-o/
[RDF-CONCEPTS]
Graham Klyne; Jeremy J. Carroll. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RDF-SCHEMA]
Dan Brickley; Ramanathan V. Guha. RDF Vocabulary Description Language 1.0: RDF Schema. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210
[UML]
Object Management Group. Unified Modeling Language: Superstructure. Version 2.0, 2005. URL: http://www.omg.org/spec/UML/2.0/Superstructure/PDF/
[XMLSCHEMA11-2]
Henry S. Thompson et al. W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes. 5 April 2012. W3C Recommendation. URL: http://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/
[ZWF06]
Yong Zhao, Michael Wilde, and Ian Foster. Applying the Virtual Data Provenance Model. In Proceedings of the 2006 international conference on Provenance and Annotation of Data, IPAW'06, pages 148-161, Berlin, Heidelberg, 2006. Springer-Verlag.