Kurator-FFDQ and FFDQ-API

Conceptual Framework

This wiki page is dedicated to our example implementation of the fundamental and derived concepts from the Fitness For Use Framework described in the following publications:

  • A.K. Veiga, A.M. Saraiva, A.D. Chapman, P.J. Morris, C. Gendreau, D. Schigel, T.J. Robertson (2017). A conceptual framework for quality assessment and management of biodiversity data. PLOS ONE. https://doi.org/10.1371/journal.pone.0178731

RDF Representation of FFDQ

Code for serializing FFDQ concepts as RDF can be found at: https://github.com/kurator-org/kurator-ffdq

This project also contains utilities as well as a set of example competency questions (SPARQL queries) and examples demonstrating use of the framework (Turtle and JSON-LD). See https://github.com/kurator-org/kurator-ffdq/tree/master/competencyquestions

Also see the project readme here for an overview of the project: https://github.com/kurator-org/kurator-ffdq/blob/master/README.md

Ontologies

Spreadsheet of standardized tests expressed in terms of FFDQ

Overview

The following is an example of a single row from the spreadsheet of standardized tests (linked above) produced by Task Group 2 of the TDWG Biodiversity Data Quality (BDQ) Interest Group (see: https://github.com/tdwg/bdq/blob/master/tg2/README.md)

Shown is a validation test that checks event date consistency with the verbatim event date field.

Standardized tests row.png

The diagram below provides an overview of how the example row above maps to concepts in FFDQ.

Ffdq singlerow.png

The subsections below give specific examples in RDF Turtle format that describe each of the concepts in greater detail.

Specification

The specification serves as a technical description of the assertion test implemented by some instance of the mechanism concept. For example:

   <urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a> a <ffdq:Specification> ;
       <rdfs:label> """If eventDate and verbatimEventDate are not 
                       empty, compare the values and assert Compliant 
                       if the two represent the same date or date range.""" .

Criterion

Describes the criterion a Validation test uses to determine compliance. For example:

   <urn:uuid:8ab8f26e-c541-4d2b-8a38-91a7d8149937> a <ffdq:Criterion> ;
       <rdfs:label> "eventDate and verbatimEventDate are consistent" .

InformationElement

The information element in FFDQ can be represented as a single or composite element that consists of one or more terms from a controlled vocabulary (fields actedUpon or consulted by an assertion test). For example:

   <urn:uuid:dfa97a58-2588-49cf-93f6-5174f1c11b0b> a <ffdq:InformationElement> ;
       <ffdq:composedOf> <dwc:eventDate> , <dwc:verbatimEventDate> .

ContextualizedCriterion

Describes an instance of the criterion concept in terms of the associated information elements from some controlled vocabulary (fields actedUpon or consulted), and a resource type of SingleRecord or MultiRecord. Based on the examples above, the concepts are linked together via:

   <urn:uuid:b52fe995-af4d-46c0-b1b2-b238a2e57728> a <ffdq:ContextualizedCriterion> ;
       <ffdq:hasCriterion> <urn:uuid:8ab8f26e-c541-4d2b-8a38-91a7d8149937> ;
       <ffdq:hasInformationElement> <urn:uuid:dfa97a58-2588-49cf-93f6-5174f1c11b0b> ;
       <ffdq:hasResourceType> <rt:SingleRecord> .

ValidationMethod

The ValidationMethod in FFDQ is a DQ Solutions level concept describing the relationship between a specification (technical description of a test) and a criterion in the context of resource type (SingleRecord or MultiRecord) and associated information elements. Based on the examples above, the concepts are linked together via:

   <urn:uuid:b31c335d-2426-45d5-bc94-0f4372c1101b> a <ffdq:ValidationMethod> ;
       <ffdq:criterionInContext> <urn:uuid:b52fe995-af4d-46c0-b1b2-b238a2e57728> ;
       <ffdq:hasSpecification> <urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a> .
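
As a rough illustration of how the concepts above link together, the Java sketch below assembles the same Turtle from plain strings. The class FfdqTurtleSketch and its methods are hypothetical helpers written for this page; they are not part of kurator-ffdq, which may build its RDF quite differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of kurator-ffdq: assembles Turtle like the
// examples above so the links between the concepts are explicit.
final class FfdqTurtleSketch {

    // Wraps an identifier in angle brackets, following the examples above.
    static String iri(String id) {
        return "<" + id + ">";
    }

    // An InformationElement composed of one or more vocabulary terms.
    static String informationElement(String id, List<String> terms) {
        String objects = terms.stream()
                .map(FfdqTurtleSketch::iri)
                .collect(Collectors.joining(" , "));
        return iri(id) + " a <ffdq:InformationElement> ;\n"
                + "    <ffdq:composedOf> " + objects + " .\n";
    }

    // A ContextualizedCriterion pointing at a criterion, an information
    // element and a resource type.
    static String contextualizedCriterion(String id, String criterionId,
                                          String informationElementId,
                                          String resourceType) {
        return iri(id) + " a <ffdq:ContextualizedCriterion> ;\n"
                + "    <ffdq:hasCriterion> " + iri(criterionId) + " ;\n"
                + "    <ffdq:hasInformationElement> " + iri(informationElementId) + " ;\n"
                + "    <ffdq:hasResourceType> " + iri(resourceType) + " .\n";
    }

    // A ValidationMethod linking a contextualized criterion to a specification.
    static String validationMethod(String id, String contextualizedCriterionId,
                                   String specificationId) {
        return iri(id) + " a <ffdq:ValidationMethod> ;\n"
                + "    <ffdq:criterionInContext> " + iri(contextualizedCriterionId) + " ;\n"
                + "    <ffdq:hasSpecification> " + iri(specificationId) + " .\n";
    }

    public static void main(String[] args) {
        // Identifiers are copied from the examples above.
        System.out.print(informationElement(
                "urn:uuid:dfa97a58-2588-49cf-93f6-5174f1c11b0b",
                Arrays.asList("dwc:eventDate", "dwc:verbatimEventDate")));
        System.out.print(validationMethod(
                "urn:uuid:b31c335d-2426-45d5-bc94-0f4372c1101b",
                "urn:uuid:b52fe995-af4d-46c0-b1b2-b238a2e57728",
                "urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a"));
    }
}
```

Each method takes only identifiers, which mirrors the way the Turtle examples refer to one another by GUID rather than by nesting.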

Relationship between FFDQ concepts and PROV-O

In the current serialization, ffdq:Specification corresponds to prov:Plan (see: https://www.w3.org/TR/prov-o/#Plan), ffdq:Assertion is roughly equivalent to prov:Activity (see: https://www.w3.org/TR/prov-o/#Activity), and ffdq:DataResource corresponds to prov:Entity (see: https://www.w3.org/TR/prov-o/#Entity).

In the domain-specific case, our actors expect ffdq:DataResource to be defined in terms of a dwc:Occurrence according to the DwC RDF guide (http://rs.tdwg.org/dwc/terms/guides/rdf/index.htm). However, the framework defines ffdq:DataResource loosely as having properties from any controlled vocabulary.

The prov:Agent subclass prov:SoftwareAgent (https://www.w3.org/TR/prov-o/#SoftwareAgent) also seems like a good fit for describing an ffdq:Mechanism for actors that implement the standardized tests (Java classes, Python modules). Because many PROV-O concepts seem appropriate for describing relationships between FFDQ concepts, PROV-O also seemed like a good fit for the FFDQ report concepts related to retrospective provenance.

Consumer expectations that aren't currently expressed in the RDFS/OWL

Some relationships that are not clear from the RDFS alone are currently defined more explicitly in the Java bean implementation. The ffdq:Validation class, for example, has relationships defined via Java method annotations in the code here: https://github.com/kurator-org/kurator-ffdq/blob/master/src/main/java/org/datakurator/data/ffdq/model/report/Validation.java

Based on that Java bean, an ffdq:Validation is expected to have a property, ffdq:criterionInContext, and the rest of the relationships are described using properties from PROV-O (for subject prov:Activity).

Relationships currently missing from the FFDQ RDFS for subclasses of ffdq:Assertion include:

  • prov:hadPlan has object of type ffdq:Specification (a subclass of prov:Plan)
  • prov:informedBy has object of type ffdq:Amendment (a subclass of prov:Activity)
  • prov:used has object of type ffdq:DataResource (a subclass of prov:Entity in the general case and for our domain specific actors, expected to have properties appropriate for dwc:Occurrence according to the DWC RDF Guide)
  • etc.
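
Until these expectations are expressed in the RDFS/OWL, a consumer has to check them itself. The Java sketch below shows one minimal way to do that, assuming the properties of a validation have already been collected into a set of prefixed names. The class ValidationExpectations and the choice of expected properties (taken from the text above) are illustrative assumptions, not part of kurator-ffdq.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical consumer-side check, not part of kurator-ffdq.
final class ValidationExpectations {

    // Properties named in the text above as expected on an ffdq:Validation.
    static final List<String> EXPECTED = Arrays.asList(
            "ffdq:criterionInContext", "prov:hadPlan", "prov:used");

    // Returns the expected properties that are absent, in declaration order.
    static List<String> missing(Set<String> presentProperties) {
        List<String> result = new ArrayList<>();
        for (String property : EXPECTED) {
            if (!presentProperties.contains(property)) {
                result.add(property);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> present = new HashSet<>(
                Arrays.asList("prov:used", "prov:generated"));
        // Prints: [ffdq:criterionInContext, prov:hadPlan]
        System.out.println(missing(present));
    }
}
```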

DQ Report Concepts

Validation.png

Specification

The specification for the DAY_POSSIBLE_FOR_MONTH_YEAR test (guid can be mapped to Java method DwCEventDQ.isDayPossibleForMonthYear)

   <urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f> a ffdq:Specification ;
       rdfs:label "Check that the value of dwc:day is consistent with the values for dwc:month and dwc:year. Requires valid values for month and year." .

Mechanism

Date validator mechanism (guid can be mapped to DwCEventDQ Java class from the event_date_qc project)

   <urn:uuid:b844059f-87cf-4c31-b4d7-9a52003eef84> a ffdq:Mechanism ;
       rdfs:label "org.filteredpush.qc.date.DwCEventDQ" .
   

Measure, Validation, Amendment

   <#validation> a ffdq:Validation ;
       ffdq:criterionInContext <#contextualized-criterion> ;
       prov:used <#data-resource> ;
       prov:generated <#validation-result> ;
       prov:hadPlan <urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f> ;
       prov:wasAttributedTo <urn:uuid:b844059f-87cf-4c31-b4d7-9a52003eef84> .

Result

ResultStatus for the Validation

   <#status-compliant> a ffdq:ResultStatus ;
       rdfs:label "COMPLIANT" .

The Result of running the validation contains a comment and ResultStatus (COMPLIANT)

   <#validation-result> a ffdq:Result ;
       ffdq:hasStatus <#status-compliant> ;
       rdfs:comment "Provided value for year-month-day 1974-2-12 parses to a valid day." .

DataResource

DataResource for a single occurrence record contains original values from the input file

   <#data-resource> a dwc:Occurrence ;
       dwc:day "12" ;
       dwc:month "2" ;
       dwc:year "1974" .
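
To make the report concepts concrete, the Java sketch below shows one way a day-possible-for-month-and-year check could be written against the three values in the data resource above. It is an illustrative re-implementation using java.time, not the code in org.filteredpush.qc.date.DwCEventDQ, which may behave differently. Only COMPLIANT appears in the examples on this page; the other two status names are assumptions.

```java
import java.time.DateTimeException;
import java.time.YearMonth;

// Hypothetical re-implementation for illustration only; the real logic lives
// in org.filteredpush.qc.date.DwCEventDQ and may behave differently.
final class DayPossibleSketch {

    // COMPLIANT appears in the examples above; the other names are assumptions.
    enum Status { COMPLIANT, NOT_COMPLIANT, PREREQUISITES_NOT_MET }

    // Parses an integer field, returning null for missing or non-numeric input.
    private static Integer parseOrNull(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    static Status check(String year, String month, String day) {
        Integer y = parseOrNull(year);
        Integer m = parseOrNull(month);
        // The specification requires valid values for month and year.
        if (y == null || m == null || m < 1 || m > 12) {
            return Status.PREREQUISITES_NOT_MET;
        }
        YearMonth yearMonth;
        try {
            yearMonth = YearMonth.of(y, m);
        } catch (DateTimeException e) {
            // Year outside the range supported by java.time.
            return Status.PREREQUISITES_NOT_MET;
        }
        Integer d = parseOrNull(day);
        boolean possible = d != null && d >= 1 && d <= yearMonth.lengthOfMonth();
        return possible ? Status.COMPLIANT : Status.NOT_COMPLIANT;
    }

    public static void main(String[] args) {
        // Values from the data resource above.
        System.out.println(check("1974", "2", "12"));
    }
}
```

With the values from the data resource (day 12, month 2, year 1974) the check returns COMPLIANT, matching the ResultStatus in the example result.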

DQ Needs Concepts

Ffdq needs.png

UseCase

Example UseCase for validating internal consistency of dates

   <urn:uuid:dd78b90c-640f-4b9c-bece-564e525a43e0> a ffdq:UseCase ;
       rdfs:label "Check for internal consistency of dates" .

MeasurementPolicy, ValidationPolicy and AmendmentPolicy

The ValidationPolicy ties a UseCase to some Criterion by referencing an instance of ContextualizedCriterion

   <#validation-policy> a ffdq:ValidationPolicy ;
       ffdq:coversUseCase <urn:uuid:dd78b90c-640f-4b9c-bece-564e525a43e0> ;
       ffdq:criterionInContext <#contextualized-criterion> .

Implementation

Implementation describes the relationship between specification and mechanism (can be used to identify Java method and class)

   <#implementation> a ffdq:Implementation ;
       ffdq:hasSpecification <urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f> ;
       ffdq:implementedBy <urn:uuid:b844059f-87cf-4c31-b4d7-9a52003eef84> .

Dimension, Criterion, Enhancement

The Criterion for a validation test

   <#criterion> a ffdq:Criterion ;
       rdfs:label "Check if a value for day is consistent with a provided month and year." .

DQ Solutions Concepts

Ffdq solutions.png

ContextualizedDimension, ContextualizedCriterion and ContextualizedAmendment

ContextualizedCriterion describes Criterion in the context of a resource type (SingleRecord) and information elements (year, month, day)

   <#contextualized-criterion> a ffdq:ContextualizedCriterion ;
       ffdq:hasCriterion <#criterion> ;
       ffdq:hasInformationElement <#year> , <#month> , <#day> ;
       ffdq:hasResourceType <#single-record> .

ResourceType can be SingleRecord or MultiRecord

   <#single-record> a <http://example.com/rt/SingleRecord> .

The InformationElements describe how terms from a controlled vocabulary (Darwin Core) relate to fields acted upon by the test

   <#year>  a ffdq:InformationElement ; ffdq:composedOf dwc:year .
   <#month> a ffdq:InformationElement ; ffdq:composedOf dwc:month .
   <#day>   a ffdq:InformationElement ; ffdq:composedOf dwc:day .

ValidationMethod

ValidationMethod ties ContextualizedCriterion to a Specification for a test

   <#validation-method> a ffdq:ValidationMethod ;
       ffdq:hasContextualizedCriterion <#contextualized-criterion> ;
       ffdq:hasSpecification <urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f> .
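
Taken together, the needs and solutions concepts form a chain from a UseCase to the Mechanism that can run a test: a ValidationPolicy leads to a ContextualizedCriterion, a ValidationMethod leads from there to a Specification, and an Implementation leads to a Mechanism. The Java sketch below walks that chain with three in-memory maps. The class FfdqChainSketch and its field names are hypothetical and not part of kurator-ffdq; the identifiers are copied from the examples on this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory index, not part of kurator-ffdq.
final class FfdqChainSketch {

    // Each map mirrors one linking concept from the Turtle examples.
    final Map<String, String> criterionByUseCase = new HashMap<>();       // ValidationPolicy
    final Map<String, String> specificationByCriterion = new HashMap<>(); // ValidationMethod
    final Map<String, String> mechanismBySpecification = new HashMap<>(); // Implementation

    // Follows UseCase -> ContextualizedCriterion -> Specification -> Mechanism.
    // Returns empty as soon as any link in the chain is missing.
    Optional<String> mechanismFor(String useCase) {
        return Optional.ofNullable(criterionByUseCase.get(useCase))
                .map(specificationByCriterion::get)
                .map(mechanismBySpecification::get);
    }

    // Builds the index for the DAY_POSSIBLE_FOR_MONTH_YEAR example.
    static FfdqChainSketch example() {
        FfdqChainSketch index = new FfdqChainSketch();
        index.criterionByUseCase.put(
                "urn:uuid:dd78b90c-640f-4b9c-bece-564e525a43e0",
                "#contextualized-criterion");
        index.specificationByCriterion.put(
                "#contextualized-criterion",
                "urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f");
        index.mechanismBySpecification.put(
                "urn:uuid:5618f083-d55a-4ac2-92b5-b9fb227b832f",
                "urn:uuid:b844059f-87cf-4c31-b4d7-9a52003eef84");
        return index;
    }

    public static void main(String[] args) {
        System.out.println(example()
                .mechanismFor("urn:uuid:dd78b90c-640f-4b9c-bece-564e525a43e0")
                .orElse("no mechanism found"));
    }
}
```

A real consumer would more likely answer this with a SPARQL query over the RDF; the maps are only meant to show which concept supplies which link.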

FFDQ Annotated Java Classes

   @Provides("urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a")
   
   @Validation(label = "Event Date and Verbatim Consistent",
               description = "Test to see if the eventDate and verbatimEventDate are consistent.")
   
   @Specification("If a dwc:eventDate is not empty and the verbatimEventDate is not empty " +
                  "compare the value of dwc:eventDate with that of dwc:verbatimEventDate, " +
                  "and assert Compliant if the two represent the same date or date range.")
   
   public static EventDQValidation eventDateConsistentWithVerbatim(
           @ActedUpon(value = "dwc:eventDate") String eventDate,
           @ActedUpon(value = "dwc:verbatimEventDate") String verbatimEventDate) {
       
       EventDQValidation result = new EventDQValidation();
       
       // Actor logic here...
       result.setResult(EnumDQValidationResult.COMPLIANT);
       result.addComment("Provided value for eventDate '" + eventDate + "' represents the " +
                         "same range as verbatimEventDate '" + verbatimEventDate + "'.");
       
       return result;
   }
Ffdq annotated method.png
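
Because the annotations carry the specification GUID and the acted-upon terms, a consumer can in principle recover the FFDQ metadata from a class by reflection. The sketch below illustrates the idea with stand-in annotations declared locally. It assumes runtime retention and a single value element; the real annotations in kurator-ffdq may be declared differently, and AnnotationLookupSketch and ExampleActor are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; the annotations below are local stand-ins, not the
// ones shipped with kurator-ffdq.
final class AnnotationLookupSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Provides { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface ActedUpon { String value(); }

    // Simplified actor: the body is a placeholder, not real date logic.
    static final class ExampleActor {
        @Provides("urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a")
        public static boolean eventDateConsistentWithVerbatim(
                @ActedUpon("dwc:eventDate") String eventDate,
                @ActedUpon("dwc:verbatimEventDate") String verbatimEventDate) {
            return eventDate != null && eventDate.equals(verbatimEventDate);
        }
    }

    // Finds the method whose @Provides value matches the specification GUID.
    static Optional<Method> findByGuid(Class<?> actor, String guid) {
        for (Method method : actor.getDeclaredMethods()) {
            Provides provides = method.getAnnotation(Provides.class);
            if (provides != null && provides.value().equals(guid)) {
                return Optional.of(method);
            }
        }
        return Optional.empty();
    }

    // Lists the vocabulary terms named by @ActedUpon, in parameter order.
    static List<String> actedUponTerms(Method method) {
        List<String> terms = new ArrayList<>();
        for (Parameter parameter : method.getParameters()) {
            ActedUpon actedUpon = parameter.getAnnotation(ActedUpon.class);
            if (actedUpon != null) {
                terms.add(actedUpon.value());
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        findByGuid(ExampleActor.class, "urn:uuid:da63f836-1fc6-4e96-a612-fa76678cfd6a")
                .ifPresent(m -> System.out.println(m.getName() + " " + actedUponTerms(m)));
    }
}
```

The terms returned by actedUponTerms correspond to what the InformationElement examples express with ffdq:composedOf.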

See also: FFDQ JSON example