updated schema documentation

dc330160 · Manuel · 5adcbcc4 · dc330160
Commit dc330160 authored Oct 28, 2020 by Manuel

Showing with 53 additions and 25 deletions

schema_information.md documentation/schema_information.md +53 -25

--- a/documentation/schema_information.md
+++ b/documentation/schema_information.md
@@ -4,39 +4,51 @@ The data in Articonf is organized in Use-Cases, Tables, Layers and Nodes. That w

 # Trace

-A trace is an incoming piece of data. It is a JSON encoded object that can be nested arbitrarily deep. This raw datapoint is the input in Articonf.
+A trace is an incoming piece of data: a JSON-encoded object that can be nested arbitrarily deep. This raw data point is the input to Articonf. The document has no fixed format: it only has to be an object ```{}``` (not a list ```[]```) and contain the identifier fields ```ApplicationType``` and ```docType``` described below.

 __Example:__ The running example is a pizza shop that provides its data to Articonf. The shop serves different pizzas and offers a loyalty program, for which it has to record customers at every visit.

 A trace for the shop could look like this: 

-    { 
-        "type": "blockchain-transaction",
-        "content": {
-            "ApplicationType": "pizzashop",
-            "docType": "pizza",
-            "name": "Margherita",
-            "description": "A pizza containing mozzarella, basil and tomato sauce.",
-            "dough": {
-                "type": "wheat",
-                "cheese": False
-            },
-            "sauces": [
-                "tomato",
-                "chilli oil"
-            ]
-        }
-    }
+    {
+		"type": "blockchain-transaction",
+		"ApplicationType": "pizzashop",
+		"docType": "pizza",
+		"id": 1,
+		"name": "Diavolo",
+		"dough": {
+			"type": "wheat",
+			"spinach": false
+		},
+		"toppings": [
+			{
+				"name": "Tomato Sauce",
+				"price": 1.00
+			},
+			{
+				"name": "Cheese",
+				"price": 0.50
+			},
+			{
+				"name": "Chilli Oil",
+				"price": 0.50
+			},
+			{
+				"name": "Peppers",
+				"price": 1.50
+			}
+		]
+	}

 # Use-Cases

-Each trace is assigned to exactly one Use-Case, a label which assigns the piece of data to a Third-Party data source. 
+Each trace is assigned to exactly one Use-Case, a label that ties the piece of data to a third-party data source. Use-Case labels have to be unique in the system: two different Use-Cases must not share the same label.

-__Example:__ To label each datum, the Use-Case "pizzashop" is added to the system, and each trace has to contain this string identifier with the key ```ApplicationType``` (see example in "Traces").
+__Example:__ The field ```ApplicationType``` holds the Use-Case of a trace. In the example above, this is ```pizzashop```.

 # Tables

-Tables describe how the incoming data is built up and how it is flattened for further processing. One table describes one kind of incoming datum. The incoming JSON data can be nested arbitrarily deep, the job of the table is to *flatten* it in a way, that the data is no longer nested. Each trace has to contain the string identifier of the table with the key ```docType``` (see example in "Traces").
+Tables describe how incoming data is translated into flattened objects for further processing. Each table handles one kind of incoming datum, and one Use-Case can have multiple tables. Since the incoming JSON data can be nested arbitrarily deep, the job of the table is to *flatten* it so that the data is no longer nested. In the trace, the field ```docType``` identifies the table to use.

 The flattening of a trace with a table results in an object that is no longer nested and that does not contain any lists. Each table must also contain a mapping with the key ```UniqueID```. This property is later used to uniquely identify a trace. The value obtained in this field is [hashed](https://en.wikipedia.org/wiki/Hash_function) afterwards, resulting in a seemingly random string that identifies the trace.
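The rules for traces and Use-Cases above can be sketched in Python as follows. This is a minimal, hypothetical illustration rather than Articonf's implementation: ```UseCaseRegistry``` and ```route_trace``` are invented names, and rejecting a trace whose Use-Case has not been registered is an assumption.

```python
from __future__ import annotations

import json
from typing import Any


class UseCaseRegistry:
    """Hypothetical registry that enforces unique Use-Case labels."""

    def __init__(self) -> None:
        self._labels: set[str] = set()

    def register(self, label: str) -> None:
        # Two different Use-Cases must never share the same label.
        if label in self._labels:
            raise ValueError(f"Use-Case {label!r} already exists")
        self._labels.add(label)

    def __contains__(self, label: str) -> bool:
        return label in self._labels


def route_trace(raw: str, registry: UseCaseRegistry) -> tuple[str, str, dict[str, Any]]:
    """Parse a raw trace and return (use_case, table, trace). Hypothetical helper."""
    trace = json.loads(raw)
    # The only structural rule: the top level is an object {}, never a list [].
    if not isinstance(trace, dict):
        raise ValueError("a trace must be a JSON object {}, not a list []")
    use_case = trace["ApplicationType"]  # label of the Use-Case
    table = trace["docType"]             # identifier of the table
    # Assumption: traces for unregistered Use-Cases are rejected.
    if use_case not in registry:
        raise ValueError(f"unknown Use-Case {use_case!r}")
    return use_case, table, trace


registry = UseCaseRegistry()
registry.register("pizzashop")
use_case, table, trace = route_trace(
    '{"type": "blockchain-transaction", "ApplicationType": "pizzashop", "docType": "pizza", "id": 1}',
    registry,
)
print(use_case, table)  # pizzashop pizza
```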

@@ -46,9 +58,9 @@ To describe the flattening of nested JSON data, a simple syntax was implemented

 - __//__ ... With the // operator, a path through a nested object can be described. ```"doughType": "dough//type"``` means that the flattened object will contain a property called "doughType" which gets filled with the value ```type``` of the ```dough``` object in the trace.

- __+__ ... The + Operator concatenates values from the original trace. The datatype of a property including the + operator will automatically become ```String```. ```"fullName": "name+description"``` means that the flattened object will contain a property called ```fullName``` which is the concatenation of the values ```name``` and ```description``` in the trace.
+- __+__ ... The ```+``` operator concatenates values from the trace. The datatype of a property that uses the ```+``` operator automatically becomes ```String```. ```"fullName": "name+description"``` means that the flattened object will contain a property called ```fullName``` which is the concatenation of the values ```name``` and ```description``` in the trace.

- __[0]__ ... The index operator is used to access an element in a list. ```"firstSauce": "sauces[0]"``` means that the flattened object will contain a property called ```firstSauce``` which is the first entry of the list ```sauces``` in the trace. The ```0``` here can be replaced with any non-negative integer.
+- __[0]__ ... The index operator accesses an element of a list. ```"firstTopping": "toppings[0]//name"``` means that the flattened object will contain a property called ```firstTopping``` which is the ```name``` property of the first entry of the list ```toppings``` in the trace. The ```0``` can be replaced with any valid Python list index, i.e. any integer within the bounds of the list, including negative indices that count from the end.

 __Example:__ The pizza shop has two kinds of data: pizza orders and customer visits, so the shop adds two tables called ```pizza``` and ```customer```.
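The three operators can be sketched as a small evaluator. The names ```resolve_path```, ```evaluate``` and ```flatten``` are hypothetical, and the sketch assumes that ```+``` binds more loosely than ```//```, that concatenated values are joined without a separator, and that keys themselves contain no ```+```, ```//``` or brackets. Articonf's actual parser may behave differently.

```python
from __future__ import annotations

import re
from typing import Any

# A path segment is a key optionally followed by [index] suffixes, e.g. "toppings[0]".
_SEGMENT = re.compile(r"^([^\[\]]+)((?:\[-?\d+\])*)$")
_INDEX = re.compile(r"\[(-?\d+)\]")


def resolve_path(trace: dict[str, Any], path: str) -> Any:
    """Follow a path such as "dough//type" or "toppings[0]//name" through a trace."""
    value: Any = trace
    for segment in path.split("//"):
        match = _SEGMENT.match(segment.strip())
        if match is None:
            raise ValueError(f"malformed path segment: {segment!r}")
        key, indices = match.groups()
        value = value[key]
        # Apply each [n] in order; negative n counts from the end, as in Python.
        for index in _INDEX.findall(indices):
            value = value[int(index)]
    return value


def evaluate(trace: dict[str, Any], expression: str) -> Any:
    """Evaluate one mapping expression; '+' concatenates the parts into a string."""
    parts = expression.split("+")
    if len(parts) == 1:
        return resolve_path(trace, expression)
    # Assumption: no separator is inserted between concatenated values.
    return "".join(str(resolve_path(trace, part)) for part in parts)


def flatten(trace: dict[str, Any], mappings: dict[str, str]) -> dict[str, Any]:
    """Apply every mapping of a (hypothetical) table definition to a trace."""
    return {name: evaluate(trace, expr) for name, expr in mappings.items()}


# Abridged version of the Diavolo trace.
trace = {
    "name": "Diavolo",
    "dough": {"type": "wheat", "spinach": False},
    "toppings": [{"name": "Tomato Sauce", "price": 1.00}, {"name": "Peppers", "price": 1.50}],
}
print(flatten(trace, {"doughType": "dough//type", "firstTopping": "toppings[0]//name"}))
# {'doughType': 'wheat', 'firstTopping': 'Tomato Sauce'}
```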

@@ -62,7 +74,7 @@ The table could look like this:
            "name": "name",
            "doughType": "dough//type",
            "fullName": "name+description",
-            "firstSauce": "sauces[0]"
+            "firstTopping": "toppings[0]//name"
        }
    }
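The table rules described in "Tables" (a mandatory ```UniqueID``` mapping, a result free of nested objects and lists, and a hashed identifier) could be checked as sketched below. The helper names are hypothetical, and SHA-256 is only a stand-in, since the documentation does not name the hash function.

```python
from __future__ import annotations

import hashlib
from typing import Any


def check_table(mappings: dict[str, str]) -> None:
    """Hypothetical check: reject a table definition without a UniqueID mapping."""
    if "UniqueID" not in mappings:
        raise ValueError("every table needs a 'UniqueID' mapping")


def is_flat(obj: dict[str, Any]) -> bool:
    """True if no value of the flattened object is an object (dict) or a list."""
    return not any(isinstance(v, (dict, list)) for v in obj.values())


def trace_id(flattened: dict[str, Any]) -> str:
    """Hash the UniqueID value. Assumption: SHA-256 over its string form."""
    return hashlib.sha256(str(flattened["UniqueID"]).encode("utf-8")).hexdigest()


# Hypothetical flattened object for the Diavolo trace.
flattened = {"UniqueID": "Diavolo", "name": "Diavolo", "doughType": "wheat"}
check_table({"UniqueID": "name", "name": "name", "doughType": "dough//type"})
print(is_flat(flattened))    # True
print(trace_id(flattened))   # a 64-character hexadecimal digest
```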

@@ -91,4 +103,20 @@ Here, ```UniqueID``` and ```name``` were selected as attributes for the layer, b

 # Nodes

-Finally, a node is a data point which belongs to a layer. It contains only the data selected in the layer description. This is the most fine-grained result of the initial trace.
+Finally, a node is a data point which belongs to a layer. It contains only the data from the trace that its layer selects. This is the most fine-grained result of the initial trace.
+
+
+__Example:__ The node for the initial trace of the pizzashop and the layer ```name_layer``` above would look like this:
+
+    {
+        "use_case": "pizzashop",
+        "layer": "layer_name",
+        "name": "pizza",
+        "mappings": {
+            "UniqueID": "name",
+            "name": "name",
+            "doughType": "dough//type",
+            "fullName": "name+description",
+            "firstTopping": "toppings[0]//name"
+        }
+    }
\ No newline at end of file
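A node could be derived from a flattened object by keeping only the properties its layer selects, as in this sketch. ```make_node``` is a hypothetical name, and the output shape (```use_case```, ```layer```, ```properties```) is illustrative rather than Articonf's actual node format. The selected properties ```UniqueID``` and ```name``` follow the layer example in the documentation.

```python
from __future__ import annotations

from typing import Any


def make_node(use_case: str, layer: str, flattened: dict[str, Any], properties: list[str]) -> dict[str, Any]:
    """Hypothetical node builder: copy only the layer's selected properties."""
    missing = [p for p in properties if p not in flattened]
    if missing:
        raise KeyError(f"layer {layer!r} selects unknown properties: {missing}")
    # Selected values are nested under "properties" so they cannot clash
    # with the bookkeeping keys "use_case" and "layer" (an assumed layout).
    return {
        "use_case": use_case,
        "layer": layer,
        "properties": {p: flattened[p] for p in properties},
    }


flattened = {"UniqueID": "3f0a...", "name": "Diavolo", "doughType": "wheat"}
print(make_node("pizzashop", "name_layer", flattened, ["UniqueID", "name"]))
```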