Articonf microservices are accessed via [HTTPS](https://en.wikipedia.org/wiki/HTTPS), which provides encrypted and secure communication. Such a secure channel requires [certificates](https://en.wikipedia.org/wiki/Public_key_certificate), which allow a client to verify the identity of the server.
The certificates are passed to the Flask application in the main.py file of each microservice. The argument is a tuple with two entries, both given as file paths: the certificate itself (a .crt file) and its key (a .key file). This tuple is passed to app.run(..) via the named argument ssl_context. For further information on passing a certificate to a Flask app, see this guide on [how to enable HTTPS communication in Flask](https://blog.miguelgrinberg.com/post/running-your-flask-application-over-https).
![ssl_context is used to pass the certificate to the Flask app](images/app_run_with_ssl.png)
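A minimal sketch of such an entry point follows. The helper names ```ssl_context_for``` and ```serve``` are hypothetical, and the sketch assumes the certificate files live in the resource folder described under *Resources*; the actual main.py of each microservice may be structured differently.

```python
import os


# Hypothetical helper: the file names match the resources listed under "Resources",
# and /srv/articonf is the default resource folder on the server.
def ssl_context_for(resource_dir="/srv/articonf"):
    """Return the (certificate, key) tuple accepted by Flask's app.run(ssl_context=...)."""
    return (
        os.path.join(resource_dir, "articonf1.crt"),
        os.path.join(resource_dir, "articonf1.key"),
    )


def serve(app, resource_dir="/srv/articonf", port=5000):
    """Start `app` (assumed to be a Flask application) over HTTPS."""
    # Flask forwards ssl_context to Werkzeug's development server.
    app.run(host="0.0.0.0", port=port, ssl_context=ssl_context_for(resource_dir))
```

In main.py this would be called as ```serve(app)``` after the routes are registered.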
## Resources
Articonf uses static resources which, for security reasons, are not stored in the Git repository. These files are located in a dedicated folder where the application looks for them. The following files are needed:

| File | Purpose |
|---|---|
| articonf1.crt | HTTPS certificate |
| articonf1.key | private key belonging to the HTTPS certificate |
| default_users.json | list of default users that already exist in Articonf |
| jwt_secret.txt | secret used to sign the JWT tokens |
| regular_user_credentials.json | credentials of the user used for internal authentication between microservices |
When Articonf runs on the server, the folder ```/srv/articonf``` is used for this purpose. This is also the default location if no other path is configured.
## Running Articonf Locally
To enable local mode, set the environment variable ```ARTICONF_LOCAL``` to ```1```. The local Swagger configuration is then used to serve the routes. In this configuration, __no token validation is performed__: every token is accepted, which makes debugging easier.
To use a resource directory other than ```/srv/articonf```, set the environment variable ```ARTICONF_RESOURCES_PATH``` to the path of the custom resource directory. For example, if the resource folder is called ```resources``` and located in the project root, the variable for the trace-retrieval microservice would contain ```'../../../../resources'``` (note that the path has no trailing slash).
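The two environment variables could be read as in the following sketch. The function names are hypothetical; only the variable names, the value ```1``` and the default path come from this documentation.

```python
import os

DEFAULT_RESOURCES_PATH = "/srv/articonf"


def is_local_mode():
    """Local mode is enabled when ARTICONF_LOCAL is set to "1"."""
    return os.environ.get("ARTICONF_LOCAL") == "1"


def resources_path():
    """Custom resource directory from ARTICONF_RESOURCES_PATH, else the server default."""
    return os.environ.get("ARTICONF_RESOURCES_PATH", DEFAULT_RESOURCES_PATH)


def resource_file(name):
    """Absolute path of a resource file such as 'jwt_secret.txt'."""
    # Relative values like '../../../../resources' are resolved against the
    # current working directory, i.e. the directory the service is started from.
    return os.path.abspath(os.path.join(resources_path(), name))
```

Because relative paths depend on the working directory, the same relative value only works if the service is always started from the same folder.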
In local mode, the Swagger UI web interface can be accessed at [https://localhost:5000/api/ui](https://localhost:5000/api/ui). Chrome will report a security warning because the certificate is issued for ```articonf1.itec.aau.at``` and not for ```localhost```; the connection is still encrypted, so the page can be accessed safely.
## Authentication
![The token can store an arbitrary number of key-value pairs, which are used in the backend](images/token_contents.png)
The token must be placed in the Authorization HTTP header, preceded by the keyword __Bearer__ and a space (```Authorization: Bearer <TOKEN>```). The keyword can be read as “grant access to the bearer of this token”. Before the server executes a request, it decodes the token and checks whether it is still valid. If it is not, ```401 Unauthorized``` is returned; otherwise, the request is processed.
![the token has to be placed in the "Authorization" HTTP header.](images/postman_request_with_token.png)
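Outside of Postman, the header can be set with Python's standard library, as in the sketch below. The helper name is hypothetical; the URL is the tables endpoint listed later in this documentation. Building the request does not send it.

```python
from urllib.request import Request


def authorized_request(url, token, method="GET"):
    """Build a request that carries the JWT in the Authorization header."""
    # The keyword "Bearer" followed by a single space must precede the token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method=method)


req = authorized_request(
    "https://articonf1.itec.aau.at:30420/api/use-cases/pizzashop/tables",
    "<TOKEN>",
)
# urllib.request.urlopen(req) would send it; if the token is rejected,
# urlopen raises urllib.error.HTTPError with code 401.
```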
## JWT Tokens
A token has to have the following properties:
- __Integrity:__ manipulated tokens must be detected to avoid forging of tokens by third parties
- __User Authenticity:__ a token must be assignable to a user in the system
- __Server Authenticity:__ only the server must be able to issue tokens
Articonf uses the [JWT](https://jwt.io/) standard because it provides all of these properties and is implemented on many different platforms, which makes it compatible with other services. JWT allows information to be embedded in the token as claims, which anyone can decode (the information itself is not encrypted). The token is signed with a secret to provide Integrity and Server Authenticity: any change to the payload is detected, because the signature no longer matches and cannot be recomputed without the secret.
## Roles
Currently, there are 2 roles in Articonf: User(u) and Administrator(a). The following matrix defines which role can access which request:
| | no authentication | User(u) | Administrator(a) |
|---|---|---|---|
The first thing a client needs to do is authenticate itself on the server. This is done by passing the username and password of a user to the rest-gateway's endpoint ```POST https://articonf1.itec.aau.at:30401/tokens```. If the credentials are correct, the server returns a JWT token, which can then be used to authenticate requests.
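A sketch of the login request is shown below. The JSON body with ```username``` and ```password``` fields is an assumption, since the expected request format is not documented here; check the gateway's Swagger UI for the actual schema and response format.

```python
import json
from urllib.request import Request

TOKEN_ENDPOINT = "https://articonf1.itec.aau.at:30401/tokens"


def login_request(username, password):
    """Build the POST request that exchanges credentials for a JWT."""
    # Assumption: the rest-gateway expects a JSON body with these two fields.
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return Request(
        TOKEN_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```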
Each microservice keeps track of valid tokens. If it encounters a token that has not been checked, it forwards it to the rest-gateway (```POST https://articonf1.itec.aau.at/api/tokens/<TOKEN>```) which then checks the validity of the token (signature, timestamps) and returns this information to the requester. If the token was valid, the microservice adds it to its list of valid tokens, so future requests with the same token get processed without checking the token again.
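The caching behaviour can be sketched as follows. The class name and the ```check_with_gateway``` callable are hypothetical; in a real service that callable would forward the token to the rest-gateway.

```python
class ValidTokenCache:
    """Per-service cache of tokens already confirmed by the rest-gateway."""

    def __init__(self, check_with_gateway):
        # check_with_gateway(token) -> bool is a hypothetical callable that would
        # POST the token to the rest-gateway's /api/tokens/<TOKEN> endpoint.
        self._check_with_gateway = check_with_gateway
        self._valid_tokens = set()

    def is_valid(self, token):
        if token in self._valid_tokens:
            return True  # seen before: no round trip to the gateway
        if self._check_with_gateway(token):
            self._valid_tokens.add(token)
            return True
        return False  # the caller should answer 401 Unauthorized
```

Invalid tokens are not cached, so they are re-checked every time. In this sketch a cached token is never re-checked, so an expiry after caching would go unnoticed unless the cache also stores the token's expiry claim.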
There are scenarios where microservices must call each other. For this case, a helper class performs the necessary steps automatically. It can be found at ```src/modules/security/token_manager.py```. The class is a singleton: the ```getInstance()``` method gives access to the only instance of ```TokenManager```, whose ```getToken()``` method retrieves a JWT token for internal requests. The user behind this token has regular User permissions.
```TokenManager``` uses the credentials stored in ```regular_user_credentials.json``` to authenticate at the rest-gateway and caches the returned token, so that repeated calls to ```getToken()``` return the same token, which reduces traffic.
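The singleton-plus-cache pattern can be sketched as below. This is not the code in ```token_manager.py```: only ```getInstance()```, ```getToken()``` and the credentials file name come from this documentation, while ```fetch_token```, ```load_credentials``` and the credentials file layout are assumptions.

```python
import json
import threading


class TokenManager:
    """Sketch of a singleton that fetches one JWT and reuses it for internal calls."""

    _instance = None
    _lock = threading.Lock()

    @staticmethod
    def getInstance():
        with TokenManager._lock:
            if TokenManager._instance is None:
                TokenManager._instance = TokenManager()
            return TokenManager._instance

    def __init__(self, credentials_path="regular_user_credentials.json"):
        self._credentials_path = credentials_path
        self._token = None
        # Hypothetical hook: would POST the credentials to the rest-gateway's
        # /tokens endpoint and return the JWT from the response.
        self.fetch_token = None

    def load_credentials(self):
        # Assumption: the file holds {"username": ..., "password": ...}.
        with open(self._credentials_path, encoding="utf-8") as f:
            return json.load(f)

    def getToken(self):
        """Return the cached token, authenticating only on the first call."""
        if self._token is None:
            creds = self.load_credentials()
            self._token = self.fetch_token(creds["username"], creds["password"])
        return self._token
```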
This microservice contains use-case specific information, such as schemas and contexts.
## Schema information
```GET https://articonf1.itec.aau.at:30420/api/use-cases/{use-case}/schema``` returns all schema mappings for the use-case identifier. The mapping is used to flatten nested input from the blockchain.
```GET https://articonf1.itec.aau.at:30420/api/use-cases/{use-case}/tables``` returns all tables for the use-case identifier, each containing the mapping used to flatten nested input from the blockchain.
## Context information
```GET https://articonf1.itec.aau.at:30420/api/use-cases/{use-case}/layers``` returns all layers from the schema, which are used internally for clustering.
# Semantic Linking Microservice
https://articonf1.itec.aau.at:30101/api/ui/
This microservice contains the nodes, i.e. the transactions preprocessed as defined in *Schema information*. Additionally, it splits the raw input into multiple layers.
```GET https://articonf1.itec.aau.at:30101/api/use-cases/{use-case}/nodes``` returns all preprocessed transactions, called nodes, before splitting them into layers.
# Role Stage Discovery Microservice
https://articonf1.itec.aau.at:30103/api/ui
This microservice contains the communities based on clusters and similarities between communities. It additionally contains time slices with subsets of clusters, which's transaction happened in the corresponding time.
The endpoints are currently being refactored, so please refer to the auto-generated Swagger UI documentation at the URL above.
The data in Articonf is organized in Use-Cases, Tables, Layers and Nodes. This ensures a clear separation of schema information and actual application data.
# Trace
A trace is an incoming piece of data: a JSON-encoded object that can be nested arbitrarily deep. This raw data point is the input to Articonf. There are no rules for the format of the document, except that it must be an object ```{}``` and not a list ```[]```.
__Example:__ The running example is a pizza shop that provides its data to Articonf. The shop serves different pizzas and offers a loyalty program, for which it records customers at every visit.
A trace for the shop could look like this:
```json
{
    "type": "blockchain-transaction",
    "ApplicationType": "pizzashop",
    "docType": "pizza",
    "id": 1,
    "name": "Diavolo",
    "dough": {
        "type": "wheat",
        "spinach": false
    },
    "toppings": [
        {
            "name": "Tomato Sauce",
            "price": 1.00
        },
        {
            "name": "Cheese",
            "price": 0.50
        },
        {
            "name": "Chilli Oil",
            "price": 0.50
        },
        {
            "name": "Peppers",
            "price": 1.50
        }
    ]
}
```
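The only format rule for traces (an object, not a list) can be enforced while parsing. The function below is a hypothetical sketch, not Articonf's own validation code.

```python
import json


def parse_trace(raw: str) -> dict:
    """Parse an incoming trace; only a JSON object is accepted, not a list."""
    trace = json.loads(raw)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(trace, dict):
        raise ValueError("a trace must be a JSON object {}, got " + type(trace).__name__)
    return trace
```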
# Use-Cases
Each trace is assigned to exactly one Use-Case, a label that assigns the piece of data to a third-party data source. Use-Case labels must be unique in the system: two different Use-Cases must not share the same label.
__Example:__ The field ```ApplicationType``` serves as the Use-Case of the trace. In the example above, that would be ```pizzashop```.
# Tables
Tables describe how incoming data is translated into flattened objects for further processing. Each table handles one kind of incoming data, and one Use-Case can have multiple tables. Since the incoming JSON data can be nested arbitrarily deep, the job of the table is to *flatten* it so that the data is no longer nested. In the trace, the field ```docType``` identifies the table to use.
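Routing a trace to its table can be sketched as a lookup keyed by Use-Case and ```docType```. The function names and the dictionary keyed by ```(use_case, table_name)``` are assumptions, and the field ```ApplicationType``` is taken from the pizza shop example; other Use-Cases may carry their label in a different field.

```python
def route(trace):
    """Return the (use_case, table) pair a trace belongs to."""
    # In the pizza shop example, ApplicationType names the Use-Case
    # and docType names the table used to flatten the trace.
    try:
        return trace["ApplicationType"], trace["docType"]
    except KeyError as missing:
        raise ValueError(f"trace lacks routing field {missing}") from None


def find_table(trace, tables):
    """Look up the table definition registered for the trace's Use-Case and docType."""
    return tables[route(trace)]
```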
Flattening a trace with a table results in an object that is neither nested nor contains any lists. Each table must also contain a mapping with the key ```UniqueID```, which is later used to uniquely identify a trace. The value obtained for this field is [hashed](https://en.wikipedia.org/wiki/Hash_function) afterwards, resulting in a seemingly random string that identifies the trace.
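The hashing step can be illustrated as follows. SHA-256 over the value's string form is an assumption made for this sketch; the hash function Articonf actually applies is not specified in this documentation.

```python
import hashlib


def hash_unique_id(value) -> str:
    """Turn the flattened UniqueID value into a fixed-length identifier."""
    # Assumption: SHA-256 over the UTF-8 string form of the value.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()
```

Since hashing is deterministic, the same ```UniqueID``` value always yields the same identifier, while different values yield different-looking strings.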
## Flattening Language
To describe the flattening of nested JSON data, a simple syntax was implemented, consisting of the following operators:
- __//__ ... With the // operator, a path through a nested object can be described. ```"doughType": "dough//type"``` means that the flattened object will contain a property called "doughType" which gets filled with the value ```type``` of the ```dough``` object in the trace.
- __+__ ... The + operator concatenates values from the trace. The datatype of a property that uses the ```+``` operator automatically becomes ```String```. ```"fullName": "name+description"``` means that the flattened object will contain a property called ```fullName``` which is the concatenation of the values ```name``` and ```description``` in the trace.
- __[0]__ ... The index operator is used to access an element in a list. ```"firstTopping": "toppings[0]//name"``` means that the flattened object will contain a property called ```firstTopping``` which is the ```name``` property of the first entry of the list ```toppings``` in the trace. The ```0``` here can be replaced with any valid numeric python list index.
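A small interpreter for the three operators might look like the sketch below. It is an illustration, not Articonf's implementation; it assumes that ```+``` is applied before ```//``` and that concatenated parts are joined without a separator.

```python
import re

# "toppings[0]" -> key "toppings", index 0 (negative indices allowed, as in Python)
_INDEXED = re.compile(r"^(?P<key>[^\[\]]+)\[(?P<index>-?\d+)\]$")


def resolve(trace, path):
    """Follow a path like 'toppings[0]//name' through the nested trace."""
    value = trace
    for segment in path.split("//"):
        match = _INDEXED.match(segment)
        if match:
            value = value[match.group("key")][int(match.group("index"))]
        else:
            value = value[segment]
    return value


def evaluate(trace, expression):
    """Evaluate one mapping expression; '+' concatenates into a string."""
    if "+" in expression:
        # Assumption: parts are joined without a separator.
        return "".join(str(resolve(trace, part)) for part in expression.split("+"))
    return resolve(trace, expression)


def flatten(trace, mappings):
    """Build the flat object described by a table's mappings."""
    return {target: evaluate(trace, expression) for target, expression in mappings.items()}
```

Missing fields raise a ```KeyError``` in this sketch; how Articonf handles traces that lack a mapped field is not specified here.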
__Example:__ The pizza shop has two kinds of data: pizza orders and customer visits. The shop therefore adds two tables, called "pizza" and "customer".
The "pizza" table could look like this:
```json
{
    "use_case": "pizzashop",
    "name": "pizza",
    "mappings": {
        "UniqueID": "name",
        "name": "name",
        "doughType": "dough//type",
        "fullName": "name+description",
        "firstTopping": "toppings[0]//name"
    }
}
```
The properties ```use_case``` and ```name``` indicate the Use-Case the table belongs to and the table's name. ```mappings``` defines how the flattened object is constructed from the trace; in this example, the resulting object has five properties, including ```UniqueID```.
# Layers
Once the data has been added to Articonf, the layers take *slices* of the flattened traces. This preprocessing step is needed to perform the clustering later. Each layer holds two lists: ```properties``` and ```cluster_properties```. The former selects the properties of the underlying table that are stored in the layer, while the latter marks a subset of these properties as the ones to *cluster on* later.
__Example:__ For the ```pizza``` table, a layer was defined which looks like this:
```json
{
    "use_case": "pizzashop",
    "table": "pizza",
    "name": "name_layer",
    "properties": [
        "UniqueID",
        "name"
    ],
    "cluster_properties": [
        "name"
    ]
}
```
Here, ```UniqueID``` and ```name``` were selected as attributes for the layer, but only ```name``` is a cluster attribute.
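Slicing a flattened trace with a layer can be sketched as follows. The function names are hypothetical, and the sketch assumes a node holds exactly the layer's ```properties```, with ```cluster_properties``` required to be a subset of them.

```python
def validate_layer(layer):
    """cluster_properties must be a subset of properties."""
    missing = set(layer["cluster_properties"]) - set(layer["properties"])
    if missing:
        raise ValueError(f"cluster properties not in layer: {sorted(missing)}")


def to_node(flat, layer):
    """Keep only the layer's properties from a flattened trace."""
    return {prop: flat[prop] for prop in layer["properties"]}


def cluster_values(node, layer):
    """Values a clustering step would use: the layer's cluster_properties."""
    return [node[prop] for prop in layer["cluster_properties"]]
```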
# Nodes
Finally, a node is a data point belonging to a layer. It contains the data from the trace that is selected by its layer. This is the most fine-grained result of the initial trace.
__Example:__ The node for the initial trace of the pizzashop and the layer ```name_layer``` above would look like this: