danubePrediction
Once a rules-set is defined, you can start using the danubePrediction endpoint of the danube.ai cloud API.
danubePrediction sends your dataset to danube.ai and returns the evaluation results, including a new
scoring for columns and rows and a list of derived percentage matches per row.
Request
The structure of a danubePrediction request is shown below.
body: {
  "query": String,
  "variables": {
    "data": PredictionInputData
  }
}
The query parameter specifies which GraphQL endpoint to call (see example).
The PredictionInputData has the following structure:
type PredictionInputData: {
  "rulesId": String,
  "data": String,
  "searchData": String,
  "initialColumnScores": [ColumnScore],
  "strategy": String,
  "mixFactor": Float,
  "impact": Float
}
- rulesId: The id of a rules-set.
- data: A stringified JSON array holding all data elements (rows) as objects. You can see an example on how to encode your data as a JSON array here.
- searchData: A stringified JSON object with the same structure as a data element.
- initialColumnScores: A list of initial column scores.
- strategy: Defines how data attributes are treated by the algorithm. The following options are available:
  - "exclusive": Rare data attributes tend to obtain more weight.
  - "fair": Overly rare or overly frequent data attributes lose weight.
  - "mixed": Mixes the "exclusive" and "fair" strategies (see mixFactor).
  - "inverse": Inverse behavior of the "fair" strategy.
- mixFactor: The factor used to mix the exclusive and fair strategies (Float between 0 and 1; only used with the "mixed" strategy):
  0 (= exclusive) ----------x---------- 1 (= fair)
- impact: Determines how strongly the initial column values are changed. impact = 1 means one run of the algorithm with small
  changes to the initial values. Higher values of impact mean iterative runs of the algorithm with stronger changes.
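The fields above can be assembled as in the following minimal Python sketch. The helper name build_prediction_body, its argument names and its validation checks are illustrative assumptions, not part of the danube.ai API. The GraphQL query text is passed in by the caller because it is not spelled out in this section, and the sample rows, rules id and scores are made up.

```python
import json

# Strategy names as listed in the field description above.
STRATEGIES = {"exclusive", "fair", "mixed", "inverse"}


def build_prediction_body(query, rules_id, rows, search_row,
                          initial_column_scores, strategy, mix_factor, impact):
    """Hypothetical helper: assemble the request body for danubePrediction."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not 0.0 <= mix_factor <= 1.0:
        raise ValueError("mixFactor must lie between 0 and 1")

    prediction_input = {
        "rulesId": rules_id,
        # data and searchData are sent as JSON strings, not as nested objects.
        "data": json.dumps(rows),
        "searchData": json.dumps(search_row),
        "initialColumnScores": initial_column_scores,
        "strategy": strategy,
        "mixFactor": mix_factor,
        "impact": impact,
    }
    return {"query": query, "variables": {"data": prediction_input}}


# Made-up sample values, for illustration only.
sample_rows = [{"price": 10, "rating": 4.5}, {"price": 20, "rating": 3.9}]
body = build_prediction_body(
    query="<GraphQL query text>",   # placeholder, see the example referenced above
    rules_id="<your rules id>",     # placeholder
    rows=sample_rows,
    search_row={"price": 15, "rating": 4.0},
    initial_column_scores=[{"property": "price", "score": 1.0},
                           {"property": "rating", "score": 1.0}],
    strategy="mixed",
    mix_factor=0.75,
    impact=1,
)
```

The resulting dictionary can be serialized with json.dumps and sent as the request body by whichever HTTP client you use.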
A ColumnScore has the following structure:
type ColumnScore: {
  "property": String,
  "score": Float
}
- property: The name of a property.
- score: The property's score.
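If your scores live in a plain mapping, a list of ColumnScore objects can be derived from it. This is only a sketch: the helper names to_column_scores and from_column_scores and the sample property names are hypothetical.

```python
def to_column_scores(scores):
    """Turn {"property name": score} into a list of ColumnScore dicts."""
    return [{"property": name, "score": float(score)}
            for name, score in scores.items()]


def from_column_scores(column_scores):
    """Inverse: turn a list of ColumnScore dicts back into a mapping."""
    return {cs["property"]: cs["score"] for cs in column_scores}


# Made-up property names and scores, for illustration only.
initial = to_column_scores({"price": 1, "rating": 2.5})
```

The same conversion works in the other direction for a list of ColumnScore objects received from the API.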
Response
A response from the danubePrediction endpoint has the following structure:
{
  "data": {
    "danubePrediction": {
      "newColumnScores": [ColumnScore],
      "rowScores": [Float],
      "rowMatches": [[Float]]
    }
  }
}
- newColumnScores: A list of new column scores.
- rowScores: A list of row scores (same ordering as the data elements in the request), each determined by danube.ai.
  The row scores define the sorting (highest score = best row).
- rowMatches: A list of row matches (same ordering as the data elements in the request), each being an array of
  percentage values, describing how well a property value matches the best data value in this column.
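Because rowScores and rowMatches follow the order of the submitted data elements, the response can be joined back to the original rows by index. The sketch below assumes the response has already been parsed into a Python dict. rank_rows is a hypothetical helper, and the sample rows, scores and match values are made up.

```python
def rank_rows(rows, response):
    """Pair each submitted row with its score and matches, best row first."""
    result = response["data"]["danubePrediction"]
    scores = result["rowScores"]
    matches = result["rowMatches"]
    if not (len(rows) == len(scores) == len(matches)):
        raise ValueError("response does not line up with the submitted rows")

    ranked = [
        {"row": row, "score": score, "matches": match}
        for row, score, match in zip(rows, scores, matches)
    ]
    # Highest score = best row, so sort in descending order.
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return ranked


# Made-up rows and response values, for illustration only.
rows = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
response = {"data": {"danubePrediction": {
    "newColumnScores": [{"property": "name", "score": 1.0}],
    "rowScores": [0.2, 0.9, 0.5],
    "rowMatches": [[0.1], [1.0], [0.4]],
}}}
ranked = rank_rows(rows, response)
best = ranked[0]["row"]
```

Keeping the original row next to its score and matches avoids having to track indices after sorting.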