Creating a machine learning model in Jube can be a convoluted process: a model must be created, the fields to be extracted and the tags must be specified, and data must be loaded via an HTTP endpoint before it is available for training in the embedded Exhaustive machine learning algorithm. These requirements contrast with products that achieve the same result from a single CSV file. It follows that, despite Jube having more advanced capabilities, its adoption may be lower than that of other products. While Jube was not designed as an automated machine learning wizard, there appears to be increasing overlap.
It is proposed that a Model Wizard be created to take a CSV file and parse both its metadata and its data, automatically creating all the configuration elements that are otherwise created manually. The data will be parsed to identify the universe of categorical variables, with these being created as Boolean XPath expressions (a process that is currently typically done outside of Jube).
Task: Ensure JSON Path Expression returns a Boolean value
As categorical data pivoting will be done in Jube, a JSON Path in the Request XPath model configuration must be able to return a Boolean value based on an expression, for example, $.[?(@.=='Politician')].
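The intended Boolean behaviour can be sketched in plain Python. This is not a JSON Path engine and not Jube code: it only illustrates the expected result, namely true when a field in the request payload equals a category value and false otherwise. The function name `category_flag` and the field name `Occupation` are hypothetical.

```python
from typing import Any, Mapping


def category_flag(payload: Mapping[str, Any], field: str, category: str) -> bool:
    """Return True when payload[field] equals category, else False.

    Hypothetical helper: stands in for a JSON Path filter expression
    that is expected to evaluate to a Boolean.
    """
    return payload.get(field) == category


# Hypothetical request payload; the field names are assumptions.
request = {"Occupation": "Politician", "Amount": 120.5}
print(category_flag(request, "Occupation", "Politician"))  # True
print(category_flag(request, "Occupation", "Teacher"))     # False
```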
Task: Create a new page to parse the CSV file
The new page, called Model Wizard and located under the Models menu item, will accept a CSV file as an upload and proceed to parse the headers. For each header, the column's data will be inspected:
- If it is all numeric, the column will be treated as Float for the purpose of model configuration.
- If it contains string data, the column will be treated as String for the purpose of model configuration.
In keeping with the stateless nature of the design, the parse results will be stored in database tables for recall by the user interface. At this stage, the model will not be created.
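The header inspection described above might look like the following minimal sketch, written in Python for illustration only. The function names are hypothetical, and two behaviours are assumptions not stated in the proposal: an empty cell is treated as string data, and a column with no data rows defaults to Float.

```python
import csv
import io


def _is_float(value) -> bool:
    """True when the cell parses as a float (assumption: empty cells do not)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_column_types(csv_text: str) -> dict[str, str]:
    """Map each CSV header to "Float" or "String" by inspecting every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    # Start optimistic: a column stays Float until a non-numeric cell is seen.
    types = {header: "Float" for header in headers}
    for row in reader:
        for header in headers:
            if types[header] == "Float" and not _is_float(row[header]):
                types[header] = "String"
    return types


# Hypothetical sample data.
sample = "Age,Occupation,Fraud\n34,Politician,Yes\n51,Teacher,No\n"
print(infer_column_types(sample))
```

Persisting the result to the database is left out of the sketch, as the table layout is not specified here.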
Task: Allocate Dependent Variable
With the metadata established, the page must accept further configuration parameters, specifically the dependent variable, which will go on to become a Tag value with a corresponding Exhaustive model and Activation Rule.
Task: Create Model
Based on the metadata and configuration, create the model in Jube, comprising:
- Each header will be transposed to a Request XPath configuration element.
- For each String value in a categorical variable, the header will be transposed as an expression (i.e. categorical data pivoting).
- For each String value in the categorical variable specified as the Dependent Variable, a Tag element will be created, and
- An Exhaustive configuration element will be created to target the Tag disposition for machine learning, and
- For good measure, an Activation Rule element will be created targeting the return value from the Exhaustive model, where a value > 0.5 will drive activation. The Activation Rule is not strictly necessary, as the Exhaustive recall values are available in their raw form on recall.
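The list above can be sketched as a function that turns the parsed metadata into a plan of configuration elements. This is a hedged illustration in Python, not Jube's implementation: the dictionary keys, element names and the `build_model_plan` function are all hypothetical. It also assumes that the dependent variable is not itself pivoted into Boolean expressions, and that one Exhaustive element and one Activation Rule are created per Tag.

```python
def build_model_plan(column_types, categories, dependent, threshold=0.5):
    """Build a plan of configuration elements from parsed CSV metadata.

    column_types: header -> "Float" or "String"
    categories:   header -> set of distinct string values seen in the data
    dependent:    header chosen as the Dependent Variable
    """
    plan = {"request_xpath": [], "tags": [], "exhaustive": [], "activation_rules": []}

    for header, kind in column_types.items():
        # Every header becomes a Request XPath element.
        plan["request_xpath"].append({"name": header, "type": kind})
        # Categorical data pivoting: one Boolean element per distinct value.
        # Assumption: the dependent variable is turned into Tags instead.
        if kind == "String" and header != dependent:
            for value in sorted(categories.get(header, ())):
                plan["request_xpath"].append(
                    {"name": f"{header}_{value}", "type": "Boolean",
                     "source": header, "equals": value}
                )

    for value in sorted(categories.get(dependent, ())):
        tag = f"{dependent}_{value}"
        model = f"Exhaustive_{tag}"
        plan["tags"].append({"name": tag})
        plan["exhaustive"].append({"name": model, "target_tag": tag})
        # Activation when the Exhaustive return value exceeds the threshold.
        plan["activation_rules"].append(
            {"name": f"Activate_{tag}", "model": model,
             "operator": ">", "threshold": threshold}
        )
    return plan


# Hypothetical metadata, as might be produced by the parsing step.
plan = build_model_plan(
    {"Age": "Float", "Occupation": "String", "Fraud": "String"},
    {"Occupation": {"Politician", "Teacher"}, "Fraud": {"Yes", "No"}},
    dependent="Fraud",
)
for section, elements in plan.items():
    print(section, len(elements))
```

Keeping the plan as plain data before anything is written would let the page show the user what is about to be created, although the proposal does not require this.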
Task: Load Data from CSV into JSON for storage in the Archive
Transpose the CSV file to a JSON representation and store it in the Archive table, which will make the data available for Exhaustive training.
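A minimal sketch of the transposition, in Python for illustration: each CSV row becomes one JSON document, with Float columns converted to numbers. The function name is hypothetical, the column types are assumed to come from the earlier parsing step, and writing to the Archive table is omitted because its schema is not described here.

```python
import csv
import io
import json


def csv_to_json_records(csv_text: str, column_types: dict[str, str]) -> list[str]:
    """Convert each CSV row into a JSON string, casting Float columns.

    Assumption: columns marked "Float" contain only values float() accepts.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        payload = {
            header: float(value) if column_types.get(header) == "Float" else value
            for header, value in row.items()
        }
        records.append(json.dumps(payload))
    return records


# Hypothetical sample data and column types.
sample = "Age,Occupation,Fraud\n34,Politician,Yes\n51,Teacher,No\n"
types = {"Age": "Float", "Occupation": "String", "Fraud": "String"}
for record in csv_to_json_records(sample, types):
    print(record)
```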
Task: Synchronise Model
Insert data to cause the model to synchronise and thus start Exhaustive training.