Lumiata AI Model Builder provides a guided, step-by-step approach to training a machine learning model without writing a line of code. Model Builder is designed for healthcare-specific use cases, but you can use our hosted Jupyter notebook to build models trained on non-healthcare data.
To get started building your model in the UI, select Build in the Lumiata AI Model Builder tool, then Manage Experiments. From here you can find previous experiments or start new ones:
Each ML problem begins with an experiment. An experiment is a workspace where you can try different configurations of your pipelines. You can use experiments like folders to organize your runs into logical groups.
Once you have created an experiment you can build your first machine learning model; we call this a run.
Within an experiment, you can create multiple runs with different configurations, time slices, features, and parameters to see which generates the best results. A run is a collection of configurations that attempts to predict something (the target variable) from your data as accurately as possible. The end result of a run is a model with performance metrics that tell you how well that particular collection of parameters predicts the target.
To get started, click on New Run in the top right of the screen (inside an experiment). If you have already started several runs, you can manage them and compare performance inside the experiment, as shown below:
Start by naming your run at the top, then follow the prompts in the form to configure your run: select your target variable, configure your backtest time period, and select features.
We have pre-engineered thousands of healthcare features to save you time and boost the performance of your ML healthcare models. Features are data values derived from your data as well as from the enrichments applied to it. Available feature categories include:
- Medical Codes
- Disease Tags
Click the + button next to a feature category to add the entire category, or use the drop-down menu to select a subset of that category's features.
Our features inherit the global start and end date of the model training period you configured in the form. To customize the date ranges or other aspects of your features, add the feature to your model and select the settings wheel.
Edit the start and end date of your feature or configure other aspects of the feature, such as the threshold above which a patient/member is considered "high cost", as in the example below:
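Conceptually, a "high cost" threshold is a labeling rule applied to each member's cost data. The sketch below illustrates the idea in plain Python, using a hypothetical $10,000 cutoff and a hypothetical `label_high_cost` helper (neither is part of the Model Builder UI):

```python
# Illustrative only: label each member as high-cost (1) or not (0)
# against a configurable threshold, analogous to the feature setting
# described above. The $10,000 cutoff is a made-up example value.
THRESHOLD = 10_000

def label_high_cost(annual_costs, threshold=THRESHOLD):
    """Return a 0/1 label per member based on total annual cost."""
    return [1 if cost >= threshold else 0 for cost in annual_costs]

costs = [2_500, 14_200, 9_999, 31_000]
print(label_high_cost(costs))  # -> [0, 1, 0, 1]
```

Raising or lowering the threshold changes which members count as positive examples, and therefore what the model learns to predict.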
Advanced settings include train/validate/test data split, data sampling size, learning algorithm, and hyperparameter selections. These options have been pre-populated for you but can be configured to boost performance.
- Split Data: Gives you control over how much of your data is used for training, validation, and testing.
- Sample Size: Optionally, sample the training data rather than using the full dataset. This option is useful when you are working with a large dataset.
- Learning Algorithm: A learning algorithm is the model that you train on your data to make predictions. Model complexity increases from Logistic Regression, to Random Forest, to Gradient Boosted Decision Tree. We recommend trying all three to achieve the best performance.
- Hyperparameters: Once you have selected the best learning algorithm, you can adjust its associated hyperparameters, which change based on the algorithm you selected. We have pre-populated sensible defaults, and we suggest completing your first run without adjusting them.
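The Split Data and Sample Size settings above correspond to standard data-preparation steps. The following sketch shows the equivalent logic in plain Python, using hypothetical `split_data` and `sample_rows` helpers with a made-up 70/15/15 split; the platform's actual implementation may differ:

```python
import random

def split_data(rows, train=0.7, validate=0.15, seed=42):
    """Shuffle rows, then split into train/validate/test partitions.
    The test fraction is whatever remains after train and validate."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validate)
    return (rows[:n_train],
            rows[n_train:n_train + n_val],
            rows[n_train + n_val:])

def sample_rows(rows, fraction=0.25, seed=0):
    """Draw a random subset of the training rows without replacement,
    as the Sample Size option does for large datasets."""
    rows = list(rows)
    k = int(len(rows) * fraction)
    return random.Random(seed).sample(rows, k)

train_set, val_set, test_set = split_data(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15

subset = sample_rows(train_set, fraction=0.5)
print(len(subset))  # 35
```

Validation data guides choices such as hyperparameter settings, while the test partition is held out entirely so the final performance numbers are unbiased.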
We offer the following learning algorithms:
- Logistic Regression: A classification algorithm used to estimate discrete values, such as 0 or 1, yes or no, or true or false. It predicts the probability of an event occurring by fitting the data to a logistic (logit) function.
- Gradient Boosted Decision Tree (XGBoost): A supervised learning algorithm used for both regression and classification problems. It combines an ensemble of many weak models into a single high-performing model.
- Random Forest: A supervised learning algorithm used for both regression and classification problems. It trains many decision trees and aggregates their outputs to produce a prediction.
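To make the "try all three" recommendation concrete, the sketch below fits the three algorithm families on the same synthetic classification data using scikit-learn stand-ins; the platform's XGBoost is approximated here by scikit-learn's `GradientBoostingClassifier`, and the data, model settings, and AUC metric are illustrative assumptions, not the platform's internals:

```python
# Illustrative comparison of the three learning-algorithm families
# on the same data, using scikit-learn equivalents.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real member data.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosted Trees": GradientBoostingClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    # AUC on held-out data is a typical metric for a 0/1 target.
    scores[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {scores[name]:.3f}")
```

Which family wins depends on the dataset, which is why comparing runs within an experiment is the recommended workflow.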
Once you have completed the new run form, click on the Progress tab to see the pipeline running. Click the Refresh button to follow the pipeline's updates. Depending on the size of the data, this process typically takes 10-15 minutes.
Once the run has completed, you can view the features that are most relevant to your model in the Features tab. You can view the performance of the model in the Performance tab.
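Feature relevance of the kind shown in the Features tab is commonly derived from the trained model itself. As a sketch of the idea only (not the platform's method), tree-based models in scikit-learn expose per-feature importance scores, with hypothetical feature names for illustration:

```python
# Illustrative only: rank features by importance from a tree-based
# model, similar in spirit to the Features tab's relevance view.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [f"feature_{i}" for i in range(10)]  # hypothetical names
X, y = make_classification(n_samples=500, n_features=10, random_state=1)

model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Importances sum to 1.0; higher means the feature mattered more.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```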
Once a run has completed successfully and you are happy with its performance, you can publish the model by clicking the Publish as Model button at the top. A popup window will prompt you to name the model, which will then be registered in the Spectrum Model Catalog.