Welcome to the Model Builder Platform: Your No-Code ML Solution!
This guide will walk you through using our intuitive, drag-and-drop platform to build and deploy powerful Scikit-learn models without writing a single line of code. Let's get started!
Getting Started
1. The Dashboard Page
When you first log in, you'll be greeted by the Dashboard page. This is where you manage your projects and create new models.
2. Creating a New Model
To start building, click the "Add Model" button. A menu will appear. Select "Machine Learning" to initiate a new machine learning model building session. This will take you to the model building canvas.
Building Your Model: The Canvas
The model building canvas is where the magic happens. Here's a breakdown of how to create your model:
The Node Palette (Left Sidebar)
On the left side of the screen, you'll find the Node Palette. This contains a list of pre-built components, from data loading to preprocessing and, of course, machine learning algorithms. Think of these as the building blocks of your model.
Drag and Drop: Building Your Graph
Building your model is as simple as dragging and dropping. Find the node you want from the Node Palette and drag it onto the main canvas area (the "light graph").
Node Properties
Once a node is on the canvas, you can customize its properties. Each node has various adjustable parameters to fine-tune its behavior.
- Accessing Properties: Click the arrow button next to the property you want to modify to expand the options.
Minimizing and Maximizing Nodes
Keep your canvas clean and organized by minimizing nodes.
- Minimizing/Maximizing: Click the top-left corner of a node to toggle between its minimized and maximized states.
Connecting the Nodes: Data Flow
This is where you define the flow of data through your model.
- Creating Connections: Click on the output port of one node. Then, drag the resulting line to the input port of another node. This establishes a connection between the two, passing data from the output of the first to the input of the second.
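To see what a connected graph means in code terms: a sketch of how a dataset node, a preprocessing node, and an algorithm node chained on the canvas correspond to a scikit-learn Pipeline. The exact code the platform generates may differ; this is only an illustration of the data-flow idea.

```python
# Illustrative sketch: a canvas graph of
#   iris dataset -> StandardScaler -> SVC
# corresponds to chaining the components in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each canvas connection passes one node's output to the next node's input.
pipe = Pipeline([
    ("scale", StandardScaler()),  # preprocessing node
    ("model", SVC()),             # algorithm node
])
pipe.fit(X, y)
print(pipe.score(X, y))
```

Each connection you draw plays the role of one link in this chain: the dataset's output feeds the scaler, and the scaler's output feeds the classifier.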
Saving Your Progress
Don't lose your work!
- Saving: Click the "Save" button to save your model.
Generating Code: From Visual to Functional
Once you've built and connected your model, it's time to generate the Scikit-learn code. Generating the code produces a Jupyter Notebook (.ipynb) file that downloads to your computer. You can run it in two ways:
- Local Execution: Open the downloaded .ipynb file using Jupyter Notebook or JupyterLab installed on your computer. Follow the instructions within the notebook to run your model.
- Cloud Execution (Google Colab): Upload the .ipynb file to Google Colab (colab.research.google.com) and run it there. This is a great option if you don't have Python or the necessary libraries installed locally.
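For orientation, here is the kind of workflow the generated notebook contains. The actual cells depend on the graph you built; this minimal sketch assumes a simple classifier graph on a sample dataset.

```python
# Illustrative only: the generated notebook's exact contents depend on your graph.
# A minimal classification workflow of the kind the notebook produces:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```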
Connection Logic: Ensuring a Valid Model
To ensure your model functions correctly, the following connection rules apply:
- Single Sample Dataset: You'll start with a default sample dataset. You can adjust its properties to change which dataset you are using.
Algorithm Type Compatibility:
A classifier dataset will only connect to classifier algorithms.
A regression dataset will only connect to regression algorithms.
- Single Preprocessing Node: You can only add one preprocessing node to the original dataset.
- Algorithm Output Restriction: The output of an algorithm node cannot be connected to another algorithm node.
Dataset Section: Selecting Your Data Source
Our platform offers two ways to provide data to your model: Sample Datasets and CSV Datasets.
Sample Datasets:
We provide several built-in sample datasets to get you started:
Classification Datasets:
- iris: A classic dataset for multi-class classification, containing measurements of iris flower features. Scikit-learn Docs
- breast_cancer: A dataset for binary classification, predicting whether a tumor is benign or malignant based on various features. Scikit-learn Docs
- wine: A dataset for multi-class classification, classifying different types of wine based on their chemical properties. Scikit-learn Docs
- digits: A dataset for multi-class classification, containing images of handwritten digits (0-9). Scikit-learn Docs
Regression Datasets:
- diabetes: A dataset for regression, predicting a quantitative measure of disease progression one year after baseline, based on various features. Scikit-learn Docs
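All five sample datasets are bundled with scikit-learn, so the generated code can load them directly. A quick sketch of the corresponding loaders and their sizes:

```python
# The built-in sample datasets correspond to scikit-learn's bundled loaders.
from sklearn.datasets import (
    load_iris, load_breast_cancer, load_wine, load_digits, load_diabetes
)

for name, loader in [("iris", load_iris), ("breast_cancer", load_breast_cancer),
                     ("wine", load_wine), ("digits", load_digits),
                     ("diabetes", load_diabetes)]:
    X, y = loader(return_X_y=True)
    print(f"{name}: {X.shape[0]} samples, {X.shape[1]} features")
```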
CSV Dataset:
You can use the "CSV Dataset" node to load your own data. Set the node's path property to a CSV file, and its contents will be read and used as input for your model.
Important Considerations (Warning):
- Dataset Dependency: When using a custom CSV dataset, the model's behavior becomes highly dependent on the dataset's structure, data types, and content.
- Input type: None (it does not take any input).
- Output type: Depends on the dataset selected (classification or regression), as described in the Sample Datasets section.
- Runtime Errors: Using custom data may lead to runtime errors in the generated code. This could be due to various factors:
- Missing Values: Your dataset may contain missing values that the model is not prepared to handle.
- Incorrect Data Types: Columns may have unexpected data types (e.g., strings instead of numbers) that cause errors during processing or training.
- Feature Mismatch: The dataset may not have the features expected by the selected preprocessing steps or algorithms.
- Encoding Issues: Problems might arise from the file's character encoding when importing it.
- Troubleshooting is Required: You are responsible for resolving any runtime errors that occur due to the custom dataset. This may involve modifying the code within the Jupyter Notebook or adjusting the data within the CSV file. Make sure the CSV file is in the correct format, with the correct column delimiter and encoding.
- Data Cleaning and Preprocessing: Ensure your CSV dataset is clean, properly formatted, and preprocessed as needed before using it.
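A simple pre-flight check can catch missing values and non-numeric entries before you point the CSV Dataset node at a file. This sketch uses only the Python standard library; the inline CSV stands in for your own file, which you would open by path instead.

```python
# Hypothetical pre-flight check for a custom CSV before using it on the
# CSV Dataset node. The inline data stands in for open("your_file.csv").
import csv
import io

raw = io.StringIO("age,income,label\n25,50000,0\n31,,1\nforty,72000,0\n")
rows = list(csv.DictReader(raw))

problems = []
for line_no, row in enumerate(rows, start=2):  # line 1 is the header
    for col, value in row.items():
        if value == "":
            problems.append((line_no, col, "missing value"))
        else:
            try:
                float(value)
            except ValueError:
                problems.append((line_no, col, "non-numeric value"))

print(problems)  # each entry: (line number, column, issue)
```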
Algorithm Nodes: Your ML Toolbox
Here's a brief overview of the available algorithm nodes. Click the links for more detailed information on each algorithm.
Support Vector Machines Algorithms:
- Support Vector Classifier (SVC): Finds the optimal hyperplane that separates data points of different classes with the largest margin. Effective in high-dimensional spaces.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Linear Support Vector Classifier (LinearSVC): Similar to SVC but uses a linear kernel. More efficient for large datasets.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Nu-Support Vector Classifier (NuSVC): Similar to SVC but uses a different parameterization to control the number of support vectors.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Epsilon-Support Vector Regression (SVR): Similar to SVC but for regression tasks. Uses a margin of tolerance (epsilon) around the predicted values.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Nu-Support Vector Regression (NuSVR): Similar to SVR but uses a different parameterization to control the number of support vectors.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Linear Support Vector Regression (LinearSVR): Similar to SVR but uses a linear kernel. More efficient for large datasets.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Unsupervised Outlier Detection SVM: Uses SVM principles to identify data points that deviate significantly from the rest of the dataset.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
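As a rough feel for how these SVM nodes behave in generated code, here is a sketch comparing SVC and LinearSVC on the breast_cancer sample dataset (scaling first, since SVMs are sensitive to feature scale):

```python
# Sketch: comparing two of the SVM nodes on the breast_cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = load_breast_cancer(return_X_y=True)

# StandardScaler matters here: SVMs are sensitive to feature scale.
svc_score = make_pipeline(StandardScaler(), SVC()).fit(X, y).score(X, y)
linear_score = make_pipeline(StandardScaler(), LinearSVC()).fit(X, y).score(X, y)
print(svc_score, linear_score)
```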
Decision Tree Algorithms:
- DecisionTreeClassifier: A decision tree classifier capable of performing multi-class classification by learning simple decision rules inferred from the data features.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- ExtraTreeClassifier: An extremely randomized tree classifier that builds decision trees from the training set where randomness is introduced in the selection of the split.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- DecisionTreeRegressor: A decision tree regressor that predicts a target value by learning decision rules from features.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- ExtraTreeRegressor: An extremely randomized tree regressor that, like ExtraTreeClassifier, introduces randomness into the choice of split when learning decision rules from features.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
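A minimal sketch of a decision-tree node in use: fitting DecisionTreeClassifier on the wine sample dataset, with max_depth as an example of the adjustable node properties mentioned earlier.

```python
# Sketch: a DecisionTreeClassifier node fit on the wine dataset.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth is one of the adjustable properties; shallow trees resist overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(test_acc)
```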
Discriminant Analysis Algorithms:
- LinearDiscriminantAnalysis: A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes' rule.
- Input type: Any
- Output type: Any
- Scikit-learn Docs
Clustering Algorithm:
- KMeans: A clustering algorithm that partitions data into K distinct clusters based on feature similarity.
- Input type: regressor data
- Output type: regressor data
- Scikit-learn Docs
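To illustrate what the KMeans node does: it assigns each sample to one of K clusters by feature similarity, without looking at the dataset's target labels. A minimal sketch:

```python
# Sketch: the KMeans node partitions data into K clusters; cluster labels
# come from feature similarity, not from the dataset's targets.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(sorted(set(km.labels_)))  # the three cluster ids
```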
Dimensionality Reduction Algorithms:
- Principal Component Analysis (PCA): A linear dimensionality reduction technique that projects data to a lower-dimensional space by maximizing variance.
- Input type: Any
- Output type: Any
- Scikit-learn Docs
- Fast Independent Component Analysis (FastICA): A computational method for separating a multivariate signal into additive, independent components.
- Input type: Any
- Output type: Any
- Scikit-learn Docs
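A quick sketch of what a dimensionality-reduction node produces: PCA projecting the 64-pixel digits features down to 2 components.

```python
# Sketch: a PCA node reducing the 64-feature digits data to 2 components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X.shape, "->", X_2d.shape)
```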
Ensemble Algorithms:
- AdaBoost Classifier: An adaptive boosting algorithm that sequentially trains weak learners (e.g., decision stumps) and combines their predictions, giving more weight to misclassified instances.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- AdaBoost Regressor: An adaptive boosting algorithm for regression.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Bagging Classifier: Creates an ensemble of classifiers by training multiple instances of a base classifier on random subsets of the training data. Reduces variance and improves stability.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Bagging Regressor: An ensemble method for regression, using multiple instances of a base regressor.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Extra-Trees Classifier: An ensemble method that builds multiple decision trees from subsets of the training data and features. More randomization in the splitting of nodes makes it robust against overfitting.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Extra-Trees Regressor: Similar to its classification counterpart, but for regression tasks.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Gradient Boosting Classifier: Builds an ensemble of decision trees in a stage-wise fashion, where each tree corrects the errors of the previous trees. Known for its high accuracy.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Gradient Boosting Regressor: Builds an ensemble of decision trees in a stage-wise fashion for regression.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Random Forest Classifier: Another popular ensemble method that builds multiple decision trees. It combines the predictions of multiple decision trees to improve accuracy and reduce overfitting.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
- Random Forest Regressor: An ensemble method for regression using multiple decision trees.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Histogram Gradient Boosting Regressor: A fast and efficient gradient boosting method for regression that uses histograms to approximate the data distribution.
- Input type: regression data
- Output type: regression data
- Scikit-learn Docs
- Histogram Gradient Boosting Classifier: A fast and efficient gradient boosting method that uses histograms to approximate the data distribution. Often provides excellent performance.
- Input type: classifier data
- Output type: classification data
- Scikit-learn Docs
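As a representative of the ensemble nodes above, a sketch of a Random Forest Classifier evaluated with cross-validation on the breast_cancer sample dataset; n_estimators is an example of the node's adjustable properties.

```python
# Sketch: a Random Forest Classifier node on the breast_cancer dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# n_estimators (number of trees) is one of the adjustable node properties.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(forest, X, y, cv=5)
print(scores.mean())
```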
Pipeline Utilities Algorithms:
- FeatureUnion: Concatenates results of multiple transformer objects. This is useful to combine several feature extraction mechanisms into a single transformer.
- Input type: Any
- Output type: Any
- Scikit-learn Docs
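To make the FeatureUnion node concrete: it runs several transformers on the same input and places their outputs side by side. A sketch combining 2 PCA components with the 4 standardized original features of iris:

```python
# Sketch: FeatureUnion concatenating two transformers' outputs column-wise.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

union = FeatureUnion([
    ("pca", PCA(n_components=2)),    # 2 projected features
    ("scaled", StandardScaler()),    # 4 standardized original features
])
combined = union.fit_transform(X)
print(combined.shape)  # 2 + 4 = 6 columns per sample
```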
Node Inputs:
- Node Input Float: A node that outputs a float number. It does not take any input.
- Input type: None
- Output type: float number
- Node Input Int: A node that outputs an integer number. It does not take any input.
- Input type: None
- Output type: int number
Important Note: Ensure that the input and output types of connected nodes match. If they do not, the connection will be disconnected automatically. For detailed parameters and usage, please refer to the provided Scikit-learn documentation links.
Afterword: The AI application Behind This Article
