Mian
Random Forest Classifier


Last updated 3 years ago


Used For

  • Assess the performance of a random forest classifier on the OTU data in predicting a categorical variable

Machine learning models tend to work best on datasets with a large number of samples.

Feature Selection Parameters

Taxonomic Level

The taxonomic level to aggregate the OTUs at. The OTUs will be grouped together (by summing the OTU values) at the selected taxonomic level before the analysis is applied.
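As a rough illustration of the aggregation described above, the sketch below sums OTU counts that share a taxon at the chosen level. The data layout, the `taxonomy` mapping, and the function name `aggregate_otus` are hypothetical and are not taken from Mian itself.

```python
from collections import defaultdict

# Hypothetical taxonomy: OTU id -> (phylum, genus). Not Mian's file format.
taxonomy = {
    "OTU1": ("Firmicutes", "Lactobacillus"),
    "OTU2": ("Firmicutes", "Lactobacillus"),
    "OTU3": ("Firmicutes", "Clostridium"),
    "OTU4": ("Bacteroidetes", "Bacteroides"),
}
LEVELS = {"phylum": 0, "genus": 1}

# Hypothetical OTU table: sample id -> {OTU id: count}.
otu_table = {
    "S1": {"OTU1": 10, "OTU2": 5, "OTU3": 2, "OTU4": 7},
    "S2": {"OTU1": 0, "OTU2": 3, "OTU3": 8, "OTU4": 1},
}

def aggregate_otus(table, taxonomy, level):
    """Sum OTU counts per sample for all OTUs sharing a taxon at `level`."""
    idx = LEVELS[level]
    result = {}
    for sample, counts in table.items():
        grouped = defaultdict(int)
        for otu, count in counts.items():
            grouped[taxonomy[otu][idx]] += count
        result[sample] = dict(grouped)
    return result

genus_table = aggregate_otus(otu_table, taxonomy, "genus")
phylum_table = aggregate_otus(otu_table, taxonomy, "phylum")
```

Aggregating at a higher level (for example phylum instead of genus) gives the classifier fewer, coarser features to work with.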

Categorical Variable

Create comparative sample groups based on categorical variables uploaded in the metadata file.

Optionally create a categorical variable from a quantitative variable by using the Quantile Range feature on the Projects home page.
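One general way to turn a quantitative variable into a categorical one is to bin it by quantile, as in the sketch below. This only illustrates the idea; Mian's Quantile Range feature may define its ranges differently, and the function name `quantile_labels` and the example values are assumptions.

```python
import statistics

def quantile_labels(values, n_groups=4):
    """Assign each value a label Q1..Qn based on the quantile range it falls in."""
    # statistics.quantiles returns the n_groups - 1 cut points.
    cuts = statistics.quantiles(values, n=n_groups, method="inclusive")
    labels = []
    for v in values:
        # Count how many cut points lie strictly below the value.
        group = sum(v > c for c in cuts)
        labels.append(f"Q{group + 1}")
    return labels

# Hypothetical quantitative metadata column (e.g. age in years).
ages = [21, 25, 30, 34, 41, 47, 52, 60]
age_groups = quantile_labels(ages, n_groups=4)
```

The resulting labels can then serve as the categorical variable that the classifier predicts.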

Evaluation Method

Train the model either by splitting the data into a training set and a test set or by cross-validating over a specified number of folds.
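The cross-validation option can be pictured with the minimal sketch below, which deals shuffled sample indices into k folds and uses each fold once for validation. The function name `k_fold_indices` and the `seed` argument are hypothetical; this is not Mian's implementation.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_samples=10, k=5))
```

Every sample is used for validation exactly once, which makes better use of a small dataset than a single train/test split.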

Cross-Validation Folds

Freeze Training Set Between Changes

If set to yes, ensures that the same samples are used as the training set every time the model is retrained. This is useful for keeping the test set untouched.

Training Proportion

Define the proportion of the data that should be randomly picked to form a training dataset.
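The sketch below shows one way a random split with a given training proportion can be kept identical between retrains, namely by fixing the random seed. Whether Mian freezes its training set this way is an assumption; `split_samples` and its arguments are hypothetical names.

```python
import random

def split_samples(sample_ids, training_proportion, seed=None):
    """Randomly split sample ids into training and test lists.

    Passing a fixed `seed` reproduces the same split on every call,
    which mimics keeping the training set frozen between retrains.
    """
    ids = sorted(sample_ids)          # fixed starting order
    random.Random(seed).shuffle(ids)  # seed=None gives a fresh shuffle
    n_train = round(len(ids) * training_proportion)
    return ids[:n_train], ids[n_train:]

samples = [f"S{i}" for i in range(1, 11)]
train_a, test_a = split_samples(samples, 0.7, seed=42)
train_b, test_b = split_samples(samples, 0.7, seed=42)
```

With the same seed, both calls return the same training and test samples, so changes in the score reflect parameter changes rather than a different split.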

Number of Trees

Max Tree Depth

Interpreting Your Results

  • Tune your model for better performance by looking only at the validation AUC. Tuning refers to changing the configurable parameters to try to achieve a better performance for your dataset. It is important to not tune against the test AUC to ensure you don't overfit your model to the test set.

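  • The AUC tells you the probability that a randomly chosen positive sample will be given a higher predicted score for the positive class than a randomly chosen negative sample. The AUC is shown in a "one-vs-all" format: each category is treated in turn as the positive class, with all other categories combined as the negative class.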
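The probabilistic reading of the AUC can be checked directly by counting correctly ranked (positive, negative) pairs, repeated once per class for the one-vs-all view. The sketch below does this on made-up scores; the function names, class labels, and numbers are illustrative assumptions, not Mian output.

```python
def auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_all_auc(class_scores, labels):
    """Compute one AUC per class, treating that class as positive."""
    return {
        cls: auc(scores, [label == cls for label in labels])
        for cls, scores in class_scores.items()
    }

# Hypothetical labels and predicted class scores for six samples.
labels = ["gut", "gut", "oral", "oral", "skin", "skin"]
class_scores = {
    "gut":  [0.8, 0.6, 0.3, 0.1, 0.2, 0.7],
    "oral": [0.1, 0.3, 0.6, 0.8, 0.2, 0.1],
    "skin": [0.1, 0.1, 0.1, 0.1, 0.6, 0.2],
}
aucs = one_vs_all_auc(class_scores, labels)
```

Here the "gut" AUC is 0.875 because one of the eight gut/non-gut pairs is ranked the wrong way round, while "oral" and "skin" are separated perfectly.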

Interactive Elements

  • Hover over the ROC curve generated for the test data

Additional Features

  • Save Snapshot: Save the results to the experiment notebook

  • Download: Downloads the results as a CSV file

  • Share: Creates a shareable link that allows you to share the results with others

Cross-Validation Folds: Specify the number of folds used in k-fold cross-validation.

Number of Trees: The maximum number of trees to generate for the random forest.

Max Tree Depth: The maximum depth of each tree in the random forest. If left empty, each tree is expanded to its fullest extent.

Assess the predictive performance of your model using the test AUC (area under the ROC curve shown). Note: Whenever possible, it is still recommended to validate a trained model against an independent dataset (one that is collected outside of your study).

The training error is also available as the out-of-bag training error.
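For background on the out-of-bag training error: in a standard random forest, each tree is trained on a bootstrap sample (drawn with replacement), and the samples a tree never saw are "out-of-bag" for that tree and can be used to score it. The sketch below only shows how those out-of-bag sets arise and does not fit any trees; the function name and sizes are hypothetical, and Mian's internals may differ.

```python
import random

def bootstrap_with_oob(n_samples, n_trees, seed=0):
    """For each tree, draw a bootstrap sample and record the out-of-bag indices."""
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        # Sample n_samples indices with replacement (the "bag" for this tree).
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        # Samples never drawn are out-of-bag for this tree.
        out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
        plans.append((in_bag, out_of_bag))
    return plans

plans = bootstrap_with_oob(n_samples=100, n_trees=50)
mean_oob_fraction = sum(len(oob) for _, oob in plans) / (50 * 100)
```

On average roughly a third of the samples are out-of-bag for any given tree, which is why the out-of-bag error can act as a built-in estimate without a separate validation set.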