Tabular Parameters
--batch-size BATCH_SIZE
Training batch size to use
--seed SEED Random seed for reproducibility
--target-columns TARGET_COLUMNS
Specify the names of the target or label columns, separated by commas if there are several. These columns are what the model will
predict. Required for defining the output of the model.
--categorical-columns CATEGORICAL_COLUMNS
List the names of columns that contain categorical data, useful for models that need explicit handling of such data.
Categorical data is typically processed differently from numerical data, such as through encoding. If not specified, the
model will infer the data type.
--numerical-columns NUMERICAL_COLUMNS
Identify columns that contain numerical data. Proper specification helps in applying appropriate scaling and normalization
techniques, which can significantly impact model performance. If not specified, the model will infer the data type.
--id-column ID_COLUMN
Specify the column name that uniquely identifies each row in the dataset. This is critical for tracking samples through the
model pipeline and is often excluded from model training. Required field.
--task {classification,regression}
Define the type of machine learning task: 'classification' or 'regression'. This parameter determines the model's
architecture and the loss function to use. Required to properly configure the model.
--num-trials NUM_TRIALS
Set the number of trials for hyperparameter tuning or model experimentation. More trials can lead to better model
configurations but require more computational resources. Default is 100 trials.
--time-limit TIME_LIMIT
Impose a time limit (in seconds) for training or searching for the best model configuration. This helps manage resource
allocation and ensures the process does not exceed available computational budgets. The default is 3600 seconds (1 hour).
--categorical-imputer {most_frequent,None}
Select the strategy for imputing missing values in categorical columns. Options are 'most_frequent' and
'None'. Correct imputation can prevent biases and improve model accuracy.
--numerical-imputer {mean,median,None}
Choose the imputation strategy for missing values in numerical columns. Options are 'mean', 'median', and 'None'.
Accurate imputation is vital for maintaining the integrity of numerical data.
--numeric-scaler {standard,minmax,normal,robust}
Determine the type of scaling to apply to numerical data. Examples include 'standard' (zero mean and unit variance) and
'minmax' (scaled to a given range). Scaling is essential for many algorithms to perform optimally.
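
To make the imputer and scaler options above more concrete, here is a minimal pure-Python sketch of what 'mean', 'median', 'most_frequent', 'standard', and 'minmax' conventionally compute. It is an illustration under assumptions, not the tool's actual implementation: the function names `impute_numerical`, `impute_categorical`, and `scale` are hypothetical, missing values are assumed to be represented as `None`, 'minmax' is assumed to target the range [0, 1], and the 'normal' and 'robust' scalers are left out because the help text does not define them.

```python
import statistics
from collections import Counter


def impute_numerical(values, strategy):
    """Fill None entries with the mean or median of the observed values.

    strategy=None leaves the column untouched (hypothetical helper).
    """
    if strategy is None:
        return list(values)
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown numerical strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


def impute_categorical(values, strategy):
    """Fill None entries with the most frequent observed category."""
    if strategy is None:
        return list(values)
    if strategy != "most_frequent":
        raise ValueError(f"unknown categorical strategy: {strategy!r}")
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def scale(values, method):
    """Scale a fully observed numerical column.

    'standard' -> zero mean, unit (population) variance
    'minmax'   -> linearly mapped onto [0, 1] (assumed target range)
    """
    if method == "standard":
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values) or 1.0  # guard against constant columns
        return [(v - mu) / sigma for v in values]
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # guard against constant columns
        return [(v - lo) / span for v in values]
    raise ValueError(f"unsupported scaler in this sketch: {method!r}")


# Usage: impute first, then scale, since scaling needs complete data.
ages = [20.0, None, 40.0, 60.0]
ages_filled = impute_numerical(ages, "mean")   # gap filled with the observed mean
ages_scaled = scale(ages_filled, "minmax")     # smallest value -> 0.0, largest -> 1.0

colors = ["red", None, "blue", "red"]
colors_filled = impute_categorical(colors, "most_frequent")
```

Imputation runs before scaling in this sketch because the scaling statistics (mean, minimum, maximum) are only well defined once every cell holds a value.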