baseline_model(indim=7,hidden_nodes=[8,8],outdim=9)
Model constructor, required by the scikit-learn wrapper for Keras.
Parameters
----------
indim : int, optional
Number of features of dataset and dimension of input layer. The default is 7.
hidden_nodes : list, optional
Number of nodes in each hidden layer. The default is [8,8].
outdim : int, optional
Number of classes and dimension of output layer. The default is 9.
Returns
-------
model : keras.engine.sequential.Sequential
Sequential NN object to be used inside the KerasClassifier wrapper.
Function to load pretrained NN.
Parameters
----------
modpath : String
Path (local or URL) of the model in joblib format.
Returns
-------
keras.wrappers.scikit_learn.KerasClassifier
Keras wrapper that exposes the Keras classifier through the scikit-learn API.
or
xgboost.core.Booster
The XGBoost model object, which contains low-level routines for training, prediction and evaluation.
Function to load data from disk or from a URL.
Parameters
----------
datapath : String
path (local or URL) of data in csv format.
Returns
-------
pandas.DataFrame
Dataframe containing data used for training and/or inference.
Feature scaling: rescaling via min-max normalization.
Parameters
----------
datapath : String
Path (local or URL) of the data in csv format.
cor : Boolean, optional
Set to True to plot the correlation matrix. The default is False.
Returns
-------
pandas.DataFrame
Dataframe rescaled using min-max normalization.
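Min-max normalization maps each feature x to (x - min) / (max - min), so every column ends up in the range [0, 1]. The sketch below is one possible way to do this with pandas, not necessarily how the function above is written; the helper name `min_max_scale`, the column names and the handling of constant columns are assumptions made for illustration.

```python
import pandas as pd


def min_max_scale(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to [0, 1] via (x - min) / (max - min)."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Assumption: a constant column has zero range, so divide by 1
    # instead and map it to 0 rather than NaN.
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range


# Hypothetical feature columns, for illustration only.
data = pd.DataFrame({"f1": [1.0, 3.0, 5.0], "f2": [10.0, 10.0, 10.0]})
scaled = min_max_scale(data)
print(scaled)
```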
Function to compute classification of a dataset using a pretrained NN.
Parameters
----------
datapath : String
Path (local or URL) of the data in csv format.
modelpath : String
Path (local or URL) of the model in joblib format.
performance : Boolean, optional
Selects the return mode: False returns only the predictions; True returns the predictions and, if provided, the true labels (for evaluating performance). The default is False.
NSamples : int, optional
Number of dataset entries to use. If NSamples == 0 or NSamples exceeds the dataset size, the whole dataset is used. The default is 0.
Returns
-------
pandas.DataFrame
DataFrame containing the model's inference for each entry of the input data.
list of pandas.DataFrame
List of two DataFrames: the first contains the inferences, the second contains the true labels for validation.
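The NSamples rule described above can be sketched as follows. The helper name `select_entries` is hypothetical, and taking the first NSamples rows (rather than a random subset) is an assumption, since the docstring does not say which entries are kept.

```python
import pandas as pd


def select_entries(data: pd.DataFrame, NSamples: int = 0) -> pd.DataFrame:
    """Return the entries of `data` selected by the NSamples rule."""
    # NSamples == 0, or a value larger than the dataset, means "use everything".
    if NSamples == 0 or NSamples > len(data):
        return data
    # Assumption: keep the first NSamples rows.
    return data.iloc[:NSamples]


data = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
print(len(select_entries(data)))      # whole dataset
print(len(select_entries(data, 2)))   # first two rows
print(len(select_entries(data, 10)))  # larger than the dataset: whole dataset
```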
Function to perform a simple comparison between the predictions and the known labels of a test sample.
Parameters
----------
modelpath : String
Path (local or URL) of the model in joblib format.
datatest : String
Path (local or URL) of the data in csv format.
Returns
-------
Float
Fraction of correct inferences made by the model.
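The returned fraction is the number of entries whose predicted label equals the known label, divided by the total number of entries. A minimal NumPy sketch of that computation is given below; the helper name `fraction_correct` and the example labels are illustrative and not taken from the module.

```python
import numpy as np


def fraction_correct(predicted, true) -> float:
    """Fraction of entries where the predicted label matches the true one."""
    predicted = np.asarray(predicted)
    true = np.asarray(true)
    if predicted.shape != true.shape:
        raise ValueError("predictions and labels must have the same shape")
    # Element-wise comparison gives booleans; their mean is the fraction of matches.
    return float(np.mean(predicted == true))


# Three of the four predictions match the labels.
print(fraction_correct([0, 1, 2, 2], [0, 1, 1, 2]))
```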
Function performing one-hot encoding.
Parameters
----------
datapath : String
Path (local or URL) of the data in csv format.
NSample : int, optional
Number of dataset entries to use. If None or larger than the dataset size, the whole dataset is used. The default is None.
Returns
-------
list of pandas.DataFrame
List of two DataFrames: the first contains the preprocessed data and the second contains the one-hot encoded labels.
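One-hot encoding turns a column of class labels into one indicator column per class, with a single 1 in each row. The sketch below uses `pandas.get_dummies` as one possible way to obtain this; the label values are placeholders, and the module may well use a different encoder.

```python
import pandas as pd

# Placeholder class labels, for illustration only.
labels = pd.Series(["b", "a", "b", "c"], name="label")

# One indicator column per distinct class, ordered alphabetically.
one_hot = pd.get_dummies(labels, dtype=int)
print(one_hot)
```

Each row of `one_hot` sums to 1, which is the form expected by a softmax output layer with one node per class.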
Plotting function that saves three different .png images:
- Representation of the neural network;
- Plot of the model accuracy through epochs for training and validation sets;
- Plot of the model loss function through epochs for training and validation sets.
Parameters
----------
estimator : keras.wrappers.scikit_learn.KerasClassifier
Object containing NN model.
history : keras.callbacks.History
Return value of the fit function of the NN model.
Returns
-------
None.
NN training function.
Parameters
----------
datapath : String
Path (local or URL) of the data in csv format.
NSample : int, optional
Number of dataset entries to use. If 0 or larger than the dataset size, the whole dataset is used. The default is 0.
par : List of int,int,float, optional
List of parameters passed to the NN constructor: [number of epochs the NN will be trained for, size of the batches used to update the weights, fraction of the input dataset used for validation]. The default is [48,30,0.3].
Returns
-------
pandas.DataFrame
Values taken by the evaluation metrics through the epochs.
K-fold cross-validation function using the scikit-learn API.
Parameters
----------
modelpath : String
path (local or URL) of model in joblib format.
datapath : String
path (local or URL) of data in csv format.
Returns
-------
Float
Mean of the inference accuracies of each class.
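In K-fold cross-validation the dataset is split into K folds, and each fold serves once as the test set while the remaining folds are used for training. The function above relies on the scikit-learn API; the sketch below deliberately uses plain NumPy instead, only to show how the index splits are formed. The helper name `k_fold_indices` and its parameters are hypothetical.

```python
import numpy as np


def k_fold_indices(n_samples: int, n_splits: int = 5, seed: int = 0):
    """Yield (train_idx, test_idx) index arrays for each of the folds."""
    rng = np.random.default_rng(seed)
    # Shuffle the indices once, then cut them into n_splits nearly equal folds.
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for i in range(n_splits):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(10, n_splits=5):
    print(len(train_idx), len(test_idx))
```

Every entry appears in exactly one test fold, so averaging a score over the folds uses each entry for evaluation once.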
Function to load data, perform the train/test split and create the DMatrix objects used by XGBoost methods.
Parameters
----------
datapath : String
path (local or URL) of training data in csv format.
Returns
-------
list
Train and validation datasets in DMatrix format.
Plotting function for the trained XGBoost model.
Parameters
----------
evals_result : dictionary
Dictionary with the values of the error metrics in each iteration, divided in train and validation. For example: {'train':[{'merror':##,'mlogloss':##}],'eval':[{'merror':##,'mlogloss':##}]}.
Returns
-------
None.
Function that saves the model to disk and prepares a training summary.
Parameters
----------
bst : xgboost.core.Booster
The XGBoost model object, which contains low-level routines for training, prediction and evaluation.
evals_result : dictionary
Dictionary with the values of the error metrics in each iteration, split into train and validation.
Returns
-------
pandas.DataFrame
Values taken by the evaluation metrics through the epochs.
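A training summary of this kind can be built by flattening the metrics dictionary into one column per dataset and metric. The sketch below assumes the nested layout that `xgboost.train` fills in through its `evals_result` argument (dataset name, then metric name, then a list with one value per iteration); the numbers are placeholders rather than real training output, and the helper name `summarize` is hypothetical.

```python
import pandas as pd

# Placeholder values, not real training output.
evals_result = {
    "train": {"merror": [0.30, 0.20, 0.10], "mlogloss": [1.2, 0.9, 0.7]},
    "eval": {"merror": [0.35, 0.25, 0.20], "mlogloss": [1.3, 1.0, 0.9]},
}


def summarize(evals_result: dict) -> pd.DataFrame:
    """One row per iteration, one column per (dataset, metric) pair."""
    columns = {
        f"{dataset}-{metric}": values
        for dataset, metrics in evals_result.items()
        for metric, values in metrics.items()
    }
    summary = pd.DataFrame(columns)
    summary.index.name = "iteration"
    return summary


summary = summarize(evals_result)
print(summary)
```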
Function to construct and train a BDT using the XGBoost library.
Parameters
----------
datapath : String
path (local or URL) of training data in csv format.
args : dictionary, optional
Dictionary of parameters. The default is {'eval_metric': ['merror','mlogloss']}.
iterations : int, optional
Number of iterations performed in training. The default is 10.
Returns
-------
pandas.DataFrame
Values taken by the evaluation metrics through the epochs.
Main function, invoked when the script is run from the shell.
Parameters
----------
argss : argparse.ArgumentParser
Arguments passed to the invoked function. Contains flags which control the execution of the script: e.g. the model of choice and whether to train or infer.
Returns
-------
int
Returned error code.
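A command-line entry point of this kind is commonly wired up with argparse as sketched below. The flag names (`--model`, `--train`, `--datapath`) and their values are hypothetical, chosen only to mirror the "model of choice" and "train or infer" controls mentioned above. The sketch also assumes that `main` receives the namespace returned by `parse_args`, which may differ from the actual script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Train a model or run inference.")
    # Hypothetical flags, for illustration only.
    parser.add_argument("--model", choices=["nn", "bdt"], default="nn",
                        help="model of choice")
    parser.add_argument("--train", action="store_true",
                        help="train the model instead of running inference")
    parser.add_argument("--datapath", required=True,
                        help="path (local or URL) of the data in csv format")
    return parser


def main(argss: argparse.Namespace) -> int:
    action = "train" if argss.train else "infer"
    print(f"{action} with {argss.model} on {argss.datapath}")
    return 0  # 0 signals success to the shell


# Parse an explicit argument list here; a script would call parse_args() with no arguments.
args = build_parser().parse_args(["--model", "bdt", "--train", "--datapath", "data.csv"])
exit_code = main(args)
```

In a script, the last two lines would typically sit under `if __name__ == "__main__":` and pass the result of `main` to `sys.exit`.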