suod.models package

Submodules

suod.models.base module

Base class and functions of SUOD (Scalable Unsupervised Outlier Detection)

class suod.models.base.SUOD(base_estimators, contamination=0.1, n_jobs=None, rp_clf_list=None, rp_ng_clf_list=None, rp_flag_global=True, target_dim_frac=0.5, jl_method='basic', bps_flag=True, approx_clf_list=None, approx_ng_clf_list=None, approx_flag_global=True, approx_clf=None, cost_forecast_loc_fit=None, cost_forecast_loc_pred=None, verbose=False)[source]

Bases: object

SUOD (Scalable Unsupervised Outlier Detection) is an acceleration framework for large-scale unsupervised outlier detector training and prediction. The corresponding paper is under review at KDD 2020.

Parameters
  • base_estimators (list, length must be greater than 1) – A list of base estimators. Each estimator must implement certain methods, e.g., fit and predict.

  • contamination (float in (0., 0.5), optional (default=0.1)) – The amount of contamination of the data set, i.e. the proportion of outliers in the data set. Used when fitting to define the threshold on the decision function.

  • n_jobs (optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

  • rp_clf_list (list, optional (default=None)) – The list of outlier detection models to which random projection is applied. The detector names should be consistent with PyOD.

  • rp_ng_clf_list (list, optional (default=None)) – The list of outlier detection models to which random projection is NOT applied. The detector names should be consistent with PyOD.

  • rp_flag_global (bool, optional (default=True)) – If set to False, random projection is turned off for all base models.

  • target_dim_frac (float in (0., 1), optional (default=0.5)) – The target compression ratio, i.e., the fraction of the original dimensionality kept after projection.

  • jl_method (string, optional (default = 'basic')) –

    The JL projection method:

    • "basic": each component of the transformation matrix is drawn at random from N(0,1).

    • "discrete": each component of the transformation matrix is drawn at random from {-1,1}.

    • "circulant": the first row of the transformation matrix is drawn at random from N(0,1), and each subsequent row is obtained from the previous one by a one-left shift.

    • "toeplitz": the first row and column of the transformation matrix are drawn at random from N(0,1), and each diagonal takes a constant value from these first vectors.

  • bps_flag (bool, optional (default=True)) – If set to False, balanced parallel scheduling is turned off.

  • approx_clf_list (list, optional (default=None)) – The list of outlier detection models to which pseudo-supervised approximation is applied. The detector names should be consistent with PyOD.

  • approx_ng_clf_list (list, optional (default=None)) – The list of outlier detection models to which pseudo-supervised approximation is NOT applied. The detector names should be consistent with PyOD.

  • approx_flag_global (bool, optional (default=True)) – If set to False, pseudo-supervised approximation is turned off.

  • approx_clf (object, optional (default: sklearn RandomForestRegressor)) – The supervised model used to approximate unsupervised models.

  • cost_forecast_loc_fit (str, optional) – The location of the pretrained cost forecast model used for training (fit).

  • cost_forecast_loc_pred (str, optional) – The location of the pretrained cost forecast model used for prediction.

  • verbose (bool, optional (default=False)) – Controls the verbosity of the building process.

approximate(X)[source]

Use the supervised regressor (random forest by default) to approximate unsupervised fitted outlier detectors.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples. The same feature space of the unsupervised outlier detector will be used.

Returns

self – The estimator with the fitted unsupervised models approximated.

Return type

object
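The idea behind approximate can be sketched without SUOD itself: train a supervised regressor to reproduce an unsupervised detector's decision scores, then use the fast regressor at prediction time. The distance-to-mean "detector" below is a hypothetical stand-in, not part of the library:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.RandomState(42)
X = rng.rand(300, 5)

# Stand-in for an unsupervised detector's decision scores: the
# Euclidean distance to the data mean (purely illustrative).
scores = np.linalg.norm(X - X.mean(axis=0), axis=1)

# Pseudo-supervised approximation: fit a supervised regressor to
# reproduce the unsupervised scores.
approximator = RandomForestRegressor(n_estimators=50, random_state=42)
approximator.fit(X, scores)

approx_scores = approximator.predict(X)
# In-sample, the approximator should track the unsupervised scores closely.
corr = np.corrcoef(scores, approx_scores)[0, 1]
```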

decision_function(X)[source]

Predict raw anomaly scores of X using the fitted detectors.

The anomaly score of an input sample is computed based on the fitted detector. For consistency, outliers are assigned with higher anomaly scores.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples. Sparse matrices are accepted only if they are supported by the base estimator.

Returns

anomaly_scores – The anomaly score of the input samples.

Return type

numpy array of shape (n_samples,)

fit(X)[source]

Fit all base estimators.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns

self – Fitted estimator.

Return type

object

get_params(deep=True)[source]

Get parameters for this estimator.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

predict(X)[source]

Predict the class labels for the provided data.

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns

outlier_labels – For each observation, indicates whether it should be considered an outlier according to each fitted base estimator: 0 stands for inliers and 1 for outliers.

Return type

numpy array of shape (n_samples, n_estimators)

predict_proba(X)[source]

Predict the probability of a sample being an outlier. Two approaches are possible:

  1. Simply use min-max conversion to linearly transform the outlier scores into the range [0, 1]. The model must be fitted first.

  2. Use unifying scores; see [BKKSZ11].

Parameters

X (numpy array of shape (n_samples, n_features)) – The input samples.

Returns

outlier_probability – The outlier probability of each input sample, ranging in [0, 1].

Return type

numpy array of shape (n_samples,)
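Approach 1 above (min-max conversion) can be sketched as follows; `scores_to_proba` is a hypothetical helper for illustration, not part of the library:

```python
import numpy as np

def scores_to_proba(train_scores, new_scores):
    """Min-max convert raw outlier scores to [0, 1] probabilities,
    using the fitted (training) scores' range: a sketch of approach 1."""
    lo, hi = train_scores.min(), train_scores.max()
    proba = (new_scores - lo) / (hi - lo)
    # Scores outside the training range are clipped into [0, 1].
    return np.clip(proba, 0, 1)

train_scores = np.array([0.2, 0.5, 1.5, 3.0])
test_scores = np.array([0.2, 3.0, 5.0])
print(scores_to_proba(train_scores, test_scores))  # → [0. 1. 1.]
```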

set_params(**params)[source]

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

See http://scikit-learn.org/stable/modules/generated/sklearn.base.BaseEstimator.html and sklearn/base.py for more information.

Returns

self

Return type

object

suod.models.cost_predictor module

Cost predictor function for forecasting base model training and prediction cost.

suod.models.cost_predictor.build_cost_predictor(file_name, output_file, save_to_local=True)[source]

Build the cost predictor from scratch. In general, this does not need to be called by users.

Parameters
  • file_name (string) – The training table of algorithm performance.

  • output_file (string) – The location at which to save the trained cost predictor.

  • save_to_local (bool, optional (default=True)) – Whether to save the trained predictor to a local file.

suod.models.cost_predictor.indices_to_one_hot(data, nb_classes)[source]

Convert an iterable of indices to one-hot encoded labels.

Parameters
  • data (list) – The raw data.

  • nb_classes (int) – The number of targeted classes.
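A minimal numpy sketch of such a helper (the library's own implementation may differ):

```python
import numpy as np

def indices_to_one_hot(data, nb_classes):
    """One-hot encode an iterable of class indices by indexing
    rows of the identity matrix."""
    targets = np.array(data).reshape(-1)
    return np.eye(nb_classes)[targets]

print(indices_to_one_hot([0, 2, 1], 3))
# → [[1. 0. 0.]
#    [0. 0. 1.]
#    [0. 1. 0.]]
```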

suod.models.jl_projection module

Johnson–Lindenstrauss process. Part of the code is adapted from https://github.com/PTAug/jlt-python

suod.models.jl_projection.jl_fit_transform(X, objective_dim, method='basic')[source]

Fit and transform the input data by the Johnson–Lindenstrauss process. See [BJL84] for details.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • objective_dim (int) – The expected output dimension.

  • method (string, optional (default = 'basic')) –

    The JL projection method:

    • "basic": each component of the transformation matrix is drawn at random from N(0,1).

    • "discrete": each component of the transformation matrix is drawn at random from {-1,1}.

    • "circulant": the first row of the transformation matrix is drawn at random from N(0,1), and each subsequent row is obtained from the previous one by a one-left shift.

    • "toeplitz": the first row and column of the transformation matrix are drawn at random from N(0,1), and each diagonal takes a constant value from these first vectors.

Returns

  • X_transformed (numpy array of shape (n_samples, objective_dim)) – The dataset after the JL projection.

  • jl_transformer (object) – Transformer instance.
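The "basic" method can be sketched with plain numpy; the 1/sqrt(k) scaling below is the conventional JL normalization and is an assumption about, not a copy of, the library's implementation:

```python
import numpy as np

def jl_basic_fit_transform(X, objective_dim, random_state=0):
    """Sketch of the 'basic' JL projection: a dense random matrix
    whose components are drawn i.i.d. from N(0, 1)."""
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]
    transformer = rng.normal(size=(n_features, objective_dim))
    # Scale by 1/sqrt(k) so pairwise distances are preserved in expectation.
    X_transformed = X @ transformer / np.sqrt(objective_dim)
    return X_transformed, transformer

X = np.random.rand(100, 20)
X_new, P = jl_basic_fit_transform(X, objective_dim=5)
print(X_new.shape)  # → (100, 5)
```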

suod.models.jl_projection.jl_transform(X, jl_transformer)[source]

Use the fitted transformer to conduct JL projection.

Parameters
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • jl_transformer (object) – Fitted transformer instance.

Returns

X_transformed – Transformed matrix.

Return type

numpy array of shape (n_samples, reduced_dimensions)

suod.models.parallel_processes module

suod.models.parallel_processes.balanced_scheduling(time_cost_pred, n_estimators, n_jobs, verbose=False)[source]

Conduct balanced scheduling based on the sum of ranks, for both training and prediction. The algorithm enforces an approximately equal sum of ranks among workers.

Parameters
  • time_cost_pred (list) – The list of time cost by the cost predictor. The length is equal to the number of base detectors.

  • n_estimators (int) – The number of base estimators.

  • n_jobs (optional (default=1)) – The number of jobs to run in parallel for both fit and predict. If -1, then the number of jobs is set to the number of cores.

  • verbose (bool, optional (default=False)) – Controls the verbosity of the building process.

Returns

  • n_estimators_list (list) – The number of estimators assigned to each worker.

  • starts (list) – The boundary indices of the base detectors to be scheduled. For instance, base detectors with indices in [starts[k], starts[k+1]) will be assigned to worker k.

  • n_jobs (int) – The actual number of jobs to run in parallel.
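The idea can be sketched as a greedy rank-sum balancer: rank detectors by forecast cost, then assign each detector (costliest first) to the worker with the smallest current rank sum. `balanced_scheduling_sketch` is a hypothetical illustration; the library's actual algorithm may differ in details:

```python
import numpy as np

def balanced_scheduling_sketch(time_cost_pred, n_jobs):
    """Greedy rank-sum balancing over n_jobs workers."""
    order = np.argsort(time_cost_pred)[::-1]            # costliest first
    ranks = np.argsort(np.argsort(time_cost_pred)) + 1  # rank 1..n per detector
    assignments = [[] for _ in range(n_jobs)]
    rank_sums = [0] * n_jobs
    for idx in order:
        # Give the next detector to the currently least-loaded worker.
        worker = int(np.argmin(rank_sums))
        assignments[worker].append(int(idx))
        rank_sums[worker] += int(ranks[idx])
    return assignments, rank_sums

costs = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]
assignments, rank_sums = balanced_scheduling_sketch(costs, n_jobs=2)
print(rank_sums)  # the two workers end up with nearly equal rank sums
```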

suod.models.parallel_processes.cost_forecast_meta(clf, X, base_estimator_names)[source]

Forecast model cost by pretrained cost estimator.

Parameters
  • clf (object, sklearn regressor) – A random forest regressor trained to forecast model cost.

  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • base_estimator_names (list of str) – The list of outlier detection model names as strings.

Returns

time_cost_pred – The forecast cost of each outlier detection model, in seconds.

Return type

numpy array

suod.models.parallel_processes.indices_to_one_hot(data, nb_classes)[source]

Convert an iterable of indices to one-hot encoded labels.

Module contents

References

BJL84

William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.

BKKSZ11

Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. Interpreting and unifying outlier scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, 13–24. SIAM, 2011.

BZNL19

Yue Zhao, Zain Nasrullah, and Zheng Li. PyOD: a Python toolbox for scalable outlier detection. Journal of Machine Learning Research, 20:1–7, 2019.