suod.models package



suod.models.base module

suod.models.cost_predictor module

suod.models.jl_projection module

Johnson–Lindenstrauss process. Part of the code is adapted from

suod.models.jl_projection.jl_fit_transform(X, objective_dim, method='basic')

Fit a Johnson–Lindenstrauss (JL) projection on the input data and transform the data with it. See [BJL84] for details.

Parameters:

X : numpy array of shape (n_samples, n_features)

The input samples.

objective_dim : int

The expected output dimension.

method : string, optional (default = 'basic')

The JL projection method:

  • “basic”: each component of the transformation matrix is drawn at random from N(0,1).

  • “discrete”: each component of the transformation matrix is drawn at random from {-1, 1}.

  • “circulant”: the first row of the transformation matrix is drawn at random from N(0,1), and each subsequent row is obtained from the previous one by a left shift of one position.

  • “toeplitz”: the first row and the first column of the transformation matrix are drawn at random from N(0,1), and each diagonal has a constant value taken from that first row or column.


Returns:

X_transformed : numpy array of shape (n_samples, objective_dim)

The dataset after the JL projection.

jl_transformer : object

The fitted transformer instance.
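The four `method` options differ only in how the transformation matrix is generated. The NumPy sketch below builds each variant as described in the list above. The helper name `make_jl_matrix`, the (objective_dim, n_features) orientation applied as `X @ R.T`, and the absence of any 1/sqrt(objective_dim) rescaling are assumptions for illustration, not SUOD's internals.

```python
import numpy as np


def make_jl_matrix(n_features, objective_dim, method="basic", seed=None):
    """Hypothetical helper: build an (objective_dim, n_features) JL matrix.

    Each row is one projection direction, so samples are projected with
    ``X @ R.T``. No rescaling (e.g. by 1/sqrt(objective_dim)) is applied.
    """
    rng = np.random.default_rng(seed)
    k, d = objective_dim, n_features
    if method == "basic":
        # Every entry is drawn independently from N(0, 1).
        return rng.standard_normal((k, d))
    if method == "discrete":
        # Every entry is drawn at random from {-1, 1}.
        return rng.choice([-1.0, 1.0], size=(k, d))
    if method == "circulant":
        # First row ~ N(0, 1); row i is the first row shifted left by i positions,
        # i.e. each row is the previous one shifted left by one.
        first_row = rng.standard_normal(d)
        return np.stack([np.roll(first_row, -i) for i in range(k)])
    if method == "toeplitz":
        # First row and first column ~ N(0, 1); entry (i, j) depends only on j - i,
        # so every diagonal holds a single constant value.
        first_row = rng.standard_normal(d)
        first_col = rng.standard_normal(k)
        i, j = np.indices((k, d))
        offset = j - i
        return np.where(
            offset >= 0,
            first_row[np.clip(offset, 0, d - 1)],
            first_col[np.clip(-offset, 0, k - 1)],
        )
    raise ValueError(f"unknown method: {method!r}")


# Usage: project 10 samples with 6 features down to 3 dimensions.
X = np.random.default_rng(0).standard_normal((10, 6))
for name in ("basic", "discrete", "circulant", "toeplitz"):
    R = make_jl_matrix(X.shape[1], 3, method=name, seed=42)
    print(name, (X @ R.T).shape)  # (10, 3) for every method
```

The structured variants ("circulant", "toeplitz") need far fewer random draws than "basic" or "discrete", since only one or two vectors are sampled.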

suod.models.jl_projection.jl_transform(X, jl_transformer)

Project the input data with an already fitted JL transformer.

Parameters:

X : numpy array of shape (n_samples, n_features)

The input samples.

jl_transformer : object

The fitted transformer instance, as returned by jl_fit_transform.


Returns:

X_transformed : numpy array of shape (n_samples, reduced_dimensions)

The transformed matrix, where reduced_dimensions is the output dimension of the fitted transformer.
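Because jl_transform reuses the fitted transformer, training and test samples are projected onto the same random subspace. A minimal NumPy sketch of that fit-then-reuse pattern, assuming the "basic" method and representing the transformer as a plain matrix; the `*_sketch` function names are hypothetical and not SUOD's API:

```python
import numpy as np


def jl_fit_transform_sketch(X, objective_dim, seed=None):
    """Draw a 'basic' (Gaussian) projection matrix and apply it to X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # The "transformer" here is just an (objective_dim, n_features) matrix.
    jl_transformer = rng.standard_normal((objective_dim, X.shape[1]))
    return X @ jl_transformer.T, jl_transformer


def jl_transform_sketch(X, jl_transformer):
    """Project new samples with a previously fitted matrix."""
    X = np.asarray(X, dtype=float)
    if X.shape[1] != jl_transformer.shape[1]:
        raise ValueError("X must have as many features as the data used for fitting")
    return X @ jl_transformer.T


rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 20))
X_test = rng.standard_normal((30, 20))

X_train_low, transformer = jl_fit_transform_sketch(X_train, objective_dim=5, seed=7)
X_test_low = jl_transform_sketch(X_test, transformer)  # same subspace as X_train_low
print(X_train_low.shape, X_test_low.shape)  # (100, 5) (30, 5)
```

Drawing a fresh matrix for the test data instead would place it in a different subspace, so scores computed on the two projections would not be comparable.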

suod.models.parallel_processes module

suod.models.parallel_processes.balanced_scheduling(time_cost_pred, n_estimators, n_jobs, verbose=False)

Conduct balanced scheduling based on the sum of ranks, for both training and prediction. The algorithm aims to give every worker an approximately equal sum of ranks.

Parameters:

time_cost_pred : list

The list of time costs predicted by the cost predictor. Its length equals the number of base detectors.

n_estimators : int

The number of base estimators.

n_jobs : int

The number of jobs to run in parallel for both fit and predict. If -1, the number of jobs is set to the number of cores.

verbose : bool, optional (default=False)

Controls the verbosity of the building process.



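Returns:

n_estimators_list : list

The number of estimators assigned to each worker.

starts : list

The boundary indices of the base detectors to be scheduled. For instance, base detectors starts[k] through starts[k+1] - 1 (the slice starts[k]:starts[k+1]) are assigned to worker k.

n_jobs : int

The actual usable number of jobs to run in parallel.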
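A rough NumPy sketch of rank-sum balancing, assuming (as the description of starts suggests) that each worker receives a contiguous slice of the detector list. The boundary rule here, cutting where the cumulative rank sum is closest to each multiple of the ideal per-worker sum, is an illustrative choice and not necessarily SUOD's exact algorithm; `balanced_scheduling_sketch` is a hypothetical name, `os.cpu_count()` stands in for however the library counts cores, and verbose is omitted.

```python
import os

import numpy as np


def balanced_scheduling_sketch(time_cost_pred, n_estimators, n_jobs):
    """Split detectors into contiguous chunks with roughly equal rank sums.

    Returns (n_estimators_list, starts, n_jobs); worker k gets the detectors
    in the slice starts[k]:starts[k + 1].
    """
    costs = np.asarray(time_cost_pred, dtype=float)
    if costs.shape[0] != n_estimators:
        raise ValueError("time_cost_pred must have one entry per base estimator")
    if n_jobs == -1:
        n_jobs = os.cpu_count() or 1
    # Never use more workers than there are detectors.
    n_jobs = max(1, min(n_jobs, n_estimators))

    # Rank 1 = cheapest detector (ties broken by position, not averaged).
    ranks = np.argsort(np.argsort(costs, kind="stable"), kind="stable") + 1
    cumulative = np.cumsum(ranks)
    target = cumulative[-1] / n_jobs  # ideal rank sum per worker

    starts = [0]
    for k in range(1, n_jobs):
        # Boundary where the cumulative rank sum is closest to k * target.
        idx = int(np.argmin(np.abs(cumulative - k * target))) + 1
        # Keep every chunk non-empty and leave one detector for each later worker.
        idx = max(idx, starts[-1] + 1)
        idx = min(idx, n_estimators - (n_jobs - k))
        starts.append(idx)
    starts.append(n_estimators)

    n_estimators_list = [starts[k + 1] - starts[k] for k in range(n_jobs)]
    return n_estimators_list, starts, n_jobs


costs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # predicted seconds per detector
print(balanced_scheduling_sketch(costs, n_estimators=6, n_jobs=3))
# ([3, 2, 1], [0, 3, 5, 6], 3)
```

In this example the three chunks have rank sums 6, 9 and 6: exact equality is usually impossible with contiguous slices, so the split is only approximately balanced.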

suod.models.parallel_processes.cost_forecast_meta(clf, X, base_estimator_names)

Forecast the cost of each base model with a pretrained cost estimator.

Parameters:

clf : object, sklearn regressor

Random forest regressor trained to forecast model cost.

X : numpy array of shape (n_samples, n_features)

The input samples.

base_estimator_names : list of str

The list of outlier detection model names, as strings.

Returns:

time_cost_pred : numpy array

The forecast cost of each outlier detection model, in seconds.
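A hedged sketch of how such a forecast can be assembled: each base estimator becomes one meta-feature row made of the dataset shape plus a one-hot code of its model name, and the regressor predicts one cost per row. The meta-feature layout, the `MODEL_INDEX` mapping, the function name, and the stub regressor are all hypothetical; only the `predict()` call mirrors the scikit-learn regressor interface. The stub replaces the pretrained random forest so the example stays dependency-free.

```python
import numpy as np

# Hypothetical name-to-index mapping; SUOD's actual encoding may differ.
MODEL_INDEX = {"LOF": 0, "IForest": 1, "KNN": 2}


def cost_forecast_meta_sketch(clf, X, base_estimator_names):
    """Predict one cost (in seconds) per base estimator with a fitted regressor."""
    X = np.asarray(X)
    n_models = len(base_estimator_names)
    # One-hot code of each model name: one row per base estimator.
    idx = np.array([MODEL_INDEX[name] for name in base_estimator_names])
    model_code = np.eye(len(MODEL_INDEX))[idx]
    # The dataset shape (n_samples, n_features) is repeated for every model.
    shape_code = np.tile([X.shape[0], X.shape[1]], (n_models, 1))
    meta_features = np.hstack([shape_code, model_code])
    # Any object with an sklearn-style predict() works here.
    return np.asarray(clf.predict(meta_features))


class StubRegressor:
    """Stand-in for the pretrained random forest: cost = n_samples * per-model rate."""

    rates = np.array([0.01, 0.002, 0.005])  # seconds per sample for LOF, IForest, KNN

    def predict(self, meta_features):
        n_samples = meta_features[:, 0]
        return n_samples * (meta_features[:, 2:] @ self.rates)


X = np.zeros((100, 4))
print(cost_forecast_meta_sketch(StubRegressor(), X, ["LOF", "KNN"]))  # one cost per model
```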

suod.models.parallel_processes.indices_to_one_hot(data, nb_classes)

Convert an iterable of indices to one-hot encoded labels.
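The conversion can be written in NumPy by indexing rows of an identity matrix. This is a minimal sketch with a hypothetical name and an added range check; it is not necessarily how SUOD implements the function.

```python
import numpy as np


def indices_to_one_hot_sketch(data, nb_classes):
    """Return a (len(data), nb_classes) float array with one 1.0 per row."""
    targets = np.asarray(list(data), dtype=int).reshape(-1)
    if targets.size and (targets.min() < 0 or targets.max() >= nb_classes):
        raise ValueError("every index must lie in [0, nb_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(nb_classes)[targets]


print(indices_to_one_hot_sketch([0, 2, 1], 3))  # rows: [1, 0, 0], [0, 0, 1], [0, 1, 0]
```

The range check matters because NumPy would otherwise silently accept a negative index and return the row counted from the end.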

Module contents



[BJL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26:189–206, 1984.


Hans-Peter Kriegel, Peer Kröger, Erich Schubert, and Arthur Zimek. Interpreting and unifying outlier scores. In Proceedings of the 2011 SIAM International Conference on Data Mining, 13–24. SIAM, 2011.


Yue Zhao, Zain Nasrullah, and Zheng Li. PyOD: A Python toolbox for scalable outlier detection. Journal of Machine Learning Research, 20:1–7, 2019.