ddn3.parameter_tuning

The module for tuning hyperparameters in DDN

Five strategies are implemented.

  • cv_joint: grid search CV for lambda1 and lambda2. This is time-consuming for larger data.

  • cv_sequential: CV for lambda1 first, then use the determined lambda1 to do CV on lambda2.

  • cv_bai: CV for lambda1 first, then use the method in [1] to directly calculate lambda2.

  • mb_cv: use theorem 3 in [2] to directly calculate lambda1, and use CV to get lambda2.

  • mb_bai: use theorem 3 in [2] to directly calculate lambda1, and the method in [1] to get lambda2.

If the data set is not too large, cv_joint is a better choice. If the data is larger, it is better to use the cv_sequential option. If the data is really large, consider cv_bai.
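
To make the cost difference between the strategies above concrete, the sketch below counts how many network estimations each CV-based strategy would need with the default grids from the DDNParameterSearch signature. It is only a rough illustration: the helper `count_fits` is hypothetical, and it assumes that every grid point in every CV repeat costs one fit and that the direct calculation of lambda2 in cv_bai is free.

```python
import numpy as np

# Default grids copied from the DDNParameterSearch signature.
lambda1_list = np.arange(0.05, 1.05, 0.05)
lambda2_list = np.arange(0.025, 0.525, 0.025)
n_cv = 5


def count_fits(n_lambda1, n_lambda2, n_cv):
    """Rough number of network estimations per strategy (hypothetical helper).

    Assumes one fit per grid point per CV repeat.
    """
    return {
        # Every (lambda1, lambda2) pair is tried in every repeat.
        "cv_joint": n_cv * n_lambda1 * n_lambda2,
        # lambda1 grid first, then lambda2 grid with lambda1 fixed.
        "cv_sequential": n_cv * (n_lambda1 + n_lambda2),
        # Only the lambda1 grid is searched by CV.
        "cv_bai": n_cv * n_lambda1,
    }


counts = count_fits(len(lambda1_list), len(lambda2_list), n_cv)
for name, n_fits in counts.items():
    print(f"{name}: {n_fits} fits")
```

Under these assumptions the joint search grows with the product of the two grid sizes, while the sequential search grows with their sum.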

A better approach is to manually choose a set of lambda1 values and select the one that leads to a reasonable network. Using prior knowledge in this way makes it more likely to obtain a usable network.

[1] “Learning structural changes of Gaussian graphical models in controlled experiments.” UAI (2012).

[2] “High-dimensional graphs and variable selection with the lasso.” Ann. Statist. (2006): 1436-1462.

Module Contents

Classes

DDNParameterSearch

Functions

get_lambda1_mb(alpha, n, p[, mthd])

Estimate lambda1 using theorem 3 in [2]

get_lambda2_bai(x1, x2[, alpha])

Estimate lambda2 using the method in [1]

get_lambda_one_se_1d(val_err, lambda_lst)

Choose a lambda from a list of lambda values using the one standard error rule

get_lambdas_one_se_2d(val_err, lambda1_lst, lambda2_lst)

Choose lambda1 and lambda2 from lists of lambda values using the one standard error rule

calculate_regression(data, topo_est)

Linear regression

cv_two_lambda(dat1, dat2[, n_cv, ratio_val, ...])

Cross validation by grid search lambda1 and lambda2

plot_error_1d(val_err[, lambda_lst, ymin, ymax])

Plot the curve of cross validation with one lambda

plot_error_2d(val_err[, cmin, cmax])

Plot the image of cross validation with both lambda1 and lambda2

class ddn3.parameter_tuning.DDNParameterSearch(dat1, dat2, lambda1_list=np.arange(0.05, 1.05, 0.05), lambda2_list=np.arange(0.025, 0.525, 0.025), n_cv=5, ratio_validation=0.2, alpha1=0.05, alpha2=0.01)
fit(method='cv_sequential')

Estimate lambda1 and lambda2.

The validation error is defined as the ratio of the signal in the validation set that cannot be explained by the network estimated in the training set.

Parameters:

method (str) – The method used for estimation.

Returns:

out – The validation error, estimated lambda1, and lambda2

Return type:

tuple

run_cv_joint()

Estimate lambda1 and lambda2 with grid search CV

run_cv_sequential()

Use CV to estimate lambda1, then use CV for lambda2

run_cv_bai()

Use CV for lambda1, use the method in [1] for lambda2

run_mb_cv()

Use theorem 3 in [2] for lambda1, and use CV for lambda2

run_mb_bai()

Use theorem 3 in [2] for lambda1, and use the method in [1] for lambda2

ddn3.parameter_tuning.get_lambda1_mb(alpha, n, p, mthd=0)

Estimate lambda1 using theorem 3 in [2]

Parameters:
  • alpha (float) – The parameter controlling false positives

  • n (int) – The number of samples

  • p (int) – The number of features

  • mthd (int) – Use 0 for methods in [2], with a correction term of 2. Use 1 for methods used in https://pubmed.ncbi.nlm.nih.gov/25273109/

Returns:

lmb1 – The estimated lambda1

Return type:

float
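
For orientation, the bound in theorem 3 of [2] has the form lambda(alpha) = (2 * sigma / sqrt(n)) * Q(alpha / (2 * p^2)), where Q is the upper-tail quantile function of the standard normal distribution. The sketch below evaluates that formula with the standard library only. It is not the library's implementation: the name `mb_lambda1_sketch` and the `sigma` argument are hypothetical, `sigma=1` assumes standardized data, and the `mthd` variants and the correction term mentioned above are not reproduced.

```python
import math
from statistics import NormalDist


def mb_lambda1_sketch(alpha, n, p, sigma=1.0):
    """Evaluate the theorem 3 penalty formula from [2] (illustrative sketch).

    alpha: level controlling false positives
    n: number of samples
    p: number of features
    sigma: standard deviation of each variable (assumed 1 for standardized data)
    """
    # Upper-tail probability used in the theorem.
    tail = alpha / (2.0 * p**2)
    # Upper-tail quantile Q(tail) equals the ordinary quantile at 1 - tail.
    z = NormalDist().inv_cdf(1.0 - tail)
    return 2.0 * sigma / math.sqrt(n) * z


print(mb_lambda1_sketch(alpha=0.05, n=100, p=10))
```

The value shrinks as n grows and increases slowly with p, which matches the intuition that more samples allow a weaker penalty.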

ddn3.parameter_tuning.get_lambda2_bai(x1, x2, alpha=0.01)

Estimate lambda2 using the method in [1]

Parameters:
  • x1 (array_like) – The data for condition 1

  • x2 (array_like) – The data for condition 2

  • alpha (float) – The parameter controlling false positives

Returns:

lmb2 – Estimated lambda2

Return type:

float

ddn3.parameter_tuning.get_lambda_one_se_1d(val_err, lambda_lst)

Choose a lambda from a list of lambda values using the one standard error rule

Let K be the number of CV repeats, L the number of lambda values.

Parameters:
  • val_err (array_like) – Validation errors corresponding to the list of lambda. Shape K by L.

  • lambda_lst (array_like) – The lambda values. Shape L.

Returns:

Chosen lambda value

Return type:

float
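
As an illustration, the textbook form of the one standard error rule can be sketched as follows. This is a minimal sketch rather than the library's code: the name `one_se_rule_1d_sketch`, the use of the sample standard deviation (ddof=1), and the choice of the largest qualifying lambda are all assumptions.

```python
import numpy as np


def one_se_rule_1d_sketch(val_err, lambda_lst):
    """Pick the largest lambda whose mean error is within one SE of the minimum.

    val_err: validation errors, shape (K, L)
    lambda_lst: lambda values, shape (L,)
    """
    val_err = np.asarray(val_err, dtype=float)
    lambda_lst = np.asarray(lambda_lst, dtype=float)

    # Mean error and standard error of the mean across the K repeats.
    mean_err = val_err.mean(axis=0)
    se = val_err.std(axis=0, ddof=1) / np.sqrt(val_err.shape[0])

    # Threshold: minimum mean error plus its standard error.
    i_min = int(np.argmin(mean_err))
    threshold = mean_err[i_min] + se[i_min]

    # Among lambdas under the threshold, prefer the strongest penalty.
    candidates = lambda_lst[mean_err <= threshold]
    return float(candidates.max())


# Toy example with K = 3 repeats and L = 4 lambda values.
val_err = np.array([
    [0.50, 0.30, 0.31, 0.60],
    [0.54, 0.34, 0.33, 0.64],
    [0.52, 0.32, 0.35, 0.62],
])
lambdas = [0.1, 0.2, 0.3, 0.4]
chosen = one_se_rule_1d_sketch(val_err, lambdas)
print(chosen)
```

Choosing the largest lambda inside the threshold favors a sparser network at a validation error that is statistically close to the best one.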

ddn3.parameter_tuning.get_lambdas_one_se_2d(val_err, lambda1_lst, lambda2_lst)

Choose lambda1 and lambda2 from lists of lambda values using the one standard error rule

After finding the minimum value, we find all the combinations of lambda1 and lambda2 values that are at least one standard error larger than this minimum value. From these combinations, we choose the one whose lambda1 and lambda2 values are closest to those of the minimum.

Let K be the number of CV repeats, L1 the number of lambda1 values, L2 the number of lambda2 values.

Parameters:
  • val_err (array_like) – Validation errors corresponding to the list of lambda. Shape K by L1 by L2.

  • lambda1_lst (array_like) – The lambda1 values, shape L1.

  • lambda2_lst (array_like) – The lambda2 values, shape L2.

Returns:

Chosen lambda value

Return type:

float

ddn3.parameter_tuning.calculate_regression(data, topo_est)

Linear regression

For each variable, use all its neighbors as predictors and find the regression coefficients.

This is an example of the regression operation:

>>> import numpy as np
>>> x = np.array([[-1,-1,1,1.0], [1,1,-1,-1]]).T
>>> y = np.array([1,1,-1,-1.0])
>>> out = np.linalg.lstsq(x, y, rcond=None)
>>> out[0]  # least-squares coefficients

Parameters:
  • data (array_like) – All data

  • topo_est (array_like) – Estimated adjacency matrix

ddn3.parameter_tuning.cv_two_lambda(dat1, dat2, n_cv=5, ratio_val=0.2, lambda1_lst=np.arange(0.05, 1.05, 0.05), lambda2_lst=np.arange(0.025, 0.525, 0.025))

Cross validation by grid search lambda1 and lambda2

To estimate the validation error, we estimate the coefficient of each node on the training set based on the estimated network topology. Then for each node in the validation set, we try to use its neighbors to explain the signal in that node. The portion of unexplained signal in all nodes is defined as the validation error.

Let K be the number of CV repeats, L1 the number of lambda1 values, L2 the number of lambda2 values.

Parameters:
  • dat1 (array_like) – Data for condition 1

  • dat2 (array_like) – Data for condition 2

  • n_cv (int) – Number of repeats. Can be as large as you like, as we re-sample each time.

  • ratio_val (float) – Ratio of data for validation. The remaining is used for training.

  • lambda1_lst (array_like) – Values of lambda1 for searching

  • lambda2_lst (array_like) – Values of lambda2 for searching

Returns:

  • val_err (array_like) – The validation error for each lambda1 and lambda2 combination. Shape K by L1 by L2

  • lambda1_lst (array_like)

  • lambda2_lst (array_like)
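
The validation error and the repeated re-sampling described for cv_two_lambda can be sketched for a single condition as follows. This is a simplified sketch, not the library's code: the function names and the `seed` argument are hypothetical, the adjacency matrix is held fixed instead of being re-estimated on each training split for each lambda1 and lambda2 pair, and the unexplained portion is assumed to be the residual sum of squares divided by the total sum of squares of the validation data.

```python
import numpy as np


def unexplained_ratio_sketch(train, val, adj):
    """Fraction of validation signal not explained by neighbor regression."""
    resid_ss = 0.0
    total_ss = 0.0
    for i in range(train.shape[1]):
        nbrs = np.flatnonzero(adj[i])
        nbrs = nbrs[nbrs != i]  # ignore self loops
        pred = np.zeros(val.shape[0])
        if nbrs.size > 0:
            # Coefficients come from the training set only.
            beta, *_ = np.linalg.lstsq(train[:, nbrs], train[:, i], rcond=None)
            pred = val[:, nbrs] @ beta
        resid_ss += np.sum((val[:, i] - pred) ** 2)
        total_ss += np.sum(val[:, i] ** 2)
    return resid_ss / total_ss


def repeated_split_error_sketch(data, adj, n_cv=5, ratio_val=0.2, seed=0):
    """Validation error over n_cv independent random train/validation splits."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    n_val = max(1, int(round(n * ratio_val)))
    errs = np.zeros(n_cv)
    for k in range(n_cv):
        perm = rng.permutation(n)  # fresh random split in every repeat
        val_idx, train_idx = perm[:n_val], perm[n_val:]
        errs[k] = unexplained_ratio_sketch(data[train_idx], data[val_idx], adj)
    return errs


# Toy data: node 2 is almost a multiple of node 0, node 1 is unrelated.
rng = np.random.default_rng(1)
x0 = rng.standard_normal(200)
x1 = rng.standard_normal(200)
toy = np.column_stack([x0, x1, 2.0 * x0 + 0.01 * rng.standard_normal(200)])
adj_edge = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])
adj_empty = np.zeros((3, 3), dtype=int)
err_edge = repeated_split_error_sketch(toy, adj_edge)
err_empty = repeated_split_error_sketch(toy, adj_empty)
print(err_edge.mean(), err_empty.mean())
```

Because each repeat draws a new random split rather than partitioning the data into folds, the number of repeats is not tied to the validation ratio, which is why n_cv can be chosen freely.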

ddn3.parameter_tuning.plot_error_1d(val_err, lambda_lst=(), ymin=None, ymax=None)

Plot the curve of cross validation with one lambda

ddn3.parameter_tuning.plot_error_2d(val_err, cmin=None, cmax=None)

Plot the image of cross validation with both lambda1 and lambda2