ddn3.ddn
The DDN main functions
There are two top-level DDN functions: ddn and ddn_parallel. The former runs serially and is easier to debug, while the latter runs in parallel. For smaller data sets (e.g., fewer than 500 nodes), the serial version is fast enough. The two functions should provide the same functionality.
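The rule of thumb above can be written down as a small helper. This is a minimal sketch, not part of ddn3: `choose_ddn_entry` is a hypothetical name. The 500-node cutoff comes from the text, and capping the process count at `os.cpu_count()` follows the advice given for `n_process`.

```python
import os


def choose_ddn_entry(n_features, n_process=None, small_p=500):
    """Suggest which ddn3 entry point to call and how many processes to use.

    Hypothetical helper: returns ("ddn", 1) for small problems or when only one
    process is available, otherwise ("ddn_parallel", k) with k capped at the
    number of CPU cores.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    if n_process is None:
        n_process = cores
    n_process = max(1, min(n_process, cores))  # never exceed available cores
    if n_features < small_p or n_process == 1:
        return "ddn", 1  # small P or a single process: use the serial version
    return "ddn_parallel", n_process


print(choose_ddn_entry(100))  # → ('ddn', 1)
```

With ddn3 installed, the returned name selects between `ddn.ddn(g1_data, g2_data, ...)` and `ddn.ddn_parallel(g1_data, g2_data, ..., n_process=k)`.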
Each function supports several solver methods, selected through the mthd parameter:
org: the method from DDN 2.0. It is slow for larger data sets.
resi: the DDN 3.0 method using the residual update strategy. It is suited to data with many features.
corr: the DDN 3.0 method using the correlation matrix update strategy. It is suited to data with many samples.
strongrule: uses the strong rule to accelerate the special case lambda2 = 0.
We recommend resi in general. If you have many more samples than features, consider corr.
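The recommendation above can be encoded as a default for mthd. This is a sketch under an explicit assumption: ddn3 does not define "many more samples than features", so `suggest_mthd` (a hypothetical helper) treats it as "the smaller of the two sample sizes is at least `ratio` times P", with `ratio=2.0` chosen arbitrarily.

```python
def suggest_mthd(n1, n2, n_features, ratio=2.0):
    """Suggest a value for the mthd argument of ddn / ddn_parallel.

    Hypothetical helper: 'resi' in general, 'corr' when both conditions have
    many more samples than features. The `ratio` cutoff is an assumption,
    not a rule taken from ddn3.
    """
    if min(n1, n2) >= ratio * n_features:
        return "corr"  # sample-rich data: correlation matrix update strategy
    return "resi"  # general case: residual update strategy


print(suggest_mthd(1000, 800, 100))  # → corr
print(suggest_mthd(50, 60, 1000))    # → resi
```

Using the smaller sample size is a conservative choice: corr is only suggested when both conditions are sample-rich.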
The choice of the two hyperparameters, lambda1 and lambda2, is critical. We recommend running DDN over a range of values and using prior knowledge to select suitable ones. Alternatively, refer to the parameter tuning tutorial for other ways of choosing the parameters.
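A sweep over a grid of lambda1 and lambda2 values could be organized as below. This is a minimal sketch: `sweep_lambdas` and `count_edges` are hypothetical helpers, `solver` is any callable that accepts the two data matrices plus `lambda1`/`lambda2` keywords and returns a (2, P, P) array (as documented for ddn), and the rule that (i, j) is an edge when either directed coefficient is nonzero is an assumption made for summarizing.

```python
import itertools

import numpy as np


def count_edges(coef, tol=0.0):
    """Count undirected edges in one P x P coefficient matrix.

    Assumption: pair (i, j) is an edge if coef[i, j] or coef[j, i] is nonzero.
    """
    adj = np.abs(coef) > tol
    adj = adj | adj.T  # symmetrize: either direction counts
    np.fill_diagonal(adj, False)  # ignore self-coefficients
    return int(adj.sum() // 2)  # each undirected edge is counted twice


def sweep_lambdas(solver, g1_data, g2_data, lambda1_grid, lambda2_grid, **kwargs):
    """Run `solver` on every (lambda1, lambda2) pair and record edge counts.

    Returns {(lambda1, lambda2): (edges in condition 1, edges in condition 2)}.
    """
    summary = {}
    for lam1, lam2 in itertools.product(lambda1_grid, lambda2_grid):
        g_res = solver(g1_data, g2_data, lambda1=lam1, lambda2=lam2, **kwargs)
        summary[(lam1, lam2)] = (count_edges(g_res[0]), count_edges(g_res[1]))
    return summary
```

With ddn3 installed, one could call `sweep_lambdas(ddn.ddn, g1_data, g2_data, [0.1, 0.2, 0.3], [0.0, 0.05, 0.1])` after `from ddn3 import ddn`, or wrap ddn_parallel with `functools.partial(ddn.ddn_parallel, n_process=4)`. The edge counts are only a summary; prior knowledge still decides which setting is suitable.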
Both functions also support warm start: pass the coefficient matrix from a previous DDN call as g_rec_in. However, we have not observed a significant speedup, so warm start is not generally recommended.
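If you do want to try warm start, one way is to fit along a sequence of lambda1 values and feed each result back through g_rec_in. This is a sketch: `warm_start_path` is a hypothetical wrapper, `solver` is expected to accept the same keywords as ddn, and ordering lambda1 from large to small is a common convention for regularization paths rather than a ddn3 requirement.

```python
def warm_start_path(solver, g1_data, g2_data, lambda1_values, lambda2=0.1, **kwargs):
    """Fit DDN for each lambda1 value, starting each fit from the previous one.

    The first call uses the default g_rec_in=() (no warm start); later calls
    receive the previous coefficient array through g_rec_in.
    """
    results = []
    g_rec_in = ()  # default value: no warm start
    for lam1 in lambda1_values:
        g_res = solver(g1_data, g2_data, lambda1=lam1, lambda2=lambda2,
                       g_rec_in=g_rec_in, **kwargs)
        results.append(g_res)
        g_rec_in = g_res  # start the next fit from this solution
    return results
```

Timing this against independent calls on your own data is the simplest way to check whether warm start helps in practice.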
Module Contents
Functions
| ddn_parallel(g1_data, g2_data[, lambda1, ...]) | Run DDN in parallel. |
| ddn(g1_data, g2_data[, lambda1, ...]) | Run DDN. |
- ddn3.ddn.ddn_parallel(g1_data, g2_data, lambda1=0.3, lambda2=0.1, threshold=1e-06, mthd='resi', std_est='std', g_rec_in=(), n_process=1)
Run DDN in parallel.
Let P denote the number of features, N1 the sample size for condition 1, and N2 the sample size for condition 2.
- Parameters:
g1_data (array_like, shape N1 by P) – The data from condition 1
g2_data (array_like, shape N2 by P) – The data from condition 2
lambda1 (float) – DDN parameter lambda1.
lambda2 (float) – DDN parameter lambda2.
threshold (float) – Convergence threshold.
mthd (str) – The DDN solver to use: 'org', 'resi', 'corr', or 'strongrule'.
std_est (str) – The standardization method. The default value is fine in most cases.
g_rec_in (ndarray) – The input coefficient matrix for warm start, e.g., g_res from a previous call. Keep the default () if warm start is not needed.
n_process (int) – Number of cores to use. It should not exceed the number of CPU cores on your machine. If set to 1, no parallelization is used.
- Returns:
g_res – The estimated coefficient array of shape (2, P, P). g_res[0] is for the first condition, and g_res[1] for the second condition.
- Return type:
ndarray
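Both functions expect g1_data of shape N1 by P and g2_data of shape N2 by P, with the same P. A minimal pre-flight check might look like this; `check_ddn_inputs` is a hypothetical helper that only inspects shapes and finiteness, and it does not reproduce any validation ddn3 itself performs.

```python
import numpy as np


def check_ddn_inputs(g1_data, g2_data):
    """Return (N1, N2, P) after checking the shape conventions of ddn/ddn_parallel."""
    g1 = np.asarray(g1_data, dtype=float)
    g2 = np.asarray(g2_data, dtype=float)
    if g1.ndim != 2 or g2.ndim != 2:
        raise ValueError("g1_data and g2_data must be 2-D (samples by features)")
    n1, p1 = g1.shape
    n2, p2 = g2.shape
    if p1 != p2:
        raise ValueError(f"feature counts differ: {p1} vs {p2}")
    # Our own safety check: reject NaN or infinite entries before fitting.
    if not (np.all(np.isfinite(g1)) and np.all(np.isfinite(g2))):
        raise ValueError("data contain NaN or infinite values")
    return n1, n2, p1


rng = np.random.default_rng(0)
print(check_ddn_inputs(rng.standard_normal((40, 10)), rng.standard_normal((35, 10))))  # → (40, 35, 10)
```

A mismatch in P most often means one matrix was passed transposed (features by samples).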
- ddn3.ddn.ddn(g1_data, g2_data, lambda1=0.3, lambda2=0.1, threshold=1e-06, mthd='resi', std_est='std', g_rec_in=(), standard=True)
Run DDN.
Let P denote the number of features, N1 the sample size for condition 1, and N2 the sample size for condition 2.
- Parameters:
g1_data (array_like, shape N1 by P) – The data from condition 1
g2_data (array_like, shape N2 by P) – The data from condition 2
lambda1 (float) – DDN parameter lambda1.
lambda2 (float) – DDN parameter lambda2.
threshold (float) – Convergence threshold.
mthd (str) – The DDN solver to use: 'org', 'resi', 'corr', or 'strongrule'.
std_est (str) – The standardization method. The default value is fine in most cases.
g_rec_in (ndarray) – The input coefficient matrix for warm start, e.g., g_res from a previous call. Keep the default () if warm start is not needed.
- Returns:
g_res – The estimated coefficient array of shape (2, P, P). g_res[0] is for the first condition, and g_res[1] for the second condition.
- Return type:
ndarray
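The returned g_res holds one P by P coefficient matrix per condition. One simple way to turn it into edge lists is sketched below. It rests on assumptions not taken from ddn3: `networks_from_g_res` is a hypothetical helper, a pair (i, j) counts as an edge in a condition when either directed coefficient exceeds `tol` in magnitude, and "differential" here means present in only one condition, which is a simplification of comparing coefficients across conditions.

```python
import numpy as np


def networks_from_g_res(g_res, tol=0.0):
    """Split a DDN coefficient array of shape (2, P, P) into edge sets.

    Returns (common, only_1, only_2): sets of (i, j) tuples with i < j.
    """
    g_res = np.asarray(g_res)
    if g_res.ndim != 3 or g_res.shape[0] != 2 or g_res.shape[1] != g_res.shape[2]:
        raise ValueError("expected an array of shape (2, P, P)")
    edge_sets = []
    for k in range(2):
        adj = np.abs(g_res[k]) > tol
        adj = adj | adj.T  # symmetrize: either direction counts
        i_idx, j_idx = np.nonzero(np.triu(adj, k=1))  # upper triangle, no diagonal
        edge_sets.append(set(zip(i_idx.tolist(), j_idx.tolist())))
    e1, e2 = edge_sets
    return e1 & e2, e1 - e2, e2 - e1
```

Raising `tol` above zero discards very small coefficients, which can make the resulting networks easier to inspect.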