To save samples of the Macau model you can add
save_prefix = "mymodel" when calling
This option will store all samples of the latent vectors, their mean vectors and link matrices to the disk.
Additionally, the global mean-value that Macau adds to all predictions is also stored.
import macau import scipy.io ## loading data ic50 = scipy.io.mmread("chembl-IC50-346targets.mm") ecfp = scipy.io.mmread("chembl-IC50-compound-feat.mm") ## running factorization (Macau) result = macau.macau(Y = ic50, Ytest = 0.2, side = [ecfp, None], num_latent = 32, precision = 5.0, burnin = 100, nsamples = 500, save_prefix = "chembl19")
The saved files for sample
N for the rows are
- Latent vectors
- Latent means:
- Link matrix (beta):
- Global mean value:
chembl19-meanvalue.csv(same for all samples).
Equivalent files for the column latents are stored in
Using the saved model to make predictions¶
These files can be loaded with numpy and used to make predictions.
import numpy as np ## global mean value (common for all samples) meanvalue = np.loadtxt("chembl19-meanvalue.csv").tolist() ## loading sample 1 N = 1 U = np.loadtxt("chembl19-sample%d-U1-latents.csv" % N, delimiter=",") V = np.loadtxt("chembl19-sample%d-U2-latents.csv" % N, delimiter=",") ## predicting Y[0, 7] from sample 1 Yhat_07 = U[:,0].dot(V[:,7]) + meanvalue ## predict the whole matrix from sample 1 Yhat = U.transpose().dot(V) + meanvalue
Note that in Macau the final prediction is the average of the predictions from all samples. This can be accomplished by looping over all of the samples and averaging the predictions.
Using the saved model to predict new rows (compounds)¶
Here we show an example how to make a new prediction for a compound (row) that was not in the dataset, by using its side information and saved link matrices.
import numpy as np import scipy.io ## loading side info for arbitrary compound (can be outside of the training set) xnew = scipy.io.mmread("chembl-IC50-compound-feat.mm").tocsr()[17,:] ## loading sample 1 meanvalue = np.loadtxt("chembl19-meanvalue.csv").tolist() N = 1 lmean = np.loadtxt("chembl19-sample%d-U1-latentmean.csv" % N, delimiter=",") link = np.loadtxt("chembl19-sample%d-U1-link.csv" % N, delimiter=",") V = np.loadtxt("chembl19-sample%d-U2-latents.csv" % N, delimiter=",") ## predicted latent vector for xnew from sample 1 uhat = xnew.dot(link.transpose()) + lmean ## use predicted latent vector to predict activities across columns Yhat = uhat.dot(V) + meanvalue
Again, to make good predictions you would have to change the example to loop over all of the samples (and compute the mean of Yhat’s).