We chose to normalize the input signal by fixing the size of the distance matrix to 128 x 128. This normalization is done through interpolation or extrapolation, depending on whether the input protein is longer or shorter than 128 residues.
Interpolation smooths or averages the excess points of longer proteins, while extrapolation generates additional data points for shorter ones.
We chose 128 because most proteins in the dataset are shorter than 256 residues, and a power of 2 is a convenient matrix size.
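The resizing step described above could be sketched roughly as follows. This is only an illustration, not the code actually used: the post does not say which interpolation scheme is applied, so the sketch assumes plain linear resampling along each axis with `numpy.interp`, and the function name `resize_distance_matrix` is hypothetical. Note that for proteins longer than 128 residues this picks linearly weighted samples rather than computing a true block average.

```python
import numpy as np

TARGET = 128  # fixed matrix size from the post


def resize_distance_matrix(dist, target=TARGET):
    """Resample a square (n x n) distance matrix to (target x target).

    Assumption: linear resampling along each axis; the original post
    does not specify the scheme.
    """
    dist = np.asarray(dist, dtype=float)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("expected a square matrix")
    n = dist.shape[0]

    # Map old and new residue indices onto the same [0, 1] axis.
    old = np.linspace(0.0, 1.0, n)
    new = np.linspace(0.0, 1.0, target)

    # First resample within each row (n x target) ...
    rows = np.array([np.interp(new, old, r) for r in dist])
    # ... then within each column (target x target).
    return np.array([np.interp(new, old, c) for c in rows.T]).T


# Toy example: residues placed on a line, so dist[i, j] = |i - j|.
short = np.abs(np.subtract.outer(np.arange(60), np.arange(60))).astype(float)
long_ = np.abs(np.subtract.outer(np.arange(200), np.arange(200))).astype(float)

short_128 = resize_distance_matrix(short)  # enlarged
long_128 = resize_distance_matrix(long_)   # reduced
```

Both calls return a 128 x 128 array. When a shorter protein is enlarged this way, the new entries are blends of neighbouring measured distances, not measurements themselves.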
I have thought about enlarging the smaller matrices by extrapolating from the values that are present. Although a continuous Markov chain offers some hope, the generated values are still fake. Doing this may artificially introduce noise, which could distort the results. So for now I think we should stick to the superimposition method.