Friday, October 19, 2007

stuck a bit

After a discussion with Prof. Tan, I found that clustering is not a good method for predicting protein structure, because the clustering is based on matrices derived from 3-D structures, and the primary sequences are simply ignored. Unless the cluster analysis shows substantial similarities among the members' primary sequences, we cannot conclude anything about the correlation between primary sequence and 3-D structure. In other words, we cannot predict the structure given the clustering information.

So now I am going to change my goal of predicting protein structure. The workflow should be as follows:
1. Rerun Zeyar's program to get a correct output. Meanwhile, analyze the clusters from the clusters directory.

2. Try splitting the matrices into 10x10 sub-matrices and run one clustering on them, to see if that gives a better clustering. CM -> split -> sub-matrices -> clusters of sub-matrices -> bit vectors -> clusters of bit vectors -> clusters of CM -> cluster analysis -> the application mentioned in Zeyar's paper/thesis.
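
To make step 2 more concrete for myself, here is a rough Python sketch of the first stages (split, then bit vectors). It is only my guess at how this could look, not Zeyar's program: the function names `split_into_blocks` and `to_bit_vector` are made up, the zero-padding at the edges is an assumption, and so is the reading of "bit vector" as one bit per sub-matrix cluster. The clustering step itself is left out, so the labels below are hypothetical.

```python
def split_into_blocks(matrix, size=10):
    """Split a 2-D list into size x size blocks, row by row.

    Assumption: edges that do not fill a whole block are zero-padded.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    blocks = []
    for r in range(0, n_rows, size):
        for c in range(0, n_cols, size):
            block = [
                [
                    matrix[i][j] if i < n_rows and j < n_cols else 0
                    for j in range(c, c + size)
                ]
                for i in range(r, r + size)
            ]
            blocks.append(block)
    return blocks


def to_bit_vector(block_labels, n_clusters):
    """Set bit k if any block of the matrix fell into sub-matrix cluster k."""
    bits = [0] * n_clusters
    for label in block_labels:
        bits[label] = 1
    return bits


# Toy 25x25 matrix with 1s on and next to the diagonal (made-up data).
cm = [[1 if abs(i - j) <= 1 else 0 for j in range(25)] for i in range(25)]
blocks = split_into_blocks(cm)

# Hypothetical cluster labels for some blocks, as if a clustering had run.
bits = to_bit_vector([0, 2, 2], n_clusters=4)
```

The bit vectors of all matrices would then be clustered again to get the clusters of CM. Whether padding or dropping the partial edge blocks is better is something I still need to check.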
