Thursday, March 20, 2008

Google Help : Cheat Sheet

From http://www.google.com/help/cheatsheet.html

Here's a quick list of some of our most popular tools to help refine and improve your search. For additional help with Google Web Search or any other Google product, you can visit our main Google Help page.

OPERATOR EXAMPLE | FINDS PAGES CONTAINING...
vacation hawaii | the words vacation and Hawaii
Maui OR Hawaii | either the word Maui or the word Hawaii
"To each his own" | the exact phrase to each his own
virus -computer | the word virus but NOT the word computer
+sock | only the word sock, and not the plural or any tenses or synonyms
~auto loan | loan info for both the word auto and its synonyms: truck, car, etc.
define:computer | definitions of the word computer from around the Web
red * blue | the words red and blue separated by one or more words
I'm Feeling Lucky | takes you directly to the first web page returned for your query
CALCULATOR OPERATORS | MEANING | TYPE INTO SEARCH BOX
+ | addition | 45 + 39
- | subtraction | 45 - 39
* | multiplication | 45 * 39
/ | division | 45 / 39
% of | percentage of | 45% of 39
^ | raise to a power | 2^5 (2 to the 5th power)
ADVANCED OPERATORS | MEANING | WHAT TO TYPE INTO SEARCH BOX (& DESCRIPTION OF RESULTS)
site: | Search only one website | admission site:www.stanford.edu (Search Stanford Univ. site for admissions info.)
[#]..[#] | Search within a range of numbers | DVD player $100..150 (Search for DVD players between $100 and $150)
link: | Linked pages | link:www.stanford.edu (Find pages that link to the Stanford University website.)
info: | Info about a page | info:www.stanford.edu (Find information about the Stanford University website.)
related: | Related pages | related:www.stanford.edu (Find websites related to the Stanford University website.)

©2008 Google

Wednesday, March 12, 2008

Medicel

"Medicel develops technologies and methods to help scientists turn data into information and knowledge." It is a sophisticated platform in which data from various sources, e.g. HPLC and MS, can be processed. It would be wonderful if it also had a data mining feature for running commonly used machine learning and data mining algorithms. From a developer's perspective, this platform is excellent because it satisfies various daily requirements. It is an impressive piece of work: 100 engineers developed it over 7 years. With moderate training, a junior engineer without much biological background can very efficiently process a tremendous amount of data, in various formats, from biological and medical experiments. The cost of the software is also very high: half a million.

Doctors do not like this software, though, for reasons that I can understand. From a doctor's perspective, a small tool that completes a specific task is good enough. Simplicity is beauty, and too many functions and features are distracting. They would think: "I am a doctor, not a programmer. Give me minimal advice, and I can get the work done."

It is still a valuable tool for a bioinformatician doing case analysis. Some ideas in that software can benefit us in our future plan to design a similar but far simpler platform.

Future projects and project management tools

1. Design a biological platform into which software and programs for analyzing gene expression etc. can be integrated. (Propose to James.)
2. Outlier analysis (try to find its application in a biological context; customer relationship management is a potential application area).
3. A program that accepts the IC values and outputs the desired cut-off point so that gene selection achieves a certain accuracy.
4. An image processing tool to analyse the cancer cell lines.
5. Android (pending).

Potential project management portals are SourceForge.net and Assembla. Wetpaint is a nice project management wiki, but Assembla provides both a wiki and Subversion, which is more suitable for team projects.

Wednesday, February 27, 2008

New clustering algo

I am going to replace the current clustering algorithm, inherited from Zeyar, with neighbor joining. K-means or k-medoids cannot be applied because the points' coordinates are not known; only the distance matrix is known.
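
To illustrate why neighbor joining fits this setting, here is a minimal sketch of the textbook algorithm working from pairwise distances alone. It is not the code used in this project: the function name, the frozenset-keyed distance dictionary and the "node0", "node1" labels for internal nodes are all assumptions made for the example, and it assumes symmetric distances.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree from pairwise distances (textbook neighbor joining).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance (assumed symmetric).
    Returns a list of (node, node, branch_length) edges.
    """
    d = dict(dist)  # working copy, extended with internal nodes as we go
    nodes = list(names)
    edges = []
    next_id = 0

    while len(nodes) > 2:
        n = len(nodes)
        # Total distance from each node to all the others.
        total = {i: sum(d[frozenset((i, j))] for j in nodes if j != i) for i in nodes}
        # Pick the pair that minimises the Q criterion.
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from the joined pair to the new internal node.
        la = 0.5 * dab + (total[a] - total[b]) / (2 * (n - 2))
        lb = dab - la
        u = f"node{next_id}"  # hypothetical label for the internal node
        next_id += 1
        edges += [(a, u, la), (b, u, lb)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((a, k))] + d[frozenset((b, k))] - dab
                )
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Join the last two nodes with a single edge.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

Clusters could then be read off the tree, for example by cutting long internal branches; how to cut is a separate design decision that this sketch does not address.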

Sunday, December 30, 2007

Method to get around the problems

The two problems described in the last post can be worked around as follows:
1. Store the distances separately for each matrix, and read the necessary files when needed.
2. Since the ordering matters, a matrix is picked at random to cluster.
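
The two workarounds above can be sketched roughly as follows. The JSON file layout, the directory argument and the function names are assumptions made for illustration, not the format actually used in this project.

```python
import json
import random
from pathlib import Path


def save_distances(directory, name, distances):
    """Write the distances from one matrix to the others into its own file."""
    path = Path(directory) / f"{name}.json"  # hypothetical one-file-per-matrix layout
    path.write_text(json.dumps(distances))


def load_distances(directory, name):
    """Read one matrix's distance file only when it is needed."""
    path = Path(directory) / f"{name}.json"
    return json.loads(path.read_text())


def pick_random(names, seed=None):
    """Pick the next matrix to cluster at random; a seed makes the run repeatable."""
    return random.Random(seed).choice(sorted(names))
```

Because each file holds the distances from one matrix to all the others, d(A, B) and d(B, A) end up in different files and can both be kept.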

Two problems

I carried on with my own matrix comparison method, which uses a sliding window. Two problems have been identified:
1. I have only computed the distance between A and B where A comes before B by name. The distance between B and A should also be computed.
2. In clustering, the order in which the matrices appear is important. For instance, let A, B and C be three matrices, and assume A is a single-member cluster. The distance between A and C is greater than the threshold (they are not in the same cluster), and d(B,C) < d(A,B) < threshold, where d(B,C) and d(A,B) are the two smallest distances between matrix B and the others. If B appears before C, then B is clustered with A (d(A,B) satisfies the two conditions). If C appears before B, a new cluster is built with the single member C, and when B is examined, B will be clustered with C, since their distance is the smallest.
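
The order dependence in problem 2 can be reproduced with a small sketch. The join rule below is an assumption, chosen to match the A, B, C example rather than copied from the actual program: an item joins the cluster holding its nearest already-seen item, provided it is within the threshold of every member of that cluster. The distances are made up, and taken as symmetric for brevity.

```python
def cluster_in_order(order, dist, threshold):
    """Greedy one-pass clustering; the join rule is an assumption (see above)."""
    clusters = []
    for item in order:
        best = None  # (nearest distance, cluster) of the best eligible cluster
        for cluster in clusters:
            gaps = [dist[frozenset((item, member))] for member in cluster]
            # Eligible only if the item is within the threshold of every member.
            if max(gaps) < threshold and (best is None or min(gaps) < best[0]):
                best = (min(gaps), cluster)
        if best is None:
            clusters.append([item])  # nothing close enough: start a new cluster
        else:
            best[1].append(item)
    return clusters


# Hypothetical distances with d(B, C) < d(A, B) < threshold < d(A, C).
dist = {frozenset("AB"): 0.4, frozenset("BC"): 0.3, frozenset("AC"): 0.9}

print(cluster_in_order(["A", "B", "C"], dist, threshold=0.5))  # [['A', 'B'], ['C']]
print(cluster_in_order(["A", "C", "B"], dist, threshold=0.5))  # [['A'], ['C', 'B']]
```

The same three matrices end up in different clusters depending only on whether B or C is examined first.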

Friday, October 19, 2007

stuck a bit

After a discussion with Prof. Tan, I found that the clustering method is not a good way to predict protein structure, because the clustering is based on matrices derived from 3-D structures, and the primary sequences are simply ignored. Unless, after the clustering analysis, we can find substantial similarities among the members' primary sequences, no conclusion can be drawn about the correlation between the primary sequence and the 3-D structure. In other words, we cannot predict the structure given the clustering information.

So now I am going to change my goal of predicting protein structure. The workflow should be as follows:
1. Rerun Zeyar's program to get a correct output. Meanwhile, analyze the clusters from the clusters directory.

2. Try splitting the matrices into 10x10 matrices, use one clustering on them, and see if that gives a better clustering. CM --split--> sub-matrices -> clusters of sub-matrices -> bit vectors -> clusters of bit vectors -> clusters of CM -> cluster analysis -> the application mentioned in Zeyar's paper/thesis.
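
The first stages of the pipeline in step 2 could look roughly like the sketch below. Two things here are assumptions rather than decisions already made: the blocks are non-overlapping (a sliding window would be another option), and a bit vector has one bit per sub-matrix cluster, set when the CM contains at least one block from that cluster. The cluster labels for the blocks are taken as given.

```python
def split_blocks(matrix, size=10):
    """Split a square matrix (a list of rows) into non-overlapping size x size blocks.

    Trailing rows and columns that do not fill a whole block are dropped
    (an assumption; padding would be another option).
    """
    n = len(matrix) - len(matrix) % size
    return [
        [row[c:c + size] for row in matrix[r:r + size]]
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]


def bit_vector(block_labels, n_clusters):
    """Bit k is 1 if any block of the matrix fell into sub-matrix cluster k."""
    present = set(block_labels)
    return [1 if k in present else 0 for k in range(n_clusters)]
```

For example, a 25x25 matrix gives four 10x10 blocks, and if those blocks were assigned to sub-matrix clusters 0, 2, 2 and 0 out of four clusters, the matrix would be represented by the bit vector [1, 0, 1, 0].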