Parallel Computing System for efficient computation of Molecular Similarity based on Negative Electrostatic Potential: First results
Raul Torres¹
¹ Grupo de Química Teórica - Universidad Nacional de Colombia

Research Seminar, 2009


Methodology Results Conclusions More information

TARIS Method

Figure: A. Isopotential surface size. B. Isopotential value.


Each node saves:


Data set


Classification

The similarity matrix obtained with the GPU computing process (CUDA) was analyzed by hierarchical clustering with the average linkage method (R statistical package).
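As a rough illustration of average linkage, the sketch below merges clusters in plain Python instead of R. It assumes the similarity matrix has already been converted into a dissimilarity matrix; that conversion, the function name `average_linkage`, and the toy matrix are assumptions made for this example and are not taken from the original work.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage (UPGMA).

    dist is a symmetric matrix of pairwise dissimilarities.
    Returns the merge steps as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Toy dissimilarity matrix for three hypothetical molecules.
toy = [[0, 1, 4],
       [1, 0, 6],
       [4, 6, 0]]
for step in average_linkage(toy):
    print(step)
```

This is the same linkage rule that R's `hclust` applies with `method = "average"`; the sketch only records the merge order and does not draw a dendrogram.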


General Representation of molecules

Every node is represented by a pair of [] characters.
When a node has children, each child is written inside the [] of its parent: [ [][] ]
If a weight is associated with a non-leaf node, its value is written right after the opening [: [45,889[78,76[][]][987,5[][]]]
Leaf nodes have no associated weight.

We propose a canonical representation:
Sub-trees with more nodes are written first.
Next, sub-trees with more levels are listed first.
Next, sub-trees with greater weight are listed first.
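The representation rules above can be sketched as a small serializer. This is a minimal sketch under stated assumptions: the `Node` class, the helper names, and the choice to print weights with a decimal point are hypothetical, and the three ordering rules are applied as successive tie-breakers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical container: weight is only used for non-leaf nodes.
    weight: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def size(node):
    """Number of nodes in the sub-tree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


def depth(node):
    """Number of levels in the sub-tree rooted at node."""
    return 1 + max((depth(c) for c in node.children), default=0)


def to_string(node):
    """Serialize a tree into the bracket notation, children in canonical order."""
    if not node.children:
        return "[]"  # leaf nodes carry no weight
    ordered = sorted(
        node.children,
        # More nodes first, then more levels, then greater weight.
        key=lambda c: (-size(c), -depth(c), -(c.weight or 0.0)),
    )
    weight = "" if node.weight is None else format(node.weight, "g")
    return "[" + weight + "".join(to_string(c) for c in ordered) + "]"


tree = Node(1.5, [Node(), Node(2.0, [Node(), Node()])])
print(to_string(tree))  # prints [1.5[2[][]][]]
```

Because the children are sorted before they are written, two trees that differ only in the order of their children produce the same string, which is what makes a plain string comparison meaningful.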


Proposed Kernel

Simple kernel:

$$k(x, y) = \sum_{s \in B} \mathrm{num}_s(x)\, \mathrm{num}_s(y)$$

A more complex kernel:

$$k_w(x, y) = \sum_{s \in B} \mathrm{num}_s(x)\, \mathrm{num}_s(y)\, w_s$$

where $w_x$ and $w_y$ are the respective weights of the x and y trees, and

$$w_s = \begin{cases} 1 & \text{if } w_x = w_y = 0 \\ w_x / w_y & \text{if } w_x \le w_y \\ w_y / w_x & \text{otherwise} \end{cases}$$

B is the set of balanced sub-strings [...]
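To make the weighted kernel concrete, here is a minimal sketch. It assumes that the weight factor is the ratio of the smaller weight to the larger one, that weights are non-negative, and that each molecule has already been reduced to a dictionary mapping every sub-string to a (count, weight) pair. The input format and the function names are hypothetical and are not taken from the original implementation.

```python
def weight_factor(wx, wy):
    """Weight factor w_s for a sub-string shared by trees x and y.

    Assumes non-negative weights; returns 1 when both weights are zero,
    otherwise the smaller weight divided by the larger one.
    """
    if wx == 0 and wy == 0:
        return 1.0
    if wx <= wy:
        return wx / wy
    return wy / wx


def weighted_kernel(x_subs, y_subs):
    """Sum num_s(x) * num_s(y) * w_s over the sub-strings found in both inputs.

    x_subs and y_subs map sub-string -> (count, weight); this layout is an
    assumption made for the example.
    """
    total = 0.0
    for s, (nx, wx) in x_subs.items():
        if s in y_subs:
            ny, wy = y_subs[s]
            total += nx * ny * weight_factor(wx, wy)
    return total


x = {"[[][]]": (1, 2.0), "[[[][]][]]": (1, 5.0)}
y = {"[[][]]": (2, 4.0)}
print(weighted_kernel(x, y))  # prints 1.0
```

With this choice the factor never exceeds 1, so a shared sub-tree contributes fully only when its weights agree in both molecules.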


Proposed Kernel

The weights can be calculated in 9 different ways.
The process reduces to finding the number of balanced sub-strings (in other words, sub-trees) found in both molecules.
Leaf nodes are not counted as sub-trees.
The whole string is considered a sub-string.
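The counting rules above can be sketched for the unweighted case as follows. The sketch assumes both strings are already in canonical form, so that identical sub-trees yield identical sub-strings; the function names are hypothetical.

```python
from collections import Counter


def subtree_strings(s):
    """Yield the balanced sub-string of every non-leaf node in s.

    Leaves ("[]") are skipped and the whole string is included,
    following the rules listed above.
    """
    stack = []
    for i, ch in enumerate(s):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            start = stack.pop()
            sub = s[start:i + 1]
            if sub != "[]":
                yield sub


def simple_kernel(x, y):
    """Sum of num_s(x) * num_s(y) over the sub-strings s shared by x and y."""
    counts_x = Counter(subtree_strings(x))
    counts_y = Counter(subtree_strings(y))
    return sum(n * counts_y[s] for s, n in counts_x.items())


x = "[[[][]][]]"
y = "[[[][]][[][]]]"
print(simple_kernel(x, y))  # prints 2
```

In this example the sub-tree `[[][]]` occurs once in `x` and twice in `y`, and no other sub-string is shared, so the kernel value is 1 × 2 = 2.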


Proposed Kernel

Tree representation: The red and green circles denote sub-trees that appear in both trees. In this case, the simple kernel is 2.


Proposed Kernel

String representation: The gray area corresponds to the green circle in the previous figure, and the red square corresponds to the red circle.


Hardware

CPU: AMD Athlon X2, 64-bit
RAM: 2 GB
GPU: GeForce 8200 M (CUDA-enabled)


Parallel programming considerations

The general process is executed on the CPU. Host code: C++

The string comparison process is performed in parallel. Device code: CUDA for C
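The division of work can be pictured with a small Python sketch in which a thread pool stands in for the GPU. It only illustrates that every pair of molecules is an independent comparison task; how the original code maps comparisons onto CUDA threads is not stated in these slides. The function `similarity_matrix` and the stand-in kernel are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations_with_replacement


def similarity_matrix(items, kernel, workers=4):
    """Fill a symmetric similarity matrix, one independent task per pair."""
    n = len(items)
    pairs = list(combinations_with_replacement(range(n), 2))
    # The sequential part (building the task list) plays the role of host code;
    # the per-pair kernel calls play the role of device code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        values = list(pool.map(lambda p: kernel(items[p[0]], items[p[1]]), pairs))
    matrix = [[0] * n for _ in range(n)]
    for (i, j), value in zip(pairs, values):
        matrix[i][j] = matrix[j][i] = value
    return matrix


def stand_in_kernel(a, b):
    """Placeholder similarity (shorter string length), for illustration only."""
    return min(len(a), len(b))


molecules = ["[[][]]", "[[[][]][]]"]
print(similarity_matrix(molecules, stand_in_kernel))  # prints [[6, 6], [6, 10]]
```

Python threads do not run CPU-bound work truly in parallel, so this shows the task decomposition rather than the speed-up obtained on a GPU.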


Experimental configurations

Variables (used to construct the weight):
Isopotential surface size: ISS
Isosurface value: IV

Factorial design (3² = 9 experiments), with three states per variable:
(N) Not used
(A) Accumulated: the sum of the values of all nodes in the sub-tree gives the weight of the sub-tree
(S) Simple: the value of the root node of the sub-tree gives the weight of the sub-tree


Experimental configurations

W0: There are no weights. Only the structure is important. (N) ISS x (N) IV
W1: Accumulated isopotential surface size. (A) ISS x (N) IV
W2: Accumulated isopotential surface size times accumulated isosurface value. (A) ISS x (A) IV
W3: Simple isopotential surface size. (S) ISS x (N) IV
W4: Accumulated isosurface value. (N) ISS x (A) IV


Experimental configurations

W5: Simple isosurface value. (N) ISS x (S) IV
W6: Accumulated isopotential surface size times simple isosurface value. (A) ISS x (S) IV
W7: Simple isopotential surface size times simple isosurface value. (S) ISS x (S) IV
W8: Simple isopotential surface size times accumulated isosurface value. (S) ISS x (A) IV
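The nine configurations W0 to W8 can be written down as a lookup table of (ISS state, IV state) pairs. The sketch below computes the weight of a sub-tree under each of them. The `Node` class, the field names, and the convention that an unused variable is simply left out of the product (with W0 giving a weight of 0) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical per-node values: isopotential surface size and isosurface value.
    iss: float = 0.0
    iv: float = 0.0
    children: List["Node"] = field(default_factory=list)


# (ISS state, IV state) for each experiment, as listed above.
CONFIGS = {
    "W0": ("N", "N"), "W1": ("A", "N"), "W2": ("A", "A"),
    "W3": ("S", "N"), "W4": ("N", "A"), "W5": ("N", "S"),
    "W6": ("A", "S"), "W7": ("S", "S"), "W8": ("S", "A"),
}


def accumulated(node, attr):
    """Sum of attr over every node of the sub-tree rooted at node."""
    return getattr(node, attr) + sum(accumulated(c, attr) for c in node.children)


def component(node, attr, state):
    """Value contributed by one variable, or None when it is not used."""
    if state == "N":
        return None
    if state == "S":
        return getattr(node, attr)  # root value only
    if state == "A":
        return accumulated(node, attr)
    raise ValueError(f"unknown state: {state}")


def subtree_weight(node, config):
    """Weight of the sub-tree rooted at node under configuration W0..W8."""
    iss_state, iv_state = CONFIGS[config]
    parts = [component(node, "iss", iss_state), component(node, "iv", iv_state)]
    parts = [p for p in parts if p is not None]
    if not parts:
        return 0.0  # W0: structure only, no weight
    weight = 1.0
    for p in parts:
        weight *= p
    return weight


root = Node(2.0, 3.0, [Node(1.0, 0.5), Node(4.0, 1.5)])
for name in CONFIGS:
    print(name, subtree_weight(root, name))
```

The table has exactly nine entries because each of the two variables takes one of three states.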


Experiment W3


Experiment W8


Experiment W7


Experiment W2


Experiment W0


Execution time

In general, the execution time is approximately 2 seconds.


Conclusions

We don't need to accumulate the sizes of the children.
In general terms, the kernel method used achieves a good classification.
The assumption that a string kernel can be applied to tree-structured data was verified.
The next efforts of this research will focus on applying a kernel that uses co-rooted trees and a more robust string representation, the suffix tree.
Using CUDA as a programming environment has allowed us to perform several concurrent operations faster than in the serial paradigm.
Without the need for a cluster of computers, GPU computing offers tremendous computational power at low cost.
