HPCs: Which, and how?


Opportunistic resources

Some centers have computing resources which they are willing to contribute to LHCb, but which are not part of WLCG. LHCb would like to exploit all of them in an OPPORTUNISTIC way.


Opportunistic means...
Opportunistic, in THEORY, means:
● from time to time: "grab it while you can" (but sometimes we are granted a quota)
● for "free"
● with no guaranteed support

Not only volunteer computing (BOINC):
● Local clusters?
● HPCs?


HPCs for LHCb?
● There are many HPCs out there
○ many are different from one another
○ quite a different paradigm from the HTC we do on "the Grid"
■ ...and we use HPCs "like HTCs"
○ not all of them are so heavily used
■ utilization may not be super-high
■ and, especially, not continuous
○ and to some of them, we can have access


Not as many as ATLAS…
… anyway, we already run jobs in production on 2 HPCs
● and we'd like to add at least one more

DIRAC.OSC.us
osc.edu
4 GB/core guaranteed
~easy setup!

LCG.CSCS-HPC.ch
cscs.ch, the Swiss National Supercomputing Centre (3rd in the TOP500 list)
An LCG site, for us (ARC CE + SLURM)
… so this was "transparent" for us


Schematic view of “the Grid”


Job running on “the Grid”


~easy integration when:
● WNs have inbound/outbound connectivity
● LHCb CVMFS is mounted on the WNs
● SLC6 "compatible"
● at least 2 GB/core
● x86

This is the case for OSC and CSCS (a quick probe of these prerequisites is sketched below).

When some of the requirements above are not met, we can try to work around them, but this requires dedicated work (and anyway it may not be possible, case by case).
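As a rough illustration of these prerequisites, here is a minimal probe one could run on a candidate worker node. It is only a sketch: the checks, the CVMFS path and the endpoint used for the connectivity test are assumptions, not part of LHCbDIRAC.

#!/usr/bin/env python
"""Sketch: probe a worker node for the prerequisites listed above."""
import os
import platform
import socket


def has_outbound_connectivity(host="lhcb-portal-dirac.cern.ch", port=443, timeout=5):
    """Try to open a TCP connection to a central service (endpoint is an example)."""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False


def has_lhcb_cvmfs(path="/cvmfs/lhcb.cern.ch"):
    """The LHCb software stack is expected to be mounted under /cvmfs."""
    return os.path.isdir(path)


def memory_per_core_gb():
    """Total RAM divided by the number of cores, in GB (Linux only)."""
    with open("/proc/meminfo") as meminfo:
        mem_kb = int(meminfo.readline().split()[1])  # first line is MemTotal
    return mem_kb / 1024.0 / 1024.0 / os.cpu_count()


if __name__ == "__main__":
    print("outbound connectivity:", has_outbound_connectivity())
    print("LHCb CVMFS mounted   :", has_lhcb_cvmfs())
    print("memory per core (GB) :", round(memory_per_core_gb(), 1), "(want >= 2)")
    print("x86_64 architecture  :", platform.machine() == "x86_64")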

Santos Dumont (LNCC)
HPC at LNCC

sdumont: nicely documented... in Portuguese only :)

504 B710 computing nodes (thin nodes), where each node has:
● 2 x Intel Xeon E5-2695v2 Ivy Bridge CPUs, 2.4 GHz
● 24 cores (12 per CPU), totaling 12,096 cores
● 64 GB DDR3 RAM

198 B715 computing nodes (thin nodes) with K40 GPUs, where each node has:
● 2 x Intel Xeon E5-2695v2 Ivy Bridge CPUs, 2.4 GHz
● 24 cores (12 per CPU), totaling 4,752 cores
● 64 GB DDR3 RAM
● 2 x Nvidia K40 (GPU device)
...

Santos Dumont (LNCC) /2
... 54 B715 computing nodes (thin nodes) with Xeon Phi co-processors, where each node has:
● 2 x Intel Xeon E5-2695v2 Ivy Bridge CPUs, 2.4 GHz
● 24 cores (12 per CPU), totaling 1,296 cores
● 64 GB DDR3 RAM
● 2 x Xeon Phi 7120 (MIC device)

1 MESCA2 computing "fat" node:
● 16 x Intel Ivy Bridge CPUs, 2.4 GHz
● 240 cores (15 per CPU)
● 6 TB of RAM

The 756 nodes of SDumont are interconnected by an Infiniband FDR network, with the following technical characteristics:
● 1,944 ports, 58 Gb/s and 0.7 µs per port
● total throughput = 112,752 Gb/s, flow per port = 137 million messages per second
...

Santos Dumont (LNCC) /3
… Finally, SDumont has a Lustre parallel file system, integrated with the Infiniband network, with a gross storage capacity of 1.7 PB, as well as a secondary file system with a gross capacity of 640 TB.

And the worker nodes have no external connectivity. We have a friendly deal to grab some of its resources. And they installed CVMFS for us.

WNs within “closed doors”


WNs within “closed doors”

And we have been allowed to do it.


Zooming in... LHCbDIRAC v9r0

[Architecture diagram. Labelled components: a Login node with a DIRAC server installation; the central DIRAC services; a Site Director; a Gateway; a pilot wrapper; a DIRAC SE proxy (?); Worker Nodes running a pilot and Jobs; a RequestExecutingAgent; a ReqManager; a DIRAC SE; the Computing site; external SEs.]

The rest looks quite similar to our BOINC server setup (without security complications)
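To make the pilot path concrete: on a site like this, pilots have to be handed to the local batch system (SLURM) from the login/gateway node. The sketch below only illustrates that step; it is not the actual LHCbDIRAC Site Director code, and the wrapper path, partition name and resource requests are made-up placeholders.

#!/usr/bin/env python
"""Sketch: hand a pilot wrapper to SLURM from the login node (illustration only)."""
import subprocess
import textwrap


def submit_pilot_wrapper(wrapper_path, partition="cpu", cores=1, mem_per_core_mb=2048):
    """Wrap the pilot wrapper in a SLURM batch script and submit it with sbatch."""
    batch_script = textwrap.dedent(f"""\
        #!/bin/bash
        #SBATCH --job-name=lhcb-pilot
        #SBATCH --partition={partition}
        #SBATCH --ntasks=1
        #SBATCH --cpus-per-task={cores}
        #SBATCH --mem-per-cpu={mem_per_core_mb}M
        # The pilot wrapper sets up the environment (e.g. from CVMFS) and starts
        # the DIRAC pilot, which then matches and runs the actual jobs.
        bash {wrapper_path}
        """)
    # sbatch reads the batch script from stdin and prints the assigned job id
    result = subprocess.run(
        ["sbatch"], input=batch_script, text=True, capture_output=True, check=True
    )
    return result.stdout.strip()  # e.g. "Submitted batch job 12345"


if __name__ == "__main__":
    print(submit_pilot_wrapper("/home/lhcb/pilots/pilot-wrapper.sh"))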

Status and To-Do

Status:
● We have a "personal" login, and I used it to simply match a "hello world" job (a submission sketch follows below)

To-Do:
● First "installation"
● Testing, "certification"
● Needs administration
○ yet another installation
○ not only babysitting
○ experience shows we'll probably see scaling issues
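For reference, submitting such a "hello world" test job with the standard DIRAC job API looks roughly like this. The DIRAC site name used for the destination is an assumption (SDumont's site name is not given on these slides), and the import/parse idiom may differ slightly between DIRAC versions.

# Sketch: submit a "hello world" test job with the DIRAC job API.
from DIRAC.Core.Base import Script
Script.parseCommandLine()

from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Interfaces.API.Job import Job

job = Job()
job.setName("hello-world-test")
job.setExecutable("/bin/echo", arguments="hello world")
job.setDestination("DIRAC.SDumont.br")  # hypothetical site name

result = Dirac().submitJob(job)
print(result)  # S_OK with the job ID, or S_ERROR with the failure reason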

Questions?
