Performance Issues and Optimizations for Block-level Network Storage

M. Marazakis, V. Papaefstathiou and A. Bilas

Networked Storage using Commodity Hardware

Motivation:
– Primary networked storage subsystems
– Consolidation of storage in one subsystem
– Single interconnect for application and storage nodes

Concerns:
– Cost & End-to-end throughput demands
– Transparent access to storage

State-of-the-art:
– Specialized controllers and interconnects
– Expensive and difficult to scale
– High incremental cost for increase in performance

Challenges for Commodity:
• Systems software overheads
• Increased complexity
• Hardware-software orchestration
• Performance gap

Approach:
• Minimal NIC architecture, 10 Gbps
  – RDMA-Write descriptors, request queue, Notification capabilities
• Block-level remote I/O protocol (kernel-space)

Objectives:
• Understand the performance gap
  – By measurements on real system
• Address the associated problems
  – By applying targeted optimizations

Performance-limiting Factors:
• %CPU util at both Initiator & Target (#IRQs)
• Inefficient use of NIC (# small RDMAs)
• Initiator & Target execute either send-path or recv-path (exclusively / ctxsw via IRQ)
• Disk & network transfers interfere at Target
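To give a feel for why interrupt counts matter at these speeds, the following back-of-the-envelope sketch converts a sustained throughput and request size into a request rate, and then into an interrupt rate. The function names, the one-IRQ-per-request default and the coalescing factor `batch` are illustrative assumptions, not figures taken from the measured system.

```python
def requests_per_second(throughput_mb_s: float, request_kb: int) -> float:
    """I/O requests per second needed to sustain a throughput (1 MB = 1024 KB)."""
    return throughput_mb_s * 1024 / request_kb


def interrupts_per_second(req_rate: float, irqs_per_request: int = 1,
                          batch: int = 1) -> float:
    """Interrupt rate, assuming each request raises `irqs_per_request` IRQs
    and `batch` of them are coalesced into a single interrupt (hypothetical)."""
    return req_rate * irqs_per_request / batch


# Example: 400 MB/sec with 64 KB requests, with and without coalescing.
rate = requests_per_second(400, 64)
unbatched = interrupts_per_second(rate)
batched = interrupts_per_second(rate, batch=8)
```

Under these assumptions the interrupt rate falls in direct proportion to the coalescing factor, which is the effect that reducing #IRQs is after.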

Experimental System Platform

Experimental Results

[Figure: I/O Throughput, Sequential READ. x-axis: I/O Request Size (KB), 64 to 512; y-axis: MB/sec; series: LOCAL(8_SATA_RAID0), REMOTE(8_SATA_RAID0) BASE, REMOTE(8_SATA_RAID0) OPT, REMOTE(RAMDISK) OPT]

[Figure: I/O Throughput, Sequential WRITE. x-axis: I/O Request Size (KB), 64 to 512; y-axis: MB/sec; series: LOCAL(8_SATA_RAID0), REMOTE(8_SATA_RAID0) BASE, REMOTE(8_SATA_RAID0) OPT, REMOTE(RAMDISK) OPT]

Optimizations:
• IRQ batching at both Initiator & Target
• Aggregation of consecutive small RDMAs
• Asynchronous event processing in the send-path
• Polling to capture “events” as soon as possible after completing normal send-path processing
  – Both at Initiator & Target
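The aggregation of consecutive small RDMAs listed above can be illustrated with a minimal sketch. It is not the authors' kernel implementation: the `Rdma` record, the `aggregate` function and the `max_len` limit are hypothetical names, and the sketch assumes that two transfers may be merged whenever their remote address ranges are contiguous (a real NIC would also need the local buffers to be contiguous or described by a scatter-gather list).

```python
from dataclasses import dataclass


@dataclass
class Rdma:
    remote_addr: int  # destination address of the transfer (hypothetical field)
    length: int       # transfer size in bytes


def aggregate(transfers, max_len):
    """Merge runs of address-contiguous transfers, capping each at max_len bytes."""
    merged = []
    for t in transfers:
        last = merged[-1] if merged else None
        contiguous = (last is not None
                      and last.remote_addr + last.length == t.remote_addr)
        if contiguous and last.length + t.length <= max_len:
            # Extend the previous transfer instead of issuing a new one.
            merged[-1] = Rdma(last.remote_addr, last.length + t.length)
        else:
            merged.append(Rdma(t.remote_addr, t.length))
    return merged


# Three contiguous 4 KB pages followed by one page elsewhere.
pages = [Rdma(0, 4096), Rdma(4096, 4096), Rdma(8192, 4096), Rdma(65536, 4096)]
coalesced = aggregate(pages, max_len=65536)
```

In this example the four page-sized transfers collapse into two, so the NIC handles fewer, larger operations for the same payload.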

Summary of Results

Quantify overheads at high network speeds & examine various protocol optimization techniques on a real system

I/O Throughput: High-score Summary (MB/sec)

configuration          reference  achieved
FAKE(I)                626        560
FAKE(I + T)            560        550
REMOTE(RAMDISK)        550        474
REMOTE(8_SATA_RAID0)   450        290
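As a reading aid for the summary figures above, the short sketch below expresses each achieved throughput as a percentage of its reference value. The numbers are the ones reported in the summary; the dictionary layout and the `efficiency` helper are illustrative assumptions.

```python
# (reference, achieved) throughput in MB/sec, as reported in the summary.
results = {
    "FAKE(I)": (626, 560),
    "FAKE(I + T)": (560, 550),
    "REMOTE(RAMDISK)": (550, 474),
    "REMOTE(8_SATA_RAID0)": (450, 290),
}


def efficiency(reference: float, achieved: float) -> float:
    """Achieved throughput as a percentage of the reference, to one decimal."""
    return round(100.0 * achieved / reference, 1)


percentages = {name: efficiency(ref, got) for name, (ref, got) in results.items()}

for name, pct in percentages.items():
    print(f"{name:22s} {pct:5.1f}% of reference")
```

Seen this way, the largest remaining gap is in the disk-backed REMOTE(8_SATA_RAID0) configuration, where achieved throughput is roughly two thirds of the reference.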

References
• “Efficient remote block-level I/O over an RDMA-capable NIC”, Proceedings of ACM ICS’06
• “Experiences from Debugging a PCIX-based RDMA-capable NIC”, Proceedings of RAIT’06 workshop (in conjunction with IEEE Cluster’06)
• “Optimization and Bottleneck Analysis of Network Block I/O in Commodity Storage Systems”, Proceedings of ACM ICS’07

Foundation for Research and Technology – Hellas (FORTH) Institute of Computer Science (ICS)

Computer Architecture & VLSI Laboratory (CARV)
