CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data
Emerging large-scale distributed storage systems are faced with the task of distributing petabytes of data among tens or hundreds of thousands of storage devices. Such systems must evenly distribute data and workload to efficiently utilize available resources and maximize system performance, while facilitating system growth and managing hardware failures. We have developed CRUSH, a scalable pseudo-random data distribution function designed for distributed object-based storage systems that efficiently maps data objects to storage devices without relying on a central directory. Because large systems are inherently dynamic, CRUSH is designed to facilitate the addition and removal of storage while minimizing unnecessary data movement. The algorithm accommodates a wide variety of data replication and reliability mechanisms and distributes data according to user-defined policies that enforce separation of replicas across failure domains.
1. Introduction
Object-based storage is an emerging architecture that promises improved manageability, scalability, and performance [Azagury et al. 2003]. Unlike conventional block-based hard drives, object-based storage devices (OSDs) manage disk block allocation internally, exposing an interface that allows others to read and write variably-sized, named objects. In such a system, each file’s data is typically striped across a relatively small number of named objects distributed throughout the storage cluster. Objects are replicated across multiple devices (or employ some other data redundancy scheme) in order to protect against data loss in the presence of failures. Object-based storage systems simplify data layout by replacing large block lists with small object lists and distributing the low-level block allocation problem. Although this vastly improves scalability by reducing file allocation metadata and complexity, the fundamental task of distributing data among thousands of storage devices, typically with varying capacities and performance characteristics, remains.
Most systems simply write new data to underutilized devices. The fundamental problem with this approach is that data is rarely, if ever, moved once it is written. Even a perfect distribution will become imbalanced when the storage system is expanded, because new disks either sit empty or contain only new data. Either old or new disks may be busy, depending on the system workload, but only the rarest of conditions will utilize both equally to take full advantage of available resources.
A robust solution is to distribute all data in a system randomly among available storage devices. This leads to a probabilistically balanced distribution and uniformly mixes old and new data together. When new storage is added, a random sample of existing data is migrated onto new storage devices to restore balance. This approach has the critical advantage that, on average, all devices will be similarly loaded, allowing the system to perform well under any potential workload [Santos et al. 2000]. Furthermore, in a large storage system, a single large file will be randomly distributed across a large set of available devices, providing a high level of parallelism and aggregate bandwidth. However, simple hash-based distribution fails to cope with changes in the number of devices, incurring a massive reshuffling of data. Further, existing randomized distribution schemes that decluster replication by spreading each disk’s replicas across many other devices suffer from a high probability of data loss from coincident device failures.
We have developed CRUSH (Controlled Replication Under Scalable Hashing), a pseudo-random data distribution algorithm that efficiently and robustly distributes object replicas across a heterogeneous, structured storage cluster. CRUSH is implemented as a pseudo-random, deterministic function that maps an input value, typically an object or object group identifier, to a list of devices on which to store object replicas. This differs from conventional approaches in that data placement does not rely on any sort of per-file or per-object directory; CRUSH needs only a compact, hierarchical description of the devices comprising the storage cluster and knowledge of the replica placement policy. This approach has two key advantages: first, it is completely distributed, such that any party in a large system can independently calculate the location of any object; and second, what little metadata is required is mostly static, changing only when devices are added or removed.
CRUSH is designed to optimally distribute data to utilize available resources, efficiently reorganize data when storage devices are added or removed, and enforce flexible constraints on object replica placement that maximize data safety in the presence of coincident or correlated hardware failures.
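The reshuffling cost of simple hash-based distribution noted above can be illustrated with a small experiment. The following is a minimal sketch, not the CRUSH algorithm: it compares placement by hash modulo the device count against rendezvous (highest-random-weight) hashing, used here only as a stand-in for a deterministic, directory-free placement function that tolerates growth. All names (`stable_hash`, `place_modulo`, `place_hrw`, `fraction_moved`) and the object and device counts are hypothetical and chosen for illustration.

```python
import hashlib


def stable_hash(*parts: str) -> int:
    """Deterministic 64-bit hash, identical on every host that computes it."""
    data = "\x00".join(parts).encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def place_modulo(obj: str, num_devices: int) -> int:
    """Simple hash-based placement: device index = hash mod device count."""
    return stable_hash(obj) % num_devices


def place_hrw(obj: str, num_devices: int) -> int:
    """Rendezvous hashing (an illustrative stand-in, NOT CRUSH):
    choose the device whose (object, device) hash is largest."""
    return max(range(num_devices), key=lambda dev: stable_hash(obj, str(dev)))


def fraction_moved(place, objects, before: int, after: int) -> float:
    """Fraction of objects whose device changes when the cluster grows."""
    moved = sum(1 for o in objects if place(o, before) != place(o, after))
    return moved / len(objects)


# Hypothetical workload: 20,000 objects, cluster grows from 10 to 11 devices.
objects = [f"object-{i}" for i in range(20000)]
moved_modulo = fraction_moved(place_modulo, objects, 10, 11)
moved_hrw = fraction_moved(place_hrw, objects, 10, 11)

print(f"modulo:     {moved_modulo:.1%} of objects moved")
print(f"rendezvous: {moved_hrw:.1%} of objects moved")
```

With a uniform hash, modulo placement is expected to relocate about 10/11 of the objects when an eleventh device is added, whereas the rendezvous variant relocates only about 1/11, all of it onto the new device. Both functions can be evaluated independently by any party that knows the device count, which mirrors the directory-free property described above; weighting, hierarchy, and replica separation are outside the scope of this sketch.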
A wide variety of data safety mechanisms are supported, including n-way replication (mi
