6.4 ARCHITECTURAL CHOICES
Analytical environments are deployed in different architectural models. Even on parallel platforms, many databases are built on a shared-everything approach, in which the persistent storage and memory components are all shared by the different processing units. A shared-disk approach may have isolated processors, each with its own memory, but the persistent storage on disk is still shared across the system.
These types of architectures are layered on top of SMP (symmetric multiprocessing) machines. While some applications are suited to this approach, the sharing creates bottlenecks, because all I/O and memory requests are transferred (and satisfied) over the same bus. As more processors are added, the synchronization and communication needs increase exponentially, and the bus is less able to handle the increased need for bandwidth. Unless that need for bandwidth is satisfied, there will be limits to the degree of scalability.
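The bandwidth limit can be illustrated with a deliberately simple model. The sketch below assumes a single bus with a fixed bandwidth that is divided evenly among processors; the function names and the fair-share rule are illustrative assumptions, and the model ignores synchronization overhead, so it likely understates the bottleneck described above.

```python
def bus_utilization(processors: int, demand_per_processor: float,
                    bus_bandwidth: float) -> float:
    """Fraction of the shared bus consumed if every processor issues its full demand.

    A value above 1.0 means the bus cannot satisfy all requests.
    """
    return processors * demand_per_processor / bus_bandwidth


def effective_bandwidth_per_processor(processors: int, demand_per_processor: float,
                                      bus_bandwidth: float) -> float:
    """Bandwidth one processor actually receives (assumed fair-share model).

    Each processor gets its full demand until the bus saturates; after that,
    the fixed bus bandwidth is split evenly among all processors.
    """
    fair_share = bus_bandwidth / processors
    return min(demand_per_processor, fair_share)


if __name__ == "__main__":
    # Hypothetical units: each processor wants 2.0, the bus carries 16.0.
    for n in (4, 8, 16, 32):
        print(n, bus_utilization(n, 2.0, 16.0),
              effective_bandwidth_per_processor(n, 2.0, 16.0))
```

Under these assumptions the bus saturates at eight processors; every processor added after that point reduces the bandwidth available to each of the others.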
In contrast, in a shared-nothing approach, each processor has its own dedicated disk storage. This approach, which maps nicely to an MPP (massively parallel processing) architecture, is not only more suitable to discrete allocation and distribution of the data, but also enables more effective parallelization, and consequently does not introduce the same kind of bus bottlenecks from which the SMP/shared-memory and shared-disk approaches suffer.
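As a rough illustration of the shared-nothing idea, the sketch below places each row on exactly one node, lets every node aggregate only its own rows, and then merges the small partial results. The hash-modulo placement rule, the function names, and the sample rows are assumptions made for this example; real MPP systems use their own distribution schemes.

```python
from collections import defaultdict


def partition_rows(rows, key, node_count):
    """Assign each row to exactly one node by hashing its distribution key."""
    nodes = [[] for _ in range(node_count)]
    for row in rows:
        nodes[hash(row[key]) % node_count].append(row)
    return nodes


def local_sum(node_rows, group_key, value_key):
    """Aggregate using only the rows stored on one node (no shared state)."""
    totals = defaultdict(float)
    for row in node_rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def merge_sums(partials):
    """Combine the per-node partial aggregates into the final answer."""
    merged = defaultdict(float)
    for partial in partials:
        for group, value in partial.items():
            merged[group] += value
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical sample data: ten orders split across two regions.
    rows = [{"customer_id": i, "region": ("east", "west")[i % 2], "amount": float(i)}
            for i in range(10)]
    nodes = partition_rows(rows, "customer_id", node_count=3)
    print(merge_sums(local_sum(n, "region", "amount") for n in nodes))
```

Only the partial sums travel between nodes, which is why this style of allocation avoids routing every row over a common bus.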

6.5 CONSIDERING PERFORMANCE CHARACTERISTICS
When it comes to big data, both the software and the hardware approaches are appealing to nascent large-scale data analysts. However, there may be conflicting perceptions of the benefits of selecting one of these approaches over the other, and Table 6.2 looks at how each supports some of the desired characteristics listed earlier in this book.

6.6 ROW- VERSUS COLUMN-ORIENTED DATA LAYOUTS AND APPLICATION PERFORMANCE
Awareness of the different latency costs associated with the different levels of the memory hierarchy informs the different ways that data can be stored and shared, especially because the alignment and orientation of data on disk can significantly impact the performance of analytical applications. Most traditional database systems employ a row-oriented layout, in which all the values associated with a specific row are laid out consecutively in memory. That layout may work well for transaction processing applications that focus on updating specific records associated with a limited number of transactions (or transaction steps) at a time.
On the other hand, big data analytics applications scan, aggregate, and summarize over massive datasets. These operations are manifested as algorithmic scans that are performed using multiway joins; accessing whole rows at a time when only the values of a smaller set of columns are needed may flood the network with extraneous data that is not immediately needed, and ultimately will increase the execution time.
In other words, analytical applications and queries need to access only the data elements required to satisfy join conditions. With row-oriented layouts, the entire record must be read in order to access the required attributes, so significantly more data is read than is needed to satisfy the request. Also, the row-oriented layout is often misaligned with the characteristics of the different types of memory systems (core, cache, disk, etc.), leading to increased access latencies. Consequently, row-oriented data layouts will not enable the types of joins or aggregations typical of analytic queries to execute with the anticipated level of performance (Figure 6.1).
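The extra data read by a row-oriented scan can be estimated with simple arithmetic. The sketch below assumes fixed-width columns and a hypothetical orders table; the column names, widths, and row count are invented for illustration and are not taken from the text.

```python
def bytes_read_row_layout(row_count, column_widths):
    """Row layout: scanning reads every column of every row."""
    return row_count * sum(column_widths.values())


def bytes_needed(row_count, column_widths, needed_columns):
    """Bytes the query actually requires: only the referenced columns."""
    return row_count * sum(column_widths[name] for name in needed_columns)


def read_amplification(column_widths, needed_columns):
    """How many times more data a full-row scan reads than the query needs."""
    needed = sum(column_widths[name] for name in needed_columns)
    return sum(column_widths.values()) / needed


if __name__ == "__main__":
    # Hypothetical fixed column widths in bytes (128 bytes per row in total).
    widths = {"order_id": 8, "customer_id": 8, "order_date": 4,
              "status": 4, "amount": 8, "comment": 96}
    needed = ["customer_id", "amount"]
    print(bytes_read_row_layout(1_000_000, widths))   # full-row scan
    print(bytes_needed(1_000_000, widths, needed))    # columns the query uses
    print(read_amplification(widths, needed))
```

With these assumed widths, a query that touches two 8-byte columns reads eight times more data than it needs when whole rows must be fetched.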
That is why a number of software appliances for big data use a database management system with an alternate, columnar layout that can help to reduce the negative performance impacts of data latency that plague databases with a row-oriented data layout. The values for each column can be stored separately, and because of this, for any query, the system is able to selectively access the specific column values requested to evaluate the join conditions. Instead of requiring separate indexes to tune queries, the data values themselves within each column form the index. This speeds up data access and reduces the overall database footprint, while dramatically improving query performance (Figure 6.2).
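A minimal in-memory sketch of column-wise storage is shown below. The `ColumnStore` class, its `columns_read` bookkeeping, and the sample orders are hypothetical and exist only to show that an aggregation touches just the columns it names; production columnar engines add compression, encoding, and on-disk organization that are not modeled here.

```python
class ColumnStore:
    """Toy column store: one list of values per column (illustrative only)."""

    def __init__(self, rows):
        # Pivot row dictionaries into per-column lists.
        # Assumes every row has the same set of keys.
        self._columns = {}
        for row in rows:
            for name, value in row.items():
                self._columns.setdefault(name, []).append(value)
        self.columns_read = set()  # records which columns a query touched

    def column(self, name):
        """Return one column's values and note that it was accessed."""
        self.columns_read.add(name)
        return self._columns[name]

    def sum_by(self, group_column, value_column):
        """Sum value_column per distinct group, reading only those two columns."""
        totals = {}
        for group, value in zip(self.column(group_column), self.column(value_column)):
            totals[group] = totals.get(group, 0.0) + value
        return totals


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "region": "east", "amount": 10.0, "comment": "gift wrap"},
        {"order_id": 2, "region": "west", "amount": 20.0, "comment": "expedite"},
        {"order_id": 3, "region": "east", "amount": 30.0, "comment": "none"},
    ]
    store = ColumnStore(orders)
    print(store.sum_by("region", "amount"))
    print(sorted(store.columns_read))
```

The `order_id` and `comment` values are never read by the aggregation, which is the selective access the columnar layout is meant to provide.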
The simplicity of the columnar approach provides many benefits, especially for those seeking a high-performance environment to meet the growing needs of extremely large analytic datasets, as can be seen by the example facets of performance discussed in Table 6.3.

6.7 CONSIDERING PLATFORM ALTERNATIVES
When considering the different ways of deploying an analytics environment, the key decisions for investing in infrastructure focus on how the platform best meets the expected performance needs. One must be willing to specify key measures for system performance in order to properly assess the scalability requirements of the intended analytical applications and to help select a specific architectural approach.
The benefits of using hardware appliances for big data center on engineering and integration. They are engineered for high-performance reporting and analytics, yet have a flexible architecture allowing integrated components to be configured to meet specific application needs.

And while there is a capital investment in machinery, hardware appliances are low cost when compared to massive data warehouse hardware systems.
One benefit of using software appliances, meanwhile, is that they can take advantage of low-cost commodity hardware components. In addition, the reliance on commodity hardware allows a software appliance to be elastic and extensible.
However, you must consider all aspects of the performance needs of the different types of applications: data scalability, user scalability, access and loading speed, the need for workload isolation, reliance on parallelization and optimization, reliability in the presence of failures, the dependence on storage duplication or data distribution and replication, among other performance expectations. Then examine how the performance needs of the different types of applications are addressed by each of the architectures. This will provide a measurable methodology for assessing technology suitability.
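One way to make such an assessment measurable is a weighted score per architecture. The sketch below is only one possible approach: the `suitability` function, the weights, and the ratings are placeholder assumptions chosen to show the mechanics, not measured results for any real product.

```python
def suitability(weights, ratings):
    """Weighted average of ratings for one architecture.

    weights: how much each performance need matters to the application.
    ratings: how well the architecture addresses each need (same keys).
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"no rating for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


if __name__ == "__main__":
    # Placeholder weights for a hypothetical application.
    weights = {"data_scalability": 3, "user_scalability": 1,
               "loading_speed": 2, "workload_isolation": 1}
    # Placeholder ratings on a 1 to 5 scale; replace with your own assessment.
    candidates = {
        "architecture_a": {"data_scalability": 4, "user_scalability": 2,
                           "loading_speed": 3, "workload_isolation": 5},
        "architecture_b": {"data_scalability": 3, "user_scalability": 4,
                           "loading_speed": 4, "workload_isolation": 2},
    }
    for name, ratings in candidates.items():
        print(name, round(suitability(weights, ratings), 2))
```

The scores are only as meaningful as the weights and ratings behind them, so the value of the exercise lies mainly in making those judgments explicit and comparable.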

6.8 THOUGHT EXERCISES
Given the premise of approaches to appliance architectures, here are some questions and exercises to ponder:
• For the three most typical big data application types, describe your expectations for data storage needs, the type of appliance that is best suited, and the data management needs.
• Develop a scoring scale between 1 and 5 (where 1 represents a low need and 5 represents a great need) for each of the variables considered for storage requirements (extensibility, accessibility, fault tolerance, I/O speed, integratability). Rate your three applications using your defined scale.
• What are the variables you would consider for assessing the comparable costs and benefits of a software appliance versus a hardware appliance?



