Introduction to High-Performance Appliances for Big Data Management

Big data analytics applications combine the means for developing and implementing algorithms that must access, consume, and manage data. In essence, the framework relies on a technology ecosystem of components that must be combined in a variety of ways to address each application’s requirements, which can range from general information technology (IT) performance scalability to detailed performance improvement objectives associated with specific algorithmic demands. For example, some algorithms expect massive amounts of data to be immediately available, necessitating large amounts of core memory. Other applications may need numerous iterative exchanges of data between different computing nodes, which would require high-speed networks.
The big data technology ecosystem stack may include:
• Scalable storage systems that are used for capturing, manipulating, and analyzing massive datasets.
• A computing platform, sometimes configured specifically for largescale analytics, often composed of multiple (typically multicore) processing nodes connected via a high-speed network to memory and disk storage subsystems. These are often referred to as appliances.
• A data management environment, whose configurations may range from a traditional database management system scaled to massive parallelism to databases configured with alternative distributions and layouts, to newer graph-based or other NoSQL data management schemes.
• An application development framework to simplify the process of developing, executing, testing, and debugging new application code. This framework should include programming models, development tools, program execution and scheduling, and system configuration and management capabilities.

• Layered packages of scalable analytics methods (including statistical and data mining models) that can be configured by analysts and other business consumers to help improve the ability to design and build analytical and predictive models.
• Oversight and management processes and tools that are necessary to ensure alignment with the enterprise analytics infrastructure and collaboration among the developers, analysts, and other business users.
In this chapter, we examine the storage, appliance, and data management aspects of this ecosystem.
6.1 USE CASES
To motivate the discussion, it is worth looking at four typical big data analytics use cases chosen from among the characteristic implementations discussed in Chapter 2:
1. Targeted customer marketing, in which customer profiles are analyzed for the purpose of formulating customized marketing campaigns to influence customer purchase behaviors.
2. Social media analytics applications that scan through streams of social media channels looking for positive or negative sentiments that are correlated to the behavior of a collective of individuals.
3. Fraud detection algorithms that analyze historical patterns of activity looking for suspicious behaviors that are indicative of fraud or abuse, as well as scanning transactions in real time looking for aberrant behavior requiring further investigation.
4. Web site recommendation engines that leverage large sets of historical transaction patterns combined with customer profiles to identify suggested additional items to be presented to the customer as potential add-on purchases.
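To make the fraud detection use case (item 3) concrete, a minimal real-time check might flag any transaction whose amount deviates sharply from a customer's history. This is a hedged sketch, not a production fraud model; the `flag_aberrant` function, its threshold, and the sample amounts are all illustrative:

```python
from statistics import mean, stdev

def flag_aberrant(history, txn_amount, threshold=3.0):
    """Flag a transaction whose amount deviates more than `threshold`
    standard deviations from the customer's historical mean."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return txn_amount != mu
    return abs(txn_amount - mu) / sigma > threshold

history = [42.0, 38.5, 51.0, 47.2, 40.1]
print(flag_aberrant(history, 45.0))   # typical amount -> False
print(flag_aberrant(history, 950.0))  # aberrant amount -> True
```

Real systems apply far richer models, but the pattern is the same: learn from historical activity, then screen each live transaction against it.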
Table 6.1 provides some considerations for storage, appliance hardware, and data management related to each use case.
6.2 STORAGE CONSIDERATIONS: INFRASTRUCTURE BEDROCK FOR THE DATA LIFECYCLE
In any environment intended to support the analysis of massive amounts of data, there must be an infrastructure supporting the data lifecycle, from acquisition through preparation, integration, and execution. The need to acquire and manage massive amounts of data suggests a need for specialty storage systems to accommodate big data applications.
When evaluating specialty storage offerings, some variables to consider include:
• Scalability, which looks at whether expectations for performance improvement are aligned with the addition of storage resources, and the degree to which the storage subsystem can support massive data volumes of increasing size.
• Extensibility, which examines how flexible the storage system’s architecture is in allowing the system to be grown without the constraint of artificial limits.
• Accessibility, which looks at any limitations or constraints in providing simultaneous access to an expanding user community without compromising performance.
• Fault tolerance, which imbues the storage environment with the capability to recover from intermittent failures.
• High-speed I/O capacity, which measures whether the input/output channels can satisfy the demanding timing requirements for absorbing, storing, and sharing large data volumes.
• Integratability, which measures how well the storage environment can be integrated into the production environment.
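These evaluation variables can be combined into a simple weighted comparison of candidate offerings. The sketch below is purely illustrative; the criteria weights, the 0–10 ratings, and the `vendor_a` figures are assumptions, not data from any real product:

```python
# Hypothetical weighting of the evaluation variables listed above;
# an organization would set these to reflect its own priorities.
CRITERIA_WEIGHTS = {
    "scalability": 0.25,
    "extensibility": 0.15,
    "accessibility": 0.15,
    "fault_tolerance": 0.20,
    "io_capacity": 0.15,
    "integratability": 0.10,
}

def score_offering(ratings):
    """Combine per-criterion ratings (0-10) into one weighted score."""
    return sum(CRITERIA_WEIGHTS[c] * ratings[c] for c in CRITERIA_WEIGHTS)

vendor_a = {"scalability": 9, "extensibility": 6, "accessibility": 7,
            "fault_tolerance": 8, "io_capacity": 9, "integratability": 5}
print(round(score_offering(vendor_a), 2))  # -> 7.65
```

Scoring candidates this way keeps the trade-offs explicit, even if the final decision also weighs factors a single number cannot capture.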
Often, the storage framework involves a software layer that manages a collection of storage resources and provides many of these capabilities. The software configures storage for replication to provide a level of fault tolerance, and manages communications among the different processing nodes using standard protocols (such as UDP or TCP/IP). In addition, some frameworks replicate stored data, providing redundancy in the event of a fault or failure.
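The replication behavior described above can be sketched as a placement function that assigns each data block to several distinct nodes, so that copies survive individual node failures. This is a simplified illustration, not any particular framework's algorithm; the node names and replication factor are assumptions:

```python
import hashlib

def place_replicas(block_id, nodes, replication_factor=3):
    """Choose `replication_factor` distinct nodes for a data block by
    hashing the block id, so copies land on different machines."""
    start = int(hashlib.md5(block_id.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

nodes = ["node0", "node1", "node2", "node3", "node4"]
replicas = place_replicas("block-0042", nodes)
print(replicas)  # three distinct nodes; the block survives two node failures
```

Production frameworks refine this idea with rack awareness and rebalancing, but the core contract is the same: every block exists on several independent machines.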

6.3 BIG DATA APPLIANCES: HARDWARE AND SOFTWARE TUNED FOR ANALYTICS
Because big data applications and analytics demand a level of system performance that exceeds the capabilities of typical systems, there is a general need for scalable multiprocessor configurations tuned to meet mixed-use demand for reporting, ad hoc analysis, and more complex analytical models. As can be seen in relation to the example use cases in Table 6.1, there will be a plethora of performance drivers for computational scalability with respect to data volumes and the number of simultaneous users. Naturally, the technical leaders must assess the end users’ scalability requirements to help in selecting a specific architectural approach.
There are essentially two approaches to configuring a high-performance architecture platform. One (the hardware appliance approach) employs specialty hardware configurations, while the other (the software appliance approach) uses software to manage a collection of commodity hardware components.
Hardware appliances are often configured as multiprocessor systems, although the architectures may vary in the ways that different memory components are configured. Several facets of the system contribute to maximizing performance, including CPU/core configurations, cache memory, core memory, flash memory, temporary disk storage areas, and persistent disk storage. Hardware architects weigh varying configurations of these levels of the memory hierarchy to find the combination of memory devices, with their differing sizes, costs, and speeds, that achieves the desired level of performance and scalability: responding to increasingly complex queries while enabling simultaneous analyses.
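The memory-hierarchy trade-off can be illustrated with a back-of-the-envelope latency model. The tier latencies below are rough orders of magnitude assumed for illustration, not vendor figures:

```python
# Illustrative memory-tier latencies in nanoseconds (rough orders of
# magnitude); cost per GB generally rises as latency falls, which is
# why architects balance tier sizes rather than buy only fast memory.
TIERS = [
    ("cache", 1),
    ("core RAM", 100),
    ("flash", 100_000),
    ("disk", 10_000_000),
]

def avg_access_ns(hit_rates):
    """Expected access latency given the fraction of requests served
    from each tier (fractions should sum to 1)."""
    return sum(rate * latency for (_, latency), rate in zip(TIERS, hit_rates))

# Shifting just 4% of requests from disk to flash cuts expected
# latency by roughly a factor of four.
print(avg_access_ns([0.50, 0.35, 0.10, 0.05]))  # -> 510035.5
print(avg_access_ns([0.50, 0.35, 0.14, 0.01]))  # -> 114035.5
```

The model is crude, but it shows why a small amount of faster memory at the right tier can dominate overall performance.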
Different architectural configurations address different scalability and performance issues in different ways, so when it comes to deciding which type of architecture is best for your analytics needs, consider different alternatives including symmetric multiprocessor (SMP) systems, massively parallel processing (MPP), as well as software appliances that adapt to parallel hardware system models.
Hardware appliances are designed for big data applications. They often will incorporate multiple (multicore) processing nodes and multiple storage nodes linked via a high-speed interconnect. Support tools are usually included as well to manage high-speed integration connectivity and enable mixed configurations of computing and storage nodes.
A software appliance for big data is essentially a suite of high-performance software components that can be layered on commodity hardware. Software appliances can incorporate database management software coupled with a high-performance execution engine and query optimization to support and take advantage of parallelization and data distribution. Vendors may round out the offering by providing application development tools and analytics capabilities, as well as enabling direct user tuning with alternate data layouts for improved performance.
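The parallelization and data distribution such an execution engine exploits can be sketched as a scatter/gather aggregation: hash-partition rows by key, aggregate each partition independently (on a real appliance, each partition would live on a separate node), then merge the partial results. All names and data here are illustrative:

```python
def hash_partition(rows, key, n_parts):
    """Scatter rows across n_parts partitions by hashing the key, so
    all rows for a given key land in the same partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def local_aggregate(part, key, value):
    """Per-partition sum, as a single node would compute it locally."""
    totals = {}
    for row in part:
        totals[row[key]] = totals.get(row[key], 0) + row[value]
    return totals

def merge(partials):
    """Gather step: combine the partial aggregates from all partitions."""
    merged = {}
    for p in partials:
        for k, v in p.items():
            merged[k] = merged.get(k, 0) + v
    return merged

rows = [{"cust": f"c{i % 3}", "amount": i} for i in range(9)]
parts = hash_partition(rows, "cust", 4)
result = merge(local_aggregate(p, "cust", "amount") for p in parts)
print(sorted(result.items()))  # [('c0', 9), ('c1', 12), ('c2', 15)]
```

A parallel query optimizer automates exactly this kind of decomposition, choosing partitioning keys and operator placement so each node works only on its local slice of the data.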