As we move from petascale to exascale systems, the number of system components will increase faster than component reliability, with projected mean times to failure on the order of minutes or seconds for exascale systems. Based on current knowledge and observations of existing large systems, exascale systems are expected to experience various kinds of faults many times per day. Growing evidence points to a rise in silent errors (faults that are never detected, or are detected long after they have produced erroneous results), a problem that will only get worse as the number of components rises. Systems running 100 million cores will see core failures continually, and the tools for dealing with them will have to be rethought.

The current approach to resilience, which relies on automatic or application-level checkpoint/restart, will not work because the time to checkpoint and restart will exceed the mean time to failure (MTTF) of the full system. These projections present a difficult challenge: finding new approaches that let applications run to normal termination despite the projected instability of exascale systems. Scientists will struggle to make forward progress unless alternative fault-recovery methods that do not involve checkpoint/restart are provided.

Technical progress is under way in several areas: improving hardware and software reliability; better understanding of root causes through reliability, availability, and serviceability (RAS) data collection and analysis; fault-resilient algorithms and applications to assist the application developer; and local recovery and migration. The goal of this research is to improve the mean time to interrupt (MTTI) by more than 100x, so that applications can run for many hours. Additional goals are to improve hardware reliability by a factor of 10 and to improve local recovery and migration of data by a factor of 10.
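The checkpoint/restart argument above can be made concrete with a small back-of-the-envelope sketch. It assumes independent components with exponentially distributed failures (so the system MTTF is the component MTTF divided by the component count) and uses Young's first-order approximation for the checkpoint interval. The function names and the numbers in the demo (a per-core MTTF of one million hours, a 30-minute checkpoint) are illustrative assumptions, not figures from the text; only the 100 million core count comes from it.

```python
import math


def system_mttf_hours(component_mttf_hours: float, n_components: int) -> float:
    """System MTTF assuming independent, exponentially failing components."""
    return component_mttf_hours / n_components


def checkpoint_is_viable(checkpoint_hours: float,
                         restart_hours: float,
                         mttf_hours: float) -> bool:
    """True if one checkpoint plus one restart fits inside the system MTTF."""
    return checkpoint_hours + restart_hours < mttf_hours


def first_order_waste(checkpoint_hours: float, mttf_hours: float) -> float:
    """Approximate fraction of time lost to checkpointing and rework.

    Uses Young's interval T = sqrt(2 * C * M), for which the first-order
    waste C/T + T/(2M) equals sqrt(2 * C / M). The model is only meaningful
    when C is much smaller than M, so the result is capped at 1.0
    (no forward progress).
    """
    return min(1.0, math.sqrt(2.0 * checkpoint_hours / mttf_hours))


if __name__ == "__main__":
    # Hypothetical inputs: 1e6-hour MTTF per core, 30-minute checkpoint/restart.
    mttf = system_mttf_hours(1_000_000.0, 100_000_000)
    print(f"system MTTF: {mttf * 3600:.0f} seconds")
    print("checkpoint/restart viable:", checkpoint_is_viable(0.5, 0.5, mttf))
    print(f"estimated waste fraction: {first_order_waste(0.5, mttf):.2f}")
```

Under these assumed inputs the system MTTF falls well below the checkpoint time, which is the regime the text describes: the application spends all of its time saving and restoring state and never advances.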
