Figure 20.1 Basic crawler architecture.
The basic operation of any hypertext crawler (whether for the Web, an intranet, or other hypertext document collection) is as follows. The crawler begins with one or more URLs that constitute a seed set. It picks a URL from this seed set, then fetches the web page at that URL. The fetched page is then parsed to extract both the text and the links from the page (each of which points to another URL). The extracted text is fed to a text indexer (described in Chapters 4 and 5). The extracted links (URLs) are then added to a URL frontier, which at all times consists of URLs whose corresponding pages have yet to be fetched by the crawler. Initially, the URL frontier contains the seed set; as pages are fetched, the corresponding URLs are deleted from the URL frontier. The entire process may be viewed as traversing the web graph (see Chapter 19). In continuous crawling, the URL of a fetched page is added back to the frontier for fetching again in the future.
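The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the architecture of Figure 20.1: the names `crawl`, `fetch`, `index`, `max_pages`, and `LinkAndTextParser` are hypothetical, the "web" is an in-memory dictionary so the example runs without network access, and the `seen` set (which keeps a URL from being enqueued twice) is an added assumption that the paragraph itself does not state. Continuous crawling is left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collects the text and the href targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        if data.strip():
            self.text_parts.append(data.strip())


def crawl(seed_urls, fetch, index, max_pages=100):
    """Fetch pages starting from seed_urls; return URLs in fetch order.

    fetch(url) returns the page's HTML, or None if it cannot be fetched.
    index(url, text) stands in for the text indexer.
    """
    frontier = deque(dict.fromkeys(seed_urls))  # URLs not yet fetched
    seen = set(frontier)  # assumption: never enqueue the same URL twice
    fetched = []
    while frontier and len(fetched) < max_pages:
        url = frontier.popleft()  # pick a URL; it leaves the frontier
        html = fetch(url)
        if html is None:
            continue
        fetched.append(url)
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any buffered text
        index(url, " ".join(parser.text_parts))  # text goes to the indexer
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)  # links go to the frontier
    return fetched


# Toy three-page "web" (hypothetical URLs) used instead of real HTTP requests.
PAGES = {
    "http://example.test/a": '<a href="/b">to b</a> <a href="c">to c</a> page a',
    "http://example.test/b": '<a href="/a">back</a> page b',
    "http://example.test/c": "page c",
}
indexed = {}
order = crawl(["http://example.test/a"], PAGES.get, indexed.__setitem__)
```

Because the frontier here is a first-in, first-out queue, the sketch visits pages in breadth-first order; a different frontier policy would change the order but not the overall loop.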



