29 Feb 2016

Automation of resolving CAPTCHAs for web crawling.

Renuka Sakhare, Abhay Bhagat and Anil Bhadagle.

Student, PVG's College of Engineering, Pune.
Student, PVG's College of Engineering, Pune.
Assistant Professor, Department of Computer Engineering, PVG's College of Engineering, Pune.

Abstract

A web crawler is an automated program used by search engines to collect data from web pages on the World Wide Web, a process called web crawling. To keep its data up to date, a crawler must cache web pages frequently. However, this degrades web server performance, because crawlers retrieve pages more often and in greater depth than human visitors do. To balance load and to authenticate clients, servers therefore ask crawlers to verify themselves by solving CAPTCHAs. With more than two billion web pages on the World Wide Web, it is not feasible for humans to solve and enter these CAPTCHAs every time they appear. To automate CAPTCHA solving, we describe a system that recognizes text in CAPTCHA images. We focus in particular on reliable text extraction and recognition, and on feeding the resolved CAPTCHA characters back to the crawler so that crawling can continue without human involvement.
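As a rough illustration of the three stages named above (text extraction, recognition, and handing the answer back to the crawler), the sketch below shows one very simple pipeline. It is not the authors' method: it assumes a clean grayscale CAPTCHA with dark, non-overlapping, unrotated characters, segments them by vertical projection (splitting at ink-free columns), and recognizes each glyph by nearest-template matching. The glyph templates in `TEMPLATES`, the form field name `captcha`, and the URL `https://example.com/verify` are all hypothetical placeholders.

```python
from urllib.request import Request
from urllib.parse import urlencode

# Hypothetical, tightly cropped glyph templates (1 = ink, 0 = background).
# A real system would learn these from labelled CAPTCHA images or use an OCR engine.
TEMPLATES = {
    "1": ((1,), (1,), (1,)),
    "7": ((1, 1, 1), (0, 0, 1), (0, 0, 1)),
    "L": ((1, 0, 0), (1, 0, 0), (1, 1, 1)),
}


def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 (dark ink) or 0 (light background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Split the image at ink-free columns (vertical projection profile).
    Returns a list of (start, end) column ranges, one per character."""
    width = len(binary[0])
    ink = [any(row[c] for row in binary) for c in range(width)]
    segments, start = [], None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c
        elif not has_ink and start is not None:
            segments.append((start, c))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def crop(binary, start, end):
    """Cut out one character's columns and trim blank rows above and below it."""
    rows = [tuple(row[start:end]) for row in binary]
    inky = [i for i, r in enumerate(rows) if any(r)]
    return tuple(rows[inky[0]:inky[-1] + 1])


def recognize(glyph):
    """Return the same-sized template with the fewest mismatching pixels, or '?'."""
    best_label, best_dist = "?", None
    for label, template in TEMPLATES.items():
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            continue  # this naive matcher only compares glyphs of identical size
        dist = sum(a != b for t_row, g_row in zip(template, glyph)
                   for a, b in zip(t_row, g_row))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def solve_captcha(gray):
    """Extraction (binarize + segment) followed by per-glyph recognition."""
    binary = binarize(gray)
    return "".join(recognize(crop(binary, s, e)) for s, e in segment_columns(binary))


def build_submission(url, answer, field="captcha"):
    """Wrap the answer in a form-encoded POST request for the crawler to send.
    The field name and URL are hypothetical; real sites define their own forms."""
    body = urlencode({field: answer}).encode("ascii")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    # Synthetic 3-row image spelling "71L" (0 = black ink, 255 = white background).
    bits = [[1, 1, 1, 0, 1, 0, 1, 0, 0],
            [0, 0, 1, 0, 1, 0, 1, 0, 0],
            [0, 0, 1, 0, 1, 0, 1, 1, 1]]
    gray = [[0 if b else 255 for b in row] for row in bits]
    answer = solve_captcha(gray)
    print(answer)  # prints 71L
    req = build_submission("https://example.com/verify", answer)
    print(req.get_method(), req.data)
```

Real CAPTCHAs add noise, distortion and touching characters precisely to defeat this kind of naive segmentation and matching, so a practical system would need stronger preprocessing and a trained recognizer; the sketch only shows how the stages connect.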

[Renuka Sakhare, Abhay Bhagat and Anil Bhadagle. (2016); Automation of resolving CAPTCHAs for web crawling. Int. J. of Adv. Res. 4 (Feb). 1224-1232] (ISSN 2320-5407). www.journalijar.com

Corresponding Author: R. H. Sakhare


This work is licensed under a Creative Commons Attribution 4.0 International License.
