Automation of resolving CAPTCHAs for web crawling.

  • Student, PVG's College of Engineering, Pune.
  • Student, PVG's College of Engineering, Pune.
  • Assistant Professor, Department of Computer Engineering, PVG's College of Engineering, Pune.

A web crawler is an automated program that search engines use to collect data from web pages on the World Wide Web; this process is called web crawling. To keep its data up to date, a crawler must re-cache web pages frequently. However, because a crawler retrieves pages more often and to a greater depth than human visitors do, it degrades web server performance. To balance load and to authenticate clients, servers therefore ask crawlers to verify themselves by solving CAPTCHAs. With more than two billion web pages on the WWW, it is not feasible for humans to solve and enter CAPTCHAs every time one appears. To automate CAPTCHA solving, we describe a system for recognizing text in CAPTCHA images. Our particular focus is reliable text extraction and recognition, and on feeding the resolved CAPTCHA characters back to the crawler so that crawling can continue without human involvement.
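The abstract does not say how extraction and recognition are implemented. As a hypothetical illustration only, the sketch below shows one common baseline for simple CAPTCHAs whose characters do not touch: threshold the grayscale image, split characters at empty columns (a vertical projection profile), and label each glyph by its Hamming distance to stored templates. The function names, the threshold of 128, and the toy image and templates are assumptions made for this example, not the authors' system; noisy, rotated, or overlapping characters would need more robust methods.

```python
# Hypothetical baseline (not the authors' method): binarize a grayscale
# CAPTCHA, segment characters by vertical projection, and recognize each
# glyph by nearest-template Hamming distance.
from typing import Dict, List, Tuple

Grid = List[List[int]]  # rows of pixel values


def binarize(image: Grid, threshold: int = 128) -> Grid:
    """Map each pixel to 1 (ink) if darker than threshold, else 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def column_profile(binary: Grid) -> List[int]:
    """Count ink pixels in each column."""
    width = len(binary[0]) if binary else 0
    return [sum(row[x] for row in binary) for x in range(width)]


def segment_columns(profile: List[int], min_width: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) ranges (end exclusive) of consecutive ink columns."""
    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            if x - start >= min_width:
                segments.append((start, x))
            start = None
    if start is not None and len(profile) - start >= min_width:
        segments.append((start, len(profile)))
    return segments


def extract_glyphs(image: Grid, threshold: int = 128) -> List[Grid]:
    """Cut the binarized image into one sub-grid per detected character."""
    binary = binarize(image, threshold)
    ranges = segment_columns(column_profile(binary))
    return [[row[s:e] for row in binary] for s, e in ranges]


def recognize(glyph: Grid, templates: Dict[str, Grid]) -> str:
    """Return the label of the same-sized template with the fewest differing pixels."""
    best_label, best_dist = "?", None
    for label, tmpl in templates.items():
        if len(tmpl) != len(glyph) or len(tmpl[0]) != len(glyph[0]):
            continue  # this simple matcher only compares same-sized grids
        dist = sum(a != b for g_row, t_row in zip(glyph, tmpl) for a, b in zip(g_row, t_row))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def solve_captcha(image: Grid, templates: Dict[str, Grid]) -> str:
    """Extract glyphs left to right and concatenate their recognized labels."""
    return "".join(recognize(g, templates) for g in extract_glyphs(image))


# Toy data for illustration: a 3x7 image with two dark glyphs on white.
W, B = 255, 0
TOY: Grid = [
    [W, B, B, W, W, B, W],
    [W, B, W, W, W, B, W],
    [W, B, B, W, W, B, W],
]
TEMPLATES: Dict[str, Grid] = {
    "C": [[1, 1], [1, 0], [1, 1]],
    "I": [[1], [1], [1]],
}

if __name__ == "__main__":
    print(solve_captcha(TOY, TEMPLATES))  # CI
```

In a crawler, the returned string would be what gets submitted back to the server in place of a human's answer; how that submission is done depends on the target site and is not specified in the abstract.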


[Renuka Sakhare, Abhay Bhagat and Anil Bhadagle. (2016); Automation of resolving CAPTCHAs for web crawling. Int. J. of Adv. Res. 4 (Feb). 1224-1232] (ISSN 2320-5407). www.journalijar.com


Corresponding author: R. H. Sakhare