AbotX: How do you create a parallel crawler that stays on and can have new sites added to it at run time?


I have a ParallelCrawlerEngine set up as a singleton, and an AlwaysOnSiteToCrawlProvider, also a singleton, passed to the ParallelCrawlerEngine.
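For reference, my setup looks roughly like this (a simplified sketch; AlwaysOnSiteToCrawlProvider is my own subclass of SiteToCrawlProvider, and I may be slightly off on some AbotX signatures):

```csharp
// Simplified sketch of the setup described above (AbotX API as I understand it;
// AlwaysOnSiteToCrawlProvider is my own subclass of AbotX's SiteToCrawlProvider).
var config = new CrawlConfigurationX();                 // defaults, details omitted
var siteProvider = new AlwaysOnSiteToCrawlProvider();

var impls = new ParallelImplementationOverride(config)
{
    SiteToCrawlProvider = siteProvider
};

// Both the engine and the provider are held as singletons for the app lifetime.
var engine = new ParallelCrawlerEngine(config, impls);
engine.StartAsync();

// Later, from a separate request at run time:
siteProvider.AddSitesToCrawl(new List<SiteToCrawl>
{
    new SiteToCrawl { Uri = new Uri("http://www.newsite.com/") }
});
```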

I can instantiate it and leave it with nothing to do, and that's OK. I can add a site and it crawls it, also OK. But if I then add another site, it does not crawl the second site.

I have looked at the example on the site, but it doesn't appear to show how this should work when new items are added after the initial execution. Calling .AddSitesToCrawl() adds them to the list, but they seem to stay in a purgatory state of never being read.

Looking through the logs, there is a "site completed" message even though the site has not been recrawled:

[2016-07-11 11:17:18,361] [20] [INFO] - Crawl for domain [http://www.existingsite.com/] completed in [0.0001118] seconds, crawled [361] pages

And an error if I add a new site:

[2016-07-11 11:17:33,365] [23] [ERROR] - Crawl for domain [http://www.newsite.com/] failed after [0.0066498] seconds, crawled [361] pages
[2016-07-11 11:17:33,365] [23] [ERROR] - System.InvalidOperationException: Cannot call DoWork() after AbortAll() or Dispose() have been called.
   at Abot.Util.ThreadManager.DoWork(Action action)
   at Abot.Crawler.WebCrawler.CrawlSite()
   at Abot.Crawler.WebCrawler.Crawl(Uri uri, CancellationTokenSource cancellationTokenSource)


