I'm also very keen to see what you've suggested here. I've got a dedi with BL and XR installed, and I've been playing around with BL and CS and managed to get them working, but I'm lost with XR. A little help on how to get started would go a long way, and an 'education of a newbie XRumer user' thread would be perfect for this. Thanks.
Unfortunately not. I had to learn the hard way and I'm still learning. All I can say from my experience is to start with guestbooks and not to use the sieve filter. Use footprints in your additive keywords and build your own word database using the Google word scraper in Hrefer. That alone will kick you off!
Hrefer comes with a set of sieve and additive words out of the box that work pretty decently.
If you need a wordlist, these aren't the best, but you will get huge lists by using this Google query:
dictionary filetype:txt
What you are doing is simply looking for word lists that are already in .txt format. One of the top results was: https://www.cs.indiana.edu/l/www/cla.../word_list.txt
That would make a perfectly suitable word list for most people just getting started.
If you need a more specific word list quickly and easily, just change your query.
For instance: sport wordlist filetype:txt
One of the results that pops up is: http://ada.rg16.asn-wien.ac.at/~python/wordlist.txt
So that's a really quick way to get wordlists without much work on your end. Every now and then you will get a file with some text at the top, but you can take that out with a text editor. If you have one with blank lines, you can strip those out with Excel or Notepad++ (I think N++ can do it; I use Excel), or with a quick script like the one below.
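If you'd rather automate the cleanup, here's a minimal Python sketch. The filenames are just placeholders, and the isalpha() check is my own rough assumption for what counts as a "word":

```python
# Minimal wordlist cleanup sketch. Assumes the raw download is saved
# as raw_wordlist.txt; adjust the filenames to suit.
with open("raw_wordlist.txt", encoding="utf-8", errors="ignore") as f:
    lines = [line.strip() for line in f]

# Keep only non-empty lines made of letters. This drops blank lines
# and any header/intro text sitting at the top of the file.
words = [w for w in lines if w and w.isalpha()]

# De-duplicate (case-insensitively) while keeping order, then write out.
seen = set()
with open("wordlist_clean.txt", "w", encoding="utf-8") as f:
    for w in words:
        if w.lower() not in seen:
            seen.add(w.lower())
            f.write(w + "\n")
```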
For proxies, I like using Kensai's proxy service from The SEO Bay.
Failing that, there is also the inurl:proxyc/engine.php trick. It's old, but still works: basically you go hunting for someone else's proxy list, and when you find a good one, use it until it stops being effective. A quick liveness check like the sketch below helps you throw out the dead ones.
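Here's a rough Python sketch of that kind of check, using the requests library. The proxy file format (one ip:port per line) and the test URL are my assumptions, nothing Hrefer-specific:

```python
# Rough proxy liveness checker: reads proxies.txt, keeps the ones that
# can fetch a test page within 5 seconds, writes them to proxies_alive.txt.
import requests

TEST_URL = "http://www.google.com"  # any reliably-up page works

with open("proxies.txt") as f:
    proxies = [p.strip() for p in f if p.strip()]

alive = []
for p in proxies:
    try:
        r = requests.get(TEST_URL, proxies={"http": f"http://{p}"}, timeout=5)
        if r.ok:
            alive.append(p)
    except requests.RequestException:
        pass  # dead or too slow, skip it

with open("proxies_alive.txt", "w") as f:
    f.write("\n".join(alive))
```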
So, you should now have the three basic requirements for Hrefer: a wordlist, a proxy source, and footprints (the program ships with decent ones already).
Yeah, that's what I did. I used someone else's list. It was shithouse, so I built my own, and now I just manually scrape and test with SB. It's a bit slower but I don't mind doing it.
I've been scraping all sorts of platforms too, which has been good. I've also been doing mass guestbook scraping using multiple footprints instead of just doing one at a time (something like the sketch below), and it worked fairly well.
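For anyone curious what "multiple footprints at once" looks like in practice, here's a small Python sketch that crosses every footprint with every keyword to build one big query list you can paste into your scraper. The footprints and keywords are just illustrative examples:

```python
# Build footprint x keyword query combinations for mass scraping.
from itertools import product

footprints = [
    '"powered by advanced guestbook"',
    'inurl:guestbook.php',
    '"sign the guestbook"',
]
keywords = ["sport", "travel", "gardening"]

# One query per (footprint, keyword) pair.
queries = [f"{fp} {kw}" for fp, kw in product(footprints, keywords)]

with open("queries.txt", "w") as f:
    f.write("\n".join(queries))

print(f"{len(queries)} queries written to queries.txt")
```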
'Empty sieve' means you go to the parsing options and tick the box that disables sieve filtration. Ticking that box should remove the error.
Failing that, just think a bit outside the box on your list.
Is Phoca what you want me to look into? I have a list for that CMS, I think...
Personally, I would always use the sieve filter when scraping for platforms like Phoca. Otherwise you will end up with lots of other stuff in your list: blog posts about how to remove the 'powered by...' footprint and so on, stuff you don't want.