top button
Flag Notify
    Connect to us
      Facebook Login
      Site Registration Why to Join

Facebook Login
Site Registration
Print Preview

Site crawler of a site (quora) in python.

0 votes
200 views

I am writing a crawler in python, which crawl quora. I can't read the content of quora without login. But google/bing crawls quora. One thing i can do is use browser automation and login in my account and the go links by link and crawl content, but this method is slow. So can any one tell me how should i start in writing this crawler.

posted Aug 3, 2013 by Salil Agrawal

Share this question
Facebook Share Button Twitter Share Button Google+ Share Button LinkedIn Share Button Multiple Social Share Button

1 Answer

+1 vote

You start with reading the page: http://www.quora.com/about/tos

which you agreed to when you created your account with them. At one place it seems pretty clear that unless you make specific arrangements with Quora, you're limited to using their API.

I suspect that they bend over backwards to get Google and the other big names to index their stuff. But that doesn't make it legal for you to do the same.

In particular, the section labeled "Rules" makes constraints on automated crawling. And so do other parts of the TOS. Crawling is permissible, but not scraping. What's that mean? I dunno. Perhaps scraping is what you're describing above as "method is slow."

answer Aug 3, 2013 by Deepak Dasgupta
Similar Questions
–1 vote

I'm working on a new project and i want to receive a request from a user and to redirect him to a third party site, but on the page after i redirect my users i want to them to see injected html (on the third party site.)

i'm not really sure how to approach this problem..

0 votes

Is there a way to block .php from being indexed by crawlers, but allow other type files to be indexed? When the crawlers access the php files, they are executed, creating lots of error messages (and taking up cpu cycles).

+1 vote

I want to find the maximal number of elements contained in a nested dictionary, e.g.

data = {
 'violations':
 {
 'col1': {'err': [elem1, elem2, elem3]},
 'col2': {'err': [elem1, elem2]}
 }
 }

so to find the maximal number of elements in the lists for key 'err' in key 'col1' and 'col2'. Also key 'violations' may contain many keys (e.g. 'col1' , 'col2', 'col3' etc), so what's the best way to do this (using a loop)?

max = 0for col in data.violations:
 if max < len(data.violations.col.err):
 max = len(data.violations.col.err)
+3 votes

I am trying to count the number of files in a folder with python script but not getting the appropriate script. In which I can change the path and can find out the count of other folder files.

It doesn't matter for me how long the script is but need the clear script and it would be better if you can explain it also.


Useful Links with Similar Problem
Contact Us
+91 9880187415
sales@queryhome.net
support@queryhome.net
#470/147, 3rd Floor, 5th Main,
HSR Layout Sector 7,
Bangalore - 560102,
Karnataka INDIA.
QUERY HOME
...