python - Requests: Explanation of the .text format
I'm using the requests module along with Python 2.7 to build a basic web crawler.

    source_code = requests.get(url)
    plain_text = source_code.text

In the above lines of code, I'm storing the source code of the specified URL, along with other metadata, inside the source_code variable. Now, in source_code.text, what is the .text attribute? It is not a function. I couldn't find anything in the documentation that explains the origin or purpose of .text either.
requests.get() returns a Response object, and that object has a .text attribute. The object is not the 'source code' of the URL; it is an object that lets you access the source code (the body) of the response, along with other information. The Response.text attribute gives you the body of the response, decoded to unicode.
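To illustrate why .text is read without parentheses, here is a minimal sketch of a property that decodes on access. FakeResponse is a hypothetical stand-in written for this example, not the real requests.Response, and the UTF-8 fallback is an assumption made to keep the sketch short.

```python
class FakeResponse(object):
    """Hypothetical stand-in for illustration; not the real requests.Response."""

    def __init__(self, content, encoding=None):
        self.content = content    # raw body as bytes
        self.encoding = encoding  # charset name, or None if unknown

    @property
    def text(self):
        # Decoding happens on attribute access. Assumption: fall back to
        # UTF-8 when no encoding is set (the real library detects one instead).
        return self.content.decode(self.encoding or "utf-8", "replace")


source_code = FakeResponse(b"<p>caf\xc3\xa9</p>", encoding="utf-8")
plain_text = source_code.text  # no parentheses: text is a property
```

Because text is computed each time it is read, changing source_code.encoding and reading source_code.text again would decode the same bytes differently.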
See the Response Content section of the Quickstart documentation:
When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text.
Further information can be found in the API documentation; see the Response.text entry:
Content of the response, in unicode.
If Response.encoding is None, encoding will be guessed using chardet.

The encoding of the response content is determined based solely on HTTP headers, following RFC 2616 to the letter. If you can take advantage of non-HTTP knowledge to make a better guess at the encoding, you should set r.encoding appropriately before accessing this property.
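A rough sketch of what a header-only guess can look like, and why overriding the encoding matters. guess_encoding is a hypothetical helper built on the standard library, not Requests' own code; the ISO-8859-1 fallback for text/* types reflects the RFC 2616 default, which as far as I know is also what Requests applies when the header names no charset.

```python
from email.message import Message


def guess_encoding(content_type):
    """Hypothetical helper: guess a charset from a Content-Type header only."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset
    if msg.get_content_maintype() == "text":
        return "ISO-8859-1"  # RFC 2616 default for text/* without a charset
    return None  # nothing to go on; a detector such as chardet would be needed


body = u"caf\xe9".encode("utf-8")      # bytes as they arrive on the wire
guessed = guess_encoding("text/html")  # header carries no charset
wrong_text = body.decode(guessed)      # mis-decoded, the accented letter is garbled
right_text = body.decode("utf-8")      # decoded with out-of-band knowledge
```

With a real Response, the equivalent of the last line is to set r.encoding = 'utf-8' before reading r.text.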
You can also use Response.content to access the response body undecoded, as raw bytes.