python - Requests: Explanation of the .text format

February 15, 2015

I'm using the requests module along with Python 2.7 to build a basic web crawler.

source_code = requests.get(url)
plain_text = source_code.text

Now, in the above lines of code, I'm storing the source code of the specified URL, along with other metadata, inside the source_code variable. In source_code.text, what is the .text attribute? It is not a function. I couldn't find anything in the documentation that explains the origin or purpose of .text either.

requests.get() returns a Response object, and that object has a .text attribute. The object is not the 'source code' of the URL; it is an object that lets you access the source code (the body) of the response, as well as other information. The Response.text attribute gives you the body of the response, decoded to unicode.
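To see why .text can be read without parentheses even though it does work behind the scenes, here is a minimal sketch of an attribute that is computed on access. SketchResponse is a hypothetical stand-in written for illustration, not the library's actual code, and the UTF-8 fallback is an assumption made to keep the example short.

```python
class SketchResponse(object):
    """Hypothetical, simplified stand-in for a response object."""

    def __init__(self, content, encoding=None, status_code=200):
        self.content = content          # raw body, as bytes
        self.encoding = encoding        # encoding used to decode the body
        self.status_code = status_code  # an example of "other information"

    @property
    def text(self):
        # Read like a plain attribute (no parentheses), but the value is
        # computed every time it is accessed.
        encoding = self.encoding or "utf-8"  # assumption: fall back to UTF-8
        return self.content.decode(encoding, "replace")


source_code = SketchResponse(b"<html>caf\xc3\xa9</html>", encoding="utf-8")
plain_text = source_code.text  # decoded unicode string, not bytes
```

With a real response the access pattern is the same: source_code.text is read as an attribute, while source_code itself carries the body plus the rest of the response data.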

See the Response Content section of the Quickstart documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text.

Further information can be found in the API documentation; see the Response.text entry:

Content of the response, in unicode.

If Response.encoding is None, encoding will be guessed using chardet.

The encoding of the response content is determined based solely on HTTP headers, following RFC 2616 to the letter. If you can take advantage of non-HTTP knowledge to make a better guess at the encoding, you should set r.encoding appropriately before accessing this property.

You can also use response.content to access the response body undecoded, as raw bytes.
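The difference between the raw bytes and the decoded text, and why the chosen encoding matters, can be sketched with plain byte strings. This uses no network access and does not call requests at all. The sample body and the ISO-8859-1 wrong guess are assumptions picked for illustration (ISO-8859-1 is the RFC 2616 default for text content sent without an explicit charset).

```python
# Bytes as they might arrive on the wire: "café" encoded as UTF-8.
# This plays the role of the undecoded body (what .content holds).
raw_body = u"caf\u00e9".encode("utf-8")

# Decoding with the wrong encoding yields mojibake instead of an error.
wrong_guess = raw_body.decode("iso-8859-1")

# Decoding with the correct encoding recovers the original text.
right_guess = raw_body.decode("utf-8")

print(repr(raw_body))
print(repr(wrong_guess))
print(repr(right_guess))
```

On a real response, the equivalent fix would be to set r.encoding = 'utf-8' before reading r.text, as the documentation quoted above suggests.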
