Read file object as string in Python (urllib2)

Accepted answer
Score: 77

You can use Python in interactive mode to search for solutions.

If f is your object, you can enter dir(f) to see all of its methods and attributes. There's one called read. Enter help(f.read) and it tells you that f.read() is the way to retrieve a string from a file object.
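A minimal sketch of that exploration, assuming an in-memory io.StringIO object as a stand-in for the real file object (the sample text and variable names are made up for illustration, and the snippet is written for Python 3):

```python
import io

# Hypothetical stand-in for the file-like object you want to inspect.
f = io.StringIO("first line\nsecond line\n")

# dir(f) lists the object's attributes and methods; "read" is one of them.
has_read = "read" in dir(f)

# read() with no argument returns everything up to EOF as a single string.
content = f.read()
print(has_read)
print(repr(content))
```

In an interactive session you would type dir(f) and help(f.read) directly instead of testing for the name.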

Score: 14

From the documentation for file.read() (my emphasis):

file.read([size])

Read at most size bytes from the file (less if the read hits EOF before obtaining size bytes). If the size argument is negative or omitted, read all data until EOF is reached. The bytes are returned as a string object. An empty string is returned when EOF is encountered immediately. (For certain files, like ttys, it makes sense to continue reading after an EOF is hit.) Note that this method may call the underlying C function fread more than once in an effort to acquire as close to size bytes as possible. Also note that when in non-blocking mode, less data than was requested may be returned, even if no size parameter was given.

Be aware that a regexp search on a large string object may not be efficient, so consider doing the search line by line, using file.next() (a file object is its own iterator).
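To make the size behaviour concrete, here is a small sketch that assumes an io.BytesIO buffer as a stand-in for a real file (the eight sample bytes are arbitrary). It should behave the same way for a file opened in binary mode:

```python
import io

# Hypothetical in-memory "file" holding eight bytes.
f = io.BytesIO(b"abcdefgh")

first = f.read(3)   # at most 3 bytes are returned
rest = f.read()     # size omitted: read everything up to EOF
at_eof = f.read()   # already at EOF: an empty result comes back

print(first, rest, at_eof)
```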
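A rough sketch of the line-by-line approach, assuming a helper named first_match (hypothetical, not part of any library) and an io.StringIO object standing in for the real file. Iterating over the file object with a for loop calls its next method for you:

```python
import io
import re

def first_match(fileobj, pattern):
    """Return the first regex match found while iterating line by line, or None."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in fileobj:          # the file object is its own iterator
        match = regex.search(line)
        if match:
            return match          # stop early; the rest is never read
    return None

# Hypothetical sample data standing in for a large file.
sample = io.StringIO("nothing here\nWelcome to Voidspace\nmore text\n")
match = first_match(sample, r"(V.+space)")
found = match.group(1) if match else None
print(found)
```

Keep in mind that this only finds matches contained in a single line; a pattern that spans a line break still needs the whole string.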

Score: 5

Michael Foord, aka Voidspace, has an excellent tutorial on urllib2, which you can find here: urllib2 - The Missing Manual

What you are doing should be pretty straightforward; observe this sample code:

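import re
import urllib2

# Fetch the page; urlopen() returns a file-like object.
response = urllib2.urlopen("http://www.voidspace.org.uk/python/articles/urllib2.shtml")
html = response.read()  # the whole response body as one string

# Case-insensitive search for text that starts with "V" and ends with "space".
pattern = r'(V.+space)'
wordPattern = re.compile(pattern, re.IGNORECASE)
results = wordPattern.search(html)
if results:  # search() returns None when nothing matches
    print(results.groups())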
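If you are on Python 3, where urllib2 became urllib.request and read() returns bytes rather than a string, the same idea might look roughly like this. The helper name search_body and the UTF-8 default are assumptions for illustration, and an io.BytesIO object stands in for the real response so the snippet runs without network access:

```python
import io
import re

def search_body(response, pattern, encoding="utf-8"):
    """Read a file-like response fully, decode it, and return the first match or None."""
    # read() yields bytes on Python 3, so decode before using a str pattern.
    html = response.read().decode(encoding, errors="replace")
    return re.search(pattern, html, re.IGNORECASE)

# Stand-in for the object returned by urllib.request.urlopen(url).
fake_response = io.BytesIO(b"<title>Voidspace</title>")
result = search_body(fake_response, r"(V.+space)")
groups = result.groups() if result else ()
print(groups)
```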
