Python - subclass str, and make a new method with the same effect as +=
I'm trying to subclass str. It's not for anything important, just an experiment to learn more about Python's built-in types. I've subclassed str this way (using __new__, because str is immutable):

```python
class mystring(str):
    def __new__(cls, value=''):
        return str.__new__(cls, value)

    def __radd__(self, value):  # what method should I use??
        return mystring(self + value)  # what goes here??

    def write(self, data):
        self.__radd__(data)
```
As far as I can tell, it initializes correctly. But I can't get it to modify in place using the += operator. I've tried overriding __add__, __radd__, __iadd__ and a variety of other configurations. Using a return statement, I've managed to get it to return a new instance of the correctly appended mystring, but not to modify it in place. Success would look like:

```python
b = mystring('g')
b.write('h')  # b should now be 'gh'
```

Any thoughts?
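Because str is immutable, no method can change a str (or mystring) object in place. The most += can do is rebind the name to a new object. The following minimal sketch assumes Python 3 and assumes that the only goal is for the result to keep the subclass type. It overrides __add__ so that `b += 'h'` produces a new mystring:

```python
class mystring(str):
    # Sketch: make + (and therefore +=) return the subclass instead of a plain str
    def __add__(self, other):
        result = str.__add__(self, other)  # builds a brand-new plain str
        if result is NotImplemented:       # e.g. other is not a string
            return NotImplemented
        return mystring(result)


b = mystring('g')
original = b
b += 'h'  # str has no __iadd__, so Python calls __add__ and rebinds b
print(b, type(b).__name__)  # gh mystring
print(original)             # g
print(b is original)        # False
```

A write() method cannot achieve the same effect. It only receives self, so it has no way to rebind the caller's variable b.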
Update

To give a possible reason why someone might want this: I followed the suggestion of creating the following mutable class, which uses a plain string internally:

```python
class stringinside(object):
    def __init__(self, data=''):
        self.data = data

    def write(self, data):
        self.data += data

    def read(self):
        return self.data
```
I then tested it with timeit:

```
>>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
0.004415035247802734
>>> timeit.timeit("arr.write('1234567890')", setup="from hard import stringinside; arr = stringinside()", number=10000)
0.0331270694732666
```
The difference grows rapidly as number goes up. At 1 million iterations, stringinside took longer than I was willing to wait, while the pure str version returned in about 100 ms.
Update 2

For posterity, I decided to write a Cython class that wraps a C++ string. I wanted to see whether it could beat a pure Python class loosely based on Mike Müller's updated version below, and it did. I realize Cython is "cheating", but I'm providing it just for fun.
Python version:

```python
class mike(object):
    def __init__(self, data=''):
        self._data = []
        self._data.extend(data)

    def write(self, data):
        self._data.extend(data)

    def read(self, stop=None):
        return ''.join(self._data[0:stop])

    def pop(self, stop=None):
        if not stop:
            stop = len(self._data)
        try:
            return ''.join(self._data[0:stop])
        finally:
            # drop the popped characters (copies the remaining list)
            self._data = self._data[stop:]

    def __getitem__(self, key):
        return ''.join(self._data[key])
```
Cython version:

```cython
from libcpp.string cimport string

cdef class cystring:
    cdef string buff
    cdef public int length

    def __cinit__(self, string data=''):
        self.length = len(data)
        self.buff = data

    def write(self, string new_data):
        self.length += len(new_data)
        self.buff += new_data

    def read(self, int length=0):
        if not length:
            length = self.length
        return self.buff.substr(0, length)

    def pop(self, int length=0):
        if not length:
            length = self.length
        ans = self.buff.substr(0, length)
        self.buff.erase(0, length)
        self.length = self.buff.size()  # keep length in sync after erasing
        return ans
```
Performance:

Writing

```
>>> timeit.timeit("arr.write('1234567890')", setup="from pyversion import mike; arr = mike()", number=1000000)
0.5992741584777832
>>> timeit.timeit("arr.write('1234567890')", setup="from cyversion import cybuff; arr = cybuff()", number=1000000)
0.17381906509399414
```
Reading

```
>>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from pyversion import mike; arr = mike()", number=1000000)
1.1499049663543701
>>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from cyversion import cybuff; arr = cybuff()", number=1000000)
0.2894480228424072
```
Popping

```
>>> # note: only 10,000 iterations here, the Python version wouldn't return otherwise
>>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from pyversion import mike; arr = mike()", number=10000)
0.7390561103820801
>>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from cyversion import cybuff; arr = cybuff()", number=10000)
0.01501607894897461
```
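The pure Python pop() is slow mainly because `self._data = self._data[stop:]` copies every remaining character on each call. Its cost therefore grows with the size of the buffer, not with the number of characters popped. The sketch below stores the characters in a collections.deque instead, so that pop() costs roughly in proportion to the number of characters removed. The class name dequebuffer is hypothetical and not from the question. The sketch assumes the same read()/pop() conventions as the mike class, and it is untimed here:

```python
from collections import deque
from itertools import islice


class dequebuffer(object):
    """Hypothetical variant of the mike class that keeps characters in a deque."""

    def __init__(self, data=''):
        self._data = deque(data)  # one element per character, like list.extend

    def write(self, data):
        self._data.extend(data)

    def read(self, stop=None):
        # islice walks only the first `stop` characters instead of copying the list
        return ''.join(islice(self._data, 0, stop))

    def pop(self, stop=None):
        if not stop:  # same convention as mike.pop: no/zero stop means everything
            stop = len(self._data)
        stop = min(stop, len(self._data))
        popleft = self._data.popleft
        # popleft is O(1) per item, so the rest of the buffer is never copied
        return ''.join([popleft() for _ in range(stop)])


buf = dequebuffer('abc')
buf.write('defg')
print(buf.read(3))  # abc
print(buf.pop(5))   # abcde
print(buf.read())   # fg
```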
Solution

This answer addresses the updated question.

You can use a list to hold the data and construct the string only when reading it:

```python
class stringinside(object):
    def __init__(self, data=''):
        self._data = []
        self._data.append(data)

    def write(self, data):
        self._data.append(data)  # O(1) amortized, no copying of earlier data

    def read(self):
        return ''.join(self._data)  # build the string once, on demand
```
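Here is a quick usage check of the list-based class, which also covers the `b.write('h')` case from the question. The example assumes Python 3. For comparison it also uses io.StringIO from the standard library, which supports the same write-then-read pattern. StringIO is an alternative added here, not part of the original answer:

```python
import io


class stringinside(object):
    def __init__(self, data=''):
        self._data = []
        self._data.append(data)

    def write(self, data):
        self._data.append(data)

    def read(self):
        return ''.join(self._data)


arr = stringinside('g')
arr.write('h')
print(arr.read())  # gh

# io.StringIO starts positioned at 0 even with an initial value,
# so seek to the end first or the write would overwrite 'g'
buf = io.StringIO('g')
buf.seek(0, io.SEEK_END)
buf.write('h')
print(buf.getvalue())  # gh
```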
Performance

The performance of this class:

```
%%timeit arr = stringinside()
arr.write('1234567890')

1000000 loops, best of 3: 352 ns per loop
```
This is close to the performance of the native str:

```
%%timeit str_arr = ''
str_arr += '1234567890'

1000000 loops, best of 3: 222 ns per loop
```
Compare that with your version:

```
%%timeit arr = stringinsideplusequal()
arr.write('1234567890')

100000 loops, best of 3: 87 µs per loop
```
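Reason

Building a string with my_string += another_string has been a performance anti-pattern for a long time. CPython has optimizations for this case, but it seems they cannot detect the pattern used here, probably because it is a bit hidden inside a class. The shortcut resizes the string in place only when the target of += is a plain variable holding the only reference to it. With self.data += data, the string is also referenced by the instance attribute, so every call copies the whole buffer.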
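The following sketch lets you observe this effect. It assumes CPython 3. Two names in it are assumptions: stringinsideplusequal is taken to be the question's += based class, and bench is a hypothetical helper. The sketch times += on a local variable against the same operation on an instance attribute, at growing sizes. On CPython the attribute version should grow much faster than linearly, but the actual numbers depend on the machine and the Python version:

```python
import timeit


class stringinsideplusequal(object):
    """Assumed to match the question's += based class."""

    def __init__(self, data=''):
        self.data = data

    def write(self, data):
        # The string is also referenced by the instance, so CPython cannot
        # resize it in place and copies the whole buffer on every call
        self.data += data


def bench(number):
    # timeit runs this statement inside a generated function, so arr is a
    # local variable and CPython's in-place concatenation shortcut can apply
    plain = timeit.timeit("arr += '1234567890'", setup="arr = ''", number=number)
    arr = stringinsideplusequal()
    wrapped = timeit.timeit(lambda: arr.write('1234567890'), number=number)
    return plain, wrapped


if __name__ == '__main__':
    for n in (2000, 4000, 8000, 16000):
        plain, wrapped = bench(n)
        print('%6d iterations: local += %.4fs, attribute += %.4fs' % (n, plain, wrapped))
```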
Not all implementations have this optimization, for various reasons. For example, PyPy, which is in general much faster than CPython, is considerably slower for this use case:
PyPy 2.6.0 (Python 2.7.9)

```
>>>> import timeit
>>>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
0.08312582969665527
```

CPython 2.7.11

```
>>> import timeit
>>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
0.002151966094970703
```
Slice-able version

This version supports slicing:

```python
class stringinside(object):
    def __init__(self, data=''):
        self._data = []
        self._data.extend(data)

    def write(self, data):
        self._data.extend(data)

    def read(self, start=None, stop=None):
        return ''.join(self._data[start:stop])

    def __getitem__(self, key):
        return ''.join(self._data[key])
```
You can slice it the normal way:

```
>>> arr = stringinside('abcdefg')
>>> arr[2]
'c'
>>> arr[1:3]
'bc'
```
Now read() also supports optional start and stop indices:

```
>>> arr.read()
'abcdefg'
>>> arr.read(1, 3)
'bc'
>>> arr.read(1)
'bcdefg'
```
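Slicing works character by character because this version fills the list with extend, which iterates over the string. The earlier version used append, which stores each written chunk as a single element. The trade-off is one list slot per character instead of one per write. Here is a small illustration (the variable names are just for the example):

```python
# append stores the whole string as a single element
chunks = []
chunks.append('abcdefg')
print(chunks)       # ['abcdefg']
print(chunks[1:3])  # []  (slicing the list does not slice the text)

# extend iterates over the string and stores one element per character
chars = []
chars.extend('abcdefg')
print(chars[1:3])           # ['b', 'c']
print(''.join(chars[1:3]))  # bc
```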