Python: subclass str, and make a new method with the same effect as +=

I'm trying to subclass str. It's not for anything important; it's an experiment to learn more about Python's built-in types. I've subclassed str this way (using __new__ because str is immutable):

    class mystring(str):
        def __new__(cls, value=''):
            return str.__new__(cls, value)

        def __radd__(self, value):  # What method should I use??
            return mystring(self + value)  # What goes here??

        def write(self, data):
            self.__radd__(data)

As far as I can tell, it initializes correctly. But I can't modify it in place using the += operator. I've tried overriding __add__, __radd__, __iadd__, and a variety of other configurations. Using a return statement, I've managed to return a new instance of the correctly appended mystring, but not to modify the existing one in place. Success would look like:

    b = mystring('g')
    b.write('h')  # b should now be 'gh'

Any thoughts?
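Because str is immutable, no method can change the characters of an existing instance, so b.write('h') cannot alter b itself. What += can do is rebind the name to a new object. A minimal sketch, assuming the goal is only to keep the subclass type through +=: str defines no __iadd__, so += falls back to __add__, and overriding __add__ to wrap the result is enough.

```python
class mystring(str):
    """str subclass whose concatenation results stay mystring instances."""

    def __add__(self, other):
        # str.__add__ returns a plain str, so wrap the result to keep the type.
        return type(self)(str.__add__(self, other))


b = mystring('g')
before = b
b += 'h'  # no __iadd__ on str, so this calls __add__ and rebinds the name b
print(b, type(b).__name__)  # gh mystring
print(before)               # g  (the original object is unchanged)
```

The name b now points at a new mystring; anything else still holding the old object sees 'g'. That is the closest a str subclass can get to "in place".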

Update

To add a possible reason why someone might want this: I followed the suggestion of creating the following mutable class, which uses a plain string internally:

    class stringinside(object):

        def __init__(self, data=''):
            self.data = data

        def write(self, data):
            self.data += data

        def read(self):
            return self.data

and tested it with timeit:

    >>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.004415035247802734
    >>> timeit.timeit("arr.write('1234567890')", setup="from hard import stringinside; arr = stringinside()", number=10000)
    0.0331270694732666

The difference increases rapidly as number goes up: at 1 million iterations, stringinside took longer than I was willing to wait, while the pure str version returned in about 100 ms.

Update 2

For posterity, I decided to write a Cython class wrapping a C++ string to see whether performance improved compared to a Python class loosely based on Mike Müller's updated version below, and I managed to succeed. I realize that Cython is "cheating", but I provide it for fun.

Python version:

    class mike(object):

        def __init__(self, data=''):
            self._data = []
            self._data.extend(data)  # stores one list item per character

        def write(self, data):
            self._data.extend(data)

        def read(self, stop=None):
            return ''.join(self._data[0:stop])

        def pop(self, stop=None):
            if not stop:
                stop = len(self._data)
            try:
                return ''.join(self._data[0:stop])
            finally:
                # drop the returned characters by copying the rest of the list
                self._data = self._data[stop:]

        def __getitem__(self, key):
            return ''.join(self._data[key])

Cython version:

    from libcpp.string cimport string

    cdef class cystring:
        cdef string buff
        cdef public int length

        def __cinit__(self, string data=''):
            self.length = len(data)
            self.buff = data

        def write(self, string new_data):
            self.length += len(new_data)
            self.buff += new_data

        def read(self, int length=0):
            if not length:
                length = self.length
            return self.buff.substr(0, length)

        def pop(self, int length=0):
            if not length:
                length = self.length
            ans = self.buff.substr(0, length)
            self.buff.erase(0, length)
            self.length = self.buff.size()  # keep length in sync after erasing
            return ans

Performance:

Writing

    >>> timeit.timeit("arr.write('1234567890')", setup="from pyversion import mike; arr = mike()", number=1000000)
    0.5992741584777832
    >>> timeit.timeit("arr.write('1234567890')", setup="from cyversion import cybuff; arr = cybuff()", number=1000000)
    0.17381906509399414

Reading

    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from pyversion import mike; arr = mike()", number=1000000)
    1.1499049663543701
    >>> timeit.timeit("arr.write('1234567890'); arr.read(5)", setup="from cyversion import cybuff; arr = cybuff()", number=1000000)
    0.2894480228424072

Popping

    >>> # Note: I'm using 10e3 iterations; the Python version wouldn't return otherwise
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from pyversion import mike; arr = mike()", number=10000)
    0.7390561103820801
    >>> timeit.timeit("arr.write('1234567890'); arr.pop(5)", setup="from cyversion import cybuff; arr = cybuff()", number=10000)
    0.01501607894897461
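Most of the Python version's popping time plausibly comes from self._data = self._data[stop:] in mike.pop, which copies the whole remaining list on every call (in this benchmark the list grows by five characters per iteration). A sketch of a pure-Python alternative, assuming collections.deque is acceptable; dequebuffer is a hypothetical name, not something from the thread. pop() only touches the characters it removes. Unlike mike, it treats pop(0) as "pop nothing" and has no __getitem__ slicing, because deques don't support slices.

```python
from collections import deque
from itertools import islice


class dequebuffer(object):
    """Hypothetical variant of mike: one character per deque item, so
    popping k characters from the front costs O(k) rather than a copy of
    the whole remaining buffer."""

    def __init__(self, data=''):
        self._data = deque(data)  # iterating a str yields single characters

    def write(self, data):
        self._data.extend(data)

    def read(self, stop=None):
        # islice(..., None) walks to the end; a number stops early
        return ''.join(islice(self._data, stop))

    def pop(self, stop=None):
        # None means "everything"; unlike mike.pop, 0 means "nothing"
        if stop is None:
            stop = len(self._data)
        stop = min(stop, len(self._data))
        popleft = self._data.popleft
        return ''.join([popleft() for _ in range(stop)])
```

It can be dropped into the same benchmark as mike, e.g. setup="arr = dequebuffer()".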

Solution

This answer is for the updated question.

You can use a list to hold the data, and construct the string when reading it:

    class stringinside(object):

        def __init__(self, data=''):
            self._data = []
            self._data.append(data)

        def write(self, data):
            self._data.append(data)

        def read(self):
            return ''.join(self._data)
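With the list-of-parts approach, the joining cost is paid on every read(). If the buffer is read repeatedly without many writes in between, one possible refinement (a hypothetical variant, not part of the original answer) is to keep the joined string as the only part, so a second read without intervening writes returns it without rejoining:

```python
class compactstringinside(object):
    """Hypothetical variant of stringinside whose read() collapses all
    stored parts into one string and keeps only that."""

    def __init__(self, data=''):
        self._data = [data]

    def write(self, data):
        self._data.append(data)  # still O(1) amortized per write

    def read(self):
        if len(self._data) > 1:
            # join once and replace the parts with the single result
            self._data = [''.join(self._data)]
        return self._data[0]
```

Writes stay as cheap as before; the trade-off is one extra list allocation per read that actually had new parts to join.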

Performance

The performance of this class:

    %%timeit arr = stringinside()
    arr.write('1234567890')
    1000000 loops, best of 3: 352 ns per loop

is close to that of the native str:

    %%timeit str_arr = ''
    str_arr+='1234567890'
    1000000 loops, best of 3: 222 ns per loop

Compare this with your += version:

    %%timeit arr = stringinsideplusequal()
    arr.write('1234567890')
    100000 loops, best of 3: 87 µs per loop
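The %%timeit numbers above come from IPython. To reproduce the comparison from a plain script, here is a sketch using the standard timeit module (its globals parameter needs Python 3.5+). The body of stringinsideplusequal is an assumption: only its name appears in the answer, so it is reconstructed from the question's += class.

```python
import timeit


class stringinside(object):
    """List-of-parts buffer from the answer."""

    def __init__(self, data=''):
        self._data = [data]

    def write(self, data):
        self._data.append(data)

    def read(self):
        return ''.join(self._data)


class stringinsideplusequal(object):
    """Assumed shape of the question's += buffer (reconstructed)."""

    def __init__(self, data=''):
        self.data = data

    def write(self, data):
        self.data += data

    def read(self):
        return self.data


def compare(number=10000):
    """Return the seconds taken by `number` appends for each approach."""
    stmt = "arr.write('1234567890')"
    return {
        'str +=': timeit.timeit("arr += '1234567890'", setup="arr = ''",
                                number=number),
        'stringinside': timeit.timeit(stmt, setup='arr = stringinside()',
                                      number=number, globals=globals()),
        'stringinsideplusequal': timeit.timeit(
            stmt, setup='arr = stringinsideplusequal()',
            number=number, globals=globals()),
    }


if __name__ == '__main__':
    for name, seconds in sorted(compare().items()):
        print('%-22s %.6f s' % (name, seconds))
```

Absolute timings depend on the machine and Python version, so only the relative ordering is worth comparing.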

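Reason

Building a string with my_string += another_string has been a performance anti-pattern for a long time. CPython has an optimization for this case: when the left-hand string is stored in a plain variable and that variable holds the only reference to it, the string can be resized in place instead of being copied. CPython cannot apply this optimization to the pattern used here, because the string is hidden inside the class as an attribute (self.data).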
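A small experiment isolating that difference: the same += loop, once on a local variable and once on an attribute. This is a sketch for CPython; holder, build_local and build_attr are hypothetical names, and the timings it prints will vary by version and machine, so no expected numbers are given.

```python
import timeit


def build_local(n):
    s = ''
    for _ in range(n):
        s += '1234567890'  # plain local: CPython may resize the string in place
    return s


class holder(object):
    def __init__(self):
        self.data = ''


def build_attr(n):
    h = holder()
    for _ in range(n):
        # attribute target: the in-place resize does not apply, so CPython
        # builds a new, longer string on every iteration
        h.data += '1234567890'
    return h.data


if __name__ == '__main__':
    for f in (build_local, build_attr):
        print(f.__name__, timeit.timeit(lambda: f(10000), number=3))
```

Both functions return the same string; only the way the intermediate results are stored differs.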

Not all implementations have this optimization, for various reasons. For example, PyPy, which is in general faster than CPython, is considerably slower for this use case:

PyPy 2.6.0 (Python 2.7.9)

    >>>> import timeit
    >>>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.08312582969665527

CPython 2.7.11

    >>> import timeit
    >>> timeit.timeit("arr+='1234567890'", setup="arr = ''", number=10000)
    0.002151966094970703
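Since the fast += path is CPython-specific, a buffer that avoids repeated concatenation altogether does not depend on it. Besides the list-and-join approach in this answer, the standard library's io.StringIO is another option; the sketch below (Python 3 assumed, stringiobuffer is a hypothetical name) wraps it with the same write/read interface. I have not measured it on PyPy, so treat its speed there as untested.

```python
import io


class stringiobuffer(object):
    """Hypothetical write/read buffer backed by io.StringIO instead of +=."""

    def __init__(self, data=''):
        # io.StringIO(data) would leave the position at 0, so later writes
        # would overwrite the initial data; start empty and write it instead.
        self._buf = io.StringIO()
        self._buf.write(data)

    def write(self, data):
        self._buf.write(data)

    def read(self):
        # getvalue() returns the entire contents regardless of the position
        return self._buf.getvalue()
```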

Slice-able version

This version supports slicing:

    class stringinside(object):

        def __init__(self, data=''):
            self._data = []
            self._data.extend(data)

        def write(self, data):
            self._data.extend(data)

        def read(self, start=None, stop=None):
            return ''.join(self._data[start:stop])

        def __getitem__(self, key):
            return ''.join(self._data[key])
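Because extend() iterates the string, this version stores one character per list item, so the list length equals the string length. That makes two small additions straightforward; __len__ and __str__ below are hypothetical extensions, not part of the original answer, and sliceablestringinside is just a name for this sketch.

```python
class sliceablestringinside(object):
    """The slice-able buffer plus hypothetical __len__ and __str__."""

    def __init__(self, data=''):
        self._data = []
        self._data.extend(data)  # a str is iterated, one item per character

    def write(self, data):
        self._data.extend(data)

    def read(self, start=None, stop=None):
        return ''.join(self._data[start:stop])

    def __getitem__(self, key):
        # works for both integer indices and slices
        return ''.join(self._data[key])

    def __len__(self):
        # one list item per character, so this is the string length
        return len(self._data)

    def __str__(self):
        return self.read()
```

With these, len(arr) and str(arr) behave as they would for the equivalent str.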

You can slice it the normal way:

    >>> arr = stringinside('abcdefg')
    >>> arr[2]
    'c'
    >>> arr[1:3]
    'bc'

Now read() supports optional start and stop indices:

    >>> arr.read()
    'abcdefg'
    >>> arr.read(1, 3)
    'bc'
    >>> arr.read(1)
    'bcdefg'
