bash - share a variable (data from a file) among multiple python scripts without loading duplicates -


I load a big matrix contained in matrix_file.mtx. The load must be made only once. Once the variable matrix is loaded into memory, many python scripts should share it without duplicating it, so that the multiscript program (driven from bash, or from python itself) stays memory efficient. I can imagine pseudocode like this:

# loading and sharing script:
import share
matrix = open("matrix_file.mtx","r")
share.send_to_shared_ram(matrix, as_variable('matrix'))

# shared matrix variable processing script_1
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>

# shared matrix variable processing script_2
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>

...

The idea is that pointer_to_matrix points to the matrix in RAM, which is loaded only once and shared by the n scripts (not loaded n times). The scripts are called separately from a bash script (or, if possible, from a python main):

$ python load_and_share.py
$ python script_1.py -args string &
$ python script_2.py -args string &
$ ...
$ python script_n.py -args string &

I'd also be interested in solutions via the hard disk, i.e. the matrix could be stored on disk while a shared object accesses it as required. Nonetheless, the object (a kind of pointer) in RAM could be treated as if it were the whole matrix.

Thanks for your help.

Between the mmap module and numpy.frombuffer, this is fairly easy:

import mmap
import os
import numpy as np

with open("matrix_file.mtx", "rb") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_READ)
    # Optionally, on UNIX-like systems in Py3.3+, add:
    # os.posix_fadvise(matfile.fileno(), 0, len(mm), os.POSIX_FADV_WILLNEED)
    # to trigger a background read-in of the file to the system cache,
    # minimizing page faults when you use it

matrix = np.frombuffer(mm, dtype=np.uint8)

Each process would perform this work separately and get a read-only view of the same memory. You'd change the dtype to something other than uint8 as needed. Switching to ACCESS_WRITE would allow modifications to the shared data, though that would require synchronization and possibly explicit calls to mm.flush to ensure the data was reflected in other processes.
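For illustration only, a minimal sketch of that writable variant, assuming the same matrix_file.mtx and leaving the inter-process synchronization out for brevity:

import mmap
import numpy as np

# Writable variant: open the file read/write and map it with ACCESS_WRITE.
with open("matrix_file.mtx", "r+b") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_WRITE)

matrix = np.frombuffer(mm, dtype=np.uint8)  # writable view onto the mapping
matrix[0] = 42   # visible to other processes mapping the same file
mm.flush()       # push the change through to the file on disk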

A more complex solution that follows your initial design more closely might be to use multiprocessing.managers.SyncManager to create a connectable shared "server" for the data, allowing a single common store of data to be registered with the manager and returned to as many users as desired; you would create an Array (based on ctypes types) of the correct type on the manager, then register a function that returns that same shared Array to all callers (each caller would then convert the returned Array via numpy.frombuffer as before). It's more involved (it would be easier to have a single python process initialize the Array and then launch Processes that share it automatically thanks to fork semantics), but it's the closest to the concept you describe.
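A minimal sketch of that simpler fork-based alternative: one parent process loads the matrix once into a ctypes-backed shared array, then launches worker processes that inherit it. The worker function, the process count and the uint8 dtype are illustrative assumptions, not part of the original question:

from multiprocessing import Process, current_process
from multiprocessing.sharedctypes import RawArray
import numpy as np

def worker(shared_arr):
    # Re-wrap the shared buffer as an ndarray; no copy of the data is made.
    matrix = np.frombuffer(shared_arr, dtype=np.uint8)
    print(current_process().name, matrix.sum())

if __name__ == '__main__':
    data = np.fromfile("matrix_file.mtx", dtype=np.uint8)  # load once
    shared_arr = RawArray('B', data.nbytes)                # unsynchronized shared memory
    np.frombuffer(shared_arr, dtype=np.uint8)[:] = data    # copy the matrix in

    procs = [Process(target=worker, args=(shared_arr,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()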

