bash - Share a variable (data from a file) among multiple Python scripts without loading duplicates
I load a big matrix contained in matrix_file.mtx. The load must be done only once. Once the variable matrix is loaded into memory, I would like many Python scripts to share it without duplicating it, in order to have a memory-efficient multi-script program in bash (or Python itself). I can imagine some pseudocode like this:
# loading and sharing script:
import share
matrix = open("matrix_file.mtx", "r")
share.send_to_shared_ram(matrix, as_variable('matrix'))

# shared matrix variable processing script_1
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>

# shared matrix variable processing script_2
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
...
The idea is that pointer_to_matrix points to the matrix in RAM, which is loaded only once and shared by the n scripts (instead of being loaded n times). The scripts are called separately from a bash script (or, if possible, from a Python main):
$ python load_and_share.py
$ python script_1.py -args string &
$ python script_2.py -args string &
$ ...
$ python script_n.py -args string &
I'd also be interested in solutions going via the hard disk, i.e. the matrix could stay stored on disk while the share object accesses it as required. Nonetheless, the object (a kind of pointer) in RAM could be seen as the whole matrix. Thank you for your help.
Between the mmap module and numpy.frombuffer, this is fairly easy:
import mmap
import numpy as np

with open("matrix_file.mtx", "rb") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_READ)
    # Optionally, on UNIX-like systems in Py3.3+, add:
    # os.posix_fadvise(matfile.fileno(), 0, len(mm), os.POSIX_FADV_WILLNEED)
    # to trigger a background read-in of the file to the system cache,
    # minimizing page faults when you use the matrix

matrix = np.frombuffer(mm, np.uint8)
Each process can perform this work separately and get a read-only view of the same memory. You'd change the dtype to something other than uint8 as needed. Switching to ACCESS_WRITE would allow modifications to the shared data, though that would require synchronization and possibly explicit calls to mm.flush to ensure the data is actually reflected in the other processes.
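For example, a minimal sketch of the writable variant (the same matrix_file.mtx is assumed, and whatever locking the scripts need to coordinate writers is deliberately left out):

import mmap
import numpy as np

# The file must be opened for update ("r+b") to allow ACCESS_WRITE.
with open("matrix_file.mtx", "r+b") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_WRITE)

matrix = np.frombuffer(mm, dtype=np.uint8)
matrix[0] = 255   # writes go straight into the shared, file-backed mapping
mm.flush()        # sync the modified pages back to the underlying file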
A more complex solution that follows your initial design more closely might be to use multiprocessing.managers.SyncManager to create a connectable shared "server" for the data, allowing a single common store of data to be registered with the manager and returned to as many users as desired; creating an Array (based on ctypes types) of the correct type on the manager, then register-ing a function that returns the same shared Array to all callers, would work (each caller would then convert the returned Array via numpy.frombuffer as before). It's more involved (it would be easier to have a single Python process initialize the Array and then launch Processes that share it automatically thanks to fork semantics), but it's the closest to the concept you describe.
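A minimal sketch of that simpler fork-based alternative (not the full connectable SyncManager server; the worker function, the number of processes, and the uint8 dtype are illustrative, and the fork start method on a Unix-like system is assumed):

import ctypes
import numpy as np
from multiprocessing import Array, Process

def worker(shared, idx):
    # Wrap the shared ctypes buffer as a NumPy array without copying it.
    matrix = np.frombuffer(shared, dtype=np.uint8)
    print("worker", idx, "sees", matrix.nbytes, "bytes of shared data")

if __name__ == "__main__":
    with open("matrix_file.mtx", "rb") as f:
        data = f.read()
    # lock=False gives a plain ctypes array living in shared memory.
    shared = Array(ctypes.c_ubyte, len(data), lock=False)
    ctypes.memmove(shared, data, len(data))  # one-time copy into shared memory

    procs = [Process(target=worker, args=(shared, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Every worker inherits the same shared segment, so the matrix occupies physical memory only once regardless of how many processes read it.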