Summary of Results
Overview
The following notes address a practical issue that arose in my research: I had been using two different in-house tools for estimating MSE, inherited from two different people (i.e. I did not write either of them), and they were giving divergent results. One of these functions was written in matlab and one in python. I needed a python option, because the MSE computation forms part of a suite of jobs intended to run on a large compute cluster, and those jobs already use python to run tvb simulations. At the same time, the results needed to be directly comparable to previous results on empirical data that were obtained with the matlab code. So, I have ported the matlab code to python. The following sections present the new implementation, along with some tests and comparisons.
Test Data
We'll look at three test datasets:
- Random noise
- EEG data
- Synthetic neural data
Here's what the time series look like:
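The figure cells themselves are not reproduced in this text version. As a rough stand-in, here is a hedged sketch (mine, not the original plotting cell) of how the three series could be plotted side by side from the dataframes pickled in the test sections below; it assumes the outdir variable from the notebook setup and the .pkl files written later, and the original figures were presumably produced with the nb_fig helper instead.
# Hedged sketch (not the original cell): plot the three test series side by side
# from the pickles saved in the test sections below
import pandas as pd
from matplotlib import pyplot as plt
fig, axes = plt.subplots(ncols=3, figsize=(12, 3))
series = [('random noise', 'df_td1_1k_dat.pkl'),
          ('EEG (trial + channel avg.)', 'df_td2_avtr_dat.pkl'),
          ('simulated (region avg.)', 'df_td3_avr_dat.pkl')]
for ax, (label, fname) in zip(axes, series):
    pd.read_pickle(outdir + '/' + fname).plot(ax=ax, legend=False)
    ax.set_title(label)
plt.tight_layout()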
MSE Curves
Random noise:
EEG data:
Synthetic data:
Performance
Matlab and the TVB sample entropy function seem to be comparable in speed, but at the moment my port of Natasa's code is very slow.
all_durs = pd.concat({'td1': df_td1_1k_durs,
                      'td2 avtr': df_td2_avtr_durs,
                      'td3 avr': df_td3_avr_durs})
d(all_durs.unstack())
Conclusions
(to add...)
Analysis Documentation + Code
Notebook Setup
Enable matlab in the notebook
%load_ext pymatbridge
Define some variables
# put some system-specific variables in the namespace ('le' dict)
%run ~/set_localenv_vars.py
nb_name = 'mse_estimation_comparison'
outdir = '%s/notebooks/%s' %(le['data_dir'], nb_name)
!mkdir -p $outdir
# Analysis stuff
tvb_dir = le['code_dir'] + '/libraries_of_others/github/tvb-pack'
wfdb_dir = le['code_dir'] + '/libraries_of_others/misc'\
'/WFDB/wfdb-app-toolbox-0-9-9/mcode'
eeglab_dir = '/usr/share/matlab/site/m/eeglab11'
empdat_dir = le['data_dir'] + '/copied_from_pales_jheisz_eeg_jen_Sternberg'\
'/Artifact_removal_results'
empdat_file_orig = empdat_dir + '/st_01_exp10ym/del1.set'
empdat_file = outdir + '/st_01_exp10ym_del1.mat'
simdat_dir = le['data_dir'] + '/notebooks/mse_brute_force_exploration/g2do_sims'
simdat_file = simdat_dir + '/mbfe_#182_ad6d923b-13c8-4aac-abe3-59b91645bb7d_res.h5'
m2u = 2
r2u = 0.5
s2u = range(1,51)
mse_tools_file = outdir + '/mse_tools.py'
# stuff for workdocs-cloudfiles
aws_key = 'drowssaperucesyreva'
aws_secret = '?teytidettopsuoyevah'
Importage
# generic imports
import os,sys,glob,h5py,inspect,numpy as np,pandas as pd
from copy import copy,deepcopy
from datetime import datetime
from scipy.io import savemat,loadmat
from scipy.stats import zscore
# visualization stuff
%matplotlib inline
from matplotlib import pyplot as plt
from IPython.display import Image,HTML,IFrame,clear_output,display as d
import seaborn as sns
# tvb stuff
sys.path += [tvb_dir + '/' + t for t in ['tvb-library', 'tvb-data', 'tvb-framework',
'matlab', 'external-geodesic']]
from tvb.analyzers.info import sampen
# aws api and workdocs-cloudfiles stuff
sys.path.append(le['ipynb_workdocs_dir'])
from new_utils import nb_fig,cloudfiles_nb
Grab Natasa's matlab function and put a copy in this notebook's output folder
nkmatfuncdir = le['data_dir'] + '/notebooks/reproducing_Heisz2015'
!cp $nkmatfuncdir/get_multiple_mse_curves_matlab.m $outdir/
Add eeglab and wfdb toolboxes, plus outdir with some custom .m files, to matlab path
%%matlab -i wfdb_dir,eeglab_dir,outdir
addpath(wfdb_dir)
addpath(genpath(eeglab_dir))
addpath(outdir)
Initialize aws api and workdocs-cloudfiles folder
cnb = cloudfiles_nb('aws', [aws_key,aws_secret])
cnb.initialize_folder(nb_name)
Calico document tools
%%javascript
IPython.load_extensions('calico-spell-check', 'calico-document-tools',
'calico-cell-tools');
Go to output folder
os.chdir(outdir)
%%matlab
cd(outdir)
Ok, let's get cracking...
Define some functions
Here is my python port of Natasa's get_multiple_mse_curves_matlab.m function.
For reference, the original matlab code is also given below.
def get_multiple_mse_curves_py(data,m=2,r=0.5,scales=None,override=True):
"""
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Port of get_multiple_mse_curves_matlab.m by Natasa Kovacevic.
With additional annotations from JG explaining what's going on.
Usage: mse, scales = get_multiple_mse_curves_py(data,m,r,scales)
Inputs:
data = time series data in (time variable) format,
e.g (time trial) or (time voxel)
      m = pattern length (default 2)
      r = similarity criterion (default 0.5)
      scales = vector of scales (default [1:(floor(size(data,1)/50))])
      override = True or False (default True) - in some rare cases we want
                 to override this min-50 rule
Outputs:
mse = entropy in (scale variable) format
scales = return vector of scales actually used by the program
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
JG Notes:
---------
***Unfinished!***: not yet giving *exactly* the same outputs as the matlab code.
The error in my port is in the construction of the 'cont' array.
    Speed: the python function currently runs incredibly slowly compared to the
           matlab function. I don't see any obvious reason why this should be the
           case, so I need to identify where the slow parts are and improve them.
    Redesign: once this function is working satisfactorily, I will probably move to a
              modified version that separates the estimation of sample entropy from
              the loops over scales and input variables.
"""
# Import numpy functions all individually so that the function calls
# are as similar to the original matlab code as possible
    # (...so be warned this port is going to be more 'matlabic' than 'pythonic'...)
from numpy import (unique,floor,size,intersect1d,sort,
r_,std,mean,zeros,log,linspace,arange,round)
num_tpts = size(data,0)
num_vars = size(data,1)
if scales is None:
scales = arange(1,floor(num_tpts/50)+1)
# make sure that scales are positive integers
    scales = sort(unique(scales))  # sort scales in increasing order
if sum(scales == round(scales)) != len(scales):
raise ValueError('error: scales vector must contain positive integers')
if override is False:
# make sure that scales are < (num_tpts/50)
        scales = sort(intersect1d(scales, r_[1:floor(size(data,0)/50)+1]))
# normalize data and r
sd = std(data,axis=0)
r_new = r*sd
mse = zeros([len(scales), num_vars])
# Note:
# The matlab code uses the pattern length variable 'm'
# in two different ways: one related to array sizes,
# and one related to array indexes. For the former,
    # the usage in python and matlab will be identical,
    # but for the latter python will use 'm-1' (because,
    # like most proper programming languages, python
    # uses 0-based indexing).
    # This looks a little unusual in the indexing of the
    # 'cont' array constructed at each scale, which has
    # size m+1, giving python array indices (but not
    # sizes) of 'm-1+1'. I have kept this rather than simplifying
    # it to 'm' for the sake of clarity when comparing the
    # matlab and python code versions.
for s,sc in enumerate(scales):
# coarse-grain time series at this scale
        num_cg_tpts = int(floor(num_tpts/sc))  # cast to int for use as an array size / index bound
y = zeros([num_cg_tpts, num_vars])
for t in r_[0:num_cg_tpts]:
y[t,:] = mean(data[(t*sc + r_[0:sc]).astype(int),:],0)
# calculate sample entropy of coarse grained time series y
nlin_sc = num_cg_tpts - m
cont = zeros([m+1,num_vars]);
# The i and l loops define shifting pairs of points with incrementally increasing
# starting points and gaps between
for var in r_[0:num_vars]:
for i in r_[0:nlin_sc]:
for l in r_[(i+1):nlin_sc]: # self-matches are not counted
#% calculate sample entropy of coarse grained time series y
k = -1;
while ((k <= m-1) & (abs(y[i+k,var] - y[l+k,var]) <= r_new[var])):
# c1,c2,targ = 0,0,0
# while c1 & c2 <= targ:
# targ = r_new[var]
# c1 = (k <= m-1)
# c2 = abs(y[i+k,var] - y[l+k,var])
# #while ((k <= m-1) & (abs(y[i+k,var] - y[l+k,var]) <= r_new[var])):
k+=1
cont[k,var] +=1
# calculate mse at this scale
for var in r_[0:num_vars]:
if cont[m-1+1,var] == 0 or cont[m-1,var] == 0:
                mse[s,var] = -log(1.0/((nlin_sc)*(nlin_sc -1)));
else:
mse[s,var] = -log(cont[m-1+1,var]/cont[m-1,var])
return mse,scales
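A quick sanity check of the call signature (my addition, not part of the original analysis): on a short random series the port should return an mse array of shape (n_scales, n_vars) together with the scales actually used, although, as noted in the docstring, the values are not yet expected to match the matlab output exactly.
# Hypothetical smoke test of the ported function
_demo_dat = np.random.randn(500)[:, np.newaxis]
_demo_mse, _demo_scales = get_multiple_mse_curves_py(_demo_dat, m=2, r=0.5,
                                                     scales=np.arange(1, 6))
_demo_mse.shape, _demo_scales   # expect (5, 1) and scales [1 2 3 4 5]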
def gmmcs_nkpy(data,m=2,r=0.5,scales=None,override=True):
from datetime import datetime
from numpy import r_
if scales is None: scales = r_[:20]
start = datetime.now()
mse,scales = get_multiple_mse_curves_py(data,m=m,r=r,scales=scales,override=override)
finish = datetime.now()
dur = str(finish-start)
return mse,scales,dur
def gmmcs_tvb(data,m=2,r=0.5,scales=None,qse=False):
from datetime import datetime
from numpy import r_,newaxis,ndim
if scales is None: scales = r_[:20]
start = datetime.now()
if ndim(data) == 1: data = deepcopy(data)[:,newaxis]
    mse = np.array([sampen(data[:,roi], m=m, taus=scales, r=r, qse=qse)
                    for roi in range(data.shape[1])])
finish = datetime.now()
dur = str(finish-start)
return mse,scales,dur
function [mse, scales]= get_multiple_mse_curves_matlab(data,m,r,scales,override)
% $$$ Written by Natasha Kovacevic - based on M. Costa's C code - simplified - skipped all sorts of command line options - here we simultaneously get mse curves across trials or voxels
% $$$ Usage: [mse, scales]= get_multiple_mse_curves_matlab(data,m,r,scales)
% $$$ Inputs:
% $$$ data = time series data in (time variable) format, e.g (time trial) or (time voxel)
% $$$ m = pattern length (default 2)
% $$$ r = similarity criterion (default 0.5)
% $$$ scales = vector of scales (default [1:(floor(size(data,1)/50))])
% $$$ override = 0 or 1 (default is 0) - in some rare cases we want to override this min50-rule
% $$$ Outputs:
% $$$ mse = entropy in (scale variable) format
% $$$ scales = return vector of scales actually used by the program
num_tpts = size(data,1);
num_vars = size(data,2);
if ~exist('m','var'), m=2; end
if ~exist('r','var'), r=0.5; end
if ~exist('scales','var'), scales = [1:(floor(num_tpts/50))] ; end
% make sure that scales are positive integers
scales = sort(unique(scales)); % sort scales in increasing order
if sum(scales == round(scales)) ~= numel(scales)
error('scales vector must contain positive integers');
end
if exist('override','var')
if ~override
% make sure that scales are < (num_tpts/50)
scales = sort(intersect(scales,[1:(floor(size(data,1)/50))]));
end
end
% normalize data and r
sd = std(data,[],1);
r_new = r * sd;
mse = zeros(numel(scales),num_vars); %(scale var)
for s=1:numel(scales)
sc = scales(s);
    % coarse-grain time series at this scale
num_cg_tpts = floor(num_tpts/sc);
y = zeros(num_cg_tpts, num_vars);
for t = 1:num_cg_tpts
y(t,:) = mean(data((t-1)*sc + [1:sc],:),1);
end
    % calculate sample entropy of coarse-grained time series y
nlin_sc = num_cg_tpts - m;
cont = zeros(m+1,num_vars);
for var = 1:num_vars
for i = 1:nlin_sc
for l = (i+1):nlin_sc % self-matches are not counted
k = 0;
while ((k < m) & (abs(y(i+k,var) - y(l+k,var)) <= r_new(var)))
k = k + 1;
cont(k,var) = cont(k,var) + 1;
end
if ((k == m) & (abs(y(i+m,var) - y(l+m,var)) <= r_new(var)))
cont(m+1,var) = cont(m+1,var) + 1;
end
end
end
end
% calculate mse at this scale
for var = 1:num_vars
if (cont(m+1,var) == 0 | cont(m,var) == 0)
mse(s,var) = -log(1/((nlin_sc)*(nlin_sc -1)));
else
mse(s,var) = -log(cont(m+1,var)/cont(m,var));
end
end
end % for s=1:numel(scales)
Test 1: Random sample
Define the data
td1_10k_dat = np.random.randn(3*10000)[:,np.newaxis]
df_td1_10k_dat = pd.DataFrame(td1_10k_dat)
td1_1k_dat = np.random.randn(3*1000)[:,np.newaxis]
df_td1_1k_dat = pd.DataFrame(td1_1k_dat)
Plot the data
Compute MSE in matlab with original NK function
td1_1k_nkml_matfile = outdir + '/td1_1k_nkml_test.mat'
%%matlab -i td1_1k_dat,td1_1k_nkml_matfile,m2u,r2u,s2u -o td1_1k_nkml_mse,td1_1k_nkml_scales,td1_1k_nkml_dur
tic
[td1_1k_nkml_mse,td1_1k_nkml_scales] = get_multiple_mse_curves_matlab(td1_1k_dat,m2u,r2u,s2u);
td1_1k_nkml_dur=toc;
save(td1_1k_nkml_matfile,'td1_1k_dat', 'td1_1k_nkml_mse', 'td1_1k_nkml_scales', 'td1_1k_nkml_dur')
df_td1_1k_nkml_mse = pd.DataFrame(td1_1k_nkml_mse.copy(),
index=np.squeeze(td1_1k_nkml_scales.copy())).T
df_td1_1k_nkml_mse.columns.names = ['scale']
Compute MSE in python with ported NK function
td1_1k_nkpy_mse,td1_1k_nkpy_sc,td1_1k_nkpy_dur = gmmcs_nkpy(td1_1k_dat,m=m2u,
r=r2u,scales=s2u)
df_td1_1k_nkpy_mse = pd.DataFrame(td1_1k_nkpy_mse, index=td1_1k_nkpy_sc).T
df_td1_1k_nkpy_mse.columns.names = ['scale']
Compute MSE in python using TVB sampen
function
td1_1k_tvb_mse,td1_1k_tvb_sc,td1_1k_tvb_dur = gmmcs_tvb(td1_1k_dat,m=m2u,r=r2u,scales=s2u)
df_td1_1k_tvb_mse = pd.DataFrame(td1_1k_tvb_mse,columns=td1_1k_tvb_sc)
df_td1_1k_tvb_mse.columns.names = ['scale']
Plot the three MSE curves together
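The figure cell is not included in this version; a minimal sketch of an equivalent overlay plot, using the three dataframes just computed (again my sketch, not the original cell):
# Sketch: overlay the matlab (nkml), ported-python (nkpy) and TVB MSE estimates for td1
fig, ax = plt.subplots(figsize=(6, 3.5))
for label, df in [('nkml', df_td1_1k_nkml_mse),
                  ('nkpy', df_td1_1k_nkpy_mse),
                  ('tvb',  df_td1_1k_tvb_mse)]:
    ax.plot(df.columns, df.values.T, label=label)
ax.set_xlabel('scale'); ax.set_ylabel('sample entropy'); ax.legend()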
Durations:
# Convert the three durations into a comparable mm:ss-style string format
tmp1 = '00:%.2f' %td1_1k_nkml_dur
tmp2 = ':'.join(td1_1k_nkpy_dur.split(':')[1:]).split('.')[0]
tmp3 = ':'.join(td1_1k_tvb_dur.split(':')[1:]).split('.')[0]
df_td1_1k_durs = pd.DataFrame(np.array([tmp1,tmp2,tmp3]), index=['nkml', 'nkpy', 'tvb'])
df_td1_1k_durs.columns = ['duration']
d(df_td1_1k_durs)
Save to file
df_td1_1k_dat.to_pickle(outdir + '/df_td1_1k_dat.pkl')
df_td1_1k_nkml_mse.to_pickle(outdir + '/df_td1_1k_nkml_mse.pkl')
df_td1_1k_nkpy_mse.to_pickle(outdir + '/df_td1_1k_nkpy_mse.pkl')
df_td1_1k_tvb_mse.to_pickle(outdir + '/df_td1_1k_tvb_mse.pkl')
df_td1_1k_durs.to_pickle(outdir + '/df_td1_1k_durs.pkl')
Test 2: Empirical EEG Data
Grab the data, convert from original eeglab format, and save as a standard matlab array.
fdir,fname = os.path.split(empdat_file_orig)
%%matlab -i eeglab_dir,fdir,fname,empdat_file
addpath(genpath(eeglab_dir))
eeg = pop_loadset(fname,fdir);
eeg = eeg_checkset(eeg);
data = double(eeg.data);
save(empdat_file, 'data')
td2_dat = loadmat(empdat_file, struct_as_record=False,
squeeze_me=True)['data']
td2_avt_dat = np.mean(td2_dat,2).T                   # average over trials -> (time, channels)
td2_avtr_dat = np.mean(td2_avt_dat,1)[:,np.newaxis]  # also average over channels -> (time, 1)
df_td2_avt_dat = pd.DataFrame(td2_avt_dat)
df_td2_avtr_dat = pd.DataFrame(td2_avtr_dat)
df_td2_avtch1_dat = pd.DataFrame(td2_avt_dat[:,0])
# Z-scored ones; not sure if using...
Z_td2_dat = zscore(td2_dat,1)
Z_td2_avt_dat = np.mean(Z_td2_dat,2).T
Z_td2_avtr_dat = np.mean(Z_td2_avt_dat,1)[:,np.newaxis]
df_Z_td2_avt_dat = pd.DataFrame(Z_td2_avt_dat)
df_Z_td2_avtr_dat = pd.DataFrame(Z_td2_avtr_dat)
df_Z_td2_avtch1_dat = pd.DataFrame(Z_td2_avt_dat[:,0])
Compute MSE with matlab
td2_avtr_nkml_matfile = outdir + '/td2_avtr_nkml_test.mat'
%%matlab -i td2_avtr_dat,td2_avtr_nkml_matfile,m2u,r2u,s2u -o td2_avtr_nkml_mse,td2_avtr_nkml_scales,td2_avtr_nkml_dur
tic
[td2_avtr_nkml_mse,td2_avtr_nkml_scales] = get_multiple_mse_curves_matlab(td2_avtr_dat,m2u,r2u,s2u);
td2_avtr_nkml_dur=toc;
save(td2_avtr_nkml_matfile,'td2_avtr_dat', 'td2_avtr_nkml_mse', 'td2_avtr_nkml_scales', 'td2_avtr_nkml_dur', 'm2u', 'r2u', 's2u');
df_td2_avtr_nkml_mse = pd.DataFrame(td2_avtr_nkml_mse.copy(),index=np.squeeze(td2_avtr_nkml_scales.copy())).T
df_td2_avtr_nkml_mse.columns.names = ['scale']
Compute MSE with ported python function
td2_avtr_nkpy_mse,td2_avtr_nkpy_sc,td2_avtr_nkpy_dur = gmmcs_nkpy(td2_avtr_dat,m=m2u,
r=r2u,scales=s2u)
df_td2_avtr_nkpy_mse = pd.DataFrame(td2_avtr_nkpy_mse, index=td2_avtr_nkpy_sc).T
df_td2_avtr_nkpy_mse.columns.names = ['scale']
Compute MSE with TVB
td2_avtr_tvb_mse,td2_avtr_tvb_sc,td2_avtr_tvb_dur = gmmcs_tvb(td2_avtr_dat,m=m2u,r=r2u,scales=s2u)
df_td2_avtr_tvb_mse = pd.DataFrame(td2_avtr_tvb_mse,columns=td2_avtr_tvb_sc)
df_td2_avtr_tvb_mse.columns.names = ['scale']
Plot the three sets of MSE curves
Durations
# Convert the three durations into a comparable mm:ss-style string format
tmp1 = '00:%.2f' %td2_avtr_nkml_dur
tmp2 = ':'.join(td2_avtr_nkpy_dur.split(':')[1:]).split('.')[0]
tmp3 = ':'.join(td2_avtr_tvb_dur.split(':')[1:]).split('.')[0]
df_td2_avtr_durs = pd.DataFrame(np.array([tmp1,tmp2,tmp3]), index=['nkml', 'nkpy', 'tvb'])
df_td2_avtr_durs.columns = ['duration']
d(df_td2_avtr_durs)
Save to file
df_td2_avtr_dat.to_pickle(outdir + '/df_td2_avtr_dat.pkl')
df_td2_avtr_nkml_mse.to_pickle(outdir + '/df_td2_avtr_nkml_mse.pkl')
df_td2_avtr_nkpy_mse.to_pickle(outdir + '/df_td2_avtr_nkpy_mse.pkl')
df_td2_avtr_tvb_mse.to_pickle(outdir + '/df_td2_avtr_tvb_mse.pkl')
df_td2_avtr_durs.to_pickle(outdir + '/df_td2_avtr_durs.pkl')
Test 3: Simulated data
Load in the data
f = h5py.File(simdat_file, 'r')
td3_dat = f['tavg_dat'][:]
td3_ts = f['tavg_ts'][:]
f.close()
td3_avr_dat = np.mean(td3_dat,1)[:,np.newaxis]  # average over regions -> (time, 1)
df_td3_dat = pd.DataFrame(td3_dat,index=td3_ts)
df_td3_avr_dat = pd.DataFrame(td3_avr_dat,index=td3_ts)
Compute MSE with matlab
td3_avr_nkml_matfile = outdir + '/td3_avr_nkml_test.mat'
%%matlab -i td3_avr_dat,td3_avr_nkml_matfile,m2u,r2u,s2u -o td3_avr_nkml_mse,td3_avr_nkml_scales,td3_avr_nkml_dur
tic
[td3_avr_nkml_mse,td3_avr_nkml_scales] = get_multiple_mse_curves_matlab(td3_avr_dat,m2u,r2u,s2u);
td3_avr_nkml_dur=toc;
save(td3_avr_nkml_matfile,'td3_avr_dat', 'td3_avr_nkml_mse', 'td3_avr_nkml_scales', 'td3_avr_nkml_dur', 'm2u', 'r2u', 's2u');
df_td3_avr_nkml_mse = pd.DataFrame(td3_avr_nkml_mse.copy(),index=np.squeeze(td3_avr_nkml_scales.copy())).T
df_td3_avr_nkml_mse.columns.names = ['scale']
Compute MSE with python-ported function
td3_avr_nkpy_mse,td3_avr_nkpy_sc,td3_avr_nkpy_dur = gmmcs_nkpy(td3_avr_dat,m=m2u,
r=r2u,scales=s2u)
df_td3_avr_nkpy_mse = pd.DataFrame(td3_avr_nkpy_mse, index=td3_avr_nkpy_sc).T
df_td3_avr_nkpy_mse.columns.names = ['scale']
Compute MSE with TVB function
td3_avr_tvb_mse,td3_avr_tvb_sc,td3_avr_tvb_dur = gmmcs_tvb(td3_avr_dat,m=m2u,r=r2u,scales=s2u)
df_td3_avr_tvb_mse = pd.DataFrame(td3_avr_tvb_mse,columns=td3_avr_tvb_sc)
df_td3_avr_tvb_mse.columns.names = ['scale']
Plot the three sets of curves together
Durations
# Convert the three durations into a comparable mm:ss-style string format
tmp1 = '00:%.2f' %td3_avr_nkml_dur
tmp2 = ':'.join(td3_avr_nkpy_dur.split(':')[1:]).split('.')[0]
tmp3 = ':'.join(td3_avr_tvb_dur.split(':')[1:]).split('.')[0]
df_td3_avr_durs = pd.DataFrame(np.array([tmp1,tmp2,tmp3]), index=['nkml', 'nkpy', 'tvb'])
df_td3_avr_durs.columns = ['duration']
d(df_td3_avr_durs)
Save to file
df_td3_avr_dat.to_pickle(outdir + '/df_td3_avr_dat.pkl')
df_td3_avr_nkml_mse.to_pickle(outdir + '/df_td3_avr_nkml_mse.pkl')
df_td3_avr_nkpy_mse.to_pickle(outdir + '/df_td3_avr_nkpy_mse.pkl')
df_td3_avr_tvb_mse.to_pickle(outdir + '/df_td3_avr_tvb_mse.pkl')
df_td3_avr_durs.to_pickle(outdir + '/df_td3_avr_durs.pkl')
Benchmarking
%load_ext line_profiler
At the moment get_multiple_mse_curves_py is taking an obscenely long time to run.
To figure out where the problems are we can use the line_profiler module (and the corresponding notebook extension), which spits out a report of how many times each line was visited, how long it took to execute, and what percentage of the overall run time was spent on it.
This requires the source code to be defined in a text file, rather than at the interactive prompt as we have done above.
So first let's use the inspect module to collect the source of our main functions of interest above and write it to a file.
_txt = """\
# mse tools
# =========
#
# (see mse_estimation_comparison notebook entry for
# further details on these functions )
#
#
# Imports
import numpy as np
import sys
tvb_dir = '/alexandra/mcintosh_lab/john/Code/libraries_of_mine/github/tvb-pack'
sys.path.append(tvb_dir)
from tvb.analyzers.info import sampen
"""
_txt += inspect.getsource(get_multiple_mse_curves_py) + '\n\n\n\n'
_txt += inspect.getsource(gmmcs_nkpy) + '\n\n\n\n'
_txt += inspect.getsource(gmmcs_tvb) + '\n\n\n\n'
with open(mse_tools_file, 'w') as f: f.write(_txt)
Now import the functions from the new source file, but give them different names (with an _lp suffix) from the originals.
import mse_tools
reload(mse_tools)
gmmcs_tvb_lp = mse_tools.gmmcs_tvb
gmmcs_nkpy_lp = mse_tools.gmmcs_nkpy
get_multiple_mse_curves_py_lp = mse_tools.get_multiple_mse_curves_py
gmmcs_nkpy_avtr_lpreport = outdir + '/gmmcs_nkpy_avtr_lpreport.txt'
%lprun -f get_multiple_mse_curves_py_lp -T $gmmcs_nkpy_avtr_lpreport td2_avtr_nkpy_mse,td2_avtr_nkpy_sc,td2_avtr_nkpy_dur = gmmcs_nkpy_lp(td2_avtr_dat,m=m2u,r=r2u,scales=s2u)
cat $gmmcs_nkpy_avtr_lpreport
So, the vast majority (69%) of the time is being spent on the line
while ((k <= m-1) & (abs(y[i+k,var] - y[l+k,var]) <= r_new[var])):
which is run 1664096 times, at 27532032 ms (?) per hit.
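One possible direction for the speed problem (not implemented or validated here) would be to replace the i/l/while loops with numpy broadcasting. Below is a hedged sketch of a vectorized per-scale sample entropy calculation, using the same counting convention as the matlab code (B = pairs whose first m points all match within tolerance, A = pairs whose first m+1 points match); whether it reproduces the matlab 'cont' counts exactly would still need to be verified.
# Hedged sketch (not part of the original notebook): vectorized sample entropy
# for a single coarse-grained 1d series y, with absolute tolerance r_abs (= r * sd)
import numpy as np
def sampen_vectorized(y, m, r_abs):
    nlin = len(y) - m                        # number of templates, as in nlin_sc above
    within = np.ones((nlin, nlin), dtype=bool)
    for k in range(m):                       # templates match on their first m points
        dk = np.abs(y[k:k+nlin, None] - y[None, k:k+nlin])
        within &= (dk <= r_abs)
    dm = np.abs(y[m:m+nlin, None] - y[None, m:m+nlin])
    within_m1 = within & (dm <= r_abs)       # ...and on all m+1 points
    iu = np.triu_indices(nlin, k=1)          # i < l only, so self-matches are excluded
    B = within[iu].sum()
    A = within_m1[iu].sum()
    if A == 0 or B == 0:
        return -np.log(1.0/(nlin*(nlin - 1)))
    return -np.log(float(A)/B)
This uses O(nlin^2) memory for the boolean match matrices, which should be manageable for series of a few thousand coarse-grained points.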