Pandas and itertools are two very handy Python libraries that will make your life easier and make you a happier person.
The following are some notes on how to wield these tools when managing and analyzing TVB simulations.
Notebook setup¶
Define some variables
# define system-specific filepaths etc
%run ~/set_localenv_vars.py
# output dir
outdir = le['data_dir'] + '/notebooks/tvb_pandas_itertools'
!mkdir -p $outdir
# stuff for dropbox api
nb_name = 'tvb_pandas_itertools'
# Root path to tvb-library, tvb-data, etc. libraries.
tvb_folder = le['code_dir'] + '/libraries_of_others/github/tvb-pack'
sim_df_outfile_base = outdir + '/tvb_pandit_sims_'
(Note: once you have worked through the simulation section and saved the results to file using the above filename, you can reopen this notebook afresh in a later session and continue from the subsequent section onwards, as long as you also execute the setup cells here. If you have the IPython 'init cell' extension installed, you can simply click the init button (all the code cells in this section are tagged as initialization cells).)
Importage
# Generic stuff
import os,sys,datetime,itertools
from copy import deepcopy
import numpy as np
from scipy.io.matlab import loadmat
import pandas as pd
# Visualization stuff
%matplotlib inline
from matplotlib import pyplot as plt
import seaborn as sns
import plotly
from IPython.display import display as d,HTML,Image,clear_output
from IPython.html.widgets import interact, interactive
#from ipywidgets import StaticInteract, RangeWidget, RadioWidget
# TVB stuff
# note on these imports: I recently (02/05/2015) tried running this in a fresh installation, and found that
# a) I had to first call these imports from a python shell in the tvb-pack/scripts folder for them to then
# work from arbitrary locations in subsequent python sessions
# b) the latest ipython/jupyter notebook was still crashing. pip install ipython==2.4 gives the most stable
# notebook kernels at present.
sys.path += [tvb_folder + '/' + t for t in ['tvb-library', 'tvb-data', 'tvb-framework',
'matlab', 'external-geodesic']]
from tvb.simulator import models,monitors,coupling,integrators,noise,simulator
from tvb.datatypes import connectivity,projections
from tvb.datatypes.time_series import TimeSeriesRegion
from tvb.analyzers import metric_variance_of_node_variance as metric_varnodevar
from tvb.analyzers import metric_variance_global as metric_varglob
from tvb.analyzers import metric_proxy_metastability
from tvb.analyzers import metric_kuramoto_index
from tvb.analyzers import (pca,node_coherence,node_complex_coherence,
node_covariance,correlation_coefficient,
cross_correlation,ica)
# Neuroimaging analysis stuff
import nitime
from nitime.timeseries import TimeSeries
from nitime.analysis import CorrelationAnalyzer, CoherenceAnalyzer
from nitime.utils import percent_change
from nitime.viz import drawmatrix_channels, drawgraph_channels, plot_xcorr
# Workdocs-cloudfiles stuff
sys.path.append(le['ipynb_workdocs_dir'])
from new_utils import nb_fig,cloudfiles_nb
Initialize AWS API and workdocs-cloudfiles folder
aws_key = 'drowssaperucesyreva'
aws_secret = '?teytidettopsuoyevah'
cnb = cloudfiles_nb('aws', [aws_key,aws_secret])
cnb.initialize_folder(nb_name)
Calico document tools
%%javascript
IPython.load_extensions('calico-spell-check',
'calico-document-tools','calico-cell-tools');
First some brief comments on the whats and the whys. These are not tutorials. Newbies are strongly encouraged to read the bloody documentation and google promiscuously when troubleshooting or trying to understand stuff.
[Handy tip when researching Python-type things: include 'nbviewer' in your search terms. This often brings up nbviewer renders of notebooks. The quality of the hits is variable, but in my experience this strategy frequently turns up excellent tutorials and useful snippets, which tend to be readable and usable without being excessively technical (unlike e.g. most Python documentation pages).]
Background - pandas¶
Documentation here. Make sure you check the section on multi-indexing.
Manipulating sets of numbers is a central task in many walks of life. In computational research science it is your bread and butter. You generate blocks of numbers, place them in some data structure, and then want to be able to flexibly access that data structure, modify it, write it to file, read it in again, etc. etc. Over and over again, day after day after day. When working in Python the three basic types of data structures you might use for this are lists, dictionaries, and numpy arrays. But I put it to you: if you're building a wall, why use a wheel-barrow and mud when there's a fork-lift truck and cement mixer available to you at no extra cost?
Pandas is a library that provides some simple but powerful data structures for reading, writing, storing, indexing, manipulating, and visualizing blocks of numbers. At its core is the DataFrame structure, which is essentially a 2D array of numbers with many special powers. Its multi-indexing capabilities, which we shall be making use of, make DataFrames more like a hybrid between a spreadsheet and a hierarchically nested Python dictionary or MATLAB structure. Very similar to Microsoft Excel pivot tables in that respect (which are actually very neat), but as we all know Excel should be avoided like the plague whenever large amounts of data and/or anything more than trivial numerical operations are involved.
Pandas DataFrames are explicitly based on the data.frame objects in R, which deserve a large amount of the credit for the (still unmatched) power and popularity of the R language for data analysis. An added bonus is therefore that pandas DataFrames are highly compatible with R (via the IPython R magics and the rpy2 library), and are a must-have for any dual Python/R practitioners.
The most useful aspects of pandas DataFrames are the ability to easily read from and write to a wide variety of file formats (c.f. documentation), and the tight integration with the plotting capabilities of matplotlib. Here, however, we shall focus more on how to exploit multi-indexing when doing simulations or data analyses that are repeated across multiple parameter combinations.
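To make the multi-indexing idea concrete before we get to the TVB-specific code, here is a minimal, self-contained sketch with toy numbers (the parameter and channel names are made up for illustration):
import numpy as np
import pandas as pd
# a two-level column index: one 'parameter' level and one 'channel' level,
# analogous to the simulation dataframes we build below
mi = pd.MultiIndex.from_product([[0.1, 0.2], ['chanA', 'chanB', 'chanC']],
                                names=['param', 'label'])
toy_df = pd.DataFrame(np.random.randn(4, 6), columns=mi)
toy_df.index.name = 'time'
# select all columns for one parameter value...
d(toy_df[0.1])
# ...or query on the (transposed) column index, as we do with the sim results later
d(toy_df.T.query('param == 0.2').T)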
Run TVB Simulations¶
For demonstration purposes, we shall use a very basic example from the tvb demos folder, the generate_region_demo_data.py demo, with a few minor modifications.
For reference: to make the modifications, I simply loaded in the .py file with the following two lines:
tvb_folder = '/media/sf_SharedFolder/Code/git_repos_of_mine/tvb-pack/tvb-library'
%load $tvb_folder/tvb/simulator/demos/generate_region_demo_data.py
(if I didn't have it locally, I could also have loaded it directly from the interweb with the same command):
fileincloud = 'https://raw.githubusercontent.com/the-virtual-brain/tvb-library/trunk/tvb/simulator/demos/generate_region_demo_data.py'
%load $fileincloud
After loading came a bit of hacking, and the result is the below. This is a very useful and (for me, at least) common procedure for modifying and re-using code snippets.
Caveat: as this is primarily intended as a demonstration, the emphasis in this example has been on explicitness and simplicity over elegance and generality. For more general (if not more elegant) alternatives, see e.g. the classes in the 'runner.py' module in tvb-scripting.
Run simulation function¶
def run_sim(sim_len=250,lincouple_a = 0.033,
speed = 4.0,add_noise_nsig = 2** -10,
monitors2use = ['tavg', 'fmri', 'eeg'],
monitor_periods = {'tavg': 0.48828125,'fmri': 500., 'eeg': 1e3/4096},
stochast_int_dt = 0.06103515625,
g2do_a=1.05,g2do_b=-1.,g2do_c=0.0,g2do_d=0.1,
g2do_e=0.0,g2do_f=1/3.,g2do_g=1.0,g2do_alpha=1.0,
g2do_beta=0.2,g2do_tau=1.25,g2do_gamma=-1.0):
oscillator = models.Generic2dOscillator(a=g2do_a,b=g2do_b,c=g2do_c,d=g2do_d,
e=g2do_e,f=g2do_f,g=g2do_g,
alpha=g2do_alpha,beta=g2do_beta,
tau=g2do_tau,gamma=g2do_gamma)
white_matter = connectivity.Connectivity().default()
white_matter.speed = np.array(speed)
white_matter_coupling = coupling.Linear(a=lincouple_a)
#Initialise an Integrator
hiss = noise.Additive(nsig=np.array([add_noise_nsig]))
heunint = integrators.HeunStochastic(dt=stochast_int_dt, noise=hiss)
mon_list = []
if 'tavg' in monitors2use:
mon_tavg = monitors.TemporalAverage(period=monitor_periods['tavg'])
mon_list.append(mon_tavg)
if 'fmri' in monitors2use:
mon_fmri = monitors.Bold(period=monitor_periods['fmri'])
mon_list.append(mon_fmri)
if 'eeg' in monitors2use:
proj_mat_path = tvb_folder+ '/tvb-data/tvb_data/projectionMatrix/'\
'region_conn_74_eeg_1020_62.mat'
eeg_projection = loadmat(proj_mat_path)["ProjectionMatrix"]
pr = projections.ProjectionRegionEEG()
pr.projection_data = eeg_projection
mon_eeg = monitors.EEG(projection_matrix_data=pr, period=monitor_periods['eeg'])
mon_list.append(mon_eeg)
what_to_watch = tuple(mon_list)
#Initialise a Simulator -- Model, Connectivity, Integrator, and Monitors.
sim = simulator.Simulator(model=oscillator, connectivity=white_matter,
coupling=white_matter_coupling,
integrator=heunint, monitors=mon_list)
sim.configure()
#Perform the simulation
sim_time_lists = {}; sim_data_lists = {}
for m in monitors2use:
sim_time_lists[m] = []
sim_data_lists[m] = []
for sim_it in sim(simulation_length = sim_len):
for m_it, m in enumerate(monitors2use):
if sim_it[m_it] is not None:
sim_time_lists[m].append(sim_it[m_it][0])
sim_data_lists[m].append(sim_it[m_it][1])
sim_data_arrs = {}; sim_time_arrs = {}
for m in monitors2use:
sim_data_arrs[m] = np.array(sim_data_lists[m])
sim_time_arrs[m] = np.array(sim_time_lists[m])
sim_analyzers = {}
for m in monitors2use:
# for nam,dat,per in zip(['tavg', 'eeg'],
# [tavg_data_arr, eeg_data_arr ],
# [mon_tavg_period, mon_eeg_period]):
if len(sim_data_arrs[m]) != 0:
tsr = TimeSeriesRegion(connectivity=sim.connectivity,
data=sim_data_arrs[m],
sample_period=monitor_periods[m])
tsr.configure()
vnv = metric_varnodevar.VarianceNodeVariance(time_series=tsr).evaluate()
vglob = metric_varglob.GlobalVariance(time_series=tsr).evaluate()
sim_analyzers[m] = {'tsr': tsr, 'vnv': vnv, 'vglob': vglob}
# other analyzers?
returntome = ['sim',
'sim_analyzers',
'sim_data_arrs',
'sim_time_arrs']
res = {}
for r in returntome: res[r] = eval(r)
return res
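Before launching a full parameter sweep, it's worth sanity-checking the function on a single short run. Something like the following quick smoke test (using only the 'tavg' monitor to keep it fast) should do:
# quick single-run smoke test of run_sim
test_res = run_sim(sim_len=250, monitors2use=['tavg'])
d(test_res.keys())
d(test_res['sim_data_arrs']['tavg'].shape)
d(test_res['sim_analyzers']['tavg']['vnv'])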
Parameter iteration¶
In this demonstration we shall vary two simulation parameters: the conduction velocity ('speed') and the magnitude of the linear coupling function ('lincouple_a'):
myparams = {'speed': np.arange(1,10,1),
'lincouple_a': np.arange(0.012,0.042,0.002)}
d(myparams)
{'lincouple_a': array([ 0.012, 0.014, 0.016, 0.018, 0.02 , 0.022, 0.024, 0.026, 0.028, 0.03 , 0.032, 0.034, 0.036, 0.038, 0.04 ]), 'speed': array([1, 2, 3, 4, 5, 6, 7, 8, 9])}
Use the 'product' function from the 'itertools' library to create a list of all possible combinations (the Cartesian product) of these two parameters:
myparams_combs = list(itertools.product(*myparams.values()))
d(myparams_combs[0:5])
[(1, 0.012), (1, 0.014), (1, 0.016), (1, 0.018000000000000002), (1, 0.02)]
d(len(myparams_combs))
135
Now put these parameter combos into a dataframe
df_mpc = pd.DataFrame(myparams_combs, columns=myparams.keys())
d(df_mpc.head())
d(df_mpc.shape)
(135, 2)
Because this is an example notebook, I will only run sims for a subset of the full parameter ranges identified above. The subset is defined as follows:
params_df = df_mpc.ix[np.arange(0,df_mpc.shape[0], 10)]
d(params_df)
Do it¶
Now run the simulations: iterate through the parameters, and store in a dataframe:
Note: depending on the number of parameters and the duration of the simulations, this could take several minutes or several hours, or more.
# this takes a few hours to run
sim_len = 30000 # ms, i.e. 30 s of simulation time
monitors2use = ['tavg', 'fmri', 'eeg']
all_res = {}
all_dfs = {m: [] for m in monitors2use}
for pdf_it, pdf in enumerate(params_df.index):
print '%s of %s' %(pdf_it+1, params_df.index.shape[0])
#_s,_a,_b = df_mpc.ix[d][['speed', 'alpha', 'beta']]
varied_params = {c: params_df[c].ix[pdf] for c in params_df.columns}
params = deepcopy(varied_params)
params['monitors2use'] = monitors2use
params['sim_len'] = sim_len
res = run_sim(**params)
all_res[pdf] = res
for m_it, m in enumerate(monitors2use):
if len(res['sim_data_arrs'][m])!=0: # (mainly for fmri...)
if m in ('tavg', 'fmri'): labs = res['sim'].connectivity.region_labels
elif m == 'eeg': labs = ['chan%s' %s \
for s in np.arange(1,\
np.squeeze(res['sim_data_arrs']['eeg']).shape[1]+1)]
mi_arrays_list = [labs] + [np.repeat(v,len(labs)) for v in varied_params.values()]
mi_names_list = ['label'] + [v for v in varied_params.keys()]
mi = pd.MultiIndex.from_arrays(mi_arrays_list, names=mi_names_list)
df = pd.DataFrame(np.squeeze(res['sim_data_arrs'][m]),
index=res['sim_time_arrs'][m],
columns=mi)
df.index.name = 'time'
all_dfs[m].append(df)
#if d_it==0: _merged = _df
#else: _merged = pd.merge(_merged,_df,join='inner')
#_df = collate_results(_r,_t,_sim)
#_r,_t
Side note: the following will suppress warning messages in the notebook. Normally warnings should most definitely be paid attention to, but once one has done that and is then trying to make clean documentation, they rather cramp one's style.
import logging
l = logging.getLogger()
l.getEffectiveLevel()
l.setLevel(50)
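If the messages come through Python's warnings machinery rather than the logging module, the equivalent switch (again, to be used sparingly) is:
import warnings
warnings.filterwarnings('ignore')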
Concatenate sim results and write to file.¶
The final step in assembling the simulation data is to concatenate the dataframes that we've been appending to the list all_dfs on each iteration of the loop.
There are several options when writing to file. .pkl is the easiest for intermediate work. For finished products .csv is a useful option, as is .hdf5.
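For reference, the .csv and .hdf5 equivalents of the pickle calls below would look roughly like this (a sketch only; note that .csv flattens the column multi-index into several header rows, and .to_hdf requires the pytables package):
# csv: the column multi-index becomes multiple header rows
# df_sim_tavg.to_csv(sim_df_outfile_base + '_tavg.csv')
# ...and can be read back with pd.read_csv(..., header=[0,1,2], index_col=0)
# hdf5 (requires pytables)
# df_sim_tavg.to_hdf(sim_df_outfile_base + '_tavg.h5', 'tavg')
# ...and read back with pd.read_hdf(sim_df_outfile_base + '_tavg.h5', 'tavg')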
We will take a look at these data structures after we have saved them to file and re-loaded them.
First the analyzers data
mons2do = ['tavg', 'fmri', 'eeg'] # the monitors used in the sims above
new_thing = []
for a in all_res:
sp = params_df.ix[a]['speed']
la = params_df.ix[a]['lincouple_a']
for v in ['vnv', 'vglob']:
for te in mons2do:
new_thing.append([a, sp, la, te, v, all_res[a]['sim_analyzers'][te][v] ])
df_analyzers = pd.DataFrame(new_thing, columns=['sim_num', 'speed', 'lincouple_a',
'monitor', 'metric', 'val'])
df_analyzers.set_index(['sim_num', 'speed', 'lincouple_a', 'monitor', 'metric'],
inplace=True)
df_analyzers.to_pickle(sim_df_outfile_base + '_analyzers.pkl')
df_sim_tavg = pd.concat(all_dfs['tavg'], axis=1)
df_sim_tavg.to_pickle(sim_df_outfile_base + '_tavg.pkl')
df_sim_fmri = pd.concat(all_dfs['fmri'], axis=1)
df_sim_fmri.to_pickle(sim_df_outfile_base + '_fmri.pkl')
df_sim_eeg = pd.concat(all_dfs['eeg'], axis=1)
df_sim_eeg.to_pickle(sim_df_outfile_base + '_eeg.pkl')
Note: depending on your system and the duration of your simulation / size of parameter space, you may not be able to do this concatenation step in one go, and will instead get MemoryErrors. This is most likely to be a problem for simulated EEG data. The only real solution if this occurs is to chop your data into chunks.
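One simple chunking strategy (just a sketch, not what is done here): skip the big in-memory concatenation entirely and pickle each per-parameter-set dataframe to its own file as it is produced inside the simulation loop, then load and concatenate only the subsets you need later.
# inside the simulation loop, instead of all_dfs[m].append(df):
# fname = '%s%s_speed%s_lincouplea%s.pkl' % (sim_df_outfile_base, m,
#                                            varied_params['speed'],
#                                            varied_params['lincouple_a'])
# df.to_pickle(fname)
# later, to reassemble e.g. just the eeg results:
# import glob
# eeg_files = sorted(glob.glob(sim_df_outfile_base + 'eeg_speed*.pkl'))
# df_sim_eeg = pd.concat([pd.read_pickle(fn) for fn in eeg_files], axis=1)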
We only need to run all of the above once. Once written to file, we can take a break and pick up where we left off in a different session by loading the saved files back in.
Manipulating the sim dataframe¶
df_sim_tavg = pd.read_pickle(sim_df_outfile_base + '_tavg.pkl')
df_sim_fmri = pd.read_pickle(sim_df_outfile_base + '_fmri.pkl')
df_sim_eeg = pd.read_pickle(sim_df_outfile_base + '_eeg.pkl')
In a lot of the following I want to show snippets of dataframes. Normally the .head() method is adequate for that, but right now it's a little more effective to specify some example row and column indices explicitly.
ters = df_sim_tavg.index[0:5] # example rows
tecs = df_sim_tavg.columns[0:5] # example columns
eers = df_sim_eeg.index[0:5] # example rows
eecs = df_sim_eeg.columns[0:5] # example columns
fers = df_sim_fmri.index[0:5] # example rows
fecs = df_sim_fmri.columns[0:5] # example columns
So let's take a quick peek now at these dataframes
d(df_sim_tavg.shape)
(61440, 1036)
d(df_sim_tavg[tecs].ix[ters])
d(df_sim_eeg.shape)
(122880, 868)
d(df_sim_eeg[eecs].ix[eers])
d(df_sim_fmri.shape)
(60, 1036)
d(df_sim_fmri[fecs].ix[fers])
For the simulated data, one of the things we are primarily interested in is correlation matrices.
Pandas has some convenient built-in functions for computing correlations. But first we need to isolate the simulation results for individual parameter sets, as the 'df_sim' dataframes each contain sim results from all combinations of the 'speed' and 'lincouple_a' parameters.
There are a few ways to do this, but the most convenient is the 'query' method:
dsf1 = df_sim_fmri.T.query('lincouple_a == 0.012 & speed == 1.0')
dse1 = df_sim_eeg.T.query('lincouple_a == 0.012 & speed == 1.0')
dst1 = df_sim_tavg.T.query('lincouple_a == 0.012 & speed == 1.0')
(Note the transpose (.T) means I need to switch around the example rows and example columns)
d(dsf1[fers].ix[fecs])
Don't worry about the transpose; it's just the case that some pandas operations are more conveniently done on indices and some on columns, so it can be convenient to transpose the dataframe accordingly.
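(An alternative that avoids the transpose altogether is the .xs() cross-section method, which can select directly on the column levels; something like the following should give the same data with time still on the rows, and unlike query it drops the selected levels for you:)
# cross-section on the column multi-index, keeping time on the rows;
# note that xs drops the 'speed' and 'lincouple_a' levels by default
dsf1_alt = df_sim_fmri.xs((1, 0.012), level=['speed', 'lincouple_a'], axis=1)
d(dsf1_alt.shape)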
To prove that what we've done here is pick out an N-ROIs by N-timepoints time series, take a look at the shape:
d(dsf1.shape)
(74, 60)
...the number of data points is naturally larger for the tavg and eeg data than for the fmri data
d(dse1.shape)
(62, 122880)
We can also easily remove the now-redundant multi-index from these single-parameter-set dataframes;
dsf1dl = dsf1.copy()
dsf1dl.index = dsf1dl.index.droplevel(['speed', 'lincouple_a'])
#dsf1dl[fers].ix[fecs]
(to take a look at these we do need to make some example row and column indices though...)
# make some new example row and column indices
fersdl = dsf1dl.T.index[0:5]
fecsdl = dsf1dl.T.columns[0:5]
d(dsf1dl[fersdl].ix[fecsdl])
Correlation matrices¶
In pandas we can compute correlations using the .corr() method. Again, because the correlations are computed column-wise, we'll do a transpose first:
d(dsf1.T[fecs].ix[fers].corr())
Show that this is now an N-region by N-region correlation matrix.
(As we're not rendering a snippet of the dataframe we don't need to include the example row and column indices. This is what your code would normally look like!)
d(dsf1.T.corr().shape)
(74, 74)
We could plot this directly using matplotlib's 'imshow' method; but there are actually some better tools available from the seaborn, statsmodels and nitime libraries.
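For reference, the bare-bones matplotlib version would look something like this (just a sketch; we'll use the nicer options below instead):
f = outdir + '/fmri_corr_imshow_basic.png'
fig, ax = plt.subplots(figsize=(6,6))
im = ax.imshow(dsf1.T.corr().values, cmap='coolwarm', vmin=-1, vmax=1,
               interpolation='nearest')
fig.colorbar(im)
plt.savefig(f, bbox_inches='tight')
plt.close()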
For present purposes we do want to remove the redundant multi-index levels, because the matrix entries will be indexed by the dataframe index labels, which will include values for speed and lincouple_a if we don't remove them.
dsf1dl = dsf1.copy()
dsf1dl.index = dsf1dl.index.droplevel(['speed', 'lincouple_a'])
dse1dl = dse1.copy()
dse1dl.index = dse1dl.index.droplevel(['speed', 'lincouple_a'])
dst1dl = dst1.copy()
dst1dl.index = dst1dl.index.droplevel(['speed', 'lincouple_a'])
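The copy-and-droplevel pattern above gets repetitive; if you find yourself doing it a lot, a tiny helper function along these lines (hypothetical, not part of pandas or TVB) keeps things tidy:
def drop_param_levels(df, levels=('speed', 'lincouple_a')):
    # return a copy of df with the given (now-constant) index levels removed
    out = df.copy()
    out.index = out.index.droplevel(list(levels))
    return out
# equivalent to the three cells above:
# dsf1dl = drop_param_levels(dsf1)
# dse1dl = drop_param_levels(dse1)
# dst1dl = drop_param_levels(dst1)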
Seaborn¶
f = outdir + '/seaborn_tavg_corrplot.png'
fig, ax = plt.subplots(figsize=(10,10))
sns.corrplot(dst1dl.T, annot=False, sig_stars=False, diag_names=False, cmap='coolwarm',ax=ax);
plt.tight_layout()
plt.savefig(f, bbox_inches='tight')
plt.close()
f = outdir + '/seaborn_eeg_corrplot.png'
fig, ax = plt.subplots(figsize=(10,10))
sns.corrplot(dse1dl.T, annot=False, sig_stars=False, diag_names=False,ax=ax);
plt.tight_layout()
plt.savefig(f, bbox_inches='tight')
plt.close()
f = outdir + '/seaborn_fmri_corrplot.png'
fig, ax = plt.subplots(figsize=(10,10))
sns.corrplot(dsf1dl.T, annot=False, sig_stars=False, diag_names=False, ax=ax);
plt.tight_layout()
plt.savefig(f, bbox_inches='tight')
plt.close()
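(Side note: sns.corrplot has been deprecated and then removed in more recent seaborn releases; if the calls above error for you, sns.heatmap on the correlation matrix is the closest replacement, roughly:)
# f = outdir + '/seaborn_fmri_corr_heatmap.png'
# fig, ax = plt.subplots(figsize=(10,10))
# sns.heatmap(dsf1dl.T.corr(), vmin=-1, vmax=1, cmap='coolwarm', square=True,
#             xticklabels=False, yticklabels=False, ax=ax)
# plt.savefig(f, bbox_inches='tight')
# plt.close()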
Nitime¶
f = outdir + '/nitime_tavg_corrplot.png'
#reminder: these were the monitor_periods specified above:
# {'tavg': 0.48828125,'fmri': 500., 'eeg': 1e3/4096}
T_dst1dl = TimeSeries(dst1dl.values, sampling_interval=0.48828125)
rnames = dst1dl.T.columns
T_dst1dl.metadata['roi'] = rnames
C_dst1dl = CorrelationAnalyzer(T_dst1dl)
#Display the correlation matrix
fig = drawmatrix_channels(C_dst1dl.corrcoef, rnames, size=[10., 10.], color_anchor=0)
fig.savefig(f,bbox_inches='tight')
plt.close()
f = outdir + '/nitime_eeg_corrplot.png'
#reminder: these were the monitor_periods specified above:
# {'tavg': 0.48828125,'fmri': 500., 'eeg': 1e3/4096}
T_dse1dl = TimeSeries(dse1dl.values, sampling_interval=1e3/4096)
rnames = dse1dl.T.columns
T_dse1dl.metadata['roi'] = rnames
C_dse1dl = CorrelationAnalyzer(T_dse1dl)
#Display the correlation matrix
fig = drawmatrix_channels(C_dse1dl.corrcoef, rnames, size=[10., 10.], color_anchor=0)
fig.savefig(f, bbox_inches='tight')
plt.close()
f = outdir + '/nitime_fmri_corrplot.png'
#reminder: these were the monitor_periods specified above:
# {'tavg': 0.48828125,'fmri': 500., 'eeg': 1e3/4096}
T_dsf1dl = TimeSeries(dsf1dl.values, sampling_interval=500.)
rnames = dsf1dl.T.columns
T_dsf1dl.metadata['roi'] = rnames
C_dsf1dl = CorrelationAnalyzer(T_dsf1dl)
#Display the correlation matrix
fig = drawmatrix_channels(C_dsf1dl.corrcoef, rnames, size=[10., 10.], color_anchor=0)
fig.savefig(f, bbox_inches='tight')
plt.close()
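We imported CoherenceAnalyzer above but haven't used it yet; for completeness, here's a minimal sketch of what a coherence matrix for the fmri data might look like (averaging over all frequency bins, which is crude but fine for a quick look):
f = outdir + '/nitime_fmri_coherence.png'
T_coh = TimeSeries(dsf1dl.values, sampling_interval=500.)
T_coh.metadata['roi'] = dsf1dl.T.columns
Coh = CoherenceAnalyzer(T_coh)
# coherence is (n_regions, n_regions, n_frequencies); average over frequencies
coh_mat = np.mean(Coh.coherence, -1)
fig = drawmatrix_channels(coh_mat, dsf1dl.T.columns, size=[10., 10.], color_anchor=0)
fig.savefig(f, bbox_inches='tight')
plt.close()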
Split the df into a list of individual parameter dfs¶
There are a few ways this can be done. Here is a fairly dumb but straightforward one:
First get the param combos by removing the label level from the index
sl_combs = df_sim_fmri.T.unstack('label').index.values
d(sl_combs)
array([(1, 0.012), (1, 0.032), (2, 0.022), (3, 0.012), (3, 0.032), (4, 0.022), (5, 0.012), (5, 0.032), (6, 0.022), (7, 0.012), (7, 0.032), (8, 0.022), (9, 0.012), (9, 0.032)], dtype=object)
Combine into a dict
thesedfs = {}
for s in sl_combs:
thisdf = df_sim_fmri.T.query('speed == %s & lincouple_a == %s' %(s[0],s[1])).T
thesedfs[s] = thisdf
d(thesedfs.keys()[0:5])
[(7, 0.012), (1, 0.032), (6, 0.022), (4, 0.022), (2, 0.022)]
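A slightly slicker alternative is to let pandas do the splitting for us, with a groupby on the column index levels (this should give the same dict, modulo key ordering):
# group the columns by the two parameter levels; each group is one parameter set
thesedfs_gb = {k: v for k, v in df_sim_fmri.groupby(level=['speed', 'lincouple_a'], axis=1)}
d(thesedfs_gb.keys()[0:5])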
Comparing simulated and empirical FC matrices¶
Comparing a simulated FC matrix with an empirical FC matrix is simply a matter of correlating the values in one matrix with those in the other. (Here, since we don't have an empirical matrix loaded, we'll demonstrate with two simulated FC matrices from different parameter sets.)
Using the dict produced in the previous section, we can do this as follows:
mat1 = thesedfs.values()[0].corr()
mat2 = thesedfs.values()[1].corr()
acorr = np.corrcoef(np.reshape(mat1.values, [1, mat1.shape[0]*mat1.shape[1] ]),
np.reshape(mat2.values, [1, mat2.shape[0]*mat2.shape[1] ]))
d(acorr)
array([[ 1. , 0.01196324], [ 0.01196324, 1. ]])
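In practice it is common to correlate only the unique off-diagonal elements (the upper triangle), since the diagonal is always 1 and the matrix is symmetric; a sketch of that variant:
# compare only the unique off-diagonal entries of the two FC matrices
iu = np.triu_indices(mat1.shape[0], k=1)
acorr_triu = np.corrcoef(mat1.values[iu], mat2.values[iu])
d(acorr_triu)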
Looking at tvb analyzer outputs¶
Load the analyzer results we pickled earlier:
df_analyzers = pd.read_pickle(sim_df_outfile_base + '_analyzers.pkl')
We can turn a hierarchically-indexed dataframe into a matrix with one index on the row and one on the columns using 'unstack'.
First filter the dataframe to include only the variance-node-variance metric for the tavg sim data:
d(df_analyzers.query('monitor == "tavg" & metric == "vnv"'))
Then 'unstack' to turn it into a matrix, with column variable 'lincouple_a'
d(df_analyzers.query('monitor == "tavg" & metric == "vnv"').unstack('lincouple_a'))
The NaNs here are simply because those entries aren't present yet: we only carried out the simulation for a subset of the parameter combinations produced by the itertools.product command.
(We can do the same without filtering; the result is the same type of data structure, but with sub-sections for each row)
d(df_analyzers.unstack('lincouple_a').head())
...but that won't look right for the 2D matrix we want to plot
f = outdir + '/tavg_vnv_imshow.png'
fig, ax = plt.subplots(figsize=(6,6))
ax.imshow(df_analyzers.query('monitor == "tavg" & metric == "vnv"').\
unstack('lincouple_a').T, cmap='hot')
plt.savefig(f, bbox_inches='tight')
plt.close()
or with pcolor (setting NaNs to 0):
f = outdir + '/tavg_vnv_pcolor.png'
plt.pcolor(df_analyzers.query('monitor == "tavg" & metric == "vnv"').\
unstack('lincouple_a').T.fillna(0), cmap='hot')
plt.savefig(f, bbox_inches='tight')
plt.close()
adq = df_analyzers.query('monitor == "tavg" & metric == "vnv"')
In the TVB GUI, parameter space explorations are often summarized with single scalar metrics such as the variance of node variances, which may be visualized as a table of circles of variable sizes.
Do that for the following table:
d(adq.reset_index())
f = outdir + '/tavg_vnv_scatter_circles.png'
adq.reset_index().plot(kind='scatter', x='lincouple_a', y='speed',
s=adq['val']*1000, figsize=(10,6));
plt.savefig(f, bbox_inches='tight')
plt.close()
I think that'll do for now.