[0, 1]: Tidying up

Defining functions is the first step towards organized, maintainable code. In Python it is as simple as writing “def yourFunctionName(arguments):” and then indenting the code inside the function. As in other languages, a function can return something, and the keyword is return. A nice feature of Python is the ease with which we can return multiple values without resorting to unusual data types:

def myFunction(argument [, otherCommaSeparatedArguments]):
    """this function creates var1 and var2, executes some code and returns them
    """
    var1 = 0.0
    var2 = []
    codeToExecute
    # returning more than one variable is easy, and they can be of different data types
    return var1, var2

# since myFunction returns two variables, I simply assign two variables to the output: get1 will receive var1, get2 will receive var2
get1, get2 = myFunction(args)
# a useful trick with functions that return many variables, when you only need some of them, is assigning the unwanted outputs to dummies:
interestingVar, _, anotherInterestingVar, _, _ = aFunctionWith5Outputs()
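The template above is not meant to be run as it stands, so here is a minimal runnable sketch of the same idea. The function min_max_mean and its data are made up for illustration; the point is only that the comma-separated return values travel together as a tuple and can be unpacked, partly ignored with underscores, or kept whole.

```python
def min_max_mean(values):
    """Return the smallest value, the largest value and the mean of a non-empty list."""
    smallest = min(values)
    largest = max(values)
    mean = sum(values) / len(values)
    # The comma-separated values are packed into a single tuple.
    return smallest, largest, mean


data = [2.0, 4.0, 9.0]

# Unpack all three outputs into separate names.
low, high, avg = min_max_mean(data)
print(low, high, avg)  # 2.0 9.0 5.0

# Keep only the mean; the underscores mark the outputs we ignore.
_, _, only_avg = min_max_mean(data)

# Without unpacking, the result is an ordinary tuple.
result = min_max_mean(data)
print(type(result).__name__, result)  # tuple (2.0, 9.0, 5.0)
```

The number of names on the left must match the number of returned values, which is why the dummies are needed when some outputs are not wanted.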



Typical examples are statistical tests: they usually return both the statistic and the p-value, but most of the time we are only interested in the significance:

from scipy.stats import kstest
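x = [...]  # placeholder: the data I want to test against a normal distribution
# 'norm' with no extra arguments means the standard normal (mean 0, standard deviation 1)
_, pval = kstest(x, 'norm')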


A function definition must be executed before the function is actually called; otherwise the interpreter does not know what to call and raises a NameError.
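A small sketch of what this looks like in practice, assuming the code is run as a plain script. The function greet is a made-up example; the same call fails before the def statement has run and succeeds afterwards.

```python
# Calling a function before its def statement has run raises NameError.
try:
    greet("world")
    early_call = "worked"
except NameError as error:
    early_call = f"NameError: {error}"
print(early_call)  # reports the NameError instead of a greeting


def greet(name):
    """Return a short greeting for the given name."""
    return f"Hello, {name}!"


# After the def statement has executed, the same call succeeds.
late_call = greet("world")
print(late_call)  # Hello, world!
```

What matters is the order of execution, not the position in the file: a function body may mention another function defined further down, as long as nothing is called before both definitions have run.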
A handful of small functions is acceptable in a single script, but sooner or later bigger designs are needed, and the mere fact that your main code lies beneath thousands of lines of function definitions becomes annoying. That is where modularization kicks in: we create our own modules, or packs of functions.
The steps are the following:

  1. Create a .py file with all the functions you need (this will be your library, and it should be placed in the same folder as your main script)
  2. In your main script, import the library by its file name, omitting the .py extension
  3. Call your functions using libraryName.function, as with other Python modules

The module (library) should sit in the same folder as the main script because Python automatically puts the script's folder on its module search path (sys.path), and can therefore find the imported module there. Outside that search path Python is blind, unless you tell it where else to look (but that is unnecessarily tricky, so we will stick with keeping libraries and main scripts together).
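To make the search-path behaviour concrete, here is a hedged sketch that builds a throwaway library in a temporary folder. The names demo_shapes and square_area are invented for this illustration, and the temporary folder simply stands in for "a folder Python does not know about". The import fails while the folder is off the search path and works once it has been added.

```python
import importlib
import os
import sys
import tempfile

# Create a folder that is NOT on the module search path and put a tiny library in it.
# "demo_shapes" and "square_area" are made-up names for this sketch.
library_dir = tempfile.mkdtemp()
with open(os.path.join(library_dir, "demo_shapes.py"), "w") as handle:
    handle.write("def square_area(side):\n    return side * side\n")

# Python searches only the folders listed in sys.path, so this import fails.
try:
    import demo_shapes
    found_before = True
except ModuleNotFoundError:
    found_before = False
print(found_before)  # False

# Telling Python where to look: add the folder to the search path.
sys.path.insert(0, library_dir)
importlib.invalidate_caches()  # make sure the newly written file is noticed

import demo_shapes  # note: no .py extension

print(demo_shapes.square_area(3))  # 9
```

Keeping the library next to the main script avoids the sys.path manipulation entirely, which is the approach followed in the rest of this section.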

### statistics.py
"""a collection of useful methods"""
# note: this file name shadows the standard-library module called statistics; a unique name is safer in real projects
from scipy import stats

def multipleKS(arrayOfData):
    """tests whether each sample in arrayOfData is significantly different from a standard normal, with Kolmogorov-Smirnov"""
    pvals = []
    for data in arrayOfData:
        # stats is the name bound by the import above
        _, pval = stats.kstest(data, 'norm')
        pvals.append(pval)
    return pvals

### main script (main.py)
import statistics
import numpy as np
from scipy import stats  # needed for stats.t and stats.norm below

np.random.seed(seed=233233)
myData = [stats.t.rvs(3, size=1000), stats.norm.rvs(3, 1, size=1000)]
print(statistics.multipleKS(myData))
[0.0014383979689356341, 0.0]

Of course, you can split your library into several files in order to group functions by type.
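As a rough sketch of such a split, the snippet below writes two tiny libraries and then uses both from the "main script" part. Everything here is hypothetical: the module names demo_cleaning and demo_descriptive, their functions, and the temporary folder that stands in for the folder of the main script.

```python
import os
import sys
import tempfile

# Hypothetical layout: two small libraries living next to the main script.
# A temporary folder stands in for the script folder in this sketch.
project_dir = tempfile.mkdtemp()
sources = {
    "demo_descriptive.py": "def mean(values):\n    return sum(values) / len(values)\n",
    "demo_cleaning.py": "def drop_missing(values):\n    return [v for v in values if v is not None]\n",
}
for filename, code in sources.items():
    with open(os.path.join(project_dir, filename), "w") as handle:
        handle.write(code)

# Only needed because the temporary folder is not the real script folder.
sys.path.insert(0, project_dir)

# The "main script" part: one import per library, then libraryName.function calls.
import demo_cleaning
import demo_descriptive

raw = [1.0, None, 3.0, None, 5.0]
clean = demo_cleaning.drop_missing(raw)
print(clean)  # [1.0, 3.0, 5.0]
print(demo_descriptive.mean(clean))  # 3.0
```

Grouping by purpose (cleaning in one file, descriptive statistics in another) keeps each file short and makes it obvious at the call site where a function comes from.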