Seminar 004: Native functions and visualisation¶

Notes¶

In [1]:

# C-style string building (concatenation with +)
number = 100
sentence = "This number is " + str(number) + "!"

print("C Style: ", sentence)

C Style:  This number is 100!

In [3]:

# Python-style string formatting
number = 100
sentence = f"This number is {number}!"

print(f"Python style: {sentence}")

Python style: This number is 100!

In [5]:

# Python-style string formatting (complex)
number, char, boolean = 100, "C", True
sentence = f"The number is {number}, while the char is {char} and the expression is {boolean}"

print(sentence)

The number is 100, while the char is C and the expression is True

In [7]:

# C-style string building (complex, concatenation with +)
number, char, boolean = 100, "C", True
sentence = "The number is " + str(number) + ", while the char is " + char + " and the expression is " + str(boolean)

print(sentence)

The number is 100, while the char is C and the expression is True
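The cells above label plain concatenation as "C style". For comparison, the following sketch also shows Python's printf-style `%` operator, which is closer to what C's `printf` does, together with two f-string format specifications. The variable names and values are illustrative only.

```python
# Three ways to build the same sentence (illustrative values).
number = 100
ratio = 2 / 3

# 1) Concatenation: every non-string value must be converted with str().
concatenated = "This number is " + str(number) + "!"

# 2) printf-style formatting with the % operator (closest to C's printf).
printf_style = "This number is %d!" % number

# 3) f-string: expressions are evaluated inside the braces.
f_string = f"This number is {number}!"

# Format specifications go after a colon inside the braces.
rounded = f"The ratio is {ratio:.2f}"  # two decimal places
padded = f"ID: {number:05d}"           # zero-padded to five digits

print(concatenated)  # This number is 100!
print(printf_style)  # This number is 100!
print(f_string)      # This number is 100!
print(rounded)       # The ratio is 0.67
print(padded)        # ID: 00100
```

All three variants produce the same sentence; f-strings are usually the easiest to read once several values are involved.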

Native Python Functions¶

Python has many native (built-in) functions and methods that help you reduce the amount of code you need to write. For instance, strings have a dedicated method to count sub-strings.

In [8]:

my_string = "This is a test string with a many t's. How many t's?"
amount_t = my_string.count("t")

print(f"There are in total {amount_t} t`s")

There are in total 6 t`s

There are many other functions, and it is advisable to first check whether Python already provides a solution. Sometimes a web search helps, and there are also collections that give an overview.
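As a small illustration of why it pays to look for an existing solution first, the sketch below compares a hand-written loop with a few built-in functions and string methods. The list and the text are made-up example values.

```python
values = [3, 8, 1, 9, 4]

# Hand-written loop to find the largest value ...
largest = values[0]
for value in values:
    if value > largest:
        largest = value

# ... and the built-in functions that do this kind of work in one call.
print(largest == max(values))  # True
print(min(values), sum(values), len(values))  # 1 25 5
print(sorted(values))  # [1, 3, 4, 8, 9]

# Strings come with many helpful methods as well.
text = "Native functions save code"
n_count = text.lower().count("n")
words = text.split()
replaced = text.replace("save", "reduce")

print(n_count)   # 3
print(words)     # ['Native', 'functions', 'save', 'code']
print(replaced)  # Native functions reduce code
```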

Another useful example: Joining strings¶

The str.join() method, called on a separator string as in "".join(...), combines a list of strings into a single string. This is quite handy when you want to generate outputs or reports dynamically.

In [10]:

import random

words = ["Hello", "this", "is", "a", "sentence"]

sentence = random.choices(words, k=5)

" ".join(sentence)

Out[10]:

'is a Hello Hello sentence'

In [11]:

# The string on which join() is called is the separator
# that is placed between the sub-strings

print("\nfollows ".join(sentence))

is
follows a
follows Hello
follows Hello
follows sentence

In [12]:

print("\n".join(sentence))

is
a
Hello
Hello
sentence
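To connect this with the idea of generating reports dynamically, here is a minimal sketch that builds a small text report with join(). The dictionary `results` and its values are hypothetical and serve only as an example.

```python
# Hypothetical measurement results, used only for illustration.
results = {"sample_A": 0.42, "sample_B": 0.57, "sample_C": 0.61}

# Build one line per entry, then join the lines with newlines.
lines = [f"{name}: {value:.2f}" for name, value in results.items()]
report = "\n".join(lines)
print(report)

# join() only accepts strings, so other types must be converted first.
numbers = [1, 2, 3]
csv_row = ",".join(str(n) for n in numbers)
print(csv_row)  # 1,2,3
```

Passing a list that contains non-string items directly to join() raises a TypeError, which is why the numbers are converted with str() first.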

Lambda functions¶

Basics¶

Lambda functions are basically a "lazy" type of function: anonymous, limited to a single expression, and defined on the fly. Usually they should be avoided, since normal function definitions can be documented and are much easier to read.

In [ ]:

# Function definition
my_fun = lambda argument1, argument2: argument1 * 10 / argument2

# Using the function
my_fun(10, 5)

In [ ]:

get_gc = lambda sequence: (sequence.count("G") + sequence.count("C")) / len(sequence)

In [ ]:

get_gc("GCGCGGGAGGCT")
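For comparison, the same GC-content calculation could be written as a regular function, which leaves room for a docstring and for handling edge cases. This is only a sketch: the upper-casing and the treatment of empty sequences are assumptions added here and are not part of the lambda above.

```python
def get_gc(sequence: str) -> float:
    """Return the GC content of a DNA sequence as a fraction between 0 and 1.

    Assumptions of this sketch: the sequence is upper-cased first, and
    an empty sequence yields 0.0 instead of raising ZeroDivisionError.
    """
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# 7 G's and 3 C's in 12 bases, so 10 / 12
print(get_gc("GCGCGGGAGGCT"))
```

Calling help(get_gc) now shows the docstring, which is not possible in a meaningful way for the lambda version.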

Common application: Keys to native functions¶

In [ ]:

# Create a list of tuples
tup_list = [("a", 1, 2), ("b", 2, 100), ("c", 3, 878237)]

In [ ]:

# Sort function
sorted(tup_list, key=lambda tup: tup[1], reverse=True)

In [ ]:

# Max/Min function
max(tup_list, key=lambda tup: tup[-1])

In [ ]:

# Filter data
filtered = filter(lambda tup: tup[0] == "a", tup_list)

list(filtered)

In [ ]:

filter_function = lambda tup: tup[0] == "a"
new_list = []

for entry in tup_list:
    if filter_function(entry):
        new_list.append(entry)
    else:
        print(f"{entry} is not valid!")
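The key-based calls above can also be written without lambdas. The following sketch uses `operator.itemgetter` from the standard library for the keys and a list comprehension in place of filter(); it is one possible alternative and behaves the same for this small list.

```python
from operator import itemgetter

tup_list = [("a", 1, 2), ("b", 2, 100), ("c", 3, 878237)]

# itemgetter(1) returns a function that picks element 1 of its argument,
# just like `lambda tup: tup[1]`.
by_second = sorted(tup_list, key=itemgetter(1), reverse=True)
print(by_second)  # [('c', 3, 878237), ('b', 2, 100), ('a', 1, 2)]

# Negative indices work as well: compare tuples by their last element.
largest_last = max(tup_list, key=itemgetter(-1))
print(largest_last)  # ('c', 3, 878237)

# A list comprehension is a common replacement for filter() with a lambda.
only_a = [tup for tup in tup_list if tup[0] == "a"]
print(only_a)  # [('a', 1, 2)]
```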

Data visualisation¶

In [ ]:

# %pip installs into the environment of the running notebook kernel
%pip install seaborn matplotlib pandas

In [ ]:

# Import an example dataset
import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()

Numerous packages for data visualisation exist in Python; Matplotlib and Seaborn are two of the most prominent. Both offer various plot types such as scatter plots, histograms, time-course plots and many more. Seaborn, which is built on top of Matplotlib, offers a lightweight and very beginner-friendly interface, while Matplotlib requires more code to achieve the same result. Matplotlib is very handy for complex and custom visualisations, whereas Seaborn is best used for simple and quick data visualisation.

The following two examples each generate a scatter plot for the dataset above:

In [ ]:

# Using seaborn, you can pass in a Pandas DataFrame and the column names
# for axes and colouring
sns.scatterplot(
    data=iris,
    x="sepal_length",
    y="sepal_width",
    hue="species",
)

Out[ ]:

<Axes: xlabel='sepal_length', ylabel='sepal_width'>

[Figure: scatter plot of sepal_width against sepal_length, coloured by species]

In [ ]:

# Using matplotlib you need a bit more code to achieve the same
# but we can add more custom things, such as additional lines and text

import matplotlib.pyplot as plt

f, ax = plt.subplots()
handles = []

for species in iris.species.unique():  # unique() keeps a stable order, unlike set()
    df_sub = iris[iris.species == species]
    handle = ax.scatter(x=df_sub.sepal_length, y=df_sub.sepal_width, label=species)

    handles.append(handle)

median_sepal_length = iris.sepal_length.median()
median_sepal_width = iris.sepal_width.median()

handles += [
    ax.axvline(
        x=median_sepal_length,
        linestyle="--",
        c="k",
        alpha=0.5,
        label="Median length",
    ),
    ax.axhline(
        y=median_sepal_width,
        linestyle=":",
        c="k",
        alpha=0.5,
        label="Median width"
    )
]

ax.set_xlabel("Sepal length $[cm]$")
ax.set_ylabel("Sepal width $[cm]$")
ax.set_title("Iris dataset")

ax.legend(
    handles=handles,
    loc='upper left',
    bbox_to_anchor=(1, 1.0),
    fancybox=True,
    shadow=True,
    ncol=1
)


plt.show()
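The two approaches can also be combined: Seaborn draws onto a Matplotlib Axes, so the quick Seaborn call and the Matplotlib customisations can share one plot. The sketch below assumes a tiny hand-made table (`toy`, illustrative values only) instead of the downloaded iris dataset, so that it runs without an internet connection.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Tiny hand-made table standing in for the iris data (illustrative values only).
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3],
        "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica"],
    }
)

fig, ax = plt.subplots()

# Seaborn draws onto the Matplotlib Axes passed via `ax` and returns it.
returned_ax = sns.scatterplot(
    data=toy, x="sepal_length", y="sepal_width", hue="species", ax=ax
)

# From here on, ordinary Matplotlib calls customise the same plot.
median_length = toy["sepal_length"].median()
ax.axvline(x=median_length, linestyle="--", c="k", alpha=0.5)
ax.set_xlabel("Sepal length $[cm]$")
ax.set_ylabel("Sepal width $[cm]$")
ax.set_title("Toy data")
```

In a notebook the figure is displayed after the cell runs; in a plain script, a call to plt.show() or fig.savefig(...) would be needed in addition.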

My own project¶

Pick a problem from your field, or one that you think should be solved using Python, and apply the concepts we have just learned. I will also supply a couple of datasets that you can use and analyse. These will be available on the course website.

Talking about code is a vital part of scripting and software development. That is why, at the end of this block, you will present your solution or analysis to the whole group. We will discuss problems and possible modifications to your solution. Finally, the same project will be revisited using object-oriented programming, which we will learn in the second block.