Seminar 004: Native functions and visualisation
Notes
# C-style string formatting (string concatenation with +)
number = 100
sentence = "This number is " + str(number) + "!"
print("C Style:", sentence)
C Style: This number is 100!
# Python-style string formatting
number = 100
sentence = f"This number is {number}!"
print(f"Python style: {sentence}")
Python style: This number is 100!
# Python-style string formatting (complex)
number, char, boolean = 100, "C", True
sentence = f"The number is {number}, while the char is {char} and the expression is {boolean}"
print(sentence)
The number is 100, while the char is C and the expression is True
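F-strings can also control how a value is rendered, using a format specifier after a colon inside the braces. The following is a minimal sketch; the names `ratio` and `count` are made up for illustration and do not appear in the notes above.

```python
ratio = 2 / 3
count = 1234567

two_decimals = f"{ratio:.2f}"   # fixed-point notation with two decimal places
as_percent = f"{ratio:.1%}"     # multiplied by 100, with a percent sign
with_commas = f"{count:,}"      # comma as thousands separator
padded = f"{'GC':<6}|"          # left-aligned in a field of width 6

print(two_decimals, as_percent, with_commas, padded)
```

This kind of formatting is hard to achieve with plain concatenation, which is one reason f-strings are usually the more convenient choice.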
# C-Style string formatting (complex)
number, char, boolean = 100, "C", True
sentence = "The number is " + str(number) + ", while the char is " + char + " and the expression is " + str(boolean)
print(sentence)
The number is 100, while the char is C and the expression is True
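For comparison, Python also offers a printf-style `%` operator, which is closer to C's `printf` than the concatenation shown above. A short sketch, reusing the variable names from the notes, that builds the same sentence both ways:

```python
number, char, boolean = 100, "C", True

# %d formats an integer, %s converts the value with str()
printf_style = "The number is %d, while the char is %s and the expression is %s" % (
    number,
    char,
    boolean,
)
f_string = f"The number is {number}, while the char is {char} and the expression is {boolean}"

print(printf_style)
print(printf_style == f_string)
```

Both variants produce the same string; the f-string keeps each value next to the place where it appears in the text.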
Native Python Functions
Python has many native functions that help you reduce the amount of code you need to write. For instance, there are dedicated methods to count sub-strings in strings.
my_string = "This is a test string with many t's. How many t's?"
amount_t = my_string.count("t")
print(f"There are in total {amount_t} t's")
There are in total 6 t's
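Two details of `str.count()` are easy to overlook: it is case-sensitive (which is why the capital "T" in "This" is not counted above), and it counts non-overlapping matches only. A small sketch with a made-up example string:

```python
text = "Test the cat"

lowercase_only = text.count("t")        # the capital "T" in "Test" is not counted
any_case = text.lower().count("t")      # normalise the case before counting
non_overlapping = "aaaa".count("aa")    # matches do not overlap

print(lowercase_only, any_case, non_overlapping)
```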
There are many other functions, and it is advisable to first check whether Python already provides a solution. A web search can help, and there are also collections that give an overview.
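As an illustration of how built-ins shorten code, the sketch below compares a manual loop with `sum()` and shows a few other built-in functions. The list `read_lengths` and the `labels` are made-up values, used here only for demonstration.

```python
read_lengths = [12, 7, 9, 4]  # made-up example values

# Manual loop to add up all values
total = 0
for length in read_lengths:
    total += length

# Built-in equivalents need a single call each
builtin_total = sum(read_lengths)
shortest, longest = min(read_lengths), max(read_lengths)
ordered = sorted(read_lengths)

# enumerate() and zip() replace manual index bookkeeping
labels = ["a", "b", "c", "d"]
numbered = [
    f"{index}: {label}={length}"
    for index, (label, length) in enumerate(zip(labels, read_lengths), start=1)
]

print(total, builtin_total, shortest, longest, ordered)
print(numbered)
```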
Another useful example: Joining strings
The "".join() method allows us to combine a list of strings into a single string. This can be quite handy when you want to generate outputs or reports dynamically.
import random
words = ["Hello", "this", "is", "a", "sentence"]
sentence = random.choices(words, k=5)
" ".join(sentence)
'is a Hello Hello sentence'
# The string that join() is called on is the separator
# that is placed between the sub-strings
print("\nfollows ".join(sentence))
is
follows a
follows Hello
follows Hello
follows sentence
print("\n".join(sentence))
is
a
Hello
Hello
sentence
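One common pitfall when generating reports: `join()` only accepts strings, so numbers have to be converted first. A minimal sketch with made-up values and names:

```python
values = [3.5, 2.25, 10]  # made-up example values

# Joining non-string items directly raises a TypeError
try:
    ", ".join(values)
except TypeError as error:
    print(f"join() failed: {error}")

# Convert each item to a string first
line = ", ".join(str(value) for value in values)
print(line)

# Combine join() with f-strings to build a small multi-line report
names = ["length", "width", "count"]
report = "\n".join(f"{name}: {value}" for name, value in zip(names, values))
print(report)
```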
Lambda functions
Basics
Lambda functions are a shorthand for small, anonymous functions that can be defined on the fly in a single expression. They should usually be avoided, since regular function definitions can be documented and are much easier to read.
# Function definition
my_fun = lambda argument1, argument2: argument1 * 10 / argument2
# Using the function
my_fun(10, 5)
# Example: fraction of G and C bases in a DNA sequence
get_gc = lambda sequence: (sequence.count("G") + sequence.count("C")) / len(sequence)
get_gc("GCGCGGGAGGCT")
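To illustrate why a regular definition is often preferable, here is the GC-content lambda rewritten with `def`. This is a sketch rather than the only way to write it; the handling of empty sequences is an assumption added for this example, since the lambda version would raise a `ZeroDivisionError` in that case.

```python
def get_gc(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    Assumption for this sketch: an empty sequence yields 0.0 instead of
    raising a ZeroDivisionError.
    """
    if not sequence:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    return gc_count / len(sequence)


print(get_gc("GCGCGGGAGGCT"))
print(get_gc.__doc__.splitlines()[0])
```

The docstring is available through `help(get_gc)`, and the named intermediate variable makes the calculation easier to follow.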
Common application: Keys to native functions
# Create a list of tuples
tup_list = [("a", 1, 2), ("b", 2, 100), ("c", 3, 878237)]
# Sort function
sorted(tup_list, key=lambda tup: tup[1], reverse=True)
# Max/Min function
max(tup_list, key=lambda tup: tup[-1])
# Filter data
filtered = filter(lambda tup: tup[0] == "a", tup_list)
list(filtered)
filter_function = lambda tup: tup[0] == "a"
new_list = []
for entry in tup_list:
    if filter_function(entry):
        new_list.append(entry)
    else:
        print(f"{entry} is not valid!")
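The same sorting, maximum and filtering tasks can also be written without lambdas. The sketch below uses `operator.itemgetter` from the standard library for the keys and a list comprehension for the filter; whether this reads better than a lambda is a matter of taste.

```python
from operator import itemgetter

tup_list = [("a", 1, 2), ("b", 2, 100), ("c", 3, 878237)]

# itemgetter(1) behaves like lambda tup: tup[1]
by_second_desc = sorted(tup_list, key=itemgetter(1), reverse=True)

# itemgetter(-1) picks the last element of each tuple
largest_last = max(tup_list, key=itemgetter(-1))

# A list comprehension is a readable alternative to filter()
only_a = [tup for tup in tup_list if tup[0] == "a"]

print(by_second_desc)
print(largest_last)
print(only_a)
```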
Data visualisation
!pip install seaborn matplotlib pandas
# Import an example dataset
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
There are numerous packages for data visualisation in Python, and Matplotlib and Seaborn are two of the most prominent ones. Both offer various plot types such as scatter plots, histograms, time-course plots and many more. Seaborn, which is built on top of Matplotlib, offers a high-level interface that is very beginner-friendly, while Matplotlib requires more code to achieve the same result. Matplotlib is very handy for complex and custom visualisations, while Seaborn is best used for simple and quick data visualisation.
The following two examples generate a scatter plot of the dataset above:
# Using seaborn, you can pass in a Pandas DataFrame and the column names
# for axes and colouring
sns.scatterplot(
    data=iris,
    x="sepal_length",
    y="sepal_width",
    hue="species",
)
<Axes: xlabel='sepal_length', ylabel='sepal_width'>
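The `<Axes: ...>` output above shows that `sns.scatterplot` returns a Matplotlib Axes object, so both libraries can be combined: Seaborn draws the basic plot and Matplotlib methods refine it. The sketch below uses a small hand-made table instead of the iris dataset; the values and the column names `length`, `width` and `group` are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up illustrative measurements (NOT taken from the iris dataset)
toy = pd.DataFrame(
    {
        "length": [5.1, 4.9, 6.3, 6.5, 5.8],
        "width": [3.5, 3.0, 2.9, 3.0, 2.7],
        "group": ["a", "a", "b", "b", "b"],
    }
)

fig, ax = plt.subplots()

# Passing ax= tells Seaborn to draw onto an existing Matplotlib Axes
returned_ax = sns.scatterplot(data=toy, x="length", y="width", hue="group", ax=ax)

# Matplotlib methods can then customise the Seaborn plot
median_length = toy["length"].median()
ax.axvline(x=median_length, linestyle="--", c="k", alpha=0.5)
ax.set_title("Toy data")
```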
# Using matplotlib you need a bit more code to achieve the same
# but we can add more custom things, such as additional lines and text
import matplotlib.pyplot as plt
f, ax = plt.subplots()
handles = []
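# unique() keeps the order of first appearance, so colours and legend
# entries are the same on every run (unlike iterating over a set)
for species in iris.species.unique():
    df_sub = iris[iris.species == species]
    handle = ax.scatter(x=df_sub.sepal_length, y=df_sub.sepal_width, label=species)
    handles.append(handle)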
median_sepal_length = iris.sepal_length.median()
median_sepal_width = iris.sepal_width.median()
handles += [
    ax.axvline(
        x=median_sepal_length,
        linestyle="--",
        c="k",
        alpha=0.5,
        label="Median length",
    ),
    ax.axhline(
        y=median_sepal_width,
        linestyle=":",
        c="k",
        alpha=0.5,
        label="Median width",
    ),
]
ax.set_xlabel("Sepal length $[cm]$")
ax.set_ylabel("Sepal width $[cm]$")
ax.set_title("Iris dataset")
ax.legend(
    handles=handles,
    loc='upper left',
    bbox_to_anchor=(1, 1.0),
    fancybox=True,
    shadow=True,
    ncol=1,
)
plt.show()
My own project
Pick a problem from your field, or one that you think should be solved using Python, and apply the concepts we have just learned. I will also supply a couple of datasets that you can use and analyse. These will be available on the course website.
Talking about code is a vital part of scripting and software development. That is why, at the end of this block, you will present your solution or analysis to the whole group. We will discuss problems and possible modifications to your solution. Finally, the same project will be revisited using object-oriented programming, which we will learn in the second block.