Seminar 002: Clean code and using data structures¶

Notes¶

# Not important for seminar 002 --> ignore
import numpy as np
import pandas as pd

Clean code¶

Naming variables¶

Naming variables is easily one of the most difficult tasks in programming, believe it or not! Names provide context and help you and others understand the code. Poor names often leave you or anyone else unable to tell what the code does, which means extra work.

Choose names that are as explicit as necessary and as short as possible. Abbreviations are not always a good choice, because not everyone is aware of the context. The following two examples demonstrate the importance of proper variable names.

Which one is easier to understand?¶

Example A¶

x = pd.read_csv("Test.csv")
s = x["s"]
i = x["i"]
c = x["c"]

lc = np.log(c)

print(lc.to_list())

Example B¶

dataset = pd.read_csv("Test.csv")
sequences = dataset["s"]
identifiers = dataset["i"]
concentrations = dataset["c"]

log_conc = np.log(concentrations)

print(log_conc.to_list())

Overriding internals¶

Be careful not to use the names of built-ins as variable names.

Once a variable has the same name as a built-in function or type, that name refers to your variable everywhere in that scope, and the built-in can no longer be reached through it. Here is an example:

dict = 10

# From here on, `dict` refers to the integer 10, so calling it fails
my_dict = dict(a=10)

--> TypeError: 'int' object is not callable
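
The effect can be reproduced safely inside a function, where the shadowing stays local. The following is a minimal sketch; the function name `call_shadowed_dict` is made up for illustration and is not part of the seminar code.

```python
def call_shadowed_dict():
    # Assigning to `dict` inside the function shadows the built-in type here
    dict = 10
    try:
        dict(a=10)
    except TypeError as error:
        return str(error)
    return "no error"


message = call_shadowed_dict()
print(message)

# Outside the function the built-in `dict` is untouched
my_dict = dict(a=10)
print(my_dict)
```

Keeping the experiment inside a function means the rest of the script can still use `dict` as usual.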

Reduce redundancies¶

Organizing yourself with variables is a good thing! But as with everything else, there can be too much of it. Scattering many variables across a script or program can lead to chaos, especially when a variable is used only once. In that case, put the value directly where it is used and define only those variables that are crucial to the flow of the program. Also, make sure that you are not overwriting variables by accident. The following two examples demonstrate this.

Example A¶

start = 10
end = 20
increment = 1
value = 0
values = range(start, end, increment)

for i in values:
    value = i
    print(value)

Example B¶

start = 10
end = 20

for value in range(start, end):
    print(value)

Try to generalize¶

Most of the time it is fine to implement program logic explicitly for a special case, but this greatly limits the flexibility of your code. For instance, when we develop research code, we want others to reuse our work, but if it only caters to our special case, no one else can make use of it. Thus, try to find a general theme in your code and develop it towards the bigger picture, if possible of course!

Be aware, though, that generalisation is something to strive for, but it takes multiple iterations. Solve a specific problem first and then work your way towards a general solution. Here is an example of how one could progress towards generalisation.

Example A: Works fine here!¶

# Our initial code
for i in range(1, 10):
    print(i % 2, end=" remainder - is ")

    if i % 2 == 1:
        print("odd")
    else:
        print("even")

Example B: Problematic¶

# We want to determine other remainders
for i in range(1, 10):
    print(i % 3, end=" remainder - is ")

    if i % 3 == 1:
        print("odd")
    else:
        print("even")

# We may modify
for i in range(1, 10):
    print(i % 3, end=" remainder - is ")

    if i % 3 == 1:
        print("odd")
    elif i % 3 == 2:
        print("odd")
    else:
        print("even")

Example C: Generalization¶

# Works for any divisor
number = 7
for i in range(1, 20):

    if i % number == 0:
        print(f"{i} is divisible by {number}")
    else:
        print(f"{i} is not divisible by {number}")
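
One possible next step is to move the logic into a function, so that both the divisor and the numbers to check become parameters. This is only a sketch; the function name `find_multiples` and its parameters are assumptions, not part of the seminar code.

```python
def find_multiples(numbers, divisor):
    """Return the numbers that are divisible by `divisor` without remainder."""
    multiples = []
    for number in numbers:
        if number % divisor == 0:
            multiples.append(number)
    return multiples


multiples_of_seven = find_multiples(range(1, 20), 7)
multiples_of_three = find_multiples(range(1, 10), 3)

print(multiples_of_seven)  # [7, 14]
print(multiples_of_three)  # [3, 6, 9]
```

The same function now covers both the specific case from Example B and the general case from Example C.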

Python technicalities¶

Variable scope¶

Be careful with variables that are overwritten by other Python constructs. A for-loop does not create a new scope, so if you define a variable value before a loop that uses the same name as its loop variable, the loop overwrites your original variable.

value = "Nothing here"
for value in range(1, 5):
    pass

# What do you expect?
print(value)
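
The snippet above prints 4, the last value produced by range(1, 5), because the loop variable and the outer variable share one name. A minimal sketch of one way to avoid this is to give the loop variable its own name (`step` and `message` are chosen here only for illustration):

```python
value = "Nothing here"

# Reusing the name: the loop overwrites the original string
for value in range(1, 5):
    pass
overwritten = value

# Using a distinct loop variable keeps the original intact
message = "Nothing here"
for step in range(1, 5):
    pass

print(overwritten)  # 4
print(message)      # Nothing here
```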

File handling¶

It is important to understand that content read from files in Python is usually returned as a string or as bytes. Hence, when you want to use, e.g., numbers from a file, you need type casting to obtain the correct type.

with open("my_file.txt", "w") as file:
    # File is open inside this block and is closed automatically afterwards

    for number in [1, 2, 3, 4, 5]:
        file.write(f"{number}\n")

# Read file
with open("my_file.txt", "r") as file:
    numbers = [line.strip() for line in file]

# What do you expect?
print(type(numbers[0]))

# Type casting is important
with open("my_file.txt", "r") as file:
    numbers = [int(line.strip()) for line in file]

# What do you expect?
print(type(numbers[0]))
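
Casting with int() raises a ValueError when a line is empty or does not contain a number. As a hedged sketch, one way to deal with empty lines is to skip them before casting. The helper name `read_integers` and the use of a temporary directory are assumptions made for this example.

```python
import os
import tempfile


def read_integers(path):
    """Read one integer per line, skipping blank lines."""
    with open(path, "r") as file:
        return [int(line.strip()) for line in file if line.strip()]


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "my_file.txt")

    # Write a small test file that contains one blank line
    with open(path, "w") as file:
        file.write("1\n2\n\n3\n")

    numbers = read_integers(path)

print(numbers)           # [1, 2, 3]
print(type(numbers[0]))  # <class 'int'>
```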

List comprehensions¶

Writing for-loops can take up valuable space in code, and luckily Python provides so-called list comprehensions to reduce the number of lines needed. You can nest these as deep as you like, but be careful: beyond a certain point the code becomes less readable, not more. The following two examples show one readable usage and one that is hard to understand.

# Valid usage
my_list = [value for value in range(1, 10) if value % 2 == 0]

# Hard to read: three nested loops compute every combination (125 results)
my_list = [
    (value**power - subtraction)
    for value in [1, 2, 3, 4, 5]
    for power in [5, 2, 3, 4, 5]
    for subtraction in [-1, 2, 10, 2, 10]
]

# Better: zip pairs the lists element-wise (5 results, not 125)
values = [1, 2, 3, 4, 5]
powers = [5, 2, 3, 4, 5]
subtractions = [-1, 2, 10, 2, 10]
my_list = []

for val, powr, subtr in zip(values, powers, subtractions):
    my_list.append(val**powr - subtr)
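
A comprehension stays readable as long as it contains a single loop. As a sketch, the element-wise calculation from the zip loop above can also be written as one comprehension:

```python
values = [1, 2, 3, 4, 5]
powers = [5, 2, 3, 4, 5]
subtractions = [-1, 2, 10, 2, 10]

# One loop, three names unpacked per step
my_list = [
    value**power - subtraction
    for value, power, subtraction in zip(values, powers, subtractions)
]

print(my_list)  # [2, 2, 17, 254, 3115]
```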

Tuples and for loops¶

Python offers a very readable concept when you combine tuples with for-loops. Python can "unpack" each tuple in a list, so you can use its parts by name. Here is an example:

tuple_list = [
    ("a", 0),
    ("b", 1),
    ("c", 2),
]

for char, number in tuple_list:
    
    # Stores the first part of the tuple in "char"
    # Stores the second part of the tuple in "number"
    
    print(char, number)

a 0
b 1
c 2
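
The same unpacking works with anything that yields tuples. The following short sketch uses the built-ins `enumerate` and `dict.items()`; the variable names are illustrative only.

```python
letters = ["a", "b", "c"]

# enumerate yields (index, item) tuples
pairs = []
for index, letter in enumerate(letters):
    pairs.append((letter, index))

# dict.items() yields (key, value) tuples
lookup = dict(pairs)
for char, number in lookup.items():
    print(char, number)
```

This prints the same three lines as the example above.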