As part of your Data Analytics journey, you will have to build knowledge of Python.
In this post I am collecting some of the must-know tricks to get your projects started:
Data Structures
Lists
Lists hold index-value pairs in Python. Their order is maintained, they are mutable, and they allow duplicate values.
your_list = ['one', 'two', 'three']
your_list[0]
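A quick sketch of those three properties (ordered, mutable, duplicates allowed). The list name `numbers` is made up for this example:

```python
# Illustrative list; `numbers` is a made-up name for this sketch
numbers = ['one', 'two', 'three']

# Mutable: replace an element in place through its index
numbers[1] = 'TWO'

# Duplicates are allowed: appending an existing value keeps both copies
numbers.append('one')

# Order is maintained: elements stay in the order they were added
print(numbers)      # ['one', 'TWO', 'three', 'one']

# Negative indexes count from the end of the list
print(numbers[-1])  # one
```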
Dictionaries
Your key-value friend from now on. Dictionaries are mutable and, since Python 3.7, they preserve insertion order. Keys must be unique (assigning to an existing key overwrites its value), although the same value can appear under several keys.
They are great for storing large amounts of data with quick, easy access by key.
your_dictionary = {
'one': 1,
'two': 2,
'three': 3,
'four': 4
}
You can access an element with:
your_dictionary['two']
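A minimal sketch of the most common dictionary operations. The `prices` dictionary and its contents are invented for illustration:

```python
# Illustrative dictionary; the names and numbers are made up
prices = {'apple': 1.2, 'pear': 0.8}

# Assigning to an existing key overwrites its value (keys are unique)
prices['apple'] = 1.5

# Assigning to a new key adds a new key-value pair
prices['kiwi'] = 2.0

# .get() returns a default instead of raising KeyError for a missing key
mango_price = prices.get('mango', 0.0)
print(mango_price)  # 0.0

# Insertion order is preserved (Python 3.7+)
print(prices)  # {'apple': 1.5, 'pear': 0.8, 'kiwi': 2.0}

# .items() lets you loop over keys and values together
for fruit, price in prices.items():
    print(fruit, price)
```

Using `.get()` with a default is a handy choice when a missing key is expected and should not stop your program.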
Sets
Sets are unordered collections of unique elements. They are mutable and iterable, and they do not allow duplicate elements.
Remember that they can’t be indexed.
your_set = {"hello", "fantastic", "world", True, 1, 2}  # True and 1 compare equal, so only one of them is kept
your_set
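A small sketch of what sets are typically used for: removing duplicates and fast membership checks. The `words` list is made up for this example:

```python
# Illustrative data with repeated values
words = ['hello', 'world', 'hello', 'fantastic', 'world']

# Converting to a set drops the duplicates
unique_words = set(words)
print(len(unique_words))        # 3

# Membership checks are the typical use case
print('hello' in unique_words)  # True

# True and 1 compare equal, so a set keeps only one of them
mixed = {True, 1, 2}
print(len(mixed))               # 2

# Sets cannot be indexed: this raises a TypeError
try:
    unique_words[0]
except TypeError as error:
    print(f"Sets cannot be indexed: {error}")
```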
Tuples
Tuples are used to store multiple items in a single variable.
A tuple is an ordered, unchangeable (immutable) collection, and it allows duplicates.
Because tuples are immutable, they tend to be more space efficient than lists: their memory can be allocated once, with no room reserved for growth.
your_tuple = ("hello", "fantastic", "world")
See a particular value with:
your_tuple[0]
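A minimal sketch of immutability and the size difference with a list. The name `greeting` is made up, and the exact byte counts depend on your Python version and platform, so treat the printed sizes as indicative only:

```python
import sys

# Illustrative tuple; `greeting` is a made-up name for this sketch
greeting = ("hello", "fantastic", "world")

# Tuples are immutable: assigning to an index raises a TypeError
try:
    greeting[0] = "goodbye"
    changed = True
except TypeError:
    changed = False
print(changed)  # False

# Tuples can be unpacked into separate variables
first, second, third = greeting
print(second)   # fantastic

# Compare the container size of a tuple and a list holding the same items
# (the exact numbers vary between Python versions and platforms)
same_items_as_list = list(greeting)
print(sys.getsizeof(greeting), sys.getsizeof(same_items_as_list))
```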
Loops
Non-Pythonic approach
i = 0
new_list = []
while i < len(your_list):
    if len(your_list[i]) >= 4:
        new_list.append(your_list[i])
    i += 1
print(new_list)
A more Pythonic approach is to loop over the contents of your_list directly, rather than using an index variable.
Looping over the contents of a list
better_list = []
for item in your_list:
    if len(item) >= 4:
        better_list.append(item)
print(better_list)
Best - List Comprehension
# Print the list created by using list comprehension
best_list = [item for item in your_list if len(item) >= 4]
print(best_list)
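A list comprehension follows the pattern [expression for item in iterable if condition], so it can also transform each item while it filters. A short sketch, with a made-up `words` list:

```python
# Illustrative list; `words` is a made-up name for this sketch
words = ['one', 'two', 'three', 'four']

# Filter and transform in one step: keep long words, in upper case
long_upper = [word.upper() for word in words if len(word) >= 4]
print(long_upper)  # ['THREE', 'FOUR']

# The condition is optional: here every item is transformed
lengths = [len(word) for word in words]
print(lengths)     # [3, 3, 5, 4]
```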
Functions
Regular Functions
def my_function():
    print("Hello World")
my_function()
def my_function(x, y):
    return 5 * x + y
print(my_function(3, 1))
Lambda Functions
your_lambda = lambda a : a + 1
print(your_lambda(5))
Or also, with multiple arguments
your_lambda = lambda a, b : a * b
print(your_lambda(2, 3))
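Lambdas are most useful as short, throwaway arguments to other functions. A sketch with the built-ins `sorted` and `map`; the `people` data is made up for illustration:

```python
# Illustrative data: (name, age) pairs, invented for this sketch
people = [("Ana", 34), ("Luis", 27), ("Marta", 41)]

# Use a lambda as the sort key: order the pairs by age
by_age = sorted(people, key=lambda person: person[1])
print(by_age)  # [('Luis', 27), ('Ana', 34), ('Marta', 41)]

# Use a lambda with map to transform every element
upper_names = list(map(lambda person: person[0].upper(), people))
print(upper_names)  # ['ANA', 'LUIS', 'MARTA']
```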
Using them together
Here, the UDF (user-defined function) myfunc returns a lambda function that remembers the value of n it was created with:
def myfunc(n):
    return lambda a: a * n
mydoubler = myfunc(2)  # here n is set to 2
print(mydoubler(11))  # here 11 is passed as a, so this prints 22
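The same pattern can produce several independent functions, because each returned lambda keeps its own value of n. A small sketch, where `make_multiplier` is a hypothetical name chosen for this example:

```python
def make_multiplier(n):
    # The returned lambda remembers the n it was created with
    return lambda a: a * n

# Each call creates a separate function with its own n
doubler = make_multiplier(2)
tripler = make_multiplier(3)

print(doubler(11))  # 22
print(tripler(11))  # 33
```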
Classes
Creating your first Class:
class Your_Class:
    x = 1
p1 = Your_Class()
print(p1.x)
Important: the __init__() Function
All classes have a function called __init__(), which is always executed when an object of the class is being created.
class Person:
    def __init__(self, name, age):
        # this function gets executed every time you create an object of the class Person
        self.name = name
        self.age = age
p1 = Person("Yosua", 31)
print(p1.name)
print(p1.age)
The __str__() Function
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __str__(self):
        return f"{self.name}({self.age})"
p1 = Person("Yosua", 31)
print(p1)
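To see what __str__() changes, it helps to compare a class that defines it with one that does not. A sketch with two hypothetical classes, `PlainPerson` and `PrintablePerson`, named only for this example:

```python
class PlainPerson:
    # No __str__ defined: Python falls back to a generic representation
    def __init__(self, name, age):
        self.name = name
        self.age = age


class PrintablePerson:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        # Called by print() and str() to get a readable text version
        return f"{self.name}({self.age})"


plain_text = str(PlainPerson("Yosua", 31))
readable_text = str(PrintablePerson("Yosua", 31))

# Something like <__main__.PlainPerson object at 0x...>
print(plain_text)
print(readable_text)  # Yosua(31)
```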
Methods - ‘Class’s Functions’
Methods are functions that belong to the objects created from a class.
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def myfunc(self):
        print("Hello my name is " + self.name + " and I am " + str(self.age))
p1 = Person("Yosua", 31)
p1.myfunc()
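Methods can also take extra arguments, return values, and change the attributes of the object they belong to. A minimal sketch, where `have_birthday` and `greet` are hypothetical method names invented for this example:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def have_birthday(self):
        # Change the state of this particular object and return the new age
        self.age += 1
        return self.age

    def greet(self, other_name):
        # Methods can take arguments in addition to self
        return f"Hi {other_name}, I am {self.name}"


p1 = Person("Yosua", 31)
print(p1.have_birthday())  # 32
print(p1.greet("Ana"))     # Hi Ana, I am Yosua
```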
Document your work
You want your work to be understandable, so write proper comments in your code. Doing this, and keeping your code modular, is a good start:
Docstrings
def add(num1, num2):
    """
    Add up two integer numbers.

    This function simply wraps the ``+`` operator, and does not
    do anything interesting, except for illustrating what
    the docstring of a very simple function looks like.

    Parameters
    ----------
    num1 : int
        First number to add.
    num2 : int
        Second number to add.

    Returns
    -------
    int
        The sum of ``num1`` and ``num2``.

    See Also
    --------
    subtract : Subtract one integer from another.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2
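A docstring is more than a comment: Python stores it on the function, and the examples inside it can be run as small tests with the standard doctest module. The sketch below uses a shortened copy of add and runs its examples through doctest's parser and runner classes; in a script of your own, calling doctest.testmod() is the more common one-liner.

```python
import doctest
import inspect


def add(num1, num2):
    """
    Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2


# The docstring is stored on the function; inspect.getdoc cleans its indentation
summary_line = inspect.getdoc(add).splitlines()[0]
print(summary_line)  # Add up two integer numbers.

# Extract the >>> examples from the docstring and run them as tests
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.attempted)  # 2 examples were run
print(results.failed)     # 0 of them failed
```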
Try me with Google Colaboratory
If you have a Google account, you can try these kinds of snippets, as well as a few useful UDFs for working more efficiently with Spark, directly in Google Colab using the code I made available on GitHub: