As part of your Data Analytics journey, you will have to build knowledge of Python.

In this post I am collecting some of the must-know tricks to get your projects started:

Data Structures

Lists

Lists store values by position, as index-value pairs. They keep their order, are mutable, and allow duplicate values.

your_list = ['one', 'two', 'three']
your_list[0]
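
As a minimal sketch of the list properties described above (order, mutability, duplicates), the snippet below redefines your_list so it runs on its own. The values assigned and appended are only for illustration.

```python
your_list = ['one', 'two', 'three']

# Mutable: replace an element in place
your_list[0] = 'zero'

# Duplicates are allowed: 'two' now appears twice
your_list.append('two')

print(your_list)               # insertion order is kept
print(your_list.count('two'))  # number of times 'two' appears
print(your_list[-1])           # negative indexes count from the end
```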

Dictionaries

Your key-value friend from now on. A dictionary is mutable and, since Python 3.7, keeps its keys in insertion order. Keys must be unique: assigning to an existing key overwrites its value, while the same value may appear under several keys.

They are great for storing large amounts of data that you need to look up quickly by key.

your_dictionary = {
                    'one': 1,
                    'two': 2,
                    'three': 3,
                    'four': 4
                }

You can access an element with:

your_dictionary['two']
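
The following sketch shows what happens when you assign to an existing key, add a new key, and look up a missing key. It assumes Python 3.7 or later, where dictionaries keep insertion order. The values 22 and 5 and the key 'six' are made up for the example.

```python
your_dictionary = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# Assigning to an existing key overwrites its value;
# no second 'two' entry is created
your_dictionary['two'] = 22

# Assigning to a new key adds it at the end
your_dictionary['five'] = 5

# .get() returns a default instead of raising KeyError for a missing key
missing = your_dictionary.get('six', 0)

print(list(your_dictionary))          # keys in insertion order
print(your_dictionary['two'], missing)
```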

Sets

A set is an unordered collection of unique elements. Sets are mutable and iterable, and they do not allow duplicate elements.

Remember that they can’t be indexed.

your_set = {"hello", "fantastic", "world", True, 1, 2}
your_set
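
One detail of the set above is easy to miss: True and 1 compare equal in Python, so the set keeps only one of them. The sketch below illustrates that, along with membership tests and the fact that indexing fails. The names has_world, indexable and unique_words are introduced only for this example.

```python
your_set = {"hello", "fantastic", "world", True, 1, 2}

# True == 1, so the six literals above produce only five elements
print(len(your_set))

# Adding an element that is already present changes nothing
your_set.add("hello")

# Membership tests are the typical way to query a set
has_world = "world" in your_set

# Sets cannot be indexed: this raises TypeError
try:
    your_set[0]
    indexable = True
except TypeError:
    indexable = False

# A common use: removing duplicates from a list
unique_words = set(["a", "b", "a"])
print(has_world, indexable, len(unique_words))
```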

Tuples

Tuples are used to store multiple items in a single variable.

A tuple is an ordered, immutable collection that allows duplicates.

Because tuples are immutable, Python can allocate their memory once at a fixed size, which makes them more space-efficient than lists.

your_tuple = ("hello", "fantastic", "world")

See a particular value with:

your_tuple[0]
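
A quick way to check the claims above is to try to modify a tuple and to compare its size with that of an equivalent list. This is a sketch that assumes CPython, where sys.getsizeof reports the size of the container itself and not of the elements it holds. The variable names are illustrative.

```python
import sys

your_tuple = ("hello", "fantastic", "world")

# Immutable: item assignment raises TypeError
try:
    your_tuple[0] = "goodbye"
    changed = True
except TypeError:
    changed = False

# Duplicates are allowed
with_duplicates = ("a", "b", "a")

# Compare the container sizes of a tuple and a list with the same items
tuple_size = sys.getsizeof(your_tuple)
list_size = sys.getsizeof(list(your_tuple))

print(changed)
print(with_duplicates.count("a"))
print(tuple_size < list_size)
```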

Loops

Non-Pythonic approach

i = 0
new_list= []
while i < len(your_list):
    if len(your_list[i]) >= 4:
        new_list.append(your_list[i])
    i += 1
print(new_list)

A more Pythonic approach loops over the contents of your_list directly, rather than using an index variable.

Looping over the contents of a list

better_list = []

for item in your_list:
    if len(item) >= 4:
        better_list.append(item)
print(better_list)

Best - List Comprehension

# Print the list created by using list comprehension

best_list = [item for item in your_list if len(item) >= 4]
print(best_list)
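
To confirm that the loop and the list comprehension are interchangeable, the sketch below runs both on the same data and compares the results. The sample list is an assumption made for this example. The last line shows that a comprehension can also transform each item it keeps.

```python
# Sample data, assumed for this sketch
your_list = ['one', 'two', 'three', 'four', 'five']

# Explicit loop with a filter
loop_result = []
for item in your_list:
    if len(item) >= 4:
        loop_result.append(item)

# The same filter written as a list comprehension
comprehension_result = [item for item in your_list if len(item) >= 4]

# A comprehension can also transform each item it keeps
upper_result = [item.upper() for item in your_list if len(item) >= 4]

print(loop_result == comprehension_result)
print(upper_result)
```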

Functions

Regular Functions

def my_function():
  print("Hello World")

my_function()

Functions can also take arguments and return a value:

def my_function(x, y):
  return 5 * x + y

print(my_function(3,1))

Lambda Functions

your_lambda = lambda a : a + 1

print(your_lambda(5))

Or with multiple arguments:

your_lambda = lambda a, b : a * b

print(your_lambda(2, 3))

Using them together

Here, the UDF myfunc returns a lambda function that remembers the value of n passed to myfunc:

def myfunc(n):
  return (lambda a : a * n)

mydoubler = myfunc(2)  # here you set n = 2

print(mydoubler(11))  # here 11 is passed as a, so this prints 22
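
Because each call to myfunc captures its own n, the same function can produce several independent multipliers. A small sketch, where mytripler is a name added only for this example:

```python
def myfunc(n):
    # n is captured by the returned lambda
    return lambda a: a * n

mydoubler = myfunc(2)  # n = 2
mytripler = myfunc(3)  # n = 3, independent of mydoubler

print(mydoubler(11))  # 11 * 2
print(mytripler(11))  # 11 * 3
```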

Classes

Creating your first Class:

class Your_Class:
  x = 1

p1 = Your_Class()
print(p1.x)

Important: the __init__() Function

All classes have a method called __init__(), which is always executed when an object of the class is created.

class Person:
  def __init__(self, name, age):
    #this function gets executed every time you create an object of the class Person
    self.name = name
    self.age = age

p1 = Person("Yosua", 31)

print(p1.name)
print(p1.age)

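The __str__() Function

The __str__() method defines the string returned when the object is printed or passed to str().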

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age  

  def __str__(self):
    return f"{self.name}({self.age})"

p1 = Person("Yosua", 31)

print(p1)
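
To see what __str__() changes, it helps to compare against a class that does not define it. In the sketch below, PlainPerson is a hypothetical class added only for the comparison.

```python
class PlainPerson:
    # Hypothetical class without __str__, for comparison only
    def __init__(self, name, age):
        self.name = name
        self.age = age


class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name}({self.age})"


plain = PlainPerson("Yosua", 31)
p1 = Person("Yosua", 31)

print(plain)            # default text: class name and memory address
print(p1)               # uses __str__
print(f"Person: {p1}")  # f-strings use __str__ as well
```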

Methods - ‘Class’s Functions’

Methods are functions that belong to the objects created from a class.

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def myfunc(self):
    print("Hello my name is " + self.name + " and I am " + str(self.age))

p1 = Person("Yosua", 31)
p1.myfunc()
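
Methods can also change the state of the object they are called on, and each object keeps its own state. A minimal sketch, where have_birthday and the second person are hypothetical additions for illustration:

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def have_birthday(self):
        # Hypothetical method: modifies only the object it is called on
        self.age += 1
        return self.age


p1 = Person("Yosua", 31)
p2 = Person("Ana", 25)  # made-up second person

p1.have_birthday()

print(p1.age)  # changed
print(p2.age)  # unchanged
```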

Document your work

You want your work to be understandable, so write proper comments in your code. Doing this, and keeping your code modular, is a good start:

Docstrings

def add(num1, num2):
    """
    Add up two integer numbers.

    This function simply wraps the ``+`` operator, and does not
    do anything interesting, except for illustrating what
    the docstring of a very simple function looks like.

    Parameters
    ----------
    num1 : int
        First number to add.
    num2 : int
        Second number to add.

    Returns
    -------
    int
        The sum of ``num1`` and ``num2``.

    See Also
    --------
    subtract : Subtract one integer from another.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2
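
A docstring is stored on the function itself, and the lines starting with >>> in its Examples section can be run as tests with the standard doctest module. The sketch below uses a shortened version of the docstring above and runs the examples through doctest's parser and runner directly. Calling doctest.testmod() from a script is the more common route; the explicit form is used here only so the snippet works in any environment.

```python
import doctest


def add(num1, num2):
    """
    Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2


# The docstring is available at runtime; help(add) displays it as well
summary = add.__doc__.strip().splitlines()[0]
print(summary)

# Extract the >>> examples from the docstring and run them
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)
```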

Try me with Google Colaboratory

If you have a Google account, you can try these kinds of snippets, as well as a few useful UDFs for working more efficiently with Spark, directly in Google Colab using the code I made available on GitHub.
