As part of your Data Analytics journey, you will have to build knowledge of Python.

In this post I am collecting some of the must-know tricks to start your projects:

Data Structures

Lists

Index-value pairs in Python. Lists keep their order, are mutable, and allow duplicate values.

your_list = ['one', 'two', 'three']
your_list[0]
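To make those three properties concrete, here is a small sketch (the values are just for illustration):

```python
# Lists keep insertion order, can be changed in place, and accept duplicates.
your_list = ['one', 'two', 'three']

your_list.append('two')   # duplicates are allowed
your_list[0] = 'zero'     # items can be reassigned (mutable)

print(your_list)          # ['zero', 'two', 'three', 'two']
print(your_list[-1])      # negative indexes count from the end: two
print(your_list[1:3])     # slicing returns a new list: ['two', 'three']
```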

Dictionaries

Your key-value friend from now on. Remember that a dictionary is mutable and, since Python 3.7, keeps the order in which keys were inserted. It does not allow duplicate keys: assigning to an existing key overwrites its value.

They are great to store large amounts of data for easy and quick access.

your_dictionary = {
                    'one': 1,
                    'two': 2,
                    'three': 3,
                    'four': 4
                }

You can access an element with:

your_dictionary['two']
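A few more everyday operations, as a minimal sketch (the extra keys and the default value are made up for the example):

```python
your_dictionary = {'one': 1, 'two': 2, 'three': 3, 'four': 4}

# Assigning to an existing key overwrites its value instead of adding a duplicate.
your_dictionary['two'] = 22

# Assigning to a new key adds it at the end (insertion order is kept in Python 3.7+).
your_dictionary['five'] = 5

# .get() returns a default instead of raising KeyError for a missing key.
missing = your_dictionary.get('six', 0)
print(missing)  # 0

# .items() lets you loop over keys and values together.
for key, value in your_dictionary.items():
    print(key, value)
```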

Sets

Sets are unordered collections of unique elements. They are mutable and iterable, and they do not allow duplicate elements.

Remember that they can’t be indexed.

# True and 1 compare equal, so the set keeps only one of them (True)
your_set = {"hello", "fantastic", "world", True, 1, 2}
your_set
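Two typical uses of sets are removing duplicates and fast membership tests. A short sketch with made-up values:

```python
# Building a set from a list drops the repeated values.
words = ["hello", "world", "hello", "fantastic", "world"]
unique_words = set(words)
print(len(unique_words))        # 3

# Membership tests are the typical use case.
print("hello" in unique_words)  # True

# Sets support the usual set algebra (sorted() is only used for stable printing).
a = {1, 2, 3}
b = {3, 4}
print(sorted(a | b))  # union: [1, 2, 3, 4]
print(sorted(a & b))  # intersection: [3]
print(sorted(a - b))  # difference: [1, 2]
```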

Tuples

Tuples are used to store multiple items in a single variable.

A tuple is a collection which is ordered and unchangeable, and it allows duplicates.

Because tuples are immutable, Python can allocate exactly the memory they need, which makes them more space-efficient than lists.

your_tuple = ("hello", "fantastic", "world")

See a particular value with:

your_tuple[0]
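The sketch below shows what "unchangeable" means in practice and compares the container size of a tuple with a list holding the same items. The exact byte counts depend on the interpreter, so treat them as indicative only:

```python
import sys

your_tuple = ("hello", "fantastic", "world")

# Tuples cannot be modified after creation.
try:
    your_tuple[0] = "goodbye"
except TypeError as error:
    print("Not allowed:", error)

# Unpacking assigns each item to its own variable.
first, second, third = your_tuple
print(first, third)  # hello world

# Compare the container sizes in bytes (numbers vary by Python version).
as_list = list(your_tuple)
print(sys.getsizeof(your_tuple), sys.getsizeof(as_list))
```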

Loops

Non-Pythonic approach

i = 0
new_list= []
while i < len(your_list):
    if len(your_list[i]) >= 4:
        new_list.append(your_list[i])
    i += 1
print(new_list)

A more Pythonic approach would loop over the contents of your_list, rather than using an index variable.

Looping over the contents of a list

better_list = []

for item in your_list:
    if len(item) >= 4:
        better_list.append(item)
print(better_list)
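If you do need the position of each item as well, enumerate() gives you both without managing a counter yourself. A small sketch (the positions list is only for illustration):

```python
your_list = ['one', 'two', 'three']

positions = []
# enumerate() yields (index, item) pairs, so no manual counter is needed.
for index, item in enumerate(your_list):
    if len(item) >= 4:
        positions.append((index, item))

print(positions)  # [(2, 'three')]
```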

Best - List Comprehension

# Print the list created by using list comprehension

best_list = [item for item in your_list if len(item) >= 4]
print(best_list)
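The same syntax also works for transforming items and for building dictionaries and sets. A minimal sketch, reusing the list from above:

```python
your_list = ['one', 'two', 'three']

# Transform while filtering: upper-case the items with at least 4 characters.
shouting = [item.upper() for item in your_list if len(item) >= 4]

# Dictionary comprehension: map each item to its length.
lengths = {item: len(item) for item in your_list}

# Set comprehension: the distinct lengths.
distinct_lengths = {len(item) for item in your_list}

print(shouting)                  # ['THREE']
print(lengths)                   # {'one': 3, 'two': 3, 'three': 5}
print(sorted(distinct_lengths))  # [3, 5]
```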

Functions

Regular Functions

def my_function():
  print("Hello World")

my_function()

# A function can also take arguments and return a value
def my_function(x, y):
  return 5 * x + y

print(my_function(3,1))

Lambda Functions

your_lambda = lambda a : a + 1

print(your_lambda(5))

Or also, with multiple arguments

your_lambda = lambda a, b : a * b

print(your_lambda(2, 3))
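Where lambdas really pay off is as short, throwaway arguments to other functions, for example as the key of sorted(). A sketch with made-up data:

```python
# Each entry is a (name, age) pair; the data is invented for the example.
people = [("Ana", 42), ("Yosua", 31), ("Li", 27)]

# Sort by the second element of each pair (the age).
by_age = sorted(people, key=lambda person: person[1])

# Extract just the names, in the sorted order.
names = list(map(lambda person: person[0], by_age))

print(names)  # ['Li', 'Yosua', 'Ana']
```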

Using them together

Here, the regular function myfunc returns a lambda function that remembers the argument n it was created with:

def myfunc(n):
  return (lambda a : a * n)

mydoubler = myfunc(2)  # here you set n = 2

print(mydoubler(11))  # here 11 is passed as a, so this prints 22
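Because each call creates a new lambda with its own n, the same function can produce several independent multipliers. A short sketch (make_multiplier is a hypothetical name chosen for this example):

```python
def make_multiplier(n):
    # n is captured by the returned lambda and remembered afterwards.
    return lambda a: a * n

doubler = make_multiplier(2)
tripler = make_multiplier(3)

print(doubler(11))  # 22
print(tripler(11))  # 33
```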

Classes

Creating your first Class:

class Your_Class:
  x = 1
p1 = Your_Class()
print(p1.x)

Important: the __init__() Function

All classes have a function called __init__(), which is always executed when an object of the class is created.

class Person:
  def __init__(self, name, age):
    # this function is executed every time you create an object of the class Person
    self.name = name
    self.age = age

p1 = Person("Yosua", 31)

print(p1.name)
print(p1.age)

The __str__() Function

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age  

  def __str__(self):
    return f"{self.name}({self.age})"

p1 = Person("Yosua", 31)

print(p1)
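__str__() controls what print(), str() and f-strings show for an object. To see the difference it makes, the sketch below compares it with a class that does not define it (PlainPerson is a hypothetical class added only for the comparison):

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __str__(self):
        return f"{self.name}({self.age})"


class PlainPerson:
    # Same data, but no __str__ defined.
    def __init__(self, name, age):
        self.name = name
        self.age = age


p1 = Person("Yosua", 31)
plain = PlainPerson("Yosua", 31)

print(str(p1))        # Yosua(31)
print(f"Hi {p1}")     # Hi Yosua(31)
print(plain)          # default text with the class name and a memory address
```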

Methods - a Class's Functions

Methods are functions that belong to the objects created from a class.

class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def myfunc(self):
    print("Hello my name is " + self.name + " and I am " + str(self.age))

p1 = Person("Yosua", 31)
p1.myfunc()
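Methods can also change the object's data or return a value instead of printing it. A minimal sketch (greeting and have_birthday are hypothetical method names for this example):

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greeting(self):
        # Returning the text lets the caller decide what to do with it.
        return f"Hello my name is {self.name} and I am {self.age}"

    def have_birthday(self):
        # Methods can modify the attributes of the object they belong to.
        self.age += 1


p1 = Person("Yosua", 31)
p1.have_birthday()
print(p1.greeting())  # Hello my name is Yosua and I am 32
```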

Document your work

You want your work to be understandable, so write proper comments in your code. Doing this, together with keeping your code modular, is a good start:

Docstrings

def add(num1, num2):
    """
    Add up two integer numbers.

    This function simply wraps the ``+`` operator, and does not
    do anything interesting, except for illustrating what
    the docstring of a very simple function looks like.

    Parameters
    ----------
    num1 : int
        First number to add.
    num2 : int
        Second number to add.

    Returns
    -------
    int
        The sum of ``num1`` and ``num2``.

    See Also
    --------
    subtract : Subtract one integer from another.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2
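A docstring is not only for human readers: Python stores it on the function, and the standard doctest module can run the examples it contains. The sketch below uses a shortened version of the docstring above and is one possible way to check it, not the only one:

```python
import doctest
import inspect


def add(num1, num2):
    """
    Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(10, -10)
    0
    """
    return num1 + num2


# inspect.getdoc() returns the docstring with its indentation cleaned up.
doc = inspect.getdoc(add)
print(doc)

# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
docstring_test = parser.get_doctest(doc, {"add": add}, "add", None, 0)
results = doctest.DocTestRunner(verbose=False).run(docstring_test)
print(results.failed, results.attempted)  # 0 failed out of 2 attempted
```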

FAQ

How to Install Python Dependencies?

  • There are two main repositories for Python Packages:
    • The Python Package Index (PyPI): PyPI is the main repository for third-party Python software.
    • GitHub: a code hosting platform where developers can store and share their code. The Python organization on GitHub has several repositories that contain the official source code for the Python programming language, as well as documentation, tools, and other resources.

So we know from where we can get Python packages, but now: How to install them and their dependencies properly?

These are some popular ways to install Python dependencies and make sure that we can replicate the working code in other people’s computers.


One option we have already seen: using Docker containers with Python. It might be overkill, but it gives you the same environment on every machine.

Conda

Conda provides a cross-platform and language-agnostic (not only Python, but R, Julia, C/C++…) solution for managing software environments and dependencies, making it especially valuable for complex projects involving multiple programming languages and libraries.

Its ability to create isolated environments with precise control over package versions ensures reproducibility and minimizes conflicts. Additionally, Conda offers a vast repository of pre-built packages, including many scientific and data analysis libraries, simplifying the installation process.

Create a conda environment:

conda create -n yourcondaenvironment python=3.11
conda activate yourcondaenvironment

Check the Python we are using:

which python
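The same check can be done from inside Python. A minimal sketch, assuming conda sets the CONDA_DEFAULT_ENV environment variable while an environment is active (it may be missing in other setups):

```python
import os
import sys

# Path of the interpreter that is running this script.
print(sys.executable)

# Name of the active conda environment, or None when the variable is not set.
active_env = os.environ.get("CONDA_DEFAULT_ENV")
print(active_env)
```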

Install inside conda env:

python -m pip install numpy
# $(which python) -m pip install numpy #to make sure it is installing it in this conda environment

You can find and install packages with:

conda search numpy
conda install numpy

Poetry

Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.

Poetry offers a lockfile to ensure repeatable installs, and can build your project for distribution.

You will need a pyproject.toml file like:

[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "A sample Python project"
authors = ["Your Name <your.email@example.com>"]

[tool.poetry.dependencies]
python = "^3.6"
requests = "^2.25.1"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

Install the declared dependencies (this also creates the poetry.lock lockfile) with:

poetry install

And build the project for distribution with:

poetry build

Venv

Python’s built-in venv (virtual environment) module is a powerful tool for creating isolated Python environments specifically for Python projects.

Unlike Conda, it only manages Python packages, but since it ships with Python it is an excellent choice for ensuring project-specific isolation.

python -m venv myvirtualenv #create it

myvirtualenv\Scripts\activate #activate venv (windows)
source myvirtualenv/bin/activate #(linux/macOS)

deactivate #when you are done

Once active, you can just install packages as usual and that will affect only that venv:

pip install package_name
#pip install numpy
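To confirm from Python that the venv is really the one in use, and that a package landed in it, something like the sketch below can help. The helper names are hypothetical, and the prefix comparison applies to venv-style environments (conda environments are not detected this way):

```python
import sys
from importlib import metadata


def in_virtualenv():
    # Inside a venv, sys.prefix points at the venv and differs from the base install.
    return sys.prefix != sys.base_prefix


def installed_version(package_name):
    # Returns the installed version string, or None if the package is not installed.
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print("Inside a venv:", in_virtualenv())
print("numpy version:", installed_version("numpy"))
```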

Pipenv

Pipenv is a command-line tool that aids in Python project development.

It combines capabilities of virtualenv and pip with additional features such as dependency management and script execution.

Its goal is to provide a more comprehensive and user-friendly experience for Python project development. It offers several advantages over using virtualenv and pip separately, including:

  • Easy project initialization: the first time you run pipenv install in a project folder, Pipenv creates the virtual environment and a Pipfile for you, so there is no separate setup step.

  • Dependency management and installation: Pipenv simplifies dependency management by maintaining a Pipfile that lists all project dependencies and their versions. It allows developers to easily install new dependencies, lock dependencies to specific versions, and update dependencies to latest releases.

  • Script execution within the virtual environment: Pipenv enables developers to execute project scripts directly within the virtual environment without explicitly activating the environment. This ensures that the scripts utilize the appropriate dependencies and avoid conflicts with the global environment.