As part of your Data Analytics journey, you will have to build knowledge of Python.
In this post I am collecting some of the must know tricks to start your projects.
Installing Python 🐍
Python is F/OSS and you can download it here.
How to Install Python in Linux
sudo apt update
sudo apt install build-essential software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.11 -y
Dont forget to add it to the PATH if you are on Windows - Then check the version with:
python --version
You can also start an interactive session and try it:
python
1+1
Dont forget to check how to install dependencies
Choosing IDE
Python 101 for your Projects
Let’s make a recap on Python’s Data Structures:
Lists
Index-value pairs in Python. Their order is maintained, they are mutable, and allows duplicate values.
your_list = ['one', 'two', 'three']
your_list[0]
Dictionaries
Your key-value friend from now on. Remember that the dictionary is unordered and mutable, but the dictionary doesn’t allow duplicate values with the same key in it.
They are great to store large amounts of data for easy and quick access.
your_dictionary = {
'one': 1,
'two': 2,
'three': 3,
'four': 4
}
You can access an element with:
your_dictionary['two']
Sets
The sets are an unordered collection of data types. These are mutable, iterable, and do not allow of any duplicate elements.
Remember that they can’t be indexed.
your_set = {"hello", "fantastic", "world", True, 1, 2}
your_set
Tuples
Tuples are used to store multiple items in a single variable.
A tuple is a collection which is ordered and unchangeable, also they allow duplicates.
They are more space efficient, as they are inmutable, the memory allocation is better handdled.
your_tuple = ("hello", "fantastic", "world")
See a particular value with:
your_tuple[0]
Loops
Non-Pythonic approach
i = 0
new_list= []
while i < len(your_list):
if len(your_list[i]) >= 4:
new_list.append(your_list[i])
i += 1
print(new_list)
A more Pythonic approach would loop over the contents of names, rather than using an index variable.
Looping over the contents of a list
better_list = []
for item in your_list:
if len(item) >= 4:
better_list.append(item)
print(better_list)
Best - List Comprehension
# Print the list created by using list comprehension
best_list = [item for item in your_list if len(item) >= 4]
print(best_list)
Functions
Regular Functions
def my_function():
print("Hello World")
my_function()
def my_function_a(x,y):
return 5 * x + y
print(my_function_a(3,1))
You can also define functions in other python scripts and call them as:
from Z_Folder_with_UDF import my_function_a as udf
With this approach you can have all your User Defined Functions clearly ordered.
Lambda Functions
your_lambda = lambda a : a + 1
print(your_lambda(5))
Or also, with multiple arguments
your_lambda = lambda a, b : a * b
print(your_lambda(2, 3))
Using them together
Here, we will be using the output of the UDF myfunc as the input of the lambda function:
def myfunc(n):
return (lambda a : a * n)
mydoubler = myfunc(2) #here you use a
print(mydoubler(11)) #here you are using n as input
Classes
This is going to be a little bit tricky if you are not used to OOP.
But dont be scared, you dont need to be a master of Object Oriented Programming before completing your first project - Actually, it is something that you will learn on the way, as you need it.
Creating your first Class
class Your_Class:
x = 1
p1 = Your_Class()
print(p1.x)
Important: the init() Function
All classes have a function called init(), which is always executed when the class is being initiated.
class Person:
def __init__(self, name, age):
#this function gets executed everytime you will create an object of the class Person
self.name = name
self.age = age
p1 = Person("Yosua", 31)
print(p1.name)
print(p1.age)
The str() Function
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __str__(self):
return f"{self.name}({self.age})"
p1 = Person("Yosua", 31)
print(p1)
Methods - ‘Class’s Functions’
Methods are function that belong to the objects created of a certain Class.
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def myfunc(self):
print("Hello my name is " + self.name + " and I am " + str(self.age))
p1 = Person("Yosua", 31)
p1.myfunc()
Document your work
You want to make sure that your work is understandable, so make sure you write proper comments in your code.
If you do this and also make sure that your code is modular enough (use UD’sF!), it is already a good start:
Docstrings
def add(num1, num2):
"""
Add up two integer numbers.
This function simply wraps the ``+`` operator, and does not
do anything interesting, except for illustrating what
the docstring of a very simple function looks like.
Parameters
----------
num1 : int
First number to add.
num2 : int
Second number to add.
Returns
-------
int
The sum of ``num1`` and ``num2``.
See Also
--------
subtract : Subtract one integer from another.
Examples
--------
>>> add(2, 2)
4
>>> add(10, -10)
0
"""
return num1 + num2
FAQ
How to Install Python Dependencies?
- There are two main repositories for Python Packages:
- The Python Package Index (PyPI): PyPI is the main repository for third-party Python software.
- GitHub: a code hosting platform where developers can store and share their code. The Python organization on GitHub has several repositories that contain the official source code for the Python programming language, as well as documentation, tools, and other resources.
So we know from where we can get Python packages, but now: How to install them and their dependencies properly?
These are some popular ways to install Python dependencies and make sure that we can replicate the working code in other people’s computers.
Python Dependencies with Conda 👈
Conda provides a cross-platform and language-agnostic (not only Python, but R, Julia, C/C#…) solution for managing software environments and dependencies, making it especially valuable for complex projects involving multiple programming languages and libraries.
Its ability to create isolated environments with precise control over package versions ensures reproducibility and minimizes conflicts. Additionally, Conda offers a vast repository of pre-built packages, including many scientific and data analysis libraries, simplifying the installation process.
Create a conda environment:
conda create -n yourcondaenvironment python=3.11
conda activate yourcondaenvironment
Check the Python we are using:
which python
Install inside conda env:
python -m pip install numpy
# $(which python) -m pip install numpy #to make sure it is installing it in this conda environment
You can find and install packages with:
conda search numpy
conda install numpy
Python Dependencies with Venv’s 👈
Python’s built-in venv (virtual environment) module is a powerful tool for creating isolated Python environments specifically for Python projects.
While it’s primarily focused on Python, it’s an excellent choice for managing Python dependencies and ensuring project-specific isolation.
python -m venv myvirtualenv #create it
myvirtualenv\Scripts\activate #activate venv (windows)
source myvirtualenv/bin/activate #(linux)
deactivate #when you are done
Once active, you can just install packages as usual and that will affect only that venv:
pip install package_name
#pip install numpy
#pip install numpy==1.26.4
#pip show numpy #check the installed version
Python Dependencies with Poetry 👈
Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.
Poetry offers a lockfile to ensure repeatable installs, and can build your project for distribution.
You will need s poetry.toml file like:
[tool.poetry]
name = "my_project"
version = "0.1.0"
description = "A sample Python project"
authors = ["Your Name <your.email@example.com>"]
[tool.poetry.dependencies]
python = "^3.6"
requests = "^2.25.1"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
You will need to use this command to build everything:
poetry build
Python Dependencies with Pipenv 👈
Pipenv is a command-line tool that aids in Python project development.
It combines capabilities of virtualenv and pip with additional features such as dependency management and script execution. Thi
Its goal is to provide a more comprehensive and user-friendly experience for Python project development. It offers several advantages over using virtualenv and pip separately, including:
-
Easy project initialization: Pipenv streamlines the process of initializing new projects by creating a default project directory structure, generating essential project files, and installing basic project dependencies. This simplifies project setup and provides a solid foundation for development.
-
Dependency management and installation: Pipenv simplifies dependency management by maintaining a Pipfile that lists all project dependencies and their versions. It allows developers to easily install new dependencies, lock dependencies to specific versions, and update dependencies to latest releases.
-
Script execution within the virtual environment: Pipenv enables developers to execute project scripts directly within the virtual environment without explicitly activating the environment. This ensures that the scripts utilize the appropriate dependencies and avoid conflicts with the global environment.