Python for Effective Data Management and Reporting
by Rackspace Technology Staff
Introduction
The blog introduces using Python®; for data management and reporting. Python uses Jupyter®; Notebook to access data sources to explore reading and writing data for generating reports from the source data.
Jupyter Notebook
In this post, I use the classic Jupyter Notebook running inside a browser from a server that can be local or remote. Jupyter Notebook supports over 40 programming languages, including Python, Java®, and JavaScript®.
Jupyter Notebook consists of the following components:
- Jupyter Server
- Notebook
- Kernel
- Reports and portfolios
Install Jupyter Notebook, open a Notebook, and set the Kernel
NOTE: I used Ubuntu® 18 on my server for this installation.
Perform the following steps:
1. Install all security patches on the server using the following command:
sudo apt upgrade -y
2. Run the following command to install one or more packages for converting notebooks to PDF:
sudo apt install -y texlive-xetex
3. To use Jupyter notebook, install Anaconda, a data science and machine learning
platform based on Python. Download the Anaconda3-2019.03-Linux-x86_64.sh package.
4. Run the following command to ensure this download package is not corrupt:
bash Anaconda3-2019.03-Linux-x86_64.sh
5. Run the following command to install Anaconda:
bash Anaconda3-2019.03-Linux-x86_64.sh
6. After the installation completes, restart the bash shell to include the changes
by running the following command:
source ~/.bashrc
7. Update the conda®; base code by running the following commands:
conda config --set auto_activate_base false
conda update -n base -c defaults conda
8. Create the conda virtual environment (such as **python\_data**) using the latest Python version by running the following command:
conda create -n python_data python=3
9. Activate this virtual environment y running the following command:
conda activate python_data
10. Run the following commands to make a directory to hold this GitHub repo:
mkdir python_data
cd python_data
11. Run the following commands to install the packages:
conda activate python_data
conda install jupyter psutil
Connect to the Jupyter Notebook server
Perform the following steps to connect to the Juniper Notebook server:
1. Activate the virtual environment using the following command:
conda activate python_data
2. Run the following script to start the server:
python get_notebook_token.py
This script uses the following command to start the Jupyter Notebook server and continues to run it outside of the terminal window:
nohup jupyter notebook --no-browser --port=8086
This provides the token and searches the nohup output if you run the preceding
command in the background by using **get\_notebook\_token.py** as highlighted
in the following snapshot:
nohup jupyter notebook --no-browser --port=8086
This provides the token and searches the nohup output if you run the preceding command in the background by using get\_notebook\_token.py as highlighted in the following snapshot:
3. Go to the terminal window of your local machine and enter the following command:
ssh -N -L localhost:8087:localhost:8086 username@<the public IP address of the server>
4. Leave this terminal open while using the Jupyter Notebook server in this session.
5. In the browser, enter the following URL **http://localhost:8087** and provide
the token as input.
After logging, you can create a Jupyter Notebook by using Python code and Markdown cells.
Examining the data in a DataFrame
Use Pandas DataFrames inside of Jupyter Notebook for examining the data in a
DataFrame. Run the following command to import the pandas package.
import pandas as pd
There are different ways to examine data using Pandas DataFrame:
- head(): Displays the first five rows in the DataFrame.
- .tail(): Displays the last five rows in the DataFrame.
- vet\_records: Displays all the records of the DataFrame.
- .dtypes: Shows statistical operations on columns where you can perform these operations.
- .at: Allows the user to change the value of a specific cell.
- f: .assign: Used to add another column of data.
- vet_records.type.count(): Used for counting and grouping data.
- loc: Allows you to use column names to slice data.
- - .head(): Displays the first five rows in the DataFrame.
- .tail(): Displays the last five rows in the DataFrame.
- vet\_records: Displays all the records of the DataFrame.
- .dtypes: Shows statistical operations on columns where you can perform these operations.
- .at: Allows the user to change the value of a specific cell.
- f: .assign: Used to add another column of data.
- vet_records.type.count(): Used for counting and grouping data.
- loc: Allows you to use column names to slice data.
- iloc: Requires the use of index numbers. For example:
.iloc[row, column]. remember: python` indexes starting at 0.
- .isin: Used to gather data about a list of items.**: Requires the use of index numbers. For example:
`.iloc[row, column]. remember: python` indexes starting at 0.
- .isin: Used to gather data about a list of items.
Writing a report
For writing a report, you need to use LaTex®. LaTex, built into JupyterNotebook, helps you create technical and scientific documents and is good at displaying mathematical formulas. You can write LaTex formulas in a Markdown cell. For the Jupyter report, use a data set that allows you to consider some aspects of socioeconomic factors that may impact it. To create the report and make sure to run the following commands to install supporting packages:
- import matplotlib.pyplot
- import numpy
- import pandas
- import random
Conclusion
You can create and share documents that contain live codes, equations, and visualizations that help in data cleaning, transformation, and statistical modeling in data science.

Recent Posts
Relatório sobre o estado da nuvem em 2025
Janeiro 10th, 2025
Padrões de rede híbrida do Google Cloud - Parte 2
Outubro 16th, 2024
Padrões de rede híbrida do Google Cloud - Parte 2
Outubro 15th, 2024
How Rackspace Leverages AWS Systems Manager
Outubro 9th, 2024
O Windows Server impede a sincronização da hora com o Rackspace NTP
Outubro 3rd, 2024