Pandas in Python

Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.

Jupyter Notebooks offer a good environment for using pandas to do data exploration and modeling, but pandas can also be used in text editors just as easily.

 Install and import

pandas can be installed in two ways. From a terminal, with conda:

 conda install pandas

 Alternatively, if you're currently viewing this article in a Jupyter notebook you can run this cell, which installs pandas with pip:

 !pip3 install pandas

 The “!” at the beginning runs the rest of the line as a shell command, as if it were typed in a terminal.

 We usually import pandas under a shorter name, since it's used so much:

 import pandas as pd

Here pd is an alias for pandas, so every pandas function can be called as pd.something instead of pandas.something.

The two primary components of pandas are the Series and the DataFrame.

A Series represents a single column, and a DataFrame represents a two-dimensional table made up of a collection of Series.
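As a minimal sketch of that relationship, the snippet below builds two Series and combines them into one DataFrame. The fruit names and numbers are made-up illustration data, not taken from any real dataset.

```python
import pandas as pd

# A Series is a single labelled column of values.
apples = pd.Series([3, 2, 0, 1], name="apples")
oranges = pd.Series([0, 3, 7, 2], name="oranges")

# A DataFrame is a table whose columns are Series sharing one index.
fruit = pd.concat([apples, oranges], axis=1)

print(fruit)
print(type(fruit["apples"]))  # selecting one column gives back a Series
print(fruit.shape)            # (4, 2): four rows, two columns
```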

Creating DataFrames right in Python is good to know and quite useful when testing new methods and functions you find in the pandas docs.

There are many ways to create a DataFrame from scratch; one simple option is to pass a dict of lists to the pd.DataFrame constructor.
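One possible way to do this is sketched below. The dict keys become column names and each list becomes a column. The variable names and values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical data: each key is a column, each list holds that column's values.
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

purchases = pd.DataFrame(data)

# No index was supplied, so pandas assigns the default integer labels.
print(purchases)
print(list(purchases.index))
```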

The Index of this DataFrame was given to us on creation as the numbers 0-3, but we could also create our own when we initialize the DataFrame.
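A custom index can be supplied through the index argument of pd.DataFrame. In this sketch the customer names are hypothetical labels, used only to show how a row can then be looked up by label with .loc.

```python
import pandas as pd

data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

# Hypothetical row labels; one label is needed per row.
purchases = pd.DataFrame(data, index=["June", "Robert", "Lily", "David"])

print(purchases)

# .loc selects rows by index label instead of by position.
print(purchases.loc["June"])
```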


How to Read Data from CSV, JSON

To read data from a CSV file, use the pd.read_csv function.
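A minimal sketch of a read_csv call follows. To keep it runnable without a file on disk, it reads from an in-memory string through io.StringIO; with a real file you would pass its path instead. The CSV contents are made-up illustration data.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would live in a .csv file.
csv_text = """name,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# The first line is used as the column names and rows get a default integer index.
print(df)
```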

CSV files don't have indexes like our DataFrames do, so by default pandas adds a numeric one. We can instead designate one of the columns as the index by setting the index_col parameter of read_csv.
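The effect of index_col can be sketched as follows, again reading made-up data from an in-memory string rather than a real file. Passing index_col=0 tells read_csv to use the first column as the row labels.

```python
import io

import pandas as pd

# Hypothetical CSV contents; the first column holds the row labels.
csv_text = """name,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""

# index_col=0 turns the first column into the index instead of a data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df)
print(df.loc["Lily", "oranges"])
```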

Similarly, we can read data from a JSON file with the pd.read_json function.
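A small sketch of read_json is shown below. The JSON text is hypothetical and is assumed to be a list of records (one object per row), which is why orient="records" is passed; other layouts need a different orient value. As with the CSV examples, a file path could be given in place of the io.StringIO wrapper.

```python
import io

import pandas as pd

# Hypothetical JSON contents: a list with one object per row.
json_text = """[
  {"name": "June", "apples": 3, "oranges": 0},
  {"name": "Robert", "apples": 2, "oranges": 3},
  {"name": "Lily", "apples": 0, "oranges": 7},
  {"name": "David", "apples": 1, "oranges": 2}
]"""

# orient="records" matches the list-of-objects layout assumed above.
df = pd.read_json(io.StringIO(json_text), orient="records")

print(df)
```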

