EDA using Pandas

 1. Read data

# Read data from a CSV file

import pandas as pd

df = pd.read_csv('IMDB-Movie-Data.csv')


 

# Read data, using the Title column as the row index

df = pd.read_csv('IMDB-Movie-Data.csv', index_col="Title")
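To see what index_col changes, here is a minimal sketch that reads a tiny in-memory CSV in place of IMDB-Movie-Data.csv. The two rows and the Year and Rating columns are made up for illustration; only the Title column name comes from the call above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the rows are illustrative only.
csv_text = "Title,Year,Rating\nMovie A,2015,7.1\nMovie B,2016,6.4\n"

# Default: pandas builds a 0-based integer index.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default.index.tolist())   # [0, 1]

# index_col="Title": the Title column becomes the row labels
# and no longer appears among the regular columns.
df_titled = pd.read_csv(io.StringIO(csv_text), index_col="Title")
print(df_titled.index.tolist())    # ['Movie A', 'Movie B']
```

Setting the index at read time is what later makes label-based lookups by movie title possible.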

 



2. View data

Let's preview the data with the head() and tail() methods.

head()

Returns the first 5 rows of the dataset by default.

It also accepts the number of rows to return as a parameter, e.g. df.head(10).

tail()

Returns the last 5 rows of the dataset by default, and accepts a row count in the same way.
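As a quick illustration of both methods, the sketch below uses a small made-up dataframe of ten rows rather than the IMDB file, so the Rank and Rating values are assumptions.

```python
import pandas as pd

# Hypothetical data: ten rows, so the default of 5 is visible.
df = pd.DataFrame({
    "Rank": range(1, 11),
    "Rating": [8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 6.4, 7.1, 7.0],
})

first_five = df.head()     # top 5 rows by default
first_three = df.head(3)   # top 3 rows
last_five = df.tail()      # bottom 5 rows by default
last_two = df.tail(2)      # bottom 2 rows

print(first_three)
print(last_two)
```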

3. Describe data

describe() displays a statistical summary (count, mean, standard deviation, minimum, quartiles and maximum) of all numerical columns in the dataframe.

df.describe()
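The sketch below shows the shape of the result on a small made-up dataframe (the column names and values are assumptions, not the IMDB data). Note that the text column is left out of the default summary.

```python
import pandas as pd

# Hypothetical data with one text column and two numeric columns.
df = pd.DataFrame({
    "Genre": ["Action", "Drama", "Comedy", "Drama"],
    "Rating": [8.0, 7.0, 6.0, 7.0],
    "Votes": [100, 200, 300, 400],
})

summary = df.describe()
print(summary)

# Only the numeric columns (Rating, Votes) appear; Genre is skipped.
# The rows are count, mean, std, min, 25%, 50%, 75% and max.
print(summary.loc["mean", "Rating"])   # 7.0
```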


Extract data using rows

loc and iloc are two indexers that can be used to slice data by row. Both are used with square brackets, not parentheses.

loc – locates rows by label

loc slices on the explicit index, i.e. the labels assigned to the rows.

With Title as the index, it takes title strings to retrieve the matching rows, and a label slice includes both endpoints.

iloc – locates rows by integer position

iloc slices on 0-based integer positions, as Python lists do, so the end of a slice is excluded.
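A minimal sketch of the difference, assuming a dataframe indexed by Title as in the read step. The four movies and their values are placeholders, not rows from the IMDB file.

```python
import pandas as pd

# Hypothetical data indexed by title.
df = pd.DataFrame(
    {"Year": [2014, 2012, 2016, 2016], "Rating": [8.1, 7.0, 7.3, 7.2]},
    index=pd.Index(["Movie A", "Movie B", "Movie C", "Movie D"], name="Title"),
)

# loc selects by label; a label slice includes BOTH endpoints.
by_label = df.loc["Movie A":"Movie C"]      # 3 rows

# iloc selects by integer position; the stop position is excluded.
by_position = df.iloc[0:2]                  # 2 rows

# A single label returns that row as a Series.
one_row = df.loc["Movie B"]

print(by_label)
print(by_position)
```

The inclusive end of a loc slice is the detail that most often surprises people coming from plain Python slicing.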

 

4. Dealing with missing values

Pandas has isnull() for detecting null values in a dataframe. Let's see how to use it.

# Count the null values in each column

df.isnull().sum()
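Because sum() adds down each column by default, the call above gives one count per column. The sketch below, on made-up data with assumed column names, contrasts that with a per-row count and a grand total.

```python
import pandas as pd

# Hypothetical data with a few missing values.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Revenue": [333.1, float("nan"), float("nan")],
    "Metascore": [76.0, 65.0, float("nan")],
})

nulls_per_column = df.isnull().sum()         # one count per column
nulls_per_row = df.isnull().sum(axis=1)      # one count per row
total_nulls = int(df.isnull().sum().sum())   # single number for the whole frame

print(nulls_per_column)
print(total_nulls)   # 3
```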

5. Dropping columns and null values

Dropping columns or rows is another common operation in data analysis.

The drop() function removes rows or columns by label.

# Use drop function to drop columns

df.drop('Metascore', axis=1).head()

The code above returns a copy of the dataframe without the Metascore column. Here axis=1 specifies that a column is to be dropped. The change does not affect df itself unless we assign the result back (df = df.drop('Metascore', axis=1)) or pass inplace=True to drop().
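The following sketch makes the copy-versus-original behaviour concrete. The two-row dataframe is made up for illustration; only the Metascore column name comes from the example above.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B"],
    "Rating": [8.1, 7.0],
    "Metascore": [76.0, 65.0],
})

# drop() returns a new dataframe; df itself still has Metascore.
without_meta = df.drop("Metascore", axis=1)
print("Metascore" in df.columns)   # True

# columns= is an equivalent spelling of axis=1.
same_result = df.drop(columns=["Metascore"])

# Rows are dropped by index label (axis=0 is the default).
without_first_row = df.drop(0)

# To keep the change, assign the result back to df.
df = df.drop("Metascore", axis=1)
print("Metascore" in df.columns)   # False
```

Assigning the result back is usually easier to follow than inplace=True, since each line then shows which variable holds the modified data.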

We can also drop rows or columns that contain null values by using the dropna() function.

# Drop all rows containing missing data

df.dropna()

# Drop all columns containing missing data

df.dropna(axis=1)

fillna() fills null values with a specified value instead of dropping them.
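To compare the two approaches, here is a small sketch on made-up data. Filling with the column mean is just one possible choice, shown as an assumption rather than a recommendation.

```python
import pandas as pd

# Hypothetical data with one missing value in each numeric column.
df = pd.DataFrame({
    "Title": ["Movie A", "Movie B", "Movie C"],
    "Revenue": [300.0, float("nan"), 100.0],
    "Metascore": [76.0, 65.0, float("nan")],
})

rows_kept = df.dropna()         # only Movie A has no missing value
cols_kept = df.dropna(axis=1)   # only Title has no missing value

# fillna() with a dict fills each listed column with its own value.
filled = df.fillna({
    "Revenue": df["Revenue"].mean(),       # 200.0
    "Metascore": df["Metascore"].mean(),   # 70.5
})

print(rows_kept)
print(filled)
```

Here dropna() keeps one row out of three, while fillna() keeps all of them, which is why filling is often preferred when data is scarce.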

 


