EDA using Pandas
1. Read data
# Import pandas and read data from the CSV file
import pandas as pd
df = pd.read_csv('IMDB-Movie-Data.csv')
# Alternatively, use the Title column as the row index
df = pd.read_csv('IMDB-Movie-Data.csv', index_col="Title")
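To see what index_col changes without needing the CSV file on disk, here is a minimal sketch. The in-memory sample below is hypothetical (made-up titles and values), standing in for IMDB-Movie-Data.csv.

```python
import io

import pandas as pd

# Hypothetical sample standing in for IMDB-Movie-Data.csv
csv_text = """Title,Year,Rating
Movie A,2014,8.1
Movie B,2012,7.0
Movie C,2016,6.2
"""

# Default: pandas adds an integer index (0, 1, 2, ...)
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col="Title": the Title column becomes the row index
df_titled = pd.read_csv(io.StringIO(csv_text), index_col="Title")

print(df_default.index.tolist())   # [0, 1, 2]
print(df_titled.index.tolist())    # ['Movie A', 'Movie B', 'Movie C']
print(df_titled.columns.tolist())  # ['Year', 'Rating']
```

With Title as the index, rows can later be looked up by movie name rather than by position.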
2. View data
Let’s do a quick preview of the data using the head( ) and tail( ) methods.
head( )
Returns the top 5 rows of the dataset by default.
It can also take the number of rows to be viewed as a parameter.
tail( )
Returns the bottom 5 rows of the dataset by default.
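A short sketch of both methods follows. The small dataframe is hypothetical and has 8 rows so that the default of 5 is visible.

```python
import pandas as pd

# Hypothetical data with 8 rows (not the IMDB dataset)
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5, 6, 7, 8],
    "Rating": [8.1, 7.0, 7.3, 7.2, 6.2, 6.1, 8.3, 7.1],
})

first_five = df.head()    # no argument: first 5 rows
first_two = df.head(2)    # first 2 rows
last_five = df.tail()     # no argument: last 5 rows
last_three = df.tail(3)   # last 3 rows

print(first_two)
print(last_three)
```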
3. describe( ) displays a statistical summary (count, mean, standard deviation, minimum, quartiles, and maximum) of all numerical attributes in the dataframe.
df.describe()
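As a rough illustration, the sketch below runs describe( ) on a small hypothetical dataframe with one text column and two numeric columns. By default only the numeric columns appear in the summary.

```python
import pandas as pd

# Hypothetical data: one text column, two numeric columns
df = pd.DataFrame({
    "Genre": ["Action", "Drama", "Comedy", "Drama"],
    "Year": [2012, 2014, 2016, 2016],
    "Rating": [6.0, 7.0, 8.0, 9.0],
})

summary = df.describe()

print(summary.columns.tolist())       # ['Year', 'Rating']
print(summary.index.tolist())         # ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(summary.loc["mean", "Rating"])  # 7.5
```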
Extract data using rows
loc and iloc are two indexers that can be used to slice data from specific row indexes.
loc – locates rows by name
loc performs slicing based on the explicit index labels. It takes string indexes (here the movie titles, since Title is the index) to retrieve data from the specified rows.
iloc – locates rows by integer position
iloc performs slicing based on Python’s default zero-based numerical index.
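The difference between the two can be sketched as follows, assuming a small hypothetical dataframe indexed by title. Note that a loc slice includes its end label, while an iloc slice excludes its end position, like ordinary Python slicing.

```python
import pandas as pd

# Hypothetical data indexed by movie title
df = pd.DataFrame(
    {"Year": [2014, 2012, 2016, 2016], "Rating": [8.1, 7.0, 7.3, 7.2]},
    index=pd.Index(["Movie A", "Movie B", "Movie C", "Movie D"], name="Title"),
)

# loc: select by index label
row_by_label = df.loc["Movie B"]
rows_by_label = df.loc["Movie A":"Movie C"]  # end label is included

# iloc: select by integer position
row_by_pos = df.iloc[1]                      # second row, i.e. "Movie B"
rows_by_pos = df.iloc[0:2]                   # end position is excluded

print(rows_by_label.index.tolist())  # ['Movie A', 'Movie B', 'Movie C']
print(rows_by_pos.index.tolist())    # ['Movie A', 'Movie B']
```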
4. Dealing with Missing Values:
Pandas has isnull( ) for detecting null values in a dataframe. Let’s see how to use this method.
# Count the null values in each column
df.isnull().sum()
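A minimal sketch on hypothetical data shows that sum( ) after isnull( ) counts nulls per column by default, and that passing axis=1 gives a per-row count instead.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data with some missing values
df = pd.DataFrame({
    "Rating": [8.1, 7.0, 7.3],
    "Revenue": [333.13, nan, nan],
    "Metascore": [76.0, 65.0, nan],
})

# Number of nulls in each column
nulls_per_column = df.isnull().sum()

# Number of nulls in each row
nulls_per_row = df.isnull().sum(axis=1)

print(nulls_per_column)
print(nulls_per_row)
```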
5. Dropping columns and null values
Dropping columns or rows is another important operation in data analysis.
The drop( ) function can be used to drop rows or columns by label.
# Use drop function to drop columns
df.drop('Metascore', axis=1).head()
Using the above code, the Metascore column is dropped from the returned result. Here axis=1 specifies that a column is to be dropped. These changes will not take place in the actual data unless we specify inplace=True as a parameter in the drop( ) function, or assign the result back to a variable.
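The effect of inplace=True can be checked with a small sketch. The dataframe here is hypothetical; only the Metascore column name comes from the example above.

```python
import pandas as pd

# Hypothetical data with a Metascore column
df = pd.DataFrame({"Rating": [8.1, 7.0], "Metascore": [76.0, 65.0]})

# Without inplace=True, drop() returns a new dataframe
without_col = df.drop("Metascore", axis=1)
print(without_col.columns.tolist())  # ['Rating']
print(df.columns.tolist())           # ['Rating', 'Metascore'] (original unchanged)

# With inplace=True, the dataframe itself is modified and None is returned
df_copy = df.copy()
result = df_copy.drop("Metascore", axis=1, inplace=True)
print(df_copy.columns.tolist())      # ['Rating']
```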
We can also drop rows/columns with null values by using the dropna( ) function.
# Drop all rows containing missing data
df.dropna()
# Drop all columns containing missing data
df.dropna(axis=1)
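Here is a quick sketch of both calls on hypothetical data. Like drop( ), dropna( ) returns a new dataframe and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data: nulls in two of the three columns
df = pd.DataFrame({
    "Rating": [8.1, 7.0, 7.3],
    "Revenue": [333.13, nan, 126.46],
    "Metascore": [76.0, 65.0, nan],
})

rows_kept = df.dropna()        # keeps only rows with no nulls
cols_kept = df.dropna(axis=1)  # keeps only columns with no nulls

print(rows_kept.index.tolist())    # [0]
print(cols_kept.columns.tolist())  # ['Rating']
print(df.shape)                    # (3, 3)
```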
fillna( ) is the function used to fill null values with specified values.
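Two common ways to use it are sketched below on hypothetical data: filling with a constant, and filling each column with its own mean. Choosing the mean is an illustrative assumption here, not a recommendation for every dataset.

```python
import pandas as pd

nan = float("nan")

# Hypothetical data with one null in each column
df = pd.DataFrame({
    "Revenue": [100.0, nan, 300.0],
    "Metascore": [76.0, nan, 64.0],
})

# Fill every null with a constant
filled_zero = df.fillna(0)

# Fill each column's nulls with that column's mean
filled_mean = df.fillna(df.mean())

print(filled_zero.loc[1].tolist())  # [0.0, 0.0]
print(filled_mean.loc[1].tolist())  # [200.0, 70.0]
```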