Python for Data Analysis: Reading CSV and Other Files

Reading data with Python

As we saw in previous courses, we can read data using plain Python.

When you want to work with a file, the first thing to do is to open it. This is done by invoking the open() built-in function.

open() has a single required argument, the path to the file, and returns a file object.

When the file is opened in a with statement, Python closes it automatically once execution leaves the with block, even if an error occurs.
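A quick way to see what the with statement does is to check the file object's closed attribute inside and after the block. This sketch first writes a small temporary file so it runs anywhere; the file name example.txt and its contents are hypothetical.

```python
import os
import tempfile

# Create a small hypothetical file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as writer:
    writer.write('first line\nsecond line\n')

with open(path, 'r') as reader:
    inside_block = reader.closed   # False: the file is still open here
    first = reader.readline()

after_block = reader.closed        # True: leaving the block closed the file
print(inside_block, after_block, repr(first))
```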

Once the file is opened, we can read its content as follows:

filepath = 'btc-market-price.csv'

with open(filepath, 'r') as reader:
    for index, line in enumerate(reader):
        # read just the first 10 lines
        if index >= 10:
            break
        print(index, line)
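The loop above prints raw text lines. If you also want each line split into fields, the standard-library csv module handles separators and quoted values for you. A minimal sketch, assuming a few made-up rows in place of the real btc-market-price.csv, with itertools.islice stopping after the first n rows (first_rows is a hypothetical helper name):

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for 'btc-market-price.csv' (the values are made up).
sample = io.StringIO(
    "2017-01-01,1000.0\n"
    "2017-01-02,1010.5\n"
    "2017-01-03,\"1,020.25\"\n"
)

def first_rows(file_obj, n=10):
    """Return at most the first n rows of a CSV file object as lists of strings."""
    return list(islice(csv.reader(file_obj), n))

rows = first_rows(sample, n=10)
for index, row in enumerate(rows):
    print(index, row)
```

Note that the quoted value "1,020.25" stays a single field, which a naive line.split(',') would break apart.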

Reading our first CSV file with pandas:

Available reading functions in pandas:

[Figure: pandas read data table]

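How to open a CSV file?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter magic: render plots inside the notebook (omit it in a plain .py script)
%matplotlib inline

df = pd.read_csv('btc-market-price.csv',
                 header=0,                  # the first line holds the column names
                 na_values=['', '?', '-'],  # extra strings to treat as missing (NaN)
                 dtype={'Price': 'float'},  # read the Price column as floats
                 parse_dates=[0],           # parse the first column as dates
                 index_col=[0])             # use the first column as the row index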
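To see what these read_csv arguments do without the real file, here is a sketch on a few hypothetical rows passed through io.StringIO, where '?' and '-' stand for missing prices (the column names Timestamp and Price are assumptions):

```python
import io
import pandas as pd

# Hypothetical data: '?' and '-' mark missing prices.
raw = io.StringIO(
    "Timestamp,Price\n"
    "2017-01-01,1000.0\n"
    "2017-01-02,?\n"
    "2017-01-03,-\n"
    "2017-01-04,1020.5\n"
)

df = pd.read_csv(raw,
                 header=0,
                 na_values=['', '?', '-'],
                 dtype={'Price': 'float'},
                 parse_dates=[0],
                 index_col=[0])

print(df.index)                  # a DatetimeIndex named 'Timestamp'
print(df['Price'].isna().sum())  # the two placeholder values became NaN
```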

Basic code to read a CSV file and drop missing values:

df = pd.read_csv('btc-eth-prices-original.csv')

no_empty_cols = df.dropna(how='all', axis=1)   # drop columns where all values are null
complete_cols = df.dropna(how='any', axis=1)   # drop columns where any value is null
no_empty_rows = df.dropna(how='all', axis=0)   # drop rows where all values are null
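The difference between how='all' and how='any' is easiest to see on a tiny frame. In this hypothetical example, column C and row 2 are entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column 'C' is entirely empty and row 2 is entirely empty.
df = pd.DataFrame({
    'A': [1.0, 2.0, np.nan],
    'B': [np.nan, 5.0, np.nan],
    'C': [np.nan, np.nan, np.nan],
})

no_empty_cols = df.dropna(how='all', axis=1)   # drops only 'C'
complete_cols = df.dropna(how='any', axis=1)   # drops every column: row 2 puts a NaN in each
no_empty_rows = df.dropna(how='all', axis=0)   # drops only row 2
```

how='any' is much more aggressive: a single empty row is enough to remove every column.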

How to drop an 'Unnamed: 0' column?

df = df.drop(['Unnamed: 0'], axis=1)   # drop() returns a new DataFrame, so reassign it
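An 'Unnamed: 0' column usually appears when a DataFrame was saved with its index and then read back. This sketch reproduces that with a hypothetical frame and an in-memory buffer instead of a file on disk:

```python
import io
import pandas as pd

# Hypothetical frame, saved with to_csv's default index=True.
original = pd.DataFrame({'name': ['Ann', 'Bob'], 'score': [90, 85]})
buffer = io.StringIO()
original.to_csv(buffer)   # the index is written as a first column with an empty header
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())   # ['Unnamed: 0', 'name', 'score']

cleaned = reloaded.drop(['Unnamed: 0'], axis=1)   # drop() returns a new DataFrame
```

Alternatively, passing index_col=0 to read_csv turns that column back into the row index instead of discarding it.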

How to set the decimal separator?

exam_df = pd.read_csv('exam_review.csv',
                      sep='>',       # fields are separated by '>'
                      decimal=',')   # ',' marks the decimal point, e.g. 12,5 -> 12.5
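A self-contained sketch of the same call, using a few hypothetical exam rows in place of exam_review.csv:

```python
import io
import pandas as pd

# Hypothetical exam data: fields separated by '>', decimals written with ','.
text = "first_name>math_score\nAnn>12,5\nBob>15,0\n"

exam_df = pd.read_csv(io.StringIO(text), sep='>', decimal=',')
print(exam_df['math_score'].tolist())  # [12.5, 15.0]
```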

How to check the data types of particular columns?

exam_df[['math_score', 'french_score']].dtypes
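Checking dtypes is a quick way to spot a wrong decimal setting: without decimal=',' the score columns are read as text rather than numbers. A sketch on hypothetical data, using pandas' dtype-checking helpers so the check does not depend on the exact text dtype your pandas version uses:

```python
import io
import pandas as pd
from pandas.api.types import is_float_dtype, is_numeric_dtype

# Hypothetical exam data that uses ',' as the decimal mark.
text = "first_name>math_score>french_score\nAnn>12,5>14,0\nBob>15,0>9,5\n"

without_decimal = pd.read_csv(io.StringIO(text), sep='>')
with_decimal = pd.read_csv(io.StringIO(text), sep='>', decimal=',')

print(without_decimal[['math_score', 'french_score']].dtypes)  # text columns, not numbers
print(with_decimal[['math_score', 'french_score']].dtypes)     # float64 columns
```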

How to set the thousands separator?

pd.read_csv('exam_review.csv',
            sep='>',
            thousands=',')   # strip ',' from numbers such as 1,200 -> 1200
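A minimal sketch with hypothetical data showing that thousands=',' lets pandas read "1,200" as the integer 1200 instead of a string:

```python
import io
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical data: one value uses ',' as a thousands separator.
text = "first_name>points\nAnn>1,200\nBob>950\n"

points_df = pd.read_csv(io.StringIO(text), sep='>', thousands=',')
print(points_df['points'].tolist())  # [1200, 950]
```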

How to skip rows at the start of the file?

pd.read_csv('exam_review.csv',
            sep='>',
            skiprows=2)   # skip the first 2 lines of the file (the header line counts too)
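An integer skiprows drops that many lines from the top of the file, which is handy when a file starts with a few lines of notes before the real header. A sketch with two hypothetical note lines:

```python
import io
import pandas as pd

# Hypothetical file: two note lines come before the real header.
text = (
    "# hypothetical export notes\n"
    "# second note line\n"
    "first_name>age\n"
    "Ann>20\n"
    "Bob>21\n"
)

people = pd.read_csv(io.StringIO(text), sep='>', skiprows=2)
print(people.columns.tolist())  # ['first_name', 'age']
```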

How to skip specific rows?

exam_df = pd.read_csv('exam_review.csv',
                      sep='>',
                      decimal=',',
                      skiprows=[1, 3])   # skip lines 1 and 3 (0-based; line 0 is the header)
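With a list, skiprows counts file lines from 0, so the header is line 0 and [1, 3] removes the first and third data rows. skiprows also accepts a function that receives each line number and returns True for lines to skip. A sketch on hypothetical rows:

```python
import io
import pandas as pd

# Hypothetical data: line 0 is the header, lines 1-4 are data rows.
text = "name>age\nAnn>20\nBob>21\nCid>22\nDee>23\n"

skip_list = pd.read_csv(io.StringIO(text), sep='>', skiprows=[1, 3])
# Skip every even-numbered line except the header (line 0).
skip_fn = pd.read_csv(io.StringIO(text), sep='>',
                      skiprows=lambda i: i > 0 and i % 2 == 0)

print(skip_list['name'].tolist())  # ['Bob', 'Dee']
print(skip_fn['name'].tolist())    # ['Ann', 'Cid']
```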

How to keep blank lines as rows of NaN?

By default, skip_blank_lines is True, so blank lines are skipped while the file is read.

If we set this parameter to False, every blank line is loaded into the DataFrame as a row of NaN values.

pd.read_csv('exam_review.csv',
            sep='>',
            skip_blank_lines=False)
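A small sketch comparing both settings on hypothetical data with one blank line between the two records:

```python
import io
import pandas as pd

# Hypothetical data with a blank line between the two records.
text = "name>age\nAnn>20\n\nBob>21\n"

default = pd.read_csv(io.StringIO(text), sep='>')
kept = pd.read_csv(io.StringIO(text), sep='>', skip_blank_lines=False)

print(len(default))  # 2: the blank line is ignored
print(len(kept))     # 3: the blank line becomes a row of NaN
```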

How to load specific columns?

pd.read_csv('exam_review.csv',
            usecols=['first_name', 'last_name', 'age'],   # read only these columns
            sep='>')
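Two details of usecols are worth knowing: the selected columns come back in the order they appear in the file, not in the order of the list, and usecols also accepts a function that is called with each column name. A sketch on a hypothetical one-row file:

```python
import io
import pandas as pd

# Hypothetical file with four columns.
text = "first_name>last_name>age>city\nAnn>Lee>20>Paris\n"

by_list = pd.read_csv(io.StringIO(text), sep='>',
                      usecols=['age', 'first_name', 'last_name'])
by_fn = pd.read_csv(io.StringIO(text), sep='>',
                    usecols=lambda col: col.endswith('_name'))

print(by_list.columns.tolist())  # ['first_name', 'last_name', 'age']
print(by_fn.columns.tolist())    # ['first_name', 'last_name']
```

If a specific order matters, reorder after loading, for example by_list[['age', 'first_name', 'last_name']].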

How to load a single column as a Series?

# read_csv's squeeze=True was removed in pandas 2.0; .squeeze('columns') replaces it
exam_test_2 = pd.read_csv('exam_review.csv',
                          sep='>',
                          usecols=['last_name']).squeeze('columns')

type(exam_test_2)   # pandas.core.series.Series if the conversion worked
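On recent pandas versions the conversion is done with DataFrame.squeeze('columns'), which only collapses the frame when exactly one column remains; with more columns it returns the DataFrame unchanged. A sketch on hypothetical names:

```python
import io
import pandas as pd

# Hypothetical data with two columns.
text = "first_name>last_name\nAnn>Lee\nBob>Kim\n"

one_col = pd.read_csv(io.StringIO(text), sep='>', usecols=['last_name'])
two_cols = pd.read_csv(io.StringIO(text), sep='>')

as_series = one_col.squeeze('columns')    # one column left -> Series
unchanged = two_cols.squeeze('columns')   # two columns -> still a DataFrame

print(type(as_series), type(unchanged))
```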

## Save to CSV file

Finally, we can also save our DataFrame as a CSV file.

How to save a DataFrame to CSV?

exam_df.to_csv('out.csv')

exam_df.to_csv('out.csv',
               index=False)   # omit the row index; otherwise it is read back as an 'Unnamed: 0' column

Let's check that the file was saved correctly by reading it back:

pd.read_csv('out.csv')
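Instead of eyeballing the reloaded file, the round trip can be checked with pandas.testing.assert_frame_equal, which raises if anything differs. This sketch writes to an in-memory buffer rather than out.csv, and the exam data is hypothetical:

```python
import io
import pandas as pd

# Hypothetical exam results.
exam_df = pd.DataFrame({'first_name': ['Ann', 'Bob'], 'math_score': [12.5, 15.0]})

buffer = io.StringIO()
exam_df.to_csv(buffer, index=False)   # write without the row index
buffer.seek(0)

reloaded = pd.read_csv(buffer)
pd.testing.assert_frame_equal(exam_df, reloaded)  # raises AssertionError if anything changed
```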

How to display more columns and rows:

pd.set_option('display.max_columns', 85)   # show up to 85 columns when printing a DataFrame
pd.set_option('display.max_rows', 85)      # show up to 85 rows when printing a DataFrame
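pd.set_option changes the setting for the rest of the session. If you only need the larger limits for one cell, pd.option_context applies them temporarily and restores the previous values when the with block ends:

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# Temporarily raise both limits; they revert when the block ends.
with pd.option_context('display.max_columns', 85, 'display.max_rows', 85):
    inside = (pd.get_option('display.max_columns'), pd.get_option('display.max_rows'))

after = pd.get_option('display.max_columns')
print(inside, before == after)
```

To undo a permanent pd.set_option call, pd.reset_option('display.max_columns') restores the default.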


Tuesday, 25 May 2021
