Reading data with Python
As we saw in previous courses, we can read data using plain Python.
When you want to work with a file, the first thing to do is open it. This is done by calling the built-in open() function.
open() has a single required argument, the path to the file, and returns a file object.
The with statement automatically closes the file once execution leaves the with block, even if an error occurs.
Once the file is opened, we can read its content as follows:
filepath = 'btc-market-price.csv'
with open(filepath, 'r') as reader:
    for index, line in enumerate(reader.readlines()):
        # read just the first 10 lines
        if index < 10:
            print(index, line)
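To try this without the real file, here is a minimal sketch that writes a small throwaway CSV first (the sample rows and the names sample_path, first_lines and closed_after_error are made up for illustration). It also checks the claim that with closes the file even when an error is raised inside the block, and it uses itertools.islice as an alternative to readlines() so that only the first 10 lines are read.

```python
import os
import tempfile
from itertools import islice

# Create a small throwaway file so the example is self-contained
# (hypothetical data, not the real btc-market-price.csv).
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('Timestamp,Price\n')
    for day in range(1, 21):
        tmp.write(f'2017-04-{day:02d},{1000 + day}\n')
    sample_path = tmp.name

# Iterating over the file object yields one line at a time;
# islice stops after 10 lines instead of loading the whole file.
with open(sample_path, 'r') as reader:
    first_lines = [line.rstrip('\n') for line in islice(reader, 10)]

# The file is closed even when the block exits through an exception.
try:
    with open(sample_path, 'r') as reader:
        raise ValueError('simulated failure while reading')
except ValueError:
    pass
closed_after_error = reader.closed

os.remove(sample_path)  # clean up the throwaway file
```

For a file of a few thousand lines both approaches behave the same; the lazy version only matters when the file is large.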
Reading the first CSV file:
pandas provides several reading functions; for CSV files we use read_csv().
How to open a CSV file?
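import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic: the next line only works inside a notebook
%matplotlib inline
df = pd.read_csv('btc-market-price.csv',
                 header=0,                 # the first line holds the column names
                 na_values=['', '?', '-'], # treat these strings as missing values
                 dtype={'Price': 'float'}, # force the Price column to float
                 parse_dates=[0],          # parse the first column as dates
                 index_col=[0])            # and use it as the index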
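Here is a small sketch of what those parameters do, using an in-memory string in place of the real file. The three data rows are invented for illustration, and raw and sample_df are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical sample standing in for btc-market-price.csv.
raw = io.StringIO(
    'Timestamp,Price\n'
    '2017-04-02,1099.17\n'
    '2017-04-03,?\n'
    '2017-04-04,1141.81\n'
)

sample_df = pd.read_csv(
    raw,
    header=0,                  # first line holds the column names
    na_values=['', '?', '-'],  # these strings become NaN
    dtype={'Price': 'float'},  # Price is read as float
    parse_dates=[0],           # first column is parsed as dates
    index_col=[0],             # and becomes the index
)

missing_prices = int(sample_df['Price'].isna().sum())
```

The '?' in the second row ends up as NaN, and the index comes back as a DatetimeIndex rather than plain strings.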
Basic code:
df3 = pd.read_csv('btc-eth-prices-original.csv')
yn = df3.dropna(how='all', axis=1)  # drop columns where all values are null
yn = df3.dropna(how='any', axis=1)  # drop columns where any value is null
yn = df3.dropna(how='all', axis=0)  # drop rows where all values are null
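The difference between how='all' and how='any', and between axis=0 and axis=1, is easier to see on a tiny table. This is a sketch with a made-up DataFrame (toy and its column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 is entirely null, and so is the 'Empty' column.
toy = pd.DataFrame({
    'Bitcoin': [1.0, np.nan, 3.0],
    'Ether': [np.nan, np.nan, 6.0],
    'Empty': [np.nan, np.nan, np.nan],
})

no_empty_cols = toy.dropna(how='all', axis=1)  # removes only 'Empty'
complete_cols = toy.dropna(how='any', axis=1)  # every column has a null, so none survive
no_empty_rows = toy.dropna(how='all', axis=0)  # removes only row 1
```

Note that dropna() returns a new DataFrame each time; toy itself is left unchanged.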
How to drop an unnamed column?
df3 = df3.drop(['Unnamed: 0'], axis=1)  # drop() returns a new DataFrame, so assign the result
How to set a decimal indicator?
exam_df = pd.read_csv('exam_review.csv',
                      sep='>',
                      decimal=',')
How to check the data types of particular columns?
exam_df[['math_score', 'french_score']].dtypes
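To see why the decimal parameter matters, here is a sketch that reads the same text twice, once without and once with decimal=','. The two rows of data are invented, and as_text and as_float are hypothetical names; only the column names come from the example above.

```python
import io

import pandas as pd

# Invented rows using '>' as separator and ',' as decimal mark.
raw = (
    'first_name>last_name>math_score>french_score\n'
    'Ray>Morley>6,5>7,25\n'
    'Melvin>Scott>8>5,5\n'
)

# Without decimal=',' the value '6,5' cannot be parsed as a number,
# so the scores stay as text.
as_text = pd.read_csv(io.StringIO(raw), sep='>')

# With decimal=',' the scores are parsed as floats.
as_float = pd.read_csv(io.StringIO(raw), sep='>', decimal=',')

score_dtypes = as_float[['math_score', 'french_score']].dtypes
```

Checking dtypes right after loading is a quick way to catch a wrong decimal setting before any arithmetic goes wrong.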
How to set a thousands separator?
pd.read_csv('exam_review.csv',
            sep='>',
            thousands=',')
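A quick sketch of the effect, again with an in-memory string. The city and population columns are hypothetical and are not part of exam_review.csv.

```python
import io

import pandas as pd

# Hypothetical data where ',' groups thousands.
raw = (
    'city>population\n'
    'Springfield>1,250,000\n'
    'Shelbyville>48,300\n'
)

# Without thousands=',' the numbers are kept as text like '1,250,000'.
plain = pd.read_csv(io.StringIO(raw), sep='>')

# With thousands=',' the separators are removed and the column is numeric.
parsed = pd.read_csv(io.StringIO(raw), sep='>', thousands=',')

total_population = int(parsed['population'].sum())
```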
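How to skip the first rows?
pd.read_csv('exam_review.csv',
            sep='>',
            skiprows=2)  # an integer skips that many lines from the top of the file
How to skip specific rows?
exam_df = pd.read_csv('exam_review.csv',
                      sep='>',
                      decimal=',',
                      skiprows=[1, 3])  # a list skips the lines at those 0-based positions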
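The two forms of skiprows behave quite differently, which this sketch tries to show on invented data (the names and ages are made up). With an integer, the header line itself can get skipped, so the next remaining line is used as the column names.

```python
import io

import pandas as pd

# Line 0 is the header, lines 1 to 4 are data rows (invented values).
raw = (
    'first_name>age\n'
    'Ray>22\n'
    'Melvin>23\n'
    'Amirah>21\n'
    'Gerard>24\n'
)

# skiprows=2 drops lines 0 and 1, so 'Melvin>23' becomes the header.
skip_first_two = pd.read_csv(io.StringIO(raw), sep='>', skiprows=2)

# skiprows=[1, 3] drops only the lines at positions 1 and 3,
# so the header is kept and Ray and Amirah are left out.
skip_listed = pd.read_csv(io.StringIO(raw), sep='>', skiprows=[1, 3])
```

So an integer is handy for junk lines above the header, while a list is the way to leave out individual data rows.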
How to skip blank lines and set values to NaN?
The skip_blank_lines parameter is set to True by default, so blank lines are skipped while we read files.
If we set this parameter to False, then every blank line will be loaded into the DataFrame as a row of NaN values.
pd.read_csv('exam_review.csv',
            sep='>',
            skip_blank_lines=False)
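Here is a minimal sketch of both settings on a string with one blank line in the middle (invented data; default_df and with_blanks are hypothetical names):

```python
import io

import pandas as pd

# Invented data with a blank line between the two records.
raw = (
    'first_name>age\n'
    'Ray>22\n'
    '\n'
    'Melvin>23\n'
)

# Default behaviour: the blank line is ignored.
default_df = pd.read_csv(io.StringIO(raw), sep='>')

# skip_blank_lines=False: the blank line becomes a row of NaN values.
with_blanks = pd.read_csv(io.StringIO(raw), sep='>', skip_blank_lines=False)

blank_row_is_all_nan = bool(with_blanks.iloc[1].isna().all())
```

One side effect worth knowing: because NaN is a float, the age column is no longer an integer column once the blank row is kept.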
How to load specific columns?
pd.read_csv('exam_review.csv',
            usecols=['first_name', 'last_name', 'age'],
            sep='>')
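How to convert a single-column DataFrame into a Series?
# read_csv's squeeze=True was deprecated in pandas 1.4 and removed in 2.0; call squeeze() on the result instead
exam_test_2 = pd.read_csv('exam_review.csv',
                          sep='>',
                          usecols=['last_name']).squeeze('columns')
type(exam_test_2)  # pandas.core.series.Series if it worked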
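To try column selection and the Series conversion without exam_review.csv, here is a sketch on invented rows. It assumes a pandas version where DataFrame.squeeze('columns') is the way to turn a one-column DataFrame into a Series; subset, one_col and as_series are hypothetical names.

```python
import io

import pandas as pd

# Invented rows with the column names used above.
raw = (
    'first_name>last_name>age\n'
    'Ray>Morley>22\n'
    'Melvin>Scott>23\n'
)

# usecols keeps only the listed columns.
subset = pd.read_csv(io.StringIO(raw), sep='>', usecols=['first_name', 'age'])

# Selecting a single column still gives a DataFrame...
one_col = pd.read_csv(io.StringIO(raw), sep='>', usecols=['last_name'])

# ...and squeeze('columns') collapses it into a Series.
as_series = one_col.squeeze('columns')
```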
## Save to CSV file
Finally, we can also save our DataFrame as a CSV file.
How to save a DataFrame to CSV?
exam_df.to_csv('out.csv')
exam_df.to_csv('out.csv',
               index=False)  # saves without the index column, which is common practice
Let's check that it worked by reading the file back:
pd.read_csv('out.csv')
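The round trip can also be checked in memory. This sketch relies on to_csv() returning the CSV text when no path is given, and it uses a made-up two-row DataFrame (frame and the other names are hypothetical). It shows where the 'Unnamed: 0' column mentioned earlier tends to come from: an index that was written to the file and then read back as a regular column.

```python
import io

import pandas as pd

# Made-up DataFrame for the round trip.
frame = pd.DataFrame({'first_name': ['Ray', 'Melvin'], 'age': [22, 23]})

# With no path, to_csv() returns the CSV text instead of writing a file.
text_with_index = frame.to_csv()
text_no_index = frame.to_csv(index=False)

# Reading back: the saved index shows up as an 'Unnamed: 0' column.
back_with_index = pd.read_csv(io.StringIO(text_with_index))
back_no_index = pd.read_csv(io.StringIO(text_no_index))
```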
How to set how many columns and rows are displayed:
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
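Display options are global, so it can be useful to check them or change them only temporarily. A small sketch using pd.get_option, pd.option_context and pd.reset_option (the variable names are hypothetical):

```python
import pandas as pd

# Set the limits globally, as above.
pd.set_option('display.max_columns', 85)
pd.set_option('display.max_rows', 85)
current_columns = pd.get_option('display.max_columns')

# option_context changes a setting only inside the with block.
with pd.option_context('display.max_rows', 5):
    rows_inside = pd.get_option('display.max_rows')
rows_after = pd.get_option('display.max_rows')

# Put both options back to the pandas defaults.
pd.reset_option('display.max_columns')
pd.reset_option('display.max_rows')
```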