Pandas Read Methods

Pandas provide various read methods for different text files, databases and specific data files. Below we can look at the most common read methods that do not require custom files

import pandas as pd

1. read_csv()

The most common way of reading data using pandas is the read_csv method. It simply takes a csv or text file delimited by commas. Notice that the file is in the working directory. You may need to specify full path.

csv_file = pd.read_csv('./sample_file.csv')
csv_file.head()

OUTPUT country capital 0 USA Washington DC 1 Canada Ottawa 2 Mexico Mexico City

2. read_table()

read_table provides the same functionality as read_csv and they can be used interchangeably. In the example below I use a tab delimiter.

table_file = pd.read_table('./tab_file.txt', delimiter='\t')
table_file.head()

OUTPUT company symbol 0 Facebook FB 1 Google GOOG 2 Microsoft MSFT

3. read_html

Pandas also allow us to read a table from html. In case there are several tables we can select individual tables by index. It does this by evaluating the table tags from the page.

html_data = pd.read_html('https://money.cnn.com/data/us_markets/', header=0)
html_data[0].head()

OUTPUT Gainers & Losers Price Change % Change 0 HCAHCA Healthcare Inc... 118.72 10.56 +9.76% 1 FLIRFLIR Systems Inc 56.13 3.72 +7.10% 2 GLWCorning Inc 31.91 2.08 +6.97% 3 RHIRobert Half Intern... 72.64 3.67 +5.32% 4 WMWaste Management I... 85.66 4.17 +5.12%

4. read_sql

If you have an established database connection, you can read data directly from the database using the read_sql method. To demonstrate this I use a local database connection to my stock data sqlite database.

import sqlite3

conn = sqlite3.connect('data.db')
query = "SELECT * FROM stockdata LIMIT 10"

stock_data = pd.read_sql(query , con=conn)
stock_data.head()

OUTPUT Date Open High Low Close Volume Ticker 0 2018:07:24 09:30:00 192.2900 192.460 192.2000 192.4500 244661.0 AAPL 1 2018:07:24 09:31:00 192.7599 192.830 192.1900 192.2447 255873.0 AAPL 2 2018:07:24 09:32:00 192.4700 192.760 192.4600 192.7600 102642.0 AAPL 3 2018:07:24 09:33:00 192.7000 192.805 192.4648 192.4900 121629.0 AAPL 4 2018:07:24 09:34:00 192.8228 192.950 192.7000 192.7000 119910.0 AAPL

5. read_excel

Finally, the last common read method is the read_excel method. Excel files can house multiple spreadsheets and can contain formular and advanced operations so they do require some argument specification.

excel_file = pd.read_excel('./sample_excel.xlsx', sheet_name='Sheet1')
excel_file.head()

OUTPUT city county state 0 New Orleans Orleans LA 1 Brighton Livingston MI 2 Bridgeport Gloucester NJ 3 Anchorage Anchorage AK 4 Hamilton Butler OH