
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr3253 :: Pandas Intro

Enigma introduces one of his favorite Python modules, pandas


Hosted by Enigma on 2021-01-20 is flagged as Clean and is released under a CC-BY-SA license.
Tags: python, data analytics, data science.

Part of the series: A Little Bit of Python

Initially based on the podcast "A Little Bit of Python", by Michael Foord, Andrew Kuchling, Steve Holden, Dr. Brett Cannon and Jesse Noller. http://www.voidspace.org.uk/python/weblog/arch_d7_2009_12_19.shtml#e1138

Now the series is open to all.

Welcome to another episode of HPR. I'm your host, Enigma, and today we are going to be talking about one of my favorite Python modules: pandas.
This will be the first episode in a series I'm naming "For The Love of Python".

First we need to get the module:
pip install pandas (or pip3 install pandas)
This will install numpy as well.
Pandas uses an object called a DataFrame, which is a two-dimensional data structure,
i.e., data is aligned in a tabular fashion in rows and columns. Think of it as a spreadsheet-like object in memory.
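To make the spreadsheet comparison concrete, here is a minimal sketch that builds a small DataFrame by hand. The column names and values are made up for illustration and are not from the episode.

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column,
# each list position becomes a row.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [30, 25, 41],
    }
)

print(df)        # the whole table, with a row index on the left
print(df.shape)  # (number of rows, number of columns)
```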

Today we are going to talk about:
1) Importing data from various sources
CSV, Excel, SQL. More advanced topics like JSON will be covered in another episode.
import pandas as pd
df = pd.read_csv('file name')
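As a rough sketch of the CSV and SQL cases, the example below reads the same small table from CSV text and from an in-memory SQLite database. The table name, column names, and data are hypothetical stand-ins; with real data you would pass a file path to pd.read_csv and a connection to your own database.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV content; normally a file path goes to pd.read_csv.
csv_text = "name,office,sales\nAlice,NYC-01,10\nBob,LON-02,0\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Hypothetical in-memory SQLite database standing in for a real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, office TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("Alice", "NYC-01", 10), ("Bob", "LON-02", 0)],
)
conn.commit()
df_sql = pd.read_sql_query("SELECT * FROM staff", conn)
conn.close()

# Excel files are read in a similar way with pd.read_excel("file.xlsx"),
# which needs an extra engine package such as openpyxl installed.
print(df_csv.shape)
print(df_sql.shape)
```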

2) Accessing data by column names or positionally
print(df.head(5)) # print all columns, first 5 rows only
print(df.tail(5)) # print all columns, last 5 rows only
print(df.shape) # print the number of rows and columns in the dataframe
print(df.columns) # print column names
print(df.iloc[:, 0:2].head(5)) # print first two columns, first 5 values, by column position
print(df['field1'].head(5)) # print one column, first 5 values, by column name
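The difference between selecting by name and by position is easy to trip over, so here is a small sketch on made-up data (the field1, field2, and field3 columns are hypothetical). Note that a plain slice such as df[0:1] selects rows, not columns; positional column access goes through .iloc.

```python
import pandas as pd

# Hypothetical data with three columns and six rows.
df = pd.DataFrame(
    {
        "field1": [1, 2, 3, 4, 5, 6],
        "field2": list("abcdef"),
        "field3": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    }
)

by_name = df["field1"].head(5)          # one column by name (a Series)
two_by_name = df[["field1", "field2"]]  # several columns by name (a DataFrame)
by_position = df.iloc[:, 0:2].head(5)   # first two columns by position
rows = df[0:1]                          # a plain slice picks rows: here, the first row

print(by_position)
print(rows)
```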

3) Setting column types.
df['FieldName'] = df['FieldName'].astype(int) # sets column to integer
df['FieldName'] = df['FieldName'].astype(str) # sets column to string
df['DateColumn'] = pd.to_datetime(df['DateColumn']) # sets column to Datetime
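Here is a minimal sketch of those conversions on made-up data, checking the resulting dtypes. The last part goes slightly beyond the episode: astype(int) raises an error on values it cannot parse, and pd.to_numeric with errors="coerce" is one way to turn such values into NaN instead.

```python
import pandas as pd

# Hypothetical data where everything arrives as text.
df = pd.DataFrame(
    {
        "FieldName": ["1", "2", "3"],
        "DateColumn": ["2021-01-18", "2021-01-19", "2021-01-20"],
    }
)

df["FieldName"] = df["FieldName"].astype(int)        # text to integer
df["DateColumn"] = pd.to_datetime(df["DateColumn"])  # text to datetime
print(df.dtypes)

# Not from the episode: unparseable values become NaN rather than raising.
messy = pd.Series(["1", "two", "3"])
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned)
```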


4) Some basic filtering/manipulation of data.
The first line splits the Email string at the @, with at most one split; the next two lines create two columns from the pieces.
new = df2["Email"].str.split("@", n=1, expand=True)
df2["user"] = new[0]
df2["domain"] = new[1]

df['col'] = df['Office'].str[:3] # creates a new column by grabbing the first 3 characters of the Office column
df = df[df['FieldName'] != 0] # Only keep rows that have a FieldName value not equal to zero
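Putting the steps from this section together, the sketch below runs them on a tiny made-up table. The email addresses, office codes, and values are hypothetical; the column names follow the snippets above.

```python
import pandas as pd

# Hypothetical data using the column names from the snippets above.
df2 = pd.DataFrame(
    {
        "Email": ["alice@example.com", "bob@example.org"],
        "Office": ["NYC-01", "LON-02"],
        "FieldName": [10, 0],
    }
)

# Split each email at the first "@" into two new columns.
parts = df2["Email"].str.split("@", n=1, expand=True)
df2["user"] = parts[0]
df2["domain"] = parts[1]

# Take the first three characters of the Office column.
df2["col"] = df2["Office"].str[:3]

# Keep only the rows where FieldName is not zero.
df2 = df2[df2["FieldName"] != 0].copy()

print(df2)
```

The .copy() after filtering is optional here; it gives an independent DataFrame, which avoids pandas warnings if you assign new columns to the filtered result later.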

See example code that you can run at:
Pandas Working example


Comments


Comment #1 posted on 2021-01-20T01:14:53Z by b-yeezi

New info, even for me

I've been using Pandas and Numpy for years, and didn't know about np.select (from your code example). That's definitely going to come in handy.
