Find and Replace in a CSV using Python

Find and Replace in a CSV using Python

Below is a quick tutorial on using a type of “find and replace” across a CSV file or you could do this find and replace on a TXT file too. I’m not a developer, just another blogger, blogging about learning python. I’m building use cases that are similar to a typical business analyst or data analyst.

You get a CSV file or TEXT file and want to scrape over the entire file, and output it with your modifications. Find and replace is the term I would think you would google if you wanted to do a find and replace in python over a CSV.

Here’s a typical CSV file. Delimited by a comma.

Name,Number,Rank,Website,Flag,Cat
Tyler,43,1,https://tylergarrett.com,Yes,0
Jeff,36,2,https://tylergarrett.com,No,0
Isabelle,57,3,https://tylergarrett.com,Yes,0
Mury,84,4,https://tyler-garrett.com,No,0
Meow,96,6,https://tylergarrett.com,Yes,1
Cats,25,5,https://tylergarr3tt.com,no,1

→ Save file as input.csv

Python, without any downloaded libraries, will do a find and replace on the data above. You can copy and paste the data above into a .txt file, and change the extension to .csv — if you don’t have an excel software on your computer, or you can use Google sheets and make your own data problem.

Our goal is to clean the 3.

The 3 in the https://tylergarr3tt.com link because it’s not accurate.

For whatever reason this is going to be used by a developer and they are asking me to find and replace the “errors.” Errors in the sample data generate a use case to learn how to do a find and replace on a CSV file, which taught me that using the a previous “text file” tutorial, I was able to handle the same ETL like solution, with simple python code, and no odd libraries to be imported.

Our use case will generate a full “find and replace python solution” and a few more obvious data issues. We will remove “3” and replace it with “e” in python below, to help us move down a path of learning and solving your use case today.

The code to find and replace anything on a CSV using python

text = open("input.csv", "r")
text = ''.join([i for i in text]) \
    .replace("3", "e")
x = open("output.csv","w")
x.writelines(text)
x.close()

We are not using any “imports” because this is all native python capability.

Let’s see what the output looks like.

Name,Number,Rank,Website,Flag,Cat
Tyler,4e,1,https://tylergarrett.com,Yes,0
Jeff,e6,2,https://tylergarrett.com,No,0
Isabelle,57,e,https://tylergarrett.com,Yes,0
Mury,84,4,https://tyler-garrett.com,No,0
Meow,96,6,https://tylergarrett.com,Yes,1
Cats,25,5,https://tylergarrett.com,no,1

Successfully cleaning the “3” and also adding an “e” where our “3” used to be.

Look at line 2 & line 3 before:

Jeff,36,2,https://tylergarrett.com,No,0
Isabelle,57,3,https://tylergarrett.com,Yes,0

Look at line 2 & line 3 after:

Jeff,e6,2,https://tylergarrett.com,No,0
Isabelle,57,e,https://tylergarrett.com,Yes,0

The small portion of code:

text = ''.join([i for i in text]) \
    .replace("3", "e")

Allows you to “find and replace” using one line of code.

It looks like 2 lines of code because python lets you put a space and a “\” in front of a long line of code. The space + “\” lets me see what I’m doing easier that writing one wrapping line of code.

An example that may require a “ \” in front of your code, looks something like…because it’s easier to read.

Simple python transformation.

Above image of code is an exaggeration to show you that python lets you break up your find and replace or you can do it in one line. I do it because I would go absolutely crazy if it wasn’t for that little feature, so if you’re like me, woop woop. This is a big help, otherwise

→write it like this…

text = ''.join([i for i in text]).replace("3", "e")

Above code lets us quickly replace “3”’s with “e”’s and lets us heal our link column in the csv file. Find and replace may require a more granular approach which will be covered in another lesson.

Final code to find and replace text in a CSV with python

Find and replace text or letters in a csv with python is simple, super fast, faster than any SQL stuff I’ve played with in the past, so — if you need to bulk process a CSV or TEXT file — python is a great direction because it’s easy to understand.

text = open("input.csv", "r")
text = ''.join([i for i in text]).replace("3", "e")
x = open("output.csv","w")
x.writelines(text)
x.close()

‘’.join is cool because it allows you to add anything, to each row of data.

Notice how…

text = '1'.join([i for i in text]) \

Adds a “1” to each row of data…

Name,Number,Rank,Website,Flag,Cat
1Tyler,4e,1,https://tylergarrett.com,Yes,0
1Jeff,e6,2,https://tylergarrett.com,No,0
1Isabelle,57,e,https://tylergarrett.com,Yes,0
1Mury,84,4,https://tyler-garrett.com,No,0
1Meow,96,6,https://tylergarrett.com,Yes,1
1Cats,25,5,https://tylergarrett.com,no,1

How exciting…

text = open("input.csv", "r")
text = '1.......+'.join([i for i in text]) \
    .replace("3", "e") \
    .replace('+','+\n')
x = open("output.csv","w")
x.writelines(text)
x.close()

Generates…

Name,Number,Rank,Website,Flag,Cat
1.......
Tyler,4e,1,https://tylergarrett.com,Yes,0
1.......
Jeff,e6,2,https://tylergarrett.com,No,0
1.......
Isabelle,57,e,https://tylergarrett.com,Yes,0
1.......
Mury,84,4,https://tyler-garrett.com,No,0
1.......
Meow,96,6,https://tylergarrett.com,Yes,1
1.......
Cats,25,5,https://tylergarrett.com,no,1

Find and replace in python, using the basics. You can use a .txt file too, for both input and output.

Later you will learn split() is a lot of fun to use.

By Tyler Garrett

Built a Tableau Consulting thing and now i do other stuff.

Finding Correlations in Public Data Sources

Finding Correlations in Public Data Sources

Can you choose different public data sources quickly, efficiently, and use that data for data science too? Finding correlations in public data sources used to be a lot of work to wrangle.

What I said above, for most, would require multiple people.

  1.       Data architect – helps you architect your data
  2.       Database admin – helps you access your data
  3.       Data scientist – helps you with predictive analytics
  4.       Developer – migrates data science models into code
  5.       BI Developer – helps you build a dashboard
Finding Correlations in Public Data Sources is like finding the right VM in silicon valley.

Finding Correlations in Public Data Sources is like finding the right VM in Silicon Valley.

In the time it would take for me to email all five people above, I can do all the work in Alteryx, and that’s why I love working at Alteryx.

We intend to find correlations in public data sources.

Finding correlations in public data sources is a lot like finding a pattern on your wallpaper.

Our prediction is that ethnicity, deaths, births, birth fertility, homelessness, and dozens of more measures will offer insights into average survey scores.

public data sources databases wrangled

Finding data sources online can take time, below we offer a few great resources. Artwork found here.

The more our measures correlate, the better we can predict our future.

Remember before reading the findings that statistics are opinionated. Correlation does not mean causation.

Noticing trends in our data sources.

Here’s a scatter plot displaying HCA survey average scores per state and Total Homelessness in 2014. The points on the graph are individual states. You will notice a negative trend sloping down as the survey scores increase, the amount of homeless people decreases too.

Tyler Garrett photo + design Finding Correlations in Public Data Sources

Finding correlations in public data sources on a laptop is possible.

Scoring correlations in our data sources.

If we score the correlation between the two values, across every state, it would enable us to determine what metrics follow a similar pattern.

With scores, we can understand what correlates the most, or does not correlate to the hospital survey scores.

There are hundreds of hospitals per state. Here’s the top ten bar chart, showing how many hospitals per state are being averaged.

Mastering Multiple Public Data Sources

If you want to bring multiple data sources together, you need to be very good at SQL, data manipulation, data architecture, and a lot of spare time in a spreadsheet if this sounds like new skills. Finding correlations is another beast.

Finding a data scientist in your office may be tricky if they are busy or not available, which generates a deep dive in learning R, Python, Statistics, or Excel martial arts.

Luckily, we Alteryx our data, which means we can do all the complex SQL, data manipulation, data architecture, and pipe it through a data science model, in the same window.

What would have taken my group multiple weeks of constant work, only took us 1.5 hours!

Looking for a public data source?

Know that picking a public data source has become readily available in the past few years, and being able to quickly identify powerful data sources has become a distinguishable skill to master.

oh shit using a lot of data sources is hard

Oh shit!

If you are interested in picking up this skill, read our solution below.

I stumbled on this request in a team project last week!

“Everyone, pick multiple data sources…”

This can sound like an easy request…

  1.       Where do you get your data?
  2.       What data is accurate?
  3.       What source of data can we trust?
  4.       What decisions are we attempting to make?

We want to find data sources that will add value to our audience, their questions will have answers, and their next steps will be clear.

In this blog, you will learn how…

You will learn how our team quickly accomplished our project using one simple whiteboarding methodology.

You’re probably thinking, ‘Yes, of course, I can pick a data source, I’m passionate about x, y, z.’

The more you grow in the data industry, the more you realize every new data source comes with new hurdles or barriers to entry.

Is the data clean? Is the data usable? Do we have access to the data?

You begin applying padding around your work, based on previous experiences, and avoid painful experiences.

If you choose to read this entirely, you will be armed with a solution to any brainstorming meeting, and you will learn where we found 3 different data sources online.

That’s right, I’m talking about your future in data.

Eventually, you’re going to hit that ten-year threshold. A decade of database-related solutions.

When you hit ten years, you start to become scared of ‘lots of disparate data sources,’ and learn to avoid these workloads.

When your meeting is going great… Get ready for that meeting to slow down.

Someone is around the corner to suggest another table of data…

The data source will put the brakes on your progress. A new data source that has never been cleaned, prepped, approved… Major ‘nope’ in my line of work…

You were asked, “What data source do you want to use?”

Feel free to play along, mentally put yourself in a group setting, and give yourself a few minutes to pick multiple data sources to present in front of a class of 40 professional peers.

You will be building decisions out of these data sources. Graphs, pie charts, forecasting, insights, etc.

Lastly, work as a team to generate a data solution surrounding these data sources. While you’re at it, don’t get too involved in figuring out what answers you will gain from your data source.

We were tasked to generate a project to kick-start our success stories. In this process, it seemed really challenging to decide on what to choose.

It was interesting because even though we were all eager to get started… We struggled to find the right granularity.

You have access to making documents aka superpowers!

Data lakes, big data, data warehouses, … Each row of data offers a granularity.

Are we looking for an address? Zip code? State? Country?

Stating different parent and child relationships are all great and good. But until these words hit a piece of paper, Google Slides, or a whiteboard… We are left remembering what was said and making decisions based on our memory.

A zip code would be a more granular view of your data than State…

But without documenting our decisions, how do we collectively come together to suggest a path forward?

Also, why am I talking about documents?

Well – documentation is really helpful and it’s tangible information you can point at and say, “LOOK”… and in return, you will likely receive feedback.

Finding multiple data sources in any single industry can feel like a daunting task.

When you’re actively finding multiple public data sources – there are a lot of options.

Noticed we just said “public data sources”…

Yes, as opposed to a private data source! If you’re new in the data industry, let me dive in.

  1.       Public data source – safe to download, safe to share, available for everyone.
  2.       Private data source – not safe to download, not safe to share, not available for everyone.

Here are a few up and coming data source providers. These are public data sets and available to everyone.

  1.      www.kaggle.com
  2.      www.data.world
  3.      www.data.gov

Remember – public data sources are like a Wiki.

Here’s my ‘buyer beware.’ Or maybe we should say ‘extracter beware.’

Anyone can add a data source, anyone can change the data source, and Wiki’s are public – anyone can make an edit. That’s why teachers recommend we avoid using them in scholarly reports because some random person likely added that information on the Wiki.

How did our team pick a data source?

Originally, we struggled to find a data source because we were not documenting our ideas.

Our ideas were lost as soon as we said “anything” out loud.

We would say a great idea, agree it was a great idea, but failed to generate next steps.

What made a difference was documenting our brainstorming session on the dry erase board.

Having everything on the whiteboard allowed everyone to see what was being said.

Not sure how to get started whiteboarding?

Whiteboarding is usually as simple as making a word cloud and circling good items and drawing a line through bad items.

Learn more about advanced analytics on my other blogs.
For example, on my LinkedIn, I write about spatial filtering Google analytics data.
https://www.linkedin.com/pulse/oh-boy-i-wrong-time-spatial-filtering-tyler-garrett/

 

 

When the Internet turns off.

When the Internet turns off.

When the internet turns off, we are left to wait. When the internet turns off, it’s time to get analog again.

Off goes the light. When the Internet turns off, everything stops.

No more online meetings, no more emails, no more tweets, no grams, no messaging application, and online progress halts.

On the other end, the world is spinning, on my end, we are on hold.

Waiting for the Internet to turn on, my single point of failure.

Andoni points to the sky as a plane goes over, I mentioned to him, “Daddy was in an Airplane last week, Airplane. Airplane.”

Tyler Garrett holding Andoni Garrett outside pointing into the sky

Tyler Garrett holding Andoni Garrett outside pointing into the sky at an airplane.

 

You want to use an API and you’re not a developer? Writing word spiders.

You want to use an API and you’re not a developer? Writing word spiders.

You’re cute, you want to use an API and you’re not a developer.

Oh no…  said Mr.Koolaid. “You can’t use an API. Unless you’re a developer.”

Sure, you can fire a URL off to an API but in return, are you able to make sense of the XML data or JSON data formats?

I found a user-friendly API that will help you understand more about API’s.

Today, I was building a workflow to use an API.

I’m calling it a “word spider.”

I’m using a user-friendly API called Sentic, in Alteryx Designer.

I use the Download Tool to easily use an API.

If you’re not a developer, API stands for the hard part you don’t touch, the weird programmer thing…

That thing your friend Cody does but you know… he’s really good, and can’t teach you anything important.

Well, paying for Cody is all great and good but eventually, you’re going to want to change the custom coded solution. None of these wizards want to train a muggle.

Eventually, you’re going to want to support it, fix it, updated it, and that’s just to keep it working.

You’re throwing money down the drain.

Starting with an easy API will help you understand API’s

I found an API that allows me to input words in a URL, just one word, or two words, a phrase, and it will output synonyms.

The full set of conceptual features associated with the concept meet friend in English, for example, can be retrieved with the command http://sentic.net/api/en/concept/meet_friend

Click if you want to see the screenshot above. It takes a little time to load. Most API’s are must faster but this particular API is easy to explain!

Sentiment Scoring – The output comes with a few other ways to see if the words are more negative or positive.

Sentiment Scoring used to be an advanced purchase, now I’m building the entire data solution, in a few hours.

If you have words, write words, read words,… You’re impacted by sentiment analysis.

Government agencies, media companies, and just about every company you can think of has paid for some sort of sentiment analysis of their data.

It has been so complex to do sentiment scoring, people have paid millions for years.

I’m now doing it in minutes and it’s my own custom solution.

A gathering of peoples intelligent works.

So, the API takes words or phrases…

It returns similar words. Word, words, phrases… The API is hungry for your URL input.

What we know.

  • Synonyms are words…
  • It gives us synonyms in response.
  • 1 word = 5 responses.
  • Our word spider merely needs to eat more words.
  • Loop the output into the input.

Right?


So, what if I could loop the data spider output into the input? A loop at Hotel Irvine.

Why not send the words back to the API?

It was easy to exponentially grow the API requests. 2 turned into 600+…

A little slow though.

The API connection we just built is not fast.

They offer the data in XML.

XML is a kind of file format, slash programming language, slash advanced if you aren’t a developer or a bit of a technology nerd.

It has code in the file and it’s a data table but not what you’re used to seeing in excel or a CSV.

I will offer the data in a few formats soon and credit the authors.

Why?

Because you can score and of the words, you’re using in your emails or text messages before sending…

How cool would that be?

The goal, of course, is to to make sure you don’t sound dumb.

www.Sentic.net/API offers an easy API for everyone.

Essentially the Sentic API lets you send it a URL and it outputs XML code, you can transform the XML code into a relational database table, and output it to a report, dashboard, visualization tool, etc…

What’s also interesting is I’ve been ripping apart sentences with “Regular Expression,” and transposing the words vertically.

Alteryx Designers join speeds are so fast I’m able to do string=string matches, with split-second speeds.

It opens a massive door for deeper insights and less work/cost to get there.

I wonder what people use to build applications with the voice to text capability…

Why?

That’s the next step to start another idea.

It would be nice to have mini sentiment scoring capabilities and generate a type of sentiment dashboard for words, emails, or copy on a website.

Automated…

– Tyler 4/24

Kicking back off, 4/27. I was able to generate the above quote. Before I start explaining everything, let me start with one thing.

Smart people don’t care about data prep.

It’s cool to finally be able to understand Ph.D. presentations. I found some Harvard data, lots of different sentiment related data, one sentiment analysis data set that was generated by hand, and also a data set that I’ve transposed into a bar chart scoring tool to capture the overall style, mood, and feelings being portrayed.

Some of the sources offer different styles of models. Modal is like, am I coming across as absolute and definitive? Or am I coming off as off I’m more moderate and general? One source offers a ‘weak score’ too.

All together I’ve curated my own sentiment analysis analytics application.

I’m taking comments from a meeting and transposing each word down one column. Lowercase to assume the ETL position and boom – off to the join, peel out any similar values.

Now we have several different scoring devices. It works really well.

Yes, now you and I have several different sentiment scoring devices.

Here are my 2 cents about sentiment scoring.

What’s interesting is everyone is recreating the wheel. People are building new sentiment dictionaries every day.

Scanning 5 or 10 years ago offers gigs of sentiment analysis data and very little thought leadership around the topic.

Everyone stores their work, output, etc… on outdated websites, with usually very little insights into what they found, the data is generally extremely dirty and shaped in a way that would make excel flustered with building a data visualization.

My next step is to take emails from sales reps who are winning the most deals and see if we can find any interesting outliers.

Looking for things straying from the pack can be easy if you know how to look.

But it’s a lot easier if you know SQL or have Alteryx.

A new post on a new domain.

A new post on a new domain.

Hello readers, yay a blog post. A new post on a new domain.

Welcome to TylerGarrett.com and thanks for reading my first new post on a new domain. TylerGarrett.com is a free resource and a personal blog founded by Tyler.

A new post usually consists of a little context about what you want your blog to be about.

I believe a few topics will suffice.

  1. Technology Demos and walkthroughs.
  2. Growth Hacks and safe SEO tips.
  3. Photography and free stock images.
  4. Music, Mixes, Remixes, etc.
  5. Using data and data visualization to discuss topics.
  6. Guest blogs.
  7. Interviews.

New posts and new domains moving forward…

My new posts will be about tech, photos, music, growth hacks, and data related to these topics.

My family saying hello from a blue bonnet field in Texas.

My family saying hello from a bluebonnet field in Texas.

Guest blogs will be a place where I will beg some genius to say a few words on a topic or two.

An interview will be a similar case. I’m going to bug people I meet while traveling to answer a few questions on the fly or hopefully people will use the website to sign up for interviews and I will stop bugging strangers.

What’s your next new post going to be? [Nerd Alert]

I’ve been quietly journaling offline for the past few months, mostly technology related, as I’m learning Alteryx’s platform, and boy I’m excited about the future posts too.

Just last week I learned how to use APIs without programming and it wasn’t complex.

So hopefully I will be writing about API usage, a user-friendly guide to using APIs, and sharing the data I’m able to grab.

Next post ideas, I need an idea!

Getting idea’s for posts can come in a lot of fashions, like writing about a hotel you stayed in, taking photos, and giving it a rating.

Another idea is using an automated system to give you ideas.

Have you heard of content or idea generators?

You give it an idea. I used weird stuff as my idea.

  • How
  • Weird Stuff
  • Made Me
  • a Better Person

I bet you could write a story that fits the title.