Sometimes you’ll be reading an article listing industry people and you’ll want to follow a few of them (or all) on Twitter.
Or maybe you want to put together a super quick target list of social profiles for a marketing campaign.
Maybe you are playing a large scale game of Kiss, Marry, Kill that has got out of hand.
Either way, searching for each individual Twitter account sounds too much like hard work.
Enter Twitter scraping.
There are awesome tools out there for extracting data from web pages using Xpath (like Data Miner and Scraper, and Scrape Similar)
But you don’t need any of them.
Here’s what you’ll need:
- Google sheets
- A page to scrape
- 300ml of orange juice
Let’s get scraping.
Step 1 – Pick you target
Let’s say we wanted to get all the Twitter profiles of the people listed in this article of the 58 best SEO people to follow on Twitter.
Copy the URL and paste it into Cell A1 of your Google Sheet:
Step 2 – Understanding IMPORTXML
IMPORTXML lets you imports data from any of various structured data types such XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.
This is what it should look like:
This formula will look at the moon landing wiki page and pull any links (“//a/@href”) that are present on that page into your sheet.
In our case, we’ll only want to grab Twitter links so we’ll filter it to only look for these:
Step 3 – Grab data using IMPORTXML
You’ll want to make a copy of this IMPORTXML formula:
Paste it into Cell A3 and make sure the formula references wherever your URL you want to scrape is (A1).
Hit enter & you’ll see all the Twitter handles appear.
Step 4 – Make it look pretty
Just because you are working in a spreadsheet doesn’t mean your tools need to look ugly.
With just a bit of styling you can make something like this:
The great thing about building tools in Google Sheets is that if you set it up correctly to begin with, you can and use it again and again in the future
Step 5 – Enriching your data
If you want, you can take it a step further and enrich your freshly scraped Twitter handles.
To pull in the number of followers for each handle, just add this formula:
=QUERY(IMPORTXML(A1,”//span[@class=’ProfileNav-value’]/@data-count”),”limit 1 offset 2″)
Just remember to switch out A1 for whichever cell has your Twitter profile link in it.
Oh, and drink that juice.
PS, if you can’t be bothered to make your own sheet, feel free to make a copy of mine here.
This is a super basic tutorial and is only scratching the surface on what you can do with scraping.
Oh, just go steady with the scraping.
While it can be used as a legitimate way to access data on the web, it’s also important to consider the legal implications.
There may be cases where scraping data may be considered illegal. I am not a lawyer, just use some common sense.
Go forth and scrape (sensibly)