In the previous article, we scraped a TED talk’s transcript from TED.com.

Note: Why TED.com? As I always say, when you run a data science hobby project, you should always pick a topic that you are passionate about. But if you are excited about something else, after finishing these tutorial articles, feel free to find any project that you fancy!

initiatives/q p' |

And this was the result we got: (these are only the last lines of the transcript, of course; the whole talk is ~3,000 words long)

By the end of this article you won’t scrape only one but all 3,000+ TED talk transcripts. They will be downloaded to your server, extracted and cleaned, ready for data analysis.

You’ll extract the unique URLs from TED.com’s HTML code, for each and every TED talk.
You’ll clean and save these URLs into a list.
You’ll iterate through this list with a for loop and you’ll scrape each transcript one by one.
You’ll download, extract and clean this data by reusing the code we have already created in the previous episode of this tutorial.

So in one sentence: you will scale up our little web scraping project! We will get there soon… But before everything else, you’ll have to learn how for loops work in bash.

Note: if you know how for loops work, just skip this and jump to the next headline.

If you don’t want to iterate through 3,000+ web pages one by one manually, you’ll have to write a script that will do this for you automatically. And since this is a repetitive task, your best shot is to write a loop. I’ve already introduced bash while loops.

A for loop works simply. You have to define an iterable (which can be a list or a series of numbers, for instance). And then you’ll use your for loop to go through and execute one or more commands on each element of this iterable.

The simplest example iterates through the numbers between 1 and 100 and prints them to the screen one by one.

And how does it do that? Let’s see that line by line:

The first line is called the header of the for loop. It tells bash what you want to iterate through. In this specific case, it will be the numbers between 1 and 100. In each iteration, you’ll store the upcoming element of your list in the i variable. And with that, you’ll be able to refer to this element (and execute commands on it) in the “body” of your for loop. Note: the variable name doesn’t have to be i … It can be anything: f, g, my_variable or anything else…

The do line tells bash that here starts the body of your for loop.

In the body of the for loop, you’ll add the command(s) that you want to execute on each element of the list. In this case, it’s the simplest possible example: printing the variable to the screen.

Note: if you have worked with Python for loops before, you might recognize notable differences. For example, indentation is obligatory in Python, while in bash it’s optional. (It doesn’t make a difference, but we like to use indentation in bash, too, because the script is more readable that way.) On the other hand, in Python, you don’t need the do and done lines. Well, in every data language there are certain solutions for certain problems (e.g. how to indicate the beginning and the end of a loop’s body).
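As a minimal sketch of the bash for loop described in this article (a header, a do line, a body and a done line), the example below iterates through the numbers between 1 and 100 and prints each one. Some parts are assumptions made for illustration: the {1..100} brace expansion is one common way to write the range in bash, the function names print_numbers and print_items are made up so the loops can be reused, and the words first, second and third are placeholder list elements.

```shell
#!/usr/bin/env bash

# print_numbers is a hypothetical helper name used only in this sketch.
print_numbers() {
  # Header: i takes each value from 1 to 100, one value per iteration.
  # {1..100} is bash brace expansion (an assumed way to write the range).
  for i in {1..100}
  do
    # Body: the command(s) to run on each element; here, print it.
    echo "$i"
  done
}

# print_items is also hypothetical; it shows that the iterable can be a
# plain list and that the loop variable does not have to be called i.
print_items() {
  for my_variable in first second third
  do
    echo "$my_variable"
  done
}

print_numbers
print_items
```

The indentation inside do … done is optional in bash and is there only for readability. The same loop shape should carry over to a list of URLs, where the body would hold the download and cleaning commands instead of echo.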