
Commit 2ec727c

refactor: change image paths to use the shared folder

1 parent fd67efe · commit 2ec727c

24 files changed: +86 −86 lines changed
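The rewrite applied across these files is mechanical: every Markdown image link pointing at a local `images/` folder now points at the sibling course's shared folder instead. The commit doesn't say how the change was made, but a small script could produce it. A minimal sketch in Node.js (the helper name and constant are illustrative, not from the repo):

```javascript
// Rewrite Markdown image links so they point at the shared images folder,
// mirroring this commit: `./images/...` (or bare `images/...`) becomes
// `../scraping_basics/images/...`.
const SHARED_PREFIX = '../scraping_basics/images/';

function rewriteImagePaths(markdown) {
  // Matches the `](` that opens a Markdown link target, followed by an
  // `images/` path with or without a leading `./`.
  return markdown.replace(/\]\((?:\.\/)?images\//g, `](${SHARED_PREFIX}`);
}
```

Run over each `*.md` file under `scraping_basics_javascript2/` (e.g. with `fs.readFile`/`fs.writeFile`), this would produce the same kind of one-line replacements the diffs below show, assuming no other link targets in the lessons start with `images/`.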

sources/academy/webscraping/scraping_basics_javascript2/01_devtools_inspecting.md

Lines changed: 9 additions & 9 deletions

```diff
@@ -28,11 +28,11 @@ Google Chrome is currently the most popular browser, and many others use the sam
 
 Now let's peek behind the scenes of a real-world website—say, Wikipedia. We'll open Google Chrome and visit [wikipedia.org](https://www.wikipedia.org/). Then, let's press **F12**, or right-click anywhere on the page and select **Inspect**.
 
-![Wikipedia with Chrome DevTools open](./images/devtools-wikipedia.png)
+![Wikipedia with Chrome DevTools open](../scraping_basics/images/devtools-wikipedia.png)
 
 Websites are built with three main technologies: HTML, CSS, and JavaScript. In the **Elements** tab, DevTools shows the HTML and CSS of the current page:
 
-![Elements tab in Chrome DevTools](./images/devtools-elements-tab.png)
+![Elements tab in Chrome DevTools](../scraping_basics/images/devtools-elements-tab.png)
 
 :::warning Screen adaptations
 
@@ -62,17 +62,17 @@ While HTML and CSS describe what the browser should display, JavaScript adds int
 
 If you don't see it, press <kbd>ESC</kbd> to toggle the Console. Running commands in the Console lets us manipulate the loaded page—we’ll try this shortly.
 
-![Console in Chrome DevTools](./images/devtools-console.png)
+![Console in Chrome DevTools](../scraping_basics/images/devtools-console.png)
 
 ## Selecting an element
 
 In the top-left corner of DevTools, let's find the icon with an arrow pointing to a square.
 
-![Chrome DevTools element selection tool](./images/devtools-element-selection.png)
+![Chrome DevTools element selection tool](../scraping_basics/images/devtools-element-selection.png)
 
 We'll click the icon and hover your cursor over Wikipedia's subtitle, **The Free Encyclopedia**. As we move our cursor, DevTools will display information about the HTML element under it. We'll click on the subtitle. In the **Elements** tab, DevTools will highlight the HTML element that represents the subtitle.
 
-![Chrome DevTools element hover](./images/devtools-hover.png)
+![Chrome DevTools element hover](../scraping_basics/images/devtools-hover.png)
 
 The highlighted section should look something like this:
 
@@ -108,7 +108,7 @@ We won't be creating Node.js scrapers just yet. Let's first get familiar with wh
 
 In the **Elements** tab, with the subtitle element highlighted, let's right-click the element to open the context menu. There, we'll choose **Store as global variable**. The **Console** should appear, with a `temp1` variable ready.
 
-![Global variable in Chrome DevTools Console](./images/devtools-console-variable.png)
+![Global variable in Chrome DevTools Console](../scraping_basics/images/devtools-console-variable.png)
 
 The Console allows us to run code in the context of the loaded page. We can use it to play around with elements.
 
@@ -132,7 +132,7 @@ temp1.textContent = 'Hello World!';
 
 When we change elements in the Console, those changes reflect immediately on the page!
 
-![Changing textContent in Chrome DevTools Console](./images/devtools-console-textcontent.png)
+![Changing textContent in Chrome DevTools Console](../scraping_basics/images/devtools-console-textcontent.png)
 
 But don't worry—we haven't hacked Wikipedia. The change only happens in our browser. If we reload the page, the change will disappear. This, however, is an easy way to craft a screenshot with fake content. That's why screenshots shouldn't be trusted as evidence.
 
@@ -161,7 +161,7 @@ You're looking for an [`img`](https://developer.mozilla.org/en-US/docs/Web/HTML/
 1. Send the highlighted element to the **Console** using the **Store as global variable** option from the context menu.
 1. In the console, type `temp1.src` and hit **Enter**.
 
-![DevTools exercise result](./images/devtools-exercise-fifa.png)
+![DevTools exercise result](../scraping_basics/images/devtools-exercise-fifa.png)
 
 </details>
 
@@ -178,6 +178,6 @@ Open a news website, such as [CNN](https://cnn.com). Use the Console to change t
 1. Send the highlighted element to the **Console** using the **Store as global variable** option from the context menu.
 1. In the console, type `temp1.textContent = 'Something something'` and hit **Enter**.
 
-![DevTools exercise result](./images/devtools-exercise-cnn.png)
+![DevTools exercise result](../scraping_basics/images/devtools-exercise-cnn.png)
 
 </details>
```

sources/academy/webscraping/scraping_basics_javascript2/02_devtools_locating_elements.md

Lines changed: 10 additions & 10 deletions

```diff
@@ -30,17 +30,17 @@ That said, we designed all the additional exercises to work with live websites.
 
 As mentioned in the previous lesson, before building a scraper, we need to understand structure of the target page and identify the specific elements our program should extract. Let's figure out how to select details for each product on the [Sales page](https://warehouse-theme-metal.myshopify.com/collections/sales).
 
-![Warehouse store with DevTools open](./images/devtools-warehouse.png)
+![Warehouse store with DevTools open](../scraping_basics/images/devtools-warehouse.png)
 
 The page displays a grid of product cards, each showing a product's title and picture. Let's open DevTools and locate the title of the **Sony SACS9 Active Subwoofer**. We'll highlight it in the **Elements** tab by clicking on it.
 
-![Selecting an element with DevTools](./images/devtools-product-title.png)
+![Selecting an element with DevTools](../scraping_basics/images/devtools-product-title.png)
 
 Next, let's find all the elements containing details about this subwoofer—its price, number of reviews, image, and more.
 
 In the **Elements** tab, we'll move our cursor up from the `a` element containing the subwoofer's title. On the way, we'll hover over each element until we highlight the entire product card. Alternatively, we can use the arrow-up key. The `div` element we land on is the **parent element**, and all nested elements are its **child elements**.
 
-![Selecting an element with hover](./images/devtools-hover-product.png)
+![Selecting an element with hover](../scraping_basics/images/devtools-hover-product.png)
 
 At this stage, we could use the **Store as global variable** option to send the element to the **Console**. While helpful for manual inspection, this isn't something a program can do.
 
@@ -64,7 +64,7 @@ document.querySelector('.product-item');
 
 It will return the HTML element for the first product card in the listing:
 
-![Using querySelector() in DevTools Console](./images/devtools-queryselector.webp)
+![Using querySelector() in DevTools Console](../scraping_basics/images/devtools-queryselector.webp)
 
 CSS selectors can get quite complex, but the basics are enough to scrape most of the Warehouse store. Let's cover two simple types and how they can combine.
 
@@ -114,13 +114,13 @@ The product card has four classes: `product-item`, `product-item--vertical`, `1/
 
 This class is also unique enough in the page's context. If it were something generic like `item`, there would be a higher risk that developers of the website might use it for unrelated elements. In the **Elements** tab, we can see a parent element `product-list` that contains all the product cards marked as `product-item`. This structure aligns with the data we're after.
 
-![Overview of all the product cards in DevTools](./images/devtools-product-list.png)
+![Overview of all the product cards in DevTools](../scraping_basics/images/devtools-product-list.png)
 
 ## Locating all product cards
 
 In the **Console**, hovering our cursor over objects representing HTML elements highlights the corresponding elements on the page. This way we can verify that when we query `.product-item`, the result represents the JBL Flip speaker—the first product card in the list.
 
-![Highlighting a querySelector() result](./images/devtools-hover-queryselector.png)
+![Highlighting a querySelector() result](../scraping_basics/images/devtools-hover-queryselector.png)
 
 But what if we want to scrape details about the Sony subwoofer we inspected earlier? For that, we need a method that selects more than just the first match: [`querySelectorAll()`](https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelectorAll). As the name suggests, it takes a CSS selector string and returns all matching HTML elements. Let's type this into the **Console**:
 
@@ -132,7 +132,7 @@ The returned value is a [`NodeList`](https://developer.mozilla.org/en-US/docs/We
 
 We'll expand the result by clicking the small arrow, then hover our cursor over the third element in the list. Indexing starts at 0, so the third element is at index 2. There it is—the product card for the subwoofer!
 
-![Highlighting a querySelectorAll() result](./images/devtools-hover-queryselectorall.png)
+![Highlighting a querySelectorAll() result](../scraping_basics/images/devtools-hover-queryselectorall.png)
 
 To save the subwoofer in a variable for further inspection, we can use index access with brackets, just like with regular JavaScript arrays:
 
@@ -151,7 +151,7 @@ Even though we're just playing in the browser's **Console**, we're inching close
 
 On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use CSS selectors in the **Console** to list the HTML elements representing headings of the colored boxes (including the grey ones).
 
-![Wikipedia's Main Page headings](./images/devtools-exercise-wikipedia.png)
+![Wikipedia's Main Page headings](../scraping_basics/images/devtools-exercise-wikipedia.png)
 
 <details>
 <summary>Solution</summary>
@@ -169,7 +169,7 @@ On English Wikipedia's [Main Page](https://en.wikipedia.org/wiki/Main_Page), use
 
 Go to Shein's [Jewelry & Accessories](https://shein.com/RecommendSelection/Jewelry-Accessories-sc-017291431.html) category. In the **Console**, use CSS selectors to list all HTML elements representing the products.
 
-![Products in Shein's Jewelry & Accessories category](./images/devtools-exercise-shein.png)
+![Products in Shein's Jewelry & Accessories category](../scraping_basics/images/devtools-exercise-shein.png)
 
 <details>
 <summary>Solution</summary>
@@ -194,7 +194,7 @@ Learn about the [descendant combinator](https://developer.mozilla.org/en-US/docs
 
 :::
 
-![Articles on Guardian's page about F1](./images/devtools-exercise-guardian1.png)
+![Articles on Guardian's page about F1](../scraping_basics/images/devtools-exercise-guardian1.png)
 
 <details>
 <summary>Solution</summary>
```

sources/academy/webscraping/scraping_basics_javascript2/03_devtools_extracting_data.md

Lines changed: 7 additions & 7 deletions

```diff
@@ -31,15 +31,15 @@ subwoofer.textContent;
 
 That indeed outputs all the text, but in a form which would be hard to break down to relevant pieces.
 
-![Printing text content of the parent element](./images/devtools-extracting-text.png)
+![Printing text content of the parent element](../scraping_basics/images/devtools-extracting-text.png)
 
 We'll need to first locate relevant child elements and extract the data from each of them individually.
 
 ## Extracting title
 
 We'll use the **Elements** tab of DevTools to inspect all child elements of the product card for the Sony subwoofer. We can see that the title of the product is inside an `a` element with several classes. From those the `product-item__title` seems like a great choice to locate the element.
 
-![Finding child elements](./images/devtools-product-details.png)
+![Finding child elements](../scraping_basics/images/devtools-product-details.png)
 
 Browser JavaScript represents HTML elements as [Element](https://developer.mozilla.org/en-US/docs/Web/API/Element) objects. Among properties we've already played with, such as `textContent` or `outerHTML`, it also has the [`querySelector()`](https://developer.mozilla.org/en-US/docs/Web/API/Element/querySelector) method. Here the method looks for matches only within children of the element:
 
@@ -50,13 +50,13 @@ title.textContent;
 
 Notice we're calling `querySelector()` on the `subwoofer` variable, not `document`. And just like this, we've scraped our first piece of data! We've extracted the product title:
 
-![Extracting product title](./images/devtools-extracting-title.png)
+![Extracting product title](../scraping_basics/images/devtools-extracting-title.png)
 
 ## Extracting price
 
 To figure out how to get the price, we'll use the **Elements** tab of DevTools again. We notice there are two prices, a regular price and a sale price. For the purposes of watching prices we'll need the sale price. Both are `span` elements with the `price` class.
 
-![Finding child elements](./images/devtools-product-details.png)
+![Finding child elements](../scraping_basics/images/devtools-product-details.png)
 
 We could either rely on the fact that the sale price is likely to be always the one which is highlighted, or that it's always the first price. For now we'll rely on the later and we'll let `querySelector()` to simply return the first result:
 
@@ -67,7 +67,7 @@ price.textContent;
 
 It works, but the price isn't alone in the result. Before we'd use such data, we'd need to do some **data cleaning**:
 
-![Extracting product price](./images/devtools-extracting-price.png)
+![Extracting product price](../scraping_basics/images/devtools-extracting-price.png)
 
 But for now that's okay. We're just testing the waters now, so that we have an idea about what our scraper will need to do. Once we'll get to extracting prices in Node.js, we'll figure out how to get the values as numbers.
 
@@ -100,7 +100,7 @@ At IKEA's [Artificial plants & flowers listing](https://www.ikea.com/se/en/cat/a
 
 On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selectors and HTML element manipulation in the **Console** to extract the name of the top wiki. Use the [`trim()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/trim) method to remove white space around the name.
 
-![Fandom's Movies page](./images/devtools-exercise-fandom.png)
+![Fandom's Movies page](../scraping_basics/images/devtools-exercise-fandom.png)
 
 <details>
 <summary>Solution</summary>
@@ -119,7 +119,7 @@ On Fandom's [Movies page](https://www.fandom.com/topics/movies), use CSS selecto
 
 On the Guardian's [F1 news page](https://www.theguardian.com/sport/formulaone), use CSS selectors and HTML manipulation in the **Console** to extract details about the first post. Specifically, extract its title, lead paragraph, and URL of the associated photo.
 
-![F1 news page](./images/devtools-exercise-guardian2.png)
+![F1 news page](../scraping_basics/images/devtools-exercise-guardian2.png)
 
 <details>
 <summary>Solution</summary>
```

sources/academy/webscraping/scraping_basics_javascript2/05_parsing_html.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -14,7 +14,7 @@ import Exercises from '../scraping_basics/_exercises.mdx';
 
 From lessons about browser DevTools we know that the HTML elements representing individual products have a `class` attribute which, among other values, contains `product-item`.
 
-![Products have the ‘product-item’ class](./images/product-item.png)
+![Products have the ‘product-item’ class](../scraping_basics/images/product-item.png)
 
 As a first step, let's try counting how many products are on the listing page.
 
@@ -50,7 +50,7 @@ Being comfortable around installing Node.js packages is a prerequisite of this c
 
 Now let's import the package and use it for parsing the HTML. The `cheerio` module allows us to work with the HTML elements in a structured way. As a demonstration, we'll first get the `<h1>` element, which represents the main heading of the page.
 
-![Element of the main heading](./images/h1.png)
+![Element of the main heading](../scraping_basics/images/h1.png)
 
 We'll update our code to the following:
 
```

sources/academy/webscraping/scraping_basics_javascript2/06_locating_elements.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -64,7 +64,7 @@ To get details about each product in a structured way, we'll need a different ap
 
 As in the browser DevTools lessons, we need to change the code so that it locates child elements for each product card.
 
-![Product card's child elements](./images/child-elements.png)
+![Product card's child elements](../scraping_basics/images/child-elements.png)
 
 We should be looking for elements which have the `product-item__title` and `price` classes. We already know how that translates to CSS selectors:
 
```

sources/academy/webscraping/scraping_basics_javascript2/08_saving_data.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -178,7 +178,7 @@ await writeFile("products.csv", csvData);
 
 The program should now also produce a `data.csv` file. When browsing the directory on macOS, we can see a nice preview of the file's contents, which proves that the file is correct and that other programs can read it. If you're using a different operating system, try opening the file with any spreadsheet program you have.
 
-![CSV preview](images/csv.png)
+![CSV preview](../scraping_basics/images/csv.png)
 
 In the CSV format, if a value contains commas, we should enclose it in quotes. If it contains quotes, we should double them. When we open the file in a text editor of our choice, we can see that the library automatically handled this:
 
@@ -232,6 +232,6 @@ Open the `products.csv` file we created in the lesson using a spreadsheet applic
 1. Select the header row. Go to **Data > Create filter**.
 1. Use the filter icon that appears next to `minPrice`. Choose **Filter by condition**, select **Greater than**, and enter **500** in the text field. Confirm the dialog. You should see only the filtered data.
 
-![CSV in Google Sheets](images/csv-sheets.png)
+![CSV in Google Sheets](../scraping_basics/images/csv-sheets.png)
 
 </details>
```

sources/academy/webscraping/scraping_basics_javascript2/09_getting_links.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -205,15 +205,15 @@ The program is much easier to read now. With the `parseProduct()` function handy
 
 We turned the whole program upside down, and at the same time, we didn't make any actual changes! This is [refactoring](https://en.wikipedia.org/wiki/Code_refactoring): improving the structure of existing code without changing its behavior.
 
-![Refactoring](images/refactoring.gif)
+![Refactoring](../scraping_basics/images/refactoring.gif)
 
 :::
 
 ## Extracting links
 
 With everything in place, we can now start working on a scraper that also scrapes the product pages. For that, we'll need the links to those pages. Let's open the browser DevTools and remind ourselves of the structure of a single product item:
 
-![Product card's child elements](./images/child-elements.png)
+![Product card's child elements](../scraping_basics/images/child-elements.png)
 
 Several methods exist for transitioning from one page to another, but the most common is a link element, which looks like this:
 
```

sources/academy/webscraping/scraping_basics_javascript2/10_crawling.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -82,7 +82,7 @@ await writeFile('products.csv', await exportCSV(data));
 
 Each product URL points to a so-called _product detail page_, or PDP. If we open one of the product URLs in the browser, e.g. the one about [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), we can see that it contains a vendor name, [SKU](https://en.wikipedia.org/wiki/Stock_keeping_unit), number of reviews, product images, product variants, stock availability, description, and perhaps more.
 
-![Product detail page](./images/pdp.png)
+![Product detail page](../scraping_basics/images/pdp.png)
 
 Depending on what's valuable for our use case, we can now use the same techniques as in previous lessons to extract any of the above. As a demonstration, let's scrape the vendor name. In browser DevTools, we can see that the HTML around the vendor name has the following structure:
 
@@ -197,7 +197,7 @@ Scraping the vendor's name is nice, but the main reason we started checking the
 
 Looking at the [Sony XBR-950G BRAVIA](https://warehouse-theme-metal.myshopify.com/products/sony-xbr-65x950g-65-class-64-5-diag-bravia-4k-hdr-ultra-hd-tv), it's clear that the listing only shows min prices, because some products have variants, each with a different price. And different stock availability. And different SKUs…
 
-![Morpheus revealing the existence of product variants](images/variants.png)
+![Morpheus revealing the existence of product variants](../scraping_basics/images/variants.png)
 
 In the next lesson, we'll scrape the product detail pages so that each product variant is represented as a separate item in our dataset.
 
```

Comments (0)