34 changes: 31 additions & 3 deletions nginx.conf
@@ -306,11 +306,39 @@ server {

# Rename output schema to dataset schema
rewrite ^/platform/actors/development/actor-definition/output-schema$ /platform/actors/development/actor-definition/dataset-schema permanent;
-rewrite ^academy/deploying-your-code/output-schema$ /academy/deploying-your-code/dataset-schema permanent;
+rewrite ^/academy/deploying-your-code/output-schema$ /academy/deploying-your-code/dataset-schema permanent;

# Academy restructuring
-rewrite ^academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
-rewrite ^academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future
+rewrite ^/academy/advanced-web-scraping/scraping-paginated-sites$ /academy/advanced-web-scraping/crawling/crawling-with-search permanent;
+rewrite ^/academy/php$ /academy/php/use-apify-from-php redirect; # not permanent in case we want to reuse /php in the future
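The substance of the first two groups of changes is the added leading slash. nginx matches `rewrite` patterns against the normalized request URI, which always begins with `/`, so a pattern anchored at `^academy/...` can never match and the redirect silently never fires. A quick way to sanity-check such patterns outside nginx is to try them against a sample URI (a sketch in Python; these particular patterns behave the same under PCRE and Python's `re`):

```python
import re

# nginx matches rewrite regexes against the request URI,
# which always starts with "/".
uri = "/academy/php"

old_pattern = r"^academy/php$"   # as in the removed rule: never matches
new_pattern = r"^/academy/php$"  # as in the added rule: matches

print(bool(re.match(old_pattern, uri)))  # False
print(bool(re.match(new_pattern, uri)))  # True
```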

# Academy: replacing the 'Web Scraping for Beginners' course
rewrite ^/academy/web-scraping-for-beginners/best-practices$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/best-practices permanent;
rewrite ^/academy/web-scraping-for-beginners/introduction$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/introduction permanent;
rewrite ^/academy/web-scraping-for-beginners/challenge/initializing-and-setting-up$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/challenge/initializing-and-setting-up permanent;
rewrite ^/academy/web-scraping-for-beginners/challenge/modularity$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/challenge/modularity permanent;
rewrite ^/academy/web-scraping-for-beginners/challenge/scraping-amazon$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/challenge/scraping-amazon permanent;
rewrite ^/academy/web-scraping-for-beginners/challenge$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/challenge permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/exporting-data$ /academy/scraping-basics-javascript/framework?legacy-js-course=/legacy/crawling/exporting-data permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/filtering-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/legacy/crawling/filtering-links permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/finding-links$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/legacy/crawling/finding-links permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/first-crawl$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/legacy/crawling/first-crawl permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/headless-browser$ /academy/scraping-basics-javascript?legacy-js-course=/legacy/crawling/headless-browser permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/pro-scraping$ /academy/scraping-basics-javascript/framework?legacy-js-course=/legacy/crawling/pro-scraping permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/recap-extraction-basics$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/legacy/crawling/recap-extraction-basics permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/relative-urls$ /academy/scraping-basics-javascript/getting-links?legacy-js-course=/legacy/crawling/relative-urls permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling/scraping-the-data$ /academy/scraping-basics-javascript/scraping-variants?legacy-js-course=/legacy/crawling/scraping-the-data permanent;
rewrite ^/academy/web-scraping-for-beginners/crawling$ /academy/scraping-basics-javascript/crawling?legacy-js-course=/legacy/crawling permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/browser-devtools$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/legacy/data-extraction/browser-devtools permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/computer-preparation$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/legacy/data-extraction/computer-preparation permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/devtools-continued$ /academy/scraping-basics-javascript/devtools-extracting-data?legacy-js-course=/legacy/data-extraction/devtools-continued permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-continued$ /academy/scraping-basics-javascript/extracting-data?legacy-js-course=/legacy/data-extraction/node-continued permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/node-js-scraper$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/legacy/data-extraction/node-js-scraper permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/project-setup$ /academy/scraping-basics-javascript/downloading-html?legacy-js-course=/legacy/data-extraction/project-setup permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/save-to-csv$ /academy/scraping-basics-javascript/saving-data?legacy-js-course=/legacy/data-extraction/save-to-csv permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction/using-devtools$ /academy/scraping-basics-javascript/devtools-locating-elements?legacy-js-course=/legacy/data-extraction/using-devtools permanent;
rewrite ^/academy/web-scraping-for-beginners/data-extraction$ /academy/scraping-basics-javascript/devtools-inspecting?legacy-js-course=/legacy/data-extraction permanent;
rewrite ^/academy/web-scraping-for-beginners$ /academy/scraping-basics-javascript?legacy-js-course=/legacy permanent;
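One nginx detail worth keeping in mind with the rules above: when a `rewrite` replacement contains its own query arguments, nginx appends the original request arguments after them. Here that is likely desirable (e.g. UTM tags survive the redirect alongside `legacy-js-course`); if it ever were not, a trailing `?` on the replacement suppresses the appending. A sketch of the two variants, not part of this change:

```nginx
# Default behavior: original args are appended after legacy-js-course.
rewrite ^/academy/web-scraping-for-beginners$ /academy/scraping-basics-javascript?legacy-js-course=/legacy permanent;

# Hypothetical variant: the trailing "?" drops the original args instead.
rewrite ^/academy/web-scraping-for-beginners$ /academy/scraping-basics-javascript?legacy-js-course=/legacy? permanent;
```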

# Removed pages
# GPT plugins were discontinued April 9th, 2024 - https://help.openai.com/en/articles/8988022-winding-down-the-chatgpt-plugins-beta
2 changes: 1 addition & 1 deletion sources/academy/homepage_content.json
@@ -2,7 +2,7 @@
"Beginner courses": [
{
"title": "Web scraping basics for JavaScript devs",
"link": "/academy/web-scraping-for-beginners",
"link": "/academy/scraping-basics-javascript",
"description": "Learn how to use JavaScript to extract information from websites in this practical course, starting from the absolute basics.",
"imageUrl": "/img/academy/intro.svg"
},
@@ -41,7 +41,7 @@ Prior to moving forward, please read over these resources:

## Our task {#our-task}

-In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners/challenge) course's final challenge, so keep those files safe!
+In this task, we'll be building on top of what we already created in the [Web scraping basics for JavaScript devs](/academy/scraping-basics-javascript/legacy/challenge) course's final challenge, so keep those files safe!

Once our Amazon Actor has completed its run, we will, rather than sending an email to ourselves, call an Actor through a webhook. The Actor called will be a new Actor that we will create together, which will take the dataset ID as input, then subsequently filter through all of the results and return only the cheapest one for each product. All of the results of the Actor will be pushed to its default dataset.
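The filtering step described above can be sketched independently of the platform. The course builds this Actor in JavaScript on Apify; the sketch below shows only the core "cheapest offer per product" logic, and the `title`/`price` field names are stand-ins, not the dataset's actual schema:

```python
def cheapest_per_product(items):
    """Keep only the cheapest offer for each product.

    `items` is a list of dicts; "title" and "price" are hypothetical
    field names standing in for whatever the Amazon dataset uses.
    """
    cheapest = {}
    for item in items:
        key = item["title"]
        if key not in cheapest or item["price"] < cheapest[key]["price"]:
            cheapest[key] = item
    return list(cheapest.values())

offers = [
    {"title": "USB cable", "price": 9.99},
    {"title": "USB cable", "price": 7.49},
    {"title": "Keyboard", "price": 39.00},
]
# Keeps the 7.49 cable and the keyboard; the 9.99 duplicate is dropped.
print(cheapest_per_product(offers))
```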

@@ -5,7 +5,7 @@ sidebar_position: 1
slug: /advanced-web-scraping/crawling/sitemaps-vs-search
---

-The core crawling problem comes down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end as we did in the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners) course.
+The core crawling problem comes down to ensuring that we reliably find all detail pages on the target website or inside its categories. This is trivial for small sites. We just open the home page or category pages and paginate to the end.

Unfortunately, _most modern websites restrict pagination_ only to somewhere between 1 and 10,000 products. Solving this problem might seem relatively straightforward at first but there are multiple hurdles that we will explore in this lesson.

@@ -6,7 +6,7 @@ category: web scraping & automation
slug: /advanced-web-scraping
---

-In the [Web scraping basics for JavaScript devs](/academy/web-scraping-for-beginners) course, we learned the basics required to create a scraper. In the following courses, we learned more about specific practices and techniques that will help us solve most of the problems we will face.
+In the [Web scraping basics for JavaScript devs](/academy/scraping-basics-javascript) course, we learned the basics required to create a scraper. In the following courses, we learned more about specific practices and techniques that will help us solve most of the problems we will face.

In this course, we will take all of that knowledge, add a few more advanced concepts, and apply them to learn how to build a production-ready web scraper.

@@ -2,7 +2,7 @@
title: Inspecting web pages with browser DevTools
sidebar_label: "DevTools: Inspecting"
description: Lesson about using the browser tools for developers to inspect and manipulate the structure of a website.
-slug: /scraping-basics-javascript2/devtools-inspecting
+slug: /scraping-basics-javascript/devtools-inspecting
unlisted: true
---

@@ -2,7 +2,7 @@
title: Locating HTML elements on a web page with browser DevTools
sidebar_label: "DevTools: Locating HTML elements"
description: Lesson about using the browser tools for developers to manually find products on an e-commerce website.
-slug: /scraping-basics-javascript2/devtools-locating-elements
+slug: /scraping-basics-javascript/devtools-locating-elements
unlisted: true
---

@@ -2,7 +2,7 @@
title: Extracting data from a web page with browser DevTools
sidebar_label: "DevTools: Extracting data"
description: Lesson about using the browser tools for developers to manually extract product data from an e-commerce website.
-slug: /scraping-basics-javascript2/devtools-extracting-data
+slug: /scraping-basics-javascript/devtools-extracting-data
unlisted: true
---

@@ -2,7 +2,7 @@
title: Downloading HTML with Node.js
sidebar_label: Downloading HTML
description: Lesson about building a Node.js application for watching prices. Using the Fetch API to download HTML code of a product listing page.
-slug: /scraping-basics-javascript2/downloading-html
+slug: /scraping-basics-javascript/downloading-html
unlisted: true
---

@@ -2,7 +2,7 @@
title: Parsing HTML with Node.js
sidebar_label: Parsing HTML
description: Lesson about building a Node.js application for watching prices. Using the Cheerio library to parse HTML code of a product listing page.
-slug: /scraping-basics-javascript2/parsing-html
+slug: /scraping-basics-javascript/parsing-html
unlisted: true
---

@@ -2,7 +2,7 @@
title: Locating HTML elements with Node.js
sidebar_label: Locating HTML elements
description: Lesson about building a Node.js application for watching prices. Using the Cheerio library to locate products on the product listing page.
-slug: /scraping-basics-javascript2/locating-elements
+slug: /scraping-basics-javascript/locating-elements
unlisted: true
---

@@ -2,7 +2,7 @@
title: Extracting data from HTML with Node.js
sidebar_label: Extracting data from HTML
description: Lesson about building a Node.js application for watching prices. Using string manipulation to extract and clean data scraped from the product listing page.
-slug: /scraping-basics-javascript2/extracting-data
+slug: /scraping-basics-javascript/extracting-data
unlisted: true
---

@@ -2,7 +2,7 @@
title: Saving data with Node.js
sidebar_label: Saving data
description: Lesson about building a Node.js application for watching prices. Using the json2csv library to save data scraped from product listing pages in both JSON and CSV.
-slug: /scraping-basics-javascript2/saving-data
+slug: /scraping-basics-javascript/saving-data
unlisted: true
---

@@ -2,7 +2,7 @@
title: Getting links from HTML with Node.js
sidebar_label: Getting links from HTML
description: Lesson about building a Node.js application for watching prices. Using the Cheerio library to locate links to individual product pages.
-slug: /scraping-basics-javascript2/getting-links
+slug: /scraping-basics-javascript/getting-links
unlisted: true
---

@@ -2,7 +2,7 @@
title: Crawling websites with Node.js
sidebar_label: Crawling websites
description: Lesson about building a Node.js application for watching prices. Using the Fetch API to follow links to individual product pages.
-slug: /scraping-basics-javascript2/crawling
+slug: /scraping-basics-javascript/crawling
unlisted: true
---

@@ -2,7 +2,7 @@
title: Scraping product variants with Node.js
sidebar_label: Scraping product variants
description: Lesson about building a Node.js application for watching prices. Using browser DevTools to figure out how to extract product variants and exporting them as separate items.
-slug: /scraping-basics-javascript2/scraping-variants
+slug: /scraping-basics-javascript/scraping-variants
unlisted: true
---

@@ -2,7 +2,7 @@
title: Using a scraping framework with Node.js
sidebar_label: Using a framework
description: Lesson about building a Node.js application for watching prices. Using the Crawlee framework to simplify creating a scraper.
-slug: /scraping-basics-javascript2/framework
+slug: /scraping-basics-javascript/framework
unlisted: true
---

@@ -2,7 +2,7 @@
title: Using a scraping platform with Node.js
sidebar_label: Using a platform
description: Lesson about building a Node.js application for watching prices. Using the Apify platform to deploy a scraper.
-slug: /scraping-basics-javascript2/platform
+slug: /scraping-basics-javascript/platform
unlisted: true
---
