Releases: apify/crawlee-python
1.0.0
1.0.0 (2025-09-29)
🚀 Features
- Add utility for load and parse Sitemap and SitemapRequestLoader (#1169) (66599f8) by @Mantisus
- Add periodic status logging and status_message_callback parameter for customization (#1265) (b992fb2) by @Mantisus
- Add crawlee-cli option to skip project installation (#1294) (4d5aef0) by @Pijukatel
- Improve Crawlee CLI help text (#1297) (afbe10f) by @Pijukatel
- Add basic OpenTelemetry instrumentation (#1255) (a92d8b3) by @Pijukatel
- Add ImpitHttpClient http-client client using the impit library (#1151) (0d0d268) by @Mantisus
- Prevent overloading system memory when running locally (#1270) (30de3bd) by @janbuchar
- Expose PlaywrightPersistentBrowser class (#1314) (b5fa955) by @Mantisus
- Add impit option for Crawlee CLI (#1312) (508d7ce) by @Mantisus
- Persist RequestList state (#1274) (cc68014) by @janbuchar
- Persist DefaultRenderingTypePredictor state (#1340) (fad4c25) by @Mantisus
- Persist the SitemapRequestLoader state (#1347) (27ef9ad) by @Mantisus
- Add support for NDU storages (#1401) (5dbd212) by @vdusek
- Add RQ id, name, alias args to add_requests and enqueue_links methods (#1413) (1cae2bc) by @Mantisus
- Add SqlStorageClient based on sqlalchemy v2+ (#1339) (07c75a0) by @Mantisus
🐛 Bug Fixes
- Fix memory estimation not working on MacOS (#1330) (ab020eb) by @Pijukatel
- Fix retry count to not count the original request (#1328) (74fa1d9) by @Pijukatel
- [breaking] Remove unused "stats" field from RequestQueueMetadata (#1331) (0a63bef) by @vdusek
- Ignore unknown parameters passed in cookies (#1336) (50d3ef7) by @Mantisus
- Fix timeout for stream method in ImpitHttpClient (#1352) (54b693b) by @Mantisus
- Include reason in the session rotation warning logs (#1363) (d6d7a45) by @vdusek
- Improve crawler statistics logging (#1364) (1eb6da5) by @vdusek
- Do not add a request that is already in progress to MemoryRequestQueueClient (#1384) (3af326c) by @Mantisus
- Save RequestQueueState for FileSystemRequestQueueClient in default KVS (#1411) (6ee60a0) by @Mantisus
- Set default desired concurrency for non-browser crawlers to 10 (#1419) (1cc9401) by @vdusek
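The retry-count fix (#1328) is easiest to picture with a small, library-independent sketch. It assumes one reading of the entry: the original attempt is not counted as a retry, so a limit of 3 retries allows up to 4 attempts in total. The helper `run_with_retries` and the `flaky` task are hypothetical names for illustration and are not Crawlee code; the actual implementation may differ.

```python
def run_with_retries(task, max_retries):
    """Call task until it succeeds; the first call is not counted as a retry.

    Hypothetical helper: total attempts = 1 original + up to max_retries retries.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except RuntimeError:
            # attempts - 1 is the number of retries used so far.
            if attempts - 1 >= max_retries:
                raise


calls = []


def flaky():
    """Fail on the first three calls, then succeed (illustrative only)."""
    calls.append(1)
    if len(calls) < 4:
        raise RuntimeError("transient failure")
    return "ok"


value, attempts = run_with_retries(flaky, max_retries=3)
print(value, attempts)
```

With a limit of 3 retries, the task above succeeds on its fourth call; if the original attempt were counted against the limit, it would have failed after the third.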
Refactor
- [breaking] Introduce new storage client system (#1194) (de1c03f) by @vdusek
- [breaking] Split BrowserType literal into two different literals based on context (#1070) (72b5698) by @Pijukatel
- [breaking] Change method HttpResponse.read from sync to async (#1296) (83fa8a4) by @Mantisus
- [breaking] Replace HttpxHttpClient with ImpitHttpClient as default HTTP client (#1307) (c803a97) by @Mantisus
- [breaking] Change Dataset unwind parameter to accept list of strings (#1357) (862a203) by @vdusek
- [breaking] Remove Request.id field (#1366) (32f3580) by @Pijukatel
- [breaking] Refactor storage creation and caching, configuration and services (#1386) (04649bd) by @Pijukatel
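For the sync-to-async change of HttpResponse.read (#1296), the practical consequence for calling code is that the call must now be awaited. The sketch below uses a hypothetical stand-in class, `FakeResponse`, rather than Crawlee's real response type, so it only illustrates the shape of the migration under that assumption.

```python
import asyncio


class FakeResponse:
    """Hypothetical stand-in for a response whose read() is a coroutine."""

    def __init__(self, body: bytes) -> None:
        self._body = body

    async def read(self) -> bytes:
        return self._body


async def handler(response: FakeResponse) -> str:
    # With a synchronous read() a caller would write response.read();
    # with an async read() the call must be awaited instead.
    body = await response.read()
    return body.decode("utf-8")


text = asyncio.run(handler(FakeResponse(b"hello")))
print(text)
```

Forgetting the `await` would leave `body` bound to a coroutine object rather than bytes, which is the most likely symptom when upgrading handler code.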
0.6.12
0.6.12 (2025-07-30)
🚀 Features
🐛 Bug Fixes
- Use perf_counter_ns for request duration tracking (#1260) (9e92f6b) by @Pijukatel, closes #1256
- Fix memory estimation not working on MacOS (#1330) (8558954) by @Pijukatel, closes #1329
- Fix retry count to not count the original request (#1328) (1aff3aa) by @Pijukatel, closes #1326
- Ignore unknown parameters passed in cookies (#1336) (0f2610c) by @Mantisus, closes #1333
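The perf_counter_ns entry (#1260) refers to a standard-library clock, so the general pattern can be shown without Crawlee. This is a minimal sketch, not the library's internal code; the `timed` helper is a hypothetical name introduced here.

```python
import time
from datetime import timedelta


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed) using a monotonic nanosecond clock.

    Hypothetical helper for illustration; not part of Crawlee.
    """
    start_ns = time.perf_counter_ns()
    result = func(*args, **kwargs)
    elapsed_ns = time.perf_counter_ns() - start_ns
    # Keep integer nanoseconds for the subtraction; convert only for display.
    return result, timedelta(microseconds=elapsed_ns / 1_000)


result, elapsed = timed(sum, range(1_000))
print(result, elapsed)
```

`time.perf_counter_ns()` is monotonic and returns an integer, so the difference between two readings is never negative and does not lose precision to float rounding.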
0.6.11
0.6.11 (2025-06-23)
🚀 Features
🐛 Bug Fixes
- Fix ClientSnapshot overload calculation (#1228) (a4fc1b6) by @Pijukatel
- Use PSS instead of RSS to estimate children process memory usage on Linux (#1210) (436032f) by @Pijukatel
- Do not raise an error to check 'same-domain' if there is no hostname in the url (#1251) (a6c3aab) by @Mantisus
0.6.10
0.6.10 (2025-06-02)
🐛 Bug Fixes
- Allow config change on PlaywrightCrawler (#1186) (f17bf31) by @mylank
- Add payload to SendRequestFunction to support POST request (#1202) (e7449f2) by @Mantisus
- Fix match check for specified enqueue strategy for requests with redirect (#1199) (d84c30c) by @Mantisus
- Set WindowsSelectorEventLoopPolicy only for curl-impersonate template without playwright (#1209) (f3b839f) by @Mantisus
- Add support non-GET requests for PlaywrightCrawler (#1208) (dbb9f44) by @Mantisus
- Respect EnqueueLinksKwargs for extract_links function (#1213) (c9907d6) by @Mantisus
0.6.9
0.6.9 (2025-05-02)
🚀 Features
- Add an internal HttpClient to be used in send_request for PlaywrightCrawler using APIRequestContext bound to the browser context (#1134) (e794f49) by @Mantisus
- Make timeout error log cleaner (#1170) (78ea9d2) by @Pijukatel
- Add on_skipped_request decorator, to process links skipped according to robots.txt rules (#1166) (bd16f14) by @Mantisus
🐛 Bug Fixes
0.6.8
0.6.8 (2025-04-25)
🚀 Features
- Handle unprocessed requests in add_requests_batched (#1159) (7851175) by @Pijukatel
- Add respect_robots_txt_file option (#1162) (c23f365) by @Mantisus
🐛 Bug Fixes
- Update UnprocessedRequest to match actual data (#1155) (a15a1f3) by @Pijukatel
- Fix the order in which cookies are saved to the SessionCookies and the handler is executed for PlaywrightCrawler (#1163) (82ff69a) by @Mantisus
- Call failed_request_handler for SessionError when session rotation count exceeds maximum (#1147) (b3637b6) by @Mantisus