gocrawl

2,053 2 6 BSD-3-Clause
(19 May 2021) Nov 20 2016 58.1 thousand (month)

Gocrawl is a polite, slim and concurrent web crawler library written in Go. It is designed to be simple and easy to use, while still providing a high degree of flexibility and control over the crawling process.

One of Gocrawl's key features is politeness: it obeys each website's robots.txt file and respects the crawl delay specified there. It also takes the website's last-modified date into account, if any, to avoid recrawling the same page. This reduces the load on the website and helps avoid potential legal issues. Gocrawl is also highly concurrent, so it can crawl large numbers of pages in parallel and finish a crawl sooner.

The library is also flexible to customize. You can supply custom callbacks and handlers for different situations, such as error pages and redirects, and process each page as your application requires. Gocrawl additionally supports cookies, a configurable user agent, automatic link detection, and automatic sitemap detection.

Example Use


```go
// Place this in a _test.go file so that `go test` picks up ExampleCrawl.
package sample

import (
	"net/http"
	"regexp"
	"time"

	"github.com/PuerkitoBio/gocrawl"
	"github.com/PuerkitoBio/goquery"
)

// Only enqueue the root and paths beginning with an "a"
var rxOk = regexp.MustCompile(`http://duckduckgo\.com(/a.*)?$`)

// Create the Extender implementation, based on the gocrawl-provided DefaultExtender,
// because we don't want/need to override all methods.
type ExampleExtender struct {
	gocrawl.DefaultExtender // Will use the default implementation of all but Visit and Filter
}

// Override Visit for our need.
func (x *ExampleExtender) Visit(ctx *gocrawl.URLContext, res *http.Response, doc *goquery.Document) (interface{}, bool) {
	// Use the goquery document or res.Body to manipulate the data
	// ...

	// Return nil and true - let gocrawl find the links
	return nil, true
}

// Override Filter for our need.
func (x *ExampleExtender) Filter(ctx *gocrawl.URLContext, isVisited bool) bool {
	return !isVisited && rxOk.MatchString(ctx.NormalizedURL().String())
}

func ExampleCrawl() {
	// Set custom options
	opts := gocrawl.NewOptions(new(ExampleExtender))

	// should always set your robot name so that it looks for the most
	// specific rules possible in robots.txt.
	opts.RobotUserAgent = "Example"
	// and reflect that in the user-agent string used to make requests,
	// ideally with a link so site owners can contact you if there's an issue
	opts.UserAgent = "Mozilla/5.0 (compatible; Example/1.0; +http://example.com)"

	opts.CrawlDelay = 1 * time.Second
	opts.LogFlags = gocrawl.LogAll

	// Play nice with ddgo when running the test!
	opts.MaxVisits = 2

	// Create crawler and start at root of duckduckgo
	c := gocrawl.NewCrawlerWithOptions(opts)
	c.Run("https://duckduckgo.com/")

	// Remove "x" before Output: to activate the example (will run on go test)

	// xOutput: voluntarily fail to see log output
}
```
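The `Filter` method in the example hinges on the `rxOk` pattern, so it can help to see which strings that pattern accepts. The sketch below uses only the standard library; `allowed` is a hypothetical stand-in for the regular-expression half of `Filter`, and the dot in the host name is escaped here as an assumption about the intended pattern. It matches raw strings and does not model gocrawl's URL normalization or the `isVisited` check.

```go
package main

import (
	"fmt"
	"regexp"
)

// rxOk mirrors the pattern from the example: the root URL, or any path
// beginning with "/a". The dot is escaped so it matches only a literal ".".
var rxOk = regexp.MustCompile(`http://duckduckgo\.com(/a.*)?$`)

// allowed is a hypothetical helper: it applies only the regular expression,
// ignoring the isVisited flag and any URL normalization.
func allowed(rawURL string) bool {
	return rxOk.MatchString(rawURL)
}

func main() {
	for _, u := range []string{
		"http://duckduckgo.com",       // root: accepted
		"http://duckduckgo.com/about", // path starts with "a": accepted
		"http://duckduckgo.com/bang",  // path starts with "b": rejected
		"https://duckduckgo.com",      // https scheme: rejected by this pattern
	} {
		fmt.Printf("%-30s %v\n", u, allowed(u))
	}
}
```

Because the pattern expects the `http` scheme and no trailing slash on the root, what matters in practice is the exact string it is given. The example matches against `ctx.NormalizedURL()`, so the normalized form of each URL, not the raw link, decides whether it is enqueued.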

Alternatives / Similar


katana: v1.5.0 (2026-03-10)

Other Languages

crawl4ai: 0.8.6 (2026-03-24)
scrapling: 0.4.5 (2026-04-07)
crawlee: 3.16.0 (2026-04-09)
mechanize: 2.14.0 (2025-01-05)
goutte: v4.0.3 (2023-04-01)
kimurai: 2.2.0 (2026-01-27)
firecrawl: 0.0.0 (2025-03-15)