Is Web Scraping Legal in the United States?
Web scraping, like any other technology, can be used for legal or illegal purposes. In the United States, there is no federal law that specifically prohibits web scraping. The US is in a rather unusual position where scraping public data is widely treated as legal, largely thanks to a precedent set by the Ninth Circuit Court of Appeals in hiQ Labs v. LinkedIn, which found that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA).
However, the general web scraping rules apply in the US as well:
- The scraped data is publicly available OR belongs to the user.
Data behind a login is not publicly available, so for scraping it to be legal it has to belong to the user. An example would be scraping your own order history from Amazon.
- The data is not protected by copyright.
Copyright applies to creative works, including text, images, music, and software, but not to bare facts such as prices or dates. If the data is protected by copyright, you generally need permission from the copyright holder before copying and republishing it, unless an exception such as fair use applies.
- The scraping process does not cause damage to the website being scraped.
Every request imposes some cost on the server. If the website owner can prove that the scraping caused damage (e.g. server overload or significant bandwidth fees), they can sue the scraper for damages.
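A common way to keep the load a scraper puts on a site low is to space requests out and honor the site's robots.txt rules. The following is a minimal sketch in Python, assuming the site publishes a robots.txt file. It uses the standard library's `urllib.robotparser`; the `PoliteThrottle` class, the `build_policy` helper, the `my-bot` user agent, and the 1-second default delay are hypothetical choices for illustration.

```python
import time
import urllib.robotparser
from typing import Optional, Tuple


class PoliteThrottle:
    """Hypothetical helper: enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval      # seconds between requests
        self._last: Optional[float] = None    # monotonic time of the last request

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


def build_policy(
    robots_txt: str, user_agent: str, default_delay: float = 1.0
) -> Tuple[urllib.robotparser.RobotFileParser, PoliteThrottle]:
    """Parse robots.txt text and derive a throttle from its Crawl-delay, if any."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)  # None when no Crawl-delay applies
    interval = float(delay) if delay is not None else default_delay  # assumed fallback
    return parser, PoliteThrottle(interval)


if __name__ == "__main__":
    # Example robots.txt content; in practice it would be fetched from the site.
    robots = "User-agent: *\nDisallow: /private\nCrawl-delay: 2\n"
    parser, throttle = build_policy(robots, "my-bot")
    for url in ["https://example.com/public", "https://example.com/private/page"]:
        if parser.can_fetch("my-bot", url):
            throttle.wait()
            print("would fetch", url)  # the actual HTTP request is omitted here
        else:
            print("skipping (disallowed by robots.txt)", url)
```

The throttle uses `time.monotonic()` rather than wall-clock time so that system clock changes cannot shorten the gap between requests.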
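Another thing to note is that browsewrap agreements (terms that are merely linked somewhere on a page) are generally not enforceable in the US unless the user had actual or constructive notice of them, whereas clickwrap agreements, which the user explicitly accepts, usually are. In practice, this means that scraping a website whose Terms of Service (ToS) or Privacy Policy (PP) prohibits scraping is generally not a breach of contract unless the scraper agreed to those terms explicitly, for example by creating an account and logging in or by clicking an "I agree" button.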
Note that a website can still block any connection for any reason, so scrapers can be blocked legally.
While scraping public data is generally safe and legal, here are some widely used laws that can be invoked against web scraping:
The Computer Fraud and Abuse Act (CFAA)
This law makes it illegal to access a computer without authorization or in excess of authorization. It can be used against web scraping when the access itself is unauthorized. For example, a scraper that keeps accessing a website through proxies after the owner has revoked permission (e.g. with a cease-and-desist letter and an IP block), or that logs in with a stolen account, could be in violation of the CFAA.
The Electronic Communications Privacy Act (ECPA)
This law makes it illegal to intercept electronic communications without authorization. This law can be used to prosecute web scraping if the scraping is done by intercepting communications, such as by using a packet sniffer to capture data sent over a network.
The Digital Millennium Copyright Act (DMCA)
This law makes it illegal to circumvent technological measures that protect copyrighted works. This law can be used to prosecute web scraping if the scraping is done by bypassing a website's security measures, such as by using a scraper that bypasses a website's CAPTCHA.
The California Consumer Privacy Act (CCPA)
This law regulates the collection, use, and sharing of personal information of California residents. It applies to for-profit businesses that collect personal information of California residents and meet at least one of several criteria, such as having more than $25 million in annual gross revenue, or buying, selling, or sharing the personal information of 100,000 or more consumers or households (50,000 consumers, households, or devices before the CPRA amendments took effect in 2023).
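In web scraping, this means that collecting personal information fields at this scale brings CCPA obligations into play, such as giving notice at collection and honoring consumers' requests to know, delete, or opt out of the sale of their data. In practice, scraping personal data for spam is very hard to justify, while collecting it for a legitimate purpose such as research or security validation is far easier to defend.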
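One way a scraper can limit how much personal information it holds is data minimization: dropping personal fields from scraped records before storing them unless the stated purpose needs them. Below is a minimal Python sketch; the `PERSONAL_FIELDS` set and the `minimize` helper are hypothetical and would have to match whatever fields a real scraper collects.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical set of field names treated as personal information; a real
# scraper would derive this from its own schema.
PERSONAL_FIELDS = frozenset({"name", "email", "phone", "address"})


def minimize(records: Iterable[Dict[str, Any]],
             keep: Iterable[str] = ()) -> List[Dict[str, Any]]:
    """Return copies of records with personal fields removed.

    Fields listed in `keep` are retained even if they are personal, for cases
    where the stated purpose (e.g. research) genuinely requires them.
    """
    drop = PERSONAL_FIELDS - set(keep)
    return [{k: v for k, v in r.items() if k not in drop} for r in records]


if __name__ == "__main__":
    scraped = [{"title": "2BR apartment", "price": 2400, "email": "a@example.com"}]
    print(minimize(scraped))  # the email field is dropped before storage
```

Returning new dictionaries instead of mutating the input keeps the raw records available to the caller in case they need to be discarded entirely rather than stored.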
While web scraping lawsuits are relatively rare, several notable cases have reached the federal courts, some as far as the Ninth Circuit Court of Appeals:
hiQ Labs v. LinkedIn (2019)
In this case, a data analytics company called hiQ Labs used web scraping to collect publicly available profile data from LinkedIn's website. LinkedIn sent hiQ a cease-and-desist letter, claiming that the scraping violated the Computer Fraud and Abuse Act (CFAA) and the Digital Millennium Copyright Act (DMCA). hiQ sued LinkedIn in 2017 and obtained a preliminary injunction that let it keep scraping public profiles. In 2019, the U.S. Court of Appeals for the Ninth Circuit affirmed, finding that hiQ had raised serious questions as to whether accessing publicly available data could be "without authorization" under the CFAA. In 2021 the Supreme Court vacated that decision in light of Van Buren v. United States, and in 2022 the Ninth Circuit reaffirmed its reasoning. 1
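This case set the strongest precedent for scraping public data in the United States: the Ninth Circuit's reasoning suggests that the CFAA's "without authorization" provision is unlikely to apply to data anyone can view without logging in. It did not, however, hold that web scraping is protected by the First Amendment, and it left other claims open: in 2022 the district court found that hiQ had breached LinkedIn's User Agreement, and the parties then settled.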
Ticketmaster v. RMG (2007)
In this case, RMG Technologies sold software that let ticket brokers automatically buy large numbers of tickets from Ticketmaster's website, bypassing the CAPTCHA that Ticketmaster used to block automated purchases. Ticketmaster filed a lawsuit against RMG, claiming that this violated the CFAA and the DMCA, as well as copyright laws. In 2007, the U.S. District Court for the Central District of California granted Ticketmaster a preliminary injunction, finding it likely to succeed on claims including copyright infringement and circumvention of its CAPTCHA under the DMCA. 2
Craigslist v. 3taps (2013)
In this case, a data provider called 3taps used web scraping to collect data from Craigslist's website, including housing listings, and made the data available to other companies such as the apartment-search site PadMapper. Craigslist sent 3taps a cease-and-desist letter and blocked its IP addresses, but 3taps kept scraping through proxy servers. Craigslist then filed a lawsuit, claiming among other things that the scraping violated the CFAA. 3
In 2013, the U.S. District Court for the Northern District of California refused to dismiss Craigslist's CFAA claim, holding that once Craigslist had revoked 3taps' permission and blocked its IP addresses, continued scraping was access "without authorization", even though the listings themselves were publicly available. More info is available on eff.org.
Facebook v. Power Ventures (Power.com) (2010)
In this case, a social media aggregator called Power.com let users view their accounts on several social networks, including Facebook, in one place. Users supplied their Facebook login credentials, and Power.com used them to collect data from Facebook, such as users' profiles and friend lists, and to send promotional messages to their Facebook friends. Facebook sent Power.com a cease-and-desist letter and blocked its IP addresses, but Power.com continued to access the site. Facebook filed a lawsuit claiming violations of the CFAA, the CAN-SPAM Act, and California Penal Code § 502. The U.S. District Court for the Northern District of California ruled in favor of Facebook, and in 2016 the Ninth Circuit held that Power.com had violated the CFAA by continuing to access Facebook after receiving the cease-and-desist letter. 4
The final ruling in 2017 awarded Facebook only a reduced $79,640.50 in compensatory damages and a permanent injunction. The court also ordered the defendants to pay a $39,796.73 discovery sanction.
1. hiQ Labs v. LinkedIn is a long and complex case; see the full Wikipedia article: https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
4. Facebook v. Power Ventures is also a large case; see the full Wikipedia article for more info: https://en.wikipedia.org/wiki/Facebook,_Inc._v._Power_Ventures,_Inc.