Data scraping, also known as web scraping, is the process of importing information from a website into a structured format like a spreadsheet or a local file on your computer. It's one of the most efficient ways to get data from the web and, in some cases, to channel that data to another website. Popular uses of data scraping include market research and business intelligence, price comparison and monitoring, finding sales leads, and sending product data from an e-commerce site to marketplaces such as Google Shopping.
And that list is just scratching the surface. Data scraping has a vast number of applications - it's useful in just about any case where data needs to be moved from one place to another.
A simple yet powerful way to begin data scraping is by using Microsoft Excel's built-in Power Query feature. This modern method allows you to establish a direct data feed from a website into a spreadsheet, replacing older, less flexible techniques. You can configure the query to refresh automatically, ensuring your spreadsheet always contains the latest information from the source page.
Here's how to set up a web query with Power Query in a current version of Excel: open a blank worksheet, choose From Web on the Data tab (in the Get & Transform Data group), enter the URL of the page you want to pull data from, select the table you need in the Navigator pane, and click Load (or Transform Data to tidy it in the Power Query Editor first).
The great thing about Power Query is that it creates a dynamic connection. To configure how regularly your data updates, right-click on your data table, go to Query Properties, and on the Usage tab, you can set it to "Refresh every X minutes" or "Refresh data when opening the file". Note that for unattended, scheduled refreshes when the workbook is closed, you'll need a server-based solution like Power BI.
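If you'd rather script the same idea outside Excel, the sketch below shows a minimal Python equivalent using pandas.read_html; the URL is a placeholder and the approach assumes the page exposes its data as a standard HTML table.

```python
# Minimal sketch: pull HTML tables from a page into DataFrames.
# Assumes pandas plus an HTML parser (lxml or html5lib) is installed;
# the URL below is a placeholder, not a real data source.
import pandas as pd

url = "https://example.com/price-table"
tables = pd.read_html(url)   # one DataFrame per <table> element on the page
prices = tables[0]           # pick the table you want

# Save locally, mirroring what Power Query loads into a worksheet
prices.to_csv("prices.csv", index=False)
print(prices.head())
```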
For more regular or complex tasks, dedicated data scraping tools offer greater efficiency and power than manual methods. Here is an updated 2025 overview of popular tools, including their features and pricing structures.
Data Scraper (Data Miner)
This tool slots directly into your Chrome or Edge browser as an extension, allowing you to extract data from any web page. It works by using "recipes" - pre-built extraction rules. It has a large library of public recipes for popular sites, and you can create your own: rather than browsing a catalogue for a specific site such as X (formerly Twitter) or Wikipedia, you simply visit the page you want to scrape and the extension surfaces any matching public recipes automatically.
Data Miner operates on a tiered pricing model, with a free plan for small, occasional jobs and paid subscriptions that scale with the number of pages you scrape each month.
WebHarvy
WebHarvy is a visual, point-and-click web scraping application for Windows. It features a built-in browser where you can simply click on the data elements you want to extract, with no coding required. It can handle pagination, infinite scroll, and data behind logins.
Its biggest selling point is its licensing model: a one-time payment instead of a recurring subscription. A single-user license costs around $139 and includes one year of free updates and support. After the year, the software continues to work, but you'll need to purchase an upgrade for newer versions.
Import.io
Import.io is an enterprise-grade, AI-native web data platform. It's designed for heavy-duty scraping, offering features like "self-healing pipelines" that automatically adapt when a website's layout changes, authenticated extraction for data behind logins, and PII masking to redact sensitive information.
Pricing is primarily customized, and you'll need to contact their sales team for a quote. However, third-party sites report tiered plans starting around $399/month for 5,000 queries, scaling up based on volume and features. A 14-day free trial is available.
Artificial intelligence is no longer a future concept in data scraping; it's a present-day reality transforming the industry. Market forecasts project a compound annual growth rate for AI-driven scraping tools as high as 39.4%.
AI-powered tools like Octoparse now offer features such as one-click "Auto-detect," which scans a page and automatically generates an extraction template. It also integrates with large language models like ChatGPT to perform advanced tasks, such as running sentiment analysis on scraped product reviews or comments.
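As a rough illustration of that kind of workflow (not Octoparse's own implementation), the Python sketch below sends a couple of scraped reviews to an LLM for one-word sentiment labels; the sample reviews, the model name, and the presence of an OpenAI API key are all assumptions made for the example.

```python
# Minimal sketch: classify the sentiment of scraped reviews with an LLM.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# the reviews below stand in for data you have already scraped.
from openai import OpenAI

client = OpenAI()
reviews = [
    "Arrived quickly and works perfectly - very happy.",
    "Stopped working after two days, would not recommend.",
]

for review in reviews:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of this review as positive, negative or neutral. Reply with one word."},
            {"role": "user", "content": review},
        ],
    )
    print(review, "->", response.choices[0].message.content)
```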
Furthermore, advanced visual search technologies are opening up entirely new frontiers. Google's Multisearch allows you to combine an image with text in a single query—for example, taking a photo of a sofa and adding the text "in green" to find similar products. This is powered by sophisticated AI that can interpret both images and language.
For developers, the Google Cloud Vision API provides direct access to this power. It can detect labels, read text from images (OCR), identify logos and landmarks, and much more. The API offers a free tier of 1,000 requests per feature per month, with tiered pricing for higher volumes, making it accessible for projects of all sizes.
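For a sense of what that looks like in practice, here is a minimal Python sketch against the official google-cloud-vision client; it assumes the client library is installed, a Google Cloud project with the Vision API enabled, credentials configured via GOOGLE_APPLICATION_CREDENTIALS, and a placeholder image file.

```python
# Minimal sketch: label detection and OCR with the Google Cloud Vision API.
from google.cloud import vision

def annotate(path: str) -> None:
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())

    # Label detection: which objects and concepts appear in the image
    labels = client.label_detection(image=image).label_annotations
    for label in labels:
        print(f"{label.description}: {label.score:.2f}")

    # Text detection (OCR): any readable text in the image
    texts = client.text_detection(image=image).text_annotations
    if texts:
        print("Detected text:", texts[0].description)

annotate("sofa.jpg")  # placeholder image file
```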
Data scraping is a cornerstone of modern e-commerce marketing, particularly for managing product feeds for platforms like Google Shopping. Here's an advanced 2025 guide to creating and optimising a product feed.
Services like FeedOptimise can crawl your e-commerce site to extract product information, structure it, and enrich it using AI to improve titles and descriptions. This creates a single, optimised feed from potentially scattered data sources.
Once you have your data, you need to get it into Google Merchant Center.
1. Onboard to Google Merchant Center
First, you need a properly configured account.
2. Meet Google's Product Data Specification
Your feed must be formatted correctly to be accepted:
- Supply the feed as a .txt or .xml file, up to 4GB in size.
- Include the required attributes: id, title, description, link, image_link, availability, and price.
- Add brand, gtin, and mpn wherever possible; omitting these for products that have them is a common reason for disapproval.

3. Optimise Your Feed for Performance
A basic feed gets you listed; an optimised feed gets you sales.
Keep your price and availability data perfectly in sync with your website to avoid disapprovals and a poor user experience.

4. Automate Bidding on Top Products
Once your feed is live, you can integrate it with Google Ads to automatically bid more on your best-selling products.
Add a custom label, such as custom_label_0 = 'bestseller', to your top-performing products in the feed. In Google Ads, create a product group where custom_label_0 equals 'bestseller'. Then, apply an automated bidding strategy like Target ROAS to this group, telling Google to bid more aggressively to maximise your return on these proven winners. Once set up, this system automatically focuses your ad spend on the products most likely to drive revenue, creating a powerful, self-optimising marketing engine.
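To make steps 2 and 4 concrete, the sketch below writes a minimal tab-delimited .txt feed containing the required attributes and tags proven sellers with custom_label_0; the products, file name, and bestseller threshold are purely illustrative.

```python
# Minimal sketch: write a tab-delimited product feed for Google Merchant Center.
# The product data and the bestseller rule are illustrative; a real feed would
# be generated from your e-commerce database or scraped product data.
import csv

COLUMNS = ["id", "title", "description", "link", "image_link",
           "availability", "price", "brand", "gtin", "custom_label_0"]

products = [
    {"id": "SKU-001", "title": "Green Velvet Sofa", "description": "Three-seater velvet sofa",
     "link": "https://example.com/p/sku-001", "image_link": "https://example.com/img/sku-001.jpg",
     "availability": "in stock", "price": "499.00 GBP", "brand": "ExampleBrand",
     "gtin": "00012345678905", "units_sold": 1200},
    {"id": "SKU-002", "title": "Oak Coffee Table", "description": "Solid oak coffee table",
     "link": "https://example.com/p/sku-002", "image_link": "https://example.com/img/sku-002.jpg",
     "availability": "in stock", "price": "149.00 GBP", "brand": "ExampleBrand",
     "gtin": "00012345678912", "units_sold": 40},
]

with open("feed.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t", extrasaction="ignore")
    writer.writeheader()
    for p in products:
        # Tag proven sellers so Google Ads can bid on them separately (step 4)
        p["custom_label_0"] = "bestseller" if p["units_sold"] > 500 else ""
        writer.writerow(p)
```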
Navigating the legality of data scraping is more complex than ever and requires a nuanced understanding of the current landscape. The old view of scraping as just a "dark art" for email harvesting is outdated.
In the United States, the legal ground has shifted significantly. The Supreme Court's ruling in Van Buren v. United States narrowed the scope of the Computer Fraud and Abuse Act (CFAA), establishing that simply using data for an improper purpose isn't a crime if you were authorized to access it. Following this, the landmark hiQ v. LinkedIn case established that scraping data from publicly accessible web pages is unlikely to violate the CFAA. However, this doesn't mean it's a free-for-all. You can still be held liable under other laws, such as breach of contract (violating a site's terms of service) or trade secret misappropriation, especially if you circumvent technical barriers like IP blocks.
In the European Union, the Digital Services Act (DSA), fully applicable since February 2024, is changing the game. For Very Large Online Platforms (VLOPs), Article 40 of the DSA mandates that they provide vetted researchers with access to public data for studying systemic risks. This creates a formal, legal channel for data access, shifting the paradigm away from unauthorised scraping. Of course, any scraping that involves personal data must still comply with the GDPR.
Finally, individual platforms are enforcing their own rules. X (formerly Twitter), for example, has a strict developer policy that forbids circumventing rate limits. In 2025, it explicitly banned using its data to train third-party AI models and has made its API more expensive and restrictive, limiting what can be done on its free tier.
Data scraping has evolved from a simple import function into a sophisticated field powered by AI and governed by a complex legal framework. The key to success in 2025 is to leverage powerful new tools and techniques while remaining acutely aware of ethical considerations and legal boundaries. By respecting terms of service, understanding the law, and using official channels where available, marketers and businesses can unlock immense value from web data, driving smarter decisions and gaining a competitive edge.