I can scrape hundreds of case studies in minutes, and you can do it too.
In this guide, I'll show you exactly how to scrape case studies efficiently, helping you build a valuable database for sales, marketing, or competitive analysis.
By the end of this tutorial, you'll be able to automatically extract not just the case study links, but also specific information like customer details, industry data, and other key metrics - all organized neatly in a structured format.
This is a two-part workflow that breaks the process down into actionable steps:
- In the first part, we scrape all the links from the main page that lists the customer stories.
- In the second part, we scrape the specific information we want from each case study.
Note: This guide is for scraping dozens or hundreds of case studies from one website. If you want to scrape one or two case studies from many company websites, read this instead: How to Scrape Case Studies at Scale with AI.
Part 1 of Scraping All Case Studies From a Website - Getting Case Study Links
Part 1, Step 1 of Scraping All Case Studies From a Website
Go to Datablist.com and sign up.
Create a collection
Click on “See all sources”
Choose the “AI Agent - Site Scraper”
Part 1, Step 2 of Scraping All Case Studies From a Website
In this step we will configure our AI agent to extract all links from the page that stores all case studies.
Start by giving it the link to the page with the case studies.
Now write a prompt to extract the links or use our template below.
Here is my prompt:
I want you to extract all links to the case studies on this page
===
Extract only the links that have this structure "https://www.mazak-customers.com/story/story/......"
===
No Introductions
No Explanations
No Thoughts
Only the links that lead to the case study
Make sure to provide the AI with a sample link structure that you want to target, such as www.mazak-customers.com/story/ or www.salesforce.com/customer-stories/, since it can sometimes pick up PDF case studies, which are not as useful for this use case.
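To make the idea of a "link structure" concrete, here is a minimal Python sketch of the kind of filter the prompt describes. It is only an illustration, not how Datablist's AI agent works internally. The prefix is taken from the sample structure above, and the three example URLs are made up for the demonstration.

```python
from urllib.parse import urlparse

# Assumption: the prefix mirrors the sample link structure from the prompt.
ALLOWED_PREFIX = "https://www.mazak-customers.com/story/"


def is_case_study_link(url: str, prefix: str = ALLOWED_PREFIX) -> bool:
    """Return True for links under the prefix that do not point to a PDF."""
    path = urlparse(url).path.lower()
    return url.startswith(prefix) and not path.endswith(".pdf")


# Made-up example links, only to show what is kept and what is dropped.
links = [
    "https://www.mazak-customers.com/story/example-customer/",
    "https://www.mazak-customers.com/story/brochure.pdf",
    "https://www.mazak-customers.com/contact/",
]

case_study_links = [url for url in links if is_case_study_link(url)]
print(case_study_links)
```

Only the first link passes: the second is a PDF and the third does not match the target structure. The more specific the sample structure in your prompt, the less cleanup you need afterwards.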
Now check the box to the left of "Enable Pagination" and set a limit for the number of pages the AI agent should be able to visit.
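If you are wondering what the page limit actually controls, the sketch below shows the general idea of paginated scraping with a cap. It is a simplified illustration under my own assumptions, not Datablist's implementation: the `PAGES` dictionary is a hypothetical in-memory stand-in for a website, and `collect_links` is a made-up helper name.

```python
from typing import Optional

# Hypothetical in-memory "site": each page lists its links and the next page.
PAGES = {
    "/stories?page=1": {"links": ["/story/a", "/story/b"], "next": "/stories?page=2"},
    "/stories?page=2": {"links": ["/story/c"], "next": "/stories?page=3"},
    "/stories?page=3": {"links": ["/story/d"], "next": None},
}


def collect_links(start: str, max_pages: int) -> list:
    """Follow 'next' pointers and stop at max_pages or when pages run out."""
    found = []
    current: Optional[str] = start
    visited = 0
    while current is not None and visited < max_pages:
        page = PAGES[current]
        found.extend(page["links"])
        current = page["next"]
        visited += 1
    return found


print(collect_links("/stories?page=1", max_pages=2))
```

With a limit of 2, the link on the third page is never collected. That is why the limit should be at least as high as the number of pages in the case study listing.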
Then configure your outputs as needed, or copy and paste the values below:
- Output Name: Case Study Link
- Output Description: The link found on the page
- Output Type: URL
Now, check the box to the left of "Advanced Settings" and enable "Website Scraper Option: Render HTML".
Once you've done this, click on "Continue" to start scraping.
Once the AI agent has finished scraping the case studies, your collection should look like this.
The results display the case study link in the column we named "Case Study Link" and the source page in the column "Page Scraped".
Now that we have scraped all the case study links from the first page, let's scrape the case study contents from each case study page.
Part 2 of Scraping All Case Studies From a Website - Extracting Information
This part of the workflow is a bit more sophisticated, but it will save you a lot of time compared to doing it manually. Just follow the steps below.
Here are the steps this workflow consists of:
- Visiting one or two pages to scan and analyze the structure of pages
- Creating tags for each piece of information you want to have
- Writing a prompt to provide the AI with clear instructions and examples
- Configuring the outputs you want to get
- Running the AI agent to scrape the case study content
Part 2, Step 1 of Scraping All Case Studies From a Website
First, you need to visit one or two of the pages that you just scraped, define which pieces of information you want to have, and look for any patterns in the structure of the case studies.
Second, create a tag for each piece of information you want to have, give the AI examples, and tell it where it can find the information since the AI will provide you with much better outputs that way.
Sometimes you can hover over text to see if the link has specifications you can use to better define your output formats. In my case, for example, "VERSATECH" would be a machine series.
💡 Quick Tip
Providing examples can improve your outputs by up to 3x compared with a prompt that has none.
Part 2, Step 2 of Scraping All Case Studies From a Website
In this step, we will configure the AI agent to scrape the information from the case study page — let's go!
First, open your collection with the links to the case study pages again.
Since the "Page Scraped" column is not needed for this workflow, we'll hide it and then click on "Enrich".
Now go to “AI” and select the “AI Agent”.
Now copy the prompt template below and modify it according to the information you need from the case study page:
Context: I need some information related to the case study on the web page
===
What I want you to do: Visit the page I am going to give you and extract the requested data points. I'll tell you more about the information shortly.
===
The data points you have to look for (with examples):
[Information Tag 1] e.g., [Example 1, Example 2, Example 3]
[Information Tag 2] e.g., [Example 1, Example 2, Example 3]
[Information Tag 3] e.g., [Example 1, Example 2, Example 3]
You can access the case study with this link: /Your column
Here is this template prompt with example data:
Context: I need some information related to the case study on the web page
===
What I want you to do: Visit the page I am going to give you and extract the requested data points. I'll tell you more about the information shortly.
===
The data points you have to look for (with examples):
Machine Information:
- Machine Series e.g., VERSATECH, Dual Turn, CV5-500
- Machine Name e.g., VERSATECH V-140N/280, OPTIPLEX 4020 DDL, INTEGREX j-200
Customer’s Information:
- Customer's Industry e.g., Manufacturing, Aerospace, Construction
- Customer's Location e.g., Germany, France, Baltics
- Customer’s Name e.g.,
You can access the case study with this link: /Case Study Link
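If you reuse this template across several websites, it can help to keep the tags and examples in one place and generate the prompt text from them. The sketch below is an optional convenience written in Python, not a Datablist feature. The `build_prompt` helper is hypothetical, the tags and examples are copied from the example above, and "Customer's Name" is left out because the example gives no sample values for it.

```python
# Tags and examples copied from the example prompt above.
DATA_POINTS = {
    "Machine Information": {
        "Machine Series": ["VERSATECH", "Dual Turn", "CV5-500"],
        "Machine Name": ["VERSATECH V-140N/280", "OPTIPLEX 4020 DDL", "INTEGREX j-200"],
    },
    "Customer's Information": {
        "Customer's Industry": ["Manufacturing", "Aerospace", "Construction"],
        "Customer's Location": ["Germany", "France", "Baltics"],
    },
}


def build_prompt(data_points: dict, link_column: str) -> str:
    """Assemble the prompt text from grouped tags and their examples."""
    lines = [
        "Context: I need some information related to the case study on the web page",
        "What I want you to do: Visit the page I am going to give you "
        "and extract the requested data points.",
        "The data points you have to look for (with examples):",
    ]
    for group, tags in data_points.items():
        lines.append(f"{group}:")
        for tag, examples in tags.items():
            lines.append(f"- {tag} e.g., {', '.join(examples)}")
    # "/Column name" is how the prompt above references a collection column.
    lines.append(f"You can access the case study with this link: /{link_column}")
    return "\n".join(lines)


prompt = build_prompt(DATA_POINTS, "Case Study Link")
print(prompt)
```

You would still paste the generated text into the AI Agent's prompt field by hand. The benefit is only that adding or renaming a tag happens in one spot.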
💡 Quick Fact About the AI Agent
The AI agent is incredibly good at following instructions, but if you don't provide it with clear examples, it won't give you good results.
After configuring your prompt using our template, you have to configure the outputs. Here's how:
For each piece of information you want to extract:
- Use the information tag name as your "Output Name"
- Add a clear description in the "Output Description" field or include examples
- Choose the appropriate "Output Type" for the data you want to have
- Click "More" to add additional outputs and do the same there
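A common slip at this stage is defining a tag in the prompt and forgetting to create an output for it. The sketch below shows one way to cross-check the two lists before running the agent. It is a rough illustration in Python: the `Output` class is hypothetical, and the "Text" type label and the descriptions are placeholders I chose, not values confirmed by Datablist.

```python
from dataclasses import dataclass


@dataclass
class Output:
    name: str
    description: str
    output_type: str  # Assumption: "Text" is a placeholder type label.


# Tags used in the example prompt above.
prompt_tags = [
    "Machine Series",
    "Machine Name",
    "Customer's Industry",
    "Customer's Location",
    "Customer's Name",
]

# Outputs configured so far; "Customer's Name" is missing on purpose.
outputs = [
    Output("Machine Series", "Machine series, e.g., VERSATECH", "Text"),
    Output("Machine Name", "Machine name, e.g., INTEGREX j-200", "Text"),
    Output("Customer's Industry", "Industry, e.g., Aerospace", "Text"),
    Output("Customer's Location", "Location, e.g., Germany", "Text"),
]

configured = {output.name for output in outputs}
missing = [tag for tag in prompt_tags if tag not in configured]
print(missing)
```

Any tag that shows up in `missing` needs its own output, added through "More", before you continue.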
After you've configured all your outputs, click on "Continue to outputs configuration"
Now click on all the plus (+) icons to add a new column for each output, and click on "Instant Run"
These are the results of the scraped case studies
Frequently Asked Questions About Scraping Case Studies
How Do I Scrape Case Studies From a Website Legally?
Web scraping is generally legal when you scrape publicly available data and respect copyright restrictions.
What Tools Do I Need to Scrape Case Studies From Websites?
You can use web scraping tools like Datablist for no-code solutions.
How Long Does It Take to Scrape Case Studies From a Website?
With tools like Datablist, you can scrape hundreds of case studies within minutes to hours. The setup time for automation is typically 15-30 minutes once you understand the website's structure.
Can I Scrape Case Studies From Any Website?
Not all websites allow scraping. Some websites use anti-scraping measures or explicitly forbid it in their terms of service.
What Information Can I Extract From Case Studies?
You can extract various data points including company names, industries, challenges, solutions, results, testimonials, dates, and metrics. The key is identifying consistent patterns in how the case studies are structured on the website to ensure accurate data extraction.