Changelog

Latest developments

New features, improvements and fixes to Datablist.

January 2025

Hi guys, we are starting this year with two highlights. Both are unmatched and will bring you a lot of value; this is a promise.

  1. Job listings scraper - Scrapes job listings from 19 different boards simultaneously
  2. AI Editing - Lets you perform complex data manipulation tasks, which would otherwise require code, with a simple prompt.

Now, to the details!

The Highlight of January

Job Postings Search – Scrape 19 Job Boards at Once

After releasing the Indeed jobs scraper and the LinkedIn jobs scraper, we saw how high the demand for job market data is, so we built this scraper, which delivers data from the biggest job boards on the planet, including:

  1. Indeed
  2. LinkedIn
  3. Glassdoor
  4. Naukri.com
  5. AngelList
  6. InfoJobs
  7. Tecnoempleo
  8. Startup Jobs
  9. SimplyHired

...and 10 others

Here’s what makes this scraper different from all the other job scrapers on the market:

  • Scrapes fresh and up-to-date job postings
  • Lets you search for keywords or phrases in job descriptions (awesome, isn't it?)
  • Global coverage of the labor market (195 countries)
  • Includes company information
  • Includes hiring manager information

How It Works

You choose your starting point from the following two options:

  • Scrape job listings of companies you've already added to your Datablist collection
  • Start search from scratch using job titles, keywords, industries, funding stage, and 10 more filters

Datablist will then scrape the job listings that match your search and give you the results it finds across those 19 job boards.

How To Use It
  • Create a collection and select “Job Offers Search.”
  • Map a collection to the search or start from scratch
  • Use the 14 different filters to narrow down your search
  • Click on “Continue to outputs configuration.”
  • Click on the ⊕ icons to add a new column for each output
  • Click on “Import now” to start scraping

New Feature: AI Editing

Florian is really excited about this one (and so am I).

Here’s why you should be excited too!

Datablist has a hidden strength: JavaScript scripts let you manipulate your data in any way you want. Until now, that power was reserved for those who know how to write JavaScript.

AI Editing brings the same power to non-technical folks using plain English instead of code.
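To make that concrete, here is a minimal, hypothetical sketch of the kind of script a prompt like "Format phone numbers" could produce. The item shape and property names are just for illustration, not Datablist's actual scripting API:

```javascript
// Hypothetical sketch of a script AI Editing might generate from the prompt
// "Format phone numbers to E.164". The item shape and property names below
// are illustrative, not Datablist's actual scripting API.
const items = [
  { name: 'Acme Corp', phone: '(415) 555-0132' },
  { name: 'Globex', phone: '415.555.0199' },
];

function formatPhone(raw) {
  if (!raw) return raw;                               // leave empty cells untouched
  const digits = String(raw).replace(/\D/g, '');      // strip everything but digits
  return digits.length === 10 ? `+1${digits}` : `+${digits}`; // assume US numbers for the sketch
}

for (const item of items) {
  item.phone = formatPhone(item.phone);
}

console.log(items); // [{ name: 'Acme Corp', phone: '+14155550132' }, ...]
```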

How It Works

This feature is deliberately simple, exactly as we want it to be.

Imagine having Claude, Gemini, or ChatGPT sitting inside your spreadsheet, ready to do whatever you tell it, because that's exactly what this is. Here's how you collaborate with our AI:

  • You imagine anything you want to do with your data
  • You write a prompt explaining it to the AI
  • You wait 10-20 seconds while the AI writes the script
  • You look at the preview, confirm, or write a follow-up prompt to improve the outcome

Whether it's a certain structure, an edit, a scoring system, or a specific format you want, just tell your AI assistant about it, and it will do it for you.

What does that mean for you?

  • You can use plain English to edit your data.
  • You can build scoring systems with a single command.
  • You can clean and format your data with simple prompts.
  • … and so much more.

How To Use It
  • Click on "Edit" in the Datablist top menu
  • Select "AI Editing"
  • Type your prompt
  • Click on "Generate"
  • Review the changes and click on "Run Script" to apply them, or click "Improve Prompt" to refine your prompt

Use Cases
  • Format phone numbers
  • Clean company names
  • Build an account scoring system
  • Capitalize words

Related Resources

Watch me build an AI scoring system with a simple prompt

That's it, folks, see y’all next time!

P.S. If you want us to build something for you, PITCH ME HERE 👈🏽

December 2024

Hi folks, we are wrapping the year up with a huge UI update + two new sources.

Let’s get into it!

The Highlight of December

Preview Mode for Merging Duplicate Groups

This improvement to our preview mode is a big step towards making data handling and automation easy and accessible for everyone; it doesn't just simplify merging, it makes it visual.

How It Was Before

Before this update, the preview mode was not only harder to read but also visually confusing when you tried to see how the data would be merged.

The preview mode wasn't intuitive and made it difficult for users to confidently make merging decisions. Additionally, there was no clear indication of which record would be the master record in the merging process.

The Changes We Made
  • We added clearer, more detailed descriptions and labels to every part
  • We grouped the settings of the conflicting properties and separated them visually
  • We separated the master item selection configuration visually
  • We added labels to the merging preview of each duplicate group
  • We added colors that highlight the master and secondary items
  • We added an action to remove an item from the duplicate group

How This is Going To Make Your Life Easier
  • Clear Visual Understanding: Thanks to our new color-coding and improved layout, you'll instantly see which records are being merged and how they fit together
  • Reduced Error Risk: We've added better labels and grouping to make sure you don't accidentally merge the wrong records or pick the wrong master record
  • Increased Confidence: With our detailed preview, you'll feel much more confident about merging decisions
  • Time Savings: Our new intuitive interface means you'll spend less time reviewing and confirming merge operations
  • Greater Control: You can now remove items from duplicate groups whenever you want, giving you more flexibility with your data

All these improvements make it way easier to keep your data clean and accurate, with much less effort on your part.

New Features and Improvements

New Features

Well, no new features, but 2 new sources entered Datablist this month.

Let’s begin with our new data sources and what they’re good for:

LinkedIn Jobs Scraper

Now you can scrape LinkedIn jobs as a Datablist source.

How to Use It:
  1. Create a new Collection
  2. Click on “See all sources”
  3. Choose “LinkedIn Jobs Scraper”

Why we did it: With Indeed, the biggest job board worldwide, already available as a source, adding a LinkedIn Jobs source, the second largest, just made sense.

Remote CSV/JSON source

Now you can connect remote CSV/JSON sources to Datablist and keep your collections synchronized with external data sources.

This feature is particularly valuable for teams working with:

  • Multiple data sources across different platforms
  • Frequently updated datasets
  • Automated reporting systems

How To Use It:
  • Create a collection
  • Click on “See all sources"
  • Select "Remote CSV/JSON Import"

Send me feedback ⇒ Habib’s LinkedIn

November 2024

Hi Folks, it’s been an incredible month of putting our heads down and working relentlessly to improve Datablist for you. Here’s what we did:

We pushed new features such as:

  • Impressum Scraper
  • Indeed Scraper
  • Waterfall people search
  • Waterfall email verification
  • Waterfall Email Finder
  • Import filter

We improved:

  • Templates
  • Collection view

New Enrichment: Impressum Scraper

The Impressum Scraper is particularly valuable for sales teams and business developers working in German-speaking markets, as they can extract valuable data from companies' Impressum pages.

How it Works:

We use AI to visit the Impressum pages of German, Austrian, and Swiss businesses and extract all the data from their Impressum.

This automated process saves hours of manual data gathering and ensures accurate, up-to-date information for your business contacts.

How To Use It

  1. Upload a CSV with website URLs or domains
  2. Click "Enrich" in the top menu
  3. Go to "AI" and select "Impressum Scraper"
  4. Map the column with the URLs as Input Property and click on "Continue to outputs configuration"
  5. Click on the ⊕ icons to add a new column for each returned data point
  6. Click on "Instant Run" or schedule your task
  7. Configure your preferred "Run Settings" and start scraping

Returned Data

  • Company name
  • Managing directors
  • Phone numbers
  • Emails
  • Addresses
  • Legal registration numbers

Related Resources:

English guide

German guide (Deutsche Anleitung)

Video tutorial, for those who prefer videos

New Data Source: Indeed Scraper

Since many recruiters and HR professionals are struggling with inefficient methods of manually gathering job posts to identify potential clients, we had to build this one.

This Indeed scraper makes it simple to extract vacancies at scale, allowing you to make data-backed decisions without the manual effort of browsing countless job listings.

How It Works

You provide an Indeed search URL or configure your search in Datablist using keywords, locations, and time-based filtering, and we scrape all the job listings matching your search criteria for you.

How to use it

  1. Provide keywords & locations or Indeed search URLs
  2. Define country, time of publication, and set result limit per search
  3. Click on "Continue" to configure your outputs
  4. Click on the ⊕ icons to add a new column for each piece of information
  5. Click on "Import Data" to start scraping!

Returned Data

  • Basic Information: Job Title, Location, Country, Type
  • Job Content: Description, Benefits, Salary Range, Date Posted
  • Application: Indeed Offer URL, Apply Link
  • Basic Company Data: Company Name, Website, Description, Industry
  • Company Metrics: Staff Range, Revenue, Rating, Reviews Count
  • Company Location: Address
  • Company Links: Indeed Link
  • Source Information: Job Source

Related Resources:

Indeed Scraping Guide

Indeed Scraping Video Tutorial

New Enrichment: Waterfall Email Verification

This email verification lets you take existing email lists and check whether there's an actual inbox behind each address, which is crucial for healthy inboxes, effective growth campaigns, and newsletters that convert.

How It Works:

Here's how our email verification process works:

  • First, we scan MX records to check if the domain can receive emails
    • If no MX records exist, the email is marked as undeliverable (this check is free)
    • If MX records exist, we perform an SMTP check to verify if the inbox exists

For additional accuracy, emails marked as "unknown" or "catch_all" undergo a more sophisticated verification process to determine their safety status.

For each stage, we use specialized providers.
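For the curious, here's a minimal sketch of what the first stage (the MX lookup) boils down to, using Node.js's built-in dns module. The SMTP inbox check and the catch-all detection happen after this and aren't shown:

```javascript
// A minimal sketch of the MX lookup stage using Node.js's built-in dns module.
// The later SMTP inbox check and catch-all detection are not shown here.
const dns = require('dns').promises;

async function hasMxRecords(email) {
  const domain = email.split('@')[1];
  try {
    const records = await dns.resolveMx(domain);
    return records.length > 0;   // domain can receive email, continue to the SMTP check
  } catch {
    return false;                // no MX records: mark the email as undeliverable
  }
}

hasMxRecords('someone@example.com').then(ok =>
  console.log(ok ? 'MX found, run SMTP check' : 'undeliverable')
);
```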

How to Use It

  1. Upload a list of emails
  2. Click on “Enrich”
  3. Go to “People” and select “Advanced Waterfall Email Address Verification”
  4. Keep the “Default Settings” or configure a “Custom Waterfall” and use your API keys
  5. Map the column with the emails as “Input property”
  6. Click on “Continue to outputs configuration”

Returned Data:

  • Email Status – valid, invalid, risky, catch_all, unknown
  • Reason – Additional context to explain the status.
  • Suggested Email – If the address is invalid and we found a likely repaired version
  • Free Provider – True if the domain is from a free email provider (Gmail, Yahoo, Hotmail, etc.)
  • Role Account – True if the email address is a role address (support@, team@, etc.)
  • Domain – The domain part of the email, after the @. Example: gmail.com
  • MX Provider – The email provider. Examples: google, microsoft, ovh, etc.

New Enrichment: Waterfall People Search

Until now, if you had a list of accounts and wanted to find prospects within those companies, you had to export your list, upload it to a tool like Apollo or Lusha, search for prospects, export the results, and import them back into Datablist. Here's why this was a problem:

  • It added more (and manual) steps to your workflow
  • You got outdated contact information
  • You had to manage multiple subscriptions

We fixed all that with our new Waterfall People Search: now you can create prospect lists directly in Datablist!

How It Works:

You configure a search using the company domain, job title, department, and seniority. Then you can set up a fallback in case our database doesn't contain the contact you're looking for. Once a contact is found, you'll get their contact information with fresh LinkedIn data.

How To Use It:

  1. Upload an account list
  2. Click on “Enrich”
  3. Go to “People” and select “Waterfall People Search”
  4. Configure your search using: Job titles, departments and seniorities
  5. Map the company domain as “Input Property”
  6. Click on “Continue to outputs configuration”
  7. Click on the ⊕ icons to add a new column for each output
  8. Click on “Instant Run”
  9. Configure your “Run Settings” and click on “Run enrichment on X items”

Returned Data

  • First Name - Contact's first name
  • Last Name - Contact's last name
  • Full Name - Complete name
  • LinkedIn URL - Profile URL
  • LinkedIn Summary - Profile description
  • Job Title - Current position
  • Job Start Date - When they started their current role
  • Work Email - Business email address
  • Seniority - Level (owner, CXO, VP, director, manager, senior, entry, intern)
  • City - Current city
  • Region - State/province
  • Country - Current country
  • Company LinkedIn - Company's LinkedIn page

Use Cases:

  • Build a prospect list
  • Find a colleague of a prospect

And anything else that your creativity allows.

Related Resources:

How to find a prospect's colleagues

New Enrichment: Waterfall Email Finder

Many of you were frustrated that we only had one email finder and didn't let you use your own API keys. We heard your feedback and implemented not one but two new email finders, and yes, you can use your own API keys.

Alongside our existing provider Icypeas, we've added Enrow and Prospeo, which are both considered to be in the top 1% of email finding providers.

How It Works:

The new Waterfall Email Finder discovers email addresses using algorithmic patterns based on first name, last name, and company domain. Each of the providers has its unique strengths, and the good thing with Datablist's Waterfall Email Finder is that you're only charged for found emails.
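To give you an idea of the pattern-based part, here's a minimal sketch of how candidate addresses can be generated from a first name, last name, and domain. The real providers also verify which candidate actually exists, which isn't shown here:

```javascript
// A minimal sketch of pattern-based email guessing: generate likely addresses
// from a first name, last name, and company domain. Verification of which
// candidate actually exists is handled by the providers and not shown here.
function candidateEmails(firstName, lastName, domain) {
  const f = firstName.trim().toLowerCase();
  const l = lastName.trim().toLowerCase();
  return [
    `${f}.${l}@${domain}`,   // jane.doe@acme.com
    `${f}${l}@${domain}`,    // janedoe@acme.com
    `${f[0]}${l}@${domain}`, // jdoe@acme.com
    `${f}@${domain}`,        // jane@acme.com
  ];
}

console.log(candidateEmails('Jane', 'Doe', 'acme.com'));
```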

How To Use It

  1. Upload a list of prospects
  2. Click on “Enrich”
  3. Go to “People” and select “Waterfall Email Finder”
  4. Configure your own “Waterfall” or keep the default settings
  5. Map the columns with the first names, last names and domains of your prospects as “Input properties”
  6. Click on “Continue to outputs configuration”
  7. Click on the ⊕ icons to add a new column for each output and click on “Instant Run”
  8. Configure your “Run Settings” and click on “Run enrichment on X items”

Use Cases:

  • CRM enrichment
  • Prospecting

Related Resources:

How to clean and refresh CRM data

New Feature: Import Filter

When you have a file with records you don't need, you don't want to import the file as a whole; you want to keep only the matching records and drop the rest. With this feature, you can control which records get imported by applying custom filters, letting you work smarter, not harder.

How It Works:

You import your file, select your filters, and we import only the records that match them.
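In essence, the import filter is a filtering pass over the parsed rows before they become items. Here's a minimal sketch, with illustrative column names:

```javascript
// A minimal sketch of an import filter: keep only the parsed rows that match
// every condition before they become items. Column names are illustrative.
const rows = [
  { title: 'Software Engineer', location: 'Berlin' },
  { title: 'Office Manager', location: 'Munich' },
  { title: 'Backend Engineer', location: 'Berlin' },
];

const filters = [
  row => row.title.toLowerCase().includes('engineer'),
  row => row.location === 'Berlin',
];

const imported = rows.filter(row => filters.every(check => check(row)));
console.log(imported); // only the two Berlin engineering roles are imported
```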

How To Use It:

  1. Import a CSV or Excel
  2. Click on "Continue to properties"
  3. Select the file columns you want to import and define your column data types
  4. Click on "Continue"
  5. Switch the toggle to start filtering
  6. Set up your filters and click "Process X items with filters"

Use Cases:

  • Filter before importing
  • Create subsets of huge CSV files
  • Remove duplicates before importing
  • Import only records matching specific criteria (e.g., job titles, locations)

Improvement: Update Your Templates

We added this feature to let you keep your template library simple and clean, and to adjust templates based on what you learn from using them. Even if you just want to update an API key, you can do it easily.

How It Works:

You add or remove parts of your template and click the update icon.

Improvement: Full Screen View

Toggle between full-screen and compact views by closing or opening the sidebar, making it easier to focus on what matters most to you.

Removed: Email finder

The legacy email finder has been replaced by the Waterfall Email Finder.

March 13th, 2024

New Enrichments Experience

After building strong foundations for dealing with CSV files and data cleaning (deduplication, etc.), it's time to work on enrichments.

The vision is simple: the web is overwhelmed with external services to enrich companies/people, verify email addresses, guess the gender from a name, scrape URLs, etc. But it's a mess to combine all those services.

A way to do it is to use workflow automation tools (Zapier, Make, n8n) on top of spreadsheets. Yet, it is complex, error management is a mess, and it's mostly suited for event-based workflows.

Datablist aims to replace spreadsheet tools for list management (lead generation, lead scoring, product catalogs, customer management, company screening, etc.). A central hub with built-in enrichment integration.

Here are the latest developments to get enrichments as a first-class citizen in Datablist.

Enrichments Listing

Enrichments are listed in a new drawer. The top bar lets you filter between enrichments for "Companies", "People", "Translations", "Places", "AI (Artificial Intelligence)", and "URLs".

For each enrichment, the inputs and output properties are visible. The cost is displayed directly in the listing.

And a bookmark flag moves your favorite enrichments to the top.

Enrichment Runner

The "Enrichment Runner" is the screen to configure and run an enrichment. I've heard your feedback and the runner has been revamped.

Custom inputs with RichText editor

Imagine you have an enrichment with a "Full Name" input and you have "First Name" and "Last Name" in your collection. That's when you will be happy to use the new "Custom Input" feature.

You can write custom texts with variables from your properties. In the previous example, you would write "{{firstName}} {{lastName}}" to build "Full Name" input values.
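Under the hood, this kind of template expansion is conceptually as simple as the following sketch (an illustration of the idea, not Datablist's internal implementation):

```javascript
// A minimal sketch of expanding "{{variable}}" placeholders from an item's
// properties, as in the "{{firstName}} {{lastName}}" example above.
function renderTemplate(template, item) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, key) => item[key] ?? '');
}

const item = { firstName: 'Ada', lastName: 'Lovelace' };
console.log(renderTemplate('{{firstName}} {{lastName}}', item)); // "Ada Lovelace"
```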

Auto-skip items with existing data

I want the default behavior to be the least risky, so you don't lose or overwrite data. With the new runner, the default behavior is to skip items that already have data in the output properties.

For example, say you use a translation enrichment: you have a "Source" property with the text to translate and a "Target" property to store the translated text.

When you run the translation, it will translate and populate the "Target" property.

Then you add new items. The second time you run the translation enrichment, it will skip all the items that already have text in the "Target" property, so only the new items are sent to be translated.

This setting is available in the "Existing Data Rule". Other options are available to only edit data for the empty cells, or to overwrite data.
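Put simply, the default "skip" rule behaves roughly like this sketch (property names are illustrative):

```javascript
// A minimal sketch of the "skip items with existing data" rule: only items
// whose output property is empty are sent to the enrichment.
const items = [
  { source: 'Bonjour', target: 'Hello' },  // already translated, skipped
  { source: 'Merci', target: '' },         // will be sent for translation
  { source: 'Au revoir' },                 // will be sent for translation
];

const toProcess = items.filter(item => !item.target);
console.log(toProcess.length); // 2
```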

See new properties to be created

It can be complex to understand how enrichments work, with their "Settings", "Inputs", "Outputs", etc.

The outputs section is even more complex. You can ignore an output, map it with an existing property, or create a new property to store the data.

With this new runner UI, I've made some visual changes so it's easier to understand what will happen with your enrichment outputs.

New properties are shown in green.

New properties are not created until you run the enrichment. You can change the output configuration without messing with your collection data structure.

Test on first 10 items

Enrichments can mess with your data and some of them cost credits. You need to be sure the enrichment works as you expect.

Before running on all your current items, the enrichment will be run on the first 10 items. Once you have validated the results, it runs on the remaining items.

Better errors management

Dealing with external APIs can give headaches! You can get server errors (for example throttling errors).

Datablist stops the enrichment when an error occurs, to prevent any collateral damage.

But once you have seen the error message, you might want to retry the enrichment on the remaining items. This is now possible. The runner keeps track of the item IDs that have been processed. A "Retry" button is available after an error happens. On Retry, Datablist will skip the already processed items.

Async Enrichments

Another big release with the asynchronous runner! Previously, enrichments could only run from the browser. This was enough for fast enrichments, or with a small number of items. But you had to keep your browser open to enrich a large collection. This prevented me from adding long-running enrichments such as email finder, email verification, scraping, etc.

Currently, it is not possible to choose to run a specific enrichment asynchronously. Some enrichments that take a long time to be processed have been configured to run asynchronously, and others are still triggered by the browser. In the coming weeks, you will be able to select how to run the enrichment.

This opens several future possibilities such as workflow building, etc.

Data Sources for Lead Generation

Data Sources are a new kind of enrichment! Classic enrichments run on each item to provide additional data. Whereas data sources create new items. This is perfect for lead generation.

Data Sources are available from the "Import" menu.

Start from a Google Search query

This one is self-explanatory. You write a Google Search query, and it returns the Google results as items. This source can scrape a maximum of 200 results with a free account, and up to 1000 results with the Standard plan.

Google is more powerful than you might think. With operators, it is possible to build complex queries and search on specific websites.

A search such as "saas human resources site:linkedin.com/company/*" will return all SaaS companies in the HR space. You can search for LinkedIn profiles, job ads, etc.


Start from a Sitemap URL

This data source is technical but very powerful. Sitemaps are XML files listing all the webpage URLs a website has. For datablist.com, the sitemap lists all guides, blog posts, etc.

And for companies or people directories, job boards, blogs, etc. you can scrape the pages in a snap using the sitemap and the Bulk Scraper (or Links Scraper).

This data source plays nice with the "Unique value" setting available for a property. With the "Unique value" setting, you can detect a delta between two sitemap imports. Perfect for finding new job ads, newly published companies, or people.
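If you're wondering what the sitemap source does conceptually, here's a minimal sketch: fetch the sitemap XML and extract the <loc> URLs so each one can become an item. It assumes Node 18+ for the built-in fetch, the sitemap URL is just an example, and the regex parsing is a simplification:

```javascript
// A minimal sketch of a sitemap source: fetch sitemap.xml and extract the
// <loc> URLs so each one can become an item. Uses Node 18+'s built-in fetch;
// regex parsing is a simplification for illustration purposes.
async function sitemapUrls(sitemapUrl) {
  const xml = await (await fetch(sitemapUrl)).text();
  return [...xml.matchAll(/<loc>(.*?)<\/loc>/g)].map(match => match[1]);
}

// Example sitemap URL (illustrative):
sitemapUrls('https://www.datablist.com/sitemap.xml')
  .then(urls => console.log(`${urls.length} URLs found`, urls.slice(0, 3)));
```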

If you need help implementing a Lead Generation workflow using sitemaps, just contact me.

New Enrichments

LinkedIn Profile Scraper

Extract public information from LinkedIn profiles. This enrichment loads and parses LinkedIn Profile pages to get data. An option is available to fetch profiles in real time only, or to allow cached profile data.

Email Verification Premium

Complete email verification service, from syntax validation to checking that the mailbox exists and can receive emails.

Email Finder

Find a professional email address using first name, last name, and company info (name or domain). Email addresses are verified and you pay only for the emails found.

Bulk Scraper

Scrape URLs with CSS selectors. Use the proxy option to scrape protected webpages, and configure multiple selectors to scrape several texts.

Links & Email Addresses Scraper

Scrape a page and search for LinkedIn member/company profile URLs, Email Addresses, and Instagram Profiles.

Apollo People Search

Search one or more profiles using Apollo.io. Define matching Job Titles, Seniority, and company domains and get profiles.

PeopleDataLabs Person Search

Search one or more profiles using PeopleDataLabs' powerful Elasticsearch query language. Use variables from your items to build complex queries.

Instagram Profile Scraper

Extract public information from Instagram profiles in bulk. This enrichment loads and scrapes Instagram Profile pages to get data.

Find Company domains from Company names

Return the domain matching the company name. When several domains match, return the one with the most traffic.

Detect Language from a Text

Return the language code and name by analyzing a text.

Duplicates Finder

Two improvements have been made to the Duplicates Finder.

The first is a link to automatically combine or drop the remaining conflicting properties in the Duplicates Finder.

The links are available after a first "Auto-Merge" that returns the conflicting properties.

The second improvement is a new button to download the list of changes from the duplicates merging. The change list contains the modifications made to each item (updated or deleted), plus two columns for each property: "Previous {property name}" and "Destination {property name}".

Extract Menu

A new "Extract" menu has been added in the collection header. You can extract email addresses, tags, domains, etc. from texts.

Improved Splitting Property tool

The "Split Property" has been improved.

First, you no longer need to explicitly set the number of properties to create. Now, an "analysis" step scans your first 2000 items to detect the best number of properties to create.

Second, a new option is now available to group split terms by name.

New Filters

Startswith, Endswith, and RegEx filtering on texts

Startswith, Endswith, and RegEx filters are now available on texts.

RegEx expressions are powerful when you master them. Perfect for finding items that match a pattern (phone number validation, URLs).
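Here's a minimal sketch of what a RegEx filter pass looks like conceptually, e.g. keeping items whose phone value matches an international-looking pattern (the pattern and property name are illustrative):

```javascript
// A minimal sketch of a RegEx filter pass over items. The pattern and
// property name are illustrative.
const items = [
  { name: 'Acme', phone: '+4915112345678' },
  { name: 'Globex', phone: 'call the office' },
];

const pattern = /^\+\d{8,15}$/;              // "+" followed by 8 to 15 digits
const matching = items.filter(item => pattern.test(item.phone));
console.log(matching); // only the item with a valid-looking phone number
```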

Check Data Filtering documentation.

Relative filters on DateTime

You can now filter dates by comparing them to the current day.

You define three parts:

  • Next or Last
  • A number
  • A duration term: hours, days, months, years

For example: "Last 2 days".

Map Extract and Convert results into an existing property

Previously, extract and convert tools returned the results into new properties. So, after adding new items, you couldn't re-run the tools on the new items without having to create new properties.

You can now select if the results go to a new property or an existing one. Only compatible properties are available. If you convert Text to DateTime, you can only map the result property to existing DateTime properties.

Misc

  • Show a tooltip on the preview cells with text overflow
  • Allow Number and Checkbox properties for RichText variables
  • Add "Sum" in Calculations
  • Create a new property with the keyboard shortcut "p" on a collection page
  • Shortcuts to filter from the "Distinct Value" calculation
  • Convert DateTime to Text
  • Handle multiple date formats for Text to DateTime conversion
  • Shortcut to BulkEdit from the column menu
  • Allow Bulk Edit on DateTime properties

Bug Fixes

  • Fix error when editing a collection name and directly switching to another collection
  • Fix phone numbers (+XXXX) that were imported as numbers. CSV columns with texts in the format "+XXXXX" (plus sign and digits) with at least 8 digits are kept as Text.
  • Fix loading items issue when switching between collections quickly. A "loading" text was displayed and the items didn't load.


November 1st, 2023

Calculations

You can now run calculations on property values. Calculations are accessible from a property column menu.

Datablist runs the calculation in the "current view". It takes the items in this order:

  • If you have selected items in your collection, it will process them.
  • If you have a filter or a full-text search term, it will process the filtered items.
  • Otherwise, it will process all your collection items.

Calculations available for all data types:

  • Count Empty - How many items with an empty value for the property.
  • Count Filled - How many items with a value for the property.

Other calculations depend on the property data types such as Text or Number.

Calculation available for text-based data types:

  • Characters count - Return the sum of all characters. Leading and trailing spaces are not counted. Spaces in between words are.
  • Words count - Return the number of words found in the texts.
  • Count distinct values - Return facets for a property with how many times each value appears. This is great for aggregation of limited choice values (countries, status, etc.).

For number-based data types:

  • Min - Return the lowest value for the property.
  • Max - Return the highest value for the property.
  • Average - Return the sum of values divided by the number of non-empty values.
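For the technically minded, here's a minimal sketch of how a few of these calculations can be computed over one property (property names are illustrative, not Datablist's internals):

```javascript
// A minimal sketch of a few calculations over one property across the items
// in the current view. Property names are illustrative.
const items = [
  { country: 'France', score: 10 },
  { country: 'France', score: 4 },
  { country: 'Germany' },            // empty "score"
];

const countFilled = items.filter(i => i.score != null).length;   // 2
const countEmpty = items.length - countFilled;                   // 1
const distinctCountries = items.reduce((acc, i) => {
  acc[i.country] = (acc[i.country] || 0) + 1;
  return acc;
}, {});                                                          // { France: 2, Germany: 1 }

console.log(countFilled, countEmpty, distinctCountries);
```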

Check the calculations documentation to know more.

Filter Groups

Data Filtering has been improved with "Filter Groups".

With Filter Groups, you can create complex filters with different filtering operations. Filtering operations define how filters are combined. With "AND", an item must pass all conditions. With "OR", an item passes once one of the filters returns true.

Filter Groups are compatible with Saved Filters.
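Here's a minimal sketch of how an AND group and an OR group combine conditions (the conditions themselves are illustrative):

```javascript
// A minimal sketch of Filter Groups: AND requires every condition to pass,
// OR passes once any condition returns true. Conditions are illustrative.
const isEngineer = item => /engineer/i.test(item.title);
const inBerlin   = item => item.city === 'Berlin';
const inParis    = item => item.city === 'Paris';

// AND group: the item must pass every condition.
const andGroup = item => [isEngineer, inBerlin].every(cond => cond(item));

// OR group: the item passes once one condition returns true.
const orGroup = item => [inBerlin, inParis].some(cond => cond(item));

console.log(andGroup({ title: 'Data Engineer', city: 'Berlin' })); // true
console.log(orGroup({ title: 'Designer', city: 'Paris' }));        // true
```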

Duplicate Finder Improvements

Select a different algorithm for each property

Until now, a single data-matching algorithm was selected before the deduplication process. Internally, Datablist checked each property's data type to apply the selected algorithm on compatible properties, and fell back to Exact matching on the other properties (e.g. Date, Checkbox, Number).

Now, each property used for deduplication is listed in the data-matching algorithm step.

Compatible algorithms are listed according to each property's data type, and options apply only to that property.

For example, two properties might use a fuzzy matching algorithm and have different distance thresholds.

Ignore the case in the Exact algorithm

By default, Datablist Duplicates Finder is case-insensitive. But in some cases, you need to match duplicate values only when they have a similar case.

A new option is available for the "Exact" Algorithm to be case-sensitive.

Master Item Rule selection

After the data matching step, an important part of deduplication is duplicate merging. With the auto-merge algorithm, Datablist selects a master item, merges the values from the other items in it, and deletes all but the master item.

By default, the elected master item is the one with the most data.

A new setting has been added in the auto-merging assistant to change this master item selection.

Two new rules are now available:

  • Last Updated - This rule chooses the item based on the newest modified date.
  • First Created - This rule chooses the item based on the oldest creation date.

During this development cycle, the "Most Complete" default rule has also been improved. Until now, the rule checked how many properties had data. When two items had the same number of properties with data, it took the last created item.

Now, for two items with the same number of properties with data, it also checks the text length.

For two items such as:

First Name | Last Name | Notes
John       | Doe       | A great man.
John       | Doe       | A great man. Remember to contact him.

The second one will be selected as the master item. The "Notes" text is longer for the second item.
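Roughly, the "Most Complete" rule with its new tie-breaker works like this sketch (an illustration, not Datablist's actual implementation):

```javascript
// A minimal sketch of the "Most Complete" master item rule: prefer the item
// with the most filled properties, and break ties with total text length.
function filledCount(item) {
  return Object.values(item).filter(v => v !== null && v !== undefined && v !== '').length;
}

function textLength(item) {
  return Object.values(item).join('').length;
}

function pickMaster(items) {
  return [...items].sort((a, b) =>
    filledCount(b) - filledCount(a) || textLength(b) - textLength(a)
  )[0];
}

const duplicates = [
  { firstName: 'John', lastName: 'Doe', notes: 'A great man.' },
  { firstName: 'John', lastName: 'Doe', notes: 'A great man. Remember to contact him.' },
];

console.log(pickMaster(duplicates).notes); // the longer "Notes" wins the tie
```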

Normalize street names

In Data Cleaning, normalization ensures you have a uniform format across all your data. Normalization reduces errors during deduplication and you get a consistent view of your data.

I have several built-in normalizations in mind for later:

  • Company name normalization to remove suffixes such as "Inc." or "GmbH".
  • People name normalization to clean nicknames, deal with initials, etc.

Last month, I released the first normalization algorithm to deal with street names written in English.

The "Normalize Street Name" algorithm deals with abbreviations (St. == St == Street), directional words (N 45 == North 45), etc.

Other Improvements & Fixes

  • Option to auto-generate column names during import for files without headers.
  • Fix Excel export in selected items (and duplicate groups download).
  • Fix auto merging on properties with punctuation differences.
  • Show how many duplicate groups have been merged during the auto-merge process.
  • Auto-updated the disposable provider domain list and added Stop Forum Spam as a new source.
  • Fix anonymous collection import for collections with more than 10k items.
  • Auto-open the DateTime picker on cell edit.
  • Show data loss warning every 48 hours for collections not synced to the cloud (anonymous, or free account with more than 1000 items per collection).