Changelog

New features, improvements and fixes to Datablist.

August 2022

Run JavaScript code

Data transformation will be a focus over the next months. Splitting or joining properties, find and replace, and similar operations are part of your day-to-day data-cleaning tasks.

I wanted to start with a dev-friendly feature: running JavaScript code directly on your collection items. You can clean and transform any of your properties' data by writing a JavaScript function. Check our guide to scraping and enriching Facebook Group members to see how it can be used.
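
To give an idea of what such a function can look like, here is a minimal sketch that splits a full name into two properties. The function signature and the FullName/FirstName/LastName property names are illustrative assumptions, not Datablist's documented API.

    // Illustrative sketch: the runner is assumed to call this with each item
    // and to use the returned object as the updated property values.
    function transform(item) {
      const fullName = (item.FullName || '').trim();
      const [firstName, ...rest] = fullName.split(/\s+/);
      return {
        FirstName: firstName || '',
        LastName: rest.join(' '),
      };
    }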

Credit system

Datablist's goal is to be the perfect mix of a productivity tool for data management and business software that helps you grow your company. Data management alone is not enough to make an impact. Native data enrichment services and third-party API integrations will be at the core.

In marketing, SaaS APIs offer email validation, business and people enrichment, scoring, and more. Instead of moving your data from one tool to the next, Datablist will consolidate your data so you can trigger each service directly from it.

Every service charges a per-use fee, and this cost has to be passed on to Datablist customers. The first step toward this vision is a new credit system. Every month, customers receive 5,000 credits to use during the month, and top-ups are available to buy extra credits. Free users receive 500 credits on sign-up.

With this system, new third-party integrations will be possible. Feel free to reach out to me if you want a service to be integrated.

Improvements

Export filtered items

When triggering an export, Datablist checks whether you have active filters. If your collection is filtered, two options are available: export only the filtered items, or export the complete collection.

Prevent the browser from loading the previous URL on horizontal scroll when a drawer is open

Web browsers natively map horizontal scrolling to URL history navigation: scroll left to load the previous URL, and right to move forward.

This behavior is counterintuitive with Single Page Applications such as Datablist. In the data listing, you have to scroll right and left to see all your properties. Scroll too much and your browser moves you to another page.

It has happened to me many times: I open an item in the drawer, scroll horizontally to check some data, the scroll goes too far, and I leave the current page. The drawer disappears with my data unsaved.

I don't like to override native browser features, so I haven't disabled this behavior on all Datablist pages. But it is now disabled when you have the drawer open.

This should prevent most of the data loss when creating a new item or running an action.
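
For reference, one common way to block swipe-to-navigate in browsers that support it (not necessarily how Datablist implements it) is to contain horizontal overscroll on the root element while the drawer is open:

    // Hypothetical drawer hooks; overscroll-behavior-x: contain disables
    // horizontal swipe navigation in supporting browsers.
    function onDrawerOpen() {
      document.documentElement.style.overscrollBehaviorX = 'contain';
    }

    function onDrawerClose() {
      document.documentElement.style.overscrollBehaviorX = '';
    }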

Fixes

  • Fix export on collections with more than 500k items
  • Fix export on collections with a single item

July 2022

Managing collections up to 1.5 million items

Last year, I focused on building the foundations for Datablist: user management, the data table, and the basics for dealing with data. Until January 2022, Datablist could only import CSV files with 10k rows or less. This is the current limit you find on Airtable, Coda, etc.

For 2022, I wanted Datablist to handle listings of at least 1 million items. This is a comfortable limit for dealing with logs, product data sets, users, and prospect lists. Spreadsheet tools break when dealing with a few hundred thousand items.

In July I finally unlocked import for CSV files up to 1.5 million rows! (1 million for free users).

Going higher is not on the roadmap.

I can't find business use cases that need more than 1.5 million items. Bigger CSV files are for data science and are used for analytics, in read-only mode. Read-only analytics on big CSV files is possible with tools like Microsoft Power BI, and Datablist has no advantage there.

Datablist shines on data consolidation, enrichment from external files or API, and cleaning (deduplication, merging).

Stoppable import process

With big CSV files allowed, the import process can take a few minutes. We want users to have a responsive experience when using Datablist, so we added a "stop import" button to cancel an import before it finishes.
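
Conceptually, a stoppable import boils down to checking a cancellation flag between chunks. A minimal sketch, where parseNextChunk and saveChunk are hypothetical helpers rather than Datablist's actual API:

    let importCancelled = false;

    function stopImport() {
      importCancelled = true; // wired to the "stop import" button
    }

    async function runImport(parseNextChunk, saveChunk) {
      let chunk;
      // parseNextChunk is assumed to return an empty array when the file is done
      while (!importCancelled && (chunk = await parseNextChunk()).length > 0) {
        await saveChunk(chunk);
      }
    }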

Improve search and filtering for large collections

The time to process searches and filtering on your data in Datablist is proportional to the number of items. If your collection has 1 million items, it takes a thousand times longer to filter your data than with a 1k-item collection.

To scale, Datablist's filtering engine stops once it has found enough results to fill your list view. When you scroll, it resumes the search to find more results.

With this behavior, searching and filtering on hundreds of thousands of items feels the same as searching a small dataset.
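
A minimal sketch of this "stop early, resume on scroll" idea, assuming items can be read page by page from the local database (readPage and predicate are hypothetical names):

    // The caller pulls matches from the generator until the view is full,
    // then pulls again when the user scrolls.
    async function* filterItems(readPage, predicate) {
      let pageIndex = 0;
      let page;
      while ((page = await readPage(pageIndex++)).length > 0) {
        for (const item of page) {
          if (predicate(item)) yield item;
        }
      }
    }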


On top of that, any in-progress search is canceled when the search and filtering parameters change. When typing in the search box, a search runs whenever you stop typing for a moment. If you resume typing to add a keyword, the previous search request is canceled.
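
A sketch of that debounce-and-cancel behavior; runSearch and renderResults are hypothetical helpers, and the 300 ms pause is an arbitrary value for illustration:

    let debounceTimer = null;
    let currentSearchId = 0;

    function onSearchInput(query, runSearch, renderResults) {
      clearTimeout(debounceTimer);
      debounceTimer = setTimeout(async () => {
        const searchId = ++currentSearchId;
        const results = await runSearch(query);
        if (searchId !== currentSearchId) return; // a newer search superseded this one
        renderResults(results);
      }, 300); // run only after typing pauses
    }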

Persistent item drawer

I've added a persistent URL for every item. When you open an item in the drawer, the URL changes to the item's persistent URL.

Opening this URL in a new tab or in another browser will load the collection and open your item directly.

Improvements

See how many items are returned on a search or filter listing

The way Datablist processes data during a search (see above) means the engine doesn't know how many results a query will return. It simply stops when it has enough results to show. That is why I don't show a permanent counter with the number of matching items: counting how many items match a query is an intensive operation.

But this information is important when managing a dataset.

The total number of items matching a query is now available when using the select all feature.

Toggle the master checkbox, and click on "All items selected". Datablist will count all the results matching your query and replace the text with the count.
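
Counting then amounts to scanning the whole collection with the current filter, which is why it only runs on demand. A minimal sketch, reusing a hypothetical page-by-page reader:

    async function countMatches(readPage, predicate) {
      let count = 0;
      let pageIndex = 0;
      let page;
      while ((page = await readPage(pageIndex++)).length > 0) {
        count += page.filter(predicate).length;
      }
      return count;
    }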

10 June 2022

Import files up to 500k items

Another milestone in Datablist's handling of large collections. After moving the limit from 10k to 50k in April, I've been able to increase it ten-fold in May to 500k. From 10k to 500k in 4 months is a big step forward.

When importing a data file (like a CSV file), the data is parsed and stored in a local database. Datablist uses this database to filter and sort the data, and to save edits. In my tests, importing a 500k-row file takes less than 2 minutes. My goal is to import and edit files up to 1 million items.
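
As an illustration, a streaming import could look like the sketch below. It assumes Papa Parse for CSV parsing and a hypothetical saveRows helper that writes each batch to the local database; Datablist's actual stack isn't documented here.

    import Papa from 'papaparse';

    function importCsv(file, saveRows) {
      return new Promise((resolve, reject) => {
        Papa.parse(file, {
          header: true, // first row becomes the property names
          chunk: async (results, parser) => {
            parser.pause();            // wait for the batch to be stored
            await saveRows(results.data);
            parser.resume();
          },
          complete: resolve,
          error: reject,
        });
      });
    }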

Join big CSV files

In the process of increasing the number of items in a collection, I've rewritten some features. The algorithm that joins several CSV files on a unique key has been improved to handle bigger collections and edge cases.

Joining two CSV files with hundreds of thousands of items is now possible.
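
Conceptually, the join indexes one file on the unique key and then merges matching rows, so each file is scanned only once. A minimal sketch with illustrative names:

    function joinOnKey(leftRows, rightRows, key) {
      const rightByKey = new Map();
      for (const row of rightRows) {
        rightByKey.set(row[key], row);
      }
      // rows without a match keep their original values
      return leftRows.map((row) => ({
        ...row,
        ...(rightByKey.get(row[key]) || {}),
      }));
    }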

Collection Filters

Filtering data is finally available on Datablist. Select one or several filter conditions to show a subset of your collection items. Filtering conditions depend on your data types: Number properties can be filtered with numerical operators, and DateTime values are filtered by timestamp. To export your filtered view, select all items and click on "Export selected items".

Auto create properties during first import

Datablist is used both by regular users and by users coming from Google who want to perform a single task (like deduplicating a CSV file). The import process must be as straightforward as possible. On an empty collection, when importing a file, properties are auto-created using column names and detected data types. If the collection already has properties, the mapping process is shown.
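
Data type detection can be sketched as looking at a sample of each column's values; the rules below are illustrative, not Datablist's actual detection logic:

    function detectType(values) {
      const samples = values.filter((v) => v !== null && v !== undefined && String(v).trim() !== '');
      if (samples.length === 0) return 'text';
      if (samples.every((v) => !Number.isNaN(Number(v)))) return 'number';
      if (samples.every((v) => !Number.isNaN(Date.parse(v)))) return 'datetime';
      return 'text';
    }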

Clone collection

Instead of exporting a collection to a CSV file to re-import it into another collection, you can use the "Clone collection" shortcut. All the properties and the collection items are duplicated in a new collection.

Select CSV Export separator

When exporting your data to a CSV file, a new option to select the separator character is available. Choices are "comma" (default) and "semicolon".

April 2022

Import files up to 50k items

In April, I released a first step toward managing a higher number of items with Datablist. From 10k items per collection, the limit has been increased to 50k items.

A lot has changed just to multiply the limit by five. Instead of loading all the data in memory (like all spreadsheet tools), Datablist now loads it on demand from a local database. The user interface is still responsive and easy to use. And this stack will scale well to millions of items (at least it does in my head 🤞).

Performance Improvements

Faster file import

Importing files (CSV/Excel) into a collection has been improved for registered users. Previously, items were saved in the cloud during the file import process. For big files with thousands of items, this led to frustrating seconds or minutes of waiting for cloud sync before accessing the data.

Cloud sync is now asynchronous. Access and manage your data instantly after importing a file while the data is being synced to the cloud in the background.
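
The idea is "local first, cloud later": writes land in the local database immediately and are pushed to the cloud in batches afterwards. A minimal sketch with hypothetical saveLocally and pushBatch helpers:

    const pendingSync = [];

    async function saveItems(items, saveLocally) {
      await saveLocally(items);   // data is usable right away
      pendingSync.push(...items); // cloud sync happens in the background
    }

    async function flushToCloud(pushBatch) {
      while (pendingSync.length > 0) {
        const batch = pendingSync.splice(0, 500); // sync in batches
        await pushBatch(batch);
      }
    }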

Faster duplicates finder

The duplicates finder algorithm has been improved. It is now faster on big collections.

Also:

  • Duplicate comparison is now case-insensitive.
  • DateTime values are now compared.
  • Duplicates on empty values are skipped.
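
To illustrate those rules, here is a sketch of how a comparison key could be built for each value (illustrative, not Datablist's actual implementation). Two values are duplicate candidates when their keys are equal and not null.

    function duplicateKey(value) {
      if (value === null || value === undefined || value === '') return null; // skip empty values
      if (value instanceof Date) return value.getTime();                      // compare timestamps
      return String(value).trim().toLowerCase();                              // case-insensitive
    }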

Action Runner for big collections

Running actions (verify email addresses, find a LinkedIn profile for an email address, etc.) on thousands of items was challenging. The action runner now splits the collection into small parts (chunks) and sends them in sequence. A stop button is available to stop the action before all items are processed.
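
A minimal sketch of that chunked, sequential runner; runAction stands in for the actual call to the action's service:

    let actionStopped = false; // set to true by the stop button

    async function runOnCollection(items, runAction, chunkSize = 100) {
      for (let i = 0; i < items.length && !actionStopped; i += chunkSize) {
        const chunk = items.slice(i, i + chunkSize);
        await runAction(chunk); // the next chunk starts only after this one finishes
      }
    }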

Better error handling

A lot can happen in a web application. The internet connection can drop, servers might have intermittent issues, etc. Shit happens 🤷‍♂️

I continue to improve how the Datablist web client deals with errors: retries, showing feedback, and so on.
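
For example, retrying with exponential backoff is a common pattern for intermittent errors; this sketch is illustrative, not the Datablist client's actual code:

    async function withRetries(request, maxAttempts = 3) {
      for (let attempt = 1; ; attempt++) {
        try {
          return await request();
        } catch (error) {
          if (attempt >= maxAttempts) throw error;  // give up and surface the error
          const delayMs = 500 * 2 ** (attempt - 1); // 500 ms, 1 s, 2 s, ...
          await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
      }
    }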

February 2022

New features

Datablist Help Center

Learn to use Datablist and discover how to get the most out of it with our new Help Center: https://www.datablist.com/docs

New action: LinkedIn Profile Finder

This action takes a name and keyword properties and returns a LinkedIn Profile URL when found. Read our new guide: How to scrape Facebook group members and find their LinkedIn Profile.

Notifications

Long-running tasks are being moved to background jobs to improve UI reactivity. For example, when a collection is deleted, the task takes several seconds to complete, but it doesn't prevent the user from navigating the Datablist app.

Notifications have been implemented for: collection delete, item edit from the drawer, and undo and redo operations.

Lost network connections and API errors while editing items now trigger visible error notifications.

Improvements

Improve selected items export

Select the export format: CSV or Excel files.

Improve history (Undo/Redo)

History actions (Undo and Redo) are now shown directly in the collection header.

Also, it's now possible to undo collection name and icon changes. And, after creating a new item, calling "undo" will delete it.

Keep CSV row order on import

During import, file rows are split into several chunks and saved using parallel calls. Previously, this could reorder items depending on which call was saved first. File import has been improved to keep the original row order.
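
A sketch of the idea: each row gets its position from the file before any call is made, so the final order no longer depends on which parallel call finishes first (the _order field is a hypothetical name):

    async function saveInOrder(rows, saveChunk, chunkSize = 1000) {
      const calls = [];
      for (let i = 0; i < rows.length; i += chunkSize) {
        const chunk = rows.slice(i, i + chunkSize).map((row, j) => ({
          ...row,
          _order: i + j, // position in the original file
        }));
        calls.push(saveChunk(chunk)); // chunks are saved in parallel
      }
      await Promise.all(calls);
    }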

Create a collection fast with a keyboard shortcut

Press "n" to create a new Datablist collection. See Keyboard Shortcuts documentation https://www.datablist.com/docs/keyboard-shortcuts