Changelog

New features, improvements and fixes to Datablist.

July 2022

Managing collections of up to 1.5 million items

Last year, I focused on building the foundations for Datablist: user management, the data table, and the basics for dealing with data. Until January 2022, Datablist could only import CSV files with 10k rows or fewer. This is the current limit you find on Airtable, Coda, etc.

For 2022, I wanted Datablist to deal with listings of at least 1 million items. This is a comfortable limit for dealing with logs, product data sets, users, and prospect lists. Spreadsheet tools break when dealing with a few hundred thousand items.

In July I finally unlocked import for CSV files up to 1.5 million rows! (1 million for free users).

Going higher is not on the roadmap.

I can't find business use cases that need more than 1.5 million items. Bigger CSV files are for data science and are used for analytics, in read-only mode. Read-only analytics on big CSV files is possible with tools like Microsoft Power BI, and Datablist doesn't have any advantage there.

Datablist shines at data consolidation, enrichment from external files or APIs, and cleaning (deduplication, merging).

Stoppable import process

With big CSV files now allowed, the import process can take a few minutes. We want the user to have a responsive experience when using Datablist, so we added a "stop import" button to cancel the import before the end.

Improve search and filtering for large collections

The time to process searches and filters on your data in Datablist is proportional to the number of items. If your collection has 1 million items, it takes one thousand times longer to filter your data than with a 1k-item collection.

To scale, Datablist's filtering engine stops once it has found enough results to fill your list view. When you scroll, it resumes the search to find more results.

With this behavior, searching and filtering hundreds of thousands of items feels the same as searching a small dataset.
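As a rough illustration of the idea (not Datablist's actual code), the "stop when the view is full, resume on scroll" behavior maps well to a lazy generator. The names `lazy_filter` and `next_page` and the sample collection are made up for this sketch.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def lazy_filter(items: Iterable[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield matching items one at a time; nothing is scanned until requested."""
    for item in items:
        if predicate(item):
            yield item


def next_page(matches: Iterator[dict], page_size: int) -> list:
    """Pull just enough matches to fill one screen, then pause the scan."""
    return list(islice(matches, page_size))


# Hypothetical collection of 1 million items, produced lazily.
collection = ({"id": i, "name": f"user{i}"} for i in range(1_000_000))
matches = lazy_filter(collection, lambda item: item["id"] % 2 == 0)

first_page = next_page(matches, 3)   # scans only the first few items
second_page = next_page(matches, 3)  # "scroll": resumes where the scan stopped
```

The cost of showing a page depends on how quickly matches turn up, not on the size of the whole collection.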


On top of that, any search in progress is canceled when the search and filtering parameters change. When you type in the search box, a search runs whenever you pause typing for a moment. When you resume typing to add a keyword, the previous search request is canceled.
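The pattern described here is usually called debouncing with cancellation. Below is a minimal sketch with `asyncio`, assuming an async search function; the class name `DebouncedSearch`, the 0.3 second default delay, and `fake_search` are all hypothetical.

```python
import asyncio


class DebouncedSearch:
    """Run a search only after typing pauses; cancel any earlier pending or running search."""

    def __init__(self, search, delay: float = 0.3):
        self._search = search   # hypothetical async search function
        self._delay = delay     # assumed typing pause, in seconds
        self._task = None
        self.results = None

    def on_input(self, query: str) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()  # drop the previous request
        self._task = asyncio.create_task(self._run(query))

    async def _run(self, query: str) -> None:
        await asyncio.sleep(self._delay)          # wait for the typing pause
        self.results = await self._search(query)  # still cancelable while awaiting


async def demo() -> list:
    calls = []

    async def fake_search(query):
        calls.append(query)
        await asyncio.sleep(0.01)
        return [query]

    box = DebouncedSearch(fake_search, delay=0.05)
    box.on_input("data")
    await asyncio.sleep(0.01)   # the user keeps typing before the pause elapses
    box.on_input("datablist")
    await asyncio.sleep(0.2)
    return calls
```

Running `asyncio.run(demo())` only ever calls the search with the final query, because the first request is canceled before its delay elapses.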

Persistent item drawer

I've added a persistent URL for every item. When you open an item in the drawer, the URL changes to the item's persistent URL.

Opening this URL in a new tab or in another browser loads the collection and opens your item directly.

Improvements

See how many items are returned on a search or filter listing

The way Datablist processes data during a search (see above) means the engine doesn't know how many results a query can return. It just stops when it has enough results to show. That is the reason I don't show a "Counter" all the time with the number of matching items: counting how many items match a query is an intensive operation.

But this information is important when managing a dataset.

The total number of items matching a query is now available when using the select all feature.

Toggle the master checkbox, and click on "All items selected". Datablist will count all the results matching your query and replace the text with the value.

10 June 2022

Import files up to 500k items

Another milestone in Datablist handling large collections. After moving the limit from 10k to 50k in April, I was able to increase it tenfold in May, to 500k. From 10k to 500k in 4 months is a big step forward. When importing a data file (like a CSV file), the data is parsed and stored in a local database. Datablist uses this database to filter and sort the data, and to save edits. In my tests, importing a 500k file takes less than 2 minutes. My goal is to import and edit files of up to 1 million items.

Join big CSV files

In the process of increasing the number of items in a collection, I've rewritten some features. The algorithm to join several CSV files on a unique key has been improved to handle bigger collections and edge cases.

Joining two CSV files with hundreds of thousands of items is now possible.
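For readers curious about how a join on a unique key can scale, here is a minimal hash-join sketch. It is a generic technique, not Datablist's implementation: one file is indexed by key in a dictionary, and the other is streamed once. The function name, the sample data, and the choices to skip empty keys and keep the first row per key are assumptions.

```python
import csv
import io


def join_on_key(left_rows, right_rows, key):
    """Hash join: index one file by key, then stream the other file once."""
    index = {}
    for row in right_rows:
        if row.get(key):                     # assumption: rows with an empty key are skipped
            index.setdefault(row[key], row)  # assumption: keep the first row per key
    for row in left_rows:
        match = index.get(row.get(key), {})
        yield {**match, **row}               # left values win on shared columns


customers = csv.DictReader(io.StringIO("email,name\nann@example.com,Ann\nbob@example.com,Bob\n"))
plans = csv.DictReader(io.StringIO("email,plan\nann@example.com,pro\n"))
joined = list(join_on_key(customers, plans, "email"))
```

Each file is read once, so the work grows linearly with the number of rows instead of comparing every row with every other row.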

Collection Filters

Filtering data is finally available on Datablist. Select one or several filter conditions to show a subset of your collection items. The available conditions depend on your data types: Number properties can be filtered with numerical operators, and DateTime values are filtered relative to timestamps. To export your filtered view, select all items and click on "Export selected items".
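To make the idea of typed filter conditions concrete, here is a small sketch. The operator table, the `(property, operator, value)` condition shape, and the sample items are all invented for the example; the actual operators offered by Datablist are not listed here.

```python
import operator
from datetime import datetime, timezone

# Hypothetical operator table for this sketch.
OPERATORS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def comparable(value):
    """Compare DateTime values as timestamps; leave other values unchanged."""
    return value.timestamp() if isinstance(value, datetime) else value


def apply_filters(items, conditions):
    """Keep the items that match every (property, operator, value) condition."""
    return [
        item for item in items
        if all(
            OPERATORS[op](comparable(item[prop]), comparable(expected))
            for prop, op, expected in conditions
        )
    ]


items = [
    {"name": "A", "price": 5, "created": datetime(2022, 1, 10, tzinfo=timezone.utc)},
    {"name": "B", "price": 20, "created": datetime(2022, 5, 2, tzinfo=timezone.utc)},
    {"name": "C", "price": 30, "created": datetime(2021, 12, 1, tzinfo=timezone.utc)},
]
recent_and_pricey = apply_filters(
    items,
    [("price", ">", 10), ("created", ">", datetime(2022, 1, 1, tzinfo=timezone.utc))],
)
```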

Auto create properties during first import

Datablist is used both by regular users and by users coming from Google who want to perform a single task (like deduplicating a CSV file). The import process must be as straightforward as possible. On an empty collection, when importing a file, properties are auto-created using column names and detected data types. If the collection already has properties, the mapping process is shown.

Clone collection

Instead of exporting a collection to a CSV file to re-import it in another collection, you can use the shortcut "Clone collection". All the properties and the collection items are duplicated in a new collection.

Select CSV Export separator

When exporting your data to a CSV file, a new option to select the separator character is available. Choices are "comma" (default) and "semicolon".
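For reference, this is how a separator option typically looks when writing CSV with Python's standard `csv` module. The `export_csv` function and the `SEPARATORS` mapping are illustrative names, not part of Datablist.

```python
import csv
import io

# The two choices named above, mapped to their delimiter characters.
SEPARATORS = {"comma": ",", "semicolon": ";"}


def export_csv(items, columns, separator="comma"):
    """Serialize items to CSV text using the chosen separator."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=SEPARATORS[separator],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


rows = [{"name": "Ann", "city": "Paris, FR"}]
with_commas = export_csv(rows, ["name", "city"])
with_semicolons = export_csv(rows, ["name", "city"], separator="semicolon")
```

Note that a value containing the separator gets quoted, so `Paris, FR` is wrapped in quotes in the comma export but not in the semicolon export.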

April 2022

Import files up to 50k items

In April, I released a first step toward managing a higher number of items with Datablist. From 10k items per collection, the limit has been increased to 50k items.

A lot has changed just to multiply the limit by five. Instead of loading all the data in memory (like all spreadsheet tools), Datablist now loads it on demand from a local database. The user interface is still responsive and easy to use. And this stack will scale well to millions of items (at least it does in my head 🤞).
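A minimal sketch of the "load on demand" idea, using SQLite from Python's standard library as a stand-in for the local database (which database Datablist uses is not stated here). The table layout and function names are assumptions for the example.

```python
import sqlite3


def open_collection(rows):
    """Store items in a local database instead of keeping them all in memory."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    db.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
    return db


def load_window(db, offset, limit):
    """Fetch only the rows needed for the visible part of the list."""
    cursor = db.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return cursor.fetchall()


db = open_collection((i, f"item {i}") for i in range(50_000))
visible = load_window(db, offset=1_000, limit=3)
```

Only the requested window is materialized in memory, whatever the size of the collection.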

Performance Improvements

Faster file import

Importing files (CSV/Excel) into a collection for registered users has been improved. Previously, items were saved in the cloud during the file import process. For big files with thousands of items, this led to frustrating seconds or minutes spent waiting for cloud sync before accessing the data.

Cloud sync is now asynchronous. Access and manage your data instantly after importing a file while the data is being synced to the cloud in the background.

Faster duplicates finder

The Duplicates Finder algorithm has been improved. It is now faster on big collections.

Also:

  • Duplicates comparison is now case insensitive
  • DateTime values are now compared
  • Duplicates on empty values are skipped
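The comparison rules listed above can be sketched in a few lines. This is a generic grouping approach, not the actual Duplicates Finder code; `find_duplicates` and the sample items are hypothetical.

```python
from collections import defaultdict


def find_duplicates(items, prop):
    """Group items sharing the same value for prop, ignoring case and empty values."""
    groups = defaultdict(list)
    for item in items:
        value = item.get(prop)
        if value is None or value == "":
            continue                                   # empty values are never duplicates
        groups[str(value).casefold()].append(item)     # case-insensitive key
    return [group for group in groups.values() if len(group) > 1]


contacts = [
    {"id": 1, "email": "Ann@Example.com"},
    {"id": 2, "email": "ann@example.com"},
    {"id": 3, "email": ""},
    {"id": 4, "email": ""},
    {"id": 5, "email": "bob@example.com"},
]
duplicate_groups = find_duplicates(contacts, "email")
```

Grouping through a dictionary needs a single pass over the collection, which is what keeps this kind of search fast on big collections.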

Action Runner for big collections

Running actions (verify email addresses, find a LinkedIn profile for an email address, etc.) on thousands of items was challenging. The action runner now splits the collection into small parts (chunks) and sends them in sequence. A stop button is available to stop the action before all the items are processed.
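Here is a small sketch of chunked, stoppable processing. The chunk size of 100, the `threading.Event` used as a stop signal, and `fake_action` (a stand-in for a remote API call) are assumptions made for the example.

```python
import threading


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_action(items, action, chunk_size=100, stop_event=None):
    """Send chunks in sequence; stop before the next chunk once stop_event is set."""
    results = []
    for chunk in chunked(items, chunk_size):
        if stop_event is not None and stop_event.is_set():
            break
        results.extend(action(chunk))
    return results


stop = threading.Event()
sent = []


def fake_action(chunk):
    """Hypothetical stand-in for an API call processing one chunk."""
    sent.append(len(chunk))
    if len(sent) == 2:
        stop.set()  # simulate the user pressing the stop button
    return [email.lower() for email in chunk]


emails = [f"USER{i}@EXAMPLE.COM" for i in range(250)]
processed = run_action(emails, fake_action, chunk_size=100, stop_event=stop)
```

Because the stop signal is checked between chunks, results from chunks already sent are kept and the remaining items are left untouched.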

Better error handling

A lot can happen in a web application. The Internet connection can be lost, servers might have intermittent issues, etc. Shit happens 🤷‍♂️

I continue to improve how the Datablist web client deals with errors: retries, showing feedback, etc.
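As an example of what a retry can look like, here is a generic sketch with exponential backoff. The number of attempts, the delays, and the choice to retry only on `ConnectionError` are assumptions, not a description of the Datablist client.

```python
import time


def with_retries(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `call`, retrying on ConnectionError with a doubling delay between attempts."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise                         # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


delays = []
state = {"calls": 0}


def flaky_save():
    """Hypothetical request that fails twice before succeeding."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("network lost")
    return "saved"


# Record the delays instead of actually sleeping, to keep the example fast.
result = with_retries(flaky_save, sleep=delays.append)
```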

February 2022

New features

Datablist Help Center

Learn to use Datablist and discover how to get the most out of it with our new Help Center: https://www.datablist.com/docs

New action: LinkedIn Profile Finder

This action takes a name and keyword properties and returns a LinkedIn Profile URL when found. Read our new guide: How to scrape Facebook group members and find their LinkedIn Profile.

Notifications

Long-running tasks are being moved to background jobs to improve UI responsiveness. For example, when a collection is deleted, the task takes several seconds to complete, but it doesn't prevent the user from navigating in the Datablist app.

Notifications have been implemented for collection deletion, item edits from the drawer, and undo and redo operations.

Network loss and API errors while editing items now trigger visible error notifications.

Improvements

Improve selected items export

Choose between CSV and Excel as the export format.

Improve history (Undo/Redo)

History actions (Undo and Redo) are now shown directly in the collection header.

Also, it's now possible to undo collection name and icon changes. And, after creating a new item, calling "undo" will delete it.

Keep CSV rows order on import

During import, file rows are split into several chunks and saved using parallel calls. Before, this could reorder items depending on which call was saved first. File import has been improved to keep the order of the file rows.
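One common way to keep the order with parallel saves is to tag each row with its position in the file before sending it. The sketch below shows that technique; it is an assumption about how the problem can be solved, not a description of Datablist's fix, and `fake_save` simulates calls that finish out of order.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def save_rows_in_order(rows, save_chunk, chunk_size=2, workers=4):
    """Save chunks in parallel, tagging each row with its position in the file."""
    indexed = list(enumerate(rows))
    chunks = [indexed[i:i + chunk_size] for i in range(0, len(indexed), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(save_chunk, chunks))  # completion order does not matter


store = []
lock = threading.Lock()


def fake_save(chunk):
    """Hypothetical save call; earlier chunks are made slower on purpose."""
    time.sleep(0.03 / (chunk[0][0] + 1))
    with lock:
        store.extend({"position": pos, "value": row} for pos, row in chunk)


save_rows_in_order(["a", "b", "c", "d", "e", "f"], fake_save)
restored = [item["value"] for item in sorted(store, key=lambda item: item["position"])]
```

Sorting on the stored position restores the file order no matter which call finished first.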

Create a collection fast with a keyboard shortcut

Press "n" to create a new Datablist collection. See the Keyboard Shortcuts documentation: https://www.datablist.com/docs/keyboard-shortcuts

November 2021

New features

Automatic duplicates merging

Last month, we released Datablist Duplicates Finder to find and list collection items with duplicate values. Yet, processing those duplicates was a manual task.

Not anymore! In November, Duplicates Finder got an upgrade to automatically merge non-conflicting items.

What are non-conflicting items?

  • Items with similar values for all their properties
  • Items with complementary values
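The two cases above can be expressed as a single rule: two items can be merged automatically when no property has two different non-empty values. Here is a minimal sketch of that rule, reading "similar" as "equal", which is an assumption; the function names and sample items are made up.

```python
def is_empty(value):
    return value is None or value == ""


def merge_if_compatible(a, b):
    """Return the merged item, or None if a property has two different non-empty values."""
    merged = {}
    for prop in a.keys() | b.keys():
        left, right = a.get(prop), b.get(prop)
        if not is_empty(left) and not is_empty(right) and left != right:
            return None  # conflict: needs a manual decision
        merged[prop] = right if is_empty(left) else left
    return merged


first = {"email": "ann@example.com", "name": "Ann", "phone": ""}
second = {"email": "ann@example.com", "name": "", "phone": "555-0100"}
third = {"email": "ann@example.com", "name": "Anna", "phone": ""}

complementary = merge_if_compatible(first, second)  # values complete each other
conflicting = merge_if_compatible(first, third)     # "Ann" vs "Anna"
```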

Merging Assistant

For items with conflicting values found with Datablist Duplicates Finder or to merge items directly from a collection, we introduce a Merging Assistant.

It shows two or more items, automatically selects the item with the most information as the Primary Item, and allows merging data from the other items into this Primary Item.

Import/Export Excel files

In addition to CSV, Microsoft Excel files have been added as import and export format.

To import a Microsoft Excel file, the data must be on the first worksheet and have a table-like structure.

Improvements

Properties data types

In November, built-in data types received improvements. Current built-in types are:

  • Text
  • Long Text
  • Checkbox
  • Email
  • Url
  • Datetime
  • Date
  • Number

During data import (CSV or Excel), Datablist scans the first 100 lines to automatically detect data types.
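To give an idea of how sampling-based detection can work, here is a simplified sketch covering a few of the types above. The detection rules (for example, what counts as an Email or a Date) are assumptions for the example and are much cruder than a real importer would need.

```python
from datetime import date


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _is_date(text):
    try:
        date.fromisoformat(text)  # assumption: ISO dates such as 2021-11-30
        return True
    except ValueError:
        return False


# Checked in order; the first type matching every sampled value wins.
CHECKS = [
    ("Checkbox", lambda t: t.lower() in {"true", "false"}),
    ("Number", _is_number),
    ("Date", _is_date),
    ("Email", lambda t: "@" in t and " " not in t),
    ("Url", lambda t: t.startswith(("http://", "https://"))),
]


def detect_type(values, sample_size=100):
    """Guess a column type from its first `sample_size` values, ignoring empty cells."""
    sample = [v.strip() for v in values[:sample_size] if v.strip()]
    for name, check in CHECKS:
        if sample and all(check(v) for v in sample):
            return name
    return "Text"  # fallback when no stricter type fits
```

For instance, `detect_type(["1", "2.5", "", "10"])` gives `"Number"`, while a column mixing words and numbers falls back to `"Text"`.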

Collection search and sort actions are compatible with all those built-in data types.

In the detail drawer, data validation is performed to reject invalid data inputs.

Empty cell on backspace/delete key press

Last month, bulk delete was added to empty several cells with the "Delete"/"Backspace" key. Now it also works on a single cell.

Fix freeze on big CSV file import

On very large files (more than 50 MB), the Datablist CSV Importer could freeze as browser memory became full.

Now, CSV files are read chunk by chunk to handle large files.

Currently, Datablist only imports the first 10k lines, but this is an important step toward higher volumes of data.
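The chunk-by-chunk reading described above can be sketched as follows. This is a Python stand-in for what happens in the browser, with an assumed chunk size of 1,000 rows; `read_in_chunks` is a hypothetical name, and the 10k cap mirrors the limit mentioned above.

```python
import csv
import io
from itertools import islice


def read_in_chunks(stream, chunk_size=1000, max_rows=10_000):
    """Yield lists of parsed rows without ever loading the whole file in memory."""
    reader = csv.DictReader(stream)
    limited = islice(reader, max_rows)  # stop after the import limit
    while True:
        chunk = list(islice(limited, chunk_size))
        if not chunk:
            break
        yield chunk


# Small in-memory file standing in for a large CSV.
sample = io.StringIO("id,name\n" + "".join(f"{i},user{i}\n" for i in range(25)))
chunk_sizes = [len(chunk) for chunk in read_in_chunks(sample, chunk_size=10, max_rows=22)]
```

Memory use is bounded by the chunk size rather than by the file size, which is what avoids the freeze on large files.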