Have you ever faced the challenge of counting distinct words in a CSV or Excel file? Whether you're working with survey responses, customer feedback, or any text-heavy data, identifying unique words can offer invaluable insights.

Spreadsheet tools can count the number of distinct values in a column. But when each cell contains several words, the Excel formulas needed to count them become too complicated.

In this article, we will guide you through counting distinct words (and splitting cells by separator) in a column using Datablist, a robust CSV viewer and editor with advanced data-cleaning features.

Count Distinct words with separator

How many distinct words are in my CSV file?

Step 1: Import Your CSV or Excel File

The first step to get the distinct word count is to load your CSV file. Datablist is a powerful CSV viewer that loads CSV files with up to 1.5 million rows.

In the Datablist Application, create a new collection (using the "+" in the sidebar) to load your CSV file.

Import CSV file

Follow the import wizard to get your CSV data in Datablist.
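If you want to reproduce this first step outside Datablist, here is a minimal sketch using Python's standard `csv` module. It assumes a UTF-8 file with a header row. The function name `read_column` and the `feedback` column are hypothetical examples, not part of Datablist.

```python
import csv
import io
from typing import TextIO


def read_column(f: TextIO, column: str) -> list[str]:
    """Return every value of `column` from a CSV file that has a header row."""
    reader = csv.DictReader(f)
    # DictReader maps each row to {header: value}; a short row leaves the
    # missing cells as None, which we normalize to an empty string.
    return [row[column] or "" for row in reader]


# Hypothetical sample data; quoted cells may contain commas.
sample = io.StringIO('id,feedback\n1,"great app, fast"\n2,slow\n')
print(read_column(sample, "feedback"))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object instead of the `io.StringIO` sample.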

Step 2: Open the Calculation tool

Once the import is done, click on the column you want to count words in. Select "Perform calculation".

Open Calculations tool

The "Calculation" tool opens. Datablist provides several calculation algorithms.

For text columns:

For numeric columns:

Step 3: Count Distinct Words with or without a separator

Select "Count distinct values". An extra option appears to define a "Splitting Rule", which tells Datablist whether each cell holds a single value or several words separated by a delimiter.

The available separators are Comma, Semicolon, Dot, Space, and Custom. Selecting Custom reveals an extra field where you can type any string as the separator.
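To make the splitting rule concrete, the sketch below shows one way such a count can work: split each cell on the chosen separator, then tally every piece. It is an illustration, not Datablist's implementation. Trimming spaces around each piece, skipping empty pieces, and treating case as significant are assumptions, and the names `SEPARATORS` and `count_distinct` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical mapping mirroring the separator choices; a custom separator
# is simply any other string passed to count_distinct.
SEPARATORS = {"comma": ",", "semicolon": ";", "dot": ".", "space": " "}


def count_distinct(cells: Iterable[str], separator: Optional[str] = None) -> Counter:
    """Count how many times each distinct value appears.

    With separator=None each whole cell is one value; otherwise each cell
    is split on the separator string and every piece is counted.
    Assumption: pieces are trimmed and matching is case-sensitive.
    """
    counts: Counter = Counter()
    for cell in cells:
        pieces = [cell] if separator is None else cell.split(separator)
        for piece in pieces:
            word = piece.strip()  # drop spaces left around the separator
            if word:  # skip empty pieces, e.g. from "a,,b" or a trailing comma
                counts[word] += 1
    return counts


cells = ["red, blue", "blue", "green, red, blue"]
counts = count_distinct(cells, SEPARATORS["comma"])
print(len(counts), "distinct values")
for word, n in counts.most_common():  # most frequent first
    print(word, n)
```

Without a separator the same three cells give three distinct values, because each whole cell is compared as a single string. A custom separator such as `" | "` works the same way as the built-in ones.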

Configure separator

Then click "Run calculation" to start the process. The list of results appears in the drawer.

Count Distinct words with separator

Step 4: Review

Review the analysis results directly within Datablist. Hovering over a term reveals a shortcut button that creates a filter for that term.
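The filter shortcut narrows the collection to the rows containing a given term. Below is a minimal sketch of that idea in plain Python. It assumes rows are dictionaries and that a row matches when the term equals one of the cell's split values (not a substring); both the matching rule and the name `filter_rows` are assumptions for illustration.

```python
from typing import Optional


def filter_rows(rows: list[dict], column: str, term: str,
                separator: Optional[str] = None) -> list[dict]:
    """Keep rows whose `column` cell contains `term` as one of its split values."""
    kept = []
    for row in rows:
        cell = row.get(column) or ""
        # Split the same way as the distinct count so filters and counts agree.
        pieces = [cell] if separator is None else cell.split(separator)
        if term in (p.strip() for p in pieces):
            kept.append(row)
    return kept


# Hypothetical catalog rows.
catalog = [
    {"name": "Phone", "tags": "electronics, gadgets"},
    {"name": "Cable", "tags": "accessories"},
    {"name": "Tablet", "tags": "electronics"},
]
print([r["name"] for r in filter_rows(catalog, "tags", "electronics", ",")])
```

Splitting with the same rule used for counting keeps the two consistent: the number of rows kept for a term equals that term's count, as long as no cell repeats the term.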

Conclusion: Counting distinct words using a separator is a breeze with Datablist. Datablist opens CSV and Excel files alike. Its intuitive interface and powerful features make it an essential tool for data professionals.

When Is Counting Distinct Words Useful?

Counting distinct words is useful for several tasks:

  • Analyzing Tags in a Product Catalog - Example: A Tags column in a product catalog might contain values like "electronics, gadgets, accessories". Running "Count distinct values" with the Comma separator gives you the number of occurrences of each tag across the products.
  • Text Analysis - Helps in text mining, sentiment analysis, and understanding frequency patterns. Example: Consider a survey asking for user opinions. Counting distinct words in the feedback column helps identify common sentiments and themes.
  • Data Normalization - Identifies and reduces redundancy in datasets.
  • Content Creation - Assists in keyword research and content optimization.
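For free-text columns such as survey feedback, splitting on a single separator leaves punctuation and capitalization attached to words ("Fast" and "fast." would count separately). A common alternative, which is not what Datablist's splitting rule does, is to lowercase the text and extract words with a regular expression. The sketch below assumes English text where a word is a run of letters, digits, or apostrophes; the names `WORD_RE` and `word_frequencies` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: letters, digits and apostrophes form a word; anything else
# (spaces, commas, periods, exclamation marks) is a word boundary.
WORD_RE = re.compile(r"[a-z0-9']+")


def word_frequencies(responses: Iterable[str]) -> Counter:
    """Lowercase each response, tokenize it, and count every word."""
    counts: Counter = Counter()
    for text in responses:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


feedback = ["Great app, very fast!", "Fast delivery. Great support.", "Too slow"]
freq = word_frequencies(feedback)
print(len(freq), "distinct words")
print(freq.most_common(2))
```

The ASCII-only pattern is a simplification; accented or non-Latin text would need a broader character class.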