Join and Merge are two operations for combining data from several files. Merging combines several files that share the same structure into a single listing by appending their rows one after another.

Joining combines several files that have different data structures but share at least one common field. This common field is used as a key to match the rows and generate a single listing that combines the fields from all the files.

Merge and Join examples

Merge example

Let's say you have a File A:

  firstName |    lastName    |  jobTitle
    Jean    |       Doe      |  CEO
   Morgan   |     Stanlet    |  Marketing Officer

And a File B:

  firstName |    lastName    |  jobTitle
    Joe     |     Kanigan    |  Finance
   Marie    |     Filman     |  Dev

Then the result of a merge operation would be:

  firstName |    lastName    |  jobTitle
    Jean    |       Doe      |  CEO
   Morgan   |     Stanlet    |  Marketing Officer
    Joe     |     Kanigan    |  Finance
   Marie    |     Filman     |  Dev
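The merge above can be sketched in a few lines of Python. This is only an illustration, not how any particular tool implements it: the files are represented as in-memory lists of dictionaries, and the function name merge_files and the strict column check are assumptions made for this example.

```python
# Hypothetical in-memory stand-ins for File A and File B.
file_a = [
    {"firstName": "Jean", "lastName": "Doe", "jobTitle": "CEO"},
    {"firstName": "Morgan", "lastName": "Stanlet", "jobTitle": "Marketing Officer"},
]
file_b = [
    {"firstName": "Joe", "lastName": "Kanigan", "jobTitle": "Finance"},
    {"firstName": "Marie", "lastName": "Filman", "jobTitle": "Dev"},
]


def merge_files(*files):
    """Append the rows of several files that share the same columns."""
    merged = []
    expected_columns = None
    for rows in files:
        for row in rows:
            columns = tuple(row.keys())
            if expected_columns is None:
                # The first row seen defines the structure for all files.
                expected_columns = columns
            elif columns != expected_columns:
                # Assumption: a merge requires identical structure,
                # so a mismatch is treated as an error here.
                raise ValueError(f"unexpected columns: {columns}")
            merged.append(dict(row))
    return merged


merged = merge_files(file_a, file_b)
for row in merged:
    print(" | ".join(row.values()))
```

The rows keep their original order: all of File A first, then all of File B.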

Join example

Now, let's look at the join operation. When performing a join, your files can have different structures but must have a common identifier that will be used to combine the data together.

We have a File C:

          email           |      jobTitle
    jean.doe@gmail.com    |        CEO
     morgan@gmail.com     |    Marketing Officer

Then we want to join it with a File D:

          email           |         City
    jean.doe@gmail.com    |        Paris
     morgan@gmail.com     |       New York

The result of the join operation using the email identifier is:

          email           |      jobTitle         |       City
    jean.doe@gmail.com    |        CEO            |       Paris
     morgan@gmail.com     |    Marketing Officer  |      New York
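The join above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the files are in-memory lists of dictionaries, the function name join_files is hypothetical, the key is assumed to be unique in the second file, and rows without a match are dropped (an inner join), which the example above does not specify.

```python
# Hypothetical in-memory stand-ins for File C and File D.
file_c = [
    {"email": "jean.doe@gmail.com", "jobTitle": "CEO"},
    {"email": "morgan@gmail.com", "jobTitle": "Marketing Officer"},
]
file_d = [
    {"email": "jean.doe@gmail.com", "City": "Paris"},
    {"email": "morgan@gmail.com", "City": "New York"},
]


def join_files(left, right, key):
    """Combine rows from two files that share the same value for `key`."""
    # Index the second file by the key for direct lookup.
    # Assumption: each key value appears at most once in `right`.
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Assumption: unmatched rows are dropped (inner join).
            joined.append({**row, **match})
    return joined


joined = join_files(file_c, file_d, key="email")
for row in joined:
    print(" | ".join(row.values()))
```

Indexing one file by the key avoids comparing every row of File C against every row of File D, which matters once the files grow beyond a handful of rows.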

A very common data manipulation task is to bring two or more sets of data together based on a common key.