Remove duplicate objects from JavaScript array – How to & Performance comparison

The purpose of this article is to share with you the best ways to remove duplicate objects from a JavaScript array based on a specific property/key. We will also analyze their performance in terms of execution time for different lengths of the array.

If you want to learn how to remove duplicate PRIMITIVE values from a JavaScript array, check this article.

Here are the ways we will explore one by one:

  • Filter
  • Filter + Set
  • Filter + FindIndex
  • Reduce + Find
  • For
  • ForEach

Before continuing with the implementations, I want to share the data I will be using for testing. I'm sharing it here so I can omit it later in the article for better visualisation purposes.

const employees = [
    { id: 1, name: 'John Smith' },
    { id: 2, name: 'John Smith' },
    { id: 3, name: 'John Smith' },
    { id: 4, name: 'John Smith' },
    { id: 2, name: 'John Smith' },
    { id: 2, name: 'John Smith' },
    { id: 9, name: 'John Smith' },
    { id: 6, name: 'John Smith' },
    { id: 1, name: 'John Smith' },
    { id: 4, name: 'John Smith' },
];

Filter

function removeDuplicates(array, key) {
    let lookup = {};
    // Note: the assignment must be wrapped in parentheses —
    // a bare `a && b = true` is a syntax error in JavaScript.
    return array.filter(obj => !lookup[obj[key]] && (lookup[obj[key]] = true));
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

One of my favorite methods. We are taking advantage of the .filter method provided by Array.prototype. The method returns a new array containing all the elements that pass a specific test (written by us).

The removal logic is extracted into a function that accepts two arguments. The first one is the array we want to remove the duplicate objects from, and the second one is the key we want to use for comparing them.

In our case we are calling it with the employees array and passing 'id' as the key. This is our unique object identifier.

We declare a variable 'lookup', initially an empty object, and use it to collect the key values that have already been added. The callback function of .filter checks whether the current key value already exists as a property in the lookup object.

If such a value already exists, then this object has already been added to the new unique array and we skip it.
In case it doesn't exist, we add it to the lookup and set it to true.

This way, when the next object with the same key value comes along, we simply skip it by returning false from the callback function, i.e. !lookup[obj[key]] will be false.
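To make the truthy-assignment trick concrete, here is a minimal trace on plain ids: the first time an id is seen, the parenthesized assignment evaluates to true and the element passes the filter; later occurrences fail the !lookup[...] check and are dropped.

```javascript
const lookup = {};
const ids = [1, 2, 2, 3, 1];
// Keep an id only the first time we see it; record it in lookup as we go.
const unique = ids.filter(id => !lookup[id] && (lookup[id] = true));
console.log(unique); // [1, 2, 3]
```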

Filter + Set

function removeDuplicates(array, key) {
    let lookup = new Set();

    return array.filter(obj => !lookup.has(obj[key]) && lookup.add(obj[key]));
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

This method is the same as the previous one with a single difference – we are using the Set data structure instead of a plain object. I'm curious about the performance differences between these two data structures, but we will check the results in the comparison section.

Filter + FindIndex

function removeDuplicates(array, key) {
    return array.filter((obj, index, self) =>
        index === self.findIndex((el) => (
            el[key] === obj[key]
        ))
    )
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

A slightly slower method. What we basically do is use the good old filter method, but instead of saving the already added key values to a lookup object, we use findIndex to find the index of the first object with the current key value. Taking advantage of the fact that findIndex always returns the index of the first matching object, we compare it with the current object's index.

If they are the same, this means the object has not yet been added to the new unique array. If the indices differ, the comparison returns false and the object is skipped (not added again).
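The first-occurrence behavior of findIndex is the whole trick, so a tiny illustration (sample data is mine):

```javascript
const items = [{ id: 1 }, { id: 2 }, { id: 1 }];
// findIndex always returns the index of the FIRST matching element,
// so only the first occurrence of each id satisfies index === findIndex.
const firstIdx = items.findIndex(el => el.id === 1);
console.log(firstIdx); // 0 — not 2, even though id 1 appears again later
```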

Reduce + Find

function removeDuplicates(array, key) {
    return array.reduce((accumulator, element) => {
        if (!accumulator.find(el => el[key] === element[key])) {
          accumulator.push(element);
        }
        return accumulator;
      }, []);
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

Reduce is another method provided to us by the Array prototype. The "reduced" value is stored in the so-called accumulator, which is returned at the end. In our case the accumulator will contain our unique array.

On every iteration, we check whether the accumulator already contains an object whose property (key) value equals that of the current object. If there is no such object, we push the current one to the accumulator. Otherwise, we just return the accumulator unchanged. Note that because .find scans the whole accumulator on every iteration, this approach is quadratic, which will show in the performance comparison.

For

function removeDuplicates(array, key) {
    let lookup = {};
    let result = [];

    for(let i=0; i<array.length; i++) {
        if(!lookup[array[i][key]]){
            lookup[array[i][key]] = true;
            result.push(array[i]);
        }
    }

    return result;
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

Like in the first two methods, we are using a helper variable which helps us keep track of the already added objects. The only difference is that instead of the .filter method, we are using the most basic for loop. I suspect that the performance will be better compared to filter, due to the fact that we are not executing a callback function for every element. We will see in a minute.

ForEach

function removeDuplicates(array, key) {
    let lookup = {};
    let result = [];

    array.forEach(element => {
        if(!lookup[element[key]]) {
            lookup[element[key]] = true;
            result.push(element);
        }
    });

    return result;
}
console.log(removeDuplicates(employees, 'id'))
// [{ id: 1, name: 'John Smith' },{ id: 2, name: 'John Smith' },{ id: 3, name: 'John Smith' },{ id: 4, name: 'John Smith' },{ id: 9, name: 'John Smith' },{ id: 6, name: 'John Smith' } ]

Like the previous one, but instead of the basic for loop we are using .forEach. As I mentioned, it could be a little bit slower due to the fact that forEach uses the callback approach, i.e. calling a function for every element.

Here is the promised performance comparison

I'm comparing the execution time of all the methods above for different array lengths – 10, 1,000, 10,000, 100,000 and 1,000,000.

All the tests below are executed in Google Chrome, with the same input data. The results are in milliseconds.
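For reference, here is a minimal sketch of how such a timing run can be set up. The helper names (makeEmployees, timeIt) and the random-id data generation are mine, not from the original benchmark; any of the removeDuplicates variants above can be passed to timeIt.

```javascript
// Generate n objects whose ids collide roughly half the time,
// so there are real duplicates to remove.
function makeEmployees(n) {
    return Array.from({ length: n }, () => ({
        id: Math.floor(Math.random() * (n / 2)),
        name: 'John Smith',
    }));
}

// Time a single run of a removeDuplicates-style function.
function timeIt(label, fn, array) {
    const start = performance.now();
    const result = fn(array, 'id');
    const ms = (performance.now() - start).toFixed(2);
    console.log(`${label}: ${ms} ms (${result.length} unique)`);
    return result;
}

// The plain for-loop variant from above, used here as an example.
function removeDuplicatesFor(array, key) {
    const lookup = {};
    const result = [];
    for (let i = 0; i < array.length; i++) {
        if (!lookup[array[i][key]]) {
            lookup[array[i][key]] = true;
            result.push(array[i]);
        }
    }
    return result;
}

const data = makeEmployees(100000);
timeIt('For', removeDuplicatesFor, data);
```

Running each variant several times and averaging gives more stable numbers, since a single run is sensitive to JIT warm-up and garbage collection.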

Conclusion

We can conclude from the performance charts that Filter, Filter + Set, For and ForEach have very close performance results, and you can't go wrong with any of them. However, the 'nested' solutions – Filter + FindIndex and Reduce + Find – are fine for small arrays, but when the input grows they are too slow and can even freeze the browser/environment. This is the reason for excluding them from the charts for 10,000 entries and more.