Remove duplicate values from Javascript array – How to & Performance comparison

Posted by

The purpose of this article is to share with you the best ways to remove duplicate PRIMITIVE(strings, numbers and booleans) values from an array in Javascript and compare their performance in terms of execution time.

If you want to learn how to remove duplicate OBJECTS from JavaScript array check this article.

SET

One of my favorite(and simple) approaches is to use the SET data structure.

If you are not familiar with it, SET is similar to the ordinary Array with a few differences. One of the them is that you are allowed to store only unique values there, rather than the array, where a duplicates are allowed.

You can read more about SET here – https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set

We can take advantage of this to remove all duplicates.

const arr = [1,2,3,4,5,5,5,5,5];

const uniqueArray = Array.from(new Set(arr)); 
// OR
const uniqueArray = [...new Set(arr)];
// result for both of them is [1,2,3,4,5]

The important thing here is that when creating the set by new Set(arr), the value returned is of type ‘SET’. We have to convert it back to array(if we want to) by using the Array.from method or the spread operator ‘…’

Filter

The second approach we can use it the javascript .filter method

const arr = [1,2,3,4,5,5,5,5,5];

const uniqueArray = arr.filter((el, idx) => arr.indexOf(el) === idx);

What we do here is iterating over the original array and for each element do a quick check. Is the first position of this element equal to the current element index. If they are equal, this means that this element is unique, i.e. returning true and the element will be added to the new array.

As you may already spot is that we have a nested iterating over an array. The first one is .filter and the next – .indexOf, both using a simple for loops behind the scene.

In case of a large array, this could became a problem as the time complexity is n^n

For/ForEach

And the last one is using the forEach/for loop and building the logic for checking duplicates by ourselves.

As I’m really curious if there is a performance differences between forEach and for, I will implement both solutions and compare their executing time.

  • ForEach
const arr = [1,2,3,4,5,5,5,5,5];

let lookup = {};
let uniqueArr = [];
  arr.forEach(function(el) {
    if(!lookup[el]) {
      lookup[el] = true;
      uniqueArr.push(el);
    }
 });
  • For
const arr = [1,2,3,4,5,5,5,5,5];

let lookup = {};
let uniqueArr = [];
for(let i=0; i<arr.length; i++) {
   if(!lookup[arr[i]]) {
      lookup[arr[i]] = true;
      uniqueArr.push(arr[i]);
   }
}

Yes, I know it looks a bit ugly .. 🙂

Something really important to mention here is that this solution is working as expected for array with elements of the same type. With elements of type string OR type number. Not with mixed ones. What I mean is that if we apply this approach to the following array [1,’1’], the result will be [1]. This is, because we are using the lookup object keys to check for existing, but the object keys are with type string.

If your case is with mixed types, you can extend it and separate the values in different objects depending of the type.

Here is the promised performance comparison

I’m comparing the execution time for all the methods above for different array lengths – 10, 1 000, 10 000, 100 000 and 1 000 000.

All the tests below are executed in Google Chrome, with the same input data. The results are in milliseconds.

Conclusion

As a conclusion from the performance charts, we can say that the SET is definitely the winner and fastest for our task. However, as you can see the difference between all of them is not so big and all four will be able to do the job to remove the duplicates blazingly fast.

Keep in mind that SET is a data structure released in ES6, Filter and ForEach in ES5. If you are working on a project with different version(older) of Javscript, you may need to use the old fashioned For way.

Good luck!