How to compare text file lists in Julia

Say you have two lists, each in their own text file. Here’s how to find the mutually exclusive items, as well as the overlap.

Ron Erdos
Updated March 17, 2024
Tested with Julia version 1.10.2

Let’s compare the countries in the EU and NATO

Say you have two lists, each in their own text file, and you want to compare them.

For example, say you have a list of all the members of NATO in one text file named nato.txt, and all the members of the European Union (EU) in another called eu.txt.

Let’s say you want to find out:

  1. Which countries are in the EU but not NATO?
  2. What about the other way round—which countries are in NATO but not the EU?
  3. Which countries are in both NATO and the EU?

Let’s dive in.

For this example, our text files will have a different country on each line.

Here’s nato.txt in its entirety:

Albania
Austria
Belarus
Belgium
Bulgaria
Canada
Croatia
Czech Republic
Denmark
Estonia
Finland
France
Germany
Greece
Hungary
Iceland
Italy
Japan
Latvia
Lithuania
Luxembourg
Mexico
Montenegro
Netherlands
North Macedonia
Norway
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden
Turkey
United Kingdom
United States

Note that this list is for illustration only: four of the countries in it (Austria, Belarus, Japan, and Mexico) are not NATO members at the time of writing.

And here’s eu.txt in full:

Austria
Belgium
Bulgaria
Croatia
Cyprus
Czech Republic
Denmark
Estonia
Finland
France
Germany
Greece
Hungary
Ireland
Italy
Latvia
Lithuania
Luxembourg
Malta
Netherlands
Poland
Portugal
Romania
Slovakia
Slovenia
Spain
Sweden

Now for the fun part—writing our Julia code.

We’re going to put our Julia script in the same folder as our two text files—I’ll explain why in just a minute.

Which countries are in the EU but not NATO?

We can establish which countries are in the EU but not NATO with the following Julia code:

using DelimitedFiles
const DIR = @__DIR__

# Read text files into Julia variables
const NATO = readdlm("$DIR/nato.txt", '\n')
const EU = readdlm("$DIR/eu.txt", '\n')

const EU_BUT_NOT_NATO = setdiff(EU, NATO)
println(EU_BUT_NOT_NATO)

We get:

Any["Cyprus", "Ireland", "Malta"]

Okay, so that’s our answer. Cyprus, Ireland and Malta are in the EU but not NATO.

Let’s do a quick code walkthrough:

using DelimitedFiles

This tells Julia to use the DelimitedFiles package. We don’t need to install it because it ships with Julia out of the box.

const DIR = @__DIR__

Here we create a constant named DIR, which holds the path of the directory containing our Julia script. (By convention, constants are given all-uppercase names.)

By keeping the text files in the same directory as the script, it’s trivial to reference them, as per the next two lines of code.

Note that @__DIR__ is a macro built into Julia, so you don’t need to install anything for it to work.
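As a small aside, here is a sketch of two ways to build the file path from that directory. The variable names dir, path_a and path_b are just for illustration; joinpath() is part of Julia’s Base and picks the right path separator for your operating system.

```julia
# @__DIR__ expands to the directory of the file it appears in
# (or the current working directory when run from the REPL).
dir = @__DIR__

# String interpolation, as used in this tutorial
path_a = "$dir/nato.txt"

# joinpath() inserts the path separator for us
path_b = joinpath(dir, "nato.txt")

println(path_a)
println(path_b)
```

Either form works for this tutorial; joinpath() is simply a little more portable if the script might also run on Windows.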

const NATO = readdlm("$DIR/nato.txt", '\n')

In this line, we create another constant, NATO, where we’ll store a matrix of all the NATO countries as listed in our text file.
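To see what that matrix looks like, here is a minimal sketch that uses an in-memory IOBuffer as a stand-in for a small text file (the three-country sample and the names sample, countries and flat are made up for illustration).

```julia
using DelimitedFiles

# Stand-in for a small text file with one country per line
sample = IOBuffer("Albania\nCzech Republic\nNorway\n")

# Same call shape as in the tutorial, but reading from the buffer
countries = readdlm(sample, '\n')
println(size(countries))   # rows and columns of the matrix

# vec() flattens the one-column matrix into a plain vector
flat = vec(countries)
println(flat)
```

We don’t need vec() for this tutorial, because setdiff() and intersect() accept the matrix as it is, but it can be handy if you want an ordinary one-dimensional list.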

We’re using the readdlm() function from the DelimitedFiles package, and we’re using two arguments.

The first is the text file we want Julia to read.

The second argument is '\n', which tells Julia that our text file has a new item on each line; in other words, the items are delimited by newlines. (\n is the escape sequence for the newline character.) If we’d left this argument out, Julia would have defaulted to using whitespace as the delimiter, which means multi-word country names like United States would end up as two items, United and States, which is obviously not what we want.
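Here is a quick sketch of that difference, again using an in-memory IOBuffer as a stand-in for a text file (the two-line sample and the variable names are assumptions for illustration).

```julia
using DelimitedFiles

text = "United States\nFrance\n"

# Default delimiter: any whitespace, so "United States" is split in two
by_whitespace = readdlm(IOBuffer(text))

# Newline delimiter: each line stays intact as a single item
by_newline = readdlm(IOBuffer(text), '\n')

println(size(by_whitespace))
println(size(by_newline))
println(by_newline[1, 1])
```

With the default delimiter the result gains a second column to hold States, whereas the newline version keeps a single column with one country per row.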

The eagle-eyed among you might have noticed that we used double quotes to wrap the text file path ("$DIR/nato.txt"), but single quotes to wrap the newline character ('\n'). This is because the text file path is a String, which requires double quotes in Julia, and the newline character is, well, a character (the Char type in Julia), which requires single quotes.
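If you’d like to check this for yourself, a tiny sketch like the following prints the type of each literal (the variable names path and delim are just for illustration).

```julia
path = "nato.txt"   # double quotes: a String
delim = '\n'        # single quotes: a Char (exactly one character)

println(typeof(path))
println(typeof(delim))
```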

const EU = readdlm("$DIR/eu.txt", '\n')

Same deal here, but this time for the EU.

const EU_BUT_NOT_NATO = setdiff(EU, NATO)

Here we’re creating yet another constant. This one will hold a vector of the countries that are in the EU but not NATO.

We’re using Julia’s built-in function setdiff(), where the order of the arguments makes a difference. By listing the EU first (setdiff(EU, NATO)), we’re asking Julia for the countries that are in the EU but not NATO—rather than the other way around.
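To make the effect of argument order concrete, here is a minimal sketch with two tiny made-up lists (a and b are illustrative names, not part of our script).

```julia
a = ["Ireland", "France", "Malta"]
b = ["France", "Norway"]

only_in_a = setdiff(a, b)   # items of a that are not in b
only_in_b = setdiff(b, a)   # items of b that are not in a

println(only_in_a)
println(only_in_b)
```

Swapping the arguments gives a completely different answer, which is why it pays to read setdiff(x, y) as “x minus y”.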

println(EU_BUT_NOT_NATO)

Finally, we print our results to the terminal. You could just as easily write them to a text, CSV or TSV file.
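As one possible way to save the results instead of printing them, here is a sketch using writedlm() from the same DelimitedFiles package. The file name eu_but_not_nato.txt and the temporary directory are assumptions for illustration; in your own script you would pick whatever location suits you.

```julia
using DelimitedFiles

# Hypothetical result list and output location, for illustration only
results = ["Cyprus", "Ireland", "Malta"]
out_path = joinpath(mktempdir(), "eu_but_not_nato.txt")

# writedlm() writes each element of a vector on its own line
writedlm(out_path, results)

# Read the file back to confirm the round trip
println(readlines(out_path))
```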

What about the other way round—which countries are in NATO but not the EU?

All we have to do here is add an extra couple of lines to the end of our code:

const NATO_BUT_NOT_EU = setdiff(NATO, EU)
println(NATO_BUT_NOT_EU)

All we’ve done here is create a new constant, NATO_BUT_NOT_EU, which stores the result of setdiff() with the arguments the other way round: NATO first, EU second.

We get:

Any["Albania", "Belarus", "Canada", "Czech Republic", "Iceland", "Japan", "Mexico", "Montenegro", "North Macedonia", "Norway", "Turkey", "United Kingdom", "United States"]

A lot of those countries aren’t even in Europe, so that makes sense. And we know the UK used to be in the EU, but Brexited stage-right.
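If you want both sets of mutually exclusive items in a single list, Julia’s built-in symdiff() function is one option. This is a side note rather than part of our script, and the two short sample lists below are made up for illustration.

```julia
eu_sample = ["Ireland", "France", "Malta"]
nato_sample = ["France", "Norway"]

# Items that appear in exactly one of the two lists
in_one_only = symdiff(eu_sample, nato_sample)
println(in_one_only)
```

The trade-off is that the result no longer tells you which list each item came from, so the two separate setdiff() calls are still the better fit for our three questions.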

Which countries are in both NATO and the EU?

Again, all we need to do is add two extra lines to the end of our code:

const BOTH_NATO_AND_EU = intersect(EU, NATO)
println(BOTH_NATO_AND_EU)

We get:

Any["Austria", "Belgium", "Bulgaria", "Croatia", "Czech Republic", "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary", "Italy", "Latvia", "Lithuania", "Luxembourg", "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia", "Spain", "Sweden"]
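Unlike setdiff(), the order of the arguments to intersect() doesn’t change which items you get back, only the order they appear in. Here is a minimal sketch with two small made-up lists (a and b are illustrative names).

```julia
a = ["Ireland", "France", "Malta", "Spain"]
b = ["Spain", "Norway", "France"]

# The same items come back either way; the result follows
# the order of whichever list is passed first.
a_then_b = intersect(a, b)
b_then_a = intersect(b, a)

println(a_then_b)
println(b_then_a)
```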

How many is that? Let’s have Julia count by adding these lines to the end of our code:

const NUMBER_IN_BOTH = length(BOTH_NATO_AND_EU)
println(NUMBER_IN_BOTH)

Now, instead of just the list of countries that are in both NATO and the EU, we also get their count:

24

How very Jack Bauer-esque.

Of course. Our eu.txt lists 27 countries, and just three of them aren’t in nato.txt (Cyprus, Ireland and Malta, as we learned above), which leaves an overlap of 24 countries.
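That arithmetic (27 = 3 + 24) can serve as a sanity check in code too. The sketch below uses two small made-up lists and assumes neither list contains duplicate entries; the variable names are illustrative.

```julia
a = ["Ireland", "France", "Malta", "Spain"]
b = ["Spain", "Norway", "France"]

only_a = length(setdiff(a, b))    # items unique to a
shared = length(intersect(a, b))  # items in both lists

# Every item of a is either unique to a or shared with b
println(only_a + shared == length(a))
```

If this check ever printed false for your own files, a duplicated line in one of them would be the first thing to look for.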

So there you have it—with just a few lines of code, you can make (hopefully) useful comparisons with your text file lists using Julia. Thanks for reading!
