Running open-source software on Windows

7 tips for porting a Linux-targeted source code to Windows

Shoubhik R Maiti
CodeX


A Linux penguin symbol on the left, with an arrow pointing from it to a Windows symbol
Open-source software developed on Linux can be ported to Windows (Images from Pixabay)

I am a chemistry student, but my first foray into programming was when I decided to go into computational chemistry. I had been using my old Windows laptop for as far back as I can remember. So naturally, I wanted to do my testing and preparatory work for my projects on my laptop.

And this is where I ran into a whole lot of problems. Most codes in my field are open-source and were written on Linux. They are almost always a pain to run on Windows: either the program does not compile, or there are no instructions on how to compile it. There is no inherent feature these programs need that Windows cannot provide. It’s just that, because they were written and tested on Linux, Linux-only functions or system calls percolate into the source code. Some codes do provide a pre-compiled binary for Windows (NAMD, for instance), but they are few and far between.

Cygwin and WSL are options, but they are slow, particularly with disk I/O. After a lot of trying with Cygwin, I gave up and dived into the source code. So far, I have managed to modify and build two packages (GAMESS and MRCC) using native compilers (Visual C++ and Intel) on Windows.

Why is it difficult to build on Windows?

A hand comes out of a laptop screen carrying an error message window
Trying to build open-source software on Windows! (Image from Pixabay)

The problem of compiling open-source scientific software on Windows is nothing new. It lies partly with Microsoft itself and partly with the developers who write this open-source software.

Microsoft did not make it easy to get a compiling environment running on Windows. MS Visual Studio is supposed to be the main development environment, but it only provides a C/C++ compiler; what if you need Fortran? You have to install either Intel Visual Fortran or one of the open-source alternatives like LLVM (flang) or GNU (gfortran), and all of them are a pain to set up.

It is also difficult to use the command line (cmd.exe) because there is no consistency in argument parsing. On Linux, the shell parses the arguments based on a fixed set of rules and then feeds the separated arguments to the program being called. On Windows, the command prompt simply takes the whole command and sends it to the program, and it’s up to the individual program to process the arguments. This means that, for example, if you enclose text containing spaces in double quotes, it might be treated as one argument by one program and as multiple arguments by another. All of this means that someone needs to spend a lot of time becoming familiar with the peculiarities of Windows before coding for it.

Windows also does not come with a default build system, unlike Linux, where building the software is sometimes as easy as running make install on the command line.

On the other hand, the developers writing the software are not entirely blameless: they use Linux-only features, sometimes deliberately, which do not compile on Windows. This is unnecessary in most cases, as the open-source codes I have come across are command-line codes that do not need OS-specific features; the functionality provided by standard libraries is enough. Many developers refuse to even consider having a Windows build. The standard response to the question “Why doesn’t it work on Windows?” is “Why don’t you install Linux?”.

Despite all of these problems, I believe that many open-source programs can be compiled on Windows with minimal changes. This is because standard C/C++ and Fortran source code can be compiled on both Windows and Linux. Most of the time, low-level OS functionality (for Linux: unistd.h, for Windows: windows.h) is not necessary, as I mentioned earlier.

I am not talking about porting software that is written specifically for Linux. That is way more difficult, and I am not yet a good enough programmer to hand out advice about that! No, I am only talking about general-purpose open-source code which is mostly a command-line application and has probably been developed on Linux. Such code can be ported to run natively on Windows.

Porting a source code to Windows

1) Check the differences in compilers

If I understand correctly, the traditional compilers for Linux systems are provided by the GNU toolset. So for C/C++ the compilers are gcc and g++, and for Fortran it is gfortran. These compilers are almost always available on Linux (or easily installed), so most developers write code that supports GNU compilers. There are also LLVM and Intel compilers that you can install on Linux.

For Windows, the native C/C++ compiler is Microsoft’s own Visual C/C++ compiler. It can be installed as a part of Visual Studio. (It can also be installed without the IDE, through the standalone Build Tools for Visual Studio.) However, there is no Fortran compiler provided by Microsoft. Intel provides Visual Fortran (along with its own C/C++ compiler), which can generate native executables for Windows. (There are also ports of the GNU and LLVM compilers available.)

Each of these compilers has its own peculiarities. Some trends that I have noticed:

  1. The compiler command line options are all different. The argument format is also different. Linux compilers are run on the command line as compiler -argument1 option1 -argument2 option2 . For Visual C++ and Intel compilers, the default style is compiler /argument1:option1 /argument2:option2 . (However, writing compiler -argument1:option1 -argument2:option2 is sometimes allowed.)
  2. On Linux, Fortran compilers typically run the preprocessor automatically on files with an uppercase extension (such as .F or .F90). On Windows, where file names are case-insensitive, it generally has to be requested explicitly, for Intel Fortran with the argument /fpp .
  3. On Linux, most Fortran compilers export symbols in lowercase with an underscore at the end, following the GNU convention. So, a subroutine named mysubrt would be exported as mysubrt_ . On Windows, Intel Fortran exports symbols in uppercase with no underscore, i.e. mysubrt would become MYSUBRT . This is only a problem if you are trying to interface C/C++ with Fortran. Many codes which do this fail to link on Windows for this specific reason. (The Windows port of gfortran follows the GNU convention, however.)
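
The naming difference in item 3 can often be absorbed on the C side with a small macro. The following is only a minimal sketch, not taken from any particular code: FORTRAN_NAME and the build flag FORTRAN_UPPERCASE are hypothetical names, and the uppercase branch assumes the Intel Fortran default on Windows described above.

```c
/* FORTRAN_UPPERCASE is a hypothetical flag: define it at build time
   (for example with /DFORTRAN_UPPERCASE) when the Fortran compiler
   exports uppercase names without a trailing underscore. */
#ifdef FORTRAN_UPPERCASE
#define FORTRAN_NAME(lower, UPPER) UPPER
#else
#define FORTRAN_NAME(lower, UPPER) lower##_
#endif

/* Two-step stringification so the macro is expanded before quoting. */
#define STRINGIFY_RAW(x) #x
#define STRINGIFY(x) STRINGIFY_RAW(x)

/* Declaration of the Fortran subroutine under whichever name applies.
   Fortran passes arguments by reference, hence the pointers. */
void FORTRAN_NAME(mysubrt, MYSUBRT)(int *i, int *j);

/* Returns the symbol name that the C code will ask the linker for. */
const char *mysubrt_symbol(void) {
    return STRINGIFY(FORTRAN_NAME(mysubrt, MYSUBRT));
}
```

C code would then call FORTRAN_NAME(mysubrt, MYSUBRT)(&i, &j) everywhere, and only the build flag changes between platforms.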

2) Signal handling is limited in Windows

C and C++ codes can intercept “signals”, which are messages from the OS to the program that something unexpected has happened (e.g. Ctrl+C or Ctrl+Break was pressed, a division by zero happened, or the program tried to access a memory location that is not accessible to it). The point of signal handling is to give a program some control over how it responds to exceptions and errors.

Signal handling is done by installing special functions as handlers. Those functions will be called when the signal is received by the program. For instance, if I install a handler for SIGINT then the program will call the handler function when Ctrl+C is pressed, instead of shutting down the program, which is the default action.

The C standard library defines 6 signals — SIGABRT (abnormal termination), SIGFPE (floating point error), SIGILL (illegal instruction), SIGINT (ctrl+c signal), SIGSEGV (illegal storage access) and SIGTERM (termination request) . These are supported by both Linux and Windows. Windows also supports SIGBREAK (Ctrl+Break) additionally.

However, on Linux, there are a whole slew of other signals (for example SIGHUP, SIGPIPE, SIGALRM and SIGUSR1) that can be intercepted by the code. If open-source codes handle those signals, they cannot be compiled on Windows as they are. The solution is to remove the handling of those signals (because they would never be sent by Windows) and intercept other signals that might be needed on Windows (from the list above). Fortunately, anything other than the standard signals is rarely needed in academic open-source programs.

Also, on Linux signal handlers are usually set through struct sigaction and the sigaction() function. The function that handles the signal is stored in the sa_handler member of the structure. On Windows, only the basic C signal() function is present.

For example, the following code for Linux sets handle_sig as the signal handler for SIGINT:

#include <signal.h>
void handle_sig(int signal) {
...
}
struct sigaction act = {0}; // act is an instance of the structure sigaction, with all fields zeroed
act.sa_handler = &handle_sig; // handle_sig is the handler in act
sigaction(SIGINT, &act, NULL);
/* Here, structure act is installed as the sigaction structure for SIGINT, which means handle_sig becomes the signal handler */

On Windows it would become:

#include <signal.h>
void handle_sig(int signal) {
...
}
signal(SIGINT, handle_sig);

That’s it for Windows: short and simple.
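
Since signal() is part of standard C, a handler installed this way should build unchanged on both systems. Below is a minimal sketch under that assumption; the function name install_and_raise and the flag interrupted are made up for illustration, and raise() is used only to trigger the handler without pressing Ctrl+C.

```c
#include <signal.h>

/* Flag set by the handler; sig_atomic_t is the type standard C
   allows a signal handler to write to safely. */
static volatile sig_atomic_t interrupted = 0;

/* Handler: only records that SIGINT arrived. */
static void handle_sig(int sig) {
    (void)sig;
    interrupted = 1;
}

/* Hypothetical helper: installs the handler, sends SIGINT to the
   current process and reports whether the handler ran.
   Returns 1 if the handler ran, 0 if not, -1 if installation failed. */
int install_and_raise(void) {
    if (signal(SIGINT, handle_sig) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT); /* standard C: delivers the signal to this program */
    return interrupted;
}
```

A real program would check the flag inside its main loop and shut down cleanly, rather than doing the work inside the handler.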

3) Beware of missing libraries

On Linux, most libraries, when installed, are placed in standard directories (such as /usr/lib) or add their paths to LD_LIBRARY_PATH or some other environment variable. When source code that depends on a library is compiled, the build looks for the library in those locations or in that library’s specific environment variable.

However, this is not the case on Windows. Often, libraries are provided as .lib and .dll files in an archive. There is usually no installation process, so environment variables are not modified; you just extract the archive containing the files.

In those cases, the compiler (actually, the linker) is not able to find those libraries, and the build fails. For Visual C++ or Intel compilers, the error LNK2019 (unresolved external symbol) is often caused by this.

The solution is to point the compiler (or linker) to the library file (.lib). Most open-source programs are compiled on the command line. For Visual C++ or Intel compilers, the path to the library file has to be added to the environment variable LIB or passed to the linker through the /libpath: argument. During the linking process, the names of the library files also have to be included.

This is quite complicated and the exact command you need to use would depend on the libraries you need and the build system you are using.

For example, if I am compiling a Fortran program named calculate.f90 and I am using the subroutine mysub provided by a separate library mylibrary.lib like this:

program calculate
integer :: i,j  !some variables
!some code here
call mysub(i,j) !external subroutine called
!some more code here
end program calculate

Then I have to link like this on Intel Fortran:

ifort calculate.f90 /link C:\path\to\mylibrary.lib

Or, more simply:

ifort calculate.f90 C:\path\to\mylibrary.lib

In the first case, the library is passed directly to the linker; in the second case, the compiler recognizes the file as a library and automatically passes it to the linker.

Without the library, the linker will throw an Unresolved external symbol error.

If you are using GNU compilers, they do not use Microsoft’s link.exe linker, so the above are not applicable.

4) POSIX-only header files

This is another nightmare that you might occasionally come across while porting C/C++ code. Fortran does not really have headers like C, so this does not apply there. This is one of the reasons porting Fortran code is easier.

OS-specific header files in C/C++ expose core functionalities of the OS and allow very low-level operations. Unfortunately, this means that they are not really portable as Windows and Linux kernels do things differently. Fortunately though, this type of functionality is rarely required, and often there are almost perfect equivalents between Linux and Windows.

For example, in Linux the header file unistd.h provides the function usleep(time) which will pause the process thread invoking the function for time microseconds.

#include <stdio.h>
#include <unistd.h>
int main(){
// some code here
usleep(1000); // pauses the program for 1000 microseconds i.e. 1 ms
// after 1 ms, the code starts here again
}

If you attempt to compile this on Windows, the compiler will give an error as the unistd.h header does not exist on Windows.

The solution* would be to replace it with one of the functions provided by the Windows C headers. A quick Google search will tell you that on Windows, the Sleep() function, declared in windows.h, does a similar job. However, Sleep(time) pauses for time milliseconds. So, you can modify it to something like this:

#include <stdio.h>
#include <windows.h>
int main(){
// some code here
Sleep(1); // pauses for 1 ms
// resumes after 1 ms
}

*Even though you could solve missing headers by going through the source files one by one, it’s far easier to use MinGW. It basically provides translations of many Linux functions to Windows functions and supplies them through the usual headers. The programs compiled by MinGW are completely native Windows programs, so you avoid having to replace the missing headers yourself.
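
If you do edit the source by hand, a small wrapper keeps the change in one place instead of at every call site. This is a sketch only: sleep_ms is a hypothetical helper name, and the _DEFAULT_SOURCE definition is an assumption about glibc-based Linux systems, where it makes usleep() visible even in strict ISO C mode.

```c
#ifdef _WIN32
#include <windows.h>
#else
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* asks glibc to declare usleep() */
#endif
#include <unistd.h>
#endif

/* Hypothetical helper: pauses the calling thread for ms milliseconds.
   Returns 0 on success, non-zero if the underlying call reports an error. */
int sleep_ms(unsigned int ms) {
#ifdef _WIN32
    Sleep(ms); /* Sleep() takes milliseconds and returns nothing */
    return 0;
#else
    /* usleep() takes microseconds; some systems reject values of one
       second or more, so whole seconds are handled by sleep(). */
    if (ms >= 1000u) {
        sleep(ms / 1000u);
    }
    return usleep((ms % 1000u) * 1000u);
#endif
}
```

The rest of the code then calls sleep_ms(1) on both systems, and the unit conversion lives in a single function.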

5) Look at the compiler errors/build system errors

Most of the time, the compiler errors are very informative about what went wrong. Since you are only porting existing code, not writing new code, the cause is almost always a missing library or a missing object file (i.e. Unresolved externals ).

In some rare cases though, there are macro name collisions. Macros are instructions read by the C or Fortran preprocessor that tell it how to modify the source file before the actual compilation. For example, if you are compiling on Windows, almost all compilers define the macro _WIN32 . This allows a programmer to write Windows-specific code which is compiled only when _WIN32 is defined.

#if defined _WIN32
// Some C++ code for Windows
#else
// C++ code for other systems
#endif

Software developers sometimes use their own custom macros, which are defined through a command line argument. Let’s say that I developed an open-source program on Linux and used a macro named _WIN32 to turn off the compilation of a certain part of the code. (It’s a made-up example, no one would do this!) On Linux, _WIN32 is not predefined, so it behaves as a user-defined macro. When you attempt to compile that source on Windows, _WIN32 is always defined, so that part of the code will never be compiled.

This type of macro name collision is very difficult to detect. Most of the time, the compiler won’t even complain of anything going wrong, and will just compile the program. Then when the program is run, there will probably be a multitude of errors. Sometimes, if the user-defined macro has the same name as one of the system’s pre-defined macros, you might get an error in a header file.
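
One way to reduce the chance of such a collision, when you are free to rename things, is to give build switches a project-specific prefix and to use the compiler’s predefined names only for platform checks. The sketch below assumes a made-up switch called MYCODE_NO_PLOTTING; neither that name nor the two functions come from any real code.

```c
/* MYCODE_NO_PLOTTING is a hypothetical, project-prefixed switch that
   would be passed on the command line (-DMYCODE_NO_PLOTTING or
   /DMYCODE_NO_PLOTTING). The prefix makes a clash with a macro
   predefined by the compiler or the system headers unlikely. */
int plotting_enabled(void) {
#ifdef MYCODE_NO_PLOTTING
    return 0;
#else
    return 1;
#endif
}

/* _WIN32 is left to mean only what the compiler says it means. */
int built_for_windows(void) {
#ifdef _WIN32
    return 1;
#else
    return 0;
#endif
}
```

With this split, defining or not defining the project switch has the same effect on every platform.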

Build system errors will also be instructive in knowing what is going wrong.

6) Look out for shell invocations

Sometimes, programmers run shell commands from within the source code, mainly to perform file system operations. This is provided in C/C++ by the system() function and in Fortran by the system() subroutine (a common compiler extension) or the standard execute_command_line subroutine.

For example, this code in C will copy file1 to dir directory in Linux:

#include <stdlib.h>
int main(){
system("cp file1 dir/");
return 0;
}
// works on Linux

Generally, these types of shell calls are rare. Obviously, the same code would not work on Windows, because system() invokes the Windows command prompt (cmd.exe), which does not recognize the cp command. Additionally, the path separator is a backslash on Windows, which needs to be escaped in C.

#include <stdlib.h>
int main(){
system("copy file1 dir\\");
return 0;
}
// works on Windows
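
For simple operations like this one, an alternative to translating the shell command is to drop the shell call and use standard C file I/O, which behaves the same on both systems. The function below is a minimal sketch of that idea; copy_file is a hypothetical helper, it takes the full destination file name rather than a directory, and it does not preserve permissions or timestamps.

```c
#include <stdio.h>

/* Hypothetical helper: copies the file src to the file dst using only
   standard C I/O. Returns 0 on success and -1 on failure. */
int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    if (in == NULL) {
        return -1;
    }
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buffer[4096];
    size_t n;
    int status = 0;
    /* Read a block at a time and write out exactly what was read. */
    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            status = -1;
            break;
        }
    }
    if (ferror(in)) {
        status = -1;
    }
    fclose(in);
    if (fclose(out) != 0) {
        status = -1;
    }
    return status;
}
```

A call such as copy_file("file1", "dir/file1") would replace both versions of the system() call, since the Windows file functions also accept forward slashes in paths.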

7) Always test

So, after all these modifications, you finally managed to get the software to compile. Does that mean the software works? No, just because the compiler did not throw any errors does not mean the final executable works.

Fortunately, most academic open-source programs come with tests. These are sample calculations you can run to check whether your build works, and whether it gives the correct numerical results within an acceptable margin.

You should always run as many of these tests as you can, to ensure that the software you built is actually running as intended.
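
Results from a different compiler and OS will rarely match the reference output digit for digit, so comparisons are usually made within a tolerance. The function below is one possible sketch of such a check; within_tolerance and the choice of a relative tolerance are my own assumptions, and the test suite shipped with a given program may define its margins differently.

```c
/* Hypothetical helper: returns 1 if computed agrees with reference
   to within the relative tolerance rel_tol, and 0 otherwise. */
int within_tolerance(double computed, double reference, double rel_tol) {
    double diff = computed - reference;
    if (diff < 0.0) {
        diff = -diff;
    }
    double scale = (reference < 0.0) ? -reference : reference;
    return diff <= rel_tol * scale;
}
```

For a reference value of exactly zero, a relative check like this only accepts an exact match, so an absolute tolerance would be needed in that case.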

Is it worth it?

Is all of this trouble to port a piece of software worth it? Many people would recommend switching to Linux or setting up dual boot. Despite several advances, I still don’t believe Linux is suitable for the common end user. A lot of utility software still does not run on Linux, and only runs on Windows and Mac OS X. It is just more convenient for me to use Windows, as I use it for everything else. (Maybe you disagree, and that’s okay, because you should choose the OS which is most convenient for you.)

So, I believe it is worth the time to attempt to port open-source codes to Windows. Most of the time, it would be less of a hassle than installing Linux.

Thanks for reading! Feel free to leave comments or questions in the responses.

Shoubhik R Maiti
PhD student in computational chemistry. Interested in theoretical chemistry, programming and data science.