Walt's Fortran FAQ

Fortran has been around since 1956. It has evolved over the years, and oftentimes older codes have not evolved at the same rate. Here are a few issues which are sometimes encountered when 'opening up the covers' of an old Fortran code. Also included are some well-reasoned answers to some otherwise highly charged political questions.
  1. Fortran Standards - why so many?
  2. Why should I always use IMPLICIT NONE?
  3. Stack vs static allocation of local variables
  4. My program requires static allocation to work - how do I fix this?
  5. What is a 'storage unit' - numeric or otherwise?
  6. Storage association (common, equivalence, caller/callee args)
  7. Dynamic memory allocation
  8. Fixed vs free source form
  9. Why should I use modules?
  10. The SAVE statement and global (COMMON/MODULE) data
  11. The old 'BLOCK DATA in a library' problem.
  12. What was Hollerith data?
  13. What are the POSIX Fortran bindings?
  14. Old Fortran vs Fortran-90 calling sequence
  15. Pointers in Fortran
  16. Function pointers in Fortran
  17. Calling C routines from Fortran
  18. Calling Fortran routines from C
  19. Why is Fortran code faster than C code (aliasing)


Fortran Standards - why so many?

There have been four major ANSI Fortran Standards. These are Fortran-66, Fortran-77, Fortran-90, and Fortran-95.

Many older codes were developed during the reign of Fortran-66. This was a spartan standard. The fundamental features included:

Fortran-66 lacked some important features. As a response, all vendors implemented quite a variety of extensions. Users liked the extensions and this created a portability nightmare. Some of the most non-portable aspects stemmed from the lack of a character data type, of file handling (no OPEN/CLOSE or random access), and of ways of dealing numerically with machines of varied word sizes. Also all array sizes were fixed at compile time (though the standard did not mandate static allocation) so there were many attempts to incorporate dynamic sizing into applications.

Fortran-77 was quite an advancement. It addressed many of the portability concerns including:

A number of other features that had been widely implemented were also incorporated. Although it still had flaws (like retaining the 6-character limit on variable names, and still not addressing dynamic memory management), Fortran-77 was a very successful standard.

MIL-STD 1753 - Standardized the following:

The MIL-STD features were almost universally incorporated into Fortran-77 compilers.  They were also fully incorporated into the Fortran-90 standard.

Fortran-90 introduced many new capabilities. Some of the major ones are:

Fortran-95 added a few capabilities from HPF (High-Performance Fortran), and also provided some clarifications and corrections to the Fortran-90 Standard.

Why should I always use IMPLICIT NONE?

Fortran has always had default typing rules for variables. Names starting with I through N mean INTEGER, the rest are assumed to be REAL. This rule may have sounded like a good idea when it was developed, and meant that fewer cards would need to be punched or spilled on the floor. However it is now well known that default typing rules create many opportunities for bugs to creep into programs.

For example a typographical error in a variable name could lie hidden - silently injecting erroneous values into an expression. Or just as bad, a misspelled name could be receiving the results of some computation which is then never reused. IMPLICIT NONE comes to the rescue by requiring a conformance between variables explicitly declared, and their usage.

Realistically, many modern optimizing compilers can report some cases of 'used but not defined' and 'defined but never used' variables. However there are cases, especially in conditional expressions, where it is impossible for a compiler to give a warning - so it must remain silent. Once again, IMPLICIT NONE comes to the rescue.

Note that even though the IMPLICIT statement has been a part of Fortran since Fortran-77, IMPLICIT NONE was not standardized until Fortran-90. It was, however, a part of MIL-STD 1753, and was almost universally implemented in Fortran-77 compilers.
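
As a minimal sketch of the hazard described above (the program and variable names are hypothetical, chosen only for illustration), the following program deliberately omits IMPLICIT NONE. The misspelled name totl is quietly created as a new REAL variable, so the intended update never reaches total:

```fortran
program implicit_typo
  ! IMPLICIT NONE is deliberately left out to show the hazard.
  ! Add "implicit none" right after the PROGRAM statement and the
  ! compiler rejects the misspelled name below instead of accepting it.
  real :: total

  total = 10.0
  ! Typo: "totl" is not declared, so default typing quietly creates
  ! a new REAL variable and the result never reaches "total".
  totl = total + 5.0
  print *, 'total =', total   ! total is still 10.0
end program implicit_typo
```

Some compilers may warn that totl is set but never used, but as noted above that cannot be relied upon in general.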

Stack vs Static Allocation of Local Variables

In the early days of Fortran, most compilers statically allocated local data. Consider the following subroutine:
 
subroutine compute (arg1, arg2)
real scratch(1000)
:
return
end


A system which uses static allocation of data will permanently set aside 1000 numeric storage units for array SCRATCH.  This space is reserved for the exclusive use of subroutine COMPUTE.  If subroutine COMPUTE represents a routine in a code which is rarely, if ever, used, then this space is wasted.

Contrast this to an environment where data is dynamically allocated. In this scenario, space for SCRATCH will only be set aside upon entry to the subroutine. As soon as the subroutine has completed, the space is released and available for reuse. This may sound expensive, and perhaps it was in the 1950s on machines with no index registers, but in reality it is not. Consider the case where the underlying run-time environment contains a data structure known as a stack for use by local data. A register is set aside as a stack pointer. This stack pointer is simply bumped forwards (or backwards) depending upon how much space is used. All local memory addresses are then expressed as offsets from this pointer.

Contrary to popular belief, no Fortran standard has ever mandated static allocation. The earliest IBM compilers simply implemented static allocation. As other vendors wrote Fortran compilers they tended to use static allocation as well. However there were exceptions. Burroughs (now Unisys) Fortran has always placed local data on a stack. Cray started offering stack allocation by compiler option in the early 1980s as multitasking started to become popular.

So there are at least two major advantages to stack allocation:

  1. Smaller memory 'footprint' (which also improves cache utilization),
  2. Isolation of data during parallel invocations
Because of the above, most modern Fortran compilers use stack allocation as the default.

My program requires static allocation to work - how do I fix this?

To isolate the problem routine, try compiling half the routines with static allocation forced (the -static flag on some compilers; the option name varies by vendor) and half without. Based on whether answers change or not, keep performing this binary search until you have found the bad routine. Note that there may be several bad routines.

Typically the problem is that one or more routines have local variables which are assumed to be the same between invocations of that routine. Consider the following subroutine which prints a page count on a listing:
 

subroutine page_out ()
integer page_number
:
page_number = page_number + 1
print *, 'page ',page_number
:
return
end subroutine


The above routine has two problems. First, it assumes that page_number was magically initialized to zero prior to the first call. Second, it assumes that the values will be retained between invocations. Neither assumption has ever been a requirement of the Standard - though it usually worked on compilers which had static allocation as a default. With stack allocation, page_number will have stack trash as an initial value.

To fix the above, first, the SAVE attribute should be specified to retain the value of page_number between calls. Second, we should give page_number an initial value. The following example shows this using a Fortran-90 style declaration:
 

subroutine page_out ()
integer, save:: page_number = 0
:
page_number = page_number + 1
print *, 'page ',page_number
:
return
end


Note that technically, just by initializing the variable (page_number = 0) in the declaration, or by using a DATA statement, the variable acquires the SAVE attribute. But spelling it out makes the usage obvious to the reader.
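
For readers who want to try this out, here is a small self-contained sketch that wraps the corrected routine in a driver. The program name page_demo and the three-call loop are assumptions added only for illustration:

```fortran
program page_demo
  implicit none
  integer :: i

  ! Call the routine three times; the page count should advance each time.
  do i = 1, 3
     call page_out ()
  end do

contains

  subroutine page_out ()
    ! Initialized once, before execution begins; SAVE keeps the value
    ! between calls regardless of stack or static allocation.
    integer, save :: page_number = 0

    page_number = page_number + 1
    print *, 'page ', page_number
  end subroutine page_out

end program page_demo
```

The pages are numbered 1, 2, 3 whether the compiler defaults to stack or static allocation.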

What is a 'storage unit' - numeric or otherwise?

As per Fortran standards going all the way back to Fortran-66, REAL, INTEGER, and LOGICAL data types are defined as using 1 'numeric storage unit'.  DOUBLE PRECISION and COMPLEX are defined as using 2 numeric storage units.

No guidelines or requirements are imposed as to how big, in terms of numbers of bits or bytes, a numeric storage unit is. This is intentional to allow Fortran to be easily implemented on a wide variety of hardware. These days one numeric storage unit tends to be 32 bits to accommodate the IEEE floating point standard. However in the past, I've used computers where a single numeric storage unit was 16, 18, 24, 32, 36, 60, and 64 bits. Even 48 bit numeric storage units are not unknown.

Note that even though DOUBLE PRECISION is required to occupy twice the storage of REAL, the standard does not require twice the precision in calculations. Thus, even if only 1 additional bit were actually used, an implementation would meet the requirements of the standard.

Fortran-77 introduced the CHARACTER data type.  Intentionally, there is no relationship defined between character storage units and numeric storage units.  This is why storage association (equivalencing and so on) between the two is undefined in the Standard - even though it is commonly implemented as an extension by many compilers.
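
Since the size of a numeric storage unit and the precision of DOUBLE PRECISION are left to the implementation, it can be useful to ask the compiler what it actually provides. The following is a minimal sketch using Fortran-90 inquiry intrinsics (BIT_SIZE, DIGITS, PRECISION). The program name is hypothetical, and the printed values depend on the machine and compiler, so none are shown:

```fortran
program storage_units
  implicit none
  integer :: i
  real :: r
  double precision :: d

  ! Inquiry functions look only at the type of their argument,
  ! so the variables do not need to be given values first.

  ! BIT_SIZE reports the number of bits in the integer model.
  print *, 'bits in default INTEGER:    ', bit_size (i)
  ! DIGITS is the number of significand digits in the model's radix.
  print *, 'significand digits, REAL:   ', digits (r)
  print *, 'significand digits, DOUBLE: ', digits (d)
  ! PRECISION is the approximate decimal precision.
  print *, 'decimal precision, REAL:    ', precision (r)
  print *, 'decimal precision, DOUBLE:  ', precision (d)
end program storage_units
```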

Storage association (common, equivalence, caller/callee args)

Storage association refers to techniques for overlaying a given area of memory with different names of different types. These techniques are not considered to be Good Programming. However they were a way to Get Things Done in early Fortran. A simple example is with the EQUIVALENCE statement:
 
subroutine junk
integer iarray(1000)
real rarray(1000)
equivalence (iarray,rarray)
: (use iarray here)
: (use rarray here)
return
end


Since integers and real elements each occupy one numeric storage unit, iarray(1) occupies the same memory location as rarray(1), iarray(2) occupies the same location as rarray(2), and so on.

In the olden days, especially with static allocation, it was common to take advantage of storage association to reuse memory. Thus, storage was overlaid in space, but not in time.  These days, between stack allocation of local data and dynamic memory management, there is little need to explicitly overlay memory in this way.

A second usage was to allow integer access to floating point (or other data) in order to get bit-level access to the data. In this case, the storage is associated in both space AND time.  This latter usage was especially common in Fortran-66 level code with packed Hollerith data. With the introduction of character data type in Fortran-77, most such code should have been thrown away years ago. In Fortran-90, the TRANSFER intrinsic allows bit-level data motion between different data types.
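
As a small sketch of the TRANSFER approach (the program and variable names are hypothetical), the following copies the bit pattern of a default REAL into a default INTEGER and back again. Both types occupy one numeric storage unit, so the sizes match. The integer value printed depends on the machine's floating point format, so it is not shown here:

```fortran
program transfer_demo
  implicit none
  real :: r, r_back
  integer :: bits

  r = 1.0
  ! Copy the bit pattern of r into an integer; no numeric conversion occurs.
  bits = transfer (r, bits)
  ! Copy the same bits back into a real.
  r_back = transfer (bits, r_back)

  print *, 'real value:      ', r
  print *, 'bits as integer: ', bits
  print *, 'round trip:      ', r_back
  if (r_back /= r) print *, 'unexpected: round trip changed the value'
end program transfer_demo
```

Unlike EQUIVALENCE, the reinterpretation is visible at the point of use rather than hidden in the declarations.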

Storage association can also occur with COMMON blocks. It is actually legal to have a given common block described in multiple ways in multiple routines. For example, in the spirit of the above routine:
 

subroutine a ()
common /scratch/ iarray(1000)
:
end subroutine
subroutine b ()
common /scratch/ rarray(1000)
:
end subroutine b


In the above, the storage associated with the common block /scratch/ is shared by the two routines.

Last, a similar effect can be seen between a caller and a callee. Consider:
 

subroutine a ()
integer iarray(1000)
:
call b (iarray)
:
end subroutine a
subroutine b (rarray)
real rarray(*)
:
end subroutine b


In the above, routine A considers the storage as integer, and routine B considers it real.

Again, better techniques are available in modern Fortran compilers to make dependence on storage association obsolete.

Dynamic memory Allocation

Two mechanisms were added in Fortran-90 for dynamic memory management: Automatic arrays, and allocatable arrays.

Automatic arrays (which date back to ALGOL-60...) are simply local arrays whose size is determined on entry to the routine. Upon invocation of the routine, storage is allocated as if on the end of a stack. The size is passed in via a dummy argument or through a global value in a common block or module. Here is a simple example:
 

subroutine sub (size)
integer size
real scratch_array(size)
:
return
end subroutine sub


Upon activation of the routine, the array is sized correctly. Then upon exit, storage is released for use by other routines. Thus there is no chance for memory leakage.

Likewise, arrays with the allocatable attribute may also have a variable size. However allocatable arrays are only allocated via the ALLOCATE statement. Additionally, they may be deallocated with the DEALLOCATE statement.
 

subroutine sub (size)
integer size
integer errno
real,allocatable:: scratch_size(:)
:
allocate (scratch_size(size),stat=errno)
:
end subroutine sub


Note in the above example that ALLOCATE can also return an error status. This allows the program to handle an allocation error condition.

Also note that if an allocatable variable has local scope in the routine (i.e., it is not a global variable contained in a module), a DEALLOCATE is not needed at the end of the routine. The compiler is required, by Fortran-95, to automatically deallocate local allocatable arrays in order to prevent memory leakage.

Allocatable arrays can be made globally accessible by placing them in a module.  In this case, no garbage collection is possible.
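
Putting the two mechanisms side by side, here is a self-contained sketch. The names dynamic_demo, work, auto_scratch and alloc_scratch are hypothetical, chosen only for illustration. It uses one automatic array and one allocatable array of the same size, and checks the status returned by ALLOCATE:

```fortran
program dynamic_demo
  implicit none

  call work (5)

contains

  subroutine work (n)
    integer, intent(in) :: n
    real :: auto_scratch(n)                ! automatic array, sized on entry
    real, allocatable :: alloc_scratch(:)  ! allocatable array, sized below
    integer :: ierr, i

    allocate (alloc_scratch(n), stat=ierr)
    if (ierr /= 0) then
       print *, 'allocation failed, stat =', ierr
       return
    end if

    do i = 1, n
       auto_scratch(i) = real (i)
    end do
    alloc_scratch = 2.0 * auto_scratch

    print *, 'sum =', sum (alloc_scratch)
    ! No DEALLOCATE needed here: Fortran-95 releases a local, unsaved
    ! allocatable array when the routine returns.
  end subroutine work

end program dynamic_demo
```

With n = 5 the sum is 2*(1+2+3+4+5) = 30.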

Fixed vs free source form

Traditional fixed source form was oriented around 80 column punch cards. The last time I used a punch card was around 1982. (And even then under duress.)

One advantage of free form over fixed involves the potential of a typographical error not being detected by the compiler - even with IMPLICIT NONE. It is possible to have a variable name going over column 72 and getting truncated, yet still being a legal name. For example, what if variable IVALUE accidentally went beyond the magic 72nd column and the 'VALUE' portion was treated as a comment? The compiler would use 'I' as the variable name and bad results would occur. With free form, this error cannot occur.

Source code can be written so it can be compiled as both fixed and free by following a few simple rules:

  1. Use ! in column one instead of C and * for comments
  2. Use & in column 73 and then again in column 6 of the next line to do continuations
  3. Fix any 'significant blank' issues
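
The rules above can be sketched with a short program (the name both_forms is hypothetical) that is intended to compile unchanged in either source form, for example with gfortran's -ffixed-form and -ffree-form options. Statements start in column 7, the trailing & sits just past column 72 where fixed form ignores it, and the leading & of the next line sits in column 6:

```fortran
! A '!' in column 1 is a comment in both source forms.
      program both_forms
      implicit none
      integer total
! The trailing '&' is past column 72 (here column 73), so fixed form
! ignores it; free form sees it as a continuation mark.
      total = 1 + 2 +                                                   &
     &        3
      print *, 'total =', total
      end program both_forms
```

Column positions matter here, so an editor that converts leading blanks to tabs or trims trailing whitespace can break the fixed form reading.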

Why should I use modules?

  1. Modules can contain both data and contained procedures
  2. Scope of module data and procedures may be specified as public or private
  3. The USE statement can control namespace problems by using renaming or only using selected items
  4. Modules can also contain specs for derived types and interfaces (both generic and otherwise.)
  5. None of the above can be done with INCLUDE
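
As a brief sketch of the points above (all names here, such as counter_mod, tally and value_now, are hypothetical), the following module hides its data, exports two procedures, and is then used with both ONLY and renaming:

```fortran
module counter_mod
  implicit none
  private                       ! hide everything by default
  public :: increment, current  ! export only these two procedures

  integer, save :: tally = 0    ! module data, invisible to users

contains

  subroutine increment ()
    tally = tally + 1
  end subroutine increment

  function current () result (n)
    integer :: n
    n = tally
  end function current

end module counter_mod

program module_demo
  ! ONLY limits what is imported; => renames CURRENT locally.
  use counter_mod, only: increment, value_now => current
  implicit none

  call increment ()
  call increment ()
  print *, 'tally =', value_now ()
end program module_demo
```

The program cannot touch tally directly, and the compiler checks every call against the module's interfaces. Neither is possible with an INCLUDE file of declarations.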

The SAVE statement and global (COMMON/MODULE) data

A little known (or implemented) feature of COMMON, and now MODULE, data, is that, even though it is global, it can still go out of scope. Consider the following program:
 
program test
common /block1/ a,b,c
:
call sub1
call sub2
:
end program
subroutine sub1
common /block1/ a,b,c
common /block2/ d, e, f
:
end subroutine sub1
subroutine sub2
common /block1/ a,b,c
common /block2/d, e, f
:
end subroutine sub2


In the above, common block /block1/ is defined in the main program and both callees. However /block2/ is only defined in the callees. If data needs to be shared between sub1 and sub2 via /block2/ there could be a problem. However if they are merely sharing scratch space there will not be a problem.

Why?

Well, the Standards, all of them, allow /block2/ to go out of scope between the calls to sub1 and sub2, so if the two subroutines need to share data, it may not be coherent. Either a SAVE statement should be used, or /block2/ should be declared at a point in the call tree where it won't go out of scope at the wrong time. The /block1/ common block was declared in the main program so never goes out of scope.

Module data is treated in a very similar fashion to COMMON data, and can also go out of scope. So there must be a SAVE statement in the module, or a USE statement in a program unit high enough in the call tree that problems are not encountered.

This concept was placed in the Standard because it allows yet another mechanism for overlaying data. This form of overlaying is rarely seen in modern implementations, but was quite common in the olden days.
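
As a minimal sketch of the SAVE fix (the program name save_demo and the values stored are assumptions for illustration), the common block is named in a SAVE statement in every subprogram where it appears, so its contents remain defined between the two calls:

```fortran
program save_demo
  implicit none
  call sub1 ()
  call sub2 ()
end program save_demo

subroutine sub1 ()
  implicit none
  real :: d, e, f
  common /block2/ d, e, f
  save /block2/          ! keep /block2/ defined after sub1 returns
  d = 1.0
  e = 2.0
  f = 3.0
end subroutine sub1

subroutine sub2 ()
  implicit none
  real :: d, e, f
  common /block2/ d, e, f
  save /block2/          ! a saved common block must be saved everywhere
  print *, d, e, f
end subroutine sub2
```

The same effect could be had by declaring /block2/ in the main program instead.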

The old 'BLOCK DATA in a library' problem.

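The problem: a linker normally pulls a routine out of a library only to satisfy an unresolved reference. Nothing ever references an unnamed BLOCK DATA, so it is silently left out and the COMMON data is never initialized. The fix: name the BLOCK DATA. Then, in the main program or one of the subprograms, insert an EXTERNAL statement and refer to the block data. E.g.: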
 
block data bd
(insert commons and data initializations here)
end block data
program junk
external bd
:
end program junk


Note that the ability to name a block data was introduced in Fortran-77.

Note also that BLOCK DATA routines are becoming obsolete with modules. Module variables can be initialized at compile time just like any other data.
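
A short sketch of the module alternative follows. The module name constants_mod and its two variables are hypothetical, chosen only to show the pattern:

```fortran
module constants_mod
  implicit none
  ! Initialized at compile time, just like data in a BLOCK DATA unit.
  real, save :: tolerance = 1.0e-6
  integer, save :: max_iterations = 100
end module constants_mod

program block_data_replacement
  use constants_mod
  implicit none

  print *, 'tolerance      =', tolerance
  print *, 'max_iterations =', max_iterations
end program block_data_replacement
```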

What was Hollerith data?

Stunningly, Fortran-66 had no character data type - only numeric types. In order to deal with characters, typeless constants, called Hollerith constants, were used. An example of a Hollerith constant is 3HABC - which specifies a 3 character constant with the value 'ABC' in it.

Since a Hollerith constant was typeless, it could be placed into any numeric data type without type conversion. Per the '66 Standard, Hollerith constants could only be used in 3 places:

  1. In DATA statements
  2. As actual arguments in a CALL statement - e.g., CALL SUB (3HABC)
  3. In FORMAT statements
Virtually all compilers extended the above to allow Hollerith constants to be used in expressions.

Of interest, consider the case where a data type could hold more than the number of characters specified in the Hollerith constant. In this case, the compiler was required to left-justify the characters and 'blank fill' the unused bits.  Note that 'zero-fill' variants, with both right- and left-justification, were common extensions to most compilers.

The only really portable use of Hollerith constants was to store 1 character per integer. Of course this was quite wasteful of memory because integers could typically hold from 3 to as many as 10 characters each. Considering the restricted memory sizes of the time, there was quite a bit of pressure to pack multiple characters into each integer. Then highly non-portable masking+shifting code was needed to extract/insert characters.

Thankfully the situation was rectified in Fortran-77 with the CHARACTER data type. Hollerith constants were moved to an appendix in the '77 Standard, and were completely gone in the '90 Standard.
 
 

What are the POSIX Fortran Bindings?


The POSIX 1003.9-1992 Fortran bindings are a standardized set of library calls for making various low level requests of the operating system. There are dozens of calls available - documented on many systems in the intro_pxf man pages. Three of the most popular are the PXFGETENV (get environment variables) and PXFGETARG (get command line arguments) subroutines, and the IPXFARGC (get argument count) function.

Some people dislike the POSIX bindings because they are not 'Fortran-90-like'. The calls were standardized in the early 1990s when Fortran-77 was still prevalent. The committee therefore took the conservative approach that, with the exception of long external names, all calls had to be usable in a Fortran-77 environment.

For example, consider the case where a program needs to know the size of a given file.  The PXFSTAT routine is used.  However, first PXFSTRUCTCREATE must be called to create an appropriate data structure for return values.  A handle is passed back to the user for reference.  The user then calls PXFSTAT - giving the desired file name and the handle as input arguments.  The PXFSTAT routine updates the structure.  Then the user calls PXFINTGET to extract the desired field from the structure.  Finally, PXFSTRUCTFREE is called to release the structure.

Old Fortran vs Fortran-90 calling sequence

Pointers in Fortran

Two main pointer types:

Fortran-90 pointers
Cray pointers (non-Standard, but commonly used)

Function pointers in Fortran

The POSIX Fortran standard defines a pair of procedures called PXFGETSUBHANDLE and PXFCALLSUBHANDLE which are sufficient for simple single-argument calls.

If the POSIX Fortran routines are not available, or if more advanced calls are needed, a pair of simple C routines may be written to obtain the address of an external name, then call it.  Many compilers follow the convention that Fortran EXTERNAL names are passed by value.  So the C routine to return the address can be written simply as:
 

long get_address_ (void (*external)()) {
  return ((long) external);
}

 

 

Call the above with something like:
 

EXTERNAL :: MY_SUB
INTEGER(KIND=big_enough_for_pointer_kind) :: get_address, my_sub_address
:
my_sub_address = get_address (my_sub)

 

 

A second C routine can be written to call a subroutine - given a pointer to it.  The following passes two call-by-reference arguments, one integer and one real:
 

void call_sub_ (void (**external)(int*, float*), int *arg1, float *arg2) {
    (*external)(arg1, arg2);
}

 

 

To call this from Fortran:
 

:
CALL CALL_SUB (my_sub_address, arg1, arg2)

 

 
 
 

Calling C routines from Fortran

The conventions for calling C routines from Fortran date back to the original f77 compiler on 7th Edition Unix, and these conventions have formed a de facto standard in many environments. Since Fortran is case-insensitive, and Unix systems tend to like things in lower case, the original compilers folded Fortran external names to lower case. Then, to distinguish the Fortran namespace from the C namespace, an underscore character was appended to the end of the Fortran name. So calling routine XYZZY would result in an external name of 'xyzzy_' - which is still a legal C name. Common block names followed similar conventions. SGI systems follow the above conventions.

For historical reasons, some systems diverge from the above. For example, in the Cray environment, when Unix came along, existing Fortran compilers, libraries, and linkers were ported from the proprietary OS to Unicos. In order to ease the conversion, the naming conventions were not changed. On these systems the names are in upper case with no underscore characters. Other systems are known to place underscores before the name.

Argument passing is another place where problems lie. Fortran implementations generally, but not always, depend on call-by-reference. The address of the actual argument is passed by value to the callee, who must then dereference the argument. Since call by reference is easy to emulate in C, there are few problems passing numeric variables.

Character variables are problematic. Fortran character data has a length associated with each datum. In C, there is no such thing as a character string - only arrays of char. So the length must be passed in via some mechanism. The de facto standard is to add one extra actual argument to the argument list for each character variable, containing the length of that character datum. The C callee can then use the extra value(s) to properly handle the strings.

Some implementations use other mechanisms to pass the necessary character string information. For example, some Cray implementations use a Fortran Character Descriptor (FCD) with both address and length passed in a single word. A special header file and macros are used to access the address and length. Again this dates back to pre-Unix implementations carried forward into Unicos.

Fortran-90 and C++ further confuse issues.

Calling Fortran routines from C

Calling a Fortran routine from C is simply the opposite of calling C routines from Fortran. Generally one calls the Fortran name with the proper case and underscore convention, passes the address of each of the arguments, and passes the lengths of character strings.

Why is Fortran code faster than C code (aliasing)

In a C function, pointers are unrestricted. That is, they can point to any location in memory without restrictions. Also, multiple pointers can point to a single object. And C uses pointers for many things - including passing arrays and data structures into sub-functions. Let's imagine that we are cruising through some code that looks like the following:

       *b = *a+ 23;
       c = *a * 42;

A good optimizing compiler would like to dereference a from memory once, and use it in both expressions.  However if there is any possibility that a and b point to the same location in memory, this would cause erroneous results.

In Fortran, the need for pointers is greatly lessened by various language features.  Arrays and data structures may be created dynamically and passed into and out of subprograms without using user-visible pointers.  The compiler knows at compile time that objects with different names point to unique places in memory.  The only exception is a variable which has the target attribute.  So in a Fortran equivalent to the above code, the compiler is free to optimize the memory reference (i.e., keep the data in a register for reuse) unless a and b are targets.
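
A rough Fortran counterpart of the two C statements above might look like the following sketch (the names alias_demo and update are hypothetical). Because the dummy arguments are neither pointers nor targets, the language rules let the compiler assume that storing into b leaves a unchanged, so a may be fetched once and kept in a register:

```fortran
program alias_demo
  implicit none
  real :: a, b, c

  a = 1.0
  call update (a, b, c)
  print *, 'b =', b, ' c =', c

contains

  subroutine update (a, b, c)
    real, intent(in)  :: a
    real, intent(out) :: b, c

    ! The compiler may load "a" once and reuse it in both expressions.
    b = a + 23.0
    c = a * 42.0
  end subroutine update

end program alias_demo
```

With a = 1.0 this gives b = 24.0 and c = 42.0.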

A C protagonist may counter that it is easy to write the above code as:

      temp = *a;
      *b = temp +23;
      c = temp * 42;

However consider a non-trivial application (say 20-100k lines of code) where speed is important.  Since C requires user-visible pointers for so many things, there may be tens of thousands of instances where the compiler is forced to be conservative.  Is the C programmer really going to look for all those situations?  The Fortran programmer need never worry - he will always get good optimization.

C99 (known as C9x while in draft) introduces a new restrict qualifier for pointers - which says that the pointee is not aliased by other pointers. Use of this qualifier can help optimization by allowing the pointee to remain in a register for reuse. But note that this is akin to locking a barn door after the animals have escaped. The default action is wrong (for speed), and once again few programmers will have the desire to look for every case where restrict can be used. The Fortran action is to go fast by default and make potential aliasing problems explicit via targets.

Page created August 29, 2000

Updated August 30, 2001