Update features documentation for -O2/-OC

2026-07-10 01:53:32 +00:00 · 2016-10-31 18:51:38 +00:00
commit 6a9861355f
parent 4f74364b59
6 changed files with 134 additions and 356 deletions
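
The updated doc/Features.md in this commit notes that `i := 127; INC(i)` on a SHORTINT produces -128 with an 8 bit representation (-O2) but +128 with a 16 bit representation (-OC). The following is a minimal C sketch of that arithmetic, not code from the voc compiler or runtime. The helper name `wrap_signed` is hypothetical, and the sketch assumes two's complement wraparound, which is what the documentation's example describes.

```c
#include <stdio.h>

/* Hypothetical helper (not part of voc): reduce `value` to the signed
   two's complement range of a `bits`-wide integer (bits = 8 or 16 here).
   Unsigned masking is used so the result does not depend on
   implementation-defined narrowing conversions in C. */
long wrap_signed(long value, int bits) {
    unsigned long modulus = 1UL << bits;
    unsigned long u = (unsigned long)value & (modulus - 1UL);
    return (u >= modulus / 2UL) ? (long)u - (long)modulus : (long)u;
}

/* Print what INC(i) would yield for i = 127 under each SHORTINT size. */
void show_shortint_increment(void) {
    long i = 127;
    printf("8 bit SHORTINT (-O2):  %ld\n", wrap_signed(i + 1, 8));
    printf("16 bit SHORTINT (-OC): %ld\n", wrap_signed(i + 1, 16));
}
```

Calling `show_shortint_increment()` from a `main` function prints both results side by side, which makes it easy to see why code that relies on a particular wrap point needs the matching compiler option.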
--- a/ReadMe.md
+++ b/ReadMe.md
@@ -171,10 +171,6 @@ See also [Features](/doc/Features.md).

 See [History](/doc/History.md).

-## Roadmap
-
-See [Roadmap](/doc/Roadmap.md).
-
 ## Contributors

 Joseph Templ developed ofront as a tool to translate Oberon-2 programs into semantically equivalent
--- a/doc/BasicTypeSize.md
+++ b/doc/BasicTypeSize.md
@@ -1,172 +0,0 @@
-## Cross Platform Compatibility and Basic Type Sizes in Vishap Oberon
-
-###### Abstract
-
-Vishap Oberon needs to support 32 and 64 bit machine architectures. 16 and
-possibly 8 bits would be good too.
-
-Currently Vishap Oberon has different INTEGER, LONGINT and SET sizes on 16
-and 32 bit architectures. While this enables memory management code to use
-LONGINT on all architectures, it breaks library and user code which makes
-assumptions about type sizes.
-
-The goal is to specify changes to the Vishap compiler and library to allow C
-code generation for multiple machine architectures without breaking existing
-code, and to allow serialized data to be interchangeable between machine
-architectures.
-
-###### Motivation
-
-Current type sizes are loosely specified and vary between implementations. There
-are conflicting general assumptions, for example: that LONGINT is large enough
-to contain any machine address; but also that LONGINT always take 32 bits when
-serialised to files. (See Oakwood guidelines appendix A 1.2.5.4.)
-
-The compiler has ended up with a number of INTEGER types, each with its own
-set of code to handle declaration, access, storage etc. There is a good
-opportunity to refactor and simplify the current duplicated code.
-
-Neither C's basic types, nor Oberon's are fixed in size. Yet for cross platform
-compatability we need fixed size types.
-
-###### Basis of implementation - integers and sets
-
-In the generated C code we use these types for all integer and set variables:
-
-| Unsigned    | Signed      | Sets   |
-| ----------- | ----------- | ------ |
-| INTEGER_U8  | INTEGER_S8  | SET_8  |
-| INTEGER_U16 | INTEGER_S16 | SET_16 |
-| INTEGER_U32 | INTEGER_S32 | SET_32 |
-| INTEGER_U64 | INTEGER_S64 | SET_64 |
-
-SYSTEM.H uses conditional compilation to derive these types from the types
-available in the C compiler we are building on.
-
-Then, with suitable compiler options we control the mapping of compiler types
-to these C types.
-
-There are three strategies that clients may wish to use:
-
-1) Emphasizing compatability with serialised data and existing code. Here
-   we fix Oberon type sizes across platforms, and introduce a new LONG64 type
-   as follows:
-
-| Oberon type | Size             |
-| ----------- | ---------------- |
-| BOOLEAN     | 8  bits          |
-| BYTE        | 8  bits unsigned |
-| SHORTINT    | 8  bits signed   |
-| INTEGER     | 16 bits signed   |
-| LONGINT     | 32 bits signed   |
-| SET         | 32 flag bits     |  
-| LONG64      | 64 bits signed   |
-
-   This gives a set of sizes that are available on all platforms (even SDCC
-   supports 64 bit ints), and which have fixed characteristics (e.g. the size of
-   character array sufficient to support any LONGINT values is fixed.)
-
-   Note that these sizes match current Vishap Oberon behaviour on x86.
-
-2) Emphasizing performant maxima. Here we make e.g. LONGINT the largest
-   efficient size available. On x86 we stick with the sizes as above, but for
-   x64 we make changes to INTEGER, LONGINT and SET as follows:
-
-| Oberon type | Size on x64      |
-| ----------- | ---------------- |
-| INTEGER     | 32 bits signed   |
-| LONGINT     | 64 bits signed   |
-| SET         | 64 flag bits     |  
-
-3) Supporting system code, especially memory management.
-
-   With SYSTEM imported, we extend the parsing of type INTEGER to accept a
-   subsequent qualifier which may be U8, S8, U16, S16, U32, S32, U64, S64 or
-   ADDRESS.
-
-   Thus the type `INTEGER ADDRESS` takes over the role of `LONGINT` in existing
-   memory management code. The compiler will map `INTEGER ADDRESS` to the
-   relevant `INTEGER_U32` or `INTEGER_U64` in generated C code.
-
-   Additionally the fixed size qualifiers U8, S8, U16, etc. allow the writing of
-   Oberon source code that generates the same C code regardless of compilation
-   options used.
-
-###### Cross platform libraries
-
-Many integral input parameters are currently coded as LONGINT with the intention
-of accepting any size of integer. E.g. Texts.WriteInt. All such code needs
-upgrading to accept LONG64 with implementation changes where necessary to
-account for the larger values. Boring, but straightforward.
-
-Some integral output parameter are currently coded as `VAR LONGINT`, for example
-the integer value field `i` in RECORD type `Scanner`. This is a problem:
-
-Assuming scenario 1 - LONGINT is always 32 bits.
-
-  - If retained as LONGINT, Scanner won't be able to handle 64 bit integers.
-  - If changed to LONG64, existing code will compile with type compatibility
-    errors.
-
-So neither option is possible on its own.
-
-The simplest workaround is to add a new field `l` and a new scanner class
-Long64 (similar to the pair of REAL and LONGREAL values already in Scanner).
-
-Existing code will continue to work with values in the 32 bit range (which is
-OK, because that's all the existing code can generate). New code can allow for
-thye LongReal case.
-
-(Ugly but workable).
-
-Oakwood says that INTEGER must be stored as 2 bytes little endian, so Files.Mod
-must use 16 bits on file for Files.ReadInt and Files.WriteInt. So what happens
-in scenario 2 above? Since INTEGER is 32 bits in scenario 2, it would be
-necessary to call Files.WriteLInt Files.ReadLInt. This is not obvious, and will
-need the coder to work around the apparent type incompatibility.  
-
-If only the type compatibility of passing a smaller integer variable to a larger
-value parameter also worked for a larger var parameter.
-
-Would this be possible?
-
-e.g.
-
-```Modula-2
-  PROCEDURE p(VAR x: LONGINT); BEGIN ... END p;
-
-  PROCEDURE q;
-  VAR r: INTEGER;
-  BEGIN p(r) END q;
-```
-
-q passes an `INTEGER` to the `VAR x: LONGINT` parameter of p. Normally this
-would be a type compatability error.
-
-If we want to defer value range checking until runtime, the compiler would have
-to behave as if q was written with a temp LONGINT variable like this:
-
-```Modula-2
-  PROCEDURE q;
-  VAR r: INTEGER; temp: LONGINT;
-  BEGIN p(temp); r := SHORT(temp) END q;
-```
-
-Not simple enough.
-
-
-###### IMPORT SYSTEM
-
-With SYSTEM imported, we allow the type INTEGER to be followed by a size and
-sign specification consiting of a letter (U for unsigned or S for signed)
-followed by a numeric bit count which may be 8, 16, 32 or 64. Additionally
-INTEGER may be followed by the word ADDRESS to request an unsigned integer type
-of the same size as a machine address.
-
-Thus we could define
-
-###### Not supported
-
-This solution does not seek to handle architectures such as the 8086/80286 where
-a generalised address is not a single numeric value. TopSpeed Modula handled
-this nicely, but we don't go that far.
--- a/doc/Features.md
+++ b/doc/Features.md
@@ -1,25 +1,130 @@
-#### (Work in progress)
+### Features
+
+#### 32 bit and 64 bit systems vs integer, set and address size.
+
+The Oberon language specification sets explicit lower bounds on the maximum and minimum
+values supported by SHORTINT, INTEGER and LONGINT, and the maximum number of items supported
+by SET.
+
+Most Oberon systems implemented exactly these lower bounds, but a few more recent systems allow
+wider ranges of values.
+
+While it may seem safe to compile code developed on earlier systems with the newer, larger
+integer and set types, it is not. Some examples:
+
+ - Code that uses MIN(INTEGER), MAX(INTEGER) etc. as flag values will run into problems if
+   it tries to store the flag value to a file using standard library functions. The Oakwood
+   guidelines specify that INTEGER be stored in 16 bits on file regardless of its size in
+   memory*.
+ 
+ - Code that assumes that INTEGER values wrap around at known values will fail. For example
+   i: SHORTINT; ... i := 127; INC(i); will produce -128 on original systems, but +128 on 
+   systems with a larger SHORTINT representation. 
+   
+ - Bit manipulation code that uses SYSTEM.VAL to access parts of values will access the
+   wrong number of bits. For example, the implementation of the REAL and LONGREAL library functions
+   uses SYSTEM.VAL(SET, realvalue) to access and change the sign, mantissa and exponent of REALs.
+   
+Therefore we provide compilation options to select the representation of SHORTINT, INTEGER, LONGINT and SET.
+
+\* It makes sense for Oakwood to insist on fixed sizes for the standard types as this is a prerequisite
+for stable file exchange between different builds of applications, and between different applications following a standard file format.
+
+
+#### Compiler options for integer and set sizes.
+
+The -O2 and -OC compiler options select between the two most commonly used integer and set
+type implementations.
+
+| Type     | -O2 option (default) | -OC option |
+| ---      | ---                  | ---        |
+| SHORTINT | 8 bit                | 16 bit     |
+| INTEGER  | 16 bit               | 32 bit     |
+| LONGINT  | 32 bit               | 64 bit     |
+| SET      | 32 bit               | 64 bit     |
+

 The following Oberon types are independent of compiler size:

-| Types          | Size   |
-| -----          | -------|
-| CHAR, SHORTINT | 8 bit  |
-| REAL           | 32 bit |
-| LONGREAL       | 64 bit |
+| Types    | Size   |
+| -----    | -------|
+| REAL     | 32 bit |
+| LONGREAL | 64 bit |
+| HUGEINT* | 64 bit |
+| CHAR**    | 8 bit  |

-The following type sizes follow the built compiler size:
+\* The additional type HUGEINT is predefined as a 64 bit integer, providing 64 bit support even
+in -O2 compilations.

-| Types          | 32 bit builds | 64 bit builds |
-| -----          | ------------- | ------------- |
-| INTEGER        | 16 bit        | 32 bit        |
-| LONGINT, SET   | 32 bit        | 16 bit        |
+\** No built-in support is provided for the UTF-16 or UCS-2 Unicode encodings. UTF-8 is the recommended Unicode encoding for text. 
+ - 16 bits has been insufficient for the Unicode character repertoire for at least 15 years. 
+ - Writing systems often require more than one Unicode codepoint to represent a single character (and what constitutes a character can vary according to context).
+ - UTF-8 is now widely used.

-HALT/exit code has been simplified. Exit now just calls the system exit API rather than calling the kill API and passing our own process ID. For runtime errors it now displayes the appropriate error message (e.g. Index out of range).
-
-Compilation errors now include the line number at the start of the displayed source line. The pos (character offset) is still displayed on the error message line. The error handling code was already doing some walking of the source file to find start and end of line - I changed this to walk through the source file from the start identifying line end positions, counting lines and caching the position at the start of the last error line. The resultant code is simpler, and displays the line number without losing the pos. The performance cost of walking the source file is not an issue.
+See [UTF-8 Everywhere](http://utf8everywhere.org/) for much more background on this recommendation.


- - In his latest specs (around 2013) Wirth removed the 'COPY(a, b)' character array copy procedure, replacing it with 'b := a'. I have accordingly enabled 'b := a' in voc as an alternative to 'COPY(a, b)' (COPY is still supported.).
+#### SYSTEM.Mod support for fixed size integers and sets.
+
+SYSTEM.Mod includes the following additional types:
+
+| Type         | Size   | Range  |
+| ---          | ---    | ---    |
+| SYSTEM.INT8  | 8 bit  | -128 .. 127 |
+| SYSTEM.INT16 | 16 bit | -32,768 .. 32,767 |
+| SYSTEM.INT32 | 32 bit | -2,147,483,648 .. 2,147,483,647 |
+| SYSTEM.INT64 | 64 bit | -9,223,372,036,854,775,808 .. 9,223,372,036,854,775,807 |
+| SYSTEM.SET32 | 32 bit | 0 .. 31 |
+| SYSTEM.SET64 | 64 bit | 0 .. 63 |
+
+Integer literals are recognised within the full signed 64 bit integer range MIN(SYSTEM.INT64) to MAX(SYSTEM.INT64). Additionally the parsing of hex notation allows negative values to be entered as a full 16 hex digits with top bit set. For example, -1 may be represented in Oberon source as 0FFFFFFFFFFFFFFFFH.
+
+
+#### The SHORT and LONG functions
+
+SHORT() of LONGINT and INTEGER values, and LONG() of SHORTINT and INTEGER values behave as
+originally specified by Oberon-2.
+
+In -O2, where LONGINT is 32 bits, LONG() now accepts a LONGINT value returning a HUGEINT value.
+
+In -OC, where SHORTINT is 16 bits, SHORT() now accepts a SHORTINT value returning a SYSTEM.INT8 value.
+
+
+#### Pointers and Addresses
+
+Most Oberon systems have implicitly or explicitly assumed that LONGINT is large enough to hold
+machine addresses. With the requirement to support 32 bit LONGINT on 64 bit systems, this is no 
+longer possible.
+
+The type SYSTEM.ADDRESS is added, a signed integer type equivalent to either SYSTEM.INT32 or SYSTEM.INT64 according to the system address size.
+
+The following SYSTEM module predefined functions and procedures now use SYSTEM.ADDRESS instead of LONGINT.
+
+*Function procedures*
+
+| Name                 | Argument types                    | Result Type    | Function            |
+| ----                 | ---                               | ---            | ---                 |
+| SYSTEM.ADR(*v*)      | any                               | SYSTEM.ADDRESS | Address of argument |
+| SYSTEM.BIT(*a*, *n*) | *a*: SYSTEM.ADDRESS; *n*: integer | BOOLEAN        | bit *n* of Mem[*a*] |
+
+*Proper procedures*
+
+| Name                         | Argument types                                                    | Function        |
+| ----                         | ---                                                               | ---             |
+| SYSTEM.GET(*a*, *v*)         | *a*: SYSTEM.ADDRESS; *v*: any basic type, pointer, procedure type | *v* := Mem[*a*] |
+| SYSTEM.PUT(*a*, *x*)         | *a*: SYSTEM.ADDRESS; *x*: any basic type, pointer, procedure type | Mem[*a*] := *x* |
+| SYSTEM.MOVE(*a0*, *a1*, *n*) | *a0*, *a1*: SYSTEM.ADDRESS; *n*: integer                          | Mem[*a1*..*a1*+*n*-1] := Mem[*a0*..*a0*+*n*-1] |
+
+Note that the standard function LEN() still returns LONGINT.
+
+
+#### Runtime error and exit handling
+
+When passed FALSE, ASSERT displays the message 'Assertion failure.'. If a second, nonzero value is passed to ASSERT it will also be displayed. ASSERT then exits to the OS passing the assert value or zero.
+
+HALT displays the message 'Terminated by Halt(n)'. For negative values that correspond to a standard runtime error, a descriptive string is also printed. Finally, HALT exits to the operating system, passing the error code.
+
+Bear in mind that both Linux and Windows generally treat the return code as a signed 8 bit value, ignoring higher order bits. Therefore it is best to restrict HALT and ASSERT codes to the range -128 .. 127.
+
+A client application may register a halt handler by calling Platform.SetHalt(p) where p: PROCEDURE(n: SYSTEM.INT32). This procedure will be called before Halt displays its message. The procedure may suppress the Halt message by calling Platform.Exit(code: INTEGER) directly.

- - Oberon code often writes to Oberon.Log expecting the text to appear on the screen. While voc has an Oberon.DumpLog procedure, I looked into making the behaviour automatic. Interestingly the voc source declares the Text notifier constants replace, insert and delete, but omits implementation of the notifier calls. The implementation turned out to be very little code, and I have used it to echo all text written to Oberon.Log to the console. This has the advantage over DumpLog that text is written immediately rather than only when DumpLog is called, and allows existing program source to work unchanged.
--- a/doc/Roadmap.md
+++ b/doc/Roadmap.md
@@ -1,150 +0,0 @@
-
-#### (Work in progress)
-
-#### Machine size issues
-
-I don't see any really good solutions to different machine sizes. Existing code,
-such as the libraries, assumes that INTEGER is 16 bit and LONGINT is 32 bit and
-so is broken on 64 bit builds of voc.
-
-Could the implementation of INTnn types help? It would not solve (for example)
-the need for a type that always matches address size. Nor would it provide
-unsigned types. Implementation of low level memory management needs both.
-
-Wirth's latest spec includes a BYTE type (not SYSTEM.BYTE, just BYTE) that
-behaves as an unsigned 8 bit integer, for use in low level code. BYTE thus
-avoids the need for SYSTEM.VAL when manipulating 8 bit unsigned numeric values,
-making code easier to write and, more importantly, easier to read. A BYTE type
-would be useful for microcontroller C support. So I believe it makes sense to
-add Wirths's BYTE to voc.
-
-Linux/Unix specifies many API datatypes and structure fields in terms of named C
-numeric types, with the result that they vary in size between implementations.
-This is perhaps the strongest driving force for adding support for various
-numeric types to voc - but they would better match the C types than be of fixed
-size.
-
-So maybe one could provide Platform.int, Platform.long, Platform.longlong,
-Platform.unsignedint, Platform.unsignedlong, Platform.unsignedlonglong and,
-importantly for memory management, Platform.uintptr.
-
-Personally I miss Pascal and Modula's subrange variables. As well as being great
-for error detection (assuming value checking code is generated), they can also
-be used to imply variables of arbitrary sizes (e.g. 'VAR mybyte = 0..255;').
-With these one could remove the Platform.int* types and replace them with
-constants Platform.MaxInt, Platform.MaxLong etc. I think this would be a cleaner
-more generalised option - but maybe, probably, it is a step too far. Always
-beware of over-generalising. Wirth found that most programmers did not use, or
-very rarely used, subrange types.
-
-#### More thoughts about 64 bit support and what INTEGER and LONGINT mean
-
-Arguably, because Oberon says LONGINT is big enough for addresses,
-it seems that LONGINT has to be 64 bits on a 64 bit system.
-
-But I'm having second thoughts.
-
-There's a lot of code out there that assumes the size of INTEGER and LONGINT
-and is broken if they are not 16 and 32 bits respectively. Frustratingly a
-lot of the broken code doesn't go wrong until it encounters values outside the
-16 and 32 bit ranges - like Texts.WriteInt which handles values up 2**32 fine,
-and then aborts the program with an index out of range error when the number
-is more than 11 characters long.
-
-I suggest use of LONGINT for addresses is a small subset of use cases of LONGINT.
-
-Instead I propose we
- - keep INTEGER at 16 bits and LONGINT at 32 bits.
- - Add LONG64 for 64 bit signed integers, to be available on both 32 and 64
-   bit systems, (quite possible as C has an int64_t on both systems).
- - add a SYSTEM.ADDRESS type for address manipulation
-   - an unsigned type that always matches the machine address size (32, 64 or even 16 bit).
-   - is compatible with SHORTINT, INTEGER, LONGINT and LONG64.
-
-It means changing the memory management and platform interface code, but it
-means client code does not need changing.
-
-This fixes the current 16 bit hole in the range of INTEGER types on 64 bit systems.
-
-#### Oakwood Guidelines on type sizes
-
-The Oakwood guidelines are interesting.
-
-  - 5.2 requires that e.g. LONGINT is 32 bits *or more*,
-
-but
-  - Appendix A 1.2.5.4 requires that MODULE Files *always* reads and writes
-    LONGINT as 4 bytes.
-
-The restriction for the Files module makes sense as it is intended to produce
-and consume files in a compatible way between platforms. Thus if a system uses
-64 bit LONGINT, it is an error (detected or not) to write
-to MODULE Files any LONGINT values outside the 32 bit range.
-
-To put it shockingly, it is an error to write the vast majority of possible
-LONGINT values - specifically over 99.998% of LONGINT values are invalid for
-MODULE Files.
-
-I see this as another argument in favour of locking LONGINT down as 32 bits.
-
-#### It's all the same to C
-
-It should be possible to make the 32/64 bit compilation a compiler option
-available whether the compiler binary itself was built with 32 or 64 bit C.
-
-Indeed - is there any benefit in a 64 bit compiler binary? A 32 bit compiler
-binary will be smaller and faster. The memory requirements of the compiler are
-orders of magnitude less than those that would need a 64 bit implementation.
-
-The only need for a 64 bit compiler binary is for systems that can only run
-64 bit binaries.
-
-Point being - the bit size of the compiler binary should be independent of the
-bit size of the target machine of the C code being generated.
-
-So the compiler options could be:
-
- 1. Generated binary bit size - 32 or 64 bit. Determines bit size of
-    SYSTEM.ADDRESS. Add 16 bit option for controllers.
- 2. Size of INTEGER, SET and LONGINT. Defaulting to 16,32,32 the parameter would
-    also allow 32/64/64.
-
-The libraries would be written and compiled to handle all cases. e.g.
-  - A WriteInt routine needs it's value parameter to accept integers of all
-    sizes and would be coded as LONG64.   
-  - ReadInt is slightly more difficult because the parameter is VAR. Make the
-    parameter ARRAY OF BYTE and process according to SIZE(param).
-
-#### A feature I'd really like to see
-
-We should report .Mod file name and line number at fault when exiting abnormally,
-e.g. due to index out of range. Followed by a stack trace.
-
-Wirth's original Pascal (Pascal 6000 on the CDC mainframe at ETHZ) had this at
-least by 1975. This could be achieved by including a table of .Mod file line
-number vs code address, and having the runtime seach this table for the failure
-address. It would be quite a lot of work!
-
-The current position tracking code in the compiler is buggy - for example the
-position at the end of the `expr` in `WHILE expr DO stmt END` is recorded as
-the position of the END when it should be of the 'DO'. This makes compiler error
-reporting a bit unhelpful, but it's worse for runtime error reporting as we end
-up with duplicate entries in the line number table. The position handling code
-is somewhat obscure as it uses a convenient but misnamed spare integer field in
-the symbol record and it's difficult to follow just when it patches it.
-
-#### Oberon 07/15 mode
-
- - Add standard BYTE type being an unsigned integer between 0 and 255.
- - Structured value parameters become read-only and get passed the same way as
-   VAR parameters - i.e. no copying.
- - CASE statements only support INTEGER (with low positive values) and CHAR.
- - Reject LOOP statements.
- - All imported variables are read-only.
-
-See [Difference between Oberon-07 and Oberon](https://www.inf.ethz.ch/personal/wirth/Oberon/Oberon07.pdf).
-
-#### To be left out?
-
-Work on other compatibility layers is in progress.
-voc team also works on bindings to existing C/Pascal libraries.
--- a/src/compiler/OPT.Mod
+++ b/src/compiler/OPT.Mod
@@ -1323,13 +1323,12 @@ BEGIN topScope := NIL; OpenScope(0, NIL); OPM.errpos := 0;
  EnterProc("NEW",    sysnewfn);
  EnterProc("MOVE",   movefn);

-  syslink := topScope^.right;
+  syslink  := topScope^.right;
  universe := topScope; topScope^.right := NIL;


  EnterTyp("BOOLEAN",  Bool,   1, booltyp);
  EnterTyp("CHAR",     Char,   1, chartyp);
-(*EnterTyp("SET",      Set,   -1, settyp);*)    (* Size set in Compiler.PropagateElementaryTypeSize *)
  EnterTyp("REAL",     Real,   4, realtyp);
  EnterTyp("LONGREAL", LReal,  8, lrltyp);
  EnterTyp("HUGEINT",  Int,    8, hinttyp);
--- a/src/runtime/Platformwindows.Mod
+++ b/src/runtime/Platformwindows.Mod
@@ -545,7 +545,7 @@ BEGIN IF l<0 THEN errch('-'); l := -l END; errposint(l) END errint;
 PROCEDURE DisplayHaltCode(code: LONGINT);
 BEGIN
  CASE code OF
-  | -1: errstring("Rider ReadBuf/WriteBuf transfer size longer than buffer.")
+  | -1: errstring("Assertion failure.")
  | -2: errstring("Index out of range.")
  | -3: errstring("Reached end of function without reaching RETURN.")
  | -4: errstring("CASE statement: no matching label and no ELSE.")