MacShock.com - Apple Forums
01-14-2012, 07:40 AM
ict@ccess
Virtual String Memory for Applesoft

If one really thinks about it, text files are a kind of virtual
memory. You can add strings, retrieve strings. And if the strings
are of a fixed length, then you can even replace strings in a text
file.

But I found the text file commands of DOS 3.3 and BASIC.SYSTEM very
slow. Some recommendations were to save your strings to a BIN file
instead. So I combined the two and created a sort of semi-virtual
memory for strings.

Applesoft is limited in what it can do with variables. The maximum
value for any array dimension is 32767, and the actual space
available for strings is far less.

Using my virtual string array add-on, you can define up to 65536
strings, including string zero. The maximum length of any string
is 255 characters (not 256, because zero cannot be used since it
is used as an end-of-string marker).
Strings stored this way can reach ProDOS's maximum file size of 16 MB
(65536 x 255 = 16711680 bytes).

Advantages -
Only one virtual string array variable of 255 characters is created
in memory. One master block and one block of single-byte pointers are
also loaded. (The one block that is loaded with the actual text uses
BASIC.SYSTEM's buffer, so it does not take any additional space. If
the string spans two blocks, the first part of the string is moved
to the string space, and then the second block is loaded.)

Text files of ProDOS's maximum file size can be created.

Multiple Virtual String Array files can be created and used within the
same program.

Virtual Array text files can be created using any word processor that
can create large files. (I wrote a short Applesoft program that
creates a header from the text file so any string can be jumped to
directly, as if it were the first string in the list.)


Disadvantages - every string must be retrieved from disk, although
the advent of flash cards has made the speed of drives more tolerable.
One extra file is created for the pointers.


The commands to use

& DEF VA$(0) - creates a virtual string variable and sets its length
to $FF (255) and makes space in string space
& GET VA$(0-65535),"Virtual String Array filename" - gets string from
file
& STORE X$,"Virtual String Array filename" - appends the string X$ to
the end of the file. If file does not exist then create it. If file
is not a Virtual Array file then return with error


Also, a separate short Applesoft program was created to make a header
of pointers to every string. I used a pretty neat trick to
compress the header so that only one byte is needed to point to each
string instead of three, plus a one-block master header to point to
the start of each string block.

Long story short, hopefully. One block can point to 512 strings, so
only 128 blocks are needed to point to 65536 strings, plus one
master block. The master block holds the two high bytes of the start
of each of the 128 blocks, so only one master block is needed. One
of the 128 blocks is chosen by a little math: the number of the
string divided by 512 gives the block (coincidentally, this same
result also points to the high bytes in the master block). A short
search is then made in that block: each time the next byte is less
than the preceding byte, a counter advances the high bytes, until
the start of the correct string is found. Remember, each of these
blocks can point to 512 strings of up to 255 characters, so the high
bytes from the master block may also need to be incremented. There,
short and sweet.
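
The lookup described above can be sketched in a few lines. This is a
Python illustration only, not the add-on's actual code: the names
build_header and find_offset are made up for the example, and it
assumes every stored string (text plus terminator) is between 1 and
255 bytes long, so that a smaller low byte always means the offset
has crossed a 256-byte boundary.

```python
BLOCK = 512  # one-byte pointers per 512-byte block


def build_header(records):
    """Return (master, low_bytes) for records stored back to back.

    master[b]    = high two bytes of the offset of the first string
                   covered by pointer block b
    low_bytes[n] = low byte of the offset of string n
    """
    master, low_bytes, offset = [], bytearray(), 0
    for n, record in enumerate(records):
        if not 1 <= len(record) <= 255:      # assumption of this sketch
            raise ValueError("record must be 1..255 bytes")
        if n % BLOCK == 0:
            master.append(offset >> 8)
        low_bytes.append(offset & 0xFF)
        offset += len(record)
    return master, bytes(low_bytes)


def find_offset(n, master, low_bytes):
    """Rebuild the 24-bit file offset of string n."""
    block = n // BLOCK                       # also the index into master
    high = master[block]
    first = block * BLOCK
    for k in range(first + 1, n + 1):
        if low_bytes[k] < low_bytes[k - 1]:  # low byte wrapped around
            high += 1
    return (high << 8) | low_bytes[n]
```

The search never scans more than 511 bytes, because the master block
resets the high bytes at the start of every pointer block.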

By comparison, a three-byte pointer header would take 384 blocks to
point to every string at the 16 MB limit.
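
The block counts can be checked with a little arithmetic. A small
Python sketch, assuming 512-byte blocks and 65536 strings;
header_blocks is a hypothetical helper name.

```python
import math

STRINGS = 65536
BLOCK_BYTES = 512


def header_blocks(bytes_per_pointer, master_blocks=0):
    """Blocks needed to hold one pointer per string."""
    pointer_bytes = STRINGS * bytes_per_pointer
    return math.ceil(pointer_bytes / BLOCK_BYTES) + master_blocks


three_byte = header_blocks(3)                   # plain 24-bit pointers
compressed = header_blocks(1, master_blocks=1)  # one byte each + master
master_bytes = 128 * 2                          # two high bytes per block
print(three_byte, compressed, master_bytes)
```

The 256 bytes of master entries fit comfortably in a single block.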

One last thought on a use for Virtual String Arrays. One string array
can have 65536 strings, and multiple virtual files can be created.
Each file can be designated as a column. Strings can become
formulas. Are you getting the picture?

A virtual spreadsheet with 65536 columns by 65536 rows. There is a
good program in Nibble magazine Vol6 No8, Dynamic functions. It puts
formulas into a string and then uses redefined DEF and VAL commands to
evaluate the formula.

Allowing Applesoft to play with the big boys.

Rob

01-16-2012, 06:40 PM
Michael J. Mahon

In several programs I've written, I needed to be able to process
large files (>>64KB) while building several tables of an unpredictable
size.

I found that a single, sparse, binary file, accessed using BLOAD and
BSAVE with B (byte-offset) parameters worked quite well. The binary
file has a potentially huge size, but its sparseness greatly reduces
the actual disk allocation.

I generally use this approach with a pool of fixed-size buffers that
cache the most-recently-used chunks of the file, so the overall
performance is not too bad.
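
For illustration, a buffer pool of this kind might look like the
Python sketch below. It is not the code from those programs: the
class name ChunkCache, the chunk size, and the pool size are all
assumptions, and an ordinary seekable file object stands in for
BLOAD/BSAVE with the B parameter. It does not model ProDOS sparse
allocation.

```python
import io
from collections import OrderedDict


class ChunkCache:
    """Pool of fixed-size buffers holding the most recently used chunks."""

    def __init__(self, f, chunk_size=256, max_chunks=4):
        self.f = f
        self.chunk_size = chunk_size
        self.max_chunks = max_chunks
        self.pool = OrderedDict()            # chunk number -> [buffer, dirty]

    def _chunk(self, number):
        if number in self.pool:
            self.pool.move_to_end(number)    # now the most recently used
        else:
            if len(self.pool) >= self.max_chunks:
                # evict the least recently used chunk, saving it if changed
                self._flush(*self.pool.popitem(last=False))
            self.f.seek(number * self.chunk_size)
            data = self.f.read(self.chunk_size)
            buf = bytearray(data.ljust(self.chunk_size, b"\0"))
            self.pool[number] = [buf, False]
        return self.pool[number]

    def _flush(self, number, entry):
        buf, dirty = entry
        if dirty:
            self.f.seek(number * self.chunk_size)
            self.f.write(buf)

    def read_byte(self, offset):
        entry = self._chunk(offset // self.chunk_size)
        return entry[0][offset % self.chunk_size]

    def write_byte(self, offset, value):
        entry = self._chunk(offset // self.chunk_size)
        entry[0][offset % self.chunk_size] = value
        entry[1] = True                      # must be written back later

    def close(self):
        for number, entry in self.pool.items():
            self._flush(number, entry)
        self.pool.clear()


cache = ChunkCache(io.BytesIO(), chunk_size=16, max_chunks=2)
cache.write_byte(100, 66)
print(cache.read_byte(100))
```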

Applesoft and ProDOS make a pretty powerful combination!

-michael

NadaNet 3.1 for Apple II parallel computing!
Home page: http://home.comcast.net/~mjmahon/

"The wastebasket is our most important design
tool--and it's seriously underused."

01-17-2012, 02:40 AM
ict@ccess


> In several programs I've written, I needed to be able to process
> large files (>>64KB) while building several tables of an unpredictable
> size.
>
> I found that a single, sparse, binary file, accessed using BLOAD and
> BSAVE with B (byte-offset) parameters worked quite well. The binary
> file has a potentially huge size, but its sparseness greatly reduces
> the actual disk allocation.


I have used the BLOAD command as well, with the B parameter, where B
equals the record number times the record length.
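
The same offset rule, sketched in Python with a file seek standing in
for BLOAD's B parameter (read_record is a hypothetical name, and
records are assumed to be numbered from zero):

```python
import io


def read_record(f, record_number, record_length):
    """Fetch one fixed-length record by computing its byte offset."""
    f.seek(record_number * record_length)   # B = record # x record length
    return f.read(record_length)


demo = io.BytesIO(b"RECORD0.RECORD1.RECORD2.")
print(read_record(demo, 2, 8))
```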

I should explain a little of the reason I wrote this virtual
memory for strings. I was enjoying playing Wade's fantastic game
Leadlight, so I decided to copy it over to a disk that I use for
work. I could then jump back and forth: to work, to play, to work, to
play.

I had used ProDOS 8 to copy the files when I noticed that his text
files for his rooms, descriptions and extras were quite large. The
descripts text file was a whopping 300 blocks, while on his disk image
it only took up 189 blocks. I knew it was a random-access text (RAT)
file, so I figured it was saved as a sparse file. I opened it up in a
word processor to see what it was all about. I had finished the game
already, so no secrets were lost, but some secrets of how this great
game was written were exposed.

So, for sh_ts and giggles (where the heck did that saying originate
from?), I wanted to see how much space was wasted by all the filler
characters. I started out removing them manually, but that quickly
ended, so I wrote a program that read each line and printed it
back to another text file using the sequential method.
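
A rough Python equivalent of that conversion is sketched below. It is
not the Applesoft program itself. It assumes each fixed-length record
holds its text, then a carriage return, then filler bytes (zeros or
spaces); unpad_records is a made-up name.

```python
def unpad_records(data, record_length, filler=b"\x00 "):
    """Turn padded fixed-length records into CR-terminated lines."""
    lines = []
    for start in range(0, len(data), record_length):
        record = data[start:start + record_length]
        text = record.split(b"\r", 1)[0]    # keep what precedes the CR
        lines.append(text.rstrip(filler))   # drop any trailing filler
    return b"".join(line + b"\r" for line in lines)


padded = b"HELLO\r".ljust(16, b"\x00") + b"A ROOM\r".ljust(16, b"\x00")
print(len(padded), len(unpad_records(padded, 16)))
```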

Voilà! The text file saved in just 89 blocks, which is still quite a
savings over a sparse file. But now the kicker: how to access each
line without having to read each previous line first?

Answer: with pointers, of course. Three bytes would be needed to point
to strings beyond 64 KB. Using the compressed pointer method stated
above, the 89 block text file with around 500 lines of text needed
only two extra blocks, for a total of 91 blocks compared to 189. That
is over 50% hard drive savings compared to the sparse text file.

Or another way to look at it: you can fit twice as much text in the
same size file.
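
Checking those figures with a few lines of Python (the block counts
are the ones quoted above; the variable names are only for
illustration):

```python
sparse_blocks = 189        # sparse random-access file on the disk image
sequential_blocks = 89     # same text with the filler removed
pointer_blocks = 2         # one master block plus one pointer block

indexed_blocks = sequential_blocks + pointer_blocks
saving = 1 - indexed_blocks / sparse_blocks
ratio = sparse_blocks / indexed_blocks
print(indexed_blocks, round(saving * 100, 1), round(ratio, 2))
```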

And the greatest advantage is that you can use a word processor to
make these files, since padding each line to the same length is not
easy to do.

Rob

Allowing Applesoft to play with the big boys

01-23-2012, 08:30 AM
ict@ccess

To recap, I have created a couple of short programs to turn a Random
Access Text File (RATF) into a sequential text file, to save the space
that a RATF needs because each line of text is padded to the same
length. A sequential file instead has all of its text lines in
sequential order, of varying length, with a terminator character to
mark the end of each line. The only way to get to a line of text is
by reading each previous line first.

Question: How do you access each line quickly without the time
consuming operation of reading each previous line?

Answer: With pointers to each line.

This has the advantage of allowing one to create sequential files
with a text editor while using the text file as a RATF. And the file
size is substantially reduced due to the removal of padding characters.
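
In its simplest form (plain offsets, not the compressed one-byte
header), the idea looks like the Python sketch below. The function
names are invented for the example and a carriage return is assumed
as the line terminator.

```python
import io


def build_index(f, terminator=b"\r"):
    """Return the byte offset at which each line starts."""
    f.seek(0)
    data = f.read()
    offsets, pos = [], 0
    while pos < len(data):
        offsets.append(pos)
        end = data.find(terminator, pos)
        pos = len(data) if end == -1 else end + 1
    return offsets


def read_line(f, offsets, n, terminator=b"\r"):
    """Jump straight to line n without reading the lines before it."""
    f.seek(offsets[n])
    if n + 1 < len(offsets):
        chunk = f.read(offsets[n + 1] - offsets[n])
    else:
        chunk = f.read()
    return chunk.rstrip(terminator)


text = io.BytesIO(b"NORTH ROOM\rA\rLONG DESCRIPTION HERE\r")
index = build_index(text)
print(index, read_line(text, index, 2))
```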


To continue on with this topic of creating even smaller RATF files.

A set of short programs can compress a large text file even further.

How, you ask?

Good question. Here's how.

By creating a library of words from the text file and having a
pointer, called a descriptor, to each word in the library. The
average word length in our English language is between 4 and 5
characters. There are only a couple of single-letter words (I and a)
and not too many two-letter words (an at is if as ad ah am be by
do go my no so to we). At worst, even with a two-byte descriptor, we
are basically at a break-even point for compression.

Now keep in mind, that the first occurrence of a word is stored in a
library, and you don't get any compression until that word shows up
again. And also a one, two or three byte descriptor takes up memory
as well. So on the first occurrence of a word, you actually may lose
a couple of bytes. If a document only contained one occurrence of
each word, then this compression technique would be an utter disaster.
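
A back-of-the-envelope Python sketch of that trade-off follows. It is
a simplification: it counts only the letters of each word, ignores
spaces and any per-word overhead in the library, and assumes a fixed
two-byte descriptor. The function names are hypothetical.

```python
def plain_size(words):
    """Bytes needed to store every word as written."""
    return sum(len(word) for word in words)


def compressed_size(words, descriptor_bytes=2):
    """Library (each distinct word once) plus one descriptor per word."""
    library = set(words)
    library_bytes = sum(len(word) for word in library)
    return library_bytes + descriptor_bytes * len(words)


unique = ["the", "cat", "sat"]          # every word appears once
repeated = ["the"] * 10                 # one word appears ten times
print(plain_size(unique), compressed_size(unique))
print(plain_size(repeated), compressed_size(repeated))
```

With no repeated words the compressed form is larger than the
original; the saving only appears once words start to repeat.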

Originally, only one byte is needed to start, to point to up to 256
words from a library (255 if the value $FF is reserved as the signal
described next). This would soon be used up, as most text
documents easily have more than 256 distinct words. So once the value
255 ($FF) is encountered, this would signal the decompression program
that it now requires 2 bytes to point to a word from the library.
This would give us access to up to 65536 words from our library. This
is still pretty good compression if the majority of the words stored
in the library are 3 characters or more.

For the most part, a two-byte descriptor and 65536 words should cover
most of the words in a reasonably sized text file. And if needed, a
third byte can be used with the descriptor to access up to 16,777,216
words in our library. Even with this third byte, a file can generally
be reduced by 25% or more, since a three-byte descriptor would be
used to point to an average 4 to 5 byte word.
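
One way the growing descriptor could work is sketched below in
Python. The post only says that $FF signals the switch to two bytes;
everything else here is an assumption made for the example: the
all-ones value of the current width is reserved as the "widen by one
byte" signal, descriptors are stored low byte first, and the names
encode and decode are made up.

```python
def encode(indexes):
    """Pack library indexes, widening the descriptor only when needed."""
    out, width = bytearray(), 1
    for index in indexes:
        # the all-ones value of each width is reserved as the signal
        while index >= (1 << (8 * width)) - 1:
            out += b"\xff" * width
            width += 1
        out += index.to_bytes(width, "little")
    return bytes(out)


def decode(data):
    """Recover the indexes, following the widening signals."""
    indexes, width, pos = [], 1, 0
    while pos < len(data):
        value = int.from_bytes(data[pos:pos + width], "little")
        pos += width
        if value == (1 << (8 * width)) - 1:
            width += 1                   # signal: descriptors grow a byte
        else:
            indexes.append(value)
    return indexes


sample = [0, 254, 255, 3, 65534, 65535, 7]
print(decode(encode(sample)) == sample)
```

Note that in this variant a descriptor never shrinks again, so early
words referenced late in the file also cost the wider descriptor.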

Now, punctuation is a slightly different story. Punctuation marks
have to be treated as one-byte words because of how they are used:
they may follow any character of the alphabet or any word in the
library. You cannot add punctuation to a word, since that would make
a new word, and that same word with the added punctuation mark
would rarely occur again in a document. So, for the best
compression, punctuation should be added to the library first, so
that it gets a single-byte descriptor without any loss of hard drive
space, since a one-byte descriptor uses one byte and the punctuation
mark takes up one byte.

But there may be one boon to saving space with punctuation.
Normally, spaces are not compressed and are not needed, since pointers
only point to a whole word, and it is assumed that a word is followed
by a space. If a word is followed by a punctuation mark, then that
space has to be taken back and the punctuation mark inserted.
Also, a punctuation mark is usually followed by one space, and that
space does not need to be included as part of the punctuation word
either, but some punctuation marks may have 2 or more spaces following
them. Since spaces are not captured as a word unless there is more
than one consecutive space, you could then include those extra spaces
as part of the punctuation word. The chance of the punctuation mark
with extra spaces occurring again is quite high, so space can be
saved and the punctuation word can be compressed, or at the very
least break even.

Another instance of repetitive punctuation is at the end of each
paragraph. A period can be combined with one or two carriage returns
to form a word. Space savings can be fairly good when punctuation
marks are repeated frequently in a document. A one-byte descriptor
replacing a three-byte punctuation word is a 66% savings.
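
These punctuation rules can be tried out with a small Python sketch.
It is only an illustration under simplifying assumptions: words are
separated by exactly one space, every punctuation mark is followed by
a space, a carriage return, or the end of the text, and the names
tokenise and detokenise are invented here.

```python
import re

# a word, or one punctuation mark with any spaces / carriage returns after it
TOKEN = re.compile(r"[A-Za-z0-9']+|[^A-Za-z0-9'\s] *\r*")


def is_word(token):
    return token[0].isalnum() or token[0] == "'"


def tokenise(text):
    tokens = []
    for token in TOKEN.findall(text):
        if not is_word(token) and token.endswith(" "):
            token = token[:-1]          # one following space is implied
        tokens.append(token)
    return tokens


def detokenise(tokens):
    out = ""
    for token in tokens:
        if is_word(token):
            out += token + " "          # every word implies a space
        else:
            if out.endswith(" "):
                out = out[:-1]          # take back the implied space
            out += token
            if not token.endswith("\r"):
                out += " "              # implied space after the mark
    return out[:-1] if out.endswith(" ") else out


sample = "Hello, world.  Bye.\r\rNext"
print(tokenise(sample))
```

Here ". " (a period plus one extra space) and ".\r\r" (a period plus
two carriage returns) each become a single library entry.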

So for the most part, compression of a fair-sized text document should
exceed the 50% mark and may reach as high as 75%. The document would
have to be quite large to realize any significant savings. For
example, this short lecture would not see much of a savings from
compression, but it may still be worth compressing, as there are some
quite long words that are used a number of times (compression,
characters, descriptor, document, library, punctuation). Giving each
of these words a two-byte descriptor would result in 60 - 80%
compression for those words.

One last note concerns capital letters. A capitalized word does not
have to be distinguished from its lower case counterpart if the word
is at the beginning of a sentence, but if the word falls in the
middle of a sentence and contains any capital letters, then it will
have to be added to the library as a separate word.

To recap.
-A RATF with padded characters and no compression can be quite
large.
-Close to a 50% reduction in file size can be realized by saving it
as a sparse file.
-Another 50% or more can be saved by removing the padding characters,
saving as a sequential file, and using pointers.
-And lastly, the text document may be compressed to save another
50-75% by using a library of words and having a one, two or three byte
descriptor to describe the word being used from the library.

All in all, a 300 block RATF could potentially be reduced down
to about 30 blocks. That is a 10 times savings. A must on a
computer with limited memory or hard drive space. A lot of
information can be stored in a small amount of space.
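
The chain of reductions can be multiplied out in a few lines of
Python. The percentages are the rough ones from the recap, so the
result is an estimate and not a measurement.

```python
ratf_blocks = 300
after_sparse = ratf_blocks * 0.5           # about 50% from sparse storage
after_sequential = after_sparse * 0.5      # about 50% from removing padding

best = after_sequential * (1 - 0.75)       # word library saves 75%
worst = after_sequential * (1 - 0.50)      # word library saves 50%
print(best, worst)
```

The figure of about 30 blocks falls between the two estimates.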

That's it for now

Rob


Allowing Applesoft to run with the big boys





All times are GMT.