Microsoft Extensible Firmware Initiative fat32 File System Specification

Yüklə 225,69 Kb.

səhifə	9/9
tarix	11.10.2017
ölçüsü	225,69 Kb.
	#4468

1 2 3 4 5 6 7 8 9

Name Limits and Character Sets

Short Directory Entries

Short names are limited to 8 characters followed by an optional period (.) and extension of up to 3 characters. The total path length of a short name cannot exceed 80 characters (64 char path + 3 drive letter + 12 for 8.3 name + NUL) including the trailing NUL. The characters may be any combination of letters, digits, or characters with code point values greater than 127. The following special characters are also allowed:

$ % ' - _ @ ~ ` ! ( ) { } ^ # &
Names are stored in a short directory entry in the OEM code page that the system is configured for at the time the directory entry is created. Short directory entries remain in OEM for compatibility with previous versions of MS-DOS/Windows. OEM characters are single 8-bit characters or can be DBCS character pairs for certain code pages.
Short names passed to the file system are always converted to upper case and their original case value is lost. One problem that is generally true of most OEM code pages is that they map lower to upper case extended characters in a non-unique fashion. That is, they map multiple extended characters to a single upper case character. This creates problems because it does not preserve the information that the extended character provides. This mapping also prevents the creation of some file names that would normally differ, but because of the mapping to upper case they become the same file name.

Long Directory Entries

Long names are limited to 255 characters, not including the trailing NUL. The total path length of a long name cannot exceed 260 characters, including the trailing NUL. The characters may be any combination of those defined for short names with the addition of the period (.) character used multiple times within the long name. A space is also a valid character in a long name as it always has been for a short name. However, in short names it typically is not used. The following six special characters are now allowed in a long name. They are not legal in a short name.

+ , ; = [ ]
Embedded spaces within a long name are allowed. Leading and trailing spaces in a long name are ignored.
Leading and embedded periods are allowed in a name and are stored in the long name. Trailing periods are ignored.
Long names are stored in long directory entries in UNICODE. UNICODE characters are 16-bit characters. It is not be possible to store UNICODE in short directory entries since the names stored there are 8-bit characters or DBCS characters.
Long names passed to the file system are not converted to upper case and their original case value is preserved. UNICODE solves the case mapping problem prevalent in some OEM code pages by always providing a translation for lower case characters to a single, unique upper case character.

Name Matching In Short & Long Names

The names contained in the set of all short directory entries are termed the "short name space". The names contained in the set of all long directory entries are termed the "long name space". Together, they form a single unified name space in which no duplicate names can exist. That is: any name within a specific directory, whether it is a short name or a long name, can occur only once in the name space. Furthermore, although the case of a name is preserved in a long name, no two names can have the same name although the names on the media actually differ by case. That is names like "foobar" cannot be created if there is already a short entry with a name of "FOOBAR" or a long name with a name of "FooBar".
All types of search operations within the file system (i.e. find, open, create, delete, rename) are case-insensitive. An open of "FOOBAR" will open either "FooBar" or "foobar" if one or the other exists. A find using "FOOBAR" as a pattern will find the same files mentioned. The same rules are also true for extended characters that are accented.
A short name search operation checks only the names of the short directory entries for a match. A long name search operation checks both the long and short directory entries. As the file system traverses a directory, it caches the long-name sub-components contained in long directory entries. As soon as a short directory entry is encountered that is associated with the cached long name, the long name search operation will check the cached long name first and then the short name for a match.
When a character on the media, whether it is stored in the OEM character set or in UNICODE, cannot be translated into the appropriate character in the OEM or ANSI code page, it is always "translated" to the "_" (underscore) character as it is returned to the user – it is NOT modified on the disk. This character is the same in all OEM code pages and ANSI.

Naming Conventions and Long Names

An API allows the caller to specify the long name to be assigned to a file or directory. They do not allow the caller to independently specify the short name. The reason for this prohibition is that the short and long names are considered to be a single unified name space. As should be obvious the file system's name space does not support duplicate names. In other words, a long name for a file may not contain the same name, ignoring case, as the short name in a different file. This restriction is intended to prevent confusion among users, and applications, regarding the proper name of a file or directory. To make this restriction transparent, whenever a long name is created and the no matching long name exists, the short name is automatically generated from the long name in such a way that it does not collide with an existing short name.
The technique chosen to auto-generate short names from long names is modeled after Windows NT. Auto-generated short names are composed of the basis-name and an optional numeric-tail.

The Basis-Name Generation Algorithm

The basis-name generation algorithm is outlined below. This is a sample algorithm and serves to illustrate how short names can be auto-generated from long names. An implementation should follow this basic sequence of steps.

1. The UNICODE name passed to the file system is converted to upper case.

2. The upper cased UNICODE name is converted to OEM.

if (the uppercased UNICODE glyph does not exist as an OEM glyph in the OEM code page)
or (the OEM glyph is invalid in an 8.3 name)
{
Replace the glyph to an OEM '_' (underscore) character.
Set a "lossy conversion" flag.
}

3. Strip all leading and embedded spaces from the long name.

4. Strip all leading periods from the long name.

5. While (not at end of the long name)

and (char is not a period)
and (total chars copied < 8)
{
Copy characters into primary portion of the basis name
}

6. Insert a dot at the end of the primary components of the basis-name iff the basis name has an extension after the last period in the name.

7. Scan for the last embedded period in the long name.
If (the last embedded period was found)
{
While (not at end of the long name)
and (total chars copied < 3)
{
Copy characters into extension portion of the basis name
}
}

Proceed to numeric-tail generation.

The Numeric-Tail Generation Algorithm

If (a "lossy conversion" was not flagged)

and (the long name fits within the 8.3 naming conventions)
and (the basis-name does not collide with any existing short name)
{
The short name is only the basis-name without the numeric tail.
}
else
{
Insert a numeric-tail "~n" to the end of the primary name such that the value of the "~n" is chosen so that the
name thus formed does not collide with any existing short name and that the primary name does not exceed eight characters in length.
}

The "~n" string can range from "~1" to "~999999". The number "n" is chosen so that it is the next number in a sequence of files with similar basis-names. For example, assume the following short names existed: LETTER~1.DOC and LETTER~2.DOC. As expected, the next auto-generated name of name of this type would be LETTER~3.DOC. Assume the following short names existed: LETTER~1.DOC, LETTER~3.DOC. Again, the next auto-generated name of name of this type would be LETTER~2.DOC. However, one absolutely cannot count on this behavior. In a directory with a very large mix of names of this type, the selection algorithm is optimized for speed and may select another "n" based on the characteristics of short names that end in "~n" and have similar leading name patterns.

Effect of Long Directory Entries on Down Level Versions of FAT

The support of long names is most important on the hard disk, however it will be supported on removable media as well. The implementation provides support for long names without breaking compatibility with the existing FAT format. A disk can be read by a down level system without any compatibility problems. An existing disk does not go through a conversion process before it can start using long names. All of the current files remain unmodified. The long name directory entries are added when a long name is created. The addition of a long name to an existing file may require the 8.3 directory entry to be moved if the required adjacent directory entries are not available.
The long name entries are as hidden as hidden or system files are on a down level system. This is enough to keep the casual user from causing problems. The user can copy the files off using the 8.3 name, and put new files on without any side effects
The interesting part of this is what happens when the disk is taken to a down level FAT system and the directory is changed. This can affect the long name entries since the down level system ignores these long names and will not ensure they are properly associated with the 8.3 names.
A down level system will only see the long name entries when searching for a label. On a down level system, the volume label will be incorrectly reported if the true volume label does not come before all of the long name entries in the root directory. This is because the long name entries also have the volume label bit set. This is unfortunate, but is not a critical problem.
If an attempt is made to remove the volume label, one of the long name directory entries may be deleted. This would be a rare occurrence. It is easily detected on an aware system. The long name entry will no longer be a valid file entry, since one or more of the long entries is marked as deleted. If the deleted entry is reused, then the attribute byte will not have the proper value for a long name entry.
If a file is renamed on a down level system, then only the short name will be renamed. The long name will not be affected. Since the long and short names must be kept consistent across the name space, it is desirable to have the long name become invalid as a result of this rename. The checksum of the 8.3 name that is kept in the long name directory provides the ability to detect this type of change. This checksum will be checked to validate the long name before it is used. Rename will cause problems only if the renamed 8.3 file name happens to have the same checksum. The checksum algorithm chosen has a relatively flat distribution across the short name space.
This rename of the 8.3 name must also not conflict with any of the long names. Otherwise a down level system could create a short name in one file that matches a long name, when case is ignored, in a different file. To prevent this, the automatic creation of an 8.3 name from a long name, that has an 8.3 format, will directly map the long name to the 8.3 name by converting the characters to upper case.
If the file is deleted, then the long name is simply orphaned. If a new file is created, the long name may be incorrectly associated with the new file name. As in the case of a rename the checksum of the 8.3 name will help prevent this incorrect association.

Validating The Contents of a Directory

These guidelines are provided so that disk maintenance utilities can verify individual directory entries for 'correctness' while maintaining compatibility with future enhancements to the directory structure.
1. DO NOT look at the content of directory entry fields marked 'reserved' and assume that, if they are any value other than zero, that they are 'bad'.

2. DO NOT reset the content of directory entry fields marked reserved to zero when they contain non-zero values (under the assumption that they are "bad"). Directory entry fields are designated reserved, rather than must-be-zero. They should be ignored by your application.. These fields are intended for future extensions of the file system. By ignoring them an utility can continue to run on future versions of the operating system.

3. DO use the A_LONG attribute first when determining whether a directory entry is a long directory entry or a short directory entry. The following algorithm is the correct algorithm for making this determination:
if (((LDIR_attr & ATTR_LONG_NAME_MASK) == ATTR_LONG_NAME) && (LDIR_Ord != 0xE5))
{
/* Found an active long name sub-component. */
}
4. DO use bits 4 and 3 of a short entry together when determining what type of short directory entry is being inspected. The following algorithm is the correct algorithm for making this determination:
if (((LDIR_attr & ATTR_LONG_NAME_MASK) != ATTR_LONG_NAME) && (LDIR_Ord != 0xE5))
{
if ((DIR_Attr & (ATTR_DIRECTORY | ATTR_VOLUME_ID)) == 0x00)
/* Found a file. */
else if ((DIR_Attr & (ATTR_DIRECTORY | ATTR_VOLUME_ID)) == ATTR_DIRECTORY)
/* Found a directory. */
else if ((DIR_Attr & (ATTR_DIRECTORY | ATTR_VOLUME_ID)) == ATTR_VOLUME_ID)
/* Found a volume label. */
else
/* Found an invalid directory entry. */
}
5. DO NOT assume that a non-zero value in the "type" field indicates a bad directory entry. Do not force the "type" field to zero.

6. Use the "checksum" field as a value to validate the directory entry. The "first cluster" field is currently being set to zero, though this might change in future.

Other Notes Relating to FAT Directories

Long File Name directory entries are identical on all FAT types. See the preceeding sections for details.

DIR_FileSize is a 32-bit field. For FAT32 volumes, your FAT file system driver must not allow a cluster chain to be created that is longer than 0x100000000 bytes, and the last byte of the last cluster in a chain that long cannot be allocated to the file. This must be done so that no file has a file size > 0xFFFFFFFF bytes. This is a fundamental limit of all FAT file systems. The maximum allowed file size on a FAT volume is 0xFFFFFFFF (4,294,967,295) bytes.

Similarly, a FAT file system driver must not allow a directory (a file that is actually a container for other files) to be larger than 65,536 * 32 (2,097,152) bytes.

NOTE: This limit does not apply to the number of files in the directory. This limit is on the size of the directory itself and has nothing to do with the content of the directory. There are two reasons for this limit:

Because FAT directories are not sorted or indexed, it is a bad idea to create huge directories; otherwise, operations like creating a new entry (which requires every allocated directory entry to be checked to verify that the name doesn’t already exist in the directory) become very slow.
There are many FAT file system drivers and disk utilities, including Microsoft’s, that expect to be able to count the entries in a directory using a 16-bit WORD variable. For this reason, directories cannot have more than 16-bits worth of entries.

Yüklə 225,69 Kb.

Dostları ilə paylaş:

1 2 3 4 5 6 7 8 9