-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathAbout Groups.txt
155 lines (115 loc) · 5.83 KB
/
About Groups.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
Groups files are text files that define zero or more groups, each one
specifying a list of attributes to match to files found on a web browser
or plugin's cache. There are two types of groups:
- File groups, which act on a file's properties (file signature, MIME
type, and file extension). These are used to label files according to
their file format. For example, files that are used by the Flash Player.
- URL groups, which act on a cached file's URL. These are used to label
files according to their original location on the internet. For example,
files that were hosted on a web portal like Newgrounds.com.
Groups files must always end in a .group extension. The Web Cache Exporter
only loads groups from files with this extension located in this directory.
This about file, for example, isn't loaded. Group files are loaded in
alphabetical order. In each file, groups are loaded from top to bottom.
The following section will show you how to define each type of group. Lines
that start with a semicolon ";" are treated as comments and are not processed.
You can only use spaces and tabs as whitespace.
; This defines the beginning of a file group with the name "HTML".
; Each group or list of attributes ends when it reaches the "END" keyword.
BEGIN_FILE_GROUP HTML
; This defines a list of file signatures. These are composed of one
; or more bytes represented in hexadecimal. Each value ranges from 00
; to FF. Only one file signature is allowed per line.
; You can also use "__" as wildcards that match any byte.
BEGIN_FILE_SIGNATURES
; This represents the following characters: "<html".
3C 68 74 6D 6C
; This represents the following characters: "<HTML".
3C 48 54 4D 4C
; This represents "<HT" followed by any other two bytes.
3C 48 54 __ __
END
; This defines a list of MIME types. You can add multiple values in
; each line.
BEGIN_MIME_TYPES
text/html
application/xhtml
END
; This defines a list of file extensions. You can add multiple values
; in each line.
BEGIN_FILE_EXTENSIONS
htm html
xhtml
END
; This defines a file extension that is added to the end of any cached
; file without a name (i.e. a file with a generic name assigned by the
; application). This feature makes it easier to use external programs
; to view common file formats since the file may match the group due to
; its signature and MIME type.
; If a default file extension is not specified, but the group lists
; exactly one file extension, then that one will be used. Default file
; extensions have precedence over lists with a single element.
DEFAULT_FILE_EXTENSION html
END
Although these three types of lists may appear in any order and more than
once, you should specify each type once and in this order: file signatures
before MIME types before file extensions. This order matches the precedence
that the Web Cache Exporter uses, making the group files easier to read.
For example, if a cached file matches a file signature in a group, it
won't bother checking the remaining MIME types and file extensions.
Matching Rules:
0. All comparisons are case insensitive.
1. File signatures are matched exactly, taking into account any wildcard
bytes. If the file's size is smaller than the file signature, it will never
match it. Each file signature can specify at most 1024 bytes.
2. MIME types match the beginning of the text. For example, a group that specifies
"text/java" will match "text/javascript" and "text/javascript; charset=UTF-8".
3. File extensions are matched exactly. For example, a group that specifies
"htm" will match .htm files, but not .html.
URL groups are very similar:
; This defines the beginning of a URL group with the name "Cartoon Network".
BEGIN_URL_GROUP Cartoon Network
; This defines a list of domains. You can use the forward slash "/"
; character to separate the host segment from the path. Only one domain
; is allowed per line.
; You can also add ".*" to the end of a host to match any top or second
; level domain.
BEGIN_DOMAINS
; This defines the host "cartoonnetwork.com" and no path.
cartoonnetwork.com
; This defines the host "turner.com" and the path "toon".
turner.com/toon
; This defines the host "cartoonnetwork" with any top or second
; level domain and no path.
cartoonnetwork.*
; This defines the host "turner" with any top or second level
; domain and the path "toon".
turner.*/toon
END
END
Matching Rules:
0. All comparisons are case insensitive.
1. The host segments match the ending of a URL's host. For example, a group
that specifies "example.com" will match the following:
- "http://www.example.com/index.html"
- "mms://cdn.example.com:80/path/video.mp4",
- "https://download.example.com/path/updates/file.php?version=123&platform=ABC#top"
This check allows you to match multiple subdomains.
2. If the path exists, it must also match the beginning of a URL's path.
Using the same example, a group that specifies "example.com/path" will
match the last two URLs. If it instead specifies "example.com/path/updates",
it will only match the last one.
3. If the host segment ends with ".*", it will match the ending of a URL's
host that contains any top or second level domain. For example, a group that
specifies "example.*" will match the following:
- "http://www.example.com/index.html"
- "http://cdn.example.net/index.html"
- "http://download.example.co.uk/index.html"
Groups files use UTF-8 as their character encoding, meaning you can use
any Unicode character in the various lists. In the Windows 98 and ME builds
these values are converted into ANSI strings, meaning only a subset of Unicode
characters may be used. Because of this, any offical group files that are
bundled with this tool will always use ASCII characters to maintain compatibility
with every Windows version. If you're making your own group files, and are
only running this application on Windows 2000 or later, you can use any Unicode
characters.