ADC Protocol Draft 0.6

 

 

1      Intro
1.1       About
1.2       Credits
2      Structure
2.1       General
2.2       Message Layout
2.2.1        Message types
2.2.2        Client identification (CID)
2.2.3        Files
2.2.4        files.xml
3      BASE implementation
3.1       Client – Hub communication
3.2       Client – Client communication
3.3       Actions
3.3.1        STA
3.3.2        SUP
3.3.3        INF
3.3.4        MSG
3.3.5        SCH
3.3.6        RES
3.3.7        CTM
3.3.8        RCM
3.3.9        GPA
3.3.10      PAS
3.3.11      QUI
3.3.12      DSC
3.3.13      GET
3.3.14      GFI
3.3.15      SND
3.3.16      NTD
4      Examples
4.1       Client – Hub connection
4.2       Client – Client connection
5      Standard Extensions
5.1       TTH
5.2       REGEX
5.3       ZLIB
5.3.1        ZLIB-FULL
5.3.2        ZLIB-GET

 


1         Intro

1.1      About

This is a text protocol for a DC style network that I could support. What I'm after is a simple protocol that doesn't require much effort in either hub or client, yet is extensible. It addresses some of the issues in the NMDC protocol, the most interesting being extensibility and hub bandwidth. The same protocol structure is used for both client-hub and client-client communication. This document is split into two parts: the first shows the structure of the protocol, while the second implements a specific system using this structure. ADC stands for anything you would like it to stand for; Advanced DC is the first neutral thing that springs to mind, apart from the obvious =).

1.2      Credits

Many ideas for this I’ve taken from Jan Vidar Krey’s DCTNG draft, others come from the DC dev hub people (notably cologic, fusbar and sedulus). Oh, and not to forget, Jon Hess for the original DC idea.

2         Structure

2.1      General

2.2      Message Layout

 

The typical message looks like this:

 

XYYY p1 p2 ... pn\n

 

X

Message type

Y

Action

p1 … pn

Parameters

 

Since action is separated from the message type, the client should ignore the type, and only look at the three action letters, although some sanity check filtering should be done to ensure proper operation even with buggy clients / hubs. This allows clients to support features sent in new ways without changing the hub (search targeted at one user for instance).

It is valid to send unknown messages, but it is preferred that they’re preceded with proper SUP to avoid sending garbage that nobody understands anyway. Other clients can be notified of extended features by adding flags to the INF.

Messages that expect an answer to the sender must keep the return CID as the first parameter, apart from the D message type, where it must come second.

<flags> in the parameter list means that a set of named flags can be added to the command. Each named flag has the form XXyyyyy, where XX are two upper-case letters and yyyyy is arbitrary data associated with the flag. Flags are used to add special processing options to commands, and if a flag requires that the other party interprets the command in a non-standard way (compression for instance), a SUP is required to make sure both parties understand the flag correctly.
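The layout above can be sketched as a small parser. This is only an illustration, not part of the protocol: the names parse_message, parse_flags and Message are made up here, and the sketch assumes that parameters never contain a space, since this draft does not define an escape for that.

```python
from typing import NamedTuple

MESSAGE_TYPES = frozenset("ABCDHIPU")  # type letters listed under "Message types"


class Message(NamedTuple):
    kind: str      # one-letter message type (X)
    action: str    # three-letter action (YYY), e.g. "MSG"
    params: list   # remaining space-separated parameters


def parse_message(line: str) -> Message:
    """Split one 'XYYY p1 p2 ... pn' line into type, action and parameters."""
    head, *params = line.rstrip("\n").split(" ")
    if len(head) != 4 or head[0] not in MESSAGE_TYPES:
        raise ValueError(f"malformed message header: {head!r}")
    return Message(head[0], head[1:], params)


def parse_flags(flag_params):
    """Turn trailing 'XXdata' parameters into a {name: data} dictionary."""
    flags = {}
    for param in flag_params:
        name, data = param[:2], param[2:]
        if len(name) != 2 or not (name.isascii() and name.isalpha() and name.isupper()):
            raise ValueError(f"not a named flag: {param!r}")
        flags[name] = data
    return flags
```

A receiver would dispatch on the action field only, and use the type letter just for sanity checks, as described above.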

2.2.1      Message types

A

Active broadcast. Message should be broadcast to all UDP active clients.

B

Broadcast. Hub should send message to all connected clients.

C

Client message. All TCP client-client messages are sent like this.

D

Direct message. The target CID must be inserted before the other parameters of the action. Apart from sending the message to the target, an exact copy must always be sent to the source to confirm that the hub has correctly processed the message.

I

Info message. This message originated from the hub.

H

Hub message. This message is intended for the hub only, not relayed to other clients.

P

Passive broadcast. Message should be broadcast to all UDP passive clients.

U

UDP message. Message is sent directly with UDP to the target client (hubs will never see this type).
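As a rough sketch of how a hub might pick recipients from the type letter alone: the Client class and the recipients function below are hypothetical names, and dropping a D message whose target is unknown is an assumption, since the draft does not say what the hub should do in that case.

```python
from dataclasses import dataclass


@dataclass
class Client:
    cid: str
    udp_active: bool  # True if the client announced a UDP port


def recipients(msg_type, params, sender, clients):
    """Return the clients a hub would relay a message to, by message type.

    `clients` maps CID -> Client for everyone connected to the hub.
    """
    connected = list(clients.values())
    if msg_type == "B":   # broadcast: all connected clients
        return connected
    if msg_type == "A":   # active broadcast: UDP active clients only
        return [c for c in connected if c.udp_active]
    if msg_type == "P":   # passive broadcast: UDP passive clients only
        return [c for c in connected if not c.udp_active]
    if msg_type == "D":   # direct: target CID comes first, copy goes to the source
        target = clients.get(params[0])
        if target is None:
            return []     # assumption: unknown target, nothing is relayed
        return [target, sender]
    return []             # H is for the hub itself; C, I and U are not relayed
```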

 

2.2.2      Client identification (CID)

Connected clients are identified by a CID (Client IDentification), which globally identifies a particular user. Clients should also use the same CID when connecting to multiple hubs. If clients offer different shares on different hubs, they must keep track of where a connecting client comes from so that the correct files always will be available. Clients should also strive to keep the same CID between sessions, to ease the implementation of favorite users and queue handling.

CIDs are 64 bits in length, and should be generated by creating a UUID according to the DCE UUID standard (several libraries exist for this) and then XORing its high and low 64-bit halves together.
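A minimal sketch of the folding step, using Python's uuid module. Picking a random (version 4) UUID is an assumption made here; the draft only asks for a DCE UUID. The function names are illustrative, and the CID is kept as an integer because this draft does not define its text encoding.

```python
import uuid

MASK_64 = 0xFFFFFFFFFFFFFFFF


def fold_uuid(value: int) -> int:
    """XOR the high and low 64-bit halves of a 128-bit UUID value."""
    high = value >> 64
    low = value & MASK_64
    return high ^ low


def generate_cid() -> int:
    """Create a fresh 64-bit CID from a random UUID (assumed: version 4)."""
    return fold_uuid(uuid.uuid4().int)
```

A client would call generate_cid once and store the result, so that the same CID is reused between sessions and across hubs.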

It is up to the hub developer to decide whether to base hub registration on CID or nickname (during login, the client (usually) provides both), but the latter is probably more convenient for the users.

2.2.3      Files

Filenames are relative to a fictive root in the user’s share. The ‘/’ is used to separate directories, and each file or directory name must be unique when compared case-insensitively. Any viewable character (char code >= 32, which includes space) is valid in a file name; a ‘/’ inside a name is escaped with ‘\’. Clients must take care to filter filenames properly for the target file system, but must be ready to request filenames from other clients according to these rules. The special names ‘.’ and ‘..’ may not occur as a directory or file name; any file list received containing them must be ignored.

The shared files are identified relative to the unnamed root ‘/’ (“/dir/subdir/filename.ext”). Extensions can extend this namespace by adding a named root, preferably using their SUP name (“TTH/dir/subdir/filename.ext” could for example specify the full Tiger Tree Hash of file filename.ext to be downloaded in a GET command). Rootless filenames are treated as special and can be used to supply binary transfers of arbitrary data; avoid polluting the namespace by using a named root if feasible.

The special, rootless filename “files.xml” specifies the file listing, uncompressed, in XML using the utf-8 encoding. Clients can then compress this list and offer the compressed one on a SUP basis. I recommend bzip2 or generic zlib compressed transfers for this task, although the uncompressed list must always be available.
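The naming rules can be illustrated with a small path splitter. This is a sketch under assumptions: split_share_path is a made-up name, and a ‘\’ is taken to escape whatever single character follows it, which goes slightly beyond what this draft states (it only mentions escaping ‘/’).

```python
def split_share_path(path):
    """Return (root, names) for a filename.

    root is '' for the unnamed share root, the root name for a named
    root such as 'TTH', and None for a rootless name like 'files.xml'.
    """
    parts, current, i = [], [], 0
    while i < len(path):
        ch = path[i]
        if ch == "\\" and i + 1 < len(path):
            current.append(path[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if ch == "/":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))

    if len(parts) == 1:
        root, names = None, parts        # no separator at all: rootless
    else:
        root, names = parts[0], parts[1:]
    for name in names:
        if name in (".", "..") or any(ord(c) < 32 for c in name):
            raise ValueError(f"invalid name in path: {name!r}")
    return root, names
```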

2.2.4      files.xml

This is the list of files intended for browsing. It has the following general structure:

 

<?xml version="1.0" encoding="utf-8" standalone="yes"?>

<!--

Version is not intended to change unless a breaking change is done to the structure of the file.

Generator is for statistical and informative purposes only and should not be used for extra content discovery.

-->

<FileListing Version="1" Generator="DC++ 0.401">

  <Directory Name="share">

    <Directory Name="DC++ Prerelease">

      <File Name="DCPlusPlus.pdb" Size="17648640"/>

      <File Name="DCPlusPlus.exe" Size="946176"/>

    </Directory>

    <File Name="ADC.doc" Size="154112"/>

  </Directory>

</FileListing>

 

More information may be added to the file by extensions, but is not guaranteed to be interpreted by other clients.
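For illustration, the listing above can be flattened into share paths with Python's standard XML parser. The function name list_files is made up for this sketch, and unknown elements or attributes are simply skipped, in line with the note that extensions may add information.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<FileListing Version="1" Generator="DC++ 0.401">
  <Directory Name="share">
    <Directory Name="DC++ Prerelease">
      <File Name="DCPlusPlus.pdb" Size="17648640"/>
      <File Name="DCPlusPlus.exe" Size="946176"/>
    </Directory>
    <File Name="ADC.doc" Size="154112"/>
  </Directory>
</FileListing>"""


def list_files(xml_bytes):
    """Return [(path, size)] for every File element, paths rooted at '/'."""
    results = []

    def escape(name):
        return name.replace("/", "\\/")  # '/' inside a name is escaped with '\'

    def walk(node, prefix):
        for child in node:
            if child.tag == "Directory":
                walk(child, prefix + escape(child.get("Name")) + "/")
            elif child.tag == "File":
                results.append((prefix + escape(child.get("Name")),
                                int(child.get("Size"))))
            # anything else is extension data and is ignored here

    walk(ET.fromstring(xml_bytes), "/")
    return results
```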

3         BASE implementation

Each message is specified as the action code and the message type contexts under which it is valid. This particular implementation is known as BASE, as far as protocol identification is concerned. All ADC clients/hubs should support this minimum of functionality, extending as necessary. The connecting party will from now on be known as client, the other as server. It is always the server that controls state transitions.

The message types are merely a pointer to where the commands are most likely to appear, but clients should be prepared that they might arrive in other ways (for example type D or C searches to search a particular client).

For client-client communication, this protocol is identified by the string “ADC-BASE/1.0”.

3.1      Client – Hub communication

During login, the client goes through a number of stages. An action is valid only in the NORMAL stage unless otherwise noted. The stages, in login order, are PROTOCOL (feature support discovery), IDENTIFY (user identification, static checks), VERIFY (password check) and NORMAL (normal operation). Any error in hub communication means disconnection, hopefully preceded by an STA action describing the error.
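The login stages can be pictured as a small state machine on the hub side. The sketch below is an interpretation pieced together from the SUP, INF, GPA and PAS descriptions later in this document; the names State and next_state and the needs_password parameter are hypothetical, and PAS is assumed to have been validated already.

```python
from enum import Enum


class State(Enum):
    PROTOCOL = 1
    IDENTIFY = 2
    VERIFY = 3
    NORMAL = 4


VERIFY_ONLY = {"GPA", "PAS"}  # actions listed with "States: VERIFY"


def next_state(state, action, needs_password=False):
    """Return the stage a hub moves to after accepting `action` from a client."""
    if action == "SUP" and state is State.PROTOCOL:
        return State.IDENTIFY
    if action in ("STA", "SUP"):          # valid in all states, no transition
        return state
    if action == "INF" and state is State.IDENTIFY:
        return State.VERIFY if needs_password else State.NORMAL
    if action == "PAS" and state is State.VERIFY:
        return State.NORMAL               # assumes the password checked out
    if state is State.NORMAL and action not in VERIFY_ONLY:
        return state
    raise ValueError(f"{action} is not valid in state {state.name}")
```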

3.2      Client – Client communication

The client – client messages use essentially the same stages as client – hub, but probably without VERIFY (Client access passwords are not supported in BASE), and an additional DATA state.

3.3      Actions

3.3.1      STA

STA <code> <param1>…<paramN> <description>

Types: C, D, I

States: All

Code

Status code in the form “xyy”, where x specifies the severity and yy the specific error code. The severity and the error code are treated separately, i.e. the same error could occur at different severity levels.

 

Severity values:

0 Success (used for confirming commands)

1 Recoverable (error but no disconnect)

2 Fatal (disconnect)

 

Error codes:

00 Generic, show description

11 Hub full

12 Hub disabled

13 Registered users only

21 Nick invalid

22 Nick taken

23 Invalid username / password combination

31 Permanently banned

32 Temporarily banned, param1 is an integer specifying the number of seconds left until the ban expires (this is used for kick as well…).

41 Protocol unsupported, param1 is the CID of the sender, param2 the token, param3 the protocol string.

42 Required INF field missing, param1 specifies the field.

51 File not available

52 File part not available

53 Slots full

 

Description

Text description of the error, suitable for displaying directly to the user.

 

Even if an error code is unknown by the client, it should display the text message alone. Error codes are used so that the client can take different action on different errors. Most error codes don’t have parameters and only make sense in C and I types. Error responses should not be sent for obvious errors (a passive client sending a CTM for example).
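Since severity and error code are treated separately, a client can split the three digits before deciding what to do. A minimal sketch, with made-up function names:

```python
SEVERITY = {0: "success", 1: "recoverable", 2: "fatal"}


def parse_status(code: str):
    """Split an 'xyy' status code into (severity, error code)."""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"malformed status code: {code!r}")
    severity = int(code[0])
    if severity not in SEVERITY:
        raise ValueError(f"unknown severity in status code: {code!r}")
    return severity, code[1:]


def must_disconnect(code: str) -> bool:
    """True when the severity digit says the error is fatal."""
    return parse_status(code)[0] == 2
```

For example, “153” would be a recoverable “slots full”, while “211” would be a fatal “hub full”; an unknown yy value still leaves the description text to show to the user.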

 

3.3.2      SUP

SUP [+|-]<feature1>…[+|-]<featureN>

Types: C, H, I

States: All

This command identifies which features a specific client / hub supports. Each feature has a FOURCC code, using only upper case letters and numbers [A-Z0-9]. A central register of known features should be kept, to avoid clashes. All ADC clients should support the BASE feature (unless a future revision takes its place), which is this protocol. The resulting features used by two peers should be the intersection of features sent by the respective parties.

This command can also be used to dynamically add / remove features, ‘+’ meaning add and ‘-’ meaning remove. For those features that break or modify compatibility in some way (compression for example), the receiving end must confirm with an equivalent SUP command, and the new feature set will be valid from that point. No other commands may be sent until the response has been received; the + or - in the response indicates whether the change was accepted or rejected.

When the server receives this message the first time, it should reply with the same, send an INF about itself and move to the IDENTIFY state. The client, when it receives it should send an INF about itself.
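The feature bookkeeping can be sketched with plain sets. The names apply_sup and negotiated are illustrative only, and treating a feature without a prefix as an addition is an assumption based on the optional [+|-] in the syntax line.

```python
import re

FEATURE = re.compile(r"[A-Z0-9]{4}")  # FOURCC: upper case letters and digits


def apply_sup(current, params):
    """Return the feature set after applying the parameters of one SUP."""
    features = set(current)
    for param in params:
        prefix = param[:1]
        name = param[1:] if prefix in ("+", "-") else param
        if not FEATURE.fullmatch(name):
            raise ValueError(f"not a FOURCC feature name: {param!r}")
        if prefix == "-":
            features.discard(name)
        else:
            features.add(name)   # '+' or no prefix (assumed to mean add)
    return features


def negotiated(local, remote):
    """Features in effect between two peers: the intersection of both sets."""
    return set(local) & set(remote)
```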

 

3.3.3      INF

INF <CID> <field1>…<fieldN>

Types: B, C, I

States: IDENTIFY, NORMAL

This command updates the information about a client. Each time this is received, it means that the fields specified have been added or updated. Each field is identified by two characters, directly followed by the data associated with that field. A field (and the effects of its presence) can be canceled by sending the field name without data. Clients should ignore any fields they don’t know, so that fields can safely be added in the future. Most of these fields are only interesting in client-hub communication; in client-client communication this command is mainly used for identification purposes. Hubs can choose to require or ignore any or all of these fields; clients must work without any of them. Many of these fields, such as share size or client version, are purely informative and should be taken with a grain of salt, as it is very easy to fake them. On the other hand, clients should strive to provide accurate data for the general health of the system, as providing invalid information will probably annoy a great many people. Updates are made in an incremental manner, by sending only the fields that have changed.

Fields:

I4

IPv4 address without port. A zero address (0.0.0.0) means that the server should replace it with the real IP of the client.

I6

IPv6 address without port. A zero address ([0:0:0:0:0:0:0:0]) means that the server should replace it with the real IP of the client.

U4

Client UDP port. Sending this field to the hub with a port means that this client wants to run in active mode for UDP. If this field is missing (or empty if changing modes), it means that the client should be treated as passive.

U6

Same as U4, but for IPv6.

SS

Share size in bytes, integer.

SF

Number of shared files, integer

VE

Client identification, version (client specific, recommended a short identifier then a float for version number). It is important that hubs don’t discriminate clients based on their VE tag but instead rely on SUP when it comes to which clients should be allowed (for example, “we only want clients that can hash”). VE is there mainly for informative reasons, and can perhaps be used to warn users that they’re using a known buggy or vulnerable client.

US

Maximum upload speed, bits/sec, integer

SL

Upload slots open, integer

AS

Automatic slot allocator speed limit, bytes/sec, integer. This is the recommended method of slot allocation, the client keeps opening slots as long as its total upload speed doesn’t exceed this value. SL then serves as a minimum number of slots open.

AM

Maximum number of slots open in automatic slot manager mode, integer.

EM

E-Mail, string.

NI

Nickname, string. Hub must ensure that this is unique (case insensitive) in each hub, to avoid confusion. Valid are all displayable characters (char code > 32) apart from space, although hubs are free to limit this further as they like with an appropriate error message.

DE

Description, string. Valid are all displayable characters (char code >= 32).

HN

Hubs where user is a normal user, integer.

HR

Hubs where user is registered (had to supply password), integer.

HO

Hubs where user is op, integer.

TO

Token (used with CTM) in the c-c connection.

OP

1=op

AW

1=Away

2=Extended away, don’t care about main chat either (hubs can skip sending MSG commands if they want)

(Other away modes are reserved for the future)

BO

1=Bot

HI

1=Hidden, should not be shown on the user list.

HU

1=Hub, this INF is about the hub itself

LO

1=Login, this INF is sent to log in to the hub. The hub must also include this field in the first INF it broadcasts to the other clients about the new client.

 

Hubs are welcome to mandate or discard any and all fields, but obviously the more the merrier (and clients could be disconnected for not sending some of them…)

When a server receives this in the IDENTIFY state, it should respond with an INF about itself and proceed either to the VERIFY state by sending a password request (GPA), or to the NORMAL state by starting to send the INF of all clients, where the INF of the connecting client must come last. When the hub sends an INF about itself, NI becomes the hub name, VE the hub version, etc.
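The incremental update rule (a field with data adds or replaces, a field without data cancels) can be sketched as a dictionary merge. apply_inf and is_udp_active are hypothetical helper names, and the sample values are made up.

```python
def apply_inf(known, fields):
    """Merge the fields of one INF into what is already known about a client.

    `known` maps two-character field names to their data; `fields` is the
    list of parameters that follow the CID.
    """
    updated = dict(known)
    for field in fields:
        name, data = field[:2], field[2:]
        if len(name) != 2:
            raise ValueError(f"malformed INF field: {field!r}")
        if data == "":
            updated.pop(name, None)   # name without data cancels the field
        else:
            updated[name] = data      # unknown names are stored, never rejected
    return updated


def is_udp_active(info):
    """A client is UDP active when it has announced a U4 or U6 port."""
    return "U4" in info or "U6" in info
```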

3.3.4      MSG

MSG <my-CID> <text> <flags>

Types: A, B, D, I, P

A chat message. The receiving clients should precede it with ‘<’ nick ‘>’, to allow for uniform displaying of messages. The client should not send its own nick in the text.

Flags:

PM<group-CID>

Private message, <group-CID> is optional, private messages should be displayed in a window using <group-CID> as title (for op chats etc). Any responses to the message should also be sent to this CID, not the originating one.

 

3.3.5      SCH

SCH <my-CID> <field1>…<fieldN>

Types: P, U, D, (B)

Search. Each parameter is an operator followed by a term. Each term is a two-letter code followed by the data to search for. Clients that don’t recognize a field should ignore the search.

++, --, EX

String search term, where ++ is include, -- is exclude, and EX is extension. Each filename (including the path to it) should be matched using case insensitive substring search as follows: match all ++, remove those that match any --, and make sure the extension matches at least one of the EX (if it is present). Extensions must be sent without the leading ‘.’.

<=

Smaller than or equal size in bytes

>=

Larger than or equal size in bytes

==

Exact size in bytes

TO

Token, string. Used by the client to tell one search from the other, if received, the responding client must copy this field exactly to each search result.

 

Note that hubs normally only relay searches to passive clients (type P), while clients send searches to active clients themselves using type U, which should prove a massive bandwidth saver for the hubs. Should ISPs dislike this, a switch to type B searching is easily done.
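The matching rules for the BASE search fields can be sketched as follows. The function name matches is made up, each parameter is read as a two-character code followed by its data, and rejecting the whole search on an unknown field follows the rule that such searches should be ignored.

```python
def matches(path, size, terms):
    """Check one shared file (full path and size in bytes) against SCH fields."""
    lowered = path.lower()
    extensions = []
    for term in terms:
        code, data = term[:2], term[2:]
        if code == "++":
            if data.lower() not in lowered:
                return False
        elif code == "--":
            if data.lower() in lowered:
                return False
        elif code == "EX":
            extensions.append(data.lower())   # sent without the leading '.'
        elif code == "<=":
            if size > int(data):
                return False
        elif code == ">=":
            if size < int(data):
                return False
        elif code == "==":
            if size != int(data):
                return False
        elif code == "TO":
            continue                          # token is only echoed in results
        else:
            return False                      # unknown field: ignore the search
    if extensions and not any(lowered.endswith("." + ext) for ext in extensions):
        return False
    return True
```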

3.3.6      RES

RES <my-CID> <field1>…<fieldN>

Types: D, U

Search result, made up of fields similar to the INF ones. It is of course better for the network if the client sends all it knows about a file, unless it’s a lot of data. Search results without size and filename are obviously useless, but if a client has hashing or any other meta-data to add, that’s only good. Passive results should be limited to 5, active to 10.

FN

Full filename including path

SI

Size, in bytes

SL

Slots currently available

TO

Token

 

3.3.7      CTM

CTM <my-CID> <token> <proto> <port>

Types: D

Connect to me. Used by active clients that want to connect to someone, or in answer to an RCM. Only TCP active clients may send this (don’t forget to put target-CID before my-CID, as the type D requires). <token> is a number that fits in a signed 32-bit integer and should be unique for each connection attempt, so that the client can identify where the connection came from. <proto> is an arbitrary string specifying the protocol to connect with; in the case of an ADC compliant connection attempt, this should be the string “ADC/1.0”. If this is a response to an RCM, the <token> and <proto> fields should just be copied directly (if the protocol is supported, of course). If a protocol is not supported, a DSTA with error code 41 should be sent indicating this.

3.3.8      RCM

RCM <my-CID> <token> <proto>

Types: D

Reverse CTM. Used by passive clients to signal that they want a connection token from an active client.

3.3.9      GPA

GPA <data>

Types: I

States: VERIFY

Get Password. The parameter is 192 random binary bits (base32 encoded), used to avoid replay attacks on the password.

3.3.10 PAS

PAS <password>

Types: H

States: VERIFY

Password. The parameter is computed by concatenating the CID (in binary), the password and the random data from GPA, passing the result through the Tiger hash algorithm (not Tiger Tree) and base32 encoding the hash. When validated, this moves the server into NORMAL state.
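A sketch of the GPA / PAS exchange, under several assumptions that this draft leaves open: the CID is serialized as 8 big-endian bytes, the password is encoded as utf-8, and base32 is used with padding. Python's standard library does not include Tiger, so the hash is passed in as a callable named tiger here; all function names are illustrative.

```python
import base64
import os


def make_challenge() -> str:
    """Hub side: 192 random bits (24 bytes), base32 encoded, for GPA."""
    return base64.b32encode(os.urandom(24)).decode("ascii")


def password_response(cid: int, password: str, challenge: str, tiger) -> str:
    """Client side: build the PAS parameter for a given GPA challenge.

    `tiger` must be a function taking bytes and returning the Tiger digest
    as bytes; it has to come from a third-party implementation.
    """
    cid_bytes = cid.to_bytes(8, "big")             # assumed byte order
    random_data = base64.b32decode(challenge)
    digest = tiger(cid_bytes + password.encode("utf-8") + random_data)
    return base64.b32encode(digest).decode("ascii")
```

The hub would compute the same value from its stored password and compare; because the random data changes with every GPA, a captured PAS cannot be replayed.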

3.3.11 QUI

QUI <CID> <reason> <param1>…<paramN>

Types: I

A client disconnected from the hub.

Reason can be one of the following:

ND

Normal disconnect

DI

Disconnected (friendly disconnect), param1 = originating CID, param2=message

KK

Kicked (unfriendly disconnect), param1 = originating CID, param2=message

BN

Banned, param1 = originating CID, param2 = seconds banned, -1 = forever, param3=message

RD

Redirected, param1 = originating CID, param2 = redirect address, param3=message

Message is optional.

 

3.3.12 DSC

DSC <CID> <reason> <broadcast-reason> <param1>…<paramN>

Types: H

The reason and parameters are analogous to the QUI command.

Broadcast-reason is chosen from ND and <reason> to select how the disconnect is broadcast to the other users.

3.3.13 GET

GET <type> <identifier> <start-pos> <bytes> <flags>

Types: C, H, I

Requests that a certain file or binary data be transmitted. <start-pos> counts 0 as the first byte. <bytes> may be set to -1 to indicate that it is unknown. <type> is a string of characters in [a-zA-Z0-9] that specifies the namespace for <identifier>; BASE requires that clients recognize the type “file”, where the identifier is a filename in the share. Passive clients depend on the “no slots” error being recoverable: if a client gets a recoverable error after a GET command and has nothing else to do, it must send NTD, otherwise the passive client will never get a chance to download if the other client has a file queued. Note that this can also be used for binary transfers between hub and client.

The <flags> parameter is used for specifying extra named options that extensions might use.

3.3.14 GFI

GFI <type> <identifier> <flags>

Types: C

Get File Information, request that the other client returns a RES with relevant file data, for example size. Type and identifier are the same as for GET.

3.3.15 SND

SND <type> <identifier> <start-pos> <bytes> <flags>

Types: C, H, I

State transition to DATA state. The sender will keep on sending until <bytes> bytes of binary data have been sent, and will then put itself back into NORMAL state. <bytes> may be set to -1 to specify that the number of bytes that will be sent is unknown (for compressed data, for example; the receiver can then detect the end of data by counting the uncompressed bytes it has written and subtracting that from the number of bytes expected). Either the sender or the receiver must know the size before sending the file (i.e. -1 may only be present in one of the GET and SND commands). The parameters are essentially a mirror of the GET parameters, but <bytes> must be replaced if it was -1 in the GET request.

3.3.16 NTD

NTD

Types: C

Nothing to do. This is sent by the server to indicate that it passes control of the NORMAL state over to the other client, effectively making it the server. It is always the server that has the first say in who will transfer files; this way we don’t have to remember if we’re connecting because of a CTM or because we want to download. A client that receives NTD and has nothing to do itself should disconnect.

4         Examples

4.1      Client – Hub connection

Client                        Hub

HSUP features

                              ISUP features
                              IINF <Hub-CID> …

BINF <Client-CID> LO1…

                              IGPA …

HPAS …

                              IINF <all clients>
                              BINF <Client-CID> …

4.2      Client – Client connection

Client                        Server

CSUP features

                              CSUP features
                              CINF <Server-CID>

CINF <Client-CID> TOtoken

                              CGET …

CSND
<data>

                              CNTD

CGET …

                              CSND …
                              <data>
                              CNTD

<disconnect>

 

5         Standard Extensions

5.1      TTH

Tiger tree hashes are the standard way of implementing hashes in ADC.

5.2      REGEX

Regular expressions in searches. Extends the SCH command with the operator RE that takes a regular expression in the (Perl? PCRE? Java? .NET? POSIX?) form.

5.3      ZLIB

ZLib compressed communication. There are two variants of zlib support, FULL and GET, and only one should be used on each communication channel that is set up.

5.3.1      ZLIB-FULL

If, during initial SUP negotiation, both ends send “ZLIF” in their support string, it means that all subsequent message passing will be tunneled in one long zlib stream. Care must be taken to partially flush the zlib buffer when needed to ensure that the commands are in a decompressable state when they arrive at the other end.
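The flushing requirement can be illustrated with Python's zlib module. This sketch uses Z_SYNC_FLUSH after every message so that each message can be decompressed as soon as it arrives; that choice of flush mode and the utf-8 encoding of messages are assumptions made for the example, and ZlibTunnel is a made-up name.

```python
import zlib


class ZlibTunnel:
    """One end of a connection where all messages share one zlib stream."""

    def __init__(self):
        self._compressor = zlib.compressobj()
        self._decompressor = zlib.decompressobj()

    def send(self, message: str) -> bytes:
        """Compress one message and flush so the peer can decode it right away."""
        data = self._compressor.compress(message.encode("utf-8"))
        return data + self._compressor.flush(zlib.Z_SYNC_FLUSH)

    def receive(self, chunk: bytes) -> str:
        """Decompress whatever arrived; state carries over between calls."""
        return self._decompressor.decompress(chunk).decode("utf-8")
```

Flushing after every message costs some compression ratio, so a real implementation might batch several messages before flushing.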

5.3.2      ZLIB-GET

The alternative is to send “ZLIG” to indicate that ZLib is supported for binary transfers using the GET command, but not otherwise (memory constraints in the hub for example). A flag “ZL” is added to the SND command to indicate that the data will come compressed, and the receiving client requests it by adding the same flag to GET. The <bytes> parameter of the GET and SND commands is to be interpreted as the number of uncompressed bytes to be transferred.