This document describes various aspects of running HcNet-core for system administrators (but may be useful to a broader audience).
HcNet Core is responsible for communicating directly with and maintaining the HC peer-to-peer network.
You get to run your own Aurora instance.
Note: in this document we use “Aurora” as the example implementation of a first-tier service built on top of HcNet-core, but any other system would get the same benefits.
Level of participation in the network
As a node operator you can participate in the network in multiple ways.
| | watcher | archiver | basic validator | full validator |
|---|---|---|---|---|
| description | non-validator | all of watcher + publish to archive | all of watcher + participate in consensus (vote on the transaction set to include in the next ledger) | all of basic validator + publish to archive |
| submits transactions | yes | yes | yes | yes |
| supports Aurora | yes | yes | yes | yes |
| helps other nodes to catch up and join the network | no | yes | no | yes |
| increases the resiliency of the network | no | medium | low | high |
From an operational point of view, “watchers” and “basic validators” are about the same (they both compute an up-to-date version of the ledger). “Archivers” and “full validators” publish into a history archive, which incurs additional costs.
Watcher nodes are configured to watch the activity from the network
Use cases:
- Ephemeral instances, where having other nodes depend on those nodes is not desired
- Potentially reduced administration cost (no or reduced SLA)
- Real-time network monitoring (which validators are present, etc.)
- Generating network metadata for other systems (Aurora)
The purpose of archiver nodes is to record the activity of the network in long-term storage (AWS, Azure, etc.).
History Archives contain snapshots of the ledger, all transactions and their results.
Use cases:
Operational requirements:
Nodes configured to actively vote on the network.
Use cases:
Operational requirements:
Nodes fully participating in the network.
Full validators are the true measure of how decentralized and redundant the network is as they are the only type of validators that perform all functions on the network.
Use cases:
Operational requirements:
Regardless of how you install HcNet-core (apt, source, docker, etc.), you will need to configure the instance hosting it roughly the same way.
CPU, RAM, disk, and network requirements depend on network activity. If you decide to colocate certain workloads, you will need to take this into account.
As of early 2018, HcNet-core with PostgreSQL running on the same machine has no problem running on an m5.large in AWS (dual core 2.5 GHz Intel Xeon, 8 GB RAM).
Storage-wise, 20 GB seems to be an excellent working set as it leaves plenty of room for growth.
Interaction with the peer-to-peer network
Interaction with other internal systems
Note on exposing the HTTP endpoint: if you need to expose this endpoint to other hosts in your local network, it is recommended to use an intermediate reverse proxy server to implement authentication. Don’t expose the HTTP endpoint to the raw and cruel open internet.
The source code and installation instructions are available at https://github.com/HashCash-Consultants/HCNet-core
Before attempting to configure HcNet-core, it is highly recommended to first try running a private network or joining the test network.
All configuration for HcNet-core is done with a TOML file. By default HcNet-core loads ./HcNet-core.cfg, but you can specify a different file to load on the command line:
$ HcNet-core --conf betterfile.cfg <COMMAND>
The examples in this file don’t specify --conf betterfile.cfg for brevity.
Nodes are considered validating if they take part in FBA and sign messages pledging that the network agreed to a particular transaction set. It isn’t necessary to be a validator. Only set your node to validate if other nodes care about your validation.
If you want to validate, you must generate a public/private key for your node. Nodes shouldn’t share keys. You should carefully secure your private key. If it is compromised, someone can send false messages to the network and those messages will look like they came from you.
Generate a key pair like this:
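Assuming the command-line interface mirrors the stellar-core lineage this project derives from (the gen-seed subcommand name is an assumption; verify against your build):
$ HcNet-core gen-seed
This prints a secret seed (starting with S) and the matching public key (starting with G).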
Place the seed in your config:
and set the following value in your config:
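A minimal sketch of both settings together (the placeholder seed below is obviously fake and must be replaced with your generated seed):
NODE_SEED="SDXX..REPLACE_WITH_YOUR_SECRET_SEED"
NODE_IS_VALIDATOR=true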
Tell other people your public key (GDMTUTQ…) so they can add it to their QUORUM_SET in their config. If you don’t include a NODE_SEED or set NODE_IS_VALIDATOR=true, you will still watch FBA and see all the data in the network but will not send validation messages.
The way quorum sets are configured is explained in detail in the example config.
As an administrator what you need to do is ensure that your quorum configuration:
If you are running multiple validators, the availability model of your organization as a “group of validators” (the way people are likely to refer to your validators) is not like traditional web services:
Divide the validators into two categories:
One of the goals is to ensure that there will always be some full validators in any given quorum (from your node’s point of view).
As quorum sets are specified using a threshold, i.e. requiring T out of N entities (groups or individual validators) to agree, the desired property is achieved by simply picking a threshold at least equal to the number of basic entities at the top level + 1.
[QUORUM_SET]
THRESHOLD_PERCENT= ?
VALIDATORS= [.....]
# optional, other full validators grouped by entity
[QUORUM_SET.FULLSDF]
THRESHOLD_PERCENT= 66
VALIDATORS= [.....]
# other basic validators
[QUORUM_SET.BASIC]
THRESHOLD_PERCENT= ?
VALIDATORS= [.....]
# optional, more basic validators from entity XYZ
[QUORUM_SET.BASIC.XYZ]
THRESHOLD_PERCENT= 66
VALIDATORS= [.....]
A simple configuration with those properties could look like this:
[QUORUM_SET]
# this setup puts all basic entities into one top level one
# this makes the minimum number of entities at the top level to be 2
# with 3 validators, we then end up with a minimum of 50%
# more would be better at the expense of liveness in this example
THRESHOLD_PERCENT= 67
VALIDATORS= ["$sdf1", "$sdf2", "$sdf3" ]
[QUORUM_SET.BASIC]
THRESHOLD_PERCENT= 67
VALIDATORS= [.....]
[QUORUM_SET.BASIC.XYZ]
THRESHOLD_PERCENT= 67
VALIDATORS= [.....]
Thresholds and groupings go hand in hand, and balance:
Liveness pushes thresholds lower and safety pushes thresholds higher.
On the safety front, ideally any group (regardless of its composition) can suffer a 33% byzantine failure, but in some cases this is not practical and a different configuration needs to be picked.
You may have to change the grouping in order to achieve the expected properties:
It is generally a good idea to give information to your validator on other validators that you rely on. This is achieved by configuring KNOWN_PEERS and PREFERRED_PEERS with the addresses of your dependencies.
Additionally, configuring PREFERRED_PEER_KEYS with the keys from your quorum set might be a good idea to give priority to the nodes that allow you to reach consensus.
Without those settings, your validator depends on other nodes on the network to forward you the right messages, which is typically done as a best effort.
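A hypothetical fragment tying the three settings together (host names and keys are placeholders, and the peer port 11625 is assumed by analogy with stellar-core):
KNOWN_PEERS=["validator1.example.com:11625","validator2.example.com:11625"]
PREFERRED_PEERS=["validator1.example.com:11625","validator2.example.com:11625"]
# public keys of the validators from your quorum set
PREFERRED_PEER_KEYS=["GABCDE...","GFGHIJ..."]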
Sometimes an organization needs to make changes that impact others’ quorum sets:
In both cases, it’s crucial to stage the changes to preserve quorum intersection and general good health of the network:
The recommended approach is for the entity that adds/removes nodes to do so first among its own nodes, and then have others reflect those changes gradually (over several rounds) in their quorum configuration.
Cross reference your validator settings, in particular:
After configuring your database and bucket settings, when running HcNet-core for the first time, you must initialize the database:
$ HcNet-core new-db
This command will initialize the database as well as the bucket directory and then exit.
You can also use this command if your DB gets corrupted and you want to restart it from scratch.
HcNet-core stores the state of the ledger in a SQL database.
This DB should either be a SQLite database or, for larger production instances, a separate PostgreSQL server.
Note: Aurora currently depends on using PostgreSQL.
For how to specify the database, see the example config.
Some tables in the database act as a publishing queue for external systems such as Aurora and generate metadata for changes happening to the distributed ledger.
If not managed properly those tables will grow without bounds. To avoid this, a built-in scheduler will delete data from old ledgers that are not used anymore by other parts of the system (external systems included).
The settings that control the automatic maintenance behavior are:
AUTOMATIC_MAINTENANCE_PERIOD, AUTOMATIC_MAINTENANCE_COUNT and KNOWN_CURSORS.
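An illustrative sketch (the values and their exact semantics are assumptions; tune them to your ingestion setup):
# how often maintenance runs, in seconds (assumed unit)
AUTOMATIC_MAINTENANCE_PERIOD=3600
# how many old ledger entries to delete per run (assumed semantics)
AUTOMATIC_MAINTENANCE_COUNT=10000
# cursors that downstream consumers (such as Aurora) advance as they ingest data
KNOWN_CURSORS=["AURORA"]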
By default, HcNet-core will perform this automatic maintenance, so be sure to disable it until you have done the appropriate data ingestion in downstream systems (Aurora for example sometimes needs to reingest data).
If you need to regenerate the metadata, the simplest way is to replay ledgers for the range you’re interested in after (optionally) clearing the database with new-db.
HcNet-core stores a duplicate copy of the ledger in the form of flat XDR files called “buckets.” These files are placed in a directory specified in the config file as BUCKET_DIR_PATH, which defaults to buckets. The bucket files are used for hashing and transmission of ledger differences to history archives.
Buckets should be stored on a fast local disk with sufficient space to store several times the size of the current ledger.
For the most part, the contents of both directories can be ignored as they are managed by HcNet-core.
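For example, to point the bucket directory at a dedicated fast disk (the path is hypothetical):
# fast local storage reserved for buckets
BUCKET_DIR_PATH="/mnt/fast-disk/buckets"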
HcNet-core normally interacts with one or more “history archives,” which are configurable facilities for storing and retrieving flat files containing history checkpoints: bucket files and history logs. History archives are usually off-site commodity storage services such as Amazon S3, Google Cloud Storage, Azure Blob Storage, or custom SCP/SFTP/HTTP servers.
Use command templates in the config file to give the specifics of which services you will use and how to access them. The example config shows how to configure a history archive through command templates.
While it is possible to run a HcNet-core node with no configured history archives, it will be severely limited, unable to participate fully in a network, and likely unable to acquire synchronization at all. At the very least, if you are joining an existing network in a read-only capacity, you will still need to configure a get command to access that network’s history archives.
Archive sections can also be configured with put and mkdir commands to cause the instance to publish to that archive (for nodes configured as archiver nodes or full validators).
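A sketch of an archive section using plain filesystem commands, following the command-template format of the stellar-core lineage ({0} and {1} are substituted with file names; the archive name “local” and the paths are hypothetical):
[HISTORY.local]
get="cp /mnt/history/{0} {1}"
put="cp {0} /mnt/history/{0}"
mkdir="mkdir -p /mnt/history/{0}"
A read-only node only needs the get command; adding put and mkdir makes the instance publish to that archive.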
The very first time you want to use your archive, before starting your node, you need to initialize it with:
$ HcNet-core new-hist <historyarchive>
IMPORTANT:
In addition, you should ensure that your operating environment is also functional.
In no particular order:
After having configured your node and its environment, you’re ready to start HcNet-core.
This can be done with a command equivalent to
$ HcNet-core run
At this point you’re ready to observe core’s activity as it joins the network.
Review the logging section to familiarize yourself with the output of HcNet-core.
While running, interaction with HcNet-core is done via an administrative HTTP endpoint. Commands can be submitted using command-line HTTP tools such as curl, or by running a command such as
$ HcNet-core http-command <command>
The endpoint is not intended to be exposed to the public internet. It’s typically accessed by administrators, or by a mid-tier application to submit transactions to the Hashcash network.
See commands for a description of the available commands.
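For example, assuming the administrative port defaults to 11626 as in the stellar-core lineage (check HTTP_PORT in your config), the info command can also be issued directly with curl:
$ curl http://127.0.0.1:11626/info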
You can review the section on general node information;
the node will go through the following phases as it joins the network:
You should see authenticated_count increase.
Until the node sees a quorum, it will report that it is still trying to join FBA.
After observing consensus, a new field quorum will be set with information on what the network decided on, and the node will switch to “Catching up”.
This is the phase where the node downloads data from archives, going through the various stages of downloading and applying state. When the node is done catching up, its state will change to indicate that it is in sync with the network.
HcNet-core sends logs to standard output and HcNet-core.log by default, configurable as LOG_FILE_PATH.
Log messages are classified by progressive priority levels: TRACE, DEBUG, INFO, WARNING, ERROR, and FATAL. The logging system only emits messages at or above its configured logging level.
The log level can be controlled by configuration, the --ll command-line flag, or adjusted dynamically via administrative (HTTP) commands. Run:
$ HcNet-core http-command "ll?level=debug"
against a running system. Log levels can also be adjusted on a partition-by-partition basis through the administrative interface. For example the history system can be set to DEBUG-level logging by running:
$ HcNet-core http-command "ll?level=debug&partition=history"
against a running system. The default log level is INFO, which is moderately verbose and should emit progress messages every few seconds under normal operation.
The information provided here is useful both for human operators and for programmatic access.
Run:
$ HcNet-core http-command 'info'
The output will look something like:
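(A rough sketch; values are illustrative and the exact layout may vary between builds.)
{
   "info" : {
      "ledger" : {
         "age" : 3,
         "num" : 1234567
      },
      "peers" : {
         "authenticated_count" : 5,
         "pending_count" : 1
      },
      "state" : "Synced!"
   }
}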
peers gives information on the connectivity to the network, authenticated_count are live connections while pending_count are connections that are not fully established yet.
ledger represents the local state of your node; it may be different from the network state if your node was disconnected from the network, for example.
notable fields in ledger are:
The state of a fresh node (reset with new-db) will look something like this:
Additional fields typically used by downstream systems:
In some cases, nodes will display additional status information:
The peers command returns information on the peers the instance is connected to.
This list is the result of both inbound connections from other peers and outbound connections from this node to other peers.
$ HcNet-core http-command 'peers'
The quorum command allows you to diagnose problems with the quorum set of the local node.
Run
$ HcNet-core http-command 'quorum'
The output looks something like:
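(A sketch consistent with the example discussed below; node names and the hash are placeholders, and the format is assumed from the stellar-core lineage.)
{
   "node" : "self",
   "qset" : {
      "agree" : 6,
      "disagree" : 0,
      "fail_at" : 2,
      "hash" : "d1dacb",
      "missing" : [ "donovan" ],
      "phase" : "EXTERNALIZE"
   }
}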
Entries to watch for are:
In the example above, 6 nodes are functioning properly, one is down (donovan), and the instance will fail if any two nodes out of the ones still working fail as well.
If a node is stuck in state Joining FBA, this command allows you to quickly find the reason:
Note that the node not being able to reach consensus does not mean that the network as a whole will not be able to reach consensus (and conversely, the network may fail because of a different set of validators failing).
You can get a sense of the quorum set health of a different node by doing $ HcNet-core http-command 'quorum?node=$sdf1' or $ HcNet-core http-command 'quorum?node=@GABCDE'.
Overall network health can be evaluated by walking through all nodes and looking at their health. Note that this is only an approximation as remote nodes may not have received the same messages (in particular: missing for other nodes is not reliable).
Maintenance here refers to anything involving taking your validator temporarily out of the network (to apply security patches, perform system upgrades, etc.).
As an administrator of a validator, you must ensure that the maintenance you are about to apply to the validator is safe for the overall network and for your validator.
Safe means that the other validators that depend on yours will not be affected too much when you turn off your validator for maintenance and that your validator will continue to operate as part of the network when it comes back up.
We recommend performing the following steps in order (repeat sequentially as needed if you run multiple nodes).
The network itself has network-wide settings that can be updated.
This is performed by validators voting for and agreeing to new values, the same way that consensus is reached for transaction sets, etc.
The network settings are:
When the network time is later than the upgradetime specified in the upgrade settings, the validator will vote to update the network to the value specified in the upgrade setting.
When a validator is armed to change network values, the output of info will contain information about the vote.
For a new value to be adopted, the same level of consensus between nodes needs to be reached as for transaction sets.
Changes to network-wide settings have to be orchestrated properly between validators as well as non-validating nodes:
An improper plan may cause issues such as:
As an example, here is how you would upgrade the protocol version to version 9 on January-31-2018.
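Assuming the upgrades interface matches the stellar-core lineage (the endpoint name and parameters are assumptions; verify against your build), arming the vote would look like:
$ HcNet-core http-command 'upgrades?mode=set&upgradetime=2018-01-31T20:00:00Z&protocolversion=9'
Once armed, the output of info will contain information about the pending vote, as noted above.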
This section contains information that is useful to know but that should not stop somebody from running a node.
testnet.md is a short tutorial demonstrating how to configure and run a short-lived, isolated test network.
HcNet-core can be started directly from the command line, or through a supervision system such as init, upstart, or systemd.
HcNet-core can be gracefully exited at any time by delivering SIGINT or pressing CTRL-C. It can be safely, forcibly terminated with SIGTERM or SIGKILL. The latter may leave a stale lock file in the BUCKET_DIR_PATH, and you may need to remove the file before it will restart. Otherwise, all components are designed to recover from abrupt termination.