Basics
Pnyx is built upon a suite of interfaces, all of which can be found in pnyx.net.api. These interfaces
can be roughly divided into two categories: line and row. Pnyx defines a line as a String, without line-termination.
A row is defined as a list of strings, List<String>, without delimiters or line-termination. In both cases, the
formatting is only used when reading from and writing to files / streams.
Internally, data is processed independently of the source. For example, CSV and tab-delimited files are both represented as rows, with
formatting only being used to read from the source. Formatting (CSV vs. Tab), encoding (ASCII vs. UTF), and line-feeds
(Windows vs. Unix) are all used when reading and writing, but ignored when filtering and transforming. Formatting, encoding, and
line-feeds are preserved by Pnyx, which frees developers to work on higher-level functionality.
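To make the line / row distinction concrete, here is a minimal sketch in plain .NET. It does not use Pnyx at all: the class LineVersusRow and its ToRow and ToLine helpers are hypothetical names invented for illustration, and the naive tab split ignores quoting and escaping. It only shows that a row carries no delimiter, so the delimiter matters solely at the read and write boundaries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helpers, not part of Pnyx.
public static class LineVersusRow
{
    // Reading: the delimiter is consumed here and is not stored in the row.
    public static List<string> ToRow(string line)
    {
        return new List<string>(line.Split('\t'));
    }

    // Writing: the delimiter is re-applied only when the row is serialized.
    public static string ToLine(List<string> row, char delimiter)
    {
        return string.Join(delimiter.ToString(), row);
    }
}
```

Any processing between ToRow and ToLine sees only the column values, which is the sense in which filtering and transforming are independent of the source format.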
Operations
For both lines and rows, there are several operations with built-in support in Pnyx. For each operation, a corresponding interface
is available. When choosing an interface, it is best to use the simplest one that meets your needs, because there are
several higher-level operations for chaining and combining filters. The table below illustrates the core interfaces in Pnyx:
Looking at the table above, the Filter is the simplest operator and Buffering is the most complicated. Filters can be chained together
with boolean operators (and, or, not, xor) and can be shimmed (see Shims). Transforms can be shimmed, and Buffering has no special operations
available.
Here is a quick overview of the 3 primary operations:
- Filter - returns true to keep a line/row, and false to ignore it. This is by far the simplest operation and is the recommended
choice whenever a filter fits your needs.
- Transform - returns a line/row, and optionally modifies the content. When null is returned, the line/row
is ignored. Use a Transform when the line/row must change.
- Buffering - returns a list of lines/rows, or null to skip. When Pnyx finishes with a data source, Buffering is flushed.
Only use Buffering when the resultant data depends on multiple lines/rows or when output exceeds the input.
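The contracts described in the list above can be sketched as follows. The interfaces ISketchFilter, ISketchTransform, and ISketchBuffering, along with their method names, are hypothetical stand-ins written for this illustration; they are not the Pnyx interfaces, whose actual member names may differ. Only the line variants are shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the three kinds of operation; not the Pnyx API.
public interface ISketchFilter { bool Keep(string line); }
public interface ISketchTransform { string Apply(string line); } // null drops the line
public interface ISketchBuffering
{
    List<string> Accept(string line); // null means nothing to emit yet
    List<string> Flush();             // called once the source is exhausted
}

// Filter: keeps or ignores a line without changing it.
public class NonEmptyFilter : ISketchFilter
{
    public bool Keep(string line) { return line.Length > 0; }
}

// Transform: may change the line, or return null to ignore it.
public class TrimTransform : ISketchTransform
{
    public string Apply(string line)
    {
        string trimmed = line.Trim();
        return trimmed.Length == 0 ? null : trimmed;
    }
}

// Buffering: output depends on more than one line, so state is held between calls.
public class PairLines : ISketchBuffering
{
    private string pending;

    public List<string> Accept(string line)
    {
        if (pending == null) { pending = line; return null; }
        var result = new List<string> { pending + " " + line };
        pending = null;
        return result;
    }

    public List<string> Flush()
    {
        if (pending == null) return null;
        var result = new List<string> { pending };
        pending = null;
        return result;
    }
}
```

PairLines is the only one of the three that needs a flush step: an odd number of input lines leaves one line held back, which would be lost if nothing asked for it at the end of the source.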
Processors
Operators are the part of the API that you will most likely implement. On their own, however, they are not much use. That is where Processors come in.
For each of the six interfaces listed above, there is a corresponding Processor, and Processors can be chained together to build a complete "pnyx". Normally the
process of chaining together Processors is handled by the Fluent API. The rest of the documentation will focus on the
Fluent API, but for completeness, the following code illustrates chaining processors together by hand
(same example as Hello World on the library page):
StreamInformation streamInformation = new StreamInformation(new Settings());
// Writes to STDOUT
ILineProcessor dest = new LineProcessorToStream(streamInformation, Console.OpenStandardOutput());
// Grep filter / processor pair
ILineFilter grepFilter = new Grep {textToFind = "world", caseSensitive = false};
ILineProcessor grepProcessor = new LineFilterProcessor { filter = grepFilter, processor = dest };
// Sed transformer / processor pair
ILineTransformer sedTransformer = new SedReplace("World", "World, with love from Pnyx..", null);
ILineProcessor sedProcessor = new LineTransformerProcessor { transform = sedTransformer, processor = grepProcessor };
// Reads from source
using (StringStreamFactory streamFactory = new StringStreamFactory("Hello World."))
{
    using (StreamToLineProcessor source = new StreamToLineProcessor(streamInformation, streamFactory))
    {
        source.setNextLineProcessor(sedProcessor);
        // Runs
        source.process(); // All I/O occurs on this step
    }
}
// outputs: Hello World, with love from Pnyx...
Shims
A type may implement more than one interface. For instance, a filter may have a slightly
different algorithm for row data vs. line data, and therefore implement both ILineFilter and IRowFilter. However, it is not
a requirement to implement both row and line versions of the interfaces. For filters and transforms, line operators can be shimmed to work on row data.
The following example shows how SedReplace, which is an ILineTransformer, can be used on CSV data. Notice how the commas used as part of
the CSV formatting are preserved while the comma within the row data is replaced with an underscore.
using (Pnyx p = new Pnyx())
{
    p.readString("CSV,INPUT!,\"Go, Pnyx Go\"");
    p.parseCsv();
    p.sed("[,!]", "_", "g"); // replace each , or ! with _
    p.writeStdout();
}
// outputs: CSV,INPUT_,"Go_ Pnyx Go"
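The output above is consistent with a shim that applies the line operation to each column of a row in turn. The sketch below shows that idea under stated assumptions: ISketchLineTransform, ISketchRowTransform, RegexReplace, and RowShim are hypothetical names written for this illustration, and whether Pnyx implements its shims this way is an assumption, not something the documentation confirms.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-ins; not the Pnyx interfaces.
public interface ISketchLineTransform { string Apply(string line); }
public interface ISketchRowTransform { List<string> Apply(List<string> row); }

// A line-only transform: regex replacement over a single string.
public class RegexReplace : ISketchLineTransform
{
    private readonly Regex pattern;
    private readonly string replacement;

    public RegexReplace(string pattern, string replacement)
    {
        this.pattern = new Regex(pattern);
        this.replacement = replacement;
    }

    public string Apply(string line) { return pattern.Replace(line, replacement); }
}

// The shim: adapts a line transform to rows by treating each column as a line.
// For simplicity, this sketch does not handle a null result from the inner transform.
public class RowShim : ISketchRowTransform
{
    private readonly ISketchLineTransform inner;

    public RowShim(ISketchLineTransform inner) { this.inner = inner; }

    public List<string> Apply(List<string> row)
    {
        var result = new List<string>(row.Count);
        foreach (string column in row)
        {
            result.Add(inner.Apply(column));
        }
        return result;
    }
}
```

Because the row holds column values only, the delimiters are never seen by the inner transform; that is why a comma inside a column can be replaced while the commas separating columns are untouched.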
Next
Suggested next step:
- Fluent, learn how to use the Fluent API