Reference

Below is the reference for all of the methods available on Pnyx.
Input
startLine: Sets a Line source for Pnyx, which provides the initial Line data. Use this for building custom sources, and make sure to implement IDisposable when resources are opened.

For Stream sources, use one of the other Input methods.

  • lineProcessor ILinePart: Custom source
startRow: Sets a Row source for Pnyx, which provides the initial Row data. Use this for building custom sources, and make sure to implement IDisposable when resources are opened.

For Stream sources, use one of the other Input methods.

  • rowProcessor IRowPart: Custom source
  • rowConverter IRowConverter: Used for converting to/from Row and Line data
readStreamFactory: Generic method for reading from a Stream source. Use this method when the timing of opening the Stream is important; otherwise, it is simpler to use readStream.
  • streamFactory IStreamFactory: Factory for building the source Stream
read: Reads from the specified file.

File is opened when Pnyx is processed.

  • path String: Relative or absolute path of the file to open
readStream: Generic method for reading from a Stream source.

Even though the stream is created outside of Pnyx, it is disposed as part of the standard Pnyx dispose.

  • input Stream: Stream to read from
readString: Sets the source from a String. Use this method for testing or for text sources.

There is no associated encoding for String, so the default encoding is used when writing output.

  • source String: String to read from
readStdin: Uses STD-IN as the source.
cat: Concatenates multiple sources, similar to the Linux utility cat. When combining sources, the first source sets the encoding used for Pnyx.

See State Machine and Input examples.

  • block Action<Pnyx>: Lambda expression for specifying multiple sources via method calls to Pnyx
asCsv: Reads the source as a CSV file. Use this method, as opposed to parseCsv, when the CSV has embedded newlines. Newlines must be wrapped in quotes to be read as part of the Row data.
  • block Action<Pnyx>: Lambda expression for specifying the source.
  • strict bool default=True: When true, formatting errors cause exceptions to be raised. When false, formatting issues are ignored.
  • hasHeader bool default=False: When true, the first line is used as the header, and is not filtered / transformed.
head: Filter that stops processing once a specified number of lines/rows have passed through it, similar to the Linux utility head.

As a filter, head can be placed after other filters; in that case, it only counts the lines/rows that it receives.

Regardless of where head is placed, processing terminates as soon as the limit is reached, so the method is safe to use on very large files.

  • limit int default=1: Number of lines/rows to limit. Must be 1 or greater.
tail: Buffering that restricts output to the last specified number of lines/rows. Because it is a buffering, the entire source is processed before anything is output.

When reading very large files, use tailStream instead, which restricts how much of the source is read.

  • limit int default=1: Number of lines/rows to limit. Must be 1 or greater.
tailStream: Modifies source to read a specified number of lines/rows from the end of the file, similar to the Linux utility tail.

The method seeks to the starting position within the stream and then reads to the end of the file. As a result, the specified number of lines/rows are read twice, but the method is safe to use on very large files.

The source Stream must be seekable.

  • block Action<Pnyx>: Lambda expression for specifying the source.
  • limit int default=1: Number of lines/rows to limit. Must be 1 or greater.
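The seek-from-the-end strategy described for tailStream can be sketched as follows. This is a minimal C# illustration, not Pnyx's implementation: `TailSketch` is a hypothetical helper, it assumes a UTF-8 or ASCII stream with `\n` line endings (a `\r\n` pair stays attached to its line), and it ignores one trailing newline so that it doesn't count as an empty last line.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Pnyx: locate the last `limit` lines of a seekable stream.
public static class TailSketch
{
    // Returns the byte offset where the last `limit` lines begin.
    public static long FindTailStart(Stream stream, int limit)
    {
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable");
        if (limit < 1) throw new ArgumentOutOfRangeException(nameof(limit));

        long position = stream.Length;
        // Skip one trailing newline so it doesn't count as an empty final line.
        if (position > 0 && ReadByteAt(stream, position - 1) == '\n') position--;

        int newlines = 0;
        while (position > 0)
        {
            // A newline just before `position` means a line starts at `position`.
            if (ReadByteAt(stream, position - 1) == '\n' && ++newlines == limit) return position;
            position--;
        }
        return 0; // fewer than `limit` lines: the whole stream is the tail
    }

    private static int ReadByteAt(Stream stream, long offset)
    {
        stream.Seek(offset, SeekOrigin.Begin);
        return stream.ReadByte();
    }

    // Seek to the start of the tail, then read forward to the end (assumes UTF-8).
    public static string Tail(Stream stream, int limit)
    {
        stream.Seek(FindTailStart(stream, limit), SeekOrigin.Begin);
        using var reader = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true);
        return reader.ReadToEnd();
    }
}
```

Seeking once per byte keeps the sketch short; a practical version would read backward in larger blocks.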
Line
linePart: Low level method for setting a custom ILineProcessor. Only use this for processing that isn't supported by filters, transforms, and buffering.
  • linePart ILinePart: Object that implements both ILinePart, ILineProcessor
lineFilter: Method to add a line filter. Any filter type can be used as long as it implements the ILineFilter interface.
  • filter ILineFilter: Line filter
lineTransformer: Method to add a line transform. Any transform type can be used as long as it implements the ILineTransformer interface.
  • transform ILineTransformer: Line transformer
lineBuffering: Method to add a line buffering. Any buffering type can be used as long as it implements the ILineBuffering interface.
  • buffering ILineBuffering: Line buffering
lineFilterFunc: Method to add an ad-hoc line filter.
  • filter Func<String, bool>: Function / lambda-expression to filter line data
lineTransformerFunc: Method to add an ad-hoc line transform.
  • transform Func<String, String>: Function / lambda-expression to transform line data
Boolean
and: Wraps a group of filters using AND boolean logic.
  • block Action<Pnyx>: Lambda expression for specifying the filters to group.
or: Wraps a group of filters using OR boolean logic.
  • block Action<Pnyx>: Lambda expression for specifying the filters to group.
xor: Wraps a group of filters using XOR boolean logic.
  • block Action<Pnyx>: Lambda expression for specifying the filters to group.
not: Wraps a group of filters using NOT boolean logic. If multiple filters are specified, NAND boolean logic is used.
  • block Action<Pnyx>: Lambda expression for specifying the filters to group.
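To make the grouping rules concrete, here is a minimal C# sketch that models filters as `Func<string, bool>` predicates. `BooleanSketch` is hypothetical and does not reflect how Pnyx composes its ILineFilter objects; it only shows why `not` over several filters behaves as NAND. xor is left out because the reference doesn't say how it treats more than two filters.

```csharp
using System;
using System.Linq;

// Hypothetical model of the boolean grouping methods; not Pnyx's implementation.
public static class BooleanSketch
{
    // and: every filter in the group must pass.
    public static Func<string, bool> And(params Func<string, bool>[] filters) =>
        line => filters.All(f => f(line));

    // or: at least one filter in the group must pass.
    public static Func<string, bool> Or(params Func<string, bool>[] filters) =>
        line => filters.Any(f => f(line));

    // not: negates the AND of the group, so one filter gives NOT and several give NAND.
    public static Func<string, bool> Not(params Func<string, bool>[] filters) =>
        line => !filters.All(f => f(line));
}
```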
Line, Row Conversion
parseCsv: Converts lines to rows by parsing as CSV. The converter can be placed anywhere line operations are legal. However, when placed as the first line operation, this method acts like asCsv by parsing the CSV data directly from the source.
  • strict bool default=True: When true, formatting errors cause exceptions to be raised. When false, formatting issues are ignored.
  • hasHeader bool default=False: When true, the first line is used as the header, and is not filtered / transformed.
parseDelimiter: Converts lines to rows by simple delimiter parsing. Converter can be placed anywhere line operations are legal.
  • delimiter String: String to match and consume when parsing line
  • hasHeader bool default=False: When true, the first line is used as the header, and is not filtered / transformed.
parseTab: Converts lines to rows by parsing tabs. Converter can be placed anywhere line operations are legal.
  • hasHeader bool default=False: When true, the first line is used as the header, and is not filtered / transformed.
printDelimiter: Converts rows to lines by joining with specified delimiter. Converter can be placed anywhere row operations are legal.
  • delimiter String: Delimiter used to separate content within line
printTab: Converts rows to lines by joining with tabs. Converter can be placed anywhere row operations are legal.
lineToRow: Converts lines to rows using the specified IRowConverter. Converter can be placed anywhere line operations are legal.
  • converter IRowConverter: Custom converter to parse line data into rows
  • hasHeader bool default=False: When true, the first line is used as the header, and is not filtered / transformed.
rowToLine: Converts rows to lines using the specified IRowConverter. Converter can be placed anywhere row operations are legal.

Note: when the passed converter is null, the original IRowConverter (the one that parsed the line data) is used.

  • converter IRowConverter default=: Custom converter to write line data, or null to use original converter
print: Formats either row or line data.

Converts rows to lines using a format string. The format string replaces $0 with the whole row, formatted with the original IRowConverter, and replaces $1, $2, ... $xx with the values of the corresponding columns.

Escape the $ character with $$. Write tabs and newlines as \t, \r, and \n. Escape \ with \\.

Print can also be used to format a line (the data doesn't need to be rows). However, only the $0 replacement is defined for lines.

  • format String: Format string
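The replacement and escaping rules for print can be illustrated with a small formatter. This is a sketch under stated assumptions, not Pnyx's code: `PrintSketch` is hypothetical, `wholeRow` stands in for the $0 text that Pnyx would produce with the original IRowConverter, and a column number past the end of the row is assumed to print nothing.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical model of the print format rules; not Pnyx's implementation.
public static class PrintSketch
{
    public static string Format(string format, string wholeRow, IReadOnlyList<string> columns)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < format.Length; i++)
        {
            char c = format[i];
            bool hasNext = i + 1 < format.Length;
            if (c == '$' && hasNext && format[i + 1] == '$')
            {
                sb.Append('$'); // $$ -> literal $
                i++;
            }
            else if (c == '$' && hasNext && IsDigit(format[i + 1]))
            {
                // Read the full column number, so $12 means column 12.
                int end = i + 1;
                while (end < format.Length && IsDigit(format[end])) end++;
                int n = int.Parse(format.Substring(i + 1, end - i - 1));
                if (n == 0) sb.Append(wholeRow);
                else if (n <= columns.Count) sb.Append(columns[n - 1]); // assumption: missing column prints nothing
                i = end - 1;
            }
            else if (c == '\\' && hasNext)
            {
                char e = format[++i];
                // \t, \r, \n become control characters; \\ (or any other escape) yields the character itself.
                sb.Append(e == 't' ? '\t' : e == 'r' ? '\r' : e == 'n' ? '\n' : e);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```

For example, the format `$2\t$1 costs $$5` applied to the row `a,b` yields `b`, a tab, then `a costs $5`.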
Row
shimAnd: Row filter that shims an ILineFilter onto row data using the boolean AND operator. For a row to be filtered 'in', every column must match the line filter.
  • block Action<Pnyx>: Lambda expression for specifying the filters to group.
rowPart: Low level method for setting custom IRowProcessor. Only use for processing that isn't supported by filters, transforms, and buffering.
  • rowPart IRowPart: Object that implements both IRowPart, IRowProcessor
rowFilter: Method to add a row filter. Any filter type can be used as long as it implements the IRowFilter interface.
  • rowFilter IRowFilter: Row filter
rowTransformer: Method to add a row transform. Any transform type can be used as long as it implements the IRowTransformer interface.
  • transform IRowTransformer: Row transformer
rowFilterFunc: Method to add an ad-hoc row filter.
  • filter Func<List<String>, bool>: Function / lambda-expression to filter row data
rowTransformerFunc: Method to add an ad-hoc row transform.
  • transform Func<List<String>, List<String>>: Function / lambda-expression to transform row data
  • treatHeaderAsRow bool default=False: When true, the header is transformed like any other record. When false, the header is passed through without modification. In both cases, the header record is still treated as a header record.
rowBuffering: Method to add a row buffering. Any buffering type can be used as long as it implements the IRowBuffering interface.
  • buffering IRowBuffering: Row buffering
columnDefinition: Aggregate row buffering operator for determining maximum width, minimum width and nullability for each column. Use this method when sizing data for SQL imports.

Use this in conjunction with swapColumnsAndRows to create SQL pseudo code.

With the limit parameter set, this method is safe to use on very large files. Without a limit, it reads the entire file.

  • limit int? default=: The number of rows to read before terminating processing.
  • maxWidth bool default=False: When true, maxWidth is reported
  • hasHeaderRow bool default=False: When true, header name is output
  • minWidth bool default=False: When true, minWidth is reported
  • nullable bool default=False: When true, column data is tested for whether non-null values exist
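The statistics columnDefinition gathers can be sketched as a single pass over the rows. This C# version is illustrative only: `ColumnStats` and `ColumnDefinitionSketch` are hypothetical, and treating an empty string as a "null" value is an assumption, since the reference doesn't say how Pnyx decides nullability. The early `break` mirrors how the limit parameter avoids reading the whole file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-column statistics; not a Pnyx type.
public sealed class ColumnStats
{
    public int MaxWidth { get; set; }
    public int MinWidth { get; set; } = int.MaxValue;
    public bool HasEmpty { get; set; } // assumption: an empty string counts as "null"
}

public static class ColumnDefinitionSketch
{
    public static List<ColumnStats> Analyze(IEnumerable<IReadOnlyList<string>> rows, int? limit = null)
    {
        var stats = new List<ColumnStats>();
        int read = 0;
        foreach (var row in rows)
        {
            for (int i = 0; i < row.Count; i++)
            {
                // Grow the stats list when a row has more columns than seen so far.
                while (stats.Count <= i) stats.Add(new ColumnStats());
                int width = row[i].Length;
                stats[i].MaxWidth = Math.Max(stats[i].MaxWidth, width);
                stats[i].MinWidth = Math.Min(stats[i].MinWidth, width);
                if (width == 0) stats[i].HasEmpty = true;
            }
            // With a limit, stop as soon as enough rows are seen instead of reading everything.
            if (limit.HasValue && ++read >= limit.Value) break;
        }
        return stats;
    }
}
```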
swapColumnsAndRows: Aggregate row buffering operator for swapping the positions of columns and rows.

Use with caution: the entire file is read into memory before the operation is performed.

hasColumns: Row filter for removing rows that are missing data for specific column numbers. If any of the specified columns lack data, then the whole row is filtered out.
  • verifyColumnHasText bool: When true, empty strings are considered missing data, and filtered out
  • columnNumbers int[]: List of columns to check
widthColumns: Row transformer to set the number of columns. If a row has fewer columns, new columns are added using the pad value. If a row has too many columns, the extra columns are removed. Output always has the specified number of columns, regardless of input.

When columns are added, default header names are used (if headers are present).

  • columns int: The number of columns to output
  • pad String default=: String to fill in added column values
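The pad-or-truncate behavior of widthColumns fits in a few lines. This is a hypothetical C# sketch (`WidthColumnsSketch` is not a Pnyx type), assuming the pad defaults to an empty string and ignoring header naming.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of widthColumns; not Pnyx's implementation.
public static class WidthColumnsSketch
{
    public static List<string> WidthColumns(IReadOnlyList<string> row, int columns, string pad = "")
    {
        var result = row.Take(columns).ToList();        // drop extra columns
        while (result.Count < columns) result.Add(pad); // pad missing columns
        return result;                                  // always exactly `columns` entries
    }
}
```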
removeColumns: Row transformer to drop specific columns. Column numbers refer to positions before any columns are removed, so removing one column doesn't shift the numbering of subsequent columns.

For rows that are missing some of the specified columns, those columns are simply skipped.

  • columnNumbers int[]: List of columns to remove
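Because the column numbers refer to the original row, removing columns 1 and 3 from `a,b,c,d` leaves `b,d` rather than shifting column 3 after column 1 disappears. A hypothetical C# sketch (`RemoveColumnsSketch` is not part of Pnyx) that assumes 1-based column numbers:

```csharp
using System.Collections.Generic;

// Hypothetical model of removeColumns; not Pnyx's implementation.
public static class RemoveColumnsSketch
{
    public static List<string> RemoveColumns(IReadOnlyList<string> row, params int[] columnNumbers)
    {
        var toRemove = new HashSet<int>(columnNumbers);
        var result = new List<string>();
        // Numbers are checked against original 1-based positions, so removals never renumber.
        for (int i = 0; i < row.Count; i++)
            if (!toRemove.Contains(i + 1)) result.Add(row[i]);
        return result; // numbers past the end of the row are simply skipped
    }
}
```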
insertColumnsWithPadding: Row transformer to add specific columns. Column numbers refer to positions before any columns are inserted, so adding one column doesn't shift the numbering of subsequent columns.

When columns are added, default header names are used (if headers are present).

Columns can be inserted anywhere before or between existing columns. However, only 1 column can be inserted after the existing columns. If more are needed, use the widthColumns method or make multiple calls to this method. Any column number beyond 1 extra column is ignored.

  • pad String default=: Text to insert for new columns
  • columnNumbers int[]: Column numbers to insert
insertColumns: See insertColumnsWithPadding, using an empty string as padding.
  • columnNumbers int[]: Column numbers to insert
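The insertion rules can be modeled as a walk over positions 1 through row length + 1. This C# sketch is hypothetical (`InsertColumnsSketch` is not a Pnyx type) and assumes that column number n places the new column just before original column n, so n = row length + 1 appends the single allowed trailing column and anything larger is ignored. Header naming isn't modeled.

```csharp
using System.Collections.Generic;

// Hypothetical model of insertColumnsWithPadding / insertColumns; not Pnyx's implementation.
public static class InsertColumnsSketch
{
    public static List<string> InsertColumnsWithPadding(IReadOnlyList<string> row, string pad, params int[] columnNumbers)
    {
        var insertAt = new HashSet<int>(columnNumbers);
        var result = new List<string>();
        // Positions 1..Count+1; Count+1 is the one slot after the last existing column.
        for (int n = 1; n <= row.Count + 1; n++)
        {
            if (insertAt.Contains(n)) result.Add(pad); // new column goes before original column n
            if (n <= row.Count) result.Add(row[n - 1]);
        }
        return result; // numbers above Count+1 are never visited, so they are ignored
    }

    public static List<string> InsertColumns(IReadOnlyList<string> row, params int[] columnNumbers) =>
        InsertColumnsWithPadding(row, "", columnNumbers);
}
```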
duplicateColumns: Row transformer to duplicate specific columns. Column numbers refer to positions before any columns are inserted, so adding one column doesn't shift the numbering of subsequent columns.

When columns are added, header names are duplicated (if headers are present).

For a column to be duplicated, it must be present in the passed row; otherwise, it is ignored.

  • columnNumbers int[]: Column numbers to duplicate
headerNames: Row transformer to set the names of headers, if headers are present. Parameters can be mixed between column numbers (int) and names (String), any other types cause an exception.

To set the first 3 headers: headerNames("my header1", "my header2", "my header3")
To set the 2nd header: headerNames(2, "my header2")
To set the 2nd and 3rd headers: headerNames(2, "my header2", "my header3")
To set the 1st and 3rd headers: headerNames("my header1", 3, "my header3")

  • columnNumbersAndNames Object[]: Column numbers and header names
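The examples above follow one rule: an int moves the current position, and each String names the header at the current position and then advances it. A hypothetical C# sketch of that argument parsing (`HeaderNamesSketch` is not a Pnyx type; it only returns the position-to-name mapping and doesn't touch any rows):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for headerNames-style arguments; not Pnyx's implementation.
public static class HeaderNamesSketch
{
    public static Dictionary<int, string> Parse(params object[] columnNumbersAndNames)
    {
        var names = new Dictionary<int, string>();
        int position = 1; // 1-based column number for the next name
        foreach (var item in columnNumbersAndNames)
        {
            switch (item)
            {
                case int n: position = n; break;                  // jump to column n
                case string name: names[position++] = name; break; // name it, then advance
                default: throw new ArgumentException($"Unsupported type: {item?.GetType().Name ?? "null"}");
            }
        }
        return names;
    }
}
```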
selectColumns: Row transformer to select specific columns. If a specified column does not exist in a passed row, the column is padded with an empty string.

If only 1 column is specified, then output is still row data. Use printColumn to convert to line data.

  • columnNumbers int[]: Column numbers to select
printColumn: Converter to select a specific column, which is converted to line data. If column doesn't exist in passed rows, then resultant lines will contain empty strings.
  • columnNumber int: Column number to select
withColumns: Modifier that narrows filters and transforms to operate on a subset of columns.
  • block Action<Pnyx>: Lambda expression for specifying filters and transforms, which operate on a subset of columns
  • columnNumbers int[]: Column numbers to narrow filtering and transforming to
Util
grep: Line filter that keeps lines which contain textToFind. This method is similar to the Linux grep utility.
  • textToFind String: Text to find in line
  • caseSensitive bool default=True: True to match exact case, false to ignore
egrep: Line filter that keeps lines which match the regular expression. This method is similar to the Linux egrep utility.

This filter uses the built-in .NET regular expression engine. See Reference for the exact syntax.

  • expression String: Regular expression for filtering lines.
  • caseSensitive bool default=True: True to match exact case, false to ignore
hasLine: Line filter to remove empty lines
sed: Line transformer to replace regular expression matches. This method is similar to the Linux sed utility. Group replacements are supported. Standard sed flags are supported too, including ranges.

For example usage, see SedReplaceTest.cs.

This filter uses the built-in .NET regular expression engine. See Reference for the exact syntax.

  • pattern String: Regular expression pattern, which is replaced.
  • replacement String: Replacement format. Use \1 through \x to insert regex groups
  • flags String default=: String of flags: i to ignore case, g for global replace, or a numeric range to specify which matches to replace.
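A sed-style replacement can be expressed with .NET's `Regex` by translating sed's `\1` group syntax into .NET's `${1}` substitution and counting matches to honor the flags. This is a minimal sketch, not Pnyx's sed: `SedSketch` is hypothetical, only a single match number is handled (the ranges mentioned above are not modeled), and the number-plus-g combination follows GNU sed (replace the nth match and every later one).

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical model of sed-style replacement; not Pnyx's implementation.
public static class SedSketch
{
    public static string Replace(string line, string pattern, string replacement, string flags = "")
    {
        flags ??= "";
        var options = flags.IndexOf('i') >= 0 ? RegexOptions.IgnoreCase : RegexOptions.None;
        bool global = flags.IndexOf('g') >= 0;

        // A number in the flags selects which match to replace (1-based); default is the first.
        var digits = new StringBuilder();
        foreach (char c in flags) if (c >= '0' && c <= '9') digits.Append(c);
        int nth = digits.Length > 0 ? int.Parse(digits.ToString()) : 1;

        string dotnetReplacement = ToDotNetReplacement(replacement);
        int count = 0;
        return new Regex(pattern, options).Replace(line, m =>
        {
            count++;
            // Without g only the nth match changes; with g the nth and every later match change.
            bool replace = global ? count >= nth : count == nth;
            return replace ? m.Result(dotnetReplacement) : m.Value;
        });
    }

    // Translate sed's \1 to .NET's ${1}, and escape a literal $ as $$.
    private static string ToDotNetReplacement(string sed)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < sed.Length; i++)
        {
            char c = sed[i];
            if (c == '\\' && i + 1 < sed.Length)
            {
                char next = sed[++i];
                if (next >= '0' && next <= '9') sb.Append("${").Append(next).Append('}');
                else sb.Append(next == '$' ? "$$" : next.ToString());
            }
            else if (c == '$') sb.Append("$$");
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```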
sedLineNumber: Line buffering to insert the line number before each line.
sedAppendRow: Row buffering to append toAppend row after each row.
  • toAppend List<String>: Row to append
sedAppendLine: Line buffering to append text line after each line.
  • text String: Line to append
sedInsert: Line buffering to insert text line before each line.
  • text String: Line to insert
beforeAfterFilter: Line / Row buffering that returns lines/rows before and after the matches of a wrapped filter. Pnyx determines whether to use Line or Row buffering.
  • before int: Count of lines/rows to return prior to a match from the wrapped filter. Value of 0 is a no-op
  • after int: Count of lines/rows to return after a match from the wrapped filter. Value of 0 is a no-op
  • block Action<Pnyx>: Lambda expression for specifying the filter to wrap.
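The before/after behavior resembles `grep -B`/`-A` context output. The C# sketch below models it over plain lines with a predicate standing in for the wrapped filter; `BeforeAfterSketch` is hypothetical, and the choice to never emit a line twice when context windows overlap is an assumption about Pnyx's behavior.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of beforeAfterFilter over lines; not Pnyx's implementation.
public static class BeforeAfterSketch
{
    public static IEnumerable<string> Filter(IEnumerable<string> lines, Func<string, bool> match, int before, int after)
    {
        var pending = new Queue<string>(); // up to `before` recent lines not yet emitted
        int afterRemaining = 0;
        foreach (var line in lines)
        {
            if (match(line))
            {
                while (pending.Count > 0) yield return pending.Dequeue(); // leading context
                yield return line;
                afterRemaining = after; // start (or restart) the trailing window
            }
            else if (afterRemaining > 0)
            {
                afterRemaining--;
                yield return line; // trailing context, never queued again
            }
            else if (before > 0)
            {
                pending.Enqueue(line);
                if (pending.Count > before) pending.Dequeue(); // keep only the last `before`
            }
        }
    }
}
```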
sortLine: Line processor to sort all lines. Processor uses 2 levels of sorting: memory sorting and file sorting. Memory sorting uses a buffer, whose size is specified by the bufferLines parameter, defaulting to Settings.bufferLines. If the source exceeds the buffer size, temporary files are used, placed in the specified directory, defaulting to Settings.tempDirectory.

When temporary files are used, a merge sort is performed, which handles an arbitrary number of records. This processor is safe to use on really large files, processing time aside.

  • descending bool default=False: True to sort text in descending order (biggest to smallest), false sorts in ascending order
  • caseSensitive bool default=False: True to match exact case, false to ignore
  • unique bool default=False: When true, duplicate lines are removed. False allows duplicates
  • tempDirectory String default=: Location of temporary files
  • bufferLines int? default=: Number of lines to buffer before using temporary files
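The two-level strategy described for sortLine is an external merge sort: sort buffer-sized chunks in memory, spill each sorted chunk (a "run") to a temporary file, then merge the runs. The C# sketch below illustrates that technique under stated assumptions; `ExternalSortSketch` is hypothetical, uses ordinal string comparison, merges by a simple linear scan over run heads, and collects the result in a list for brevity, whereas a real implementation would stream it out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical external merge sort illustrating sortLine's approach; not Pnyx's implementation.
public static class ExternalSortSketch
{
    public static List<string> Sort(IEnumerable<string> source, int bufferLines, bool descending = false,
                                    bool caseSensitive = false, bool unique = false, string tempDirectory = null)
    {
        StringComparer baseComparer = caseSensitive ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase;
        Comparison<string> compare = descending
            ? (Comparison<string>)((a, b) => baseComparer.Compare(b, a))
            : (a, b) => baseComparer.Compare(a, b);
        string dir = tempDirectory ?? Path.GetTempPath();

        // Phase 1 (memory sort): sort each full buffer and spill it to its own temporary file.
        var runFiles = new List<string>();
        var buffer = new List<string>();
        void Spill()
        {
            buffer.Sort(compare);
            string runFile = Path.Combine(dir, Path.GetRandomFileName());
            File.WriteAllLines(runFile, buffer);
            runFiles.Add(runFile);
            buffer.Clear();
        }
        foreach (var line in source)
        {
            buffer.Add(line);
            if (buffer.Count >= bufferLines) Spill();
        }
        if (buffer.Count > 0) Spill();

        // Phase 2 (file sort): k-way merge, repeatedly taking the smallest head among all runs.
        var active = new List<IEnumerator<string>>();
        foreach (var runPath in runFiles)
        {
            var reader = File.ReadLines(runPath).GetEnumerator();
            if (reader.MoveNext()) active.Add(reader); else reader.Dispose();
        }
        var result = new List<string>();
        while (active.Count > 0)
        {
            var min = active[0];
            foreach (var r in active) if (compare(r.Current, min.Current) < 0) min = r;
            string next = min.Current;
            // unique: duplicates arrive adjacent in sorted order, so compare with the last kept line.
            if (!unique || result.Count == 0 || compare(result[result.Count - 1], next) != 0) result.Add(next);
            if (!min.MoveNext()) { min.Dispose(); active.Remove(min); }
        }
        foreach (var runPath in runFiles) File.Delete(runPath);
        return result;
    }
}
```

Because only one line per run is held during the merge, memory use is bounded by the buffer size plus the number of runs, which is what makes the approach safe on very large inputs.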
sortRow: Row processor to sort all rows. Processor uses 2 levels of sorting: memory sorting and file sorting. Memory sorting uses a buffer, whose size is specified by the bufferRows parameter, defaulting to Settings.bufferLines. If the source exceeds the buffer size, temporary files are used, placed in the specified directory, defaulting to Settings.tempDirectory.

When temporary files are used, a merge sort is performed, which handles an arbitrary number of records. This processor is safe to use on really large files, processing time aside.

  • columnNumbers int[] default=: Columns to compare when sorting rows
  • descending bool default=False: True to sort text in descending order (biggest to smallest), false sorts in ascending order
  • caseSensitive bool default=False: True to match exact case, false to ignore
  • unique bool default=False: When true, duplicate rows are removed. False allows duplicates
  • tempDirectory String default=: Location of temporary files
  • bufferRows int? default=: Number of rows to buffer before using temporary files
setSettings: Updates settings for this Pnyx. Settings modified here only affect this one instance. Default values for settings are copied from SettingsHome when Pnyx is created.

To update settings for all subsequent Pnyx objects, use:
SettingsHome.settingsFactory = new SettingsHome(globalSettings);

  • tempDirectory String default=: Temporary directory for files used by sort and rewrite
  • bufferLines int? default=: Buffer size for sort
  • defaultEncoding Encoding default=: Encoding used when source is missing encoding
  • outputEncoding Encoding default=: When set, forces the output of all files to use the specified encoding. When not set, the encoding of the input file is used for output.
  • defaultNewline String default=: Newline character(s) to use when source is missing newlines
  • outputNewline String default=: When set, forces the output of all files to use the specified newline. When not set, the newline of the input file is used for output.
  • backupRewrite bool? default=: Flag to create a backup before modifying a file with the rewrite method
  • processOnDispose bool? default=: When true, disposing a Pnyx will process it, assuming enough parts are present. This option is discouraged in production, as exceptions could potentially lead to unintended output.
  • stdIoDefault bool? default=: When true, Pnyx will automatically wire an STD-IN source and an STD-OUT output if not explicitly set. pnyx.cmd forces this flag to true for convenience.
Output
write: Outputs line/row data to a file.
  • path String: Path to write
writeStream: Outputs line/row data to a stream.

Even though the stream is created outside of Pnyx, it is disposed as part of the standard Pnyx dispose.

  • output Stream: Stream to write to
writeCsv: Outputs line/row data to a CSV file.
  • path String: Path to write
  • strict bool default=True: Specifies strict mode, in cases where line needs to be converted to row data
writeCsvStream: Outputs line/row data to a Stream in CSV format.

Even though the stream is created outside of Pnyx, it is disposed as part of the standard Pnyx dispose.

  • stream Stream: Stream to write to
  • strict bool default=True: Specifies strict mode, in cases where line needs to be converted to row data
writeStdout: Outputs line/row data to STD-OUT.
writeSplit: Outputs line/row data to files, each limited to a specific number of records. This is similar to the Linux split utility.
  • fileNamePattern String: File name pattern. $0 is replaced by a counter
  • limit int: Number of records to limit for each file
  • path String default=: Location to save files
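The splitting logic can be sketched as chunking records and substituting a counter for $0 in the file name. This C# version is hypothetical (`SplitSketch` is not a Pnyx type), writes plain lines only, and assumes the counter starts at 1; the reference doesn't state Pnyx's starting value.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical model of writeSplit; not Pnyx's implementation.
public static class SplitSketch
{
    public static List<string> WriteSplit(IEnumerable<string> records, string fileNamePattern, int limit, string path)
    {
        var files = new List<string>();
        var chunk = new List<string>();
        int counter = 1; // assumption: numbering starts at 1
        void Flush()
        {
            // $0 in the pattern is replaced by the current counter.
            string file = Path.Combine(path, fileNamePattern.Replace("$0", counter.ToString()));
            File.WriteAllLines(file, chunk);
            files.Add(file);
            chunk.Clear();
            counter++;
        }
        foreach (var record in records)
        {
            chunk.Add(record);
            if (chunk.Count == limit) Flush(); // each file holds at most `limit` records
        }
        if (chunk.Count > 0) Flush(); // last, possibly shorter, file
        return files;
    }
}
```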
rewrite: Writes output to the same file used for source data. This method only works in combination with the read method. Records are written to a temporary file; once all output is written, the temporary file is moved over the original file. If backupOriginal is specified, a copy of the original file is made before it is overwritten. If deleteBackup is true, the backup is removed after the original is successfully overwritten.

Use this method with caution. Only turn on deleteBackup when you are confident that Pnyx produces the desired output, and always keep backupOriginal on.

  • backupOriginal bool? default=: Flag to determine whether the original file is copied before being overwritten
  • deleteBackup bool? default=: True to remove backups after a successful rewrite. False preserves the backups
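The write-to-temp, back up, then replace sequence described for rewrite can be sketched with standard `System.IO` calls. This is an illustration under assumptions, not Pnyx's code: `RewriteSketch` is hypothetical, the `.bak` backup name is invented for the example, and the transform is a plain function over lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical model of rewrite; not Pnyx's implementation.
public static class RewriteSketch
{
    public static void Rewrite(string path, Func<IEnumerable<string>, IEnumerable<string>> transform,
                               bool backupOriginal = true, bool deleteBackup = false)
    {
        string dir = Path.GetDirectoryName(Path.GetFullPath(path));
        string temp = Path.Combine(dir, Path.GetRandomFileName());
        string backup = path + ".bak"; // hypothetical backup name

        // 1. Write all output to a temporary file; the original is still untouched.
        File.WriteAllLines(temp, transform(File.ReadLines(path)));

        // 2. Copy the original before it is overwritten.
        if (backupOriginal) File.Copy(path, backup, overwrite: true);

        // 3. Move the temporary file over the original.
        File.Delete(path);
        File.Move(temp, path);

        // 4. Only after a successful overwrite, optionally remove the backup.
        if (backupOriginal && deleteBackup) File.Delete(backup);
    }
}
```

If the transform throws in step 1, the original file is never touched. `File.Replace`, which can also produce a backup, is an alternative to steps 2 and 3.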
captureText: Writes output to a StringBuilder. Use this for testing or for building text with Pnyx.
  • builder StringBuilder: Builder to write output to
tee: The tee processor duplicates the state and data of the current Pnyx and creates a second Pnyx object, which can be used for writing additional output. This method is similar to the Linux tee utility.

This method does not change the state of the current Pnyx.

See tee examples.

  • block Action<Pnyx>: Lambda expression for operations on second copy of data
StateChange
processToString: Writes output to a String. Use this for testing or for building text with Pnyx.

This method is a simpler version of captureText, but has the downside of immediately processing the Pnyx. Also, use this method whenever a custom write method is needed, such as writing CSV.

  • writeAction Action<Pnyx, Stream> default=: Action, which exposes a MemoryStream, to wrap a specific write method
process: Processes the Pnyx object. All input and output is performed during this method call.

Pnyx should be compiled before calling this method. However, if it hasn't been compiled, this method attempts to compile it first, including automatically wiring STD-IN/STD-OUT when the stdIoDefault setting is on.

compile: Compiles the Pnyx object. Pnyx should be in the End state before calling, unless stdIoDefault is enabled.