
Named-paths Reference Queries

How are references used?

References let you run an individual csvpath, or a subset of the csvpaths, in a named-paths group. A named-paths reference has the form:

$root.csvpaths.name_one.name_three

The four sections are:

  • root: a named-file name, named-paths name, or named-results name

  • csvpaths: the datatype that indicates we are working with named-paths

  • name_one: the most important id, name, date, etc. that we're pointing to (name_one is the underlying name of this field, but you won't see it often)

  • name_three: a secondary id, name, date, etc. that helps determine what the reference points to
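
To make the four sections concrete, here is a minimal sketch that splits a reference string on its dots. It is an illustration only, not CsvPath Framework's own parser: the `Reference` type and the `split_reference` function are hypothetical names introduced for this example, and the sketch assumes the later sections may be absent.

```python
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    """Hypothetical container for the four dot-separated sections."""
    root: str
    datatype: str
    name_one: Optional[str]
    name_three: Optional[str]


def split_reference(ref: str) -> Reference:
    """Split a reference such as $root.csvpaths.name_one.name_three."""
    if not ref.startswith("$"):
        raise ValueError("a reference starts with $")
    parts = ref[1:].split(".")
    if not 2 <= len(parts) <= 4:
        raise ValueError("expected between two and four sections")
    # Pad missing trailing sections with None (an assumption of this sketch).
    padded = parts + [None] * (4 - len(parts))
    return Reference(*padded)


print(split_reference("$invoices.csvpaths.cleanup:from"))
```

Any pointer (for example :from) or # suffix stays attached to its section here; this sketch only separates the sections themselves.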

References in CsvPath Validation Language that pull variables from other runs are one case where a # is useful. The # lets you point to the variables of a specific csvpath instance in the other results, or even of an earlier csvpath instance in the currently running results.

A reference of this type might look like:

$mypaths#myinstance.variables.city

This reference says to pull the value of the city variable from the variables of the myinstance csvpath in the most recent run of the mypaths named-results. Without #myinstance, you would be pulling the city variable from the union of all the variable sets created in the most recent run of mypaths. Because two csvpath instances might each leave behind a city variable with a different value, you may want to be more specific.
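
The difference between the two lookups can be sketched with plain dictionaries. Everything here is hypothetical: the `run_variables` data, the `otherinstance` name, and the `lookup` function are stand-ins invented for illustration, and the sketch assumes that in the union a later instance's value replaces an earlier one. The framework's actual merge behavior may differ.

```python
# Hypothetical variables left behind by the most recent run of "mypaths",
# keyed by csvpath instance name.
run_variables = {
    "myinstance": {"city": "Boston"},
    "otherinstance": {"city": "Chicago", "count": 3},
}


def lookup(variables_by_instance, name, instance=None):
    """Return a variable from one instance, or from the union of all."""
    if instance is not None:
        # Like $mypaths#myinstance.variables.city
        return variables_by_instance[instance][name]
    # Like $mypaths.variables.city: merge every instance's variables.
    # Assumption: later instances overwrite earlier ones on a name clash.
    merged = {}
    for instance_vars in variables_by_instance.values():
        merged.update(instance_vars)
    return merged[name]


specific = lookup(run_variables, "city", instance="myinstance")
from_union = lookup(run_variables, "city")
print(specific, from_union)
```

With two instances that both set city, the two calls return different values, which is the ambiguity that naming the instance avoids.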

Like named-file references, named-paths references use "pointers". A pointer looks like a colon followed by a word or number. Pointers enable dynamic references. For named-paths, the pointers are:

  • :from — used to indicate a run should start from a certain csvpath

  • :to — like :from, but indicating the run should stop with a certain csvpath

  • :n (n = any integer from 0 to 99) — indicates which csvpath to return from the group

As noted above, in a few cases you can split the root, name_one, and name_three segments using a #. Grammatically, you can always do this; however, support for # separated words is inconsistent and the intended usage has not yet settled.

You might see the values created by a # referred to as root_minor, name_two, and name_four.

But again, this is not common. Unless the directions say to use the capability, you should not.
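
As a rough illustration of how a # divides a segment into the names mentioned above, the sketch below splits one segment at its first #. The `split_minor` function and the `MINOR_NAMES` table are hypothetical and are not part of CsvPath Framework; they only mirror the naming described in this section.

```python
# Maps each splittable segment to the name used for the part after the #.
MINOR_NAMES = {
    "root": "root_minor",
    "name_one": "name_two",
    "name_three": "name_four",
}


def split_minor(field: str, segment: str) -> dict:
    """Split a segment like 'mypaths#myinstance' at its first #."""
    major, _, minor = segment.partition("#")
    # When there is no #, the minor part is reported as None.
    return {field: major, MINOR_NAMES[field]: minor or None}


print(split_minor("root", "mypaths#myinstance"))
```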

Examples

Named-paths references are simple. They only need to give dynamic control over which csvpaths within the named-paths group to run. A named-paths reference does that by using the :from and :to pointers to indicate the starting or stopping csvpath. If there is no pointer, the reference is to the specific csvpath named. You can refer to a specific csvpath by name or by a colon-number pointer.

$invoices.csvpaths.cleanup:from

This reference says to do a run of the invoices named-paths starting from the cleanup csvpath. To be more precise, it references the specific csvpath statements, which typically means we're setting up a named-paths group run. You can just as easily pass a reference to PathsManager.get_named_paths() and get back a list of the csvpath statements.

Let's say that the invoices named-paths looks like:

~ id: print date ~
$[0][ @day = today() print("Today is $.variables.day")]
---- CSVPATH ----
~ id: cleanup ~
$[*][ replace(#city, uppercase(#city)) ] 
---- CSVPATH ----
~ id: add line number ~
$[*][ append("line", line_number())] 

The reference would run the second (cleanup) and third (add line number) csvpaths in the group. The print date csvpath would not run.

$invoices.csvpaths.cleanup:1

Using the same example, this reference would return only the second csvpath from the invoices named-paths group.

$invoices.csvpaths.cleanup:to

And finally, this version of the reference would run the first csvpath (print date) and the second (cleanup), but would not run the third csvpath in the group.
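
The three examples can be summarized with a small sketch that selects csvpath ids from the invoices group. It is not the framework's implementation: `select_csvpaths` and the `INVOICES` list are hypothetical, and the sketch assumes that a numeric pointer is a zero-based position in the group and that :to includes the named csvpath, as the examples above describe.

```python
from typing import List

# The ids of the csvpaths in the invoices group, in file order.
INVOICES = ["print date", "cleanup", "add line number"]


def select_csvpaths(ids: List[str], name_one: str) -> List[str]:
    """Pick csvpath ids for a name_one segment such as 'cleanup:from'."""
    name, _, pointer = name_one.partition(":")
    if pointer.isdigit():
        # :n picks a single csvpath by zero-based position (assumption).
        return [ids[int(pointer)]]
    start = ids.index(name)
    if pointer == "from":
        return ids[start:]        # the named csvpath and all that follow
    if pointer == "to":
        return ids[: start + 1]   # everything up to and including it
    if pointer == "":
        return [ids[start]]       # no pointer: just the named csvpath
    raise ValueError(f"unknown pointer: {pointer}")


for segment in ("cleanup:from", "cleanup:1", "cleanup:to"):
    print(segment, select_csvpaths(INVOICES, segment))
```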

