Skip to main content

tableschema-swift

Build Coverage Codebase Support

This is a draft Swift language implementation of TableSchema for defining schemas to work with tabular data.

A schema on tabular data defines types, imposes constraints, and creates foreign key relationships on fields as data values move from some physical representation to a logical one and vice versa. For instance, a stored CSV file (physical) can be loaded in-memory along with a corresponding schema descriptor to be transformed from string values to Swift Standard Library types like Date or Int (logical).

Requirements#

  • Source compatibility with Swift 4.2
  • Target platforms
    • Apple platforms, specifically iOS and macOS
      • Full functionality in iOS >= 10 and macOS >= 10.12
    • Linux, limited by features available in swift-corelibs-foundation
  • Apple's Foundation framework is the only dependency
  • Independent from any one particular physical representation

Implementation Status#

Being a draft implementation means APIs have not been solidified and are subject to change. However, much of the foundation has been laid, there is a testing suite to keep what should be working in check, and it is being used in at least one shipping product over a subset of the available features. The approach has been implementing features on an as-needed basis.

Feature Status#

FeatureStatus
Streaming and cast on iterationAvailable
Casting field types and formatsPartial
[De]serializationAvailable in Tabular Data Package
Schema inferenceMissing (Unlikely to implement)
Strict modeMissing
Constraint validationMissing
Foreign key validationMissing
Rich (RDF) TypesMissing (Unlikely to implement)

Casting Field Types and Formats Status#

TypeFormatsAdditional PropertiesForward Status (Physical to Logical)Reverse Status (Logical to Physical)
stringdefault, uri, binary, uuidN/AAvailableAvailable
stringemailN/AUnavailableUnavailable
numberN/AAnyUnavailableUnavailable
integerN/AbareNumber = falseAvailable*Available
integerN/AbareNumber = trueAvailableAvailable
booleanN/AtrueValues, falseValuesAvailableAvailable
objectN/AN/AAvailableUnavailable
arrayN/AN/AAvailableAvailable
dateN/AdefaultAvailable*Unavailable
dateN/Aany, patternUnavailableUnavailable
timeN/AdefaultAvailable*Unavailable
timeN/Aany, patternUnavailableUnavailable
datetimeN/AdefaultAvailable*Available*
datetimeN/Aany, patternUnavailableUnavailable
yearN/AN/AAvailable*Unavailable
yearmonthN/AN/AAvailable*Unavailable
durationN/AN/AAvailable*Unavailable
geopointdefault, array, objectN/AAvailableUnavailable
geojsondefault, topoN/AUnavailableUnavailable
anyN/AN/AUnavailableUnavailable

* Only available on Apple products (namely iOS and macOS) due to an incomplete implementation in swift-corelibs-foundation

Integration into Your Project#

This project is set up using Swift Package Manager. Ideally add it to your project's SPM dependencies or use Xcode's integrated Swift Package Manager. Alternatively, generate your own Xcode .xcodeproj to integrate with your build system using:

swift package generate-xcodeproj --xcconfig-overrides ./Configuration.xcconfig

Example Usage#

Cast on Iteration from a Data Source#

Deserializing of data (from, say, a CSV file) can be accomplished by setting up a Table with an iterator that provides row information using a TableProvider data source. This allows for the data source to stream data rather than necessarily loading everything in-memory. Table is agnostic from the specific data source but expects the data source to convert to String representations.

let sourcePath = "import.csv"
let sourceDialect = DialectalCSV.Dialect()
let fields = [Field("name", type: .string), Field("birthday", type: .date)]
let schema = Schema(fields)
guard let provider = MyTableProvider(atPath: sourcePath, dialect: sourceDialect) else {
fatalError("Oops")
}
let table = Table(provider: AnyTableProvider(provider), schema: schema)
let objects = table.map { $0 }

And defining MyTableProvider together with a CSV parsing library like DialectalCSV:

class MyTableProvider: TableProvider {
private let handler: DialectalCSV.InputHandler
private let streamIterator: DialectalCSV.InputIterator
init?(atPath path: String, dialect: DialectalCSV.Dialect) {
guard let handler = DialectalCSV.InputHandler(atPath: path, dialect: dialect) else {
return nil
}
self.handler = handler
self.streamIterator = handler.makeIterator()
}
// MARK: - TableProvider
var header: Header? {
return self.streamIterator.header
}
// MARK: - Sequence
func makeIterator() -> AnyIterator<[String?]> {
return AnyIterator {
return self.streamIterator.next()
}
}
}

Reverse Casting (Logical to Physical)#

Cast the entire data set in-memory:

let objects = [[Any?]]()
let rows = objects.map { schema.reverseCast(row: $0) }

Or streaming output using a CSV parsing library like DialectalCSV:

let objects: [[Any?]] = [["River Tam", Date(timeIntervalSince1970: 16725225600)],["Simon Tam", nil]]
let destinationPath = "export.csv"
var destinationDialect = DialectalCSV.Dialect()
destinationDialect.nullSequence = "null"
FileManager.default.createFile(atPath: destinationPath, contents: nil)
guard let outputHandler = DialectalCSV.OutputHandler(atPath: destinationPath, dialect: destinationDialect) else {
fatalError("Oops")
}
let header = schema.fields.map { $0.name }
try? outputHandler.open(header: header)
for object in objects {
let row = schema.reverseCast(row: object).map { $0 }
try? outputHandler.append(records: [row])
}
try? outputHandler.close()
Last updated on by roll