Loading and Saving

Loading and Saving datasets

Load from CSV

JuliaDB.loadtable – Function.

loadtable(files::Union{AbstractVector,String}; <options>)

Load a table from CSV files.

files is either a vector of file paths, or a directory name.

Options:

  • indexcols::Vector – columns to use as primary key columns. (defaults to [])

  • datacols::Vector – non-indexed columns. (defaults to all columns but indexed columns)

  • distributed::Bool – should the output dataset be loaded in a distributed way? If true, this will use all available worker processes to load the data. (defaults to true if workers are available, false if not)

  • chunks::Int – number of chunks to create when loading distributed. (defaults to number of workers)

  • delim::Char – the delimiter character. (defaults to ,)

  • quotechar::Char – quote character. (defaults to ")

  • escapechar::Char – escape character. (defaults to \)

  • header_exists::Bool – does header exist in the files? (defaults to true)

  • colnames::Vector{String} – specify column names for the files. Use this with header_exists=false; if header_exists=true, the first row of each file is treated as a header and discarded. By default column names are assumed to be present in the file.

  • samecols – a vector of tuples of strings where each tuple contains alternative names for the same column. For example, if some files have the name "vendor_id" and others have the name "VendorID", pass samecols=[("VendorID", "vendor_id")].

  • colparsers – either a vector or a dictionary of data types, or of AbstractToken objects from the TextParse package. By default, these are inferred automatically. See the type_detect_rows option below.

  • type_detect_rows – number of rows to use to infer the initial colparsers. (defaults to 20)

  • nastrings::Vector{String} – strings that are to be considered NA. (defaults to TextParse.NA_STRINGS)

  • skiplines_begin::Int – number of lines to skip at the beginning of each file. (doesn't skip by default)

  • usecache::Bool – use cached metadata from previous loads while loading the files. Set this to false if you are changing other options.
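
As a rough illustration of how the options above combine, the sketch below writes a small semicolon-delimited file and loads it with an explicit primary key. The file name, column names, and values are made up for the example, and passing column names as symbols in indexcols is an assumption about the installed JuliaDB version (integer column positions are an alternative).

```julia
using JuliaDB

# Hypothetical sample data: a tiny semicolon-delimited CSV with a header row.
dir = mktempdir()
path = joinpath(dir, "trips.csv")
write(path, "id;fare\n1;10.5\n2;7.0\n3;12.25\n")

# Load the file in the current process only, using `id` as the primary key.
# `delim` overrides the default comma delimiter.
t = loadtable([path]; indexcols = [:id], delim = ';', distributed = false)

println(length(t))  # number of rows loaded
```

Passing a vector of paths rather than a directory name keeps the example independent of whatever else the directory contains.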

JuliaDB.loadndsparse – Function.

loadndsparse(files::Union{AbstractVector,String}; <options>)

Load an NDSparse from CSV files.

files is either a vector of file paths, or a directory name.

Options:

  • indexcols::Vector – columns to use as indexed columns. (by default a 1:n implicit index is used.)

  • datacols::Vector – non-indexed columns. (defaults to all columns but indexed columns)

All other options are identical to those in loadtable.
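
A minimal sketch of loading an NDSparse, assuming a made-up file of sensor readings in which the pair (sensor, day) identifies each row. The file name and column names are hypothetical, and the symbol form of the column selectors is an assumption about the installed JuliaDB version.

```julia
using JuliaDB

# Hypothetical sample data: one reading per (sensor, day) pair.
dir = mktempdir()
path = joinpath(dir, "readings.csv")
write(path, "sensor,day,value\n1,1,0.5\n1,2,0.7\n2,1,1.1\n")

# `sensor` and `day` become the index dimensions; `value` is the data column.
nd = loadndsparse([path];
                  indexcols = [:sensor, :day],
                  datacols = [:value],
                  distributed = false)

println(length(nd))  # number of stored entries
```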


Save and Load blobs

Dagger.save – Function.

save(t::Union{DNDSparse, DTable}, outputdir::AbstractString)

Saves a distributed dataset to disk. Saved data can be loaded with load.

Dagger.load – Function.

load(dir::AbstractString; tomemory)

Load a saved DNDSparse from the directory dir. Data can be saved using the save function.
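
The round trip through save and load might look like the sketch below. It assumes that JuliaDB exports save and load, that distributed=true yields a distributed dataset even when no worker processes have been added, and that save creates the output directory if it is missing. The file and directory names are hypothetical.

```julia
using JuliaDB

# Hypothetical sample data, loaded as a distributed NDSparse.
dir = mktempdir()
path = joinpath(dir, "readings.csv")
write(path, "sensor,day,value\n1,1,0.5\n1,2,0.7\n2,1,1.1\n")
nd = loadndsparse([path]; indexcols = [:sensor, :day], distributed = true)

# Write the dataset to a directory, then read it back.
outdir = joinpath(dir, "saved")
save(nd, outdir)
nd2 = load(outdir)

println(length(nd2) == length(nd))
```

Loading from a saved directory avoids re-parsing the original CSV files on later runs.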
