The Atomic Unit: ColumnVector

At the heart of Charton's performance lies the ColumnVector. While most visualization libraries treat data as a collection of loose objects or rows, Charton adopts a Columnar Memory Layout. This architecture is inspired by Apache Arrow and Polars, ensuring that data is stored in contiguous memory blocks for CPU cache efficiency and potential SIMD acceleration.

The Anatomy of a Column

A ColumnVector is a specialized enum that encapsulates the data types relevant to data science and visualization. Every variant (except those with an intrinsic null representation) follows a dual structure:

  1. Data Buffer: A Vec<T> containing the raw physical values.
  2. Validity Bitmask: An Option wrapping a Vec, where each bit represents whether a row is "Valid" (1) or "Null" (0).
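
To make the dual structure concrete, here is a minimal sketch of a single column with a data buffer and an optional bitmask. The Int32Column type, the u8 mask element, and the least-significant-bit-first ordering are all assumptions made for illustration; Charton's real definition may differ.

```rust
/// Hypothetical stand-in for one ColumnVector variant: a data buffer plus
/// an optional validity bitmask. Not Charton's actual definition.
struct Int32Column {
    data: Vec<i32>,
    /// One bit per row, least-significant bit first (an assumption).
    /// `None` means every row is valid.
    validity: Option<Vec<u8>>,
}

impl Int32Column {
    /// Build a column from optional values, packing nulls into a bitmask.
    fn from_options(values: &[Option<i32>]) -> Self {
        let mut data = Vec::with_capacity(values.len());
        let mut mask = vec![0u8; (values.len() + 7) / 8];
        let mut has_null = false;
        for (row, value) in values.iter().enumerate() {
            match value {
                Some(v) => {
                    data.push(*v);
                    mask[row / 8] |= 1u8 << (row % 8); // mark the row as valid
                }
                None => {
                    data.push(0); // placeholder; the mask marks it as null
                    has_null = true;
                }
            }
        }
        // Skip the mask entirely when no row is null.
        let validity = if has_null { Some(mask) } else { None };
        Self { data, validity }
    }

    /// Read a row, returning `None` when its validity bit is 0
    /// or the row is out of range.
    fn get(&self, row: usize) -> Option<i32> {
        let valid = match &self.validity {
            Some(mask) => mask
                .get(row / 8)
                .map_or(false, |byte| byte & (1u8 << (row % 8)) != 0),
            None => true,
        };
        if valid { self.data.get(row).copied() } else { None }
    }
}
```

Calling Int32Column::from_options(&[Some(7), None, Some(9)]) keeps three slots in the data buffer and sets bits 0 and 2 of the mask, so get(1) reports a null without a sentinel value in the data itself.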

The Categorical Advantage

One of the most important types for visualization is Categorical. Instead of storing repetitive strings (like "Group A", "Group A"...), it stores u32 keys that point into a dictionary of unique values. This is essential for rendering large datasets with repetitive labels, because each row costs a fixed four bytes regardless of label length.
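
The idea behind dictionary encoding can be sketched in a few lines. The dictionary_encode function below is hypothetical and only illustrates the technique; it is not Charton's internal implementation, and the choice to assign keys in first-seen order is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary encoder: returns one u32 key per input row plus
/// the dictionary of unique values, in first-seen order.
fn dictionary_encode(values: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut dictionary: Vec<String> = Vec::new();
    let mut lookup: HashMap<&str, u32> = HashMap::new();
    let mut keys = Vec::with_capacity(values.len());
    for &value in values {
        // Reuse the existing key, or append a new dictionary entry.
        let key = *lookup.entry(value).or_insert_with(|| {
            dictionary.push(value.to_string());
            (dictionary.len() - 1) as u32
        });
        keys.push(key);
    }
    (keys, dictionary)
}
```

For the input ["London", "Paris", "London", "Tokyo"], this yields the keys [0, 1, 0, 2] and the dictionary ["London", "Paris", "Tokyo"], so each repeated label is stored only once.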

Manual Construction

Charton provides high-level constructors to turn various Rust string collections into memory-efficient categorical columns automatically.

1. From Raw Strings (No Nulls)

If your data is complete, you can pass collections of String or &str directly. Charton will handle the deduplication and dictionary encoding.

#![allow(unused)]
fn main() {
    // Assumes ColumnVector is already in scope.
    // Accepts Vec<&str> or Vec<String>.
    let cities = vec!["London", "Paris", "London", "Tokyo"];
    let col = ColumnVector::from_str_as_cat(cities);
}

2. From Optional Strings (With Null Support)

For datasets with missing values, use from_str_as_cat_opt. This version automatically builds the internal Validity Bitmask.

#![allow(unused)]
fn main() {
    // Assumes ColumnVector is already in scope.
    // Accepts Vec<Option<&str>> or Vec<Option<String>>.
    let status = vec![Some("High"), None, Some("Low"), Some("High")];
    let col = ColumnVector::from_str_as_cat_opt(status);
}

3. Why Use Categorical?

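  • Memory Efficiency: 1 million rows of "Male"/"Female" take ~4MB as Categorical (one u32 key per row plus a two-entry dictionary), compared to ~20MB+ as raw String (a String alone occupies 24 bytes on 64-bit targets, before counting its heap allocation).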
  • Encoding Ready: The underlying u32 keys are used directly by Charton's color scales and legend generators.
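
As a rough illustration of why integer keys are convenient for encoding, a color scale can index a palette by key with no string comparison. The color_for_key function and the wrap-around behavior are assumptions made for this sketch, not Charton's actual color scale API.

```rust
/// Hypothetical palette lookup: a category key indexes straight into a
/// color table, wrapping around when there are more categories than colors.
/// Returns `None` only for an empty palette.
fn color_for_key(key: u32, palette: &[&'static str]) -> Option<&'static str> {
    if palette.is_empty() {
        return None;
    }
    Some(palette[key as usize % palette.len()])
}
```

With a three-color palette, keys 0, 1, and 2 map to distinct colors and key 4 wraps back to the second color.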

Why This Layout?

  • Polars-Friendly: The variants map 1:1 to Polars DataTypes, allowing for near zero-cost ingestion from Polars DataFrames via the load_polars_df! macro.
  • Wasm-Ready: By preserving narrow types like Int8, Charton minimizes its footprint in memory-constrained WebAssembly environments.
  • Zero-Abstraction Temporal Data: Time data is stored as raw i64 integers, allowing coordinate arithmetic without the cost of high-level object wrapping.
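
To show what coordinate arithmetic on raw i64 timestamps can look like, here is a minimal linear scale sketch. The scale_timestamp function, its tuple parameters, and the choice to return the range start for an empty domain are all assumptions for illustration; Charton's own scale code is not shown in this section.

```rust
/// Hypothetical linear scale: map a raw i64 timestamp from a time domain
/// onto a pixel range. The timestamps can be in any unit, as long as `ts`
/// and `domain` share it.
fn scale_timestamp(ts: i64, domain: (i64, i64), range: (f64, f64)) -> f64 {
    let (d0, d1) = domain;
    let (r0, r1) = range;
    if d0 == d1 {
        return r0; // degenerate domain: avoid dividing by zero
    }
    // Plain integer subtraction, then a single conversion to f64.
    let t = (ts - d0) as f64 / (d1 - d0) as f64;
    r0 + t * (r1 - r0)
}
```

For a domain of (0, 1000) and a pixel range of (0.0, 500.0), a timestamp of 250 lands at 125.0. No date object is built along the way; the calculation is a subtraction, a division, and a multiply-add.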

Full Type Mapping Reference

| Charton Variant | Physical Storage | Polars Equivalent | Best Use Case |
|---|---|---|---|
| Boolean | bool | Boolean | Binary flags, True/False categories. |
| Int8 / Int16 | i8 / i16 | Int8 / Int16 | Memory-efficient small integers (e.g., months). |
| Int32 / Int64 | i32 / i64 | Int32 / Int64 | General purpose integers or primary IDs. |
| UInt32 | u32 | UInt32 | Array indices or internal dictionary keys. |
| UInt64 | u64 | UInt64 | Large hashes or 64-bit unique identifiers. |
| Float32 | f32 | Float32 | Memory-efficient coordinates for high-density plots. |
| Float64 | f64 | Float64 | The Standard for most coordinate and value axes. |
| String | String | String / Utf8 | Unique labels or long descriptions. |
| Categorical | u32 Keys + String Dict | Categorical / Enum | Recommended for Legends, Colors, and repeated labels. |
| Date | i32 (days since epoch) | Date | Calendar-based timelines. |
| Datetime | i64 + TimeUnit | Datetime | Time-series data with sub-second precision. |
| Duration | i64 + TimeUnit | Duration | Time deltas or Gantt chart intervals. |
| Time | i64 (nanos since midnight) | Time | Daily cycles and clock-time analysis. |
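
As an example of working with one of these physical representations, the sketch below decodes a Time value (nanoseconds since midnight) into clock components, for instance to format an axis tick. The split_time_of_day function is hypothetical and not part of Charton's API.

```rust
/// Hypothetical helper: split nanoseconds since midnight into
/// (hour, minute, second, nanosecond). Returns `None` for values outside
/// a single 24-hour day.
fn split_time_of_day(nanos: i64) -> Option<(u32, u32, u32, u32)> {
    const NANOS_PER_SEC: i64 = 1_000_000_000;
    const NANOS_PER_DAY: i64 = 86_400 * NANOS_PER_SEC;
    if !(0..NANOS_PER_DAY).contains(&nanos) {
        return None;
    }
    let total_secs = nanos / NANOS_PER_SEC;
    let hour = (total_secs / 3_600) as u32;
    let minute = (total_secs % 3_600 / 60) as u32;
    let second = (total_secs % 60) as u32;
    let subsec = (nanos % NANOS_PER_SEC) as u32;
    Some((hour, minute, second, subsec))
}
```

The value 49_530_500_000_000 decodes to (13, 45, 30, 500_000_000), which is 13:45:30.5.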