A question about SubString

Hello,
What is the difference between SubString and Indexing? I did:

julia> str = "Where the mind is without fear"
"Where the mind is without fear"

julia> SubString(str, 1, 5)
"Where"

julia> str[1:5]
"Where"

The results of both are the same. When should I use SubString?

Thank you.

Creating a String requires a new allocation on the heap, where the string stores a full copy of its data. A SubString, because it refers to an existing String, only needs to store references, and therefore requires no heap allocation to create.
A SubString is essentially a pointer and two integers.
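
A minimal sketch of the difference, using the string from the question. The names `copied` and `sub` are only for illustration; `parent` is the Base function that returns the string a SubString refers to.

```julia
str = "Where the mind is without fear"

copied = str[1:5]             # indexing with a range builds a new String
sub = SubString(str, 1, 5)    # refers to `str` instead of copying its data

@assert copied isa String
@assert sub isa SubString{String}
@assert copied == sub         # same contents, different types
@assert parent(sub) === str   # the SubString still points at the original
```

Both are AbstractStrings, so code written against AbstractString should accept either one.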

Hello,
Thank you so much for your reply.
So, when I use str[1:5], does it allocate new memory on the heap?

Yes, it creates a new String holding a full copy of the selected part, stored on the heap.

Slight nitpick: the new String is not semantically required to be allocated on the heap. It’s extremely likely to end up there, but it’s not a semantic requirement.

That being said, the main difference between indexing and calling SubString stands - the former creates a copy, while the latter merely stores the indices into the existing String. SubString is what you get from @view str[1:5].
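
As a quick check of the `@view` remark, here is a small sketch with the string from the question (the name `v` is just a placeholder):

```julia
str = "Where the mind is without fear"

v = @view str[1:5]                  # no copy: this is a SubString

@assert v isa SubString{String}
@assert v == SubString(str, 1, 5)   # same result as calling SubString directly
@assert str[1:5] isa String         # plain indexing still returns a copy
```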

It may or may not be obvious: the tradeoff of not creating a copy is that the parent string object cannot be freed by the GC. If that string is large and you don’t need to keep it around then you want to copy the part of the data you need.
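
A rough sketch of that tradeoff. The names `big`, `part`, and `owned` are made up for the example, and the sizes are arbitrary; the point is only that `String(part)` produces an independent copy that does not reference the large parent.

```julia
# A large parent string of which we only need the last few characters.
big = repeat("x", 10^6) * "tail"

# Cheap to create, but it holds a reference to `big`,
# so `big` cannot be collected while `part` is alive.
part = SubString(big, 10^6 + 1, 10^6 + 4)

# Copy just the needed data into its own String.
owned = String(part)

@assert part == "tail"
@assert parent(part) === big
@assert owned isa String
@assert sizeof(owned) == 4
```

If only `owned` is kept around, nothing refers to `big` any more and the garbage collector is free to reclaim it.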

Once we have a moving GC, we could possibly do some tricks here: copy the substring and free the original if the original string dies. (Not sure whether we could implement this in Julia, possibly with some sort of WeakRef + finalizer?)

Hello,
Thank you so much for your reply.
What is GC?

Garbage collector; roughly speaking, it removes objects that are not needed anymore from memory (“freeing” the memory). In Julia it runs automatically. In languages without a garbage collector, such as C, freeing memory is the responsibility of the programmer.
