CHASM

Cryptographic-Hash-Algorithm-Secured Mirroring

Reflections on a Mirroring Solution: End of FSoC

Those of you who have followed my blog this summer know that I have been working on CHASM, a Cryptographic-Hash-Algorithm-Secured Mirroring solution. I am sad to announce that the end of FSoC 2010 is upon us. Working on CHASM this summer has been an invaluable learning experience and a great deal of fun.

My mentors Ben Boeckel and Rob Escriva have been a joy to work with and an excellent source of information. We had a good deal of discussion over the summer, through which I learned a lot. Not all discussions were about the project; some were off-topic and about Linux in general. In a recent discussion Ben provided me with an overview of how some of the major projects implement versioning, and how it relates to Fedora, specifically the NEVRA system used by Koji. Overall, both of them did a _great_ job as mentors and I look forward to continuing our work together.
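To give a rough feel for NEVRA (Name-Epoch-Version-Release-Architecture), here is a toy parser that splits an identifier like "bash-0:4.1.7-1.fc13.x86_64" into its five fields. This is my own illustration, not Koji's code, and the exact string layout is an assumption on my part:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy holder for the five NEVRA fields. Illustration only -- this is
// not Koji's code, and the string format assumed below is a guess.
struct nevra
{
    std::string name;
    std::string epoch;
    std::string version;
    std::string release;
    std::string arch;
};

// Split "name-epoch:version-release.arch", e.g. "bash-0:4.1.7-1.fc13.x86_64".
// No error handling; a malformed string would misbehave.
nevra parse_nevra(const std::string& s)
{
    nevra n;
    const std::size_t colon    = s.find(':');          // separates epoch from version
    const std::size_t name_end = s.rfind('-', colon);  // dash before the epoch
    const std::size_t rel_dash = s.find('-', colon);   // dash before the release
    const std::size_t arch_dot = s.rfind('.');         // dot before the arch
    n.name    = s.substr(0, name_end);
    n.epoch   = s.substr(name_end + 1, colon - name_end - 1);
    n.version = s.substr(colon + 1, rel_dash - colon - 1);
    n.release = s.substr(rel_dash + 1, arch_dot - rel_dash - 1);
    n.arch    = s.substr(arch_dot + 1);
    return n;
}
```

The point of the scheme is that each field answers a different question: the epoch overrides ordinary version comparison, while the release distinguishes rebuilds of the same upstream version.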

As for the project, I did not quite finish all that I had set out to. I still need to finish a local domain socket library that can be used by all daemons that constitute CHASM. One critical part of this did get finished and is done as a separate library. We hope it might be useful to other projects and plan to have it packaged for Fedora soon -- I will write a separate post on this later. CHASM is far from complete, but I do not plan on going anywhere; I am committed to helping see this project through and will do my best to help out where I can.

Lastly, I would like to again thank the Fedora Community for making FSoC possible. It was nice to finally be working with other people and with such a great community.

-mfm

The Git Experience

Git is, without a doubt, the SCM of choice these days. I started using Git about a year ago, beginning with my exploration of the Linux kernel. Eventually I began using Git for my own purposes, although I rarely ventured outside of a few basic commands. Working on CHASM has required me to become proficient with Git; fortunately, my summer mentors, Ben Boeckel and Rob Escriva, are well-versed in the use of the program.

A week or two into the summer coding session, I happened to make a true rookie mistake and did a 'git reset --hard' on my current tree. Initially I was not committing my work because I was afraid of breaking the build; this led to the stupid decision to keep about three days' worth of work uncommitted on one branch. Sometime after, I was trying to merge with another branch, ended up not liking the results, and did the reset. That was my Git lesson learned the hard way -- I am sure some people can relate.

After that, I took a side excursion to learn a little bit more about Git. One resource that proved itself useful was a very informative lecture given by Rob. I recommend it as an excellent explanation of the way Git works. Rob starts with an overview of Git itself, giving a nice breakdown of the design and the theory on which it is built. He then discusses some commands, pointing out how all the concepts gel together, and illuminates these points with concrete examples. Throughout the lecture you gain an insightful look into how Git can best be used; Git users of all levels will find it a valuable resource. (Rob does make one fallacious statement based on unsound logic: he states that the editor of choice is Vim; of course, we all know Emacs is far superior!) Secondly, and of course not to be forgotten, are the Git manpages. These are some of the most complete I have ever read, and they provide a number of examples along with sample workflows.

-mfm

The Refractions of Design: Part IV

My original intent for this thread was to discuss the iterations of my design, so that is how I will conclude.

Having only a working knowledge of C++ definitely added to my inability to settle on a proper design -- or maybe I should say my lack of knowledge of object-oriented and structured programming did. The initial design was completely imperative in style, with a lot of shared state and an inadequate use of classes. There was a plan in place and I knew where I wanted to take it, but looking back, it was not the best approach for the problem.

For example, the hello message is the first message sent or received in the protocol, so I had a function:

void v0::upstream::priv::recv_hello(const boost::system::error_code& ec,
                                    const std::size_t bytes)
{
    ...
    // set up m_recv_msg to receive the hello
    m_recv_msg.reset(new v0::msg_manifest());
    boost::shared_ptr<v0::msg_manifest> msg =
        boost::dynamic_pointer_cast<v0::msg_manifest>(m_recv_msg);
    ...
}

There were similar functions for every message in the protocol, and all of them shared a number of variables that were data members of the upstream class. Two were message variables descending from an abstract base class developed for messages: one for sending and one for receiving. The two statements shown above characterize their usage: a pointer to the base class is reset and then dynamically cast into the type of message expected. Every function followed a basic pattern: 1) set up the buffer, and 2) do a read or write on the socket. (If anyone has looked at the code, there is a fatal flaw in the way I read in the message; however, I feel that could have easily been corrected.)
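To make that reset-then-cast pattern concrete, here is a minimal, self-contained sketch. The class and message names are hypothetical, and std::shared_ptr stands in for the boost::shared_ptr used in the real code:

```cpp
#include <cassert>
#include <memory>

// Hypothetical names sketching the pattern described above; this is
// not CHASM's actual code. One base-class data member is reused for
// every message and cast to the concrete type each handler expects.
struct msg
{
    virtual ~msg() {}
};

struct msg_hello : msg
{
    int version;
};

struct upstream
{
    std::shared_ptr<msg> m_recv_msg;  // shared member, reused per message

    void recv_hello()
    {
        // 1) reset the base-class pointer to the concrete type expected next
        m_recv_msg.reset(new msg_hello());
        // 2) dynamically cast it back down so the handler can fill in fields
        std::shared_ptr<msg_hello> hello =
            std::dynamic_pointer_cast<msg_hello>(m_recv_msg);
        hello->version = 0;
    }
};
```

The downside of the pattern is exactly the shared state: every handler must agree on what m_recv_msg currently holds, and a mismatched cast only fails at runtime.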

Another major difference between the designs is that I originally did not expect to know the read or write size, so I used a message parser class. Boost.Tribool came in handy for this purpose, and the code had the pattern:

void v0::upstream::priv::recv_hello(const boost::system::error_code& ec,
                                    const std::size_t bytes)
{
    ...
    boost::tribool result(m_msg_parser.parse_hello(bytes, msg));
    if (result)
    {
        // complete message: move to the next state
        send_manifest_req_to_the_oracle(boost::system::error_code());
    }
    else if (!result)
    {
        // malformed message: drop the connection
        m_socket.close();
        start_accept();
    }
    else
    {
        // indeterminate: read more data and re-enter this handler
        m_socket.async_read_some(boost::asio::buffer(m_msg_parser.buffer()),
                                 boost::bind(&v0::upstream::priv::recv_hello,
                                             shared_from_this(),
                                             boost::asio::placeholders::error,
                                             boost::asio::placeholders::bytes_transferred));
    }
    ...
}

The message parser class was responsible for reading in the raw data using a stateful, somewhat optimistic, algorithm for the specific message that was being received. It would then set the specific fields for the given message type to be used by subsequent functions in the upstream class.
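A stripped-down sketch of such a parser might look like the following. A plain enum stands in for boost::tribool so the example is self-contained, and the six-byte "HELLO\n" wire format is invented for illustration -- it is not CHASM's actual protocol:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Three-valued parse result standing in for boost::tribool:
// done ~ true, error ~ false, more ~ indeterminate.
enum parse_result { parse_done, parse_error, parse_more };

class msg_parser
{
public:
    msg_parser() : m_buffered() {}

    // Feed raw bytes as they arrive from the socket. The hello message
    // is assumed (for illustration) to be the 6-byte string "HELLO\n".
    parse_result parse_hello(const char* data, std::size_t bytes)
    {
        m_buffered.append(data, bytes);  // state carried across reads
        const std::string expected = "HELLO\n";
        if (m_buffered.size() < expected.size())
        {
            // optimistic: validate the prefix we have so far
            return expected.compare(0, m_buffered.size(), m_buffered) == 0
                       ? parse_more : parse_error;
        }
        return m_buffered.compare(0, expected.size(), expected) == 0
                   ? parse_done : parse_error;
    }

private:
    std::string m_buffered;  // bytes accumulated from partial reads
};
```

The three-valued return is what drives the if/else-if/else branching in the handler above: complete, malformed, or read more.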

The aforementioned design is what I was referring to as the jumping-off point in Part I. This code was all scrapped but was the genesis for the current design briefly discussed in Part III. A more thorough explanation of the current design can be found in doc/chasm-p2p-design.rst within the repository.

In conclusion, I cannot say for a fact that my initial design was good or bad; it simply was not right for the project. The design process has been an enlightening experience; it has provided an insightful look into a process I was severely lacking. The open-source development model is such a great way to learn that I cannot even imagine ever working on closed-source software again.

-mfm

Copyright © 2010 Robert Escriva ¦ Powered by Firmant