Understanding lifetimes based on a CLI project

March 17, 2021

Intro

Besides ownership and borrowing, lifetimes are another concept that I had a hard time getting my head around. While learning about lifetimes I decided to start a little CLI project of my own. In this post I'll try to explain the basics of lifetimes based on the code I came up with while building that CLI. The idea was to create a CLI that receives a URL as a parameter, requests the HTML at that URL, finds the links, and checks the status code of each href.

Code

I knew that lifetimes would come up anyway, since the CLI would work with references: either to memory that I would want to return from a function, or to data owned by code from the crates I used. For this little project I used the reqwest and select crates.

The basic idea behind lifetimes is explained very well in the Rust book. It goes like this. Imagine the following sequence of operations:

  1. I acquire a handle to some kind of resource.
  2. I lend you a reference to the resource.
  3. I decide I'm done with the resource, and deallocate it, while you still have your reference.
  4. You decide to use the resource.

That's what happened during my coding. Let's start with the main function and the two functions that created this problem.

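use std::env;

use select::document::Document;
use select::predicate::Name;

fn main() {
    // The first CLI argument is the URL of the page to check.
    let args: Vec<String> = env::args().collect();
    let website_address = &args[1];
    let document = fetch_links_from_website(website_address).unwrap();
    // `links` borrows from `document`, so `document` must stay alive while `links` is used.
    let links = get_links_from_document(&document);
}

fn fetch_links_from_website(address: &str) -> Result<Document, Box<dyn std::error::Error>> {
    // Download the page body and parse it into an owned Document.
    let resp = reqwest::blocking::get(address)?.text()?;
    let document = Document::from(resp.as_str());
    Ok(document)
}

fn get_links_from_document<'a>(document: &'a Document) -> Vec<&'a str> {
    let mut links: Vec<&str> = vec![];
    for node in document.find(Name("a")) {
        // Skip anchors without an href instead of panicking on them.
        if let Some(href) = node.attr("href") {
            links.push(href);
        }
    }
    links
}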

The get_links_from_document function spells out its lifetimes explicitly. We declare a lifetime with the 'a syntax and attach it to the function's parameter and to its return type (lifetimes can also be used in structs). Because the function borrows from document and returns links that point into it, the document has to live at least as long as the links. Stating the lifetime makes sure there is no dangling reference to a resource that has already been deallocated. With a single reference parameter like this one, the compiler's elision rules would infer the same lifetime, so the annotation here mainly makes the relationship visible.
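To make the annotation easier to try without the reqwest and select crates, here is a minimal sketch of the same borrowing pattern. Page, links, LinkReport and report are hypothetical names invented for this illustration; they are not part of the CLI or of either crate. The sketch also shows a lifetime on a struct that stores a reference.

```rust
// Hypothetical stand-in for a parsed page; not a type from the select crate.
struct Page {
    hrefs: Vec<String>,
}

// The returned string slices borrow from `page`, so they share its lifetime 'a.
fn links<'a>(page: &'a Page) -> Vec<&'a str> {
    page.hrefs.iter().map(|h| h.as_str()).collect()
}

// A struct that stores a reference carries the lifetime in its definition.
struct LinkReport<'a> {
    first: &'a str,
    count: usize,
}

// Returns None when the page has no links at all.
fn report<'a>(page: &'a Page) -> Option<LinkReport<'a>> {
    let all = links(page);
    let first = *all.first()?;
    Some(LinkReport { first, count: all.len() })
}

fn main() {
    let page = Page {
        hrefs: vec!["https://example.com".to_string(), "/about".to_string()],
    };
    // `page` is still alive here, so the borrowed data inside the report is valid.
    let r = report(&page).expect("page has links");
    println!("{} links, first is {}", r.count, r.first);
}
```

Moving or dropping page while the report is still in use would be rejected by the borrow checker, which is the same guarantee the CLI relies on for document and links.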

If the Document were created and dropped inside the same function that returns the links, it would go out of scope when that function ends, and links would refer to memory that was already freed. The Rust compiler enforces these rules. One error you can run into is E0106, which states: function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments. In that case the lifetime can't be elided, even though elision covers many common cases. By tracking the lifetime of every reference, Rust's ownership system makes sure we never point to an invalid resource. A lifetime basically describes the scope for which a reference is valid.
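One common way to hit E0106 is a function with two reference parameters that returns a reference. The sketch below uses a hypothetical helper called longer, which is not part of the CLI, to show the annotated form that compiles.

```rust
// With two reference parameters the compiler cannot decide on its own which
// one the result borrows from. Writing this signature without 'a, as
// `fn longer(first: &str, second: &str) -> &str`, is rejected with E0106.
fn longer<'a>(first: &'a str, second: &'a str) -> &'a str {
    if first.len() >= second.len() {
        first
    } else {
        second
    }
}

fn main() {
    let a = String::from("https://example.com/docs");
    let b = String::from("/about");
    // Both `a` and `b` outlive the returned reference, so this is accepted.
    println!("{}", longer(&a, &b));
}
```

The single 'a tells the compiler that the result is valid only while both inputs are, which is the conservative choice when either one may be returned.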

These are the bullet points that I wrote down while learning about lifetimes:

  • Rust allows one and only one owner of a piece of memory (ownership)
  • Rust allows multiple shared references at the same time, or exactly one mutable reference
  • Lifetimes enforce that a piece of memory is still valid for a reference
  • Lifetimes are really about ensuring memory does not get cleaned up before a reference can use it
  • Lifetimes make you think in scopes and about how deallocation works in Rust
  • If there is exactly one reference parameter, the Rust compiler assumes that the returned reference has the lifetime of that parameter
  • 'static means a lifetime that lasts for the entire program
  • References to constants and string literals are 'static by their nature
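A few of these points can be seen together in a short sketch. DEFAULT_SCHEME, scheme and first_segment are hypothetical names made up for this example and do not come from the CLI.

```rust
// A string literal stored in a constant is a &'static str.
const DEFAULT_SCHEME: &str = "https";

// No parameters to borrow from, so the returned reference must be 'static.
fn scheme() -> &'static str {
    DEFAULT_SCHEME
}

// One reference parameter: the elided return lifetime is the one of `path`.
fn first_segment(path: &str) -> &str {
    path.split('/').find(|s| !s.is_empty()).unwrap_or("")
}

fn main() {
    let kept: &str;
    {
        // A 'static reference can safely escape the inner scope.
        kept = scheme();
    }
    println!("{}", kept);

    let owner = String::from("/blog/lifetimes");
    // `segment` borrows from `owner`, so `owner` must not be dropped first.
    let segment = first_segment(&owner);
    println!("{}", segment);
}
```

Calling drop(owner) before the last use of segment would make the compiler reject the program, because the reference would outlive the memory it points to.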

Resources