Elixir – Group by Matching ID

This is part six of the nine-post series on Processing a Log File with Elixir. If you find this article helpful, please subscribe and share 🚀

Looking at our list of things to do, the next step is to group by video_id.

  • Fetch data from URL
  • Split each new line into a list item
  • Split each line into list items
  • Filter items to only contain the URL and TCP_HIT/MISS
  • Find the six-digit video id from the URL; it should be the first integer in HTTP paths of:
    • "example.com/04C0BF/v2/sources/content-owners/"
    • "example.com/04C0BF/ads/transcodes/"
  • Group by Video ID
  • Get Cache Hit and Misses for each Video
  • Calculate the Cache Hit Misses
  • Sort by video id
  • Print to file

Our data is now looking something like this:

[
  [video_id: 275211, tcp: "TCP_HIT/200"],
  [video_id: 326260, tcp: "TCP_HIT/200"],
  [video_id: 398629, tcp: "TCP_HIT/200"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 351421, tcp: "TCP_HIT/200"],
  [video_id: 12410, tcp: "TCP_HIT/200"],
  [video_id: 339342, tcp: "TCP_HIT/200"],
  [video_id: 414098, tcp: "TCP_HIT/200"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 160842, tcp: "TCP_HIT/206"],
  [video_id: 367665, tcp: "TCP_HIT/200"],
  [video_id: 367706, tcp: "TCP_HIT/200"],
  [video_id: 414098, tcp: "TCP_HIT/200"],
  [video_id: 312985, tcp: "TCP_MISS/200"],
  [video_id: 414098, tcp: "TCP_HIT/200"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 398629, tcp: "TCP_HIT/206"],
  [video_id: 23261, tcp: "TCP_HIT/200"],
  [video_id: 414098, tcp: "TCP_HIT/200"],
  [video_id: 12410, tcp: "TCP_HIT/200"],
  [video_id: 291986, tcp: "TCP_HIT/200"],
  [video_id: 360634, tcp: "TCP_HIT/200"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, tcp: "TCP_HIT/206"],
  [video_id: 186001, ...],
  [...],
  ...
]

Looking at the Enum.group_by/3 documentation, we see that it returns a map: each key is a value returned by the key function, and each value is the list of items that produced that key. We can write our test to get started:

group-by-matching-id.test.ex

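defmodule AccessLogAppTest do
  use ExUnit.Case
  # group_by_id/1 lives in AccessLogApp.CLI (see the implementation further down)
  import AccessLogApp.CLI, only: [group_by_id: 1]
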
  test "Groups list by video_id" do
    list = [
     [video_id: 1, tcp: "TCP_HIT/200"],
     [video_id: 1, tcp: "TCP_HIT/200"],
     [video_id: 1, tcp: "TCP_HIT/206"],
     [video_id: 1, tcp: "TCP_HIT/304"],
     [video_id: 2, tcp: "TCP_HIT/200"],
     [video_id: 2, tcp: "TCP_HIT/200"],
     [video_id: 2, tcp: "TCP_HIT/206"],
     [video_id: 2, tcp: "TCP_HIT/304"],
     [video_id: 3, tcp: "TCP_HIT/200"],
     [video_id: 3, tcp: "TCP_HIT/200"],
     [video_id: 3, tcp: "TCP_HIT/206"],
     [video_id: 3, tcp: "TCP_HIT/304"],
     [video_id: 4, tcp: "TCP_HIT/200"],
     [video_id: 4, tcp: "TCP_HIT/200"],
     [video_id: 4, tcp: "TCP_HIT/206"],
     [video_id: 4, tcp: "TCP_HIT/304"]
    ]
    result = group_by_id(list)
    assert result == %{
      [video_id: 1] => [
       [video_id: 1, tcp: "TCP_HIT/200"],
       [video_id: 1, tcp: "TCP_HIT/200"],
       [video_id: 1, tcp: "TCP_HIT/206"],
       [video_id: 1, tcp: "TCP_HIT/304"]
      ],
      [video_id: 2] => [
       [video_id: 2, tcp: "TCP_HIT/200"],
       [video_id: 2, tcp: "TCP_HIT/200"],
       [video_id: 2, tcp: "TCP_HIT/206"],
       [video_id: 2, tcp: "TCP_HIT/304"]
      ],
      [video_id: 3] => [
       [video_id: 3, tcp: "TCP_HIT/200"],
       [video_id: 3, tcp: "TCP_HIT/200"],
       [video_id: 3, tcp: "TCP_HIT/206"],
       [video_id: 3, tcp: "TCP_HIT/304"]
      ],
      [video_id: 4] => [
       [video_id: 4, tcp: "TCP_HIT/200"],
       [video_id: 4, tcp: "TCP_HIT/200"],
       [video_id: 4, tcp: "TCP_HIT/206"],
       [video_id: 4, tcp: "TCP_HIT/304"]
      ]
    }
  end
end
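
Before writing the implementation, it may help to see how Enum.group_by behaves on a tiny input. The sketch below is only an illustration and not part of the app: the module name GroupByDemo, its function names, and the sample words are made up for this example.

```elixir
defmodule GroupByDemo do
  # Hypothetical helper: groups words by the value key_fun returns (their length).
  def by_length(words) do
    Enum.group_by(words, &String.length/1)
  end

  # Same grouping, but the optional third argument (value_fun)
  # transforms each item before it is stored in its group.
  def by_length_upcased(words) do
    Enum.group_by(words, &String.length/1, &String.upcase/1)
  end
end

words = ["ant", "buffalo", "cat", "dingo"]

IO.inspect(GroupByDemo.by_length(words))
# %{3 => ["ant", "cat"], 5 => ["dingo"], 7 => ["buffalo"]}
```

Items that share a key keep their original order inside the group, which is why "ant" comes before "cat".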

Our function looks like this:

group-by-matching-id.ex

defmodule AccessLogApp.CLI do
  # ...
  def group_by_id(list) do
    Enum.group_by(list, fn [video_id, _] ->
      [video_id]
    end)
  end
  # ...
end
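
The pattern [video_id, _] binds the whole {:video_id, id} tuple, which is why each key comes out as a one-element keyword list such as [video_id: 1]. If plain integer keys turn out to be more convenient, a variation might look like the sketch below. This is an assumption about what could be useful, not the implementation used in this series; GroupSketch and group_by_plain_id/1 are hypothetical names.

```elixir
defmodule GroupSketch do
  # Hypothetical variation: key each group by the bare integer id.
  # Keyword.fetch!/2 raises a KeyError if an entry has no :video_id key.
  def group_by_plain_id(list) do
    Enum.group_by(list, &Keyword.fetch!(&1, :video_id))
  end
end

entries = [
  [video_id: 1, tcp: "TCP_HIT/200"],
  [video_id: 2, tcp: "TCP_MISS/200"],
  [video_id: 1, tcp: "TCP_HIT/206"]
]

# Key 1 holds both entries for video 1, key 2 holds the single miss.
IO.inspect(GroupSketch.group_by_plain_id(entries))
```

Note that the test above asserts keyword-list keys, so switching to integer keys would mean updating its expected map as well.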

Our function simply enumerates over the list and groups the items by the value returned from the key_fun, which here is a one-element keyword list holding the video_id. Easy peas!

That is it for today! Tomorrow we will be wrapping this up with step 7 "Calculating by Percentage". If you like, please share and subscribe!
