Filtering a List Using Regex Match and Elixir

This is part four of the nine-post series on Processing a Log File with Elixir. If you find this article helpful, please subscribe and share 🚀
In the last post, Getting Items from List with Elixir, we trimmed our list to contain only the items we want: the TCP_HIT/MISS and the URL. Our data now looks like this:
[
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/326260/v20169101326-1256x544-3063k.mp4+3713710.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts", 
    tcp: "TCP_HIT/206"
  },
...
]
Let's take a look at where we are on our steps:
  1. Fetch data from URL
  2. Split each new line into a list item
  3. Split each line into list items
  4. Filter items to only contain the URL and TCP_HIT/MISS
  5. Find the six-digit video id from the URL; it should be the first integer in HTTP paths of "example.com/04C0BF/v2/sources/content-owners/" and "example.com/04C0BF/ads/transcodes/"
  6. Group by Video ID
  7. Get Cache Hit and Misses for each Video
  8. Calculate the Cache Hit Misses
  9. Sort by video id
  10. Print to file
We now want to get the Video ID from URLs in these formats, where 384055 and 006817, respectively, are the Video IDs:
http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/384055/v201708302148-2273k.mp4+4023936.ts
http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts
But we don't want URLs that don't contain those paths. For example, if you were to dig deep into the 5k lines, you'd see entries like:
...
[
  http: "http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8",
  tcp: "TCP_HIT/200"
],
...
Let's start with a test:
# access_log_app/test/access_log_app_test.exs
  test "filters list by strings" do
    list = [
      %{
        http: "http://example1.ts/yep/a/b/c/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/206"
      },
      %{
        http: "http://example1.ts/nope/a/b/c/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/200"
      },
      %{
        http: "http://example2.ts/yep/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/200"
      },
      %{
        http: "http://example2.ts/nope/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/200"
      }
    ]
    result = filter_list_by_strings(list, ["example1.ts/yep/a/b/c", "example2.ts/yep/d/e/f"])
    assert result == [
      %{
        http: "http://example1.ts/yep/a/b/c/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/206"
      },
      %{
        http: "http://example2.ts/yep/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
        tcp: "TCP_HIT/200"
      }
    ]
  end
Our solution is a function that takes our list and the paths we want to match. We run the list through Enum.filter and read each item's URL with the item[:http] access syntax. Regex.match? then tests that URL against a pattern in which the paths sit inside parentheses "()". The group limits the "or" operator "|" to the paths, so the leading "http://" and the trailing "/" apply to whichever path matches.
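# access_log_app/lib/access_log_app/CLI.ex
  def filter_list_by_strings(list, paths) do
    # Escape each path so "." matches a literal dot, then join the paths
    # with "|" so the function accepts any number of them
    alternation = Enum.map_join(paths, "|", &Regex.escape/1)
    regex = Regex.compile!("http://(#{alternation})/")

    Enum.filter(list, fn item ->
      # to_string/1 turns a missing :http value (nil) into ""
      Regex.match?(regex, to_string(item[:http]))
    end)
  end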
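To see the filter work on the URL formats from the log, here is a minimal sketch that could be run as a standalone script. The module name PathFilterSketch is hypothetical, and the tcp values on the first two entries are placeholders rather than values taken from the log. It escapes each path before building the pattern, which is an assumption about what we want (a literal "." in "example.com"), and it ends with a small demonstration of why the paths are grouped.

```elixir
# Hypothetical standalone module, for illustration only.
defmodule PathFilterSketch do
  # Keeps only the maps whose :http value contains "http://<one of paths>/".
  def filter_list_by_strings(list, paths) do
    # Regex.escape/1 makes "." and other special characters match literally.
    alternation = Enum.map_join(paths, "|", &Regex.escape/1)
    regex = Regex.compile!("http://(#{alternation})/")

    Enum.filter(list, fn item ->
      Regex.match?(regex, to_string(item[:http]))
    end)
  end
end

paths = [
  "example.com/04C0BF/v2/sources/content-owners",
  "example.com/04C0BF/ads/transcodes"
]

# The tcp values on the first two entries are placeholders.
list = [
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/384055/v201708302148-2273k.mp4+4023936.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8",
    tcp: "TCP_HIT/200"
  }
]

kept = PathFilterSketch.filter_list_by_strings(list, paths)
# Prints the two matching URLs; the subtitles entry is dropped.
IO.inspect(Enum.map(kept, & &1.http))

# Why the group matters: without parentheses, "|" splits the whole pattern,
# so the second alternative matches "b/" anywhere in the string.
ungrouped = Regex.match?(~r/http:\/\/a|b\//, "ftp://b/")
grouped = Regex.match?(~r/http:\/\/(a|b)\//, "ftp://b/")
IO.inspect({ungrouped, grouped})
# prints {true, false}
```

Building the pattern from the whole list, instead of destructuring exactly two paths, means a third path can be added later without touching the function.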
In this post, we saw how easy it is to filter a list of maps: grab a value from each map, then use a simple regex to keep only the URLs whose paths contain our Video IDs. In the next post, we will use some more regex to get the Video ID from that URL.
If you like this post, please share and subscribe!