This is part four of the nine-post series on Processing a Log File with Elixir. If you find this article helpful, please subscribe and share 🚀
In the last post, Getting Items from List with Elixir, we trimmed our list to contain only the items we want: the TCP_HIT/MISS and the URL. Our data now looks like this:
[
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/275211/v0401185814-1389k.mp4+740005.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/sgl-entertainment/326260/v20169101326-1256x544-3063k.mp4+3713710.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-itub/398629/v201711170053-2061k.mp4+4582327.ts",
    tcp: "TCP_HIT/206"
  },
  ...
]
- Fetch data from URL
- Split each new line into a list item
- Split each line into list items
- Filter items to only contain the URL and TCP_HIT/MISS
- Find the six-digit video id from the URL, it should be the first integer in HTTP paths of:
"example.com/04C0BF/v2/sources/content-owners/"
"example.com/04C0BF/ads/transcodes/"
- Group by Video ID
- Get Cache Hit and Misses for each Video
- Calculate the Cache Hit Misses
- Sort by video id
- Print to file
We now want to get the Video ID from URLs in these formats, where 384055 and 006817, respectively, are the Video IDs:
http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/384055/v201708302148-2273k.mp4+4023936.ts
http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts
But we don't want URLs that don't contain those paths. For example, if you were to dig deep into the 5k lines, you'd see entries like:
[
  http: "http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8",
  tcp: "TCP_HIT/200"
],
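To make the distinction concrete, here is a minimal sketch that checks the three URLs above against the two wanted paths. It uses String.contains?/2 with a list of substrings rather than the regex built later in this post, and the variable names (wanted_paths, urls, matches) are placeholders chosen for illustration.

```elixir
# The two path prefixes taken from the task list above.
wanted_paths = [
  "example.com/04C0BF/v2/sources/content-owners/",
  "example.com/04C0BF/ads/transcodes/"
]

# The two wanted example URLs, followed by the unwanted subtitles URL.
urls = [
  "http://example.com/04C0BF/v2/sources/content-owners/cinedigm-tubi/384055/v201708302148-2273k.mp4+4023936.ts",
  "http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts",
  "http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8"
]

# String.contains?/2 accepts a list and returns true if any entry is found.
matches = Enum.map(urls, fn url -> String.contains?(url, wanted_paths) end)
IO.inspect(matches)
# → [true, true, false]
```

Only the first two URLs contain one of the wanted paths, so the subtitles entry is the kind of item the filter should drop.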
Let's start with a test:
test "filters list by strings" do
  list = [
    %{
      http: "http://example1.ts/yep/a/b/c/some-string/123456/01234-56789.1011.ts",
      tcp: "TCP_HIT/206"
    },
    %{
      http: "http://example1.ts/nope/a/b/c/some-string/123456/01234-56789.1011.ts",
      tcp: "TCP_HIT/200"
    },
    %{
      http: "http://example2.ts/yep/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
      tcp: "TCP_HIT/200"
    },
    %{
      http: "http://example2.ts/nope/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
      tcp: "TCP_HIT/200"
    }
  ]

  result = filter_list_by_strings(list, ["example1.ts/yep/a/b/c", "example2.ts/yep/d/e/f"])

  assert result == [
           %{
             http: "http://example1.ts/yep/a/b/c/some-string/123456/01234-56789.1011.ts",
             tcp: "TCP_HIT/206"
           },
           %{
             http: "http://example2.ts/yep/d/e/f/some-string/some-string/123456/01234-56789.1011.ts",
             tcp: "TCP_HIT/200"
           }
         ]
end
Our solution is to create a function that takes our list and the two paths we want to match. We run the list through Enum.filter and read each item's :http value with the item[:http] access syntax. We then call Regex.match? with a pattern that wraps the two paths in a group ( ). Grouping them lets the "or" operator | choose between the paths while the surrounding http:// prefix and trailing / apply to both.
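def filter_list_by_strings(list, paths) do
  # This version expects exactly two paths.
  [a, b] = paths
  # Escape the paths so characters such as "." are matched literally.
  pattern = ~r/http:\/\/(#{Regex.escape(a)}|#{Regex.escape(b)})\//

  Enum.filter(list, fn item ->
    # to_string/1 turns a missing :http value (nil) into an empty string.
    Regex.match?(pattern, to_string(item[:http]))
  end)
end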
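Because of the [a, b] = paths match, the function above only works with exactly two paths. The following is a sketch of one possible generalization, not the version used in this series: it builds the "or" group from the whole list, so any number of paths works, and it escapes each path with Regex.escape/1. The module name LogFilter and the sample data are assumptions made for illustration.

```elixir
defmodule LogFilter do
  # Hypothetical module name, used only for this sketch.
  def filter_list_by_strings(list, paths) do
    # Escape each path so "." and similar characters match literally,
    # then join the paths with the regex "or" operator.
    alternatives = Enum.map_join(paths, "|", &Regex.escape/1)
    pattern = Regex.compile!("http://(#{alternatives})/")

    Enum.filter(list, fn item ->
      # to_string/1 turns a missing :http value (nil) into an empty string.
      Regex.match?(pattern, to_string(item[:http]))
    end)
  end
end

# Sample data: one wanted URL and one unwanted subtitles URL.
list = [
  %{
    http: "http://example.com/04C0BF/ads/transcodes/006817/2791522/v0402000243-854x480-HD-1401k.mp4+22355.ts",
    tcp: "TCP_HIT/200"
  },
  %{
    http: "http://example.com/80C0BF/subtitles/422e3734-382b-4bb3-a753-e3f003d9cdd6.m3u8",
    tcp: "TCP_HIT/200"
  }
]

# No trailing slash here, because the pattern adds one after the group.
paths = [
  "example.com/04C0BF/v2/sources/content-owners",
  "example.com/04C0BF/ads/transcodes"
]

result = LogFilter.filter_list_by_strings(list, paths)
IO.inspect(length(result))
# → 1
```

Compiling the pattern once outside Enum.filter avoids rebuilding it for every item, which may matter when the list holds thousands of log lines.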
In this post, we saw how easy it is to filter a list of maps: grab a value from each map, then use a simple regex to keep only the URLs whose paths follow the formats that contain our Video IDs. In the next post, we will use some more regex to get the Video ID from that URL. If you like this post, please share and subscribe!