A full-stack data pipeline that scrapes Malayalam movie data from IMDb, stores it in MySQL, and visualizes insights using Power BI. Designed to uncover trends in ratings, votes, and runtime in malayalam movie.
Power BI dashboard with dynamic weighting of votes and ratings
From raw IMDb data to interactive Power BI insights:
To balance IMDb ratings and vote counts, the following DAX formula is used:
Normalised Weighted Rating =
Parameter[Parameter Value] * DIVIDE(SUM('imdb movies'[votes]), CALCULATE(MAX('imdb movies'[votes]), ALL('imdb movies')), 0) +
(Parameter[Parameter Value] - 1) * DIVIDE(SUM('imdb movies'[rating]), CALCULATE(MAX('imdb movies'[rating]), ALL('imdb movies')), 0)
This formula allows users to dynamically adjust the influence of votes vs. ratings using a slider in Power BI.
The following M script cleans and transforms the raw movie data:
let
Source = MySQL.Database("127.0.0.1:3306", "imdb", [ReturnSingleDatabase=true]),
imdb_movies = Source{[Schema="imdb", Item="movies"]}[Data],
updated_name = Table.TransformColumns(imdb_movies, {
{"title", each Text.TrimStart(Text.AfterDelimiter(_, " "), ".")}
}),
cleaned_votes = Table.TransformColumns(updated_name, {
{"votes", each Text.Replace(Text.Replace(_, "(", ""), ")", "")}
}),
numeric_votes = Table.TransformColumns(cleaned_votes, {
{"votes", each
try if Text.EndsWith(_, "K")
then Number.From(Text.Remove(_, "K")) * 1000
else Number.From(_)
otherwise null,
type number}
}),
duration_in_minutes = Table.TransformColumns(numeric_votes, {
{"duration", each
let
hours = try Number.From(Text.BeforeDelimiter(_, "h")) otherwise 0,
minutesText = Text.AfterDelimiter(_, "h"),
minutes = if Text.Contains(minutesText, "m")
then try Number.From(Text.BeforeDelimiter(minutesText, "m")) otherwise 0
else 0
in
hours * 60 + minutes,
type number}
}),
FinalTable = Table.RenameColumns(duration_in_minutes, {
{"title", "Title"}
})
in
FinalTable