Data Science & Machine Learning for Non Programmers | Data Science for Beginners Intellipaat


Hi good evening everyone, and good
morning to my US folks and people of other geographies who have logged in to
this live session on YouTube, I would like to welcome you all for this open
discussion and the forum for Data Science specifically designated and
targeted for the non-programming background. So, if you guys are good to go
with that, I think it’s the right time for me to take
you up with this entire conversation. For the next couple of minutes, I would
quickly walk you through with the agenda what we have tried out capturing for
this entire communication and for this session. So, having said that part, let’s
go and have some keywords and the takeaways what we are going to cover and
discuss in today’s session. So, very first part would be to talk a bit on the
data analytics, what data analytics is all about. We will try to
have a brief understanding of what exactly data analysis means to us, and why it
is so popular in this 21st century and has been called the sexiest job as of now.
Then, next part of our communication and conversation would be to talk a bit
on the Data Science and Machine Learning because everyone understands that these
keywords are very much saleable these days. So, we must have the right
clarity about what each of these things stands for us, what does that convey. The third key point here, what we actually are here for, is to focus on the
Data Science and Machine Learning with non-programming perspectives, and how this can
be relevant for us to shape our career and to look for a better future in
this space. Then, we’ll spend our time to discuss and learn the differences
between the various keywords related to the Data Science and analytics
industries like Data Science, Data Analysis, Business Analysis, Data Mining,
Machine Learning, Deep Learning, and Artificial Intelligence. So, how these
things are similar or different from each other or
being interchangeably used in the industry because many of the times these
keywords are actually used knowingly or unknowingly interchangeably. So we
actually have to have a right clarity between all these terminologies. It is
very essential so as to get the right clarity. The fifth part would be to
get through and understand the tools that could be relevant for us being
there from the non-programming background and will try to capture few
of the insights and the overview about those tools, how they can be helpful for
you to look for a better career. The last part would be to
talk a bit on the case study which I have identified for generic purpose
which is very much popular and frequently used in the industry. So, these
are the six keywords, or you can say the takeaways, that we are going to cover
in this session one after another. So here we go with the first and the very
important aspect, and let’s have a discussion on the analytics as a whole.
Now when I am talking about the analytics, we have to have a right clarity
with what the data analysis is all about because we are living in the world where
each and everything is very much surrounded with the data, and if you will
talk about the current scenario where we have everything quite interconnected
with the Internet these days, so more or less every move of a human being is
generating the data. So, if I would have to talk about what is data analysis, in a
simple sense what you can see on your screen
in this slide, I would ask you to just go ahead with a simple keyword that data
analysis is just like any attempt to make sense of the data.
I hope this makes sense to each and every one. So the point is very
simple: any activity, any attempt with the help of which you are trying to make
sense out of the data. When I say data, it could be structured or unstructured,
whatever the way it is coming to us. So, the
activity what you are doing to drill down with the data in order to get some
sense, some information, some interpretation from that actually
comes under the terminology of data analysis. So, these are the supporting
keywords you can see because we are living in a world which is surrounded by
the data. If we talk about the last couple of years and the future of
the data, I would rather say we have a huge scope because, as of now, if you will
take my experience personally, the data till date is getting generated by the online
sources, the Internet, the social media, and the e-commerce websites, but going
down the line in the next five years this Data Science is going to be more
informative, more resourceful, and more significant for each and every human
being on this planet. I would like to discuss this thing
in detail. The reason being till date, the data is only and
only getting generated with the online sources, but in near future, in
next couple of years, I personally see that the data is also going to be
generated with the offline sources with the sensors. So sensor data is going to
be the next big thing. If you are moving in and out of a retail mall, the sensors
placed at the entry gate of that retail mall will also be
generating the data, which would further be used for our analysis. So having said
that part, whatever the things we are going to discuss and cover, be it a data
analysis, data science, (we will be discussing about the data science in a
while), so each and everything would be all and all about what these things are for.
So, let me explain you what is data analytics. If I would have to talk
about what is data analytics, I would simply be saying ‘any attempt to make
sense out of the data.’ Now, why we are trying to put a lot of efforts to
make sense out of the data? This is important, I mean, I could have
ended up my conversation just by simply putting this line for you to understand
this is what data analytics is. Now the point is why we are taking so much
effort to make sense out of the data, to get the information out of the data.
The answer would be in one single statement: to gain the competitive edge. That’s what
the data is all about. I hope this single definition is necessary and sufficient
for us to identify what data analytics is and why it is so important for us,
whatever you are doing and at whatever scale. So having said that part, this
is what we are going to discuss and cover back and forth. Let me take few of
the more input so as to strengthen our understanding. There are several
techniques used (just a moment, let me get some drawing tools so that I can
highlight the areas). Now, the question is:
why data analysis? I think this has already been discussed a couple of
seconds before when I said that in order to get the competitive edge, so that’s
the bottom line of every statement. What you can see on my screen is: to improve
the business requirements, performing real-time market analysis, generation of
the reports and studies, and gathering the hidden insights. So, these are the
four points you can see in order to understand why data analysis is so
important, but at the end of the day, all these four points
simply highlight one common agenda, that is, to gain the
competitive edge, be it to reduce the cost or the expenses, or to increase the
revenue. At the end of the day, these are the only things that each and every
organization intends to do. I hope that makes sense to each and every one.
So that is what the Data Analysis is all about. Let’s take this forward to
understand few of the other things which would be relevant for us to know. We have
been talking about data analytics, data science, data mining, back
and forth again and again. The question always comes in our mind what actually
the data is, or whether this makes any sense for us to understand these
definitions or not. So, I would like you to understand the clarity and to have
the right understanding with all these key words being the beginner or the
fresher in the industry. So, I would like to start and put the quote that
‘facts and statistics collected together for reference or for
further analysis’; this is what you can understand in the context of data. If
I would have to quote, based on my personal experience, what data science
and data analysis are, I would say ‘the process of
beating the data, torturing the data until it speaks on its own’; I
mean, I can place that as a second quote on top of it.
I would reiterate that once again: in case you want to take this
definition of data science and data analytics on a lighter note, you can
put it as ‘torturing the data until it speaks on its own’, that means
until it gives up its own information, just like a culprit confesses
in jail. So, let us now take this forward to understand what is data
science. So, we always have these two key words being used back and forth again
and again interchangeably or synonymously being called as data
science and data analysis. So, I believe, it is the right time for us to
understand what the similarities we have between the data analysis and data
science and what are the differences we have in between these two terminologies.
So, let me put that very clearly here before I walk you through with the slide.
What is data science? Each one of us would be having different
interpretation with this definition because data science, as a whole, is very
subjective. If you take my definition, I would simply say in
a necessary and sufficient key word that ‘science of dealing with
the data is data science.’ I can put it like necessary and sufficient
keyword and then in order to make it very easy and comfortable for you to
understand, I can further extend that. When I say science, science
for all of us is nothing but the collection or a process, collection of
all the methodologies. (Kindly ignore the typos, guys, because I have been using two
or three different laptops interchangeably, so I hope you all will
ignore the spellings; otherwise my typing skills are perfectly
fine.) Okay. So a collection of all the methodologies or techniques, tools or
processes in place, that is what science is all about. Now I’m just
sticking with that keyword: in place to deal with the data. Now
I am extending the second part of this definition. So when I say to deal with the data, what are all the things you can
think of in terms of dealing with the data? To a certain extent, I can think of
sorting the data, filtering the data, summarizing the data, merging, appending, and before that I can think of
sampling, mining, and transforming. So, these are the things which you can think of.
So whatever you generally do with Microsoft Excel or SQL is what comes
under this tagline, or visualizing, or maybe modeling at the end of the day.
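
For anyone curious what these everyday operations look like outside Excel or SQL, here is a minimal sketch in plain Python. The sample rows, region names, and manager names are made up purely for illustration; the point is only that sorting, filtering, summarizing, merging, and appending are each a line or two.

```python
# Hypothetical sample data: every name and number here is invented for illustration.
sales = [
    {"region": "North", "product": "Pen", "amount": 120},
    {"region": "South", "product": "Book", "amount": 300},
    {"region": "North", "product": "Book", "amount": 250},
    {"region": "East", "product": "Pen", "amount": 80},
]
managers = {"North": "Asha", "South": "Ravi", "East": "Meera"}

# Sorting: order the rows by amount, largest first.
sorted_sales = sorted(sales, key=lambda row: row["amount"], reverse=True)

# Filtering: keep only the rows for one product.
book_sales = [row for row in sales if row["product"] == "Book"]

# Summarizing: total amount per region (similar to a pivot table in Excel).
total_by_region = {}
for row in sales:
    total_by_region[row["region"]] = total_by_region.get(row["region"], 0) + row["amount"]

# Merging: attach the manager name to every row (similar to a lookup or a SQL join).
merged = [{**row, "manager": managers[row["region"]]} for row in sales]

# Appending: add one more row to the end of the data.
appended = sales + [{"region": "South", "product": "Pen", "amount": 60}]

print(sorted_sales[0])
print(total_by_region)
```

The same results could be produced with a sort, a filter, a pivot table, and a lookup in a spreadsheet; the code is just another way of writing those steps down.
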
So having said that part, this is what you can understand in the context of
data science: ‘the collection of all the methodologies, techniques, tools, and
processes in place with the help of which you
are doing all these things with the data’, that means anything with the data. That is
what comes under data science as a keyword. Having said that part, let me
further strengthen your understanding by giving the final summary: it’s an
umbrella term or umbrella terminology. When I say umbrella terminology, that
means whatever you are doing with the data actually counts under
data science. The reason why I am highlighting all these things back and
forth again and again is simple: most of the people
here in this conversation might have a confusion or a perception
that data science is all about dealing with the high-end, advanced
analytics concepts like machine learning, deep learning, or artificial intelligence,
or maybe dealing with unstructured, voluminous Big Data, but my dear friends,
this is a complete myth. When I say it’s a complete myth, this explains each and
everything: it’s an umbrella term. Having said that part, whatever you
are doing, whether you are simply creating a report or an MIS dashboard on the
lower side, or the high-end machine learning, artificial intelligence, or
deep learning concepts, everything actually comes under the umbrella of
data science. This is what we have to understand clearly and in the
right fashion, and I hope this makes sense to each and every one as well. So,
just to break this myth: data science is not only about dealing with
voluminous data, or putting in some high-end advanced analytics using the core
statistical software and modeling techniques, be it SAS, R, or Python,
and focusing on only these areas. No, data science is not only about this. When I say
data science, it can be as simple as a single addition formula in
Microsoft Excel itself, and I hope this makes sense to each and
every one. Coming back to the conversation to take this forward: data
science is the study of data in a structured manner. Okay, putting
things in the right framework is what data science is all about. It involves
developing the methods of recording, storing, and analyzing the data, so I hope
you can get the context from this line of what data science is all about.
This is later used to extract useful information and to further use it for
the analysis part. Let us now take this forward so as to understand a few of the
additional keywords and to take a deep dive into the data science part: quantitative
analysis to help steer strategic business decisions. That means data analysis could
be qualitative or quantitative. Quantitative data analysis
further helps us to steer, and when I say steer, that means to drive the business
decisions and to strengthen them, and you can actually optimize the resources, the sales
force, the marketing strategies, the bandwidth available, and the rest of the other
things. So I would like you to understand that with the help of
data science you can actually fine-tune a lot of areas in your business, be it in
terms of allocating the resources or be it in terms of allocating the human
resources, the workforce being engaged. So this aspect of data science
is about uncovering the findings from the data,
that means hidden patterns, hidden trends, or ideas from the data. This is
generally what we are using data science for. This is further used for
understanding complex business behaviors, trends, and inferences. It’s
about surfacing hidden insights that can help companies make smarter
business decisions. So if you sum up this overall point, one thing
stays the same: at the end of the day, data science is again
used to drive our business in the right direction, so as to optimize the
budget allocations, funding, resources, and the sales force. When I say sales force,
that means the manpower engaged with that. So, just to give you the right clarity,
okay, not going much into that detail: say, for example, you have
four crore rupees for an organization, and you are just allocating one crore
each into four zones: print media, digital media,
social media, and other activities. So if you are blindly putting
that one crore into each of these four channels,
there might be a scenario where print media does not give you that much return.
Okay, out of one crore, it may be at a loss, giving only a 50 lakh return. But on the
other side, if you are putting one crore into digital media or
social media campaigning or marketing, it might be possible that it
gives you a return of four crore. So this is something which you are doing
without analysis, but with the help of analysis you could have reduced the
funding for print media and added that part to
digital media instead. So these are all the areas where you can think of using
analysis and data science. And having said that part, I would say data
analytics and data science is more like a fun job these days, because you
actually like to see how the inferences are getting generated.
I would like to share one example: I did one project
recently, when I was in the USA, for a leading retailer; I
would probably not give its name. I did a detailed analysis
on the data, to give
consultancy to this large retail chain and tell them where a
particular product has to be kept in the entire retail store, which is spread over
a very large space: what should be the position of that particular
product, at what height it should be kept, and at what angle
it should be kept. If a store has a
space of 10,000 square feet, that’s a really big area, okay, with a range of
somewhere close to 3,000 products, and when I say 3,000, that’s 3,000
different varieties of products. So we actually produced really detailed insights
and analysis, where we were telling at what location in the entire store a
particular product has to be kept: on which shelf, at which position,
should it be at the end of the store or right at the
beginning of the retail mall, and at what height and
at which angle. That means we were creating a 360-degree program for
that. So that is the extent to which you can do data analysis. Taking this
forward, let’s try to build some more understanding of the data product. So
you can have a lot of examples. I mean, most of the companies are actually
running and driving their business because of the evolution of data
science itself, right? So if I keep on talking about the application areas
of data science and data analysis, this one hour would be far too short
for that. Wherever you look, there is your Netflix, Spotify, all
the mobile apps. One thing you might still be wondering about, and I should
not actually be talking about all these things, but one thing is very common
these days when you download any application, whether that application is
focused on data or not. Let’s suppose you are downloading a news
application, right? They would probably be asking you for
access to your location, the folders, the photos, the media; whatever
you have in your mobile, they will ask you for access to that, right? Why
is it so? Because they want each and every minute piece of information about you as an
individual. Not going far beyond all these conversations, three or
four years ago there was a rumor in the market that Flipkart was planning to
shut down their URL-based website and go 100% on
the mobile app. If you ask why that is so, you will find
thousands of reasons, and most of you would be correct in your understanding, but
there was one particularly strong reason why they were planning to become completely,
hundred percent operational on the mobile
application: because web URL based services could not be that personalized,
but if a person is using their smartphone and purchasing something,
that piece of information, that means that data, is
more reliable. And you know, it has been commonly said that if there is going to be
any fourth World War, that will for sure happen because of data only, right?
If you just go through the daily news channels: a couple of days
before yesterday I was watching the news that the USA has canceled the license
for Huawei, a Chinese-based company, because of data privacy issues only,
right? So it is commonly said these days that if there is going to be a fourth
world war, it is surely going to happen because of data issues. So
having said that, I just wanted you to understand the importance of data,
the appeal of data. Not going far back, a couple of weeks
ago there was a popular app, right? The app was to show you how old
you will look after a period of time, right? That app was an AI/ML-based
app. I don’t remember the name of that application, but it was an
AI and ML based application, and there was a huge controversy and debate over
that application, about how relevant it is for a user of that app, because
eventually you are sharing your most relevant data and pieces of
information with the company. If you search for that, you will
find thousands of controversial statements, blogs, and
articles about what that particular app does. So data is of course important. Your
personal information is very, very fruitful, and I would say priceless,
for the organizations, to help them in identifying their business
strategies. That’s all I can talk about. Let’s take this forward.
I mean, the organizations are drilling down into
the data, with the help of which they are defining the data product they
should be coming up with next, right? This all happens because of the
insights and the analysis. All right, having said that part, let’s
take this forward and have a further conversation on one of the most common
topics and most widely used terminologies these days, called machine learning. As
you can see on my screen, it is very crucial for us to understand machine
learning. In a simple sense, machine learning is a term which is widely used
in almost all fields, ranging from simple optimization of advertising
all the way to plotting the quickest space path and navigation systems to
Mars. So, whatever you are talking about, machine learning has been
part and parcel of our conversation. Having said that,
if you ask me what machine learning is, I would
simply say: the way a human being learns from experience, similarly
you are creating
algorithms which can learn on their own. This is what machine learning is all
about. Let me put that very quickly. So if we talk about what machine
learning is, it is all about creating such programs, or you can say algorithms or
processes, which can learn and evolve on their own. When I say evolve
on their own, take the analogy: just the way we human beings
learn from experience and evolve, this is what machine
learning is. That means you are creating those dynamic and generic
programs which can evolve on their own without any human intervention. That is the
important thing you need to know: without human intervention. Okay, see, machine learning is an
application of artificial intelligence. I will be talking about all these
differences: what is statistical modeling, data analysis, predictive modeling,
machine learning, and how each is different from artificial intelligence. Machine
learning is actually part and parcel of artificial intelligence that provides
the system the ability to automatically learn. When I say automatically learn,
whatever you are using, let us take an example of YouTube or Netflix.
If you are watching any particular genre of videos on YouTube, you
will get recommendations only of that kind, right? Say, for example, a
person X is consistently watching a lot of sports-related videos
on YouTube, so YouTube will keep on recommending him or her
videos related to sports only, right? But maybe after a week, this person X
changes his taste and starts looking for music videos, so the recommendation
engine will also start recommending him music videos. That
said, the algorithm was the same single algorithm. No one
actually did the fine-tuning to turn it from sports to music, okay?
Sundar Pichai has not allocated one particular data scientist to take
care of your likes and dislikes, noting that last week you were interested
in sports, with one person sitting behind the scenes and feeding in this information.
No, that’s not the scene. All I want you to understand is that the
algorithm is the same, and it evolved on its own without any human intervention.
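
The sports-to-music example can be sketched with a toy program. This is only an illustration of the idea that one unchanged program gives different answers as the data changes; it is a simple "most watched recently" counter, not how YouTube or any real recommendation engine works. The class name `SimpleRecommender` and the window size are assumptions made for this sketch.

```python
from collections import Counter, deque


class SimpleRecommender:
    """Toy recommender: suggests the category seen most often among recent views."""

    def __init__(self, window=5):
        # Only the last `window` views are remembered; older ones drop off automatically.
        self.recent = deque(maxlen=window)

    def watch(self, category):
        # Every view is new data for the recommender to learn from.
        self.recent.append(category)

    def recommend(self):
        if not self.recent:
            return None
        # Pick the most frequent category in the recent history.
        return Counter(self.recent).most_common(1)[0][0]


viewer = SimpleRecommender(window=5)

# Week 1: person X watches only sports videos.
for _ in range(5):
    viewer.watch("sports")
first = viewer.recommend()

# Week 2: the same person switches to music videos.
for _ in range(3):
    viewer.watch("music")
second = viewer.recommend()

print(first, second)
```

Nobody edits the code between the two weeks. The recommendation moves from sports to music only because the viewing data changed, which is the point of the example above.
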
That means there is no dedicated data scientist sitting
in the headquarters of Google to take care of your likings and his
likings, right? So this is a standard program, a general program, which
has the ability to automatically learn, and I hope this example is good
enough for you to understand the capabilities that machine learning
brings to our business to strengthen our decision-making: systems learn and
improve from experience without being explicitly programmed. That means Google
has not done any explicit programming for your recommendation engine to turn
it from sports to music recommendations, right? It’s the same
algorithm. This technology focuses on the development of computer programs that
access the data and use it to learn for themselves.
So Google driverless cars and robo-taxis are all implementations of machine
learning, which actually learns from the data it is exposed to.
This is what machine learning is typically all about. Let’s
focus on the ideas, and before I jump into that conversation, it is very
crucial for me to highlight the core idea when talking about data science and
machine learning, because machine learning is actually a part of data
science; data science is an umbrella term, right? So to help you
understand all these things, you can picture it like a Venn diagram where you
have statistical models, the general modeling criteria, and then you have this
one overlapping with machine learning. So machine learning is the
automation of the individual fine-tuned programmed model which we have just
discussed. If you want to achieve more accuracy in your machine
learning model, then you go with deep learning. So this is your deep
learning, guys: deep learning is nothing but having multiple layers of machine
learning algorithms (layered neural networks). And collectively your ML and
DL part, combined with computer science software and
hardware equipment, is called
artificial intelligence. So I can very
well say that artificial intelligence is an amalgamated form of ML,
machine learning, and DL, deep learning, at the end of the day. But all these
things, whatever you are doing, come under the umbrella of data
science. So we are focusing on data science, and the point is: who is a data
scientist? From my side: a mathematician, a person who has a bit of
understanding of mathematics plus statistics. More importantly, he should be
a problem solver, and that means you should have problem-solving
abilities and the thought process; that is what matters more.
Plus a storyteller, plus a visualizer who can visualize the data,
and most importantly, he should have a detailed understanding of the domain.
I am highlighting this part for a simple reason: there is one
thing left, which is what we are focused on here, guys: programmer. Okay, this is what
is commonly said, but it is a myth. A person coming from a non-programming
background can also do it; I mean, data science is not only
about the programming part. Okay, this is what I wanted to make
clear with this statement. So programming alone is not what is
required to be a data scientist. A person should have at least a 10th
standard level of mathematics understanding, a bit of core statistics
understanding, some basic understanding of the platforms, problem-solving
abilities, and what makes it important: storytelling, visualization, and
the business domain. This is what I wanted you to capture; this is what I
wanted to put in a single frame. It has also been said that only a person
coming from a programming background can be a better data
scientist. No, that’s a myth, a complete myth, and I would
put that as an even stronger message. So this business domain is very
important, and that’s the reason a lot of doctors, pharmacists, financial CAs, and even
lawyers among my friends here are doing wonders in their respective fields as
data scientists. Why? Because you need not be a hardcore programmer, you
need not be a statistician, you need not be a mathematician. All that is
required is an overall business understanding and the ability to put
that into a framework. That means you only need a basic
understanding of dealing with Microsoft Excel and SQL; that’s all that
is required to be a data scientist. If you talk about the
statistical platforms, we have SAS, R, or Python primarily being used in the
current situation, but people are doing wonders even though they are not
coming from a programming background. So this is the next
discussion area I have in place, in order to give you clarity
on the market requirement and the overall scope in the industry. So let me
take this forward. Programmers prefer to understand the data set and work with it
as efficiently as they can. Programmers make it easy for users further down the chain;
that means they can write programs to implement extensive logic
for various purposes, like the hard-core data
visualization aspects or storyboarding. But what is there for us,
being from the non-programming background? That is the question
mark, right? And so that you can understand the reason, before I come to
all these conversations, let me take you back a bit, okay? If you remember,
under data science in this Excel file I captured two things, right? That means
from reporting and MIS dashboards till the high-end machine learning part, correct? That means
data science or data analysis actually has two parts: one is the descriptive part and the other is the predictive part.
So these are the two sides of the coin. Descriptive analytics is just to
use the historical data and get meaningful insights about what has
already happened, and for that you do not even need a programming background.
You only need simple class sixth, or maybe on the higher side I can
put class tenth, mathematics, that’s it, because it primarily requires
aggregation and summarization, isn’t it? Like finding the total, mean, minimum
value, maximum value, average, and so on and so forth.
Correct? This is what we generally do on the descriptive side. Just to make it
very easily understandable: even if you are looking at the balance sheet of a very
big corporate MNC, there too they are only using basic averages, totals,
and mean values, even though that is put on a very large scale of finding the
quarter-on-quarter result or maybe the year-on-year result, isn’t it? But at
the end of the day, if we look into the detail, they are just summarizing
the data and finding the total or, to a certain extent, averages, mean value,
maximum value, and so on and so forth. So that only requires simple class ten
mathematics in order to survive in the industry. A lot of examples could be
there: what is the average salary of an employee in a company? I don’t think
we need a programming background to get these
insights: the average salary in your company, the attrition ratio of your
organization, the pass percentage of the students in your college, the placement
ratio of the students in your college. All these things primarily do not
require any programming background, isn’t it? They can be done just by using
simple class ten mathematics, and for that reason we have two basic
tools to talk about: Excel and, to a certain extent,
SQL, and on the higher side Tableau for visualizing those basics.
So that is what I am going to reiterate here: you need not be a champion in
programming, and it is not required for you to come from a hard-core
programming background with an understanding of C, C++, or Java.
No, that’s not really required. If that had been the scenario, then, my
dear friends, all the people working in the IT industry, taking a few
names like TCS, Wipro, and Infosys, who are hard-core
champions in Java, .NET, Perl, and PHP, would be doing wonders in the data
science space. But that’s not the fact. The fact is that for you
to be a better data scientist you have to have problem-solving abilities, you
have to have business understanding and a basic idea of how to survive
in the industry, and that is what makes it important. So this is for people from a
non-programming background who
are looking to attain easy analysis and visualization of the data
without having in-depth knowledge of coding. If we
talk about the proportions, in the current situation 60 to 70% of the
work, of the opportunities in the industry, still lies in those areas where it is really not
required for you to come from a programming background. They are actually
looking for tools and analytics which have an easy user interface to
work with and provide them the desired output. Let
us identify the differences between the core keywords like data science
and machine learning. I believe I discussed a couple of
minutes ago that machine learning is part and parcel of data
science, whereas data science is an umbrella term. Machine learning
is a specific way of creating algorithms which do
not need human interference. This is generally done on a large volume of
data, where you are developing proofs of concept, optimizing the
requirement, tuning the algorithms, refining the proofs of concept further,
and innovating on the existing data. Data science, on the other hand, is the overall term,
where you are focusing on career growth and enhancing whatever skill sets
are required for you to understand and analyze the data:
operations research, experimental design, and data economics. If you are not
able to grasp all of these, or not able to relate all these terms, I
would still like you to stick with the previous conversation, that
machine learning sits under the umbrella term of data science.
That is one context. Beyond that we have deep
learning as well: multiple layers of machine learning is what we
call deep learning. Then we have AI, which is nothing but the
amalgamation of machine learning plus deep learning with
computer science concepts or hardware equipment like robotics and
driverless cars. All of that is what we can see as part and parcel of
AI, and all these things come under data science. So data science is
an umbrella term. I hope this is very much clear to
everyone here. Let me now take this forward with whatever time remaining
we have. Let us try to get more comparisons between the data scientist
and the business analyst. People working in the BPO industry are also
designated, most of the time, as business analysts, and people coming from
IIM or IIT backgrounds into the leading MNCs in different
analytics sectors are also given the designation of business analyst. So what
does that really mean? As discussed earlier, the business
analyst focuses more on database design, project management,
data optimization, and designing reports. That means this person is more
inclined towards, or has, a detailed business understanding, with the help of which they
work out the approach for how the problem will be solved, and based on
that we further use the data science methodologies. So a
business analyst is most likely a person who tries to understand the
problem statement of a specific domain, be it the financial industry, the
legal industry, the healthcare industry, or the retail industry, and then tries to
identify the solution using data science skill sets. I hope that is what
you need to understand in the right sense. So a business analyst is a person
who looks at the overall problem statement from a wider perspective and
then tries to find the solution using data science concepts. The data scientist, as
I discussed, is more about leveraging the skill sets and finding the solutions so
as to help the business analysis or the business analyst, right? So this is very
much there in front of you. Taking this forward, I am looking to spend the last
10 minutes on a case study, one which you will probably like. So, for data analysis, specifically for
those who are coming from a non-technical background, Microsoft Excel is
one thing which everyone should be equipped with, without any
exception. This is something called basic hygiene in the
industry. People will not ask you to be skilled in Excel specifically
in the job description, but if you don’t know Microsoft Excel while being a
data analyst or data science professional, that is very serious,
and it’s more like a crime to be there in the industry, right? Excel is famous
and widely used for statistics and data modeling, it is
easy to learn, and it can integrate with datasets easily. So
I would rather say that most
of the work in the industry still happens in Microsoft Excel,
even if you include the Big Four. When I say Big Four, I mean the big organizations
like Deloitte, KPMG, EY and, I just missed the last one,
sorry for that, PwC. All right, so we have another tool called
Tableau. These are all tools that you can think of under the umbrella
of data analytics. This is a free tool which connects to any data source to
create data visualizations and maps.
So these are the tools which are primarily used for data handling
and data visualization, specifically Tableau. We have other
tools along with Tableau which are very popular these days, like Power
BI and QlikView, and these are tools which
require some working with the data and are used for manipulation. The topmost one is
SAS. SAS is one of the most widely and frequently used statistical software packages for
handling, manipulating, and analyzing data. The reason it is so popular is
that it has all the capabilities which a
statistical tool should have, be it to process the data, transform the data,
visualize the data, or do statistical machine learning
and other things. We have all those capabilities available
in SAS, and that is the reason it is one of the favorite tools among
professionals, even though it has a heavy licensing cost. One thing
is very important: SAS is very, very fast. When I say very, very fast, I
can share my personal experience. I have worked on one single SAS data set (when
I say data set, it’s more like a table) of 40 GB. That is
the speed and the extent that SAS offers you. Then we have RapidMiner, again a very
frequently used tool, which is integrated with the data science platform. You
can do statistical modeling in it, and it can integrate with
other software and platforms, or RDBMSs I would say, like
Excel, Oracle, SQL, or Teradata, in order to get the data and do
statistical analysis or machine learning analysis. It can also make
use of real-life data. When I say real-life data, I mean whatever data I have, I can
also pass into it, and that’s the beauty of the RapidMiner tool.
In order to get you comfortable with the overall conversation, I would like to
quickly walk you through one
very basic and very interesting case
study that is primarily based on segmentation and clustering with
RFM. I will come back to what RFM
stands for. First of all, I will give the context in the next five
minutes, and then I will walk you through the steps, which you can think of
implementing in whatever tool you like: it could be done in simple Excel,
or you can write the solution in R or in
Python later on; that’s not a challenge. So before I jump into that part, my dear
friends: the entire requirement in the industry is all about the data science
skill sets. When it comes to data science,
how you are putting that in place hardly matters for a professional,
whether you are doing it with SAS, Excel, R, or Python. These are
just different platforms and tools, and each of them
has its own advantages and disadvantages. Who knows, tomorrow
you may have something better than these tools, right?
But the skill sets required in data science will always remain the same.
That means what matters is how you are analyzing the data; it hardly matters which platform
you are using to analyze it. So my intention is to
help you understand the importance, the granularity, and the significance of
this concept of data science, and for that reason only I am talking about the
RFM case study, so as to keep it general and not specific to a
particular platform. So let us now try to understand the business problem
statement and its solution in terms of RFM, which is primarily based on
segmentation and clustering. I will talk about what segmentation and
clustering are in a while, but let me tell you the full form: RFM stands for
recency, frequency, and monetary. Correct, this is the full form.
Now, before I walk you through the problem statement in a
lucid manner, with an intuitive understanding, let us try to understand
the idea: what is clustering, then we will talk about
why clustering, and then how clustering. Clustering and segmentation are used
interchangeably, so I am not going into that part. Everyone here, being a professional in
their respective area, must be aware that each and every business organization or
enterprise wants to reach out to its customers and serve its
customers in a very specific way, right? Airtel, for example,
would like to entertain me, would like to serve me in the best possible manner, in
whatever way is possible for them with their available resources and
bandwidth, based on my requirement, isn’t it so? Let’s take
the example of Airtel. Would it be possible for Airtel to create 130 crore
individual postpaid plans, one for each of us? No, that is not possible,
even though I am saying that my requirement is 535 GB per month for the
data plan, I only want a hundred minutes of outgoing calls, and I want only
50 SMSes. This is my exact requirement as far as my usage is
concerned. As I told you, each and every organization wants to reach out to its
customers and serve them in a very specific and the best possible
manner. So would it be possible for Airtel to create one unique postpaid
plan for me with all these unique specifications? For sure not. So what they
want to do, or what they generally do, is group the data in such a
way that people having similar likings, properties, preferences, choices,
and characteristics are allocated to one group. So what I can do is,
instead of creating 130 crore postpaid plans, I can probably roll out 10 or 20
different postpaid plans, right? That is what clustering is: you
are clubbing or grouping similar clients or customers into one group so that
you can reach out to them in a specific manner rather than reaching each of them
individually, because that is not possible due to cost,
bandwidth, and resource issues, right? So this is what clustering is all about. Having
said that, let us try to understand our problem statement. Now everyone is
clear on what clustering is: it means distributing the data, dividing the data
into different clusters such that each cluster is homogeneous
internally and heterogeneous with respect to one another.
Why clustering? Because every business unit
wants to reach out to the customer in a
perfectly customized, dedicated fashion, but this is not possible because of
resource constraints. So they want to cluster the overall data into n
groups, so that instead of entertaining a total of k samples they can target the
created n groups, be it three, four, or five groups, collectively. This is why
clustering is important. So this is the example we are now talking about,
the case of recency. Recency means how recently a person has purchased an
item. This is generally done on transactional data and is very
popular in the retail and e-commerce industry. It is so popular because it is
easy and simple, yet very significant, and that is the reason I have picked it
for our conversation. So if you have the transactional data, let us
suppose I have this data: can we get the transactional data grouped or summarized
and identify when a customer last did a transaction with us? This
simply requires a pivot table in Excel, or you can do a GROUP BY summary
in Microsoft SQL. Nothing more than a Microsoft Excel pivot
table is required. This is something you can retrieve directly after
doing some basic processing of the data, or, if you are getting the data with
client-level details, you can simply get it from there. Having said that, I have
captured the numbers: this is the last time this customer ID did a
transaction with us; this is the frequency, meaning that
here he has done four transactions with us; and this is the
total monetary value of the purchases he has made. Collectively these
three parameters or attributes make a lot of sense for me to identify
how valuable each of these customers is, so that I can focus on the
relevant areas. Let me try to help you understand this scenario. I hope
everyone in this conversation understands percentiles, deciles, and
quartiles. Percentile means the same percentile that you get in your
competitive examinations, right? If you have appeared in competitive
examinations, you may have scored, say, the 98th percentile. So
percentiling is distributing the data into 100 equal parts; this is what we call a
percentile. Deciling is distributing the data into ten equal parts, and then
we have quartiles, which are very important and robust from the statistical
perspective, and which are all about distributing the data into four equal
parts. I am using this quartiling feature for doing this analysis.
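To make the group-by summary and the quartiling concrete, here is a minimal Python sketch of the same two steps. The transaction rows, the `as_of` reporting date, and the helper names `build_rfm` and `quartile_label` are all assumptions made up for this illustration; they are not taken from the Excel file used in the session. The `method="inclusive"` option is chosen only because it resembles a spreadsheet-style quartile.

```python
from collections import defaultdict
from datetime import date
from statistics import quantiles

# Hypothetical transaction log: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2019, 11, 10), 300.0),
    ("C1", date(2019, 12, 25), 500.0),
    ("C2", date(2019, 6, 1), 120.0),
    ("C3", date(2019, 10, 1), 80.0),
    ("C3", date(2019, 12, 1), 60.0),
    ("C3", date(2019, 12, 20), 90.0),
    ("C4", date(2018, 12, 1), 40.0),
]
as_of = date(2020, 1, 1)  # assumed reporting date


def build_rfm(rows, as_of):
    """Summarize transactions into (recency_days, frequency, monetary) per customer."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in rows:
        last_seen[customer] = max(last_seen.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


def quartile_label(value, cuts):
    """Return 'Q1'..'Q4' for a value, given the three quartile cut points."""
    for label, cut in zip(("Q1", "Q2", "Q3"), cuts):
        if value <= cut:
            return label
    return "Q4"


rfm = build_rfm(transactions, as_of)

# Three cut points split the recency values into four groups.
recency_cuts = quantiles([r for r, _, _ in rfm.values()], n=4, method="inclusive")
recency_quartile = {c: quartile_label(r, recency_cuts) for c, (r, _, _) in rfm.items()}

for customer, (r, f, m) in rfm.items():
    print(customer, r, f, m, recency_quartile[customer])
```

The same `quartile_label` helper can be reused for frequency and monetary by passing in their own cut points. A low recency number means a recent purchase, so Q1 of recency holds the most recent customers, while Q4 of frequency or monetary holds the most frequent or highest-spending ones.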
This can be done in Microsoft Excel, or in SAS, R, or Python. I have done
this deliberately in Excel for this conversation so that you can follow
the flow and understand the numbers
there, because my idea is not to take you
through the programs; my idea is to take you through the problem
statement and the solution. I would like to take five more minutes to
conclude my conversation, because this is very easy, simple, and
straightforward. Now, what I have done in this Excel file is capture the
various quartiles. You can see this is a simple QUARTILE function I have placed
in Microsoft Excel. So this is the quartile of recency.
The reason I have used Excel for
this particular problem statement is that everyone more or less understands
Excel, right? So this quartiling is done first on recency, then on
frequency, and then on the
monetary value. I have marked all these
values: if a number in recency is
between 0 and 26, this is the first quartile; 27 to 93 is the second quartile; 94
to 225 is the third quartile; 226 to 474 58 is the fourth quartile. Using this basic,
simple Excel formula, I have highlighted each of these quartiles in
front of the recency, and I have done the same for frequency and
monetary. This is done in Excel with a simple if-then-else formula. As far as
the programming methodology is concerned, you can write the same logic on
whatever platform you have in place, be it SAS, R, or Python. All you have
to understand is how the solution has to be drafted, and as
far as SAS, R, or Python is concerned, your basic preliminary data
handling understanding would be sufficient for you to write the code
there. That is the reason I have been telling you from the very first
minute that programming skill sets are not essentially required. All you need to
have in place is what needs to be done and how it has to be done; that
means the approach and algorithm should be clear. As far as the syntax is
concerned, you should not be worried about that; even Google Baba is enough
to tell you the syntax, right? All right, coming back to the conversation, what
I want you to understand is that I have marked all the quartiles as Q1, Q2,
Q3, and Q4, where Q1 recency means this particular customer is in a group
which has done a transaction very recently.
So, having said that, just focus on
this: can I say that
people falling under Q1, meaning the first quartile, are very active
customers? Considering them as active customers, can I channelize
cross-sell or upsell promotional activities to them? Because if a person
has done a transaction very
recently, then he is most likely an
active customer, meaning a loyal customer, compared to a customer
who did a transaction with us maybe six months ago. Who can guarantee
whether he is still in India or has shifted to some
other place? So Q1 means a very recent transaction, and I can consider them
active. Next, can I say that if a person falls under
recency Q2 or Q3, they are risky customers, because it has been so long since
they did any transaction with us? Can I
start a retention campaign for them? Of course, yes, I should. Moreover,
if a person falls into the fourth quartile of recency, can I say this is a churned
customer? That means after a particular time period, maybe 12
months or 360 days, as per your business definition, they can be
considered churned customers, so you can trigger the deactivation campaign,
isn’t it so? So this is one dimension of understanding this problem statement,
where you can identify which ones are the active customers, which ones are the risky
customers, and which ones are the churned customers. A very simple one, isn’t it so?
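The rule of thumb just described (recency Q1 is active, Q2 or Q3 is risky, Q4 is churned) is the kind of if-then-else that can be written in any tool. A minimal Python sketch follows; the function name `customer_status` and the sample customers are assumptions for illustration only.

```python
def customer_status(recency_quartile):
    """Map a recency quartile label to a status, following the rule of thumb in the talk."""
    if recency_quartile == "Q1":
        return "active"   # transacted very recently: cross-sell or upsell
    if recency_quartile in ("Q2", "Q3"):
        return "risky"    # quiet for a while: candidates for a retention campaign
    return "churned"      # Q4: longest time since the last transaction


# Hypothetical customers with their recency quartiles already computed.
customers = {"C1": "Q1", "C2": "Q3", "C3": "Q2", "C4": "Q4"}
statuses = {cid: customer_status(q) for cid, q in customers.items()}

for cid, status in statuses.items():
    print(cid, status)
```

In Excel the equivalent would be a nested IF formula in a new column next to the recency quartile.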
All you are doing is simply
splitting the data into various
clusters, here called quartiles, using basic, simple
statistics. You must have covered quartiles, percentiles, and deciles
in your school days. Let me extend that for one more minute,
focusing on the active customers. Just focus on this green-colored area.
If a person has done a transaction very recently, that
means falling under Q1, the first quartile of recency, and has a very
high monetary value, that means falling under Q4 of monetary,
and also falls in Q4 of frequency, that means very frequent,
then this person is very active, being in the first quartile of recency, very
frequent, being in the fourth quartile of frequency, and very high valued. So shouldn’t I
offer them a premium product? If you are
doing this analysis for a credit card
company, don’t you think this category of
customer should be offered the premium card? Compare the calls I have
been getting from the credit card companies, like “hey, I am giving you
the silver card”, with the calls many of you must be getting
for the premium card. Why? Because
it might be that you are doing a lot
of transactions, very frequently, and with a high-value
ticket size. This is how you can understand the problem statement. So this
is the premium category of
customer. I can clearly
define the status of the customer: I can add a new column
with certain if-then-else conditions for whether a customer is
active, risky, or churned, just by following these basic rules. Even
if he is active, I can decide whether I should offer him a premium product or,
in the context of credit cards, whether I should offer him a
platinum card, a gold card, or simply a silver card, based on his other positions
in the quartiles. This is where you can now reach out to the individual
customer and target them accordingly. So don’t you think this is a simple analysis?
All I have done is basic quartiling and then a basic
if-then-else formula in Excel, and you could have done the same in SAS,
R, or Python. Now you are in a position where you can
decide which one is a valuable
customer, to whom I should offer a
platinum card, to whom I should offer a gold card, to whom I should offer a
silver card, and which are the risky customers. Don’t you think you are
saving a lot of business if you start your retention campaign for
all those customers who are risky? Don’t you think you are getting back all those
customers who are on the verge of churning? This is the intention and the
significance of doing the analysis. This is just one example, my dear friends. In
industry you have thousands of such problem statements, and the solutions can
be done in Excel, SAS, R, Python, or SQL. These are the
platforms we have in place. Of course, programming is not essentially required.
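The card-tier decision can be sketched the same way. Only the top tier below follows what was said in the session (recency Q1, frequency Q4, monetary Q4 is the premium customer); the gold and silver rules and the function name `card_offer` are assumptions added purely to complete the example, since the exact rules were not spelled out.

```python
def card_offer(recency_q, frequency_q, monetary_q):
    """Suggest a card tier from the three quartile labels of one customer."""
    if recency_q != "Q1":
        return None  # not an active customer: handle with retention, not upsell
    if frequency_q == "Q4" and monetary_q == "Q4":
        return "platinum"  # recent, very frequent, high value (the premium case)
    if frequency_q in ("Q3", "Q4") or monetary_q in ("Q3", "Q4"):
        return "gold"      # assumption: strong on frequency or on value
    return "silver"        # assumption: active, but low frequency and low value


# Hypothetical customers: (recency, frequency, monetary) quartile labels.
examples = {
    "C1": ("Q1", "Q4", "Q4"),
    "C2": ("Q1", "Q3", "Q1"),
    "C3": ("Q1", "Q1", "Q2"),
    "C4": ("Q3", "Q4", "Q4"),
}
offers = {cid: card_offer(*labels) for cid, labels in examples.items()}

for cid, offer in offers.items():
    print(cid, offer)
```

In a real engagement the thresholds for each tier would come from the business definition rather than from the analyst.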
People coming from non-programming backgrounds are also doing wonders in
the industry. I have a lot of friends who are doctors and are using all
these capabilities, and likewise lawyers, pharmacists, and finance and marketing experts
with MBA backgrounds coming from the IIMs are also doing wonders. So it is a
complete myth that only people coming from a programming background can do
wonders in the data science space. No, that’s a complete myth, and this is one
proof. I have tried my level best to put it in simple terms so that you can
understand the intent. With that said, I am ending my conversation.
I hope everyone has enjoyed this conversation. In case you want to
reach out to purchase any course, specifically with a discount of 30%, you
can connect with the Intellipaat team after the
conversation. OK, so with that I am ending my call.
Have a good evening, my Indian folks, and to all my US and other folks, a very
good morning to you. Thank you so much, thanks for your time, and have a nice
journey in your career. Happy learning! All right, welcome back, guys, and
good evening. Before we go ahead and wind up the session, I would like to
thank our instructor, and I would like to quickly
show you my training program, which is the data science certification training.
If you look at my website and browse down, you
will see a Browse section, under which there is a
category called Data Science, and under that section you will see
the first program, which says Data Science Certification Training Course.
This is the training that we generally recommend, irrespective
of whether you are from a programming or a non-programming background. If you
want to master your skill sets in the field of data science, this
is what you should be going for and getting started with, as it
covers all the aspects of data analysis, data exploration, data manipulation, and data
planning in detail, and in order to perform these analyses we also
teach you R in this training. Just to
quickly wind up: this is 40 hours of training with a live
instructor, and we have two kinds of schedule coming up. We have one
weekend class, which is every Saturday and Sunday at 8 p.m. IST; for
anyone joining us from the US or Canada, it is going to be 10:30
a.m. in the morning, Eastern Standard Time. The complete
session is three hours in total every Saturday and Sunday, and it goes on
for seven to eight weekends with the instructor. If you would rather do
it on weekdays, I have an upcoming weekday class which starts
this coming Monday. It would be Monday to Thursday from 9:30 p.m. EST,
and for our learners back in India it would be Tuesday to Friday in the
morning from 7:00 a.m. IST. This training session would be two hours in
total. The complete training cost, as you can see on my website, has already
been brought down from 22,800 to 19,830.
Due to the Independence Day campaign that we are running back in India, we
have a flat 30% discount for the next two days on this complete training.
If you would like some more information about the pricing, and if
you want to avail the discount, please come in on the live chat and
help us with your details, your name and cell phone number, and
the best time to talk, and my consultants can reach out to you and
help you with further details. This offer is exclusively for the next
two days. If you are paying in US dollars, for our US or
Canada based learners the complete training is three hundred and
forty-eight US dollars, on which you can avail a flat 30
percent discount as well. You can
put in your details on the
live chat and someone will assist you with the other options.
If you are taking the instructor-led training, one additional thing that you
get is our self-paced training, given absolutely
free of cost. There is no difference as such: the self-paced
training already covers the recorded sessions, which are
actual live class recordings that you can go through. On the complete training
package you have lifetime access and lifetime support, and
this all comes with lifetime free upgrades. So once you sign up with us,
you can still come back at a later point in time, and as and when there
is a version change or an upgrade happens, you can come back and
request a live class again and again, depending on your requirements, and
keep yourself updated. So go ahead and sign up for the program as soon as
possible. I’m sure that this is going to be a benchmark in your career,
wherein it would definitely help you make a really good transition into the
field of data science, and I’m pretty sure that once this training goes
really well there will be other options that you will be very
excited to go for, be it
integrating these
skill sets in the field of Tableau, artificial intelligence, SAS, or machine
learning. So go ahead and come in on the live chat, and someone from our team will
definitely help you. This is Jazz Benjamin.
I’m from the enterprise team, and if you need any further information you
can just drop in on our website or give us a call at the toll-free number
or at our Indian number that you are looking at on our website. Thank you so
very much for your time, and have a wonderful day.