DATASCIENCE


Data Science, Python, R, Data Mining, Data Analysis

What Do I Need to Become a Data Scientist?

Data Science, Data Analytics, Big Data, Business Analytics: these are the trending topics in the industry. This year I have taken several courses on these subjects during my MSc and a Udacity program, and I have come to realize how important they are. So which path and which program should I follow? Below is my development plan, based on my own research and the advice I have received from the gurus.

1. Statistics:

First and most important is a solid knowledge of statistics.
a. Probability distributions. Tip: the central limit theorem.

b. Basic statistics and jargon:

Qualitative or quantitative data | Ratio/interval/ordinal/nominal data | Difference between population and sample: mean and variance | Skewness and kurtosis | Standard deviation, mean, quartiles | Chebyshev’s theorem, coefficient of variation, Bayes’ law | Least squares methods | Interpretations of probability: classical / relative frequency / subjective | Joint, marginal, conditional probability | Exclusive, […]
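A short simulation can make the central limit theorem tip concrete. This is only an illustrative sketch with several assumptions. The population is Exponential with rate 1, which has mean 1, standard deviation 1 and skewness 2. The sample size is 50. The helper names `sample_means` and `skewness` are hypothetical.

The code draws many samples and shows three things about their means:
- They cluster around the population mean.
- Their spread is close to σ/√n.
- They are far less skewed than the raw data.

It also checks Chebyshev’s theorem, which guarantees that at least 1 − 1/k² of any distribution lies within k standard deviations of its mean.

```python
import math
import random
import statistics

def sample_means(rng, n, trials):
    """Return the means of `trials` independent samples of size n from Exponential(1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def skewness(xs):
    """Population (moment) skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(42)                       # fixed seed so the run is reproducible
raw = [rng.expovariate(1.0) for _ in range(2000)]
means = sample_means(rng, n=50, trials=2000)

# CLT: sample means centre on the population mean (1) with spread close to 1/sqrt(50).
print("mean of means:", round(statistics.fmean(means), 3))
print("sd of means:", round(statistics.stdev(means), 3), "vs", round(1 / math.sqrt(50), 3))
# The raw draws are strongly right-skewed; the sample means are much closer to symmetric.
print("skew raw:", round(skewness(raw), 2), "skew of means:", round(skewness(means), 2))

# Chebyshev: at least 1 - 1/k^2 of the data lies within k standard deviations of the mean.
k = 2
m, s = statistics.fmean(raw), statistics.pstdev(raw)
within = sum(abs(x - m) < k * s for x in raw) / len(raw)
print(f"within {k} sd: {within:.3f} (Chebyshev bound: {1 - 1 / k**2:.2f})")
```

Changing `n` shows the effect directly. Larger samples give means with a smaller spread and less skew, even though the underlying population stays skewed.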

How Are Precision and Recall Calculated?

In classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant. Recall (also known as sensitivity) is the fraction of all relevant instances that were actually retrieved. Here "retrieved" means predicted positive and "relevant" means actually positive, so both measures rest on an understanding and measure of relevance.
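The two definitions can be written in terms of outcome counts:
- TP (true positives) are positive cases predicted positive.
- FP (false positives) are negative cases predicted positive.
- FN (false negatives) are positive cases predicted negative.

```latex
\text{precision} = \frac{TP}{TP + FP}
\qquad
\text{recall} = \frac{TP}{TP + FN}
```

True negatives appear in neither formula. That is why both measures stay informative when negative cases vastly outnumber positive ones.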

Imagine there are 100 positive cases among 10,000 cases. You want to predict which ones are positive, so you pick 200 to improve your chances of catching as many of the 100 positives as possible. You record the IDs of your predictions, and when the actual results come in, you count how many times you were right or wrong. There are four ways of being right or wrong:

  1. TN / True Negative: case was negative and predicted negative
  2. […]
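The scenario above can be sketched in Python by treating the actual positives and the predictions as sets of case IDs. The excerpt does not say how many of the 200 picks turned out to be correct. The split below (80 correct picks, 120 wrong ones) is therefore a hypothetical choice made only to give concrete numbers. The case IDs and the helper name `confusion_counts` are also illustrative.

```python
def confusion_counts(actual_pos, predicted_pos, all_ids):
    """Count TP, FP, FN and TN from sets of case IDs (hypothetical helper)."""
    tp = len(actual_pos & predicted_pos)    # predicted positive, actually positive
    fp = len(predicted_pos - actual_pos)    # predicted positive, actually negative
    fn = len(actual_pos - predicted_pos)    # actually positive, but missed
    tn = len(all_ids) - tp - fp - fn       # predicted negative, actually negative
    return tp, fp, fn, tn

all_ids = set(range(10_000))               # 10,000 cases in total
actual_pos = set(range(100))                # 100 positive cases (IDs 0-99, made up)
# Hypothetical 200 picks: 80 of the real positives plus 120 negatives.
predicted_pos = set(range(80)) | set(range(5_000, 5_120))

tp, fp, fn, tn = confusion_counts(actual_pos, predicted_pos, all_ids)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(tp, fp, fn, tn, precision, recall)    # 80 120 20 9780 0.4 0.8
```

With these made-up numbers, precision is 80 / 200 = 0.4 and recall is 80 / 100 = 0.8. Picking 200 cases catches most of the positives (high recall), at the cost of many false alarms (lower precision).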

R Reference Card

Here is my own R reference card, or cheat sheet, which I have been using during my MSc and other data science studies.

Getting help and info

help(topic) documentation on topic
?topic same as above; special characters need quotes, for example ?'&&'
help.search("topic") search the help system; same as ??topic
apropos("topic") the names of all objects in the search list matching the regular expression “topic”
help.start() start the HTML version of help
summary(x) generic function to give a “summary” of x, often a statistical one
str(x) display the internal structure of an R object
ls() list the objects in the current environment; specify pattern="pat" to search on a pattern
ls.str() str() for each object in the current environment
dir() show files in the current directory
methods(x) show the S3 methods of the generic function x
methods(class=class(x)) list all the methods that handle objects of the class of x
findFn() search a database of help pages for functions and return a data.frame (sos package)

Operators

<- Left assignment, binary
-> Right assignment, binary
= Left assignment, […]
