Channel: Matching two data frames in R - Stack Overflow

Matching two data frames in R


I have a question about the output I'm getting from the match function. I have two data frames that differ in their number of rows and in their row names. I want to derive two new data frames from them that have the same number of rows and the same row names. One way to do this is to match the row names of one data frame to those of the other.

Here's my code below so far:

    x_1 <- c("A1", "A1", "B10", "B10", "B10", "B10", "C100", "C100", "C100", "C100")
    y_1 <- round(seq(1, 24, length = 10), 2)
    A <- data.frame(x_1, y_1)

    x_2 <- c("A1", "B10", "C100", "D1", "D200", "G210")
    y_2 <- round(seq(1, 24, length = 6), 2)
    B <- data.frame(x_2, y_2)

Since A and B do not share the same row names, I want to make new versions of A and B with every name that is not common to both removed.

    m_1 <- names(table(A$x_1))
    m_2 <- names(table(B$x_2))
    comb_names <- union(m_1[!(m_1 %in% m_2)], m_2[!(m_2 %in% m_1)])

    A_1 <- A[!A$x_1 %in% c(comb_names), ]
    B_1 <- B[!B$x_2 %in% c(comb_names), ]
    newB_1 <- B_1[match(A_1$x_1, B_1$x_2), ]

newB_1 is B_1 with its rows matched to the row names in A_1, so it has one row for every row of A_1.

My question is this: when I run names(table(newB_1$x_2)), I still get all the original row names from B, including the ones that should have been deleted by B_1 <- B[!B$x_2 %in% c(comb_names), ]. However, when I print newB_1, it shows the right output.

    names(table(newB_1$x_2))
    "A1" "B10" "C100" "D1" "D200" "G210"

    newB_1
    x_2   y_2
    A1    1.0
    A1    1.0
    B10   5.6
    B10   5.6
    B10   5.6
    B10   5.6
    C100 10.2
    C100 10.2
    C100 10.2
    C100 10.2

In fact, the same thing holds for names(table(B_1$x_2)), which suggests that B_1 <- B[!B$x_2 %in% c(comb_names), ] isn't deleting the names contained in comb_names as intended.

    table(B_1$x_2)
      A1  B10 C100   D1 D200 G210
       1    1    1    0    0    0
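The zero counts for D1, D200 and G210 are consistent with x_2 being stored as a factor: subsetting a factor removes rows but keeps the full set of levels, and table() reports every level, used or not. Here is a minimal sketch of that behaviour, assuming the column is a factor (the default for data.frame() before R 4.0.0, made explicit here with stringsAsFactors = TRUE). The vector `keep` is a hypothetical stand-in for the common names, and droplevels() is the base R function that discards unused levels.

```r
# Assumption: x_2 is a factor, as it would be by default in R < 4.0.0.
x_2 <- c("A1", "B10", "C100", "D1", "D200", "G210")
y_2 <- round(seq(1, 24, length.out = 6), 2)
B <- data.frame(x_2, y_2, stringsAsFactors = TRUE)

# Hypothetical vector of names to keep (those also present in A).
keep <- c("A1", "B10", "C100")

# Subsetting drops the rows, but the factor still remembers all six levels.
B_1 <- B[B$x_2 %in% keep, ]
levels(B_1$x_2)

# droplevels() removes the levels that no longer occur in the data,
# so table() stops reporting zero counts for them.
B_1 <- droplevels(B_1)
names(table(B_1$x_2))
```

If the columns are plain character vectors instead (stringsAsFactors = FALSE), there are no levels to carry over and the extra names should not appear in the first place.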

My final question is: how can I completely delete the row names that are not common to both data frames A and B, so that I end up with two data frames with the same row names? That is, I don't want the names D1, D200 and G210 to appear in the new data frames.

I hope the above makes sense, and I would be happy to clarify any ambiguities. I would like to know how to modify my code to get the desired output, but alternative approaches that produce the same result are also welcome.
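One possible alternative, offered as a sketch rather than a definitive answer: compute the common names directly with intersect() and keep only those rows. It assumes the name columns are read as character vectors (stringsAsFactors = FALSE), so no unused factor levels can linger; the variable `common` is introduced here for illustration only.

```r
# Assumption: names are stored as character, not factor.
x_1 <- c("A1", "A1", "B10", "B10", "B10", "B10", "C100", "C100", "C100", "C100")
y_1 <- round(seq(1, 24, length.out = 10), 2)
A <- data.frame(x_1, y_1, stringsAsFactors = FALSE)

x_2 <- c("A1", "B10", "C100", "D1", "D200", "G210")
y_2 <- round(seq(1, 24, length.out = 6), 2)
B <- data.frame(x_2, y_2, stringsAsFactors = FALSE)

# Names that occur in both data frames.
common <- intersect(A$x_1, B$x_2)

# Keep only rows whose name is common to both.
A_1 <- A[A$x_1 %in% common, ]
B_1 <- B[B$x_2 %in% common, ]

# Repeat and reorder the rows of B_1 so they line up with A_1 row by row.
newB_1 <- B_1[match(A_1$x_1, B_1$x_2), ]

# Only the common names are reported now.
names(table(newB_1$x_2))
```

With factor columns the same subsetting works, but the result would still need droplevels() to drop D1, D200 and G210 from the levels.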

