Merging Datasets

Merging Datasets: Should I use a Many-to-Many Merge?

Thank you for taking the time to read this! I think my problem is complicated so I'll be detailed. Also, I do not know exactly what the solution will be, so the structure of my question might be convoluted.

The Situation (in points)
I want to run a regression: (person's stock performance rating) = B(nonS&P/S&P500 portfolio concentration) + Ei
...where a person's stock performance is on a 1-10 point scale.

Each person can buy a stock on a start date and sells it on an end date.
Over a 5 year period, their portfolio concentration changes on any given day. In one data set, I have: (1) list of investors (2) with respective stocks (3) days that they bought it (4) days they sold the stock.

I am looking at the S&P500 component list on a month by month basis (the list is compiled on the last day of each month). This second dataset contains: (1) Company Name (2) Company Ticker (3) Month they were in the S&P500.

In order to run the regression I believe I need to merge the people's portfolio to the S&P500 list (merging on the variable "company name" and "ticker symbol"). B/c many people will have the same companies in their portfolio AND because the S&P500 list contains many of the same companies over the course of each month, I was about to do a many-to-many merge. However, I believe this is a bad idea.

Because I want to calculate the portfolio concentration of each person on any given day, I need to take into account the start date and end date of each investment. Right? If so, then how can I use these variables to correctly merge the two data sets together?
Last edited: