RTSP server switch input sources

Topics: Question
Mar 29, 2016 at 11:56 AM
Hi, thanks for the great code. I am trying to implement an RTSP server. I have 2 IP cam input sources with RTSP URLs. I need to switch the input source for the client every minute without stopping the client session, i.e. when a client connects to the RTSP server he should first see the first cam's output for a minute, then the output should change to the 2nd cam for the next minute, and loop the same.

Can you please suggest how I can achieve this.

Many Thanks
Mar 29, 2016 at 1:17 PM
Thanks for your interest in the project.

It sounds like mixing in a certain sense, e.g. you are combining stream data from multiple sources.

You could have more than one video track in a media session; you would just need to modify the SDP to indicate that, possibly also with time descriptions for each, if you desired.

You would basically just let the client switch media after the time specified in the Media Description.

This will work even if the formats are different, will not require splicing the video or mixing the elementary streams, and seems to be the easiest way in my opinion, unless I am not understanding something.

The client doesn't have to disconnect to switch media or start a new session and would just issue another SETUP / PLAY for the next stream in the SDP at the time description it indicated.

You could also do this at the stream level, e.g. if you want to keep a single session, without disconnecting, stopping, or issuing any other commands, and just have it seamlessly keep playing but from different sources. To do that you're going to have to add another SPS and PPS in the Media Description, re-write sequence numbers and timestamps on the source packets so they are in order in the destination stream, and finally you will only be able to switch at certain points without causing the decoder to lose state information.

The Scalable Video Coding (SVC) work, e.g. RFC 6190, is probably what you need, as it allows coding of multiple streams in a single underlying stream.

Let me know if that helps and what else I can do to assist you!
Marked as answer by juliusfriedman on 3/29/2016 at 6:17 AM
Mar 29, 2016 at 1:36 PM
Thanks so much for your reply and suggestions. It is very helpful.

I prefer to do it at the stream level. Can you please suggest which functions I need to look at to modify so that I don't mess up the code and lose state information? Can you please advise on the required changes? A sample code will be very helpful.

Many thanks for your help.

Mar 29, 2016 at 2:55 PM
When you say at the stream level, you are essentially conceding that you have 2 different streams that you want to source as a single underlying stream.

To achieve this you would need to ensure both cameras have the same profile as far as resolution and configuration, if they don't you can't effectively mix the data.

RtspSink would be the best mechanism for this, but it's not really ready yet to do this for you automatically, so in the meantime you would set up an RtspSource which was attached to both existing sources and presented the packets received from both sources as its own.

When packets arrive from each RtspSource you would store them and Depacketize them accordingly, but separately, according to which stream they belong to.

[26 Packets from source 1] Seq 0 => 25, Timestamp = 0 (X)
[16 Packets from source 2] Seq 0 => 15, Timestamp = 0 (Y)
[15 Packets from source 1] Seq 26 => 40, Timestamp = X
[25 Packets from source 2] Seq 16 => 40, Timestamp = Y

After you have determined that one minute has elapsed, or that a specified number of frames have been played from a source, you would start buffering packets from that source again.

There are now packets in the buffer for the 2nd stream which need to be played for a minute. Those packets' sequence numbers need to resume where the first stream stopped; this must occur such that packet 0 from source 2 is sent as Seq = 26. Finally, the Timestamp will probably have to be adjusted such that the Timestamp from the secondary stream is not less than the Timestamp in the primary stream, to ensure playback does not skip or drop when the next frames arrive.
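The renumbering above can be sketched briefly. The library itself is C#; this Python sketch only illustrates the arithmetic, and the 90 kHz clock / 30 fps frame step are common RTP conventions assumed here, not values taken from the library:

```python
# Sketch: rewriting RTP sequence numbers and timestamps when splicing
# a secondary source into an ongoing output stream.

SEQ_MOD = 1 << 16          # RTP sequence numbers wrap at 2^16
TS_MOD = 1 << 32           # RTP timestamps wrap at 2^32

def splice(packets, last_seq, last_ts, ts_step=3000):
    """Renumber `packets` (dicts with 'seq' and 'ts') so they continue
    the output stream that ended at (last_seq, last_ts).
    ts_step = 3000 ticks @ 90 kHz is roughly one frame at 30 fps."""
    out = []
    base_seq = packets[0]['seq']
    base_ts = packets[0]['ts']
    for p in packets:
        out.append({
            # Continue sequence numbering where the previous source stopped.
            'seq': (last_seq + 1 + (p['seq'] - base_seq)) % SEQ_MOD,
            # Shift timestamps so they are never less than the prior stream's,
            # advancing by at least one frame duration.
            'ts': (last_ts + ts_step + (p['ts'] - base_ts)) % TS_MOD,
        })
    return out

# Example matching the walkthrough: source 1 ended at Seq 25, so packet 0
# of source 2 goes out as Seq 26.
src2 = [{'seq': i, 'ts': i * 3000} for i in range(16)]
spliced = splice(src2, last_seq=25, last_ts=45000)
```

The modulo operations matter in practice because a real stream will eventually wrap both counters.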

That's the high-level overview of how something like that would be achieved; you will also need to handle cases where one of the streams is inactive or no longer sending data, I imagine...

You don't need to modify anything, just to add your logic for your implementation.

You would start by deriving from RtpSink / RtspSink, e.g. 'MixingRtpSink'; from there you would have a constructor taking the existing sources you wanted to mix, or a method to add sources to the 'Mix' dynamically.

You would verify that the sources you're mixing are compatible with each other by looking at the SDP; if they are not, you can't mix them.

Then proceed as indicated above to take the packets from the sources, process them for sequence number and Timestamp changes, and finally send them out, where they will be delivered to all consumers of the stream.
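A sketch of the 'MixingRtpSink' shape described above. The real base classes (RtpSink / RtspSink) are C#, and every name below (add_source, is_compatible, make_source) is illustrative, not the library's actual API:

```python
from types import SimpleNamespace

def make_source(**sdp):
    # Stand-in for a real RtspSource; only carries a few parsed SDP fields.
    return SimpleNamespace(sdp=sdp)

class MixingRtpSink:
    def __init__(self, *sources):
        self.sources = []
        for s in sources:
            self.add_source(s)

    def add_source(self, source):
        # Refuse sources whose SDP profile doesn't match the first one;
        # incompatible sources can't be mixed into one stream.
        if self.sources and not self.is_compatible(self.sources[0], source):
            raise ValueError("incompatible SDP, cannot mix this source")
        self.sources.append(source)

    @staticmethod
    def is_compatible(a, b):
        # Minimal check: same media type, codec, and clock rate.
        keys = ('media', 'codec', 'clock_rate')
        return all(a.sdp.get(k) == b.sdp.get(k) for k in keys)
```

In the real class, packets received from the active source would then be renumbered and forwarded to the consumers.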

Give it a try and post when you get stuck, and I will have no problem at all assisting you with code if needed at that time.
Marked as answer by juliusfriedman on 3/29/2016 at 7:55 AM
Mar 29, 2016 at 4:38 PM
Thanks so much for explaining this. Appreciate your support. I will give a try.

With this stream method, is there any way we can stream from different sources with different resolutions or formats?

My actual requirement is that from the server end I should be able to set which source is streamed and with what time gap. I should be able to set only the first cam to be streamed, or only the second cam, or both cams' output with a configured time gap, like every minute, on a single session. According to the configuration, the output should be streamed in a single session without disconnection.

So is this something we can handle using the SDP, or at the stream level?

Please help.

Mar 29, 2016 at 7:15 PM
Edited Mar 29, 2016 at 7:55 PM
The different sizes are what SVC would help you achieve, but you would also need a decoder which would be able to support it.

You would have to do less work at the Rtp and Rtsp level to achieve that but you would still have to grab both streams to put them into a single SVC stream correctly.

For the SDP, yes, you could also have a Time Description which makes one-minute streams that repeat every other minute; the syntax is found in RFC 4566:

5.10. Repeat Times ("r=")
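A minimal sketch of such a session description, with made-up address and NTP times. Each r= line is &lt;repeat interval&gt; &lt;active duration&gt; &lt;offset from the start time&gt;, so the first window is active for 60 seconds starting at offset 0 and repeats every 120 seconds, while the second covers the alternating minute:

```
v=0
o=- 0 0 IN IP4 192.0.2.1
s=Camera Switching Example
c=IN IP4 192.0.2.1
t=3724394400 3724480800
r=120 60 0
r=120 60 60
```

Note that in RFC 4566 time descriptions apply at the session level, which is part of why, as described below, clients rarely map them onto individual media sections.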

You must also realize that it would be entirely up to the consumer of the stream to honor the SDP, unless you also enforced these same time descriptions and repeat times on the server side.

For instance, if you don't end the stream at the 1-minute mark, the client may continue to listen and may or may not check the remaining media descriptions to determine if they are still active.

I know my RtspClient doesn't do this, but I will add some logic for it, so thanks for showing me that people intend to use these features.

It can be changed quite easily: when receiving the Describe response, check before any Setup requests to ensure that the MediaDescription doesn't have a corresponding TimeDescription in the SessionDescription which describes it :).

In short, even if you get support in the RtspServer and RtspClient, I am not sure that any other client enforces this in real life; in most cases I have seen, the RTSP layer accepts the SETUP even if the Time Description within the SessionDescription doesn't allow it, which is why I didn't really bother to check it.

It may be easier, if you can control the client, to use an Rtcp APP message to inform the client to switch, or possibly even a pushed RTSP message such as ANNOUNCE or PLAY_NOTIFY; you will have to play around to see what works for your application scenario.
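An Rtcp APP message is easy to construct; here is a sketch of the raw RFC 3550 byte layout (not the library's API, and the 'SWCH' name and 'cam2' payload are invented values a cooperating client would need custom logic to act on):

```python
import struct

def rtcp_app(ssrc, name, data=b'', subtype=0):
    """Build an RTCP APP packet (RFC 3550, packet type 204)."""
    assert len(name) == 4, "APP name is exactly four ASCII characters"
    # Pad application data to a 32-bit boundary as RFC 3550 requires.
    data = data + b'\x00' * (-len(data) % 4)
    # Length field is the packet size in 32-bit words, minus one.
    length = (8 + len(name) + len(data)) // 4 - 1
    header = struct.pack('!BBH', (2 << 6) | (subtype & 0x1f), 204, length)
    return header + struct.pack('!I', ssrc) + name + data

pkt = rtcp_app(0x1234, b'SWCH', b'cam2')
```

The receiving side would inspect the four-character name to recognize the application-defined "switch" signal.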

What I can say though is that I can see this working more generally as follows:

When the client connects, for the DESCRIBE request create an SDP with multiple Media Descriptions, one from each source you want to encompass in the session.
(You may have to remap the payload types if they overlap, because some receivers will only use the payload type in the media description and not the SSRC.)
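For example, if both cameras announce H.264 as payload type 96 in their own SDPs, the combined DESCRIBE might remap the second to 97 (the track control names here are made up):

```
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=control:trackID=1
m=video 0 RTP/AVP 97
a=rtpmap:97 H264/90000
a=control:trackID=2
```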

You can then use the PacketBuffer to virtually 'pause' a certain stream, e.g. video1 or video2, to prevent its packets from going into the RtpClient of the receiver.

The receiver will get packets from whatever source is active for your desired time limit.

Unpause the next source and Pause the current source.

Repeat the process of sending (optionally discarding older packets if you wanted).

You would also be able to make a small class with a Collection to help with the switching of packets if you ever wanted more than two sources or to do different types of buffering for each source depending on some other state variable.
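Such a small switching class might be sketched like this (the real implementation would be C# and drive the library's PacketBuffer; 'SourceSwitcher' and all its members are hypothetical names):

```python
# Sketch of the server-side pause/unpause rotation described above.

class SourceSwitcher:
    def __init__(self, sources, interval=60.0):
        self.sources = list(sources)     # e.g. ['video1', 'video2']
        self.interval = interval         # seconds each source plays
        self.paused = {s: True for s in self.sources}
        self.paused[self.sources[0]] = False   # start with the first source

    def active_source(self, elapsed):
        # Which source should be unpaused after `elapsed` seconds?
        return self.sources[int(elapsed // self.interval) % len(self.sources)]

    def tick(self, elapsed):
        """Unpause the scheduled source and pause all the others."""
        active = self.active_source(elapsed)
        for s in self.sources:
            self.paused[s] = (s != active)
        return active
```

Calling `tick` periodically from the server's event loop would keep exactly one source flowing to the receiver at any time, and the class generalizes to any number of sources.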

This would allow all streams to keep playing individually and remain accessible individually, but allow sending data on a certain session through a certain source only at certain times, or in short whenever you wanted to allow, via the server or otherwise.

A client such as VLC or QuickTime would not disconnect (Rtcp would still be sent) and SHOULD be able to display all the data within the same view (even at different resolutions). In the case a client tried to manually switch to a source, you would be able to handle that as well, by either allowing it to affect the pause cycle or choosing to ignore it.

Some viewers might choose to display the different feeds in different sections of the view, and some may choose to show each stream in the entire view; it's up to the viewer to decide. For example, VLC allows you to compose a custom viewport quite easily, either with different sources or with offsets of another source.

So, using a custom viewport or just including multiple media descriptions should be a much easier way to do what you're trying to do, no matter what codecs you use, and without needing to change anything with the sequence numbers or timestamps; you're just simulating multiple parties' input to a client (Audio or Video). Additionally, it should work much more generally than trying to get the Time Descriptions and Repeat Times to be honored if they aren't already.

That should achieve what you're trying to achieve without any disconnects, and should also work under TCP and UDP accordingly.

This is just like issuing multiple SETUP requests for multiple streams and choosing to issue PLAY or PAUSE from the client, but instead the server is controlling what is playing and what is paused, so there is no request other than the SETUP, and no client-side logic for resuming from the pause state; the client doesn't have to do anything but SETUP all media in the description, which is much easier to achieve in most cases.

Let me know if you have any other questions or if you need further assistance!
Marked as answer by juliusfriedman on 3/29/2016 at 12:16 PM
Apr 1, 2016 at 1:01 AM
Thank you so much for the detailed explanation. Really helpful. I will try to implement it as you suggested and contact you if there are any further issues.

Thanks Again
Marked as answer by juliusfriedman on 4/1/2016 at 5:32 AM