CSC 213, Fall 2008 :
Schedule : Lab 12
Lab 12: Sensor Network
Goals: Learn about using RPCs by implementing a "push"-based
sensor network, like those used for stock quotes, environmental
monitoring, weather updates, sports scores, etc. You'll also gain
further practice with threads and synchronization.
Background reading:
Collaboration: You will complete this lab in pairs of your choosing.
Overview: You will implement a sensor network where a set of
independent "sources" each pushes its current value at a specified
period to a proxy server, which stores them. A "sink" may connect to
the proxy server and request a subscription to a subset of the
sources. The proxy will then push the sum of the subscribed source
values to the sink at a specified period. To ensure consistency, a
synchronization strategy must be employed in the proxy. Threads will
be needed to handle all the processing in the proxy.
Network Description
As indicated by the picture above, there are three main components
to this sensor network, sources, the proxy, and sinks. An arbitrary
number of sources should be able to register with the proxy and
begin sending updates of their values. This will be done with RPC
calls, where a source acts as a client and the proxy as the server.
A sink will make a subscription request to the proxy, sending its
hostname and a list of sources it would like to receive updates
from. The proxy will then make RPC calls to the sink, pushing updates.
In our network, the proxy's update to the sink is the synchronized
sum of the values of the sources in a sink's subscription. This
means that, while calculating the sum, none of the already-added
values should change until the total sum has been calculated. This
is a consistency requirement that ensures the sum represents a total
of values actually held at some moment in time--even if this comes
at the expense of making updates wait.
The proxy should accept updates from sources in parallel and process
the update asynchronously. This means that when a proxy receives an
update, it should spawn a new thread for the update and immediately
return (from the RPC call) to the source. Within this thread, the
proxy stores the new value in a synchronized fashion, so as not to
disrupt any sum being calculated for a sink's subscription. (Be sure
to distinguish the semantics of an asynchronous function return, and
the synchronous, i.e., mutexed, modification of the sensor value in
the proxy.
The proxy should also process sink subscriptions in parallel. Thus,
when a sink requests a subscription, the proxy should spawn a new
thread to handle the periodic sum and push.
The source updates should be "timestamped" so that the proxy only
sets its local value to the most recent update. (Note that no "time"
is actually necessary to guarantee this.)
To minimize overhead, network connections should not be repeatedly opened
and closed, but only opened once when they are needed. For example, a
source should only need to open a connection once to the proxy, and
a proxy should only open a connection once to a sink.
Part A
(Re-)Read everything for part A before you begin. It is likely that
your thinking and the design process will benefit from knowing about
your end goals. You may even want to have another look at parts B and C.
-
Design the interface to the proxy needed for the sources. This should
be in the form of a well-commented IDL (Interface Description
Language) .x file. Be sure to also explain how the
proxy will handle source registration requests and keep track of
updates from the sources (i.e., who an update is coming
from). Your protocol need not be "secure" with an unforgeable
identity.
-
Design the interface to a sink needed for the proxy. You should
assume that there is only one sink running an RPC server on any
given host. (Again, an IDL file will do).
-
Design the interface to the proxy needed for a sink to request
subscriptions. (Again, by creating an IDL file). Note that the
proxy will need to know how to connect to the sink (via RPC); you
are welcome to explore/propose alternatives, but one way is to
have the sink provide its hostname to the proxy
(see gethostname(2)). You may assume that a hostname is
no longer than 255 characters (plus a terminating null-byte).
-
You will use the POSIX thread library's mutex type
(pthread_mutex_t) to perform synchronization. Carefully
describe an algorithm for calculating the source updates and
subscribed sum values in a synchronized fashion that will prevent
deadlock. Your pseudo-code algorithm(s) should be accompanied by a
sufficiently convincing explanation of why deadlock cannot occur
(a formal proof is not required).
-
Why do we need to time stamp updates from the sources? (What could
go wrong if we didn't and why/how could that happen?)
Part B
(Re-)Read everything for part B before you begin. The
implementations will be dependent upon one another, and like above,
your thinking and the design process will benefit from a more global
perspective.
-
Implement the source component. The command-line usage might be something like:
./source proxyHost period
where proxyHost would be the name of the server
running the proxy process (i.e., localhost,
euler.cs.grinnell.edu) and period is the
time between updates. To look for interesting behavior later, you
may consider having the period be measured in microseconds
(see usleep(3)). Your source program should:
-
Establish a connection with the proxy and register itself.
-
Exit and print an error if it cannot connect to or register
with the proxy.
-
Push updates (random integers are fine, but you are welcome to
create something more interesting if you like) to the proxy
with the specified period. For now, it may push indefinitely,
or terminate after a specified number of updates (whichever
makes part B.4 easiest for you). Be sure comments in your code
indicate whichever is happening.
-
To make your system more robust, consider tolerating a certain
small number of consecutive errors resulting from a push. Once
this threshold has been reached, your source program
may give up and exit.
-
Implement the parts of the proxy component that are needed to
receive registrations and updates from sources. The command-line
usage will probably be something very simple, like:
./proxy
Remember that if you compile with RPC_SVC_FG defined,
your server process will not fork and background itself. This will
make it much easier to kill and restart, using shell process
management, since you won't have to hunt for the
PID. Your proxy program should:
-
Accept registrations from sources dynamically (up to a maximum
number if you choose).
-
When it receives an update request, spawn a new thread to
process the update and immediately return to the caller, so
long as the request is valid. If it is invalid
for some reason (i.e., from an unregistered source), a failure
should be indicated to the caller.
-
Conduct synchronized updates to stored values using
mutex-based locks.
-
Fully test the running of your source and proxy programs. Explain
what tests you have conducted and why they are sufficient to
demonstrate the correct behavior of this portion of the
system.
You may run these all on the same host. However, if you run
things on different hosts and get strange errors when you try to
start your proxy or connect the source to the proxy, be sure
your RPC program number does not conflict with a classmate's.
-
Demonstrate the operation of this half of your sensor network by
including some diagnostics in the code so that:
-
a source prints the update value and timestamp when being sent
-
the proxy prints the updated value and timestamp when received
-
the proxy prints the result of the update request (i.e.,
predicated on timestamp of the currently stored value and the
update request).
Turn in the results of one run (output from each source and the
proxy) with at least 3 sources, each with a different period so
that it is clear things are behaving as specified. Note the
periods used for each source (and include units!) if it is not
clear. Twenty update requests received by the proxy is
sufficient. Do not include more than fifty.
Part C
-
Implement the sink component. The command-line usage might be
something like:
./sink proxyHost period src ...
where proxyHost would be the name of the server
running the proxy process and period is the requested time
between updates from the proxy. A variable number of sources would
be specified, using whatever scheme you have determined to
identify them (integers are simple and sufficient; see
strtol(3)).
Since the sink is BOTH an RPC client to the proxy (when
the subscription request is made) and an RPC server to
the proxy (as it receives updates), some special care is in
order. When you compile the .x file for the sink's
interface to the proxy, it will generate a _svc.c file
that already has a main(int,char*[]) function. This is
the function that is called when the server (e.g., sink) starts
up and gets ready to receive RPC requests from clients (e.g. the
proxy).
I suggest writing a function start(int,char*[]) in your
own .c file (with the server-side RPC functions for the
sink) that takes care of
- processing the command-line arguments, and
- placing the subscription request with the proxy.
After you have used rpcgen to create
the _svc.c file, you can insert a call to
your start routine with the command-line arguments at
the very beginning of the automatically-generated main
function.
Alternatively, with less head-ache but slightly more effort when
you want to run the system, you could start the sink for
receiving updates first, and write a separate program (with its
own main doing what start does above), to send
the subscription request to the proxy.
-
Implement the proxy component that handles
subscription requests from sinks and periodically pushes updates
to them. Your proxy program should:
-
Spawn a thread for each sink subscription to handle the periodic pushes
-
Calculate the sum requested in the synchronized manner described above
-
Prevent deadlock
-
Only lock the source values needed to calculate the sum for a
particular subscription
Depending on when the connection to the sink is made from the
proxy (with a CLIENT*), you may need to add a delay
(see sleep(3)) while handling the sink's request to allow
the sink's main routine to finish setting up the RPC
server after your added start call. (Hint: Think about
what should be concurrent, what should happen before the thread is
created or in the thread, and when the proxy's RPC function should
return to the sink. The right order of events must take place otherwise the
reverse connection cannot be established.)
-
Fully test the running of your source, proxy, and sink
programs. Explain what tests you have conducted and why they are
sufficient to demonstrate the correct behavior of this portion
of the system. Be sure, also to test for memory leaks! (See the
program top(1) for monitoring memory usage.)
You may run these all on the same host. The same caveat about
RPC program numbers (cf., B.3) applies if you run them on
different hosts.
-
Demonstrate the operation of your complete sensor network by
adding more diagnostics in the code so that:
-
the proxy prints a sum value when being sent to the sink
-
a sink prints the value when received
Turn in the results of two runs (output from each source, the
proxy, and a sink) with at least 4 sources and one sink. Vary the
number of sources in the subscription for the sink between the two
runs. Note the periods used for each source and sink (including
units!) if it is not clear. Try to pick periods that show
"interesting" behavior (for your own definition of
"interesting"). Again, at least 20 and no more than 50
source-to-proxy updates for each run is sufficient.
-
Create your own experiment to test the scaleability of the proxy
by varying some of the following:
- the number of sources
- the source period(s)
- the sink period(s)
- the number of sources in a sink subscription
- the number of sinks (you will need to use one host machine
for each sink)
You might measure:
-
data transfer rate (how long it takes to get a certain number
of updates to a sink)
-
number of updates from the sources ignored/rejected by the proxy
-
time spent calculating the sum for a sink (i.e., time waiting
to acquire locks)
The point is to be creative and look for something interesting,
guided by your knowledge of the network or how you expect it to behave.
Details to Remember
-
As always, you are strongly urged to use Makefiles to
compile your work regularly and easily.
-
Don't forget to add the appropriate files created
by rpcgen to your compile command.
-
You cannot pass arrays as pointers to RPC functions! You must
declare them in the IDL format, so that it knows to call the
appropriate XDR functions.
-
You may place an upper bound on array lengths to simplify RPC
calls and parameter handling if necessary.
-
You may limit system resources (i.e., number of sources manageable
by a proxy) with a compile-time constant. (See the IDL
command const to have it show up as a #define in rpcgen generated code..
-
Release memory for return values to RPC clients
with xdr_free, otherwise you will have a certain memory
leak!
-
Destroy thread attribute objects that are created
when you spawn threads (with pthread_attr_destroy),
otherwise you will have another memory leak!
-
Every time you call rpcgen it will generate a set of
files. So if you've modified one (as suggested), remember
it could be overwritten! (If your modification consists of only
one additional line, as suggested, you should be able to
recover easily.)
Work To Be Turned In
This project is the largest and most comprehensive of the term. It is
highly recommended that you do not fall behind on the milestones
below, and that you continue working toward the next milestone as soon
as you have completed one. Do not wait until the last minute to work
on this project! By now, you will hopefully not be surprised at the
attention to detail required in systems programming.
-
Due Friday, December 5
-
Due Tuesday, December 9
-
Due Friday, December 12
In addition to the material requested above, follow the instructions
for submitting programs.
Jerod Weinman
Created on June 17, 2008